LegacyNewRelicHookSensor   F

Complexity

Total Complexity 66

Size/Duplication

Total Lines 294
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
dl 0
loc 294
rs 3.1913
c 0
b 0
f 0
wmc 66

25 Methods

Rating   Name   Duplication   Size   Complexity  
A _is_alert_opened() 0 2 1
A __init__() 0 16 1
A _get_servers() 0 11 3
A _get_headers_as_dict() 0 6 2
A _get_sensor_config_param() 0 6 2
A cleanup() 0 2 1
A _is_downtime_recovered() 0 2 1
A _get_sensor_config() 0 3 1
A _is_alert_closed() 0 2 1
C _server_hook_handler() 0 25 7
A _is_alert_acknowledged() 0 2 1
A add_trigger() 0 2 1
A _dispatch_trigger() 0 3 1
A remove_trigger() 0 2 1
A _get_application() 0 10 4
A _get_hook_handler() 0 10 3
F run() 0 57 12
A _dispatch_application_normal() 0 19 4
A _is_escalated_downtime() 0 2 1
A update_trigger() 0 2 1
B handle_nrhook() 0 41 6
C _dispatch_server_normal() 0 27 7
A setup() 0 2 1
D _app_hook_handler() 0 34 8
A _is_alert_canceled() 0 2 1

How to fix

Complex Class

Complex classes like LegacyNewRelicHookSensor often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
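One such cohesive group in LegacyNewRelicHookSensor is the NewRelic REST access: the _get_application() and _get_servers() methods together with the _api_url, _api_key and _headers fields. Below is a minimal, illustrative sketch of what Extract Class could look like for that group; the NewRelicApiClient name and its interface are hypothetical and not part of the analyzed file.

# Hypothetical Extract Class sketch -- not part of the analyzed source below.
# It reuses the same requests / six.moves.urllib_parse dependencies the sensor uses.
import requests
from six.moves import urllib_parse


class NewRelicApiClient(object):
    """Owns the NewRelic REST calls currently embedded in the sensor."""

    def __init__(self, api_url, api_key):
        self._api_url = api_url
        self._headers = {'X-Api-Key': api_key}

    def get_application(self, app_name):
        # Mirrors LegacyNewRelicHookSensor._get_application().
        params = {'filter[name]': app_name} if app_name else None
        url = urllib_parse.urljoin(self._api_url, 'applications.json')
        resp = requests.get(url, headers=self._headers, params=params).json()
        applications = resp.get('applications') or []
        return applications[0] if applications else None

    def get_servers(self, server_names):
        # Mirrors LegacyNewRelicHookSensor._get_servers(); one call per server name.
        servers = {}
        for server_name in server_names:
            params = {'filter[name]': server_name}
            url = urllib_parse.urljoin(self._api_url, 'servers.json')
            resp = requests.get(url, headers=self._headers, params=params).json()
            found = resp.get('servers') or []
            servers[server_name] = found[0] if found else None
        return servers

The sensor would then build one client in __init__ (for example self._nr_client = NewRelicApiClient(self._api_url, self._api_key)) and delegate to it, removing two methods and three fields from the class and making the HTTP access independently testable.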

# Licensed to the StackStorm, Inc ('StackStorm') under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import six
import sys
import json

import eventlet
import requests
from flask import request, Flask
from six.moves import urllib_parse
from st2reactor.sensor.base import Sensor

eventlet.monkey_patch(
    os=True,
    select=True,
    socket=True,
    thread=False if '--use-debugger' in sys.argv else True,
    time=True)

PACK = 'newrelic'
WEB_APP_ALERT_TRIGGER_REF = '{}.{}'.format(PACK, 'WebAppAlertTrigger')
WEB_APP_NORMAL_TRIGGER_REF = '{}.{}'.format(PACK, 'WebAppNormalTrigger')
SERVER_ALERT_TRIGGER_REF = '{}.{}'.format(PACK, 'ServerAlertTrigger')
SERVER_NORMAL_TRIGGER_REF = '{}.{}'.format(PACK, 'ServerNormalTrigger')

NR_API_URL_KEY = 'api_url'
NR_API_KEY_KEY = 'api_key'

APP_HOST_KEY = 'host'
APP_PORT_KEY = 'port'
APP_URL_KEY = 'url'
NORMAL_REPORT_DELAY_KEY = 'normal_report_delay'


class LegacyNewRelicHookSensor(Sensor):
    """
    Sensor class that starts up a Flask webapp which listens to alert hooks from NewRelic.
    It translates hooks into the appropriate triggers using the following mapping -
       1. Web app incident and apdex problem opened -> WEB_APP_ALERT_TRIGGER_REF
       2. Incident escalated to downtime (app)      -> WEB_APP_ALERT_TRIGGER_REF
       3. Apdex problem closed (app)                -> WEB_APP_NORMAL_TRIGGER_REF
       4. Downtime problem closed (app)             -> WEB_APP_NORMAL_TRIGGER_REF
       5. Server incident and CPU problem opened    -> SERVER_ALERT_TRIGGER_REF
       6. Incident escalated after 5 minutes        -> SERVER_ALERT_TRIGGER_REF
       7. Server downtime ends                      -> SERVER_NORMAL_TRIGGER_REF
       8. CPU problem closed                        -> SERVER_NORMAL_TRIGGER_REF

    Note: Some hooks, like cancel or disable of an incident and open or close of an alert
    policy, are ignored.

    All return-to-normal events are always fired after a delay period.
    """

    def __init__(self, sensor_service, config=None):
        self._config = config
        self._sensor_service = sensor_service

        self._api_url = config.get(NR_API_URL_KEY, None)
        self._api_key = config.get(NR_API_KEY_KEY, None)

        self._host = self._get_sensor_config_param(self._config, APP_HOST_KEY)
        self._port = self._get_sensor_config_param(self._config, APP_PORT_KEY)
        self._url = self._get_sensor_config_param(self._config, APP_URL_KEY)
        self._normal_report_delay = self._get_sensor_config_param(self._config,
                                                                  NORMAL_REPORT_DELAY_KEY, 300)

        self._app = Flask(__name__)
        self._log = self._sensor_service.get_logger(__name__)
        self._headers = {'X-Api-Key': self._api_key}

    def setup(self):
        pass

    def run(self):
        """
        Validates the required params and starts up the webapp that listens to hooks
        from NewRelic.
        """
        if not self._api_url:
            raise Exception('NewRelic API url not found.')
        if not self._api_key:
            raise Exception('NewRelic API key not found.')
        if not self._host or not self._port or not self._url:
            raise Exception('Incomplete NewRelic webhook app config (host:%s, port:%s, url:%s)' %
                            (self._host, self._port, self._url))
        self._log.info('LegacyNewRelicHookSensor up. host %s, port %s, url %s', self._host,
                       self._port, self._url)

        @self._app.route(self._url, methods=['POST'])
        def handle_nrhook():

            # Hooks are sent for alerts and deployments. We only care about alerts, so
            # deployments are ignored. The body is expected to be of the form -
            #
            # alert : {...}
            #      OR
            # deployment : {...}
            #
            # JSON inside form encoded data, seriously?
            data = request.form
            alert_body = data.get('alert', None)

            if not alert_body:
                self._log.info('Request doesn\'t contain "alert" attribute, ignoring...')
                return 'IGNORED'

            try:
                alert_body = json.loads(alert_body)
            except Exception:
                self._log.exception('Failed to parse request body: %s' % (alert_body))
                return 'IGNORED'

            if alert_body.get('severity', None) not in ['critical', 'downtime']:
                self._log.debug('Ignoring alert %s as it is not severe enough.', alert_body)
                return 'ACCEPTED'

            hook_headers = self._get_headers_as_dict(request.headers)
            hook_handler = self._get_hook_handler(alert_body, hook_headers)

            # All handling is based off the examples in the NewRelic documentation -
            # https://docs.newrelic.com/docs/alerts/alert-policies/examples/webhook-examples

            try:
                if hook_handler:
                    hook_handler(alert_body, hook_headers)
            except Exception:
                self._log.exception('Failed to handle nr hook %s.', alert_body)

            return 'ACCEPTED'

        self._app.run(host=self._host, port=self._port)

    def _get_hook_handler(self, alert_body, hook_headers):
        if not alert_body:
            return None

        if 'servers' in alert_body:
            return self._server_hook_handler

        # For now everything else is treated as a web app hook. Hooks for key transaction,
        # mobile app or plugin alerts would all be rolled up to the application level.
        return self._app_hook_handler

    def _app_hook_handler(self, alert_body, hook_headers):
        if not alert_body['application_name']:
            self._log.info('No application found for alert %s. Will ignore.', alert_body)
            return

        long_description = alert_body['long_description']

        if self._is_alert_opened(long_description) or \
           self._is_escalated_downtime(long_description):

            # handle opened and escalation to downtime immediately.
            payload = {
                'alert': alert_body,
                'header': hook_headers
            }
            self._dispatch_trigger(WEB_APP_ALERT_TRIGGER_REF, payload)

        elif (self._is_alert_closed(long_description) or
                self._is_downtime_recovered(long_description)):

            # handle closed and recovered after a delay.
            payload = {
                'alert': alert_body,
                'header': hook_headers
            }
            self._log.info('App alert closed. Delaying normal dispatch.')
            eventlet.spawn_after(self._normal_report_delay, self._dispatch_application_normal,
                                 payload)

        elif (self._is_alert_canceled(long_description) or
                self._is_alert_acknowledged(long_description)):

            # ignore canceled or acknowledged alerts.
            self._log.info('Ignored alert : %s.', alert_body)

    def _dispatch_application_normal(self, payload, attempt_no=0):
        '''
        Dispatches WEB_APP_NORMAL_TRIGGER_REF if the application health_status is 'green'.
        '''
        # basic guard to avoid queuing up forever.
        if attempt_no == 10:
            self._log.warning('Abandoning WEB_APP_NORMAL_TRIGGER_REF dispatch. Payload %s',
                              payload)
            return
        try:
            application = self._get_application(payload['alert']['application_name'])
            if application['health_status'] in ['green']:
                self._dispatch_trigger(WEB_APP_NORMAL_TRIGGER_REF, payload)
            else:
                self._log.info('Application %s has state %s. Rescheduling normal check.',
                               application['name'], application['health_status'])
                eventlet.spawn_after(self._normal_report_delay, self._dispatch_application_normal,
                                     payload, attempt_no + 1)
        except Exception:
            self._log.exception('Failed delayed dispatch. Payload %s.', payload)

    def _server_hook_handler(self, alert_body, hook_headers):
        long_description = alert_body['long_description']
        if self._is_alert_opened(long_description) or \
           self._is_escalated_downtime(long_description):

            payload = {
                'alert': alert_body,
                'header': hook_headers
            }
            self._dispatch_trigger(SERVER_ALERT_TRIGGER_REF, payload)

        elif (self._is_alert_closed(long_description) or
                self._is_downtime_recovered(long_description)):

            payload = {
                'alert': alert_body,
                'header': hook_headers
            }
            self._log.info('Server alert closed. Delaying normal dispatch.')
            eventlet.spawn_after(self._normal_report_delay, self._dispatch_server_normal,
                                 payload)

        elif (self._is_alert_canceled(long_description) or
                self._is_alert_acknowledged(long_description)):
            self._log.info('Ignored alert : %s.', alert_body)

    def _dispatch_server_normal(self, payload, attempt_no=0):
        '''
        Dispatches SERVER_NORMAL_TRIGGER_REF if the health_status of all servers is 'green'.
        '''
        # basic guard to avoid queuing up forever.
        if attempt_no == 10:
            self._log.warning('Abandoning SERVER_NORMAL_TRIGGER_REF dispatch. Payload %s',
                              payload)
            return
        try:
            servers = self._get_servers(payload['alert']['servers'])
            # make sure all servers are ok.
            all_servers_ok = True
            for name, server in six.iteritems(servers):
                all_servers_ok &= server['health_status'] in ['green']
                if not all_servers_ok:
                    break

            if all_servers_ok:
                self._dispatch_trigger(SERVER_NORMAL_TRIGGER_REF, payload)
            else:
                for name, server in six.iteritems(servers):
                    self._log.info('Server %s has state %s. Rescheduling normal check.',
                                   name, server['health_status'])
                eventlet.spawn_after(self._normal_report_delay, self._dispatch_server_normal,
                                     payload, attempt_no + 1)
        except Exception:
            self._log.exception('Failed delayed dispatch. Payload %s.', payload)

    def _dispatch_trigger(self, trigger, payload):
        self._sensor_service.dispatch(trigger, payload)
        self._log.info('Dispatched %s with payload %s.', trigger, payload)

    # alert test methods
    def _is_alert_opened(self, long_description):
        return long_description and long_description.startswith('Alert opened')

    def _is_alert_closed(self, long_description):
        return long_description and long_description.startswith('Alert ended')

    def _is_alert_canceled(self, long_description):
        return long_description and long_description.startswith('Alert canceled')

    def _is_alert_acknowledged(self, long_description):
        return long_description and long_description.startswith('Alert acknowledged')

    def _is_escalated_downtime(self, long_description):
        return long_description and long_description.startswith('Alert escalated to downtime')

    def _is_downtime_recovered(self, long_description):
        return long_description and long_description.startswith('Alert downtime recovered')

    # newrelic API methods
    def _get_application(self, app_name):
        params = None
        if app_name:
            params = {'filter[name]': app_name}
        url = urllib_parse.urljoin(self._api_url, 'applications.json')
        resp = requests.get(url, headers=self._headers, params=params).json()
        if 'applications' in resp:
            # pick the 1st application
            return resp['applications'][0] if resp['applications'] else None
        return None

    def _get_servers(self, server_names):
        servers = {}
        # No batch query-by-name support, so API calls are made in a tight loop. It might be
        # ok to get all servers and filter manually, but that gets complex for a large number
        # of servers since the API pages data.
        for server_name in server_names:
            params = {'filter[name]': server_name}
            url = urllib_parse.urljoin(self._api_url, 'servers.json')
            resp = requests.get(url, headers=self._headers, params=params).json()
            servers[server_name] = resp['servers'][0] if resp['servers'] else None
        return servers

    @staticmethod
    def _get_sensor_config_param(config, param_name, default=None):
        sensor_config = LegacyNewRelicHookSensor._get_sensor_config(config)
        if sensor_config:
            return sensor_config.get(param_name, default)
        return default

    @staticmethod
    def _get_sensor_config(config):
        return config.get('sensor_config', None)

    @staticmethod
    def _get_headers_as_dict(headers):
        headers_dict = {}
        for k, v in headers:
            headers_dict[k] = v
        return headers_dict

    # ignore
    def cleanup(self):
        pass

    def add_trigger(self, trigger):
        pass

    def update_trigger(self, trigger):
        pass

    def remove_trigger(self, trigger):
        pass
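
For context on the handle_nrhook() contract above, here is a hedged client-side sketch of the kind of request the sensor expects: a form-encoded POST whose alert field carries JSON with severity, long_description and either application_name or servers. The endpoint, port and description text are placeholders, not values taken from the analyzed configuration.

# Hypothetical test client for the webhook contract parsed by handle_nrhook().
# Host, port and URL are placeholders; the real values come from sensor_config.
import json

import requests

alert = {
    'severity': 'critical',
    'long_description': 'Alert opened for application Example App',  # placeholder text
    'application_name': 'Example App',
}

# NewRelic legacy webhooks send JSON inside form-encoded data, which is why the
# sensor reads request.form['alert'] and then json.loads() it.
resp = requests.post('http://127.0.0.1:10001/newrelic/hooks',
                     data={'alert': json.dumps(alert)})
print(resp.text)  # 'ACCEPTED' or 'IGNORED', matching the handler's return values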