Passed
Pull Request — master (#595)
by
unknown
13:12
created

EvaluationPlaneHandler._post_impl()   D

Complexity

Conditions 13

Size

Total Lines 74
Code Lines 56

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 33
CRAP Score 16.8141

Importance

Changes 0
Metric Value
eloc 56
dl 0
loc 74
ccs 33
cts 46
cp 0.7174
rs 4.2
c 0
b 0
f 0
cc 13
nop 1
crap 16.8141

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
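As a hedged illustration for this method: the argument-name check inside _post_impl() is a commented, self-contained step, so it could be pulled out into its own helper. The sketch below assumes a hypothetical function name, build_arguments_str, and reports failures by raising ValueError instead of calling error_out(); neither is part of the reviewed code.

```python
def build_arguments_str(arguments):
    """Return the ", _arg1, _arg2" suffix for the wrapper signature.

    Hypothetical helper (not part of TabPy). Mirrors the checks done
    inline in _post_impl(), but raises ValueError instead of writing
    an HTTP error.
    """
    if arguments is None:
        # No arguments were supplied: the wrapper takes only `tabpy`.
        return ""
    if not isinstance(arguments, dict):
        raise ValueError("Script parameters need to be provided as a dictionary.")
    args_in = sorted(arguments.keys())
    expected = sorted("_arg" + str(i + 1) for i in range(len(arguments)))
    if args_in != expected:
        raise ValueError(
            "Variables names should follow the format _arg1, _arg2, _argN"
        )
    return ", " + ", ".join(args_in)


print(build_arguments_str({"_arg2": [3, 4], "_arg1": [1, 2]}))
```

With a helper like this, _post_impl() would catch ValueError and pass its message to error_out(400, ...), which removes several of the nested conditions from the method body.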

Complexity

Complex units like tabpy.tabpy_server.handlers.evaluation_plane_handler.EvaluationPlaneHandler._post_impl() often do many different things. To break such a unit down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
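A minimal sketch of Extract Class for this handler, assuming the script text and its argument suffix are treated as one cohesive component. The class name UserScript and its method name are hypothetical; only the generated wrapper text follows the string building done in _post_impl().

```python
from dataclasses import dataclass


@dataclass
class UserScript:
    """Hypothetical value object holding what is needed to build the wrapper."""

    code: str
    arguments_str: str = ""  # e.g. ", _arg1, _arg2"

    def to_function_source(self) -> str:
        # Same output as the loop in _post_impl(): a header line followed
        # by each user line indented by one space.
        lines = [f"def _user_script(tabpy{self.arguments_str}):"]
        lines += [" " + line for line in self.code.splitlines()]
        return "\n".join(lines) + "\n"


script = UserScript(code="total = sum(_arg1)\nreturn total", arguments_str=", _arg1")
print(script.to_function_source())
```

Keeping the source construction in a separate class would let it be unit-tested without a Tornado request, and leaves the handler responsible only for HTTP concerns.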

import pandas
import pyarrow
import pyarrow.flight  # the flight submodule is not loaded by "import pyarrow" alone
import uuid

from tabpy.tabpy_server.handlers import BaseHandler, arrow_client
import json
import simplejson
import logging
from tabpy.tabpy_server.common.util import format_exception
import requests
from tornado import gen
from datetime import timedelta
from tabpy.tabpy_server.handlers.util import AuthErrorStates

class RestrictedTabPy:
    def __init__(self, protocol, port, logger, timeout, headers):
        self.protocol = protocol
        self.port = port
        self.logger = logger
        self.timeout = timeout
        self.headers = headers

    def query(self, name, *args, **kwargs):
        url = f"{self.protocol}://localhost:{self.port}/query/{name}"
        self.logger.log(logging.DEBUG, f"Querying {url}...")
        internal_data = {"data": args or kwargs}
        data = json.dumps(internal_data)
        headers = self.headers
        response = requests.post(
            url=url, data=data, headers=headers, timeout=self.timeout, verify=False
        )
        return response.json()


class EvaluationPlaneDisabledHandler(BaseHandler):
    """
    EvaluationPlaneDisabledHandler responds with error message when ad-hoc scripts have been disabled.
    """

    def initialize(self, executor, app):
        super(EvaluationPlaneDisabledHandler, self).initialize(app)
        self.executor = executor

    @gen.coroutine
    def post(self):
        if self.should_fail_with_auth_error() != AuthErrorStates.NONE:
            self.fail_with_auth_error()
            return
        self.error_out(404, "Ad-hoc scripts have been disabled on this analytics extension, please contact your "
                            "administrator.")


class EvaluationPlaneHandler(BaseHandler):
    """
    EvaluationPlaneHandler is responsible for running arbitrary python scripts.
    """

    def initialize(self, executor, app):
        super(EvaluationPlaneHandler, self).initialize(app)
        self.executor = executor
        self._error_message_timeout = (
            f"User defined script timed out. "
            f"Timeout is set to {self.eval_timeout} s."
        )

    @gen.coroutine
    def _post_impl(self):
        body = json.loads(self.request.body.decode("utf-8"))
        self.logger.log(logging.DEBUG, "Processing POST request...")
        if "script" not in body:
            self.error_out(400, "Script is empty.")
            return

        # Transforming user script into a proper function.
        user_code = body["script"]
        arguments = None
        arguments_str = ""
        if "dataPath" in body:
            # arrow flight scenario
            arrow_data = self.get_arrow_data(body["dataPath"])
            if arrow_data is not None:
                arguments = {"_arg1": arrow_data}
        elif "data" in body:
            # backward compatibility (data sent inline in the request body)
            arguments = body["data"]

        if arguments is not None:
            if not isinstance(arguments, dict):
                self.error_out(
                    400, "Script parameters need to be provided as a dictionary."
                )
                return
            args_in = sorted(arguments.keys())
            n = len(arguments)
            if sorted('_arg'+str(i+1) for i in range(n)) == args_in:
                arguments_str = ", " + ", ".join(args_in)
            else:
                self.error_out(
                    400,
                    "Variables names should follow "
                    "the format _arg1, _arg2, _argN",
                )
                return
        function_to_evaluate = f"def _user_script(tabpy{arguments_str}):\n"
        for u in user_code.splitlines():
            function_to_evaluate += " " + u + "\n"

        self.logger.log(
            logging.INFO, f"function to evaluate={function_to_evaluate}"
        )

        try:
            result = yield self._call_subprocess(function_to_evaluate, arguments)
        except (
            gen.TimeoutError,
            requests.exceptions.ConnectTimeout,
            requests.exceptions.ReadTimeout,
        ):
            self.logger.log(logging.ERROR, self._error_message_timeout)
            self.error_out(408, self._error_message_timeout)
            return

        if result is not None:
            if "dataPath" in body:
                # arrow flight scenario
                output_data_id = str(uuid.uuid4())
                self.upload_arrow_data(result, output_data_id, {
                    'removeOnDelete': 'True',
                    'linkedIDs': body["dataPath"]
                })
                result = {'outputDataPath': output_data_id}
                self.logger.log(logging.WARN, f'outputDataPath={output_data_id}')
            else:
                if isinstance(result, pandas.DataFrame):
                    result = result.to_dict(orient='list')
            self.write(simplejson.dumps(result, ignore_nan=True))
        else:
            self.write("null")
        self.finish()

    def get_arrow_data(self, filename):
        scheme = "grpc+tcp"
        host = "localhost"
        port = 13622

        connection_args = {}
        client = pyarrow.flight.FlightClient(f"{scheme}://{host}:{port}", **connection_args)
        return arrow_client.get_flight_by_path(filename, client)

    def upload_arrow_data(self, data, filename, metadata):
        scheme = "grpc+tcp"
        host = "localhost"
        port = 13622

        connection_args = {}
        client = pyarrow.flight.FlightClient(f"{scheme}://{host}:{port}", **connection_args)
        return arrow_client.upload_data(client, data, filename, metadata)

    @gen.coroutine
    def post(self):
        if self.should_fail_with_auth_error() != AuthErrorStates.NONE:
            self.fail_with_auth_error()
            return

        self._add_CORS_header()
        try:
            yield self._post_impl()
        except Exception as e:
            import traceback
            print(traceback.format_exc())
            err_msg = f"{e.__class__.__name__} : {str(e)}"
            if err_msg != "KeyError : 'response'":
                err_msg = format_exception(e, "POST /evaluate")
                self.error_out(500, "Error processing script", info=err_msg)
            else:
                self.error_out(
                    404,
                    "Error processing script",
                    info="The endpoint you're "
                    "trying to query did not respond. Please make sure the "
                    "endpoint exists and the correct set of arguments are "
                    "provided.",
                )

    @gen.coroutine
    def _call_subprocess(self, function_to_evaluate, arguments):
        restricted_tabpy = RestrictedTabPy(
            self.protocol, self.port, self.logger, self.eval_timeout, self.request.headers
        )
        # Exec does not run the function, so it does not block.
        exec(function_to_evaluate, globals())

        # 'noqa' comments below tell flake8 to ignore undefined _user_script
        # name - the name is actually defined with user script being wrapped
        # in _user_script function (constructed as a string) and then executed
        # with exec() call above.
        # An empty mapping is passed when there are no arguments, because
        # unpacking None with ** raises TypeError.
        future = self.executor.submit(_user_script,  # noqa: F821
                                      restricted_tabpy,
                                      **(arguments if arguments is not None else {}))

        ret = yield gen.with_timeout(timedelta(seconds=self.eval_timeout), future)
        raise gen.Return(ret)

Issue (Comprehensibility, Best Practice): The variable _user_script does not seem to be defined.
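The analyzer reports _user_script as undefined because the name only comes into existence at runtime, when exec() writes it into globals(). One possible way to avoid both the warning and the noqa suppression, sketched here under assumptions, is to exec the generated source into a dedicated namespace dict and look the function up by name. The helper name compile_user_script is hypothetical and not part of the reviewed code.

```python
def compile_user_script(function_source, name="_user_script"):
    """Exec the generated wrapper in a private namespace and return it.

    Hypothetical helper: the module's globals() are left untouched, and
    the caller receives an ordinary function object.
    """
    namespace = {}
    exec(function_source, namespace)
    return namespace[name]


# Source text in the same shape that _post_impl() builds.
source = "def _user_script(tabpy, _arg1):\n return [x * 2 for x in _arg1]\n"
user_script = compile_user_script(source)
print(user_script(None, _arg1=[1, 2, 3]))
```

The returned function could then be handed to self.executor.submit() directly. One trade-off to weigh: a script compiled this way no longer sees names defined at module level in the handler (for example the pandas import), so anything user scripts rely on would have to be placed in the namespace explicitly.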