Passed
Pull Request — master (#18)
by Dafne van, created 03:18

e2edutch.predict.main() — grade: D

Complexity
Conditions: 13

Size
Total Lines: 75
Code Lines: 59

Duplication
Duplicated Lines: 0
Duplication Ratio: 0 %

Importance
Changes: 0

Metric   Value
eloc     59
dl       0
loc      75
rs       4.2
c        0
b        0
f        0
cc       13
nop      1

How to fix

Long Method

Small methods make your code easier to understand, particularly when combined with a good name. And when a method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method, as sketched below.
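
For example, the input-reading branches of e2edutch.predict.main() (see the listing below) could be pulled out into a single helper. A minimal sketch, assuming the module's existing imports and its read_jsonlines() helper; the name read_docs is hypothetical, not part of the module:

    import collections
    import os

    from e2edutch import minimize, naf, util


    def read_docs(input_filename, word_col):
        # Hypothetical Extract Method: the extension dispatch from main(),
        # returning an iterable of example dicts.
        ext_input = os.path.splitext(input_filename)[-1]
        if ext_input == '.conll':
            labels = collections.defaultdict(set)
            stats = collections.defaultdict(int)
            return minimize.minimize_partition(
                input_filename, labels, stats, word_col)
        if ext_input == '.jsonlines':
            return read_jsonlines(input_filename)
        if ext_input == '.naf':
            naf_obj = naf.get_naf(input_filename)
            jsonlines_obj, term_ids, _ = naf.get_jsonlines(naf_obj)
            return [jsonlines_obj]
        if ext_input == '.txt':
            with open(input_filename) as text_file:
                return [util.create_example(text_file.read())]
        raise ValueError(
            'Input file should be .naf, .conll, .txt or .jsonlines, '
            'but is {}.'.format(ext_input))

main() would then shrink to a single call, docs = read_docs(args.input_filename, args.word_col), which also helps with the too-many-variables finding reported below.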

Complexity

Complex classes like e2edutch.predict.main() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring, as sketched below. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
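
Applied here, the output-related state in main() (output_file, format_out, and the sentences/predictions dicts) forms one such cohesive component. A minimal sketch, assuming the module's json and conll imports; the class name CorefWriter is hypothetical:

    import json

    from e2edutch import conll


    class CorefWriter:
        # Hypothetical Extract Class: bundles the output state that main()
        # currently threads through several branches.
        def __init__(self, output_file, format_out):
            self.output_file = output_file
            self.format_out = format_out
            self.sentences = {}
            self.predictions = {}

        def add(self, example):
            # jsonlines output is streamed; other formats are buffered.
            if self.format_out == 'jsonlines':
                self.output_file.write(json.dumps(example))
                self.output_file.write("\n")
            else:
                self.predictions[example['doc_key']] = example['predicted_clusters']
                self.sentences[example['doc_key']] = example['sentences']

        def finish(self):
            # Buffered formats are written once all documents are collected.
            if self.format_out == 'conll':
                conll.output_conll(self.output_file, self.sentences,
                                   self.predictions)

main() would then create one CorefWriter, call add() per document, and call finish() at the end.

The analysed source follows, with the inspection findings inlined as comments.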

import sys  # issue: Missing module docstring
import json
import os
import io
import collections
import argparse
import logging

from e2edutch import conll
from e2edutch import minimize
from e2edutch import util
from e2edutch import coref_model as cm
from e2edutch import naf

# issue: third party import "import tensorflow as tf" should be placed
# before "from e2edutch import conll"
import tensorflow as tf


# issues: Missing class docstring; Class 'Predictor' inherits from object,
# which can be safely removed from bases in Python 3
class Predictor(object):
    def __init__(self, model_name='best', cfg_file=None):
        self.config = util.initialize_from_env(model_name, cfg_file)
        self.session = tf.compat.v1.Session()
        self.model = cm.CorefModel(self.config)
        self.model.restore(self.session)

    def predict(self, example):
        """
        Predict coreference spans for a tokenized text.

        Args:
            example (dict): dict with the following fields:
                              sentences ([[str]])
                              doc_id (str)
                              clusters ([[(int, int)]]) (optional)

        Returns:
            [[(int, int)]]: a list of clusters. The items of the cluster are
                            spans, denoted by their start and end token index.
        """
        tensorized_example = self.model.tensorize_example(
            example, is_training=False)
        # issue (Unused Code): unnecessary use of a comprehension
        feed_dict = {i: t for i, t in zip(
            self.model.input_tensors, tensorized_example)}
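        # A minimal fix for the comprehension finding (sketch): zip already
        # yields the (tensor, value) pairs, so the dict can be built directly:
        #   feed_dict = dict(zip(self.model.input_tensors, tensorized_example))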
        # issue (Coding Style): the next line is too long per the coding
        # style (107/100)
        _, _, _, top_span_starts, top_span_ends, top_antecedents, top_antecedent_scores = self.session.run(
            self.model.predictions, feed_dict=feed_dict)
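        # One way to satisfy the length check (sketch): run the session first,
        # then unpack the result over shorter lines:
        #   outputs = self.session.run(self.model.predictions,
        #                              feed_dict=feed_dict)
        #   (_, _, _, top_span_starts, top_span_ends,
        #    top_antecedents, top_antecedent_scores) = outputs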
        predicted_antecedents = self.model.get_predicted_antecedents(
            top_antecedents, top_antecedent_scores)
        predicted_clusters, _ = self.model.get_predicted_clusters(
            top_span_starts, top_span_ends, predicted_antecedents)

        return predicted_clusters

    def end_session(self):  # issue: Missing function or method docstring
        self.session.close()


def get_parser():  # issue: Missing function or method docstring
    parser = argparse.ArgumentParser()
    parser.add_argument('config')
    parser.add_argument('input_filename')
    parser.add_argument('-o', '--output_file',
                        type=argparse.FileType('w'), default=sys.stdout)
    parser.add_argument('-f', '--format_out', default='conll',
                        choices=['conll', 'jsonlines', 'naf'])
    parser.add_argument('-c', '--word_col', type=int, default=2)
    parser.add_argument('--cfg_file',
                        type=str,
                        default=None,
                        help="config file")
    parser.add_argument('-v', '--verbose', action='store_true')
    return parser


def read_jsonlines(input_filename):  # issue: Missing function or method docstring
    for line in open(input_filename).readlines():
        example = json.loads(line)
        yield example


# issues: Missing function or method docstring; Comprehensibility: this
# function exceeds the maximum number of variables (21/15)
def main(args=None):
    parser = get_parser()
    args = parser.parse_args()
    if args.verbose:
        logging.basicConfig(level=logging.DEBUG)
    # config = util.initialize_from_env(args.config, args.cfg_file)

    # Input file in .jsonlines format or .conll.
    input_filename = args.input_filename

    ext_input = os.path.splitext(input_filename)[-1]
    if ext_input not in ['.conll', '.jsonlines', '.txt', '.naf']:
        raise Exception(
            'Input file should be .naf, .conll, .txt or .jsonlines, but is {}.'
            .format(ext_input))

    if ext_input == '.conll':
        labels = collections.defaultdict(set)
        stats = collections.defaultdict(int)
        docs = minimize.minimize_partition(
            input_filename, labels, stats, args.word_col)
    elif ext_input == '.jsonlines':
        docs = read_jsonlines(input_filename)
    elif ext_input == '.naf':
        naf_obj = naf.get_naf(input_filename)
        # issue (Unused Code): the variable tok_ids seems to be unused
        jsonlines_obj, term_ids, tok_ids = naf.get_jsonlines(naf_obj)
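        # A conventional fix for the unused-variable finding (sketch): bind
        # the unused value to an underscore:
        #   jsonlines_obj, term_ids, _ = naf.get_jsonlines(naf_obj)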
        docs = [jsonlines_obj]
    else:
        text = open(input_filename).read()
        docs = [util.create_example(text)]

    output_file = args.output_file
    predictor = Predictor(args.config, args.cfg_file)
    sentences = {}
    predictions = {}
    for example_num, example in enumerate(docs):
        # logging.info(example['doc_key'])
        example["predicted_clusters"], _ = predictor.predict(example)
        if args.format_out == 'jsonlines':
            output_file.write(json.dumps(example))
            output_file.write("\n")
        else:
            predictions[example['doc_key']] = example["predicted_clusters"]
            sentences[example['doc_key']] = example["sentences"]
        if example_num % 100 == 0:
            # issue: use lazy % formatting in logging functions
            logging.info("Decoded {} examples.".format(example_num + 1))
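            # Lazy % formatting defers interpolation until the record is
            # actually emitted (sketch):
            #   logging.info("Decoded %d examples.", example_num + 1)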
    if args.format_out == 'conll':
        conll.output_conll(output_file, sentences, predictions)
    elif args.format_out == 'naf':
        # Check number of docs - what to do if multiple?
        # Create naf obj if input format was not naf
        if ext_input != '.naf':
            # To do: add linguistic processing layers for terms and tokens
            # issues: use lazy % formatting in logging functions;
            # using deprecated method warn()
            logging.warn(
                'Outputting NAF when input was not naf,'
                + 'no dependency information available')
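            # logging.warn() is a deprecated alias; the lazy, non-deprecated
            # form would be (sketch; note the concatenation above also omits
            # a space between 'naf,' and 'no'):
            #   logging.warning('Outputting NAF when input was not naf, '
            #                   'no dependency information available')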
            for doc_key in sentences:
                naf_obj, term_ids = naf.get_naf_from_sentences(
                    sentences[doc_key])
                naf_obj = naf.create_coref_layer(
                    naf_obj, predictions[doc_key], term_ids)
                naf_obj = naf.add_linguistic_processors(naf_obj)
                buffer = io.BytesIO()
                naf_obj.dump(buffer)
                output_file.write(buffer.getvalue().decode('utf-8'))
                # To do: make separate outputs?
                # To do: use dependency information from conll?
        else:
            # We only have one input doc
            # issues: the variable example does not seem to be defined in
            # case the for loop over docs above is never entered; term_ids
            # and naf_obj are not defined for all execution paths
            # Bug: the loop variable example might not be defined here
            naf_obj = naf.create_coref_layer(
                naf_obj, example["predicted_clusters"], term_ids)
            naf_obj = naf.add_linguistic_processors(naf_obj)
            buffer = io.BytesIO()
            naf_obj.dump(buffer)
            output_file.write(buffer.getvalue().decode('utf-8'))


if __name__ == "__main__":
    main()