Completed: Pull Request, master (#1098), by Mischa, created 01:41

coalib.bearlib.languages.documentation.extract_documentation_with_docstyle() (rating: F)

Complexity    Conditions     18
Size          Total Lines    151
Duplication   Lines          0
              Ratio          0 %

Metric   Value
cc       18
dl       0
loc      151
rs       2

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method. Use the comment as a starting point when coming up with a good name for that new method.

Commonly applied refactorings include:

Complexity

Complex classes and functions like coalib.bearlib.languages.documentation.extract_documentation_with_docstyle() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to find such a component is to look for fields, methods, or variables that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

import re

from coalib.bearlib.languages.documentation.DocstyleDefinition import (
    DocstyleDefinition)
from coalib.bearlib.languages.documentation.DocumentationComment import (
    DocumentationComment)
from coalib.results.TextRange import TextRange


# Raised to break out of an outer loop.
class _BreakOut(Exception):
    pass


def _compile_multi_match_regex(strings):
    """
    Compiles a regex object that matches each of the given strings.

    :param strings: The strings to match.
    :return:        A regex object.
    """
    # Escape each string individually; `re.escape()` expects a single
    # string, not an iterable of strings.
    return re.compile("|".join(re.escape(string) for string in strings))


def extract_documentation_with_docstyle(content, docstyle_definition):
    """
    Extracts all documentation texts inside the given source-code-string.

    :param content:             The source-code-string to extract
                                documentation from, or an iterable of strings
                                where each string is a single line (including
                                trailing whitespace such as `\\n`).
    :param docstyle_definition: The DocstyleDefinition that identifies the
                                documentation comments.
    :return:                    An iterator returning each DocumentationComment
                                found in the content.
    """
    if isinstance(content, str):
        content = content.splitlines(keepends=True)
    else:
        content = list(content)

    # Prepare a dict that maps each begin marker to the corresponding
    # marker set(s). This makes it fast to retrieve the marker sets for a
    # begin sequence found in the source code. A potential documentation
    # match is then processed further with the remaining markers.
    begin_sequence_dict = {}
    for marker_set in docstyle_definition.markers:
        if marker_set[0] not in begin_sequence_dict:
            begin_sequence_dict[marker_set[0]] = [marker_set]
        else:
            begin_sequence_dict[marker_set[0]].append(marker_set)

    # A single alternation regex finds the earliest begin marker, which is
    # faster than calling `str.find()` for every begin marker and taking the
    # lowest index.
    begin_regex = _compile_multi_match_regex(
        marker_set[0] for marker_set in docstyle_definition.markers)

    line = 0
    line_pos = 0
    while line < len(content):
        begin_match = begin_regex.search(content[line], line_pos)

        if begin_match:
            begin_match_line = line
            # Prevents an infinite loop when the begin marker matches but the
            # complete documentation comment does not.
            line_pos = begin_match.end()

            # begin_sequence_dict[begin_match.group()] returns the marker sets
            # whose begin sequence matched.
            for marker_set in begin_sequence_dict[begin_match.group()]:
                try:
                    # If the each-line marker and the end marker are equal,
                    # keep extracting lines until the each-line marker runs
                    # out.
                    if marker_set[1] == marker_set[2]:
                        docstring = content[line][begin_match.end():]

                        # No explicit end marker is required here: extract
                        # subsequent lines as long as they start with the
                        # each-line marker, stopping at the end of content.
                        line2 = line + 1
                        while line2 < len(content):
                            stripped_content = content[line2].lstrip()
                            if not stripped_content.startswith(marker_set[1]):
                                break
                            docstring += stripped_content[len(marker_set[1]):]
                            line2 += 1

                        line = line2 - 1
                        line_pos = len(content[line])
                    else:
                        end_marker_pos = content[line].find(marker_set[2],
                                                            begin_match.end())

                        if end_marker_pos == -1:
                            docstring = content[line][begin_match.end():]

                            line2 = line + 1
                            if line2 >= len(content):
                                continue

                            end_marker_pos = content[line2].find(marker_set[2])

                            while end_marker_pos == -1:
                                if marker_set[1] == "":
                                    # When no each-line marker is set (e.g. for
                                    # Python docstrings), align the comment to
                                    # the begin marker.
                                    stripped_content = (
                                        content[line2][begin_match.start():])
                                else:
                                    # Check whether we violate the each-line
                                    # marker "rule".
                                    current_each_line_marker = (content[line2]
                                        [begin_match.start():
                                         begin_match.start()
                                             + len(marker_set[1])])
                                    if (current_each_line_marker !=
                                            marker_set[1]):
                                        # Effectively a 'continue' for the
                                        # outer for-loop.
                                        raise _BreakOut

                                    stripped_content = (
                                        content[line2][begin_match.start()
                                                       + len(marker_set[1]):])

                                docstring += stripped_content
                                line2 += 1

                                if line2 >= len(content):
                                    # End of content reached, so there's no
                                    # closing marker and that's a mismatch.
                                    raise _BreakOut

                                end_marker_pos = content[line2].find(
                                    marker_set[2])

                            docstring += (content[line2]
                                [begin_match.start():end_marker_pos])
                            line = line2
                        else:
                            docstring = (content[line]
                                [begin_match.end():end_marker_pos])

                        line_pos = end_marker_pos + len(marker_set[2])

                    rng = TextRange.from_values(begin_match_line + 1,
                                                begin_match.start() + 1,
                                                line + 1,
                                                line_pos + 1)

                    yield DocumentationComment(docstring,
                                               docstyle_definition,
                                               marker_set,
                                               rng)

                    break

                except _BreakOut:
                    # Continues the marker_set loop.
                    pass

        else:
            line += 1
            line_pos = 0


def extract_documentation(content, language, docstyle):
    """
    Extracts all documentation texts inside the given source-code-string using
    the coala docstyle definition files.

    The documentation texts are sorted in the order they appear in `content`.

    For more information about how documentation comments are identified and
    extracted, see the DocstyleDefinition.doctypes enumeration.

    :param content:            The source-code-string to extract
                               documentation from.
    :param language:           The programming language used.
    :param docstyle:           The documentation style/tool used
                               (e.g. doxygen).
    :raises FileNotFoundError: Raised when the docstyle definition file was not
                               found. This is a compatibility exception from
                               the `coalib.misc.Compatability` module.
    :raises KeyError:          Raised when the given language is not defined in
                               the given docstyle.
    :raises ValueError:        Raised when a docstyle definition setting has an
                               invalid format.
    :return:                   An iterator returning each DocumentationComment
                               found in the content.
    """
    docstyle_definition = DocstyleDefinition.load(language, docstyle)
    return extract_documentation_with_docstyle(content, docstyle_definition)
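The corrected `_compile_multi_match_regex()` can be exercised on its own. The sketch below copies it under the hypothetical public name `compile_multi_match_regex` and adds a hypothetical `longest_first` flag that coalib does not have. It shows `search()` locating the earliest begin marker. It also illustrates a caveat worth checking against real docstyle definitions. Python's regex alternation returns the first alternative that matches at the leftmost position. So if a marker is listed before a longer marker it prefixes (such as `/*` before `/**`), the shorter one wins, and `begin_sequence_dict[begin_match.group()]` would then only offer the shorter marker's sets.

```python
import re


def compile_multi_match_regex(strings, longest_first=False):
    """
    Compile a regex matching any of the given literal strings.

    Mirrors the corrected _compile_multi_match_regex(); the `longest_first`
    flag is a hypothetical addition, not part of coalib.
    """
    strings = list(strings)
    if longest_first:
        # Alternation takes the first alternative that matches at the
        # leftmost position, so put longer markers ahead of their prefixes.
        strings.sort(key=len, reverse=True)
    # Escape each string separately; re.escape() takes a single string.
    return re.compile("|".join(re.escape(string) for string in strings))


begin_regex = compile_multi_match_regex(["/**", "#"])
match = begin_regex.search("int x; /** doc */", 0)
print(match.group(), match.start(), match.end())  # /** 7 10

print(compile_multi_match_regex(["/*", "/**"]).search("/** doc").group())
# /*
print(compile_multi_match_regex(["/*", "/**"], True).search("/** doc").group())
# /**
```

Whether the ordering matters in practice depends on which begin markers a docstyle definition actually combines.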