coalib.bearlib.languages.documentation.extract_documentation_with_docstyle() - Code Metrics - Inspection of "Makman2/doc format bear6" - coala-analyzer/coala - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Pull Request — master (#1098)

by Mischa

created 2015-12-12 17:00 UTC

coalib.bearlib.languages.documentation.extract_documentation_with_docstyle() F

Complexity

Conditions

Size

Total Lines	151

Duplication

Lines	0
Ratio	0 %

Metric	Value
cc	18
dl	0
loc	151
rs	2

How to fix Long Method Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
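As a hedged illustration of that advice, the commented block in the function inspected on this page that prepares `begin_sequence_dict` could become its own function, with the comment turned into the name. The helper name `group_marker_sets_by_begin` and the sample marker tuples are assumptions made for this sketch; they are not part of coala.

```python
from collections import defaultdict


def group_marker_sets_by_begin(markers):
    """
    Maps each begin marker to the list of marker sets that start with it.

    `markers` is assumed to be an iterable of (begin, each_line, end) tuples.
    """
    grouped = defaultdict(list)
    for marker_set in markers:
        grouped[marker_set[0]].append(marker_set)
    # Return a plain dict so a lookup of an unknown key still raises KeyError.
    return dict(grouped)


# Hypothetical marker sets, for illustration only.
markers = [("/**", " *", " */"), ("/**", "", "*/"), ("##", "#", "#")]
grouped = group_marker_sets_by_begin(markers)
print(grouped["/**"])
```

The calling function would then shrink to a single line such as `begin_sequence_dict = group_marker_sets_by_begin(docstyle_definition.markers)`, and the explanatory comment moves into the helper's docstring.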

Commonly applied refactorings include:

If many parameters/temporary variables are present:
- Replace temporary variables with Query
- Introduce parameter object; often combined with preserve whole object
- If the above two are insufficient: Replace method with method object
If you have long conditionals: Decompose Conditional
Otherwise: Extract method
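A minimal sketch of Decompose Conditional, applied to two conditions that appear in the function inspected on this page: the test `marker_set[1] == marker_set[2]` and the slice comparison against the each-line marker. Both function names are hypothetical, and the marker tuples are sample data chosen for the example.

```python
def each_line_marker_doubles_as_end(marker_set):
    """True when the each-line marker is also the end marker."""
    begin, each_line, end = marker_set
    return each_line == end


def has_each_line_marker(line, column, each_line_marker):
    """True when `line` carries `each_line_marker` starting at `column`."""
    # str.startswith accepts a start offset, which replaces manual slicing.
    # Intended for non-empty markers, as in the branch it is taken from.
    return line.startswith(each_line_marker, column)


# Hypothetical marker sets, for illustration only.
hash_style = ("##", "#", "#")
block_style = ("/**", " *", " */")

print(each_line_marker_doubles_as_end(hash_style))
print(each_line_marker_doubles_as_end(block_style))
print(has_each_line_marker("     * second line\n", 4, " *"))
```

Naming the conditions lets the branching code read as a sequence of questions, and each predicate can be tested without running the whole extraction loop.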

Complexity

Complex classes like coalib.bearlib.languages.documentation.extract_documentation_with_docstyle() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
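The unit inspected here is a function rather than a class, but the same idea of grouping values that belong together applies to the marker tuple it indexes as `marker_set[0]`, `marker_set[1]` and `marker_set[2]`. The sketch below gives those three positions names. `MarkerSet` and its method are hypothetical and not part of coala; because a `NamedTuple` is still a tuple, existing index-based code would keep working.

```python
from typing import NamedTuple


class MarkerSet(NamedTuple):
    """Hypothetical named view of one (begin, each_line, end) marker tuple."""

    begin: str
    each_line: str
    end: str

    def each_line_doubles_as_end(self):
        """True when the each-line marker is also the end marker."""
        return self.each_line == self.end


# Sample values, for illustration only.
block_style = MarkerSet("/**", " *", " */")
hash_style = MarkerSet("##", "#", "#")

print(block_style.begin, block_style[0])
print(hash_style.each_line_doubles_as_end())
```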

import re

from coalib.bearlib.languages.documentation.DocstyleDefinition import (
    DocstyleDefinition)
from coalib.bearlib.languages.documentation.DocumentationComment import (
    DocumentationComment)
from coalib.results.TextRange import TextRange


# Used to break out of outer loops via exception raise.
class _BreakOut(Exception):
    pass


def _compile_multi_match_regex(strings):
    """
    Compiles a regex object that matches each of the given strings.

    :param strings: The strings to match.
    :return:        A regex object.
    """
    return re.compile("|".join(re.escape(string) for string in strings))


def extract_documentation_with_docstyle(content, docstyle_definition):
    """
    Extracts all documentation texts inside the given source-code-string.

    :param content:             The source-code-string where to extract
                                documentation from or an iterable with strings
                                where each string is a single line (including
                                ending whitespaces like `\\n`).
    :param docstyle_definition: The DocstyleDefinition that identifies the
                                documentation comments.
    :return:                    An iterator returning each
                                DocumentationComment found in the content.
    """
    if isinstance(content, str):
        content = content.splitlines(keepends=True)
    else:
        content = list(content)

    # Prepare a dict that maps each begin marker to the marker set(s) that
    # start with it. This makes it fast to look up the candidate marker sets
    # for a begin sequence found in the source code. A possible match is then
    # processed further with the remaining markers.
    begin_sequence_dict = {}
    for marker_set in docstyle_definition.markers:
        if marker_set[0] not in begin_sequence_dict:
            begin_sequence_dict[marker_set[0]] = [marker_set]
        else:
            begin_sequence_dict[marker_set[0]].append(marker_set)

    # Using a regex to match any of the begin markers is faster than finding
    # each substring with `str.find()` and choosing the lowest match.
    begin_regex = _compile_multi_match_regex(
        marker_set[0] for marker_set in docstyle_definition.markers)

    line = 0
    line_pos = 0
    while line < len(content):
        begin_match = begin_regex.search(content[line], line_pos)

        if begin_match:
            begin_match_line = line
            # Prevents infinite loop when the start marker matches but not the
            # complete documentation comment.
            line_pos = begin_match.end()

            # begin_sequence_dict[begin_match.group()] returns the marker
            # sets that start with the begin sequence matched above.
            for marker_set in begin_sequence_dict[begin_match.group()]:
                try:
                    # If the each-line marker and the end marker are equal,
                    # search for the each-line marker until it runs out.
                    if marker_set[1] == marker_set[2]:
                        docstring = content[line][begin_match.end():]

                        line2 = line + 1

                        # From here on, keep extracting for as long as the
                        # following lines start with the each-line marker.
                        # The bounds check also covers a begin marker on the
                        # last line of the content.
                        while line2 < len(content):
                            stripped_content = content[line2].lstrip()
                            if not stripped_content.startswith(marker_set[1]):
                                break

                            docstring += stripped_content[len(marker_set[1]):]
                            line2 += 1

                        line = line2 - 1
                        line_pos = len(content[line])
                    else:
                        end_marker_pos = content[line].find(marker_set[2],
                                                            begin_match.end())

                        if end_marker_pos == -1:
                            docstring = content[line][begin_match.end():]

                            line2 = line + 1
                            if line2 >= len(content):
                                continue

                            end_marker_pos = content[line2].find(marker_set[2])

                            while end_marker_pos == -1:
                                if marker_set[1] == "":
                                    # When no each-line marker is set (e.g. for
                                    # Python docstrings), then align the
                                    # comment to the start-marker.
                                    stripped_content = (
                                        content[line2][begin_match.start():])
                                else:
                                    # Check whether we violate the each-line
                                    # marker "rule".
                                    current_each_line_marker = (content[line2]
                                        [begin_match.start():
                                         begin_match.start()
                                             + len(marker_set[1])])
                                    if (current_each_line_marker !=
                                            marker_set[1]):
                                        # Effectively a 'continue' for the
                                        # outer for-loop.
                                        raise _BreakOut

                                    stripped_content = (
                                        content[line2][begin_match.start()
                                                       + len(marker_set[1]):])

                                docstring += stripped_content
                                line2 += 1

                                if line2 >= len(content):
                                    # End of content reached, so there's no
                                    # closing marker and that's a mismatch.
                                    raise _BreakOut

                                end_marker_pos = content[line2].find(
                                    marker_set[2])

                            docstring += (content[line2]
                                [begin_match.start():end_marker_pos])
                            line = line2
                        else:
                            docstring = (content[line]
                                [begin_match.end():end_marker_pos])

                        line_pos = end_marker_pos + len(marker_set[2])

                    rng = TextRange.from_values(begin_match_line + 1,
                                                begin_match.start() + 1,
                                                line + 1,
                                                line_pos + 1)

                    yield DocumentationComment(docstring,
                                               docstyle_definition,
                                               marker_set,
                                               rng)

                    break

                except _BreakOut:
                    # Continues the marker_set loop.
                    pass

        else:
            line += 1
            line_pos = 0


def extract_documentation(content, language, docstyle):
    """
    Extracts all documentation texts inside the given source-code-string using
    the coala docstyle definition files.

    The documentation texts are sorted by their order appearing in `content`.

    For more information about how documentation comments are identified and
    extracted, see DocstyleDefinition.doctypes enumeration.

    :param content:            The source-code-string where to extract
                               documentation from.
    :param language:           The programming language used.
    :param docstyle:           The documentation style/tool used
                               (e.g. doxygen).
    :raises FileNotFoundError: Raised when the docstyle definition file was not
                               found. This is a compatibility exception from
                               `coalib.misc.Compatability` module.
    :raises KeyError:          Raised when the given language is not defined in
                               given docstyle.
    :raises ValueError:        Raised when a docstyle definition setting has an
                               invalid format.
    :return:                   An iterator returning each DocumentationComment
                               found in the content.
    """
    docstyle_definition = DocstyleDefinition.load(language, docstyle)
    return extract_documentation_with_docstyle(content, docstyle_definition)
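
A minimal, self-contained sketch of how the begin-marker regex behaves, using only the standard `re` module. Each string is escaped on its own before the alternation is joined, since `re.escape()` expects a single string rather than an iterable. The marker strings and the sample line are made up for this example, and the helper is renamed without the leading underscore to mark it as a stand-alone copy.

```python
import re


def compile_multi_match_regex(strings):
    """Compiles a regex that matches any one of the given literal strings."""
    # Escape every string individually, then join them into one alternation.
    return re.compile("|".join(re.escape(string) for string in strings))


# Hypothetical begin markers, for illustration only.
begin_regex = compile_multi_match_regex(["/**", "##", '"""'])

match = begin_regex.search("    /** Adds two numbers.")
print(match.group(), match.start(), match.end())

no_match = begin_regex.search("x = 1")
print(no_match)
```

Python tries the alternatives from left to right at each position, so if one begin marker is a prefix of another, the order in which the markers are passed in decides which one matches.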

