osm_poi_matchmaker.libs.address.clean_street()   B

Complexity: Conditions 2
Size: Total Lines 165, Code Lines 159
Duplication: Lines 0, Ratio 0 %
Importance: Changes 0

Metric   Value
cc       2
eloc     159
nop      1
dl       0
loc      165
rs       7
c        0
b        0
f        0

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

# -*- coding: utf-8 -*-
Issue: Missing module docstring

try:
    import logging
    import sys
    import re
    import phonenumbers
    import json
    from functools import reduce
except ImportError as err:
    logging.error('Error %s import module: %s', __name__, err)
    logging.exception('Exception occurred')

    sys.exit(128)

# Patterns for re
PATTERN_POSTCODE_CITY = re.compile('^((\d){4})([.\s]{0,2})([a-zA-ZáÁéÉíÍóÓúÚüÜöÖőŐűŰ]{3,40})')
Bug: A suspicious escape sequence \d was found. Did you maybe forget to add an r prefix?

Unless a string is prefixed with r or R (a raw string), escape sequences in it are interpreted according to rules similar to those used by Standard C. Raw strings keep every backslash as a literal character, which is usually what a regular expression needs.

The escape sequence that was used indicates that you might have intended to write a regular expression.

Learn more about the available escape sequences in the Python documentation.

Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
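A minimal sketch of the fix the analyzer suggests, assuming the backslashes are meant for the regex engine: with an `r` prefix, `\d` and `\s` reach `re` unchanged, and Python 3.12 and later no longer emit a `SyntaxWarning` for them. The sample address is made up.

```python
import re

# Raw string: backslashes reach the regex engine untouched.
PATTERN_POSTCODE_CITY = re.compile(
    r'^((\d){4})([.\s]{0,2})([a-zA-ZáÁéÉíÍóÓúÚüÜöÖőŐűŰ]{3,40})')

# In a raw string r'\d' is two characters, identical to the escaped '\\d'.
assert r'\d' == '\\d' and len(r'\d') == 2

match = PATTERN_POSTCODE_CITY.search('3525 Miskolc, Fő utca 1')
postcode, city = match.group(1), match.group(4)
print(postcode, city)  # 3525 Miskolc
```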
PATTERN_CITY_ADDRESS = re.compile('^([a-zA-ZáÁéÉíÍóÓúÚüÜöÖőŐűŰ]{3,40})')
PATTERN_CITY = re.compile('\s?[XVI]{1,5}[.:,]{0,3}\s*$')
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
PATTERN_JS_2 = re.compile('\s*;\s*$')
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
PATTERN_HOUSENUMBER = re.compile('[0-9]{1,3}(\/[A-z]{1}|\-[0-9]{1,3}|)', re.IGNORECASE)
Bug: A suspicious escape sequence \/ was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \- was found. Did you maybe forget to add an r prefix?
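Independently of the escape warnings, `[A-z]` in `PATTERN_HOUSENUMBER` covers the whole ASCII range from `A` to `z`, which also contains `[`, `\`, `]`, `^`, `_` and the backtick. The sketch below assumes the intent is a single letter after the slash. It compares the original class (converted to a raw string, with `\/` and `\-` written as `/` and `-`, which the regex engine treats the same) against `[A-Za-z]`. The names `ORIGINAL` and `LETTERS_ONLY` and the samples are hypothetical.

```python
import re

# Original character class, only converted to a raw string.
ORIGINAL = re.compile(r'[0-9]{1,3}(/[A-z]{1}|-[0-9]{1,3}|)', re.IGNORECASE)
# Assumed intent: a single letter after the slash.
LETTERS_ONLY = re.compile(r'[0-9]{1,3}(/[A-Za-z]|-[0-9]{1,3}|)', re.IGNORECASE)

for sample in ('12/b', '12-14', '7', '12/_'):
    # '12/_' is where the two differ: '_' lies between 'Z' and 'a' in ASCII.
    print(sample, ORIGINAL.search(sample).group(0), LETTERS_ONLY.search(sample).group(0))
```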
PATTERN_CONSCRIPTIONNUMBER = re.compile(
    '(hrsz[.:]{0,2}\s*([0-9]{2,6}(\/[0-9]{1,3}){0,1})[.]{0,1}|\s*([0-9]{2,6}(\/[0-9]{1,3}){0,1})[.]{0,1}\s*hrsz[s.]{0,1})',
Coding Style: This line is too long as per the coding style (123/100).

This check looks for lines that are too long. You can specify the maximum line length.

Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \/ was found. Did you maybe forget to add an r prefix?
    re.IGNORECASE)
PATTERN_CONSCRIPTIONNUMBER_1 = re.compile('(hrsz[.:]{0,2}\s*([0-9]{2,6}(\/[0-9]{1,3}){0,1})[.]{0,1})', re.IGNORECASE)
Coding Style: This line is too long as per the coding style (117/100).
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \/ was found. Did you maybe forget to add an r prefix?
PATTERN_CONSCRIPTIONNUMBER_2 = re.compile('(\s*([0-9]{2,6}(\/[0-9]{1,3}){0,1})[.]{0,1}\s*hrsz[s.]{0,1})', re.IGNORECASE)
Coding Style: This line is too long as per the coding style (120/100).
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \/ was found. Did you maybe forget to add an r prefix?
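One way to address both the long lines and the duplicated regex text, assuming the combined pattern is meant to be exactly "method 1 or method 2": define each alternative once as a raw string and build the three patterns from it. `_HRSZ_BEFORE` and `_HRSZ_AFTER` are hypothetical names. The regex text matches the original except that `\/` is written as `/`, which the regex engine treats identically.

```python
import re

# "hrsz" before the number, e.g. "hrsz. 1234/5"
_HRSZ_BEFORE = r'hrsz[.:]{0,2}\s*([0-9]{2,6}(/[0-9]{1,3}){0,1})[.]{0,1}'
# "hrsz" after the number, e.g. "1234/5 hrsz"
_HRSZ_AFTER = r'\s*([0-9]{2,6}(/[0-9]{1,3}){0,1})[.]{0,1}\s*hrsz[s.]{0,1}'

PATTERN_CONSCRIPTIONNUMBER = re.compile(
    '({}|{})'.format(_HRSZ_BEFORE, _HRSZ_AFTER), re.IGNORECASE)
PATTERN_CONSCRIPTIONNUMBER_1 = re.compile('({})'.format(_HRSZ_BEFORE), re.IGNORECASE)
PATTERN_CONSCRIPTIONNUMBER_2 = re.compile('({})'.format(_HRSZ_AFTER), re.IGNORECASE)

print(PATTERN_CONSCRIPTIONNUMBER_1.search('Fő utca hrsz. 1234/5').group(2))  # 1234/5
print(PATTERN_CONSCRIPTIONNUMBER_2.search('1234/5 hrsz').group(2))  # 1234/5
```

In the combined pattern, the number captured by the second alternative is group 4, not group 2.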
PATTERN_OPENING_HOURS = re.compile('0*[0-9]{1,2}\:0*[0-9]{1,2}\s*-\s*0*[0-9]{1,2}:0*[0-9]{1,2}')
Bug: A suspicious escape sequence \: was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
PATTERN_PHONE = re.compile('[\/\\\(\)\-\+]')
Bug: A suspicious escape sequence \/ was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \( was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \) was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \- was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \+ was found. Did you maybe forget to add an r prefix?
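Python keeps unrecognized escapes such as `\/` and `\(` verbatim and collapses `\\` to one backslash. As a result, the original literal reaches `re` as a character class for `/`, `\`, `(`, `)`, `-` and `+`. A hedged raw-string equivalent follows. The phone number is invented, and how the module combines `PATTERN_PHONE` with `phonenumbers` is not shown in this listing.

```python
import re

# Matches any single character of: / \ ( ) - +
PATTERN_PHONE = re.compile(r'[/\\()\-+]')

raw_number = '+36 (1) 123-4567'
print(PATTERN_PHONE.sub('', raw_number))  # 36 1 1234567
```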

PATTERN_STREET_RICH = re.compile(
    '\s*(.*)\s+(akna|alja|almáskert|alsó|alsósor|aluljáró|autópálya|autóversenypálya|állomás|árok|átjáró|barakképület|bánya|bányatelep|bekötőút|benzinkút|bérc|bisztró|bokor|burgundia|büfé|camping|campingsor|centrum|célgazdaság|csapás|csarnok|csárda|cser|csoport|domb|dunapart|dunasor|dűlő|dűlője|dűlők|dűlőút|egyesület|egyéb|elágazás|erdeje|erdészház|erdészlak|erdő|erdősarok|erdősor|épület|épületek|észak|étterem|falu|farm|fasor|fasora|feketeerdő|feketeföldek|felső|felsősor|fennsík|fogadó|fok|forduló|forrás|föld|földek|földje|főcsatorna|főtér|főút|fürdő|fürdőhely|fürésztelepe|gazdaság|gát|gátőrház|gátsor|gimnázium|gödör|gulyakút|gyár|gyártelep|halom|határátkelőhely|határrész|határsor|határút|hatházak|hát|ház|háza|házak|hegy|hegyhát|hegyhát dűlő|hely|hivatal|híd|hídfő|horgásztanya|hotel|intézet|ipari park|ipartelep|iparterület|irodaház|irtás|iskola|jánoshegy|járás|juhászház|kapcsolóház|kapu|kastély|kálvária|kemping|kert|kertek|kertek-köze|kertsor|kertváros|kerület|kikötő|kilátó|kishajtás|kitérő|kocsiszín|kolónia|korzó|kórház|környék|körönd|körtér|körút|körútja|körvasútsor|körzet|köz|köze|középsor|központ|kút|kútház|kültelek|külterület|külterülete|lakás|lakások|lakóház|lakókert|lakónegyed|lakópark|lakótelep|laktanya|legelő|lejáró|lejtő|lépcső|liget|lovasiskola|lovastanya|magánút|major|malom|malomsor|megálló|mellékköz|mező|mélyút|MGTSZ|munkásszálló|műút|nagymajor|nagyút|nádgazdaság|nyaraló|oldal|országút|otthon|otthona|öböl|öregszőlők|ösvény|ötház|őrház|őrházak|pagony|pallag|palota|park|parkfalu|parkja|parkoló|part|pavilonsor|pálya|pályafenntartás|pályaudvar|piac|pihenő|pihenőhely|pince|pinceköz|pincesor|pincék|présházak|puszta|rakodó|rakpart|repülőtér|rész|rét|rétek|rév|ring|sarok|sertéstelep|sétatér|sétány|sikátor|sor|sora|sportpálya|sporttelep|stadion|strand|strandfürdő|sugárút|szabadstrand|szakiskola|szállás|szálló|szárító|szárnyasliget|szektor|szer|szél|széle|sziget|szigete|szivattyútelep|szög|szőlő|szőlőhegy|szőlők|szőlőkert|szőlős|szőlősor|tag|tanya|tanyaközpont|tanyasor|tanyák|tavak|tábor|tároló|társasház|teherpályaudvar|telek|telep|telepek|település|temető|tere|terményraktár|terület|teteje|tető|téglagyár|tér|tipegő|tormás|torony|tó|tömb|TSZ|turistaház|udvar|udvara|ugarok|utca|utcája|újfalu|újsor|újtelep|útfél|útgyűrű|útja|út|üdülő|üdülő központ|üdülő park|üdülők|üdülőközpont|üdülőpart|üdülő-part|üdülősor|üdülő-sor|üdülőtelep|üdülő-telep|üdülőterület|ürbő|üzem|üzletház|üzletsor|vadászház|varroda|vasútállomás|vasúti megálló|vasúti őrház|vasútsor|vám|vár|város|városrész|vásártér|vendéglő|vég|villa|villasor|viztároló|vízmű|vízmű telep|völgy|zsilip|zug|ltp\.|ltp|krt\.|krt|sgt\.|u\.|u\s+|Várkerület){1}.*',
Coding Style: This line is too long as per the coding style (2650/100).
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \. was found. Did you maybe forget to add an r prefix?
    re.UNICODE | re.IGNORECASE)
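The 2650-character alternation is hard to read, review and diff. A sketch of one alternative, assuming the street types can live in a Python list: join them at import time and let `re.escape()` handle characters such as the dot in `krt.`. `STREET_TYPES` is a hypothetical, heavily shortened subset. Entries that are themselves regex fragments, such as `u\s+`, would still need to be added unescaped. Order matters because alternatives are tried left to right, so `útja` must precede `út`, as it does in the original list.

```python
import re

# Hypothetical, heavily shortened subset of the street types in PATTERN_STREET_RICH.
STREET_TYPES = ['utca', 'útja', 'út', 'tér', 'körút', 'krt.', 'krt', 'u.']

PATTERN_STREET_RICH = re.compile(
    r'\s*(.*)\s+(' + '|'.join(re.escape(t) for t in STREET_TYPES) + r'){1}.*',
    re.UNICODE | re.IGNORECASE)

match = PATTERN_STREET_RICH.search('Petőfi Sándor utca 5')
print(match.group(1), '|', match.group(2))  # Petőfi Sándor | utca
```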
PATTERN_URL_SLASH = re.compile('(?<!:)(//{1,})')
PATTERN_FULL_URL = re.compile('((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)')
Bug: A suspicious escape sequence \w was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \d was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \+ was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \. was found. Did you maybe forget to add an r prefix?


def clean_javascript_variable(clearable, removable):
    """Remove JavaScript variable notation from the selected JSON variable.

    :param clearable: Text containing the JavaScript variable declaration
    :param removable: The name of the JavaScript variable
    :return: The cleaned text (JSON)
    """
    # Match on start
    PATTERN_JS = re.compile('^\s*var\s*{}\s*=\s*'.format(removable))
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
Coding Style Naming: Variable name "PATTERN_JS" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

    data = re.sub(PATTERN_JS, '', clearable)
    # Match on end
    return re.sub(PATTERN_JS_2, '', data)
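A sketch of `clean_javascript_variable()` with the two reported issues addressed: the pattern is built from raw strings and the local variable is snake_case. Passing `removable` through `re.escape()` is an additional, assumed hardening so that names containing regex metacharacters such as `$` are matched literally.

```python
import re

PATTERN_JS_2 = re.compile(r'\s*;\s*$')


def clean_javascript_variable(clearable, removable):
    """Remove JavaScript variable notation from the selected JSON variable."""
    # Match "var <name> =" at the start; re.escape() is an assumed hardening.
    pattern_js = re.compile(r'^\s*var\s*' + re.escape(removable) + r'\s*=\s*')
    data = pattern_js.sub('', clearable)
    # Match the trailing semicolon at the end
    return PATTERN_JS_2.sub('', data)


print(clean_javascript_variable('var shops = [{"id": 1}];', 'shops'))  # [{"id": 1}]
```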


def extract_javascript_variable(input_soup, removable, use_replace=False):
    """Extract a JavaScript variable from a <script> tag in a soup.

    :param input_soup: Input soup
    :param removable: The name of the JavaScript variable
    :param use_replace: Additional step to replace single quotes (') with double quotes (")
    :return: The cleaned text (JSON)
    """
    # Match on start
    try:
        pattern = re.compile('.*\s*var\s*{}\s*=\s*(.*?[}}\]]);'.format(removable), re.MULTILINE | re.DOTALL)
Coding Style: This line is too long as per the coding style (108/100).
Bug: A suspicious escape sequence \s was found. Did you maybe forget to add an r prefix?
Bug: A suspicious escape sequence \] was found. Did you maybe forget to add an r prefix?
        script = str(input_soup.find('script', text=pattern))
        if use_replace is True: script = script.replace("'", '"')
Coding Style: More than one statement on a single line
        m = pattern.match(script)
Coding Style Naming: Variable name "m" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        try:
            if m is not None:
unused-code: Unnecessary "else" after "return"
                return m.group(1)
            else:
                return None
        except AttributeError as e:
Coding Style Naming: Variable name "e" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
            logging.warning('An exception has occurred during JavaScript variable extraction.')
    except Exception as e:
Best Practice: Catching very general exceptions such as Exception is usually not recommended.

Generally, you would want to handle very specific errors in the exception handler. This ensures that you do not hide other types of errors which should be fixed.

So, unless you specifically plan to handle any error, consider catching a more specific exception.

Coding Style Naming: Variable name "e" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        logging.error(e)
        logging.exception('Exception occurred')

        logging.error(pattern)
        logging.error(script)
Issue: The variable script does not seem to be defined for all execution paths.
        logging.error(m.group(1))
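A hedged sketch of how the remaining findings in `extract_javascript_variable()` could be addressed: no catch-all `except Exception`, no `else` after `return`, descriptive names, and no reference to `script` or `m` on paths where they may be unbound. To stay self-contained it takes the text of the `<script>` element instead of a BeautifulSoup object; `extract_javascript_variable_from_text` is a hypothetical name. String concatenation replaces the `.format()` call and its doubled braces.

```python
import json
import re


def extract_javascript_variable_from_text(script, removable, use_replace=False):
    """Return the literal assigned to `var <removable> = ...;`, or None.

    :param script: Text of a <script> element
    :param removable: The name of the JavaScript variable
    :param use_replace: Replace single quotes with double quotes first
    """
    pattern = re.compile(
        r'.*\s*var\s*' + re.escape(removable) + r'\s*=\s*(.*?[}\]]);',
        re.MULTILINE | re.DOTALL)
    if use_replace:
        script = script.replace("'", '"')
    match = pattern.match(script)
    if match is None:
        return None
    return match.group(1)


text = "var x = 1;\nvar points = [{'lat': 47.5, 'lon': 19.0}];"
result = extract_javascript_variable_from_text(text, 'points', use_replace=True)
print(json.loads(result))  # [{'lat': 47.5, 'lon': 19.0}]
```

Without the broad handler, unexpected errors propagate to the caller instead of being logged and swallowed. Whether that is acceptable depends on how the calling code uses this function.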


def extract_street_housenumber(clearable):
    '''Try to separate street and house number from a Hungarian style address string

    :param clearable: An address string in Hungarian style
    :return: Separated street and housenumber
    '''
    # Split and clean up house number
    housenumber = clearable.split('(')[0]
    housenumber = housenumber.split(' ')[-1]
    housenumber = housenumber.replace('.', '')
    housenumber = housenumber.replace('–', '-')
    housenumber = housenumber.upper()
    # Split and clean up street
    street = clearable.split('(')[0]
    street = street.rsplit(' ', 1)[0]
    street = street.replace(' u.', ' utca')
    street = street.replace(' u ', ' utca')
    street = street.replace(' krt.', ' körút')
    return street, housenumber


def extract_all_address(clearable):
Issue: Missing function or method docstring
    if clearable is not None and clearable != '':
        clearable = clearable.strip()
        pc_match = PATTERN_POSTCODE_CITY.search(clearable)
        if pc_match is not None:
            postcode = pc_match.group(1)
        else:
            postcode = None
        if pc_match is not None:
            city = pc_match.group(4)
        else:
            city = None
        if len(clearable.split(',')) > 1:
unused-code: Unnecessary "else" after "return"
            street, housenumber, conscriptionnumber = extract_street_housenumber_better_2(
                clearable.split(',')[1].strip())
            return (postcode, city, street, housenumber, conscriptionnumber)
        else:
            space_separated = ' '.join(clearable.split(' ')[2:]).strip()
            street, housenumber, conscriptionnumber = extract_street_housenumber_better_2(space_separated)
Coding Style: This line is too long as per the coding style (106/100).
            return (postcode, city, street, housenumber, conscriptionnumber)
    else:
        return None, None, None, None, None
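`extract_all_address()` checks `pc_match is not None` twice and keeps an `else` after `return`. A hedged sketch of just the postcode and city part using guard clauses; `parse_postcode_city` is a hypothetical helper, and the pattern is `PATTERN_POSTCODE_CITY` written as a raw string.

```python
import re

PATTERN_POSTCODE_CITY = re.compile(
    r'^((\d){4})([.\s]{0,2})([a-zA-ZáÁéÉíÍóÓúÚüÜöÖőŐűŰ]{3,40})')


def parse_postcode_city(clearable):
    """Return (postcode, city) from the start of an address, or (None, None)."""
    # Guard clause: nothing to parse
    if not clearable or not clearable.strip():
        return None, None
    pc_match = PATTERN_POSTCODE_CITY.search(clearable.strip())
    # Guard clause: no postcode/city prefix
    if pc_match is None:
        return None, None
    return pc_match.group(1), pc_match.group(4)


print(parse_postcode_city(' 1051 Budapest, Október 6. utca 3 '))  # ('1051', 'Budapest')
```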


def extract_city_street_housenumber_address(clearable):
Issue: Missing function or method docstring
    if clearable is not None and clearable != '':
        clearable = clearable.strip()
        pc_match = PATTERN_CITY_ADDRESS.search(clearable)
        if pc_match is not None:
            city = pc_match.group(1)
        else:
            city = None
        if len(clearable.split(',')) > 1:
unused-code: Unnecessary "else" after "return"
            street, housenumber, conscriptionnumber = extract_street_housenumber_better_2(
                clearable.split(',')[1].strip())
            return (city, street, housenumber, conscriptionnumber)
        else:
            return city, None, None, None
    else:
        return None, None, None, None


def extract_street_housenumber_better_2(clearable):
    '''Try to separate street and house number from a Hungarian style address string

    :param clearable: An address string in Hungarian style
    :return: Separated street, house number and conscription number
    '''
    # Split and clean up street
    if clearable is not None and clearable.strip() != '':
        clearable = clearable.strip()
        # Remove building names
        clearable = clearable.replace(' - Savoya Park', '')
        clearable = clearable.replace('Park Center,', '')
        clearable = clearable.replace('Duna Center', '')
        clearable = clearable.replace('Family Center,', '')
        clearable = clearable.replace('Sostói ipari park, ', '')
        data = clearable.split('(')[0]
        cn_match_1 = PATTERN_CONSCRIPTIONNUMBER_1.search(data)
        cn_match_2 = PATTERN_CONSCRIPTIONNUMBER_2.search(data)
        if cn_match_1 is not None:
            conscriptionnumber = cn_match_1.group(2) if cn_match_1.group(2) is not None else None
            cnn_length = len(cn_match_1.group(0))
            logging.debug(
                'Matching conscription number with method 1: %s from %s', conscriptionnumber, clearable)
Coding Style: This line is too long as per the coding style (104/100).
        elif cn_match_2 is not None:
            conscriptionnumber = cn_match_2.group(2) if cn_match_2.group(2) is not None else None
            cnn_length = len(cn_match_2.group(0))
            logging.debug(
                'Matching conscription number with method 2: %s from %s', conscriptionnumber, clearable)
Coding Style: This line is too long as per the coding style (104/100).
        else:
            conscriptionnumber = None
            cnn_length = None
        # Try to match street
        street_corrected = clean_street(data)
        street_match = PATTERN_STREET_RICH.search(street_corrected)
        if street_match is None:
            logging.debug('Non matching street: %s', clearable)
            street, housenumber = None, None
        else:
            # Normalize street
            street = street_match.group(1)
            street_type = street_match.group(2)
            # street_type is usually lowercased, with a few exceptions
            if street_type not in ['Vám']:
                street_type = street_type.lower()
            street_length = len(street) + len(street_type)
            # Search for house number
            if cnn_length is not None:
                hn_match = PATTERN_HOUSENUMBER.search(street_corrected[street_length:-cnn_length])
Issue: bad operand type for unary -: NoneType
            else:
                hn_match = PATTERN_HOUSENUMBER.search(street_corrected[street_length:])
            if hn_match is not None:
                # Split and clean up house number
                housenumber = hn_match.group(0)
                housenumber = housenumber.replace('.', '')
                housenumber = housenumber.replace('–', '-')
                housenumber = housenumber.upper()
            else:
                housenumber = None
        if 'street_type' in locals():
unused-code: Unnecessary "else" after "return"
            return '{} {}'.format(street, street_type).strip(), housenumber, conscriptionnumber
        else:
            return street, housenumber, conscriptionnumber
    else:
        return None, None, None
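The "bad operand type for unary -: NoneType" report on `street_corrected[street_length:-cnn_length]` looks like a false positive, since that branch only runs when `cnn_length is not None`. A hedged sketch that computes the end index explicitly, which avoids both the negative index and the duplicated branch; `housenumber_window` is a hypothetical helper, not part of the module.

```python
def housenumber_window(street_corrected, street_length, cnn_length=None):
    """Return the text between the street name and a trailing conscription number.

    :param street_corrected: Address text after street clean-up
    :param street_length: Number of leading characters taken by the street
    :param cnn_length: Length of the conscription number match, or None
    """
    # None (and 0) mean "no conscription number": keep everything to the end.
    end = len(street_corrected) - cnn_length if cnn_length else len(street_corrected)
    return street_corrected[street_length:end]


print(repr(housenumber_window('Fő utca 12 hrsz 345', 7, 8)))  # ' 12 '
```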


def clean_city(clearable):
    '''Remove additional parts from a city name

    :param clearable: Raw city name
    :return: Cleaned city name
    '''
    if clearable is not None:
unused-code: Unnecessary "else" after "return"
        city = re.sub(PATTERN_CITY, '', clearable)
        repls = ('Mikolc', 'Miskolc'), ('Iinárcs', 'Inárcs')
        city = reduce(lambda a, kv: a.replace(*kv), repls, city)
        city = city.split('-')[0]
        city = city.split(',')[0]
        city = city.split('/')[0]
        city = city.split('/')[0]
        city = city.split('(')[0]
        city = city.split(' ')[0]
        return city.title().strip()
    else:
        return None
229
230
231
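You can try the splitting steps in `clean_city()` in isolation with the sketch below. `PATTERN_CITY` is defined elsewhere in the module and is not shown here, so the sketch leaves out the regex step. `clean_city_sketch` is a hypothetical name. The sketch strips the value before splitting on spaces, so a leading blank cannot produce an empty result.

```python
from functools import reduce


def clean_city_sketch(clearable):
    # Hypothetical stand-in for clean_city() without the PATTERN_CITY step
    if clearable is None:
        return None
    repls = (('Mikolc', 'Miskolc'), ('Iinárcs', 'Inárcs'))
    city = reduce(lambda acc, kv: acc.replace(*kv), repls, clearable)
    # Keep only the text before the first separator of each kind
    for sep in ('-', ',', '/', '('):
        city = city.split(sep)[0]
    # Strip before splitting on spaces so leading blanks do not yield ''
    city = city.strip().split(' ')[0]
    return city.title().strip()


print(clean_city_sketch('Mikolc-Diósgyőr'))        # Miskolc
print(clean_city_sketch('Budapest XIII. kerület'))  # Budapest
print(clean_city_sketch(' szeged (Csongrád)'))      # Szeged
```
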
def clean_opening_hours(oh_from_to):
    '''Extract an opening interval such as '09:40-17:00' from a string

    :param oh_from_to: String that contains an opening interval
    :return: (from, to) tuple with each time zero-padded to five characters,
             or (None, None) if no interval is found
    '''
    oh_match = PATTERN_OPENING_HOURS.search(oh_from_to)
    if oh_match is None:
        return None, None
    # Remove all whitespace
    tmp = ''.join(oh_match.group(0).split())
    # Expect exactly two parts separated by '-', for example: 09:40-17:00
    parts = tmp.split('-')
    if len(parts) == 2:
        tmf = parts[0].zfill(5)
        tmt = parts[1].zfill(5)
    else:
        tmf, tmt = None, None
    return tmf, tmt


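`PATTERN_OPENING_HOURS` is not part of this excerpt. The sketch below assumes that it matches an interval written as `H:MM - HH:MM`. It uses a stand-in pattern (`PATTERN_OPENING_HOURS_SKETCH`, a hypothetical name) to show how removing whitespace, splitting on `-` and padding with `zfill(5)` turn a match into a pair of times.

```python
import re

# Assumption: a stand-in for PATTERN_OPENING_HOURS, which is defined elsewhere
PATTERN_OPENING_HOURS_SKETCH = re.compile(r'\d{1,2}:\d{2}\s*-\s*\d{1,2}:\d{2}')


def clean_opening_hours_sketch(oh_from_to):
    oh_match = PATTERN_OPENING_HOURS_SKETCH.search(oh_from_to)
    if oh_match is None:
        return None, None
    # Drop all whitespace, then split the interval on the hyphen
    parts = ''.join(oh_match.group(0).split()).split('-')
    if len(parts) != 2:
        return None, None
    # zfill(5) pads '8:00' to '08:00'
    return parts[0].zfill(5), parts[1].zfill(5)


print(clean_opening_hours_sketch('H-P: 8:00 - 17:30'))  # ('08:00', '17:30')
print(clean_opening_hours_sketch('zárva'))              # (None, None)
```
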
def clean_opening_hours_2(oh):
    '''Convert a compact time such as '800' or '1730' to 'HH:MM'

    :param oh: Time as a string of up to four digits, or '-1' when unknown
    :return: Time formatted as 'HH:MM', or None for '-1'
    '''
    if oh == '-1':
        return None
    tmp = oh.strip().zfill(4)
    return '{}:{}'.format(tmp[:2], tmp[-2:])


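Here is a quick check of `clean_opening_hours_2()`, restated so the snippet runs on its own. The input is left-padded to four digits, so the function reads a one- or two-digit value as minutes, not hours.

```python
def clean_opening_hours_2(oh):
    if oh == '-1':
        return None
    # Left-pad to four digits: '800' -> '0800'
    tmp = oh.strip().zfill(4)
    return '{}:{}'.format(tmp[:2], tmp[-2:])


print(clean_opening_hours_2('800'))   # 08:00
print(clean_opening_hours_2('1730'))  # 17:30
print(clean_opening_hours_2('-1'))    # None
print(clean_opening_hours_2('30'))    # 00:30
```

If a source ever sends bare hours such as `'8'`, callers need to handle that before calling this function, because it returns `'00:08'`.
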
def clean_phone(phone):
    '''Parse one or more Hungarian phone numbers into international format

    :param phone: Phone number string; several numbers may be separated by ',' or ';'
    :return: List of numbers in international format, or None if any part cannot be parsed
    '''
    # Drop everything after an opening parenthesis, e.g. extension notes
    if '(' in phone:
        phone = phone.split('(')[0]
    phone = phone.replace('-', ' ')
    # Treat ',' and ';' as separators between several numbers
    parts = phone.replace(',', ';').split(';')
    try:
        parsed_numbers = [phonenumbers.parse(part, 'HU') for part in parts]
    except phonenumbers.phonenumberutil.NumberParseException:
        logging.debug('This string cannot be converted to a phone number: %s', phone)
        return None
    return [phonenumbers.format_number(number, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
            for number in parsed_numbers]


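You can sketch the pre-processing that `clean_phone()` performs before it calls `phonenumbers.parse(..., 'HU')` without the library. `split_phone_field` is a hypothetical helper. The function discards everything after the first `(`, including any further numbers listed after a parenthetical remark.

```python
def split_phone_field(phone):
    # Hypothetical helper mirroring only the string handling of clean_phone()
    if '(' in phone:
        phone = phone.split('(')[0]
    phone = phone.replace('-', ' ')
    return phone.replace(',', ';').split(';')


print(split_phone_field('06-1-234-5678, 06-30-123-4567'))
# ['06 1 234 5678', ' 06 30 123 4567']
print(split_phone_field('06 1 234 5678 (mellék 12), 06 30 123 4567'))
# ['06 1 234 5678 ']
```
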
def clean_phone_to_json(phone):
    '''Return the cleaned phone numbers as a JSON array string, or None'''
    cleaned = clean_phone(phone)
    if cleaned is not None:
        return json.dumps(cleaned)
    return None


def clean_phone_to_str(phone):
    '''Return the cleaned phone numbers joined with ';', or None'''
    cleaned = clean_phone(phone)
    if cleaned is not None:
        return ';'.join(cleaned)
    return None


def clean_email(email):
    '''Keep only the first address when several are separated by ',' or ';'

    :param email: Raw e-mail field
    :return: First e-mail address in the field
    '''
    if ',' in email:
        email = email.split(',')[0]
    if ';' in email:
        email = email.split(';')[0]
    return email


def clean_string(clearable):
    '''
    Collapse runs of spaces and strip surrounding whitespace characters
    :param clearable: String to clean
    :return: Cleaned string, or None if clearable is None
    '''
    if clearable is not None:
        clearable = re.sub(r' {2,}', ' ', clearable).strip()
    return clearable


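`str.replace('  ', ' ')` makes a single pass over non-overlapping pairs of spaces, so it only halves a run of spaces. The comparison below assumes the goal is a single space between words. `collapse_once` and `collapse_all` are illustrative names.

```python
import re


def collapse_once(text):
    # One pass: each non-overlapping pair of spaces becomes one space
    return text.replace('  ', ' ').strip()


def collapse_all(text):
    # Every run of two or more spaces becomes a single space
    return re.sub(r' {2,}', ' ', text).strip()


sample = 'Kossuth    Lajos  utca '
print(repr(collapse_once(sample)))  # 'Kossuth  Lajos utca'
print(repr(collapse_all(sample)))   # 'Kossuth Lajos utca'
```
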
def clean_url(clearable):
    '''
    Remove extra slashes from URL strings and surrounding whitespace characters
    :param clearable: URL string to clean
    :return: Cleaned URL string, or None if clearable is None
    '''
    if clearable is not None:
        url = PATTERN_URL_SLASH.sub('/', clearable)
        return url.strip()
    return None


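`PATTERN_URL_SLASH` is defined outside this excerpt. Assuming it targets runs of repeated slashes and leaves the `://` after the scheme alone, a negative lookbehind is one way to express it. The pattern below is a stand-in, not the module's own.

```python
import re

# Assumption: collapse two or more slashes unless they directly follow ':'
URL_SLASH_SKETCH = re.compile(r'(?<!:)/{2,}')


def clean_url_sketch(clearable):
    if clearable is None:
        return None
    return URL_SLASH_SKETCH.sub('/', clearable).strip()


print(clean_url_sketch(' http://example.com//shop///hu '))  # http://example.com/shop/hu
print(clean_url_sketch('https://example.com/a/b'))          # https://example.com/a/b
```
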
def clean_street(clearable):
    '''Normalize abbreviated, misspelled or decorated Hungarian street names

    The (old, new) pairs are applied as plain substring replacements in
    order, so each pair sees the output of every pair before it. A longer
    key must therefore come before any shorter key that is its prefix.

    :param clearable: Raw street string
    :return: Normalized street string
    '''
    street = clearable.strip()
    repls = (
        ('Nyúl 82. sz. főút', 'Kossuth Lajos út'),
        ('Nyúl  82. sz. főút', '82. számú főközlekedési út'),
        ('Budafoki út, 6-os sz. főút', '6. számú főközlekedési út'),
        ('. Sz. Főút felső', '. számú főközlekedési út'),
        ('. számú - Némedi út sarok', '. számú főközlekedési út'),
        ('076/15. hrsz 86. számú főút mellett', '86. számú főközlekedési út'),
        ('50.sz.út jobb oldal', '50. számú főközlekedési út'),
        ('. sz. fkl.út', '. számú főközlekedési út'),
        ('.sz. fkl. út', '. számú főközlekedési út'),
        ('-es sz. főút', '. számú főközlekedési út'),
        ('. sz. főút', '. számú főközlekedési út'),
        ('.sz.fkl.', '. számú főközlekedési'),
        ('. sz. fkl.', '. számú főközlekedési'),
        ('. számú fkl. út', '. számú főközlekedési út'),
        ('. Sz. főút', '. számú főközlekedési út'),
        ('. számú főút', '. számú főközlekedési út'),
        ('. főút', '. számú főközlekedési út'),
        ('. sz út', '. számú főközlekedési út'),
        (' sz. főút', '. számú főközlekedési út'),
        ('-es fő út', '. számú főközlekedési út'),
        ('-es főút', '. számú főközlekedési út'),
        (' - es út', '. számú főközlekedési út'),
        ('-es út', '. számú főközlekedési út'),
        ('-as fő út', '. számú főközlekedési út'),
        ('-as főút', '. számú főközlekedési út'),
        (' - as út', '. számú főközlekedési út'),
        ('-as út', '. számú főközlekedési út'),
        ('-ös fő út', '. számú főközlekedési út'),
        ('-ös főút', '. számú főközlekedési út'),
        (' - ös út', '. számú főközlekedési út'),
        ('-ös út', '. számú főközlekedési út'),
        ('Omsz park', 'Omszk park'),
        ('01.máj.', 'Május 1.'),
        ('15.márc.', 'Március 15.'),
        ('Ady E.', 'Ady Endre'),
        ('Áchim A.', 'Áchim András'),
        ('Bajcsy-Zs. E. u.', 'Bajcsy-Zsilinszky Endre utca'),
        ('Bajcsy-Zs. E.', 'Bajcsy-Zsilinszky Endre'),
        ('Bajcsy-Zs. u.', 'Bajcsy-Zsilinszky utca'),
        ('Bajcsy Zs.u.', 'Bajcsy-Zsilinszky utca'),
        ('Bajcsy Zs. u.', 'Bajcsy-Zsilinszky utca'),
        ('Bajcsy-Zs.', 'Bajcsy-Zsilinszky'),
        ('Bajcsy Zs.', 'Bajcsy-Zsilinszky'),
        ('Bartók B.', 'Bartók Béla'),
        ('Baross G.', 'Baross Gábor'),
        ('BERCSÉNYI U.', 'Bercsényi Miklós utca'),
        ('Berzsenyi D.', 'Berzsenyi Dániel'),
        ('Borics P.', 'Borics Pál'),
        ('Corvin J.', 'Corvin'),
        ('Dózsa Gy.u.', 'Dózsa György utca'),
        ('Dózsa Gy.', 'Dózsa György'),
        ('dr. Géfin Lajos', 'Dr. Géfin Lajos'),
        ('Erkel F.', 'Erkel Ferenc'),
        ('Hegedű/(Király)', 'Hegedű'),
        ('Hevesi S.', 'Hevesi Sándor'),
        ('Hunyadi J.', 'Hunyadi János'),
        ('Ii. Rákóczi Ferenc', 'II. Rákóczi Ferenc'),
        ('Innovációs kp. Fő út', 'Fő út'),
        ('Ix. körzet', 'IX. körzet'),
        ('Kölcsey F.', 'Kölcsey Ferenc'),
        ('Kiss J.', 'Kiss József'),
        ('Nagy L. király', 'Nagy Lajos király'),
        ('Kaszás u. 2.-Dózsa György út', 'Dózsa György út'),
        ('Váci út 117-119. „A” épület', 'Váci út'),
        ('56-Osok tere', 'Ötvenhatosok tere'),
        ('11-es út', '11. számú főközlekedési út'),
        ('11-es Huszár út', 'Huszár út'),
        ('Kölcsey-Pozsonyi út sarok', 'Kölcsey Ferenc utca '),
        ('Március 15-e', 'Március 15.'),
        ('Tiszavasvári út - Alkotás u sarok', 'Tiszavasvári út'),
        ('Tiszavasvári út- Alkotás út sarok', 'Tiszavasvári út'),
        ('Hőforrás-Rákóczi utca', 'Rákóczi utca'),
        ('Kiss Tábornok - Kandó Kálmán utca sarok', 'Kiss Tábornok utca'),
        ('Soroksári út - Határ út sarok', 'Soroksári út'),
        ('Szentendrei- Czetz János utca sarok', 'Szentendrei út'),
        ('Külső - Kádártai utca', 'Külső-Kádártai utca'),
        ('Károlyi út - Ságvári út', 'Károlyi Mihály utca'),
        ('Szlovák út - Csömöri út sarok', 'Szlovák út'),
        ('Maglódi út - Jászberényi út sarok', 'Maglódi út'),
        ('Dobogókői út- Kesztölci út sarok', 'Dobogókői út'),
        ('DR. KOCH L. UTCA', 'Dr. Koch László utca'),
        ('DR KOCH L.', 'Dr. Koch László'),
        ('Koch L.u.', 'Dr. Koch László utca'),
        ('Kossuth L.u.', 'Kossuth Lajos utca '),
        ('Kossuth L.', 'Kossuth Lajos'),
        ('Kossuth F. u', 'Kossuth Ferenc utca'),
        ('Kossuth F.', 'Kossuth Ferenc'),
        ('Korányi F.', 'Korányi Frigyes'),
        ('Kőrösi Csoma S.', 'Kőrösi Csoma Sándor'),
        ('Páter K.', 'Páter Károly'),
        ('Petőfi S.', 'Petőfi Sándor'),
        ('Somogyi B.', 'Somogyi Béla'),
        ('Szondy', 'Szondi'),
        ('Szt.István', 'Szent István'),
        ('szt.istván', 'Szent István'),
        ('Táncsics M.', 'Táncsics Mihály'),
        ('Vass J.', 'Vass János'),
        ('Vámház.', 'Vámház'),
        ('Várkörút .', 'Várkörút'),
        ('Vásárhelyi P.', 'Vásárhelyi Pál'),
        ('Vi. utca', 'VI. utca'),
        ('XXI. II. Rákóczi Ferenc', 'II. Rákóczi Ferenc'),
        ('Zsolnay V.', 'Zsolnay Vilmos'),
        ('Radnóti M.', 'Radnóti Miklós'),
        ('Fehérvári út (Andor u. 1.)', 'Fehérvári'),
        ('Szent István kir.', 'Szent István király'),
        ('Dr Batthyány S. László', 'Dr. Batthyány-Strattmann László'),
        ('Bacsinszky A.', 'Bacsinszky András'),
        ('Fáy A.', 'Fáy András'),
        ('József a.', 'József Attila'),
        ('Juhász Gy. ', 'Juhász Gyula '),
        ('Hock j.', 'Hock János'),
        ('Vak B.', 'Vak Bottyán'),
        ('Arany J.', 'Arany János'),
        ('Könyves K.', 'Könyves Kálmán'),
        ('Szilágyi E.', 'Szilágyi Erzsébet'),
        ('Liszt F.', 'Liszt Ferenc'),
        ('Bethlen G.', 'Bethlen Gábor'),
        ('Gazdag E.', 'Gazdag Erzsi'),
        ('Hátsókapu.', 'Hátsókapu'),
        ('Herman O.', 'Herman Ottó'),
        ('József A.', 'József Attila'),
        ('Kazinczy F.', 'Kazinczy Ferenc'),
        ('Király J.', 'Király Jenő'),
        ('Királyhidai utca', 'Királyhidai út'),
        ('Lackner K.', 'Lackner Kristóf'),
        ('Mécs L.', 'Mécs László'),
        ('Nagyváthy J.', 'Nagyváthy János'),
        ('Szent I. kir.', 'Szent István király'),
        ('Szigethy A. u.', 'Szigethy Attila út'),
        ('Rákóczi F.', 'Rákóczi Ferenc'),
        ('Jókai M.', 'Jókai Mór'),
        ('Szabó D.', 'Szabó Dezső'),
        ('Móricz Zs.', 'Móricz Zsigmond'),
        ('Hunyadi J ', 'Hunyadi János '),
        ('Szilágyi E ', 'Szilágyi Erzsébet fasor '),
        ('Erzsébet Királyné út', 'Erzsébet királyné útja'),
        ('Mammut', ''),
        ('Szt. ', 'Szent '),
        (' u.', ' utca '),
        (' U.', ' utca '),
        ('.u.', ' utca '),
        (' u ', ' utca '),
        (' krt.', ' körút'),
        (' Krt.', ' körút'),
        (' KRT.', ' körút'),
        (' ltp.', ' lakótelep'),
        (' Ltp.', ' lakótelep'),
        (' LTP.', ' lakótelep'),
        (' ltp', ' lakótelep'),
        (' sgt.', ' sugárút'),
    )
    street = reduce(lambda a, kv: a.replace(*kv), repls, street)
    # str.replace() cannot anchor a key to the whole string, so this
    # exact-match case is handled separately
    if street == '4. sz':
        street = '4. számú főközlekedési'
    return street


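`reduce()` feeds the output of each `str.replace()` into the next one, so the order of `repls` in `clean_street()` matters, and every key is matched literally. The snippet below uses the hypothetical helper `apply_in_order` to show both effects, first on two of the Bajcsy-Zsilinszky entries and then on a regex-style key.

```python
from functools import reduce


def apply_in_order(text, repls):
    # Each (old, new) pair sees the output of all earlier pairs
    return reduce(lambda acc, kv: acc.replace(*kv), repls, text)


short_first = (('Bajcsy-Zs. E.', 'Bajcsy-Zsilinszky Endre'),
               ('Bajcsy-Zs. E. u.', 'Bajcsy-Zsilinszky Endre utca'))
long_first = tuple(reversed(short_first))

print(apply_in_order('Bajcsy-Zs. E. u. 5', short_first))
# Bajcsy-Zsilinszky Endre u. 5  (the longer key never matches)
print(apply_in_order('Bajcsy-Zs. E. u. 5', long_first))
# Bajcsy-Zsilinszky Endre utca 5

# str.replace() is literal, so '^' and '$' are not anchors here
print(apply_in_order('4. sz', (('^4. sz$', '4. számú főközlekedési'),)))
# 4. sz
```
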
def clean_street_type(clearable):
    '''Expand common abbreviations in a street type and drop remaining dots

    Replacements run in order, so 'pu.' is expanded before the generic 'u.'.

    :param clearable: Street type, for example 'u.' or 'fkl. út'
    :return: Expanded street type
    '''
    street = clearable.replace('fkl. út', 'főközlekedési út')
    street = street.replace('főút', 'főközlekedési út')
    street = street.replace('ltp.', ' lakótelep')
    street = street.replace('LTP.', ' lakótelep')
    street = street.replace('pu.', 'pályaudvar')
    street = street.replace('út.', 'út')
    street = street.replace('u.', 'utca')
    street = street.replace('(nincs)', '')
    street = street.replace('.', '')
    return street


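Because `clean_street_type()` applies its replacements one after another, a few of its results are worth knowing about. The function is restated here so the snippet runs on its own. `'ltp.'` comes back with a leading space, and abbreviations that have no rule only lose their dots.

```python
def clean_street_type(clearable):
    street = clearable.replace('fkl. út', 'főközlekedési út')
    street = street.replace('főút', 'főközlekedési út')
    street = street.replace('ltp.', ' lakótelep')
    street = street.replace('LTP.', ' lakótelep')
    street = street.replace('pu.', 'pályaudvar')
    street = street.replace('út.', 'út')
    street = street.replace('u.', 'utca')
    street = street.replace('(nincs)', '')
    street = street.replace('.', '')
    return street


print(repr(clean_street_type('u.')))    # 'utca'
print(repr(clean_street_type('pu.')))   # 'pályaudvar'
print(repr(clean_street_type('ltp.')))  # ' lakótelep'
print(repr(clean_street_type('krt.')))  # 'krt'
```
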
def clean_branch(clearable):
    '''Strip a branch name and expand 'sz.' to 'számú'

    :param clearable: Raw branch name
    :return: Cleaned branch name, or None for None or an empty string
    '''
    if clearable is not None and clearable != '':
        clearable = clearable.strip()
        return clearable.replace('sz.', 'számú')
    return None