etlt.cleaner.DateCleaner.DateCleaner.clean()   F

Complexity

Conditions 27

Size

Total Lines 52
Code Lines 28

Duplication

Lines 52
Ratio 100 %

Code Coverage

Tests 25
CRAP Score 27

Importance

Changes 0

Metric  Value
cc      27
eloc    28
nop     2
dl      52
loc     52
ccs     25
cts     25
cp      1
crap    27
rs      0
c       0
b       0
f       0
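The CRAP score (27) equals the cyclomatic complexity (27) because coverage (cp) is 1, meaning 100 %. The sketch below assumes the analyzer uses the standard CRAP formula: complexity squared times the cube of the uncovered fraction, plus complexity. The function name crap_score is illustrative.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """CRAP score for one method; coverage is a fraction from 0.0 to 1.0.

    Assumes the standard formula: comp^2 * (1 - cov)^3 + comp.
    """
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


print(crap_score(27, 1.0))  # 27.0  (fully covered: score equals complexity)
print(crap_score(27, 0.5))  # 118.125
```

With full coverage the first term vanishes. More tests therefore cannot push the score below 27; only reducing the number of branches can.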

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

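Commonly applied refactorings include Extract Method: move a cohesive block of statements into a new, well-named method and replace the block with a call to it.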
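The commented branches of clean() are natural candidates for extraction. The following is a hypothetical sketch, not code from the etlt project. Each input format gets its own helper that returns None when the input does not match, and clean() tries the helpers in order. The helper names (_pad2, _has_ignorable_time, _from_separated_parts, _from_month_name, _from_compact) and the module-level MONTH_MAP are invented for illustration. The sketch keeps the original behaviour, with one exception: it zero-pads the time fields of DDmonYYYY input.

```python
import re
from typing import List, Optional

# Hypothetical Extract Method refactoring of DateCleaner.clean(): every commented
# branch of the original becomes a named helper that returns None on no match.

MONTH_MAP = {
    'jan': '01', 'feb': '02', 'mar': '03', 'apr': '04', 'may': '05', 'jun': '06',
    'jul': '07', 'aug': '08', 'sep': '09', 'oct': '10', 'nov': '11', 'dec': '12',
    'mrt': '03', 'mei': '05', 'okt': '10',  # Dutch abbreviations
}


def _pad2(value: str) -> str:
    """Left-pads a field of at most two characters with zeros to width 2."""
    return ('00' + value)[-2:]


def _has_ignorable_time(parts: List[str], ignore_time: bool) -> bool:
    """True for a bare date, or a date whose time part is ignored or all zeros."""
    if len(parts) == 3 or (len(parts) > 3 and ignore_time):
        return True
    if len(parts) == 4:
        return re.match(r'^[0:]*$', parts[3]) is not None
    if len(parts) == 5:
        return re.match(r'^[0:]*$', parts[3]) is not None and re.match(r'^0*$', parts[4]) is not None
    return False


def _from_separated_parts(date: str, ignore_time: bool) -> Optional[str]:
    """YYYY-MM-DD, DD-MM-YYYY or DD-MM-YY, separated by '-', '/', '.' or a space."""
    parts = re.split(r'[\-/. ]', date)
    if not _has_ignorable_time(parts, ignore_time):
        return None
    first, second, third = parts[:3]
    if len(first) == 4 and len(second) <= 2 and len(third) <= 2:
        return first + '-' + _pad2(second) + '-' + _pad2(third)
    if len(first) <= 2 and len(second) <= 2 and len(third) == 4:
        return third + '-' + _pad2(second) + '-' + _pad2(first)
    if len(first) <= 2 and len(second) <= 2 and len(third) == 2:
        # Two-digit years: 20-99 become 19xx, 00-19 become 20xx.
        year = '19' + third if third >= '20' else '20' + third
        return year + '-' + _pad2(second) + '-' + _pad2(first)
    return None


def _from_month_name(date: str, ignore_time: bool) -> Optional[str]:
    """DDmonYYYY, optionally followed by a time such as ' 13:05:00'."""
    time_part = '.*$' if ignore_time else r'(\D(\d{1,2})\D(\d{1,2})\D(\d{1,2}))?$'
    match = re.match(r'^(\d{2})([a-z]{3})(\d{4})' + time_part, date.lower())
    if not match or match.group(2) not in MONTH_MAP:
        return None
    result = match.group(3) + '-' + MONTH_MAP[match.group(2)] + '-' + match.group(1)
    # Groups 4-7 exist only in the pattern used when the time is not ignored.
    if not ignore_time and match.group(4):
        result += 'T' + ':'.join(_pad2(match.group(i)) for i in (5, 6, 7))
    return result


def _from_compact(date: str, ignore_time: bool) -> Optional[str]:
    """YYYYMMDD, optionally followed by anything when ignore_time is set."""
    if re.match(r'^\d{8}' + ('.*$' if ignore_time else '$'), date):
        return date[0:4] + '-' + date[4:6] + '-' + date[6:8]
    return None


def clean(date: Optional[str], ignore_time: bool = False) -> Optional[str]:
    """Tries each format in turn; unrecognized input is returned unchanged."""
    if not date:
        return date
    for parser in (_from_separated_parts, _from_month_name, _from_compact):
        result = parser(date, ignore_time)
        if result is not None:
            return result
    return date
```

Each helper's docstring takes the place of a comment from the original method, as the advice above suggests. The decision points are now spread across small functions, each of which can be tested one format at a time.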

Complexity

Complex code like etlt.cleaner.DateCleaner.DateCleaner.clean() often does many different things. To break it down, identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
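In DateCleaner, the only field (month_map) is read only by the DDmonYYYY branch of clean(). That field and that branch therefore form a cohesive component. The following hypothetical Extract Class sketch moves both into a class of their own and lets callers compose one table per language. MonthNameDateParser, ENGLISH and DUTCH are illustrative names and are not part of etlt. For brevity, the sketch omits the ignore_time option and zero-pads time fields.

```python
import re
from typing import Dict, Optional


class MonthNameDateParser:
    """Hypothetical class extracted from DateCleaner.

    It owns the month-name table together with the only code that reads it
    (the DDmonYYYY branch of clean()).
    """

    # Matches e.g. '01jan2020', optionally followed by a time such as ' 13:05:00'.
    _PATTERN = re.compile(r'^(\d{2})([a-z]{3})(\d{4})(?:\D(\d{1,2})\D(\d{1,2})\D(\d{1,2}))?$')

    def __init__(self, *month_maps: Dict[str, str]) -> None:
        # Merge one table per language; later tables win on duplicate keys.
        self._months: Dict[str, str] = {}
        for month_map in month_maps:
            self._months.update(month_map)

    def parse(self, date: str) -> Optional[str]:
        """Returns YYYY-MM-DD (plus THH:MM:SS when a time is present), or None."""
        match = self._PATTERN.match(date.lower())
        if not match or match.group(2) not in self._months:
            return None
        day, month, year, hour, minute, second = match.groups()
        result = f'{year}-{self._months[month]}-{day}'
        if hour is not None:
            result += f'T{hour.zfill(2)}:{minute.zfill(2)}:{second.zfill(2)}'
        return result


ENGLISH = {'jan': '01', 'feb': '02', 'mar': '03', 'apr': '04', 'may': '05', 'jun': '06',
           'jul': '07', 'aug': '08', 'sep': '09', 'oct': '10', 'nov': '11', 'dec': '12'}
DUTCH = {'mrt': '03', 'mei': '05', 'okt': '10'}

parser = MonthNameDateParser(ENGLISH, DUTCH)
print(parser.parse('15okt2021'))        # 2021-10-15
print(parser.parse('01Jan2020 7:5:0'))  # 2020-01-01T07:05:00
print(parser.parse('2020-01-01'))       # None
```

DateCleaner.clean() could then delegate its DDmonYYYY branch to a parser instance. Adding a language becomes a matter of passing another table, with no change to clean() itself.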

```python
import re
from typing import Optional


class DateCleaner:
    """
    Utility class for converting dates in miscellaneous formats to ISO-8601 (YYYY-MM-DD) format.
    """

    # ------------------------------------------------------------------------------------------------------------------
    month_map = {
        # English
        'jan': '01',
        'feb': '02',
        'mar': '03',
        'apr': '04',
        'may': '05',
        'jun': '06',
        'jul': '07',
        'aug': '08',
        'sep': '09',
        'oct': '10',
        'nov': '11',
        'dec': '12',

        # Dutch
        'mrt': '03',
        'mei': '05',
        'okt': '10'
    }

    # ------------------------------------------------------------------------------------------------------------------
    @staticmethod
    def clean(date: Optional[str], ignore_time: bool = False) -> Optional[str]:
        """
        Converts a date in miscellaneous format to ISO-8601 (YYYY-MM-DD) format.

        :param str|None date: The input date.
        :param bool ignore_time: If True, any trailing time part is ignored.

        :return: The date in YYYY-MM-DD format (with a THH:MM:SS suffix for DDmonYYYY input that includes a time), or
                 the input unchanged if its format is not recognized.
        :rtype: str|None
        """
        # Return empty input immediately.
        if not date:
            return date

        parts = re.split(r'[\-/. ]', date)

        # Accept a bare date, a date with any time part when ignore_time is set, or a date followed by an all-zero
        # time (e.g. "00:00:00", optionally with fractional zero seconds).
        if (len(parts) == 3) or \
                (len(parts) > 3 and ignore_time) or \
                (len(parts) == 4 and re.match(r'^[0:]*$', parts[3])) or \
                (len(parts) == 5 and re.match(r'^[0:]*$', parts[3]) and re.match(r'^0*$', parts[4])):
            if len(parts[0]) == 4 and len(parts[1]) <= 2 and len(parts[2]) <= 2:
                # Assume date is in YYYY-MM-DD or YYYY-M-D format.
                return parts[0] + '-' + ('00' + parts[1])[-2:] + '-' + ('00' + parts[2])[-2:]

            if len(parts[0]) <= 2 and len(parts[1]) <= 2 and len(parts[2]) == 4:
                # Assume date is in DD-MM-YYYY or D-M-YYYY format.
                return parts[2] + '-' + ('00' + parts[1])[-2:] + '-' + ('00' + parts[0])[-2:]

            if len(parts[0]) <= 2 and len(parts[1]) <= 2 and len(parts[2]) == 2:
                # Assume date is in DD-MM-YY or D-M-YY format. Years 20-99 map to 19xx, years 00-19 map to 20xx.
                year = '19' + parts[2] if parts[2] >= '20' else '20' + parts[2]

                return year + '-' + ('00' + parts[1])[-2:] + '-' + ('00' + parts[0])[-2:]

        # Try DDmonYYYY or DDmonYYYY HH:mm:ss format.
        pattern = r'^(\d{2})([a-z]{3})(\d{4})' + ('.*$' if ignore_time else r'(\D(\d{1,2})\D(\d{1,2})\D(\d{1,2}))?$')
        match = re.match(pattern, date.lower())
        if match and match.group(2) in DateCleaner.month_map:
            ret = match.group(3) + '-' + DateCleaner.month_map[match.group(2)] + '-' + match.group(1)
            # Only the pattern used without ignore_time has 7 groups; group 4 is None when no time is present.
            if len(match.groups()) == 7 and match.group(4):
                # Zero-pad the time fields so the result stays valid ISO-8601 (e.g. 7:5:0 becomes 07:05:00).
                ret += 'T' + ('00' + match.group(5))[-2:] + ':' + ('00' + match.group(6))[-2:] + ':' + \
                    ('00' + match.group(7))[-2:]
            return ret

        # Try YYYYMMDD format.
        pattern = r'^\d{8}' + ('.*$' if ignore_time else '$')
        match = re.match(pattern, date)
        if match:
            # Assume date is in YYYYMMDD format.
            return date[0:4] + '-' + date[4:6] + '-' + date[6:8]

        # Format not recognized. Just return the original string.
        return date


# ----------------------------------------------------------------------------------------------------------------------
```