db_extractor.DataDiskRead   A

Complexity

Total Complexity 20

Size/Duplication

Total Lines 87
Duplicated Lines 32.18 %

Importance

Changes 0

Metric  Value
eloc    76
dl      28
loc     87
rs      10
c       0
b       0
f       0
wmc     20

5 Methods

Rating  Name                                                          Duplication  Size  Complexity
A       DataDiskRead.fn_internal_load_excel_file_into_data_frame()    0            14    4
A       DataDiskRead.fn_internal_load_pickle_file_into_data_frame()   14           14    4
A       DataDiskRead.fn_internal_load_json_file_into_data_frame()     14           14    4
A       DataDiskRead.fn_internal_load_csv_file_into_data_frame()      0            16    4
A       DataDiskRead.fn_internal_load_parquet_file_into_data_frame()  0            13    4
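The figures in this report appear to be mutually consistent. As a quick check, assuming `dl` stands for duplicated lines and `loc` for total lines (the report does not spell these abbreviations out), the duplication percentage and the per-method duplication values can be reproduced like this:

```python
# Values copied from the report; the meaning of "dl" and "loc" is an assumption.
duplicated_lines = 28                       # "dl", assumed: duplicated lines
total_lines = 87                            # "loc" / "Total Lines"
per_method_duplication = [0, 14, 14, 0, 0]  # "Duplication" column of the methods table

# Share of duplicated lines, rounded to two decimals as in the report.
duplicated_percent = round(duplicated_lines / total_lines * 100, 2)
print(duplicated_percent)  # 32.18

# The two flagged methods (pickle and json) account for all duplicated lines.
print(sum(per_method_duplication) == duplicated_lines)  # True
```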

Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

"""
DataOutput - class to handle disk file storage
"""
# package to handle files/folders and related metadata/operations
import os
# package facilitating Data Frames manipulation
import pandas


class DataDiskRead:

    @staticmethod
    def fn_internal_load_csv_file_into_data_frame(in_dict):
        if in_dict['format'].lower() == 'csv':
            try:
                out_data_frame = []
                for index_file, crt_file in enumerate(in_dict['files list']):
                    out_data_frame.append(index_file)
                    out_data_frame[index_file] = pandas.read_csv(
                        filepath_or_buffer=crt_file, delimiter=in_dict['field delimiter'],
                        cache_dates=True, index_col=None, memory_map=True, low_memory=False,
                        encoding='utf-8')
                    out_data_frame[index_file]['Source Data File Name'] = os.path.basename(crt_file)
                in_dict['out data frame'] = pandas.concat(out_data_frame)
            except Exception as err:
                in_dict['error details'] = err
        return in_dict

    @staticmethod
    def fn_internal_load_excel_file_into_data_frame(in_dict):
        if in_dict['format'].lower() == 'excel':
            try:
                out_data_frame = []
                for index_file, crt_file in enumerate(in_dict['files list']):
                    out_data_frame.append(index_file)
                    out_data_frame[index_file] = pandas.read_excel(
                        io=crt_file, sheet_name=in_dict['worksheet list'][0], verbose=False)
                    out_data_frame[index_file]['Source Data File Name'] = os.path.basename(crt_file)
                in_dict['out data frame'] = pandas.concat(out_data_frame)
            except Exception as err:
                in_dict['error details'] = err
        return in_dict

    # Analyzer note: this code seems to be duplicated in your project.
    @staticmethod
    def fn_internal_load_json_file_into_data_frame(in_dict):
        if in_dict['format'].lower() == 'json':
            try:
                out_data_frame = []
                for index_file, crt_file in enumerate(in_dict['files list']):
                    out_data_frame.append(index_file)
                    out_data_frame[index_file] = pandas.read_json(
                        path_or_buf=crt_file, compression=in_dict['compression'])
                    out_data_frame[index_file]['Source Data File Name'] = os.path.basename(crt_file)
                in_dict['out data frame'] = pandas.concat(out_data_frame)
            except Exception as err:
                in_dict['error details'] = err
        return in_dict

    @staticmethod
    def fn_internal_load_parquet_file_into_data_frame(in_dict):
        if in_dict['format'].lower() == 'parquet':
            try:
                out_data_frame = []
                for index_file, crt_file in enumerate(in_dict['files list']):
                    out_data_frame.append(index_file)
                    out_data_frame[index_file] = pandas.read_parquet(path=crt_file)
                    out_data_frame[index_file]['Source Data File Name'] = os.path.basename(crt_file)
                in_dict['out data frame'] = pandas.concat(out_data_frame)
            except Exception as err:
                in_dict['error details'] = err
        return in_dict

    # Analyzer note: this code seems to be duplicated in your project.
    @staticmethod
    def fn_internal_load_pickle_file_into_data_frame(in_dict):
        if in_dict['format'].lower() == 'pickle':
            try:
                out_data_frame = []
                for index_file, crt_file in enumerate(in_dict['files list']):
                    out_data_frame.append(index_file)
                    out_data_frame[index_file] = pandas.read_pickle(
                        filepath_or_buffer=crt_file, compression=in_dict['compression'])
                    out_data_frame[index_file]['Source Data File Name'] = os.path.basename(crt_file)
                in_dict['out data frame'] = pandas.concat(out_data_frame)
            except Exception as err:
                in_dict['error details'] = err
        return in_dict
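One possible way to remove the duplication flagged in this class is to extract the shared loop into a single helper and pass in only the part that differs, namely the pandas reader call. The following is a minimal sketch under stated assumptions, not the project's actual fix: the names `load_files_into_data_frame`, `load_csv`, `load_json` and `load_pickle` are hypothetical, the dictionary keys are taken from the listing, and the CSV wrapper passes only a subset of the original keyword arguments for brevity.

```python
import os

import pandas

SOURCE_COLUMN = 'Source Data File Name'


def load_files_into_data_frame(in_dict, expected_format, reader):
    """Shared body of the loaders (hypothetical helper).

    `reader` is any callable that turns one file path into a DataFrame.
    """
    if in_dict['format'].lower() == expected_format:
        try:
            frames = []
            for crt_file in in_dict['files list']:
                crt_frame = reader(crt_file)
                # Tag every row with the file it came from, as the original does.
                crt_frame[SOURCE_COLUMN] = os.path.basename(crt_file)
                frames.append(crt_frame)
            in_dict['out data frame'] = pandas.concat(frames)
        except Exception as err:
            # Same error contract as the original: store the exception, do not raise.
            in_dict['error details'] = err
    return in_dict


def load_csv(in_dict):
    # Only the format-specific reader call remains in each wrapper.
    return load_files_into_data_frame(
        in_dict, 'csv',
        lambda crt_file: pandas.read_csv(
            filepath_or_buffer=crt_file, delimiter=in_dict['field delimiter'],
            index_col=None, low_memory=False, encoding='utf-8'))


def load_json(in_dict):
    return load_files_into_data_frame(
        in_dict, 'json',
        lambda crt_file: pandas.read_json(
            path_or_buf=crt_file, compression=in_dict['compression']))


def load_pickle(in_dict):
    return load_files_into_data_frame(
        in_dict, 'pickle',
        lambda crt_file: pandas.read_pickle(
            filepath_or_buffer=crt_file, compression=in_dict['compression']))
```

With this shape, the Excel and Parquet loaders would become equally short wrappers, and the try/except, the source-file column and the concatenation live in one place. The sketch also appends each frame directly instead of first appending the index as a placeholder; the resulting list is the same.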