Completed
Push — master ( 4de886...4b7065 )
by Mubdi
02:09
created

KYD_data_summary.make_txt_struct()   A

Complexity

Conditions 1

Size

Total Lines 55
Code Lines 39

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric   Value
cc       1
eloc     39
nop      1
dl       0
loc      55
rs       9.7692
c        0
b        0
f        0

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
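One refactoring that fits this advice is Extract Method. The following is a minimal sketch of how it could apply to the flagged make_txt_struct(), not the project's actual code: the repeated "format one label, append it" steps move into a helper, and the labels move into a table. The names ArrayInfo, STRUCT_FIELDS and format_struct_line are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the statistics object that make_txt_struct reads.
ArrayInfo = namedtuple(
    "ArrayInfo", "ndim shape dtype human_memsize num_nan num_inf")

# (label, attribute) pairs; None marks the blank spacer line.
STRUCT_FIELDS = [
    ("Number of Dimensions:\t", "ndim"),
    ("Shape of Dimensions:\t", "shape"),
    ("Array Data Type:\t", "dtype"),
    ("Memory Size:\t\t", "human_memsize"),
    None,
    ("Number of NaN:\t", "num_nan"),
    ("Number of Inf:\t", "num_inf"),
]


def format_struct_line(info, field):
    """Format one labelled line, or an empty spacer when field is None."""
    if field is None:
        return ""
    label, attr = field
    return "{0}{1}".format(label, getattr(info, attr))


def make_txt_struct(info):
    """Build the 'Array Structure' lines from the field table."""
    lines = ["Array Structure  ", "                 "]
    lines.extend(format_struct_line(info, field) for field in STRUCT_FIELDS)
    return lines
```

With this shape, adding a new line to the summary means adding one entry to the table, and the method body stays a few lines long.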

"""
KnowYourData
============

A rapid and lightweight module to describe the statistics and structure of
data arrays for interactive use.

The most simple use case to display data is if you have a numpy array 'x':

    >>> from knowyourdata import kyd
    >>> kyd(x)

"""

import sys
import numpy as np
from IPython.display import display

# Getting HTML Template
from . import kyd_html_display_template
kyd_html_template = kyd_html_display_template.kyd_html_template

Coding Style: Naming
The name kyd_html_template does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.


class KYD_data_summary(object):

Coding Style: Naming
The name KYD_data_summary does not conform to the class naming conventions ([A-Z_][a-zA-Z0-9]+$).

Unused Code
The variable __class__ seems to be unused.

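A quick way to see why the two names were flagged is to test them against the patterns quoted in the messages. This is only a sketch: the patterns are copied from the report text, while the replacement names KYD_HTML_TEMPLATE and KYDDataSummary are hypothetical suggestions, not names used by the project.

```python
import re

# Patterns copied from the two naming messages in this report.
CONST_NAME_RE = re.compile(r"(([A-Z_][A-Z0-9_]*)|(__.*__))$")
CLASS_NAME_RE = re.compile(r"[A-Z_][a-zA-Z0-9]+$")


def conforms(pattern, name):
    """Return True when the name matches the pattern from its start."""
    return pattern.match(name) is not None


# Names as they appear in the listing.
print(conforms(CONST_NAME_RE, "kyd_html_template"))   # False
print(conforms(CLASS_NAME_RE, "KYD_data_summary"))    # False

# Hypothetical replacements that satisfy the quoted patterns.
print(conforms(CONST_NAME_RE, "KYD_HTML_TEMPLATE"))   # True
print(conforms(CLASS_NAME_RE, "KYDDataSummary"))      # True
```

Renaming is one option; the other, as the message notes, is to relax the patterns in a Pylint configuration file if the current names are intentional.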
    """A class to store and display the summary information"""

    text_repr = ""
    html_repr = ""

    # Display Settings
    col_width = 10
    precision = 4

    def __repr__(self):
        """
        The Plain String Representation of the Data Summary
        """
        return self.text_repr

    def _repr_html_(self):
        """
        The HTML Representation of the Data Summary
        """
        return self.html_repr

    def make_html_repr(self):

Coding Style
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.

        self.html_repr = kyd_html_template.format(kyd_class=self.kyd_class)
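
Applied to this method, the fix is a one-line docstring that says what the method does. The sketch below uses hypothetical stand-ins (HTML_TEMPLATE, StatsStub, DataSummary) so that it runs on its own; the real template lives in kyd_html_display_template and is not reproduced here.

```python
# Hypothetical stand-in for kyd_html_template.
HTML_TEMPLATE = "<p>Mean: {kyd_class.mean}</p>"


class StatsStub(object):
    """Hypothetical stand-in for the KYD statistics object."""
    mean = 1.5


class DataSummary(object):
    """Hypothetical reduced version of KYD_data_summary."""

    def __init__(self, kyd_class):
        self.kyd_class = kyd_class
        self.html_repr = ""

    def make_html_repr(self):
        """Fill html_repr by formatting the HTML template with the statistics."""
        self.html_repr = HTML_TEMPLATE.format(kyd_class=self.kyd_class)
```

The docstring wording is a suggestion; any short summary line that follows PEP 257 should satisfy the check.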

    def make_txt_basic_stats(self):
        """Make Text Representation of Basic Statistics"""
        pstr_list = []

        pstr_struct_header1 = "Basic Statistics  "
        pstr_struct_header2 = ''

        pstr_list.append(pstr_struct_header1)
        pstr_list.append(pstr_struct_header2)

        template_str = (
            " {0:^10} "
            " {1:>8} "
            " {2:<10} "
            " {3:>8} "
            " {4:<10} "
        )

        tmp_data = [
            [
                "Mean:", "{kyd_class.mean:.{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "",
                "Std Dev:", "{kyd_class.std:.{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class)
            ],
            ["Min:", "1Q:", "Median:", "3Q:", "Max:"],
            [
                "{kyd_class.min: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.firstquartile: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.median: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.thirdquartile: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.max: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
            ],
            ['-99 CI:', '-95 CI:', '-68 CI:', '+68 CI:', '+95 CI:', '+99 CI:'],
            [
                "{kyd_class.ci_99[0]: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.ci_95[0]: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.ci_68[0]: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.ci_68[1]: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.ci_95[1]: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
                "{kyd_class.ci_99[1]: .{kyd_class.precision}}".format(
                    kyd_class=self.kyd_class),
            ],
        ]

        n_tmp_data = len(tmp_data)

        num_rows_in_cols = [len(i) for i in tmp_data]
        num_rows = np.max(num_rows_in_cols)

        for i in range(n_tmp_data):
            tmp_col = tmp_data[i]
            for j in range(num_rows_in_cols[i], num_rows):

Unused Code
The variable j seems to be unused.

                tmp_col.append("")

        for i in range(num_rows):
            pstr_list.append(
                template_str.format(
                    tmp_data[0][i],
                    tmp_data[1][i],
                    tmp_data[2][i],
                    tmp_data[3][i],
                    tmp_data[4][i],
                )
            )

        return pstr_list
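
The unused loop variable j in the padding loop of this method can be avoided in two ways: name it `_` to signal that the value is ignored, or drop the inner loop and pad each column in one step. The sketch below shows the second option; pad_columns is a hypothetical helper, not part of the module.

```python
def pad_columns(columns, fill=""):
    """Return copies of the columns, right-padded with fill to equal length."""
    num_rows = max(len(col) for col in columns)
    return [col + [fill] * (num_rows - len(col)) for col in columns]


# Usage: every column ends up as long as the longest one.
padded = pad_columns([["Mean:", "1.0"], ["Min:"], []])
print(padded)  # [['Mean:', '1.0'], ['Min:', ''], ['', '']]
```

Unlike the original loop, this version returns new lists instead of appending to the existing ones, which is an assumption about what the caller wants.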

    def make_txt_struct(self):
        """Make Text Representation of Array"""

        pstr_list = []

        # pstr_struct_header0 = "................."
        # Commenting out Ansi Coloured Version
        # pstr_struct_header1 = '\033[1m' + "Array Structure  " + '\033[0m'
        pstr_struct_header1 = "Array Structure  "
        pstr_struct_header2 = "                 "

        # pstr_list.append(pstr_struct_header0)
        pstr_list.append(pstr_struct_header1)
        pstr_list.append(pstr_struct_header2)

        pstr_n_dim = (
            "Number of Dimensions:\t"
            "{kyd_class.ndim}").format(
                kyd_class=self.kyd_class)
        pstr_list.append(pstr_n_dim)

        pstr_shape = (
            "Shape of Dimensions:\t"
            "{kyd_class.shape}").format(
                kyd_class=self.kyd_class)
        pstr_list.append(pstr_shape)

        pstr_dtype = (
            "Array Data Type:\t"
            "{kyd_class.dtype}").format(
                kyd_class=self.kyd_class)
        pstr_list.append(pstr_dtype)

        pstr_memsize = (
            "Memory Size:\t\t"
            "{kyd_class.human_memsize}").format(
                kyd_class=self.kyd_class)
        pstr_list.append(pstr_memsize)

        pstr_spacer = ("")
        pstr_list.append(pstr_spacer)

        pstr_numnan = (
            "Number of NaN:\t"
            "{kyd_class.num_nan}").format(
                kyd_class=self.kyd_class)
        pstr_list.append(pstr_numnan)

        pstr_numinf = (
            "Number of Inf:\t"
            "{kyd_class.num_inf}").format(
                kyd_class=self.kyd_class)
        pstr_list.append(pstr_numinf)

        return pstr_list

    def make_text_repr(self):
        """Making final text string for plain text representation"""

        tmp_text_repr = ""

        tmp_text_repr += "\n"

        pstr_basic = self.make_txt_basic_stats()
        pstr_struct = self.make_txt_struct()

        n_basic = len(pstr_basic)
        n_struct = len(pstr_struct)

        l_colwidth = max([len(x) for x in pstr_basic]) + 1

        r_colwidth = max([len(x) for x in pstr_struct]) + 2

        # new_colwidth = self.col_width + 20

        # Finding the longest string
        len_list = max([n_basic, n_struct])

        for i in range(len_list):
            tmp_str = '| '
            if i < n_basic:
                tmp_str += (pstr_basic[i].ljust(l_colwidth))
            else:
                tmp_str += ''.ljust(l_colwidth)
            tmp_str += ' | '

            if i < n_struct:
                tmp_str += (pstr_struct[i].expandtabs().ljust(r_colwidth))
            else:
                tmp_str += ''.ljust(r_colwidth)
            tmp_str += '\t|'

            tmp_text_repr += tmp_str + "\n"

        tmp_text_repr += "\n"
        self.text_repr = tmp_text_repr

    def __init__(self, kyd_class):
        super(KYD_data_summary, self).__init__()
        self.kyd_class = kyd_class
        self.make_text_repr()
        self.make_html_repr()


class KYD(object):

Best Practice
Too many instance attributes (26/7)

Unused Code
The variable __class__ seems to be unused.

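One common way to bring the attribute count down is to group related values into small value objects, so the central class holds a few groups instead of many scalars. The sketch below assumes Python 3.7+ for dataclasses (the listing itself is written in an older style). The class names ArrayStructure, NonFiniteCounts and Summary are hypothetical, and the fields only mirror attributes visible in this listing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArrayStructure:
    """Shape-related facts, as gathered by check_struct."""
    dtype: str
    ndim: int
    shape: Tuple[int, ...]
    size: int
    memsize: int


@dataclass(frozen=True)
class NonFiniteCounts:
    """NaN/Inf counts; the has_* flags are derived instead of stored."""
    num_nan: int = 0
    num_inf: int = 0

    @property
    def has_nan(self):
        return self.num_nan > 0

    @property
    def has_inf(self):
        return self.num_inf > 0


class Summary:
    """Hypothetical slimmed-down holder with two attributes instead of many."""

    def __init__(self, structure, counts):
        self.structure = structure
        self.counts = counts
```

Deriving has_nan and has_inf from the counts also removes the need to keep the f_hasnan and f_hasinf flags in sync by hand.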
    """The Central Class for KYD"""

    # Variable for Data Vector
    data = None

    # Initial Flags
    f_allfinite = False
    f_allnonfinite = False
    f_hasnan = False
    f_hasinf = False

    # Initialized Numbers
    num_nan = 0
    num_inf = 0

    # Display Settings
    col_width = 10
    precision = 4

    def check_finite(self):
        """Checking to see if all elements are finite and setting flags"""
        if np.all(np.isfinite(self.data)):
            self.filt_data = self.data
            self.f_allfinite = True
        else:
            finite_inds = np.where(np.isfinite(self.data))

            self.filt_data = self.data[finite_inds]

            if self.filt_data.size == 0:
                self.f_allnonfinite = True

            if np.any(np.isnan(self.data)):
                self.f_hasnan = True
                self.num_nan = np.sum(np.isnan(self.data))

            if np.any(np.isinf(self.data)):
                self.f_hasinf = True
                self.num_inf = np.sum(np.isinf(self.data))

    def check_struct(self):
        """Determining the Structure of the Numpy Array"""
        self.dtype = self.data.dtype
        self.ndim = self.data.ndim
        self.shape = self.data.shape
        self.size = self.data.size
        self.memsize = sys.getsizeof(self.data)
        self.human_memsize = sizeof_fmt(self.memsize)

    def get_basic_stats(self):
        """Get basic statistics about array"""

        if self.f_allnonfinite:
            self.min = self.max = self.range = np.nan
            self.mean = self.std = self.median = np.nan
            self.firstquartile = self.thirdquartile = np.nan
            self.ci_68 = self.ci_95 = self.ci_99 = np.array([np.nan, np.nan])

            return

        self.min = np.float_(np.min(self.filt_data))
        self.max = np.float_(np.max(self.filt_data))
        self.range = self.max - self.min
        self.mean = np.mean(self.filt_data)
        self.std = np.std(self.filt_data)
        self.median = np.float_(np.median(self.filt_data))
        self.firstquartile = np.float_(np.percentile(self.filt_data, 25))
        self.thirdquartile = np.float_(np.percentile(self.filt_data, 75))
        self.ci_99 = np.float_(
            np.percentile(self.filt_data, np.array([0.5, 99.5])))
        self.ci_95 = np.float_(
            np.percentile(self.filt_data, np.array([2.5, 97.5])))
        self.ci_68 = np.float_(
            np.percentile(self.filt_data, np.array([16.0, 84.0])))

    def make_summary(self):
        """Making Data Summary"""
        self.data_summary = KYD_data_summary(self)

    def clear_memory(self):
        """Ensuring the Numpy Array does not exist in memory"""
        del self.data
        del self.filt_data

    def display(self, short=False):
        """Displaying all relevant statistics"""

        if short:
            pass
        try:
            get_ipython

Comprehensibility, Best Practice
The variable get_ipython does not seem to be defined.

Unused Code
This statement seems to have no effect and could be removed.

This issue is typically triggered when a function that does not have side-effects is called and the return value is discarded:

class SomeClass:
    def __init__(self):
        self._x = 5

    def squared(self):
        return self._x * self._x

some_class = SomeClass()
some_class.squared()        # Flagged, as the return value is not used
print(some_class.squared()) # Ok

            display(self.data_summary)
        except NameError:
            print(self.data_summary)
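
Both warnings on the bare get_ipython expression come from the same trick: the name is evaluated only so that a NameError is raised outside IPython. A more explicit alternative is sketched below, under the assumption that importing get_ipython from the IPython package is acceptable here. The helper names running_in_ipython and show are hypothetical.

```python
def running_in_ipython():
    """Return True only when an interactive IPython shell is active."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed at all.
        return False
    # get_ipython() returns None when no shell instance is running.
    return get_ipython() is not None


def show(summary):
    """Use IPython's rich display when available, else plain print."""
    if running_in_ipython():
        from IPython.display import display
        display(summary)
    else:
        print(summary)
```

This makes the intent visible to both readers and static analysis, at the cost of a slightly longer code path than the original try/except.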

    def __init__(self, data):
        super(KYD, self).__init__()

        # Ensuring that the array is a numpy array
        if not isinstance(data, np.ndarray):
            data = np.array(data)

        self.data = data

        self.check_finite()
        self.check_struct()
        self.get_basic_stats()
        self.clear_memory()
        self.make_summary()


def sizeof_fmt(num, suffix='B'):
    """Return human readable version of in-memory size.
    Code from Fred Cirera from Stack Overflow:
    https://stackoverflow.com/questions/1094841/reusable-library-to-get-human-readable-version-of-file-size
    """
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)


def kyd(data, full_statistics=False):
    """Print statistics of any numpy array

    data -- Numpy Array of Data

    Keyword arguments:
    full_statistics -- printing all detailed statistics of the sources
    (Currently Not Implemented)

    """

    data_kyd = KYD(data)
    if full_statistics:
        data_kyd.display()
    else:
        data_kyd.display(short=True)

    return data_kyd