Passed
Push — master ( 0c3829...de7c60 )
by Keertana
02:25
created

core.read_file()   C

Complexity

Conditions 10

Size

Total Lines 46
Code Lines 27

Duplication

Lines 46
Ratio 100 %

Importance

Changes 0
Metric   Value
cc       10
eloc     27
nop      1
dl       46
loc      46
rs       5.9999
c        0
b        0
f        0

How to fix: Complexity

Complex classes like core.read_file() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
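Because core.read_file() is a function rather than a class, the closest equivalent of the advice above is probably an Extract Function refactoring. The sketch below is one possible shape, not the project's actual code: the names split_cycles, read_lines, read_path and the make_frame parameter are hypothetical, and make_frame stands in for read_cycle so the example runs without pandas. It keeps the behaviour of the listing, where a cycle is only stored once the next CURVE marker appears.

```python
def split_cycles(lines):
    """Group raw lines into one list per 'CURVE' marker (hypothetical helper)."""
    cycles = []
    for line in lines:
        if line.startswith('CURVE'):
            cycles.append([])        # a marker opens a new cycle buffer
        if cycles:
            cycles[-1].append(line)  # the marker line itself is kept
    return cycles


def read_lines(lines, make_frame=list):
    """Build {'cycle_N': frame} from an iterable of lines.

    make_frame is an assumption: pass read_cycle here in the real module.
    The last cycle is skipped, mirroring the original loop, which only
    stores a cycle when the following CURVE marker is reached.
    """
    completed = split_cycles(lines)[:-1]
    dict_of_df = {
        'cycle_' + str(i): make_frame(chunk)
        for i, chunk in enumerate(completed, start=1)
    }
    return dict_of_df, len(completed)


def read_path(path, make_frame=list):
    """File-based entry point that reuses the same parser."""
    with open(path, 'rt') as f:
        return read_lines(f, make_frame)


sample = ['SCANRATE = 50', 'CURVE1', 'a', 'b', 'CURVE2', 'c', 'CURVE3', 'd']
cycles, count = read_lines(sample)
print(count)  # 2
```

With this split, read_file and read_file_dash would differ only in where the lines come from, which would also address the duplication finding for those two functions.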

# This is a tool to automate cyclic voltammetry analysis.
# Current Version = 1

import pandas as pd
import numpy as np
import csv
import matplotlib.pyplot as plt
import warnings
import matplotlib.cbook
import peakutils
import copy
from matplotlib import rcParams


# Scrutinizer: this code seems to be duplicated in your project.
def read_cycle(data):
    """Read a segment of a data file (corresponding to one cycle) and
    generate a dataframe with columns 'Potential' and 'Current'.

    Parameters
    ----------
    data : segment of data file

    Returns
    -------
    A dataframe with potential and current columns
    """
    current = []
    potential = []
    # Skip the first three lines of the segment; the data rows follow.
    for i in data[3:]:
        fields = i.split("\t")
        current.append(float(fields[4]))
        potential.append(float(fields[3]))
    zippedList = list(zip(potential, current))
    df = pd.DataFrame(zippedList, columns=['Potential', 'Current'])
    return df


# Scrutinizer: this code seems to be duplicated in your project.
def read_file_dash(lines):
    """Same as read_file, but takes the lines of a dash input file
    instead of a file path.

    Parameters
    ----------
    lines : lines from dash input file

    Returns
    -------
    dict_of_df : dictionary of dataframes with keys = cycle numbers and
    values = dataframes for each cycle
    number : number of cycles stored in dict_of_df
    """
    dict_of_df = {}
    h = 0  # set to 1 once SCANRATE has been read
    l = 0  # set to 1 once STEPSIZE has been read
    n_cycle = 0
    number = 0
    # Scrutinizer flagged `a` as possibly undefined on some paths,
    # so the line buffer starts out empty.
    a = []
    for line in lines:
        if not (h and l):
            if line.startswith('SCANRATE'):
                scan_rate = float(line.split()[2])
                h = 1
            if line.startswith('STEPSIZE'):
                step_size = float(line.split()[2])
                l = 1
        if line.startswith('CURVE'):
            n_cycle += 1
            if n_cycle > 1:
                # A new CURVE marker closes the previous cycle.
                number = n_cycle - 1
                df = read_cycle(a)
                key_name = 'cycle_' + str(number)
                dict_of_df[key_name] = copy.deepcopy(df)
            a = []
        if n_cycle:
            a.append(line)
    # Lines after the last CURVE marker are collected but not converted.
    return dict_of_df, number


# Scrutinizer: this code seems to be duplicated in your project.
def read_file(file):
    """Read the raw data file, get the scan rate and step size, and then
    read the lines according to cycle number. Once the data for one cycle
    has been read, read_cycle is called to generate a dataframe. The same
    is done for all the cycles, and finally a dictionary is returned whose
    keys are the cycle numbers and whose values are the corresponding
    dataframes.

    Parameters
    ----------
    file : raw data file

    Returns
    -------
    dict_of_df : dictionary of dataframes with keys = cycle numbers and
    values = dataframes for each cycle
    number : number of cycles stored in dict_of_df
    """
    dict_of_df = {}
    h = 0  # set to 1 once SCANRATE has been read
    l = 0  # set to 1 once STEPSIZE has been read
    n_cycle = 0
    # Scrutinizer flagged `a` and `number` as possibly undefined on some
    # paths, so both get a value before the loop.
    number = 0
    a = []
    with open(file, 'rt') as f:
        print(file + ' Opened')
        for line in f:
            if not (h and l):
                if line.startswith('SCANRATE'):
                    scan_rate = float(line.split()[2])
                    h = 1
                if line.startswith('STEPSIZE'):
                    step_size = float(line.split()[2])
                    l = 1
            if line.startswith('CURVE'):
                n_cycle += 1
                if n_cycle > 1:
                    # A new CURVE marker closes the previous cycle.
                    number = n_cycle - 1
                    df = read_cycle(a)
                    key_name = 'cycle_' + str(number)
                    dict_of_df[key_name] = copy.deepcopy(df)
                a = []
            if n_cycle:
                a.append(line)
    # Lines after the last CURVE marker are collected but not converted.
    return dict_of_df, number


#df = pd.DataFrame(list(dict1['df_1'].items()))
#list1, list2 = list(dict1['df_1'].items())
#list1, list2 = list(dict1.get('df_'+str(1)))

# Scrutinizer: this code seems to be duplicated in your project.
def data_frame(dict_cycle, n):
    """Read the dictionary of dataframes and return the dataframe for one cycle.

    Parameters
    ----------
    dict_cycle : dictionary of dataframes
    n : cycle number

    Returns
    -------
    Dataframe corresponding to the cycle number
    """
    # items() yields one (column name, column) pair per column.
    list1, list2 = (list(dict_cycle.get('cycle_'+str(n)).items()))
    zippedList = list(zip(list1[1], list2[1]))
    data = pd.DataFrame(zippedList, columns=['Potential', 'Current'])
    return data


# Scrutinizer: this code seems to be duplicated in your project.
def plot_fig(dict_cycle, n):
    """For basic plotting of the cycle data.

    Parameters
    ----------
    dict_cycle : dictionary of dataframes for all the cycles
    n : number of cycles

    Saves the plot in a file called cycle.png
    """
    for i in range(n):
        print(i+1)
        df = data_frame(dict_cycle, i+1)
        plt.plot(df.Potential, df.Current, label="Cycle{}".format(i+1))

    #print(df.head())
    plt.xlabel('Voltage')
    plt.ylabel('Current')
    plt.legend()
    plt.savefig('cycle.png')
    print('executed')


# Split forward and backward sweeping data, to make it easier to process.
def split(vector):
    """
    This function takes an array and splits it into two halves.
    """
    half = len(vector) // 2
    end = len(vector)
    vector1 = np.array(vector)[0:half]
    vector2 = np.array(vector)[half:end]
    return vector1, vector2


# Scrutinizer: this code seems to be duplicated in your project.
def critical_idx(x, y):  # Finds index where data set is no longer linear
    """
    Calculate the derivative of y with respect to x, then the moving
    averages of that derivative over 10 and 15 points. Find the points
    where the two moving-average curves meet and return the index of the
    sixth one.
    """
    k = np.diff(y)/(np.diff(x))  # slopes between consecutive points
    # Calculate moving averages over 10 and 15 points.
    # These two arbitrary numbers can be tuned to get better fitting.
    ave10 = []
    ave15 = []
    for i in range(len(k)-10):  # stop 10 early so that i+j stays in range
        a = 0
        for j in range(0, 10):
            a = a + k[i+j]
        ave10.append(round(a/10, 5))  # keep 5 decimal places

    for i in range(len(k)-15):
        b = 0
        for j in range(0, 15):
            b = b + k[i+j]
        ave15.append(round(b/15, 5))
    ave10i = np.asarray(ave10)
    ave15i = np.asarray(ave15)
    # Locate where the two curves meet: positions where their difference
    # switches between zero and non-zero (np.diff of a boolean mask acts
    # as an XOR of neighbours). The result is flattened into one row.
    idx = np.argwhere(np.diff(np.sign(ave15i - ave10i[:len(ave15i)]) != 0)).reshape(-1)
    return idx[5]


# This is based on method 1, where the user can't choose the baseline.
# To let the user choose it, use method 2.
def sum_mean(vector):
    """
    This function returns the sum and the mean of the values as [sum, mean].
    """
    a = 0
    for i in vector:
        a = a + i
    return [a, a/len(vector)]


def multiplica(vetor_x, vetor_y):
    """Return the sum of the element-wise products of the two vectors."""
    a = 0
    for x, y in zip(vetor_x, vetor_y):
        a = a + (x * y)
    return a


def linear_coeff(x, y):
    """
    This function returns the slope m and the y-axis intercept b of the
    least-squares line through the points.
    """
    m = (multiplica(x, y) - sum_mean(x)[0] * sum_mean(y)[1]) / (multiplica(x, x) - sum_mean(x)[0] * sum_mean(x)[1])
    b = sum_mean(y)[1] - m * sum_mean(x)[1]
    return m, b


def y_fitted_line(m, b, x):
    """Return the values of the line y = m*x + b at each point of x."""
    y_base = []
    for i in x:
        y = m * i + b
        y_base.append(y)
    return y_base


def linear_background(x, y):
    """Fit a line to the window around the critical index and evaluate it on x."""
    idx = critical_idx(x, y) + 5  # this is also an arbitrary number we can play with
    start = idx - int(0.5 * idx)
    stop = idx + int(0.5 * idx)
    m, b = linear_coeff(x[start:stop], y[start:stop])
    y_base = y_fitted_line(m, b, x)
    return y_base

# Scrutinizer: this code seems to be duplicated in your project.
def peak_detection_fxn(data_y):
    """The function takes an input of the column containing the y variables
    in the dataframe, associated with the current. The function calls the
    split function, which splits the column into two halves: the first half
    holds the bottom curve and the second half holds the top curve.
    Cyclic voltammetry delivers negative peaks, but the peakutils function
    works better with positive peaks, so the absolute value of the bottom
    curve is used. The function also runs on the middle 80% of data to
    eliminate unnecessary noise and messy values associated with
    pseudo-peaks. The vectors are then passed to the peakutils.indexes
    function to determine the significant peak for each array. The values
    are stored in a list, with the first index corresponding to the top
    peak and the second corresponding to the bottom peak.

    Parameters
    ----------
    data_y : must be a column from a pandas dataframe

    Returns
    -------
    A list with the index of the peaks from the top curve and bottom curve.
    """

    # initialize storage list
    index_list = []

    # split data into the first half (bottom curve) and second half (top curve)
    col_y1, col_y2 = split(data_y)

    # determine length of data and what 10% of the data is
    len_y = len(col_y1)
    ten_percent = int(np.around(0.1*len_y))

    # adjust both input columns to be the middle 80% of data
    # (take off the first and last 10% of data)
    # this avoids detecting peaks from electrolysis
    # (from water splitting and not the molecule itself,
    # which can form random "peaks")
    mod_col_y2 = col_y2[ten_percent:len_y-ten_percent]
    mod_col_y1 = col_y1[ten_percent:len_y-ten_percent]

    # run peakutils package to detect the peaks for both top and bottom
    peak_top = peakutils.indexes(mod_col_y2, thres=0.99, min_dist=20)
    peak_bottom = peakutils.indexes(abs(mod_col_y1), thres=0.99, min_dist=20)

    # determine how many peaks were found in each half
    len_top = len(peak_top)
    len_bot = len(peak_bottom)

    # append the middle detected peak of each half to the storage list
    # manipulate values by adding the ten_percent value back
    # (as the indices have moved)
    # to detect the actual peaks and not the modified values
    index_list.append(peak_top[int(len_top/2)]+ten_percent)
    index_list.append(peak_bottom[int(len_bot/2)]+ten_percent)

    # return storage list
    # first value is the top, second value is the bottom
    return index_list


# Scrutinizer: this code seems to be duplicated in your project.
def peak_values(DataFrame_x, DataFrame_y):
    """Outputs x (potentials) and y (currents) values from data indices
    given by peak_detection_fxn.

    Parameters
    ----------
    DataFrame_x : should be in the form of a pandas DataFrame column.
      For example, df['potentials'] could be input as the column of x
      data.

    DataFrame_y : should be in the form of a pandas DataFrame column.
      For example, df['currents'] could be input as the column of y
      data.

    Returns
    -------
    Result : numpy array of coordinates at peaks in the following order:
      potential of peak on top curve, current of peak on top curve,
      potential of peak on bottom curve, current of peak on bottom curve
    """
    index = peak_detection_fxn(DataFrame_y)
    potential1, potential2 = split(DataFrame_x)
    current1, current2 = split(DataFrame_y)
    # The bottom part of the curve is the first part of the DataFrame,
    # so the top peak is looked up in the second half.
    Peak_values = []
    Peak_values.append(potential2[(index[0])])  # TOPX
    Peak_values.append(current2[(index[0])])  # TOPY
    Peak_values.append(potential1[(index[1])])  # BOTTOMX
    Peak_values.append(current1[(index[1])])  # BOTTOMY
    Peak_array = np.array(Peak_values)
    return Peak_array


def del_potential(DataFrame_x, DataFrame_y):
    """Outputs the difference in potentials between anodic and
    cathodic peaks in cyclic voltammetry data.

    Parameters
    ----------
    DataFrame_x : should be in the form of a pandas DataFrame column.
      For example, df['potentials'] could be input as the column of x
      data.

    DataFrame_y : should be in the form of a pandas DataFrame column.
      For example, df['currents'] could be input as the column of y
      data.

    Returns
    -------
    Results : difference in peak potentials as a floating point number.
    """
    # Detect the peaks once and reuse the result.
    peaks = peak_values(DataFrame_x, DataFrame_y)
    del_potentials = peaks[0] - peaks[2]
    return del_potentials


def half_wave_potential(DataFrame_x, DataFrame_y):
    """Outputs the half wave potential (redox potential) from cyclic
    voltammetry data.

    Parameters
    ----------
    DataFrame_x : should be in the form of a pandas DataFrame column.
      For example, df['potentials'] could be input as the column of x
      data.

    DataFrame_y : should be in the form of a pandas DataFrame column.
      For example, df['currents'] could be input as the column of y
      data.

    Returns
    -------
    Results : the half wave potential in the form of a
      floating point number.
    """
    # The half wave potential is the midpoint of the two peak potentials,
    # (E_top + E_bottom) / 2, not half of their difference.
    peaks = peak_values(DataFrame_x, DataFrame_y)
    half_wave = (peaks[0] + peaks[2]) / 2
    return half_wave


# Scrutinizer: this code seems to be duplicated in your project.
def peak_heights(DataFrame_x, DataFrame_y):
    """Outputs heights of minimum peak and maximum
    peak from cyclic voltammetry data.

    Parameters
    ----------
    DataFrame_x : should be in the form of a pandas DataFrame column.
      For example, df['potentials'] could be input as the column of x
      data.

    DataFrame_y : should be in the form of a pandas DataFrame column.
      For example, df['currents'] could be input as the column of y
      data.

    Returns
    -------
    Results : height of maximum peak, height of minimum peak
      in that order in the form of a list.
    """
    # Detect the peaks once and reuse the results.
    peaks = peak_values(DataFrame_x, DataFrame_y)
    index = peak_detection_fxn(DataFrame_y)
    current_max = peaks[1]
    current_min = peaks[3]
    x1, x2 = split(DataFrame_x)
    y1, y2 = split(DataFrame_y)
    # Baseline value under each peak, taken from the fitted linear background.
    line_at_min = linear_background(x1, y1)[index[1]]
    line_at_max = linear_background(x2, y2)[index[0]]
    height_of_max = current_max - line_at_max
    height_of_min = abs(current_min - line_at_min)
    return [height_of_max, height_of_min]


def peak_ratio(DataFrame_x, DataFrame_y):
    """Outputs the peak ratios from cyclic voltammetry data.

    Parameters
    ----------
    DataFrame_x : should be in the form of a pandas DataFrame column.
      For example, df['potentials'] could be input as the column of x
      data.

    DataFrame_y : should be in the form of a pandas DataFrame column.
      For example, df['currents'] could be input as the column of y
      data.

    Returns
    -------
    Result : returns a floating point number, the peak ratio.
    """
    # Compute the heights once and reuse the result.
    heights = peak_heights(DataFrame_x, DataFrame_y)
    ratio = heights[0] / heights[1]
    return ratio
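
The baseline fit in linear_coeff uses the closed-form least-squares expressions m = (Σxy − Σx·ȳ) / (Σx² − Σx·x̄) and b = ȳ − m·x̄. The following is a minimal standalone sketch of that formula and of the baseline subtraction done in peak_heights, using only the standard library. The names fit_line and height_above_line are hypothetical and do not exist in the module, and the input points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (hypothetical helper)."""
    n = len(xs)
    sum_x = sum(xs)
    mean_x = sum_x / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Same expressions as linear_coeff in the listing.
    m = (sum_xy - sum_x * mean_y) / (sum_xx - sum_x * mean_x)
    b = mean_y - m * mean_x
    return m, b


def height_above_line(m, b, x_peak, y_peak):
    """Distance of a peak from the fitted baseline at the same x."""
    return y_peak - (m * x_peak + b)


# Points lying exactly on y = 2x + 1 (illustrative data only).
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)                            # 2.0 1.0
print(height_above_line(m, b, 2, 9))   # 4.0
```

On points that lie exactly on a line, the fit recovers that line, which gives a quick sanity check for the formula before applying it to measured currents.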