submodule.baseline.split() - Code Metrics - Inspection of "To merge branch backto master." - sabiharustam/voltcycle - Measure and Improve Code Quality continuously with Scrutinizer

Passed

Push — master ( dc5def...e4b26c )

by Sabiha

created 2019-03-21 20:55 UTC

submodule.baseline.split() A

↳ Parent: Project

Complexity

Conditions

Size

Total Lines	9
Code Lines	6

Duplication

Lines	0
Ratio	0 %

Importance

Changes

Metric	Value
cc	1
eloc	6
nop	1
dl	0
loc	9
rs	10
c	0
b	0
f	0

# This module fits a baseline used to calculate peak current
# values from cyclic voltammetry data.
# If you wish to choose the best-fitting baseline yourself,
# check out branch baseline_old method2.
# If you have any questions, contact [email protected]
 
import pandas as pd
import numpy as np
import csv
import matplotlib.pyplot as plt
import warnings
import matplotlib.cbook

# Split forward and backward sweeping data to make processing easier.
def split(vector):
    """
    This function takes a pandas Series and splits it into two halves.
    ----------
    Parameters
    ----------
    vector : pandas Series, typically a DataFrame column.
    For example, df['potentials'] could be input as the column of x data.
    -------
    Returns
    -------
    Two numpy arrays holding the first and second half of the input.
    If the length is odd, the second half has one more element.
    The output can then be used to ease the implementation of peak detection and baseline finding.
    """
    assert isinstance(vector, pd.Series), "Input of the function should be pandas series"
    half = len(vector) // 2  # not named `split`, to avoid shadowing the function
    values = np.array(vector)
    vector1 = values[:half]
    vector2 = values[half:]
    return vector1, vector2


def critical_idx(x, y):  # Finds the index where the data set is no longer linear
    """
    This function takes x and y values, calculates the slope dy/dx between consecutive points,
    and calculates two moving averages of that slope with different window lengths.
    It finds the intercepts of the two moving average curves and returns the index of one of them.
    ----------
    Parameters
    ----------
    x : Numpy array.
    y : Numpy array.
    Normally, this function expects the numpy arrays returned by the split function.
    For example, the output of split(df['potentials']) could be input for this function as x.
    -------
    Returns
    -------
    This function returns the element at position 5 (the sixth entry) of the intercept indexes.
    It raises IndexError if fewer than six intercepts are found.
    Users can change this function according to baseline branch method 2 to get other indexes.
    """
    assert type(x) == np.ndarray, "Input of the function should be numpy array"
    assert type(y) == np.ndarray, "Input of the function should be numpy array"
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have same first dimension, but "
                        "have shapes {} and {}".format(x.shape, y.shape))
    k = np.diff(y) / np.diff(x)  # slopes between consecutive (x, y) points
    ## Calculate moving averages over 10 and 15 points.
    ## These two arbitrary numbers can be tuned to get a better fit.
    ave10 = []
    ave15 = []
    for i in range(len(k) - 10):
        # Stopping 10 short of the end keeps i + j within bounds.
        a = 0
        for j in range(0, 10):  # sum all 10 points so that a/10 is a true average
            a = a + k[i+j]
        ave10.append(round(a/10, 5))
        # Values are rounded to 5 decimal places.
        # This number affects how sensitive the result is to noise.
    for i in range(len(k) - 15):
        b = 0
        for j in range(0, 15):
            b = b + k[i+j]
        ave15.append(round(b/15, 5))
    ave10i = np.asarray(ave10)
    ave15i = np.asarray(ave15)
    ## Find intercepts of the two moving average curves:
    ## indexes where the sign of their difference changes, flattened into one row.
    idx = np.argwhere(np.diff(np.sign(ave15i - ave10i[:len(ave15i)])) != 0).reshape(-1)
    return idx[5]
# This is based on method 1, where the user can't choose the baseline.
# If that is wanted, choose method2.


def sum_mean(vector):
    """
    This function returns the sum and mean of the given vector.
    ----------
    Parameters
    ----------
    vector : Numpy array, normally an output of the split function.
    -------
    Returns
    -------
    List of two numbers: [sum, mean].
    """
    assert type(vector) == np.ndarray, "Input of the function should be numpy array"
    a = 0
    for i in vector:
        a = a + i
    return [a, a/len(vector)]


def multiplica(vector_x, vector_y):
    """
    This function returns the sum of the element-wise products of two given vectors.
    ----------
    Parameters
    ----------
    vector_x, vector_y : Output of the split vector function.
    The two inputs can be the same vector or different vectors of the same length.
    -------
    Returns
    -------
    This function returns a number that is the sum of x * y over the paired elements of the two vectors.
    """
    assert type(vector_x) == np.ndarray, "Input of the function should be numpy array"
    assert type(vector_y) == np.ndarray, "Input of the function should be numpy array"
    a = 0
    for x,y in zip(vector_x, vector_y):
        a = a + (x * y)
    return a

def linear_coeff(x, y):
    """
    This function returns the slope m and the y-axis intercept b of the least-squares line through x and y.
    ----------
    Parameters
    ----------
    x : Output of the split vector function.
    y : Output of the split vector function.
    -------
    Returns
    -------
    Two floats, m and b.
    """
    # Compute each sum and mean once instead of on every use.
    sum_x, mean_x = sum_mean(x)
    mean_y = sum_mean(y)[1]
    m = (multiplica(x, y) - sum_x * mean_y) / (multiplica(x, x) - sum_x * mean_x)
    b = mean_y - m * mean_x
    return m, b


def y_fitted_line(m, b, x):
    """
    This function returns the fitted baseline constructed from the coefficients m and b and the x values.
    ----------
    Parameters
    ----------
    x : Output of the split vector function. x values of the input.
    m : slope of the baseline.
    b : y intercept of the baseline.
    -------
    Returns
    -------
    List of baseline y values, one for each x value.
    """
    y_base = []
    for i in x:
        y = m * i + b
        y_base.append(y)
    return y_base


def linear_background(x, y):
    """
    This function is a wrapper for calculating the linear fitted baseline.
    It takes x and y values of the CV data and returns the fitted baseline.
    ----------
    Parameters
    ----------
    x : Output of the split vector function. x values of the cyclic voltammetry data.
    y : Output of the split vector function. y values of the cyclic voltammetry data.
    -------
    Returns
    -------
    List of baseline y values, one for each x value.
    """
    assert type(x) == np.ndarray, "Input of the function should be numpy array"
    assert type(y) == np.ndarray, "Input of the function should be numpy array"
    idx = critical_idx(x, y) + 5  # this is also an arbitrary number we can play with.
    half = int(0.5 * idx)
    window = slice(idx - half, idx + half)  # points around idx used for the fit
    m, b = linear_coeff(x[window], y[window])
    y_base = y_fitted_line(m, b, x)
    return y_base
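
The slope and intercept formulas used by linear_coeff are the closed-form least-squares solution for a straight line. The sketch below restates them in dependency-free Python so the arithmetic can be checked by hand. The name fit_line and the sample points are hypothetical and are not part of the voltcycle module.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line through (xs, ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length of at least 2")
    n = len(xs)
    sum_x = sum(xs)
    mean_x = sum_x / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # same role as multiplica(x, y)
    sum_xx = sum(x * x for x in xs)              # same role as multiplica(x, x)
    denom = sum_xx - sum_x * mean_x
    if denom == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    m = (sum_xy - sum_x * mean_y) / denom
    b = mean_y - m * mean_x
    return m, b


# Hypothetical points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(xs, ys)
print(m, b)
```

For points that lie exactly on a line, the fit should recover that line's slope and intercept, which makes this a convenient sanity check before applying the same formulas to noisy data.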


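critical_idx looks for the places where two moving averages of the slope intersect. A minimal sketch of that idea, assuming a crossing is any position where the sign of the difference between the two curves changes, is shown below in plain Python. The helper names, window lengths, and slope values are hypothetical and chosen only to keep the example small.

```python
def moving_average(values, window):
    """Average of each full window of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def crossing_indices(a, b):
    """Indexes i where sign(a[i] - b[i]) differs from sign(a[i+1] - b[i+1]).

    The longer curve is truncated to the length of the shorter one.
    Exact ties (difference of zero) count as a sign change on both sides.
    """
    n = min(len(a), len(b))
    signs = [sign(a[i] - b[i]) for i in range(n)]
    return [i for i in range(n - 1) if signs[i] != signs[i + 1]]


# Hypothetical slope data: flat, then a plateau, then flat again.
slopes = [0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0]
short_avg = moving_average(slopes, 2)
long_avg = moving_average(slopes, 4)
crossings = crossing_indices(short_avg, long_avg)
print(crossings)
```

Shorter windows react faster to a change in slope than longer ones, so the two curves separate and rejoin around each change. Which crossing to use as the critical index remains a tuning decision, as the comments in the module note.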
