| Metric | Value |
| --- | --- |
| Conditions | 5 |
| Total Lines | 61 |
| Code Lines | 22 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 0 |

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, that is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
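As a minimal sketch of this advice, the snippet below shows a function whose body is annotated with comments, followed by a version where each commented step has become its own small function named after the comment. All names here (`report_before`, `drop_missing_values`, `compute_average`, `report_after`) are hypothetical and chosen only for illustration.

```python
def report_before(scores):
    # drop missing values
    cleaned = [s for s in scores if s is not None]
    # compute the average
    average = sum(cleaned) / len(cleaned)
    return f"average: {average:.1f}"


def drop_missing_values(scores):
    """Was the block under the comment '# drop missing values'."""
    return [s for s in scores if s is not None]


def compute_average(values):
    """Was the block under the comment '# compute the average'."""
    return sum(values) / len(values)


def report_after(scores):
    # The body now reads like the old comments, without needing them.
    return f"average: {compute_average(drop_missing_values(scores)):.1f}"
```

Both versions return the same string for the same input; the second simply moves the explanation from comments into names.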
Commonly applied refactorings include:
If many parameters/temporary variables are present:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split


    def train_dev_test_split(data, target, dev_size=0.1, test_size=0.1, stratify=None, random_state=1234):
        '''
        Split a dataset and a label column into train, dev and test sets.

        Parameters:
        ----------

        data: 2D dataset that can be coerced into Pandas DataFrame. If a Pandas DataFrame is provided, the index/column \
        information is used to label the plots.

        target: string, list, np.array or pd.Series, default None
        Specify target for correlation. E.g. label column to generate only the correlations between each feature \
        and the label.

        dev_size: float, default 0.1
        If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the dev \
        split.

        test_size: float, default 0.1
        If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test \
        split.

        stratify: target column, default None
        If not None, data is split in a stratified fashion, using the input as the class labels.

        random_state: integer
        Random_state is the seed used by the random number generator.

        Returns
        -------
        tuple: Tuple containing train-dev-test split of inputs.
        '''

        # Validate Inputs (the _validate_* helpers are defined elsewhere in the module)
        _validate_input_range(dev_size, 'dev_size', 0, 1)
        _validate_input_range(test_size, 'test_size', 0, 1)
        _validate_input_int(random_state, 'random_state')

        target_data = []
        if isinstance(target, str):
            # target names a column: separate it from the features
            target_data = data[target]
            data = data.drop(target, axis=1)

        elif isinstance(target, (list, pd.Series, np.ndarray)):
            target_data = pd.Series(target)
            # Read the name from the Series; a list or ndarray has no .name attribute
            target = target_data.name

        # First split: train vs. the combined dev+test holdout
        X_train, X_dev_test, y_train, y_dev_test = train_test_split(data, target_data,
                                                                    test_size=dev_size+test_size,
                                                                    random_state=random_state,
                                                                    stratify=stratify)

        if (dev_size == 0) or (test_size == 0):
            # Only one holdout set was requested, so no second split is needed
            return X_train, X_dev_test, y_train, y_dev_test

        else:
            # Second split: divide the holdout into dev and test;
            # stratify only if stratification was requested for the first split
            X_dev, X_test, y_dev, y_test = train_test_split(X_dev_test, y_dev_test,
                                                            test_size=test_size/(dev_size+test_size),
                                                            random_state=random_state,
                                                            stratify=y_dev_test if stratify is not None else None)
            return X_train, X_dev, X_test, y_train, y_dev, y_test