| Conditions | 12 |
| Total Lines | 62 |
| Code Lines | 29 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 0 |

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
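As a minimal sketch of this idea (the function names and the sample data are hypothetical, not taken from the analyzed project), the two comments in `summarize_before` become the names of two extracted helpers:

```python
from statistics import mean


def summarize_before(readings):
    # drop missing values
    valid = [r for r in readings if r is not None]
    # average what is left, or fall back to zero
    return mean(valid) if valid else 0.0


def drop_missing(readings):
    """Return the readings with None entries removed."""
    return [r for r in readings if r is not None]


def average_or_zero(values):
    """Return the mean of values, or 0.0 for an empty list."""
    return mean(values) if values else 0.0


def summarize_after(readings):
    # Each former comment is now a function name, so the body reads on its own.
    return average_or_zero(drop_missing(readings))
```

Both versions are meant to behave identically; only the structure changes, and each helper can now be tested separately.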
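Commonly applied refactorings include Extract Method.
If many parameters/temporary variables are present, Replace Temp with Query, Introduce Parameter Object, and Preserve Whole Object are also worth considering.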
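When a method takes many related parameters, one option is to bundle them into a single object (the Introduce Parameter Object refactoring). The sketch below is illustrative only: `TripQuery`, `count_trips`, and the dictionary keys are assumed names, not part of the analyzed code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripQuery:
    """Hypothetical parameter object grouping four filter arguments."""

    region: str
    year: int
    min_distance_km: float = 0.0
    max_distance_km: float = float("inf")


def count_trips(trips, query):
    """Count trips matching the query instead of taking four loose arguments."""
    return sum(
        1
        for trip in trips
        if trip["region"] == query.region
        and trip["year"] == query.year
        and query.min_distance_km <= trip["distance_km"] <= query.max_distance_km
    )
```

The signature stays short when further filters are added, because new fields go into `TripQuery` with defaults rather than into every call site.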
Complex code units like data.datasets.emobility.motorized_individual_travel.helpers.reduce_mem_usage() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. For a class, a common approach to find such a component is to look for fields/methods that share the same prefixes or suffixes.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
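To illustrate Extract Class, assume a hypothetical class that stored `address_street`, `address_city`, and `address_postcode` next to unrelated fields. The shared `address_` prefix hints at a component that can live in its own class. All names below are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Extracted component: the former address_* fields, minus the prefix."""

    street: str
    city: str
    postcode: str

    def label(self):
        return f"{self.street}, {self.postcode} {self.city}"


@dataclass
class ChargingStation:
    """The original class now delegates address concerns to Address."""

    name: str
    address: Address

    def describe(self):
        return f"{self.name} ({self.address.label()})"
```

After the extraction, formatting logic for addresses has a single home, and `ChargingStation` no longer needs to know how an address is laid out.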

    # Excerpt: the module docstring and the code preceding this function
    # are omitted from the listing.
    import numpy as np  # assumed import: the listing calls np.iinfo / np.finfo
    import pandas as pd  # assumed import: the listing annotates pd.DataFrame


    def reduce_mem_usage(
        df: pd.DataFrame, show_reduction: bool = False
    ) -> pd.DataFrame:
        """Function to automatically check if columns of a pandas DataFrame can
        be reduced to a smaller data type. Source:
        https://www.mikulskibartosz.name/how-to-reduce-memory-usage-in-pandas/

        Parameters
        ----------
        df: pd.DataFrame
            DataFrame to reduce memory usage on
        show_reduction : bool
            If True, print amount of memory reduced

        Returns
        -------
        pd.DataFrame
            DataFrame with memory usage decreased
        """
        # Memory footprint in MiB before any conversion
        start_mem = df.memory_usage().sum() / 1024 ** 2

        for col in df.columns:
            col_type = df[col].dtype

            if col_type != object and str(col_type) != "category":
                c_min = df[col].min()
                c_max = df[col].max()

                # Integer columns: pick the narrowest signed type that fits
                if str(col_type)[:3] == "int":
                    if (
                        c_min > np.iinfo(np.int16).min
                        and c_max < np.iinfo(np.int16).max
                    ):
                        df[col] = df[col].astype("int16")
                    elif (
                        c_min > np.iinfo(np.int32).min
                        and c_max < np.iinfo(np.int32).max
                    ):
                        df[col] = df[col].astype("int32")
                    else:
                        df[col] = df[col].astype("int64")
                # Every other non-object, non-category column is treated as float
                else:
                    if (
                        c_min > np.finfo(np.float32).min
                        and c_max < np.finfo(np.float32).max
                    ):
                        df[col] = df[col].astype("float32")
                    else:
                        df[col] = df[col].astype("float64")

            # Object and category columns are stored as category
            else:
                df[col] = df[col].astype("category")

        # Memory footprint in MiB after conversion
        end_mem = df.memory_usage().sum() / 1024 ** 2

        if show_reduction is True:
            print(
                "Reduced memory usage of DataFrame by "
                f"{(1 - end_mem/start_mem) * 100:.2f} %."
            )

        return df
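
One way to reduce the number of conditions in the listing is to extract the nested range checks into small, named helpers. The following is a sketch under stated assumptions, not the project's implementation: `smallest_int_dtype` and `smallest_float_dtype` are hypothetical names, and the bounds are written out as plain Python constants that match what `np.iinfo` and `np.finfo` report for these types, so the block runs without NumPy or pandas.

```python
# Bounds of the fixed-width types (same values as np.iinfo / np.finfo report).
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
FLOAT32_MAX = (2 - 2**-23) * 2.0**127  # largest finite float32


def smallest_int_dtype(c_min, c_max):
    """Return the narrowest signed integer dtype name holding [c_min, c_max].

    Uses strict comparisons, mirroring the checks in the listing.
    """
    if INT16_MIN < c_min and c_max < INT16_MAX:
        return "int16"
    if INT32_MIN < c_min and c_max < INT32_MAX:
        return "int32"
    return "int64"


def smallest_float_dtype(c_min, c_max):
    """Return "float32" if the value range fits strictly inside it, else "float64"."""
    if -FLOAT32_MAX < c_min and c_max < FLOAT32_MAX:
        return "float32"
    return "float64"
```

With helpers like these, the body of the column loop could shrink to a single `astype` call per branch, for example `df[col] = df[col].astype(smallest_int_dtype(c_min, c_max))`, and the range logic becomes testable without constructing a DataFrame.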