| Metric | Value |
|---|---|
| Conditions | 12 |
| Total Lines | 62 |
| Code Lines | 29 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 0 |
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
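If the integer and float branches of the reduce_mem_usage() function listed below carried comments such as "downcast integer columns", those comments would be natural names for extracted functions. The sketch below applies Extract Method that way. The names downcast_int and downcast_float are hypothetical, the show_reduction reporting is omitted for brevity, and two behaviours are assumptions rather than the original's: only plain NumPy signed-integer and float columns are downcast, and the range bounds are inclusive.

```python
import numpy as np
import pandas as pd


def downcast_int(series: pd.Series) -> pd.Series:
    """Cast integers to the smallest of int16/int32/int64 that holds them."""
    c_min, c_max = series.min(), series.max()
    for dtype in (np.int16, np.int32):
        limits = np.iinfo(dtype)
        # Inclusive bounds: a value equal to the limit still fits.
        if limits.min <= c_min and c_max <= limits.max:
            return series.astype(dtype)
    return series.astype(np.int64)


def downcast_float(series: pd.Series) -> pd.Series:
    """Cast floats to float32 when their range fits, else to float64."""
    limits = np.finfo(np.float32)
    if limits.min <= series.min() and series.max() <= limits.max:
        return series.astype(np.float32)
    return series.astype(np.float64)


def reduce_mem_usage(df: pd.DataFrame) -> pd.DataFrame:
    """Slimmed-down main function: one named step per column kind."""
    for col in df.columns:
        dtype = df[col].dtype
        if dtype == object:
            df[col] = df[col].astype("category")
        elif isinstance(dtype, np.dtype) and dtype.kind == "i":
            df[col] = downcast_int(df[col])
        elif isinstance(dtype, np.dtype) and dtype.kind == "f":
            df[col] = downcast_float(df[col])
        # Other dtypes (bool, unsigned, datetime, extension types) are kept.
    return df
```

Each helper's name now says what a comment would otherwise have to explain, and the main loop shrinks to one line per column kind.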
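Commonly applied refactorings include Extract Method.
If many parameters or temporary variables get in the way of extracting methods, Replace Temp with Query, Introduce Parameter Object, Preserve Whole Object or, as a last resort, Replace Method with Method Object can help.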
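Replace Method with Method Object, from Martin Fowler's refactoring catalog, turns a function into a class whose local variables become fields, so the body can then be split into small methods without passing temporaries around. A minimal sketch for reduce_mem_usage() follows. The class and method names are hypothetical, and it assumes deep memory accounting, inclusive range bounds, and that only object, plain NumPy signed-integer and float columns are converted.

```python
import numpy as np
import pandas as pd


class MemoryReducer:
    """Method object for reduce_mem_usage(): former locals become fields."""

    def __init__(self, df: pd.DataFrame, show_reduction: bool = False) -> None:
        self.df = df
        self.show_reduction = show_reduction
        self.start_mem = 0.0  # formerly a local variable
        self.end_mem = 0.0  # formerly a local variable

    def compute(self) -> pd.DataFrame:
        """Run the conversion; mirrors the body of the original function."""
        self.start_mem = self._memory_mb()
        for col in self.df.columns:
            self._reduce_column(col)
        self.end_mem = self._memory_mb()
        if self.show_reduction:
            print(
                "Reduced memory usage of DataFrame by "
                f"{self.reduction_percent():.2f} %."
            )
        return self.df

    def reduction_percent(self) -> float:
        return (1 - self.end_mem / self.start_mem) * 100

    def _memory_mb(self) -> float:
        # deep=True also counts the Python objects behind object columns.
        return self.df.memory_usage(deep=True).sum() / 1024 ** 2

    def _reduce_column(self, col) -> None:
        dtype = self.df[col].dtype
        if dtype == object:
            self.df[col] = self.df[col].astype("category")
        elif isinstance(dtype, np.dtype) and dtype.kind == "i":
            self._cast_to_smallest(col, (np.int16, np.int32, np.int64), np.iinfo)
        elif isinstance(dtype, np.dtype) and dtype.kind == "f":
            self._cast_to_smallest(col, (np.float32, np.float64), np.finfo)

    def _cast_to_smallest(self, col, candidates, limits_of) -> None:
        # Try candidates from smallest to largest; the last one is the fallback.
        c_min, c_max = self.df[col].min(), self.df[col].max()
        for candidate in candidates[:-1]:
            limits = limits_of(candidate)
            if limits.min <= c_min and c_max <= limits.max:
                self.df[col] = self.df[col].astype(candidate)
                return
        self.df[col] = self.df[col].astype(candidates[-1])
```

A call site would read MemoryReducer(df, show_reduction=True).compute(). Because start_mem and end_mem are fields, any extracted method can use them without extra parameters.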
Complex code like motorized_individual_travel.helpers.reduce_mem_usage(), which is a function rather than a class, often does many different things. To break such code down, we need to identify a cohesive component within it. In a class, a common approach is to look for fields/methods that share the same prefixes or suffixes; in a function, the same idea applies to local variables.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
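In reduce_mem_usage() a shared prefix points at such a component: c_min and c_max together describe a column's value range, and every branch asks whether that range fits some NumPy type. Extracting the pair into its own class could look like the sketch below. ValueRange, of, fits and smallest_fitting are hypothetical names, and the bounds are inclusive (an assumption; the original uses strict comparisons).

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class ValueRange:
    """The former c_min/c_max pair, extracted into its own class."""

    low: float
    high: float

    @classmethod
    def of(cls, series: pd.Series) -> "ValueRange":
        return cls(series.min(), series.max())

    def fits(self, dtype) -> bool:
        """True if every value lies within the representable range of dtype."""
        dtype = np.dtype(dtype)
        # Integer types are described by iinfo, floating types by finfo.
        limits = np.iinfo(dtype) if dtype.kind in "iu" else np.finfo(dtype)
        return limits.min <= self.low and self.high <= limits.max

    def smallest_fitting(self, candidates):
        """First candidate dtype (ordered small to large) that fits, else None."""
        for dtype in candidates:
            if self.fits(dtype):
                return np.dtype(dtype)
        return None
```

For example, ValueRange.of(pd.Series([0, 40000])).smallest_fitting((np.int16, np.int32, np.int64)) returns dtype('int32'). The function would then ask this question once per column instead of repeating the np.iinfo/np.finfo comparisons in every branch.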
```python
import numpy as np
import pandas as pd


def reduce_mem_usage(
    df: pd.DataFrame, show_reduction: bool = False
) -> pd.DataFrame:
    """Function to automatically check if columns of a pandas DataFrame can
    be reduced to a smaller data type. Source:
    https://www.mikulskibartosz.name/how-to-reduce-memory-usage-in-pandas/

    Parameters
    ----------
    df : pd.DataFrame
        DataFrame to reduce memory usage on (modified in place)
    show_reduction : bool
        If True, print amount of memory reduced

    Returns
    -------
    pd.DataFrame
        DataFrame with memory usage decreased
    """
    # deep=True also counts the Python objects behind object columns, so the
    # printed reduction reflects the object -> category conversion.
    start_mem = df.memory_usage(deep=True).sum() / 1024 ** 2

    for col in df.columns:
        col_type = df[col].dtype

        if col_type == object:
            # Object (typically string) columns are stored as categoricals.
            df[col] = df[col].astype("category")
        elif isinstance(col_type, np.dtype) and col_type.kind in ("i", "f"):
            # Only plain NumPy signed-integer ("i") and float ("f") columns
            # are downcast. Bool, unsigned, datetime and extension dtypes
            # (category, nullable Int64, ...) are left unchanged.
            c_min = df[col].min()
            c_max = df[col].max()

            if col_type.kind == "i":
                # Bounds are inclusive: a value equal to the limit still fits.
                if (
                    c_min >= np.iinfo(np.int16).min
                    and c_max <= np.iinfo(np.int16).max
                ):
                    df[col] = df[col].astype("int16")
                elif (
                    c_min >= np.iinfo(np.int32).min
                    and c_max <= np.iinfo(np.int32).max
                ):
                    df[col] = df[col].astype("int32")
                else:
                    df[col] = df[col].astype("int64")
            else:
                # float32 keeps the value range but only about 7 significant
                # decimal digits, so precision may be lost.
                if (
                    c_min >= np.finfo(np.float32).min
                    and c_max <= np.finfo(np.float32).max
                ):
                    df[col] = df[col].astype("float32")
                else:
                    df[col] = df[col].astype("float64")

    end_mem = df.memory_usage(deep=True).sum() / 1024 ** 2

    if show_reduction:
        print(
            "Reduced memory usage of DataFrame by "
            f"{(1 - end_mem/start_mem) * 100:.2f} %."
        )

    return df
```