Conditions | 2 |
Total Lines | 115 |
Code Lines | 30 |
Lines | 0 |
Ratio | 0 % |
Changes | 0 |
Small methods make your code easier to understand, particularly when they have good names. A small method is also usually much easier to name well.
For example, if you find yourself adding comments inside a method's body, this is usually a sign that the commented part should be extracted into a new method. The comment is a good starting point for naming that method.
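The comment-to-method idea can be sketched as follows. This is a minimal illustration, not code from the analyzed project: the function names (`average_before`, `drop_missing`, `mean_or_zero`, `average_after`) are hypothetical and exist only to show the before and after shapes.

```python
from typing import Iterable, List, Optional


def average_before(values: Iterable[Optional[float]]) -> float:
    """Before: comments mark the steps hidden inside one body."""
    # drop missing values
    present = [v for v in values if v is not None]
    # compute the mean, treating empty input as 0.0
    return sum(present) / len(present) if present else 0.0


def drop_missing(values: Iterable[Optional[float]]) -> List[float]:
    """Extracted step; the name comes from the old comment."""
    return [v for v in values if v is not None]


def mean_or_zero(values: List[float]) -> float:
    """Extracted step; returns 0.0 for empty input."""
    return sum(values) / len(values) if values else 0.0


def average_after(values: Iterable[Optional[float]]) -> float:
    """After: the body reads like the comments it replaced."""
    return mean_or_zero(drop_missing(values))
```

Both versions behave the same; the second simply moves each commented step behind a name, so the comments are no longer needed.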
Commonly applied refactorings include:
If many parameters/temporary variables are present:
Methods with many parameters are not only hard to understand; their parameter lists also tend to become inconsistent as soon as you need more or different data.
There are several approaches to avoid long parameter lists:
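One such approach, shown here as a sketch under assumptions rather than as a recommendation from the analyzed project, is to group related parameters into a parameter object. `PlotOptions` and `describe_plot` are hypothetical names; the field names and defaults are borrowed from the `corr_plot` signature for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class PlotOptions:
    """Hypothetical parameter object grouping display-related settings."""
    cmap: str = "BrBG"
    figsize: Tuple[int, int] = (12, 10)
    annot: bool = True
    dev: bool = False


def describe_plot(split: Optional[str] = None,
                  threshold: float = 0.0,
                  options: PlotOptions = PlotOptions()) -> Dict[str, Any]:
    """Stand-in for a plotting function: three parameters instead of six."""
    return {
        "split": split,
        "threshold": threshold,
        "cmap": options.cmap,
        "figsize": options.figsize,
        "annot": options.annot,
        "dev": options.dev,
    }


# Derive a variant of the defaults without mutating them.
presentation = replace(PlotOptions(), annot=False, figsize=(8, 6))
summary = describe_plot(split="pos", threshold=0.3, options=presentation)
```

Because the dataclass is frozen, using an instance as a default argument is safe, and a new display setting becomes a new field rather than another positional parameter.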
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# corr_mat is assumed to be defined or imported elsewhere in this module.


def corr_plot(data, split=None, threshold=0, cmap='BrBG', figsize=(12, 10), annot=True, dev=False, **kwargs):
    '''
    Two-dimensional visualization of the correlation between feature-columns, excluding NA values.

    Parameters
    ----------
    data: 2D dataset that can be coerced into Pandas DataFrame. If a Pandas DataFrame is provided, the index/column
        information is used to label the plots.

    split: {None, 'pos', 'neg', 'high', 'low'}, default None
        Type of split to be performed.

        * None: visualize all correlations between the feature-columns.
        * pos: visualize all positive correlations between the feature-columns above the threshold.
        * neg: visualize all negative correlations between the feature-columns below the threshold.
        * high: visualize all correlations between the feature-columns for which abs(corr) > threshold is True.
        * low: visualize all correlations between the feature-columns for which abs(corr) < threshold is True.

    threshold: float, default 0
        Value in the range 0 <= threshold <= 1.

    cmap: matplotlib colormap name or object, or list of colors, default 'BrBG'
        The mapping from data values to color space.

    figsize: tuple, default (12, 10)
        Use to control the figure size.

    annot: bool, default True
        Use to show or hide annotations.

    dev: bool, default False
        Display figure settings in the plot by setting dev = True. If False, the settings are not displayed. Use for
        presentations.

    **kwargs: optional
        Additional elements to control the visualization of the plot, e.g.:

        * mask: bool, default True
            If set to False the entire correlation matrix, including the upper triangle is shown. Set dev = False in this
            case to avoid overlap.
        * vmax: float, default is calculated from the given correlation coefficients.
            Value between -1 or vmin <= vmax <= 1, limits the range of the colorbar.
        * vmin: float, default is calculated from the given correlation coefficients.
            Value between -1 <= vmin <= 1 or vmax, limits the range of the colorbar.
        * linewidths: float, default 0.5
            Controls the line-width in between the squares.
        * annot_kws: dict, default {'size' : 10}
            Controls the font size of the annotations. Only available when annot = True.
        * cbar_kws: dict, default {'shrink': .95, 'aspect': 30}
            Controls the size of the colorbar.
        * Many more kwargs are available, e.g. 'alpha' to control blending, or options to adjust labels, ticks ...

        Kwargs can be supplied through a dictionary of key-value pairs (see above).

    Returns
    -------
    ax: matplotlib Axes
        The Axes object containing the heatmap.
    '''

    data = pd.DataFrame(data)

    # Obtain correlation matrix
    corr = corr_mat(data, split=split, threshold=threshold).data

    # Generate mask for the upper triangle
    # (the built-in bool replaces np.bool, which NumPy 1.24 removed)
    mask = np.triu(np.ones_like(corr, dtype=bool))

    # Compute the correlation range to adjust the colorbar limits
    vmax = np.round(np.nanmax(corr.where(~mask))-0.05, 2)
    vmin = np.round(np.nanmin(corr.where(~mask))+0.05, 2)

    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize=figsize)

    # Specify kwargs for the heatmap; caller-supplied kwargs override these defaults
    kwargs = {'mask': mask,
              'cmap': cmap,
              'annot': annot,
              'vmax': vmax,
              'vmin': vmin,
              'linewidths': .5,
              'annot_kws': {'size': 10},
              'cbar_kws': {'shrink': .95, 'aspect': 30},
              **kwargs}

    # Draw heatmap with mask and some default settings
    sns.heatmap(corr,
                center=0,
                square=True,
                fmt='.2f',
                **kwargs
                )

    ax.set_title('Feature-correlation Matrix', fontdict={'fontsize': 18})

    # Display settings
    if dev:
        fig.suptitle(f"\
            Settings (dev-mode): \n\
            - split-mode: {split} \n\
            - threshold: {threshold} \n\
            - annotations: {annot} \n\
            - cbar: \n\
                - vmax: {vmax} \n\
                - vmin: {vmin} \n\
            - linewidths: {kwargs['linewidths']} \n\
            - annot_kws: {kwargs['annot_kws']} \n\
            - cbar_kws: {kwargs['cbar_kws']}",
                     fontsize=12,
                     color='gray',
                     x=0.35,
                     y=0.85,
                     ha='left')

    return ax
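As a rough illustration of how the extract-method advice might apply to this listing, the two commented steps "Specify kwargs for the heatmap" and "Display settings" could each become a small helper. The sketch below is an assumption, not the project's code: `merge_heatmap_kwargs` and `format_settings` are hypothetical names, and the plotting calls are left out so the helpers stay testable on their own.

```python
from typing import Any, Dict, Optional

# Defaults copied from the listing's kwargs dictionary.
DEFAULT_HEATMAP_KWARGS: Dict[str, Any] = {
    "linewidths": 0.5,
    "annot_kws": {"size": 10},
    "cbar_kws": {"shrink": 0.95, "aspect": 30},
}


def merge_heatmap_kwargs(computed: Dict[str, Any],
                         overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Combine defaults, computed values, and caller overrides (overrides win)."""
    return {**DEFAULT_HEATMAP_KWARGS, **computed, **overrides}


def format_settings(split: Optional[str], threshold: float, annot: bool,
                    heatmap_kwargs: Dict[str, Any]) -> str:
    """Build the dev-mode settings text from the merged kwargs.

    Note: this reads vmax/vmin from the merged dict, so caller overrides
    are reflected; the listing prints the computed values instead.
    """
    lines = [
        "Settings (dev-mode):",
        f"- split-mode: {split}",
        f"- threshold: {threshold}",
        f"- annotations: {annot}",
        "- cbar:",
        f"    - vmax: {heatmap_kwargs['vmax']}",
        f"    - vmin: {heatmap_kwargs['vmin']}",
        f"- linewidths: {heatmap_kwargs['linewidths']}",
        f"- annot_kws: {heatmap_kwargs['annot_kws']}",
        f"- cbar_kws: {heatmap_kwargs['cbar_kws']}",
    ]
    return "\n".join(lines)


merged = merge_heatmap_kwargs({"vmax": 0.8, "vmin": -0.6}, {"linewidths": 1.0})
settings_text = format_settings("pos", 0.3, True, merged)
```

With helpers like these, the body of `corr_plot` would shrink to computing the correlation range, merging the kwargs, drawing the heatmap, and optionally passing the formatted text to `fig.suptitle`.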