Conditions | 18 |
Total Lines | 97 |
Lines | 0 |
Ratio | 0 % |
Tests | 25 |
CRAP Score | 21.3891 |
Changes | 8 |
Bugs | 0 |
Features | 3 |
Small methods make your code easier to understand, particularly when combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
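The comment-to-name idea can be sketched as follows. This is a minimal, generic illustration: the functions `parse_records_before`, `parse_records_after` and `is_blank_or_comment` are hypothetical and do not come from the analysed project.

```python
def parse_records_before(lines):
    """Original shape: a comment explains a chunk of the body."""
    records = []
    for line in lines:
        # skip blank lines and comments
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        records.append(stripped.split(","))
    return records


def is_blank_or_comment(line):
    """Extracted method; its name is derived from the old comment."""
    stripped = line.strip()
    return not stripped or stripped.startswith("#")


def parse_records_after(lines):
    """Same behaviour, but the body now reads without the comment."""
    return [
        line.strip().split(",")
        for line in lines
        if not is_blank_or_comment(line)
    ]
```

Both versions return the same records; the second simply moves the commented step behind a name that says what the comment used to say.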
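Commonly applied refactorings include Extract Method. If many parameters/temporary variables are present, Replace Temp with Query, Introduce Parameter Object, or Preserve Whole Object may also help.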
Complex units like read_sdf() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields, methods, or parameters that share the same prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
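As a rough sketch of this approach applied to `read_sdf()`: the parameters `error_bad_mol` and `warn_bad_mol` share the suffix `_bad_mol`, so they could plausibly be grouped into one small class. The class name `BadMolPolicy` and its `handle` method are hypothetical and not part of the project; the message text mirrors the one used in the listing.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class BadMolPolicy:
    """Hypothetical class grouping the two `*_bad_mol` options."""

    error: bool = False  # corresponds to error_bad_mol
    warn: bool = True    # corresponds to warn_bad_mol

    def handle(self, index):
        """React to a molecule at zero-based `index` that failed to parse."""
        msg = 'Molecule {} could not be decoded.'.format(index + 1)
        if self.error:
            raise ValueError(msg)
        if self.warn:
            warnings.warn(msg)
```

Under this assumption the loop body would shrink to something like `policy.handle(i)` followed by `continue`, and the function signature would lose one parameter.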
    #! /usr/bin/env python

    def read_sdf(sdf, error_bad_mol=False, warn_bad_mol=True, nmols=None,
                 skipmols=None, skipfooter=None, read_props=True, mol_props=False,
                 *args, **kwargs):

        """Read an sdf file into a `pd.DataFrame`.

        The function wraps the RDKit `ForwardSDMolSupplier` object.

        Args:
            sdf (str or file-like):
                The location of data to load as a file path, or a file-like object.
            error_bad_mol (bool):
                Whether an error should be raised if a molecule fails to parse.
                Default is False.
            warn_bad_mol (bool):
                Whether a warning should be output if a molecule fails to parse.
                Default is True.
            nmols (int):
                The number of molecules to read. If `None`, read all molecules.
                Default is `None`.
            skipmols (int):
                The number of molecules to skip at start.
                Default is `0`.
            skipfooter (int):
                The number of molecules to skip from the end.
                Default is `0`.
            read_props (bool):
                Whether to read the properties into the data frame.
                Default is `True`.
            mol_props (bool):
                Whether to keep properties in the molecule dictionary after they
                are extracted to the DataFrame.
                Default is `False`.
            args, kwargs:
                Arguments will be passed to RDKit ForwardSDMolSupplier.

        Returns:
            pandas.DataFrame:
                The loaded data frame, with Mols supplied in the `structure` field.

        See also:
            rdkit.Chem.SDForwardMolSupplier
            skchem.read_smiles
        """

        # nmols is actually the index to cutoff. If we skip some at start, we need
        # to add this number
        if skipmols:
            nmols += skipmols

        if isinstance(sdf, str):
            sdf = open(sdf, 'rb')  # use read bytes for python 3 compatibility

        # use the suppression context manager to not pollute our stdout with rdkit
        # errors and warnings.
        # perhaps this should be captured better by Mol etc.
        with Suppressor():

            mol_supp = Chem.ForwardSDMolSupplier(sdf, *args, **kwargs)

            mols = []

            # single loop through sdf
            for i, mol in enumerate(mol_supp):

                if skipmols and i < skipmols:
                    continue

                if nmols and i >= nmols:
                    break

                if mol is None:
                    msg = 'Molecule {} could not be decoded.'.format(i + 1)
                    if error_bad_mol:
                        raise ValueError(msg)
                    elif warn_bad_mol:
                        warnings.warn(msg)
                    continue

                mols.append(Mol(mol))

            if skipfooter:
                mols = mols[:-skipfooter]

        idx = pd.Index((m.name for m in mols), name='batch')
        data = pd.DataFrame(mols, columns=['structure'])

        if read_props:
            props = pd.DataFrame([{k: v for (k, v) in mol.props.items()}
                                  for mol in mols])
            data = pd.concat([data, props], axis=1)

        # now we have extracted the props, we can delete if required
        if not mol_props:
            data.apply(_drop_props, axis=1)

        data.index = idx
        return squeeze(data, axis=1)
Cyclic imports may cause partly loaded modules to be returned. This might lead to unexpected runtime behavior which is hard to debug.
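The effect can be reproduced with a small self-contained sketch. The module names `cyc_a` and `cyc_b` are made up for this demonstration and are written to a temporary directory; they are unrelated to the analysed project.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_cyclic_import():
    """Import two throwaway modules that import each other.

    Returns a tuple: (did cyc_b see cyc_a.VALUE while it was being imported,
    does cyc_a have VALUE once its import has finished).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # cyc_a imports cyc_b *before* defining VALUE.
        Path(tmp, "cyc_a.py").write_text("import cyc_b\nVALUE = 1\n")
        # cyc_b imports cyc_a back and receives the partly loaded module.
        Path(tmp, "cyc_b.py").write_text(
            "import cyc_a\nSAW_VALUE = hasattr(cyc_a, 'VALUE')\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            cyc_a = importlib.import_module("cyc_a")
            cyc_b = sys.modules["cyc_b"]
            return cyc_b.SAW_VALUE, hasattr(cyc_a, "VALUE")
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("cyc_a", None)
            sys.modules.pop("cyc_b", None)


if __name__ == "__main__":
    print(demo_cyclic_import())
```

While `cyc_b` runs, `cyc_a` already exists in `sys.modules` but has not yet executed its `VALUE = 1` line, so `cyc_b` observes an incomplete module even though `cyc_a` looks complete after the import has finished.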