| Metric | Value |
|---|---|
| Conditions | 10 |
| Total Lines | 95 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 5 |
| Bugs | 0 |
| Features | 2 |
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
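In the `read_smiles()` listing in this report, the commented header-selection block is a natural candidate: its comments already describe what a helper function would do. A minimal sketch of extracting it, where `resolve_header` is a hypothetical name and not part of skchem:

```python
def resolve_header(title_line, kwargs):
    """Return the `header` argument to pass to pandas.read_csv (hypothetical helper).

    Mirrors the commented block in read_smiles(): a title line is treated as
    line zero, as is usual for SMILES files; otherwise any `header` the caller
    supplied is kept (None if absent). The key is popped from `kwargs` so it
    is not passed to read_csv twice.
    """
    header = kwargs.pop('header', None)
    if title_line is True:
        return 0
    return header


# Usage: in read_smiles() the whole commented block shrinks to
#     header = resolve_header(title_line, kwargs)
options = {'header': 2, 'nrows': 10}
print(resolve_header(False, options))  # 2
print(options)                         # {'nrows': 10}
print(resolve_header(True, {}))        # 0
```

The comment that used to explain the block has become the helper's docstring, and the call site reads as a single named step.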
Commonly applied refactorings include:
If many parameters/temporary variables are present:
Complex functions and classes like read_smiles() often do a lot of different things. To break one down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields, methods, or parameters that share the same prefixes or suffixes.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
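Although `read_smiles()` is a function rather than a class, the same prefix/suffix heuristic applies to its parameters: `error_bad_mol`, `warn_bad_mol` and `drop_bad_mol` share the suffix `_bad_mol`. A minimal sketch of extracting them into a class, assuming a hypothetical `BadMolPolicy` and a hypothetical `parse_all` helper (neither is part of skchem); `int()` stands in for `Mol.from_smiles`, since the listing shows both signal bad input with `ValueError`:

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class BadMolPolicy:
    """What to do when a SMILES string fails to parse (hypothetical class).

    Groups the three *_bad_mol flags of read_smiles() with the behaviour
    that uses them.
    """
    error: bool = False  # was error_bad_mol
    warn: bool = True    # was warn_bad_mol
    drop: bool = True    # was drop_bad_mol

    def on_failure(self, name):
        """Raise or warn according to the policy; return the placeholder None."""
        msg = 'Molecule {} could not be decoded.'.format(name)
        if self.error:
            raise ValueError(msg)
        if self.warn:
            warnings.warn(msg)
        return None


def parse_all(items, parse, policy=BadMolPolicy()):
    """Parse (name, text) pairs with `parse`, applying `policy` to failures."""
    results = {}
    for name, text in items:
        try:
            results[name] = parse(text)
        except ValueError:
            results[name] = policy.on_failure(name)
    if policy.drop:
        # Same effect as filtering on notnull() in read_smiles().
        results = {k: v for k, v in results.items() if v is not None}
    return results


# Usage, with int() standing in for Mol.from_smiles:
rows = [('ethanol', '1'), ('broken', 'not-a-number')]
print(parse_all(rows, int, BadMolPolicy(warn=False)))              # {'ethanol': 1}
print(parse_all(rows, int, BadMolPolicy(warn=False, drop=False)))  # {'ethanol': 1, 'broken': None}
```

Under this sketch, `read_smiles()` could take a single `bad_mol=BadMolPolicy()` argument instead of three flags. The dataclass is frozen so that a shared default instance cannot be mutated between calls; this is a design suggestion, not skchem's API.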
```python
#! /usr/bin/env python

# The module's imports are not shown in this report; the lines below are a
# reconstruction, and the skchem import locations are assumptions.
import warnings

import pandas as pd

from skchem import Mol
from skchem.utils import Suppressor


def read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t',
                title_line=False, error_bad_mol=False, warn_bad_mol=True,
                drop_bad_mol=True, *args, **kwargs):
    """Read a smiles file into a pandas dataframe.

    The function wraps the pandas read_csv function.

    Args:
        smiles_file (str, file-like):
            Location of data to load, specified as a string or passed directly as a
            file-like object. URLs may also be used, see the pandas.read_csv
            documentation.
        smiles_column (int):
            The column index at which SMILES are provided.
            Defaults to `0`.
        name_column (int):
            The column index at which compound names are provided, for use as the
            index in the dataframe. If None, use the default index.
            Defaults to `None`.
        delimiter (str):
            The delimiter used.
            Defaults to `'\\t'` (tab).
        title_line (bool):
            Whether a title line is provided, to use as column titles.
            Defaults to `False`.
        error_bad_mol (bool):
            Whether an error should be raised when a molecule fails to parse.
            Defaults to `False`.
        warn_bad_mol (bool):
            Whether a warning should be raised when a molecule fails to parse.
            Defaults to `True`.
        drop_bad_mol (bool):
            If true, drop any row whose SMILES failed to parse. Otherwise,
            its `structure` field is None. Defaults to `True`.
        *args, **kwargs:
            Passed through to pandas.read_csv.

    Returns:
        pandas.DataFrame:
            The loaded data frame, with Mols supplied in the `structure` field.

    See Also:
        pandas.read_csv
        skchem.Mol.from_smiles
        skchem.io.sdf

    """

    with Suppressor():

        # Set the header line to pass to the pandas parser. A title line is
        # treated as line zero, as is usual for smiles; otherwise keep any
        # header the user passed. Popping it from kwargs ensures it is not
        # passed to read_csv twice.
        header = kwargs.pop('header', None)
        if title_line is True:
            header = 0

        # read the smiles file (extra positional args follow the file argument)
        data = pd.read_csv(smiles_file, *args, delimiter=delimiter,
                           header=header, **kwargs)

        # replace the smiles column with the structure column
        lst = list(data.columns)
        lst[smiles_column] = 'structure'
        data.columns = lst

        def parse(row):
            """Parse the SMILES string for a row, applying the bad-molecule flags."""
            try:
                return Mol.from_smiles(row.structure)
            except ValueError:
                msg = 'Molecule {} could not be decoded.'.format(row.name)
                if error_bad_mol:
                    raise ValueError(msg)
                elif warn_bad_mol:
                    warnings.warn(msg)

                return None

        data['structure'] = data['structure'].apply(str)
        data['structure'] = data.apply(parse, axis=1)

        if drop_bad_mol:
            data = data[data['structure'].notnull()]

        # set index if passed
        if name_column is not None:
            data = data.set_index(data.columns[name_column])

        return data
```
Cyclic imports may cause partly loaded modules to be returned. This might lead to unexpected runtime behavior which is hard to debug.
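A small, self-contained reproduction makes the effect concrete. It assumes nothing about skchem's own modules: `cyc_a`, `cyc_b` and `try_import` are hypothetical names, and the two modules are written to a temporary directory. The unguarded version fails because `cyc_b` reads an attribute of `cyc_a` before `cyc_a` has finished executing; deferring the import into the function that needs it is one common workaround (moving the shared code into a third module is another).

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Two throwaway modules (hypothetical names) that import each other at module level.
MODULES = {
    'cyc_a': '''
        import cyc_b              # starts running cyc_b before helper() is defined

        def helper():
            return 'from cyc_a'
    ''',
    'cyc_b': '''
        import cyc_a              # returns the partly loaded cyc_a from sys.modules

        VALUE = cyc_a.helper()    # fails: helper() does not exist yet
    ''',
}

# One common fix: defer the import until the name is actually needed.
FIXED = dict(MODULES, cyc_b='''
    def value():
        import cyc_a              # runs only when value() is called
        return cyc_a.helper()
''')


def try_import(modules):
    """Write `modules` to a temp dir, import the first one, return the error or None."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in modules.items():
            Path(tmp, name + '.py').write_text(textwrap.dedent(source))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(next(iter(modules)))
            return None
        except AttributeError as exc:
            return exc
        finally:
            # Undo all side effects so the experiment can be repeated.
            sys.path.remove(tmp)
            for name in modules:
                sys.modules.pop(name, None)


print(repr(try_import(MODULES)))  # an AttributeError mentioning 'helper'
print(try_import(FIXED))          # None
```

The exact wording of the `AttributeError` varies between Python versions; recent versions add a hint that the module is partially initialized, most likely due to a circular import.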