| Metric | Value |
|---|---|
| Conditions | 4 |
| Total Lines | 66 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 1 |
| Bugs | 0 |
| Features | 1 |
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
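Commonly applied refactorings include:
- Extract Method
- If many parameters/temporary variables are present:
  - Replace Temp with Query
  - Introduce Parameter Object
  - Preserve Whole Object
  - Replace Method with Method Object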
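As a rough illustration, here is a hypothetical, heavily simplified sketch of how two standard refactorings from Fowler's catalog, Introduce Parameter Object and Extract Method, might apply to a constructor like the `__init__` this report flags. Related tuning settings move into one frozen dataclass, and the "compute fingerprints unless they were precomputed" step moves into its own named method. `SimilarityOptions`, `ThresholdSplitSketch`, and `_fingerprints` are illustrative names, not part of scikit-chem, and the fingerprinter is assumed to be any plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class SimilarityOptions:
    """Hypothetical parameter object grouping related settings
    (Introduce Parameter Object)."""
    threshold: float = 0.5
    metric: str = "jaccard"
    memory_optimized: bool = False


class ThresholdSplitSketch:
    """Hypothetical, simplified constructor broken into small named steps
    (Extract Method); not the scikit-chem implementation."""

    def __init__(self, inp: Sequence,
                 options: SimilarityOptions = SimilarityOptions(),
                 fingerprinter: Optional[Callable] = None,
                 precomputed: bool = False):
        # The frozen dataclass is immutable, so sharing one default is safe.
        self.options = options
        self.n_instances = len(inp)
        self.fps = self._fingerprints(inp, fingerprinter, precomputed)

    @staticmethod
    def _fingerprints(inp: Sequence, fingerprinter: Optional[Callable],
                      precomputed: bool) -> List:
        """Return the fingerprints, computing them only when needed."""
        if precomputed:
            return list(inp)
        if fingerprinter is None:
            raise ValueError("fingerprinter is required unless precomputed=True")
        return [fingerprinter(item) for item in inp]
```

For example, `ThresholdSplitSketch(["CCO", "CCN"], fingerprinter=frozenset)` stores one character set per input string in `fps`. The constructor now reads as a short sequence of named steps, and the method-level comment that would otherwise explain the fingerprinting branch has become the name `_fingerprints`.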
```python
#! /usr/bin/env python
# (lines 2-23 of the file are not shown in this excerpt)

def __init__(self, inp, threshold=0.5, fper='morgan',
             similarity_metric='jaccard', memory_optimized=False,
             fingerprints=None, similarity_matrix=None):
    """ Threshold similarity split for chemical datasets.

    This class implements a splitting technique that will pool compounds
    with similarity above a threshold into the same splits.

    Machine learning techniques should be able to extrapolate outside of a
    molecular series, or scaffold, however random splits will result in some
    'easy' test sets that are either *identical* or in the same molecular
    series or share a significant scaffold with training set compounds.

    This splitting technique reduces or eliminates (depending on the
    threshold set) this effect, making the problem harder.

    Args:
        inp (scipy.sparse.dok, pd.Series or pd.DataFrame):
            Either:
                - a series of skchem.Mols
                - dataframe of precalculated fingerprints

        n_splits (int):
            The number of splits to give. This will be overridden if ratio
            is passed.

        ratio (list[floats]):
            Split ratios to use.

        threshold (float):
            The similarity threshold above which compounds will all be
            assigned to the same split.

        fper (str or skchem.Fingerprinter):
            The fingerprinting technique to use to generate the similarity
            matrix.

        fingerprints (bool):
            Whether precalculated fingerprints were passed directly.

        similarity_matrix (scipy.sparse.dok):
            A precalculated similarity matrix.

    Notes:
        The splits will not always be exactly the size requested, due to the
        constraint and requirement to maintain random shuffling.
    """

    # Resolve a fingerprinter given by name.
    if isinstance(fper, str):
        fper = descriptors.get(fper)

    self.fper = fper
    fps = inp if fingerprints else self.fper.transform(inp)

    self.n_instances = len(inp)

    self.threshold = threshold
    self.similarity_metric = similarity_metric
    self.memory_optimized = memory_optimized

    # Compare with None explicitly: the truth value of a sparse matrix
    # (or a DataFrame) is ambiguous and raises a ValueError.
    if similarity_matrix is None:
        similarity_matrix = self.similarity_matrix(fps)

    self.clusters = pd.Series(self._cluster(similarity_matrix),
                              index=fps.index,
                              name='clusters')
```
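The docstring says that compounds with similarity above the threshold are pooled into the same split, but `self.similarity_matrix` and `self._cluster` are not part of this excerpt. The sketch below shows one way to get that behaviour under stated assumptions: fingerprints are sets of "on" bit indices, similarity is Jaccard, clusters are the connected components of the "similarity > threshold" graph (found with union-find), and whole clusters are then greedily placed into splits. `jaccard`, `threshold_clusters`, and `assign_splits` are hypothetical names; this is not scikit-chem's code.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Sequence


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity of two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(a & b) / len(a | b)


def threshold_clusters(fps: Sequence[FrozenSet[int]],
                       threshold: float = 0.5) -> List[int]:
    """Give every pair with similarity above `threshold` the same label
    (connected components of the threshold graph)."""
    parent = list(range(len(fps)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(fps)), 2):
        if jaccard(fps[i], fps[j]) > threshold:
            parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(fps))]


def assign_splits(labels: Sequence[int], ratio=(0.8, 0.2),
                  seed: int = 0) -> List[int]:
    """Place whole clusters into splits in random order, so split sizes
    only approximate `ratio`."""
    n = len(labels)
    targets = [r * n for r in ratio]
    sizes = [0] * len(ratio)
    members: dict = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    order = list(members)
    random.Random(seed).shuffle(order)
    split_of = [0] * n
    for lab in order:
        # Choose the split that is furthest below its target size.
        k = max(range(len(ratio)), key=lambda s: targets[s] - sizes[s])
        for idx in members[lab]:
            split_of[idx] = k
        sizes[k] += len(members[lab])
    return split_of
```

Because a cluster is never divided, a large cluster can overshoot its split's target, which is consistent with the docstring's note that splits will not always be exactly the requested size. The pairwise loop is quadratic in the number of compounds, so for large datasets a precomputed (sparse) similarity matrix, as the constructor's `similarity_matrix` argument allows, is the more practical input.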
This can be caused by one of the following:

1. Missing Dependencies
   This error could indicate a Pylint configuration issue. Make sure that your libraries are available to Pylint by adding the necessary commands to install them.
2. Missing `__init__.py` files
   This error could also result from missing `__init__.py` files in your module folders. Make sure that each sub-folder contains one.
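Both causes can be checked from Python before re-running Pylint. The sketch below uses only the standard library: `importlib.util.find_spec` reports whether a top-level module can be located, and a `pathlib` walk lists folders that contain `.py` files but no `__init__.py`. The helper names are hypothetical, and the check assumes every such folder below the project root is meant to be a regular package; implicit namespace packages (PEP 420) deliberately have no `__init__.py`.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when no finder on sys.path can locate the module.
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_init_files(root: str) -> List[Path]:
    """Return sub-folders of `root` holding .py files but no __init__.py."""
    base = Path(root)
    # Every folder that directly contains at least one Python module.
    folders = {path.parent for path in base.rglob("*.py")}
    # The root itself is treated as the project directory, not a package.
    return sorted(folder for folder in folders
                  if folder != base and not (folder / "__init__.py").exists())
```

For example, `missing_modules(["pandas", "scipy"])` returns whichever of those imports cannot be resolved, assuming the check runs in the same environment as Pylint.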