Completed: push to master (4d243d...e4a84f) by Rich, created 01:28

MullerAmesConverter.parse_splits()   rated C

Complexity:   8 conditions
Size:         8 total lines
Duplication:  0 lines (0 % ratio)
Importance:   1 change (0 bugs, 1 feature)

Metric  Value
c       1
b       0
f       1
dl      0
loc     8
rs      6.6666
cc      8
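The method under inspection, `parse_splits`, reads a splits file in which each line is one fold of comma-separated, 1-based compound numbers, and returns sorted, zero-based NumPy arrays. The sketch below reproduces that logic on an in-memory string so it can run without the dataset. The helper name `parse_split_lines` and the sample input are hypothetical, and the file layout is inferred from the code rather than from the dataset's documentation.

```python
import numpy as np


def parse_split_lines(text):
    """Parse one comma-separated fold per line into sorted, zero-based arrays.

    Hypothetical helper mirroring MullerAmesConverter.parse_splits, but taking
    the file contents as a string so it runs without the dataset files.
    """
    folds = []
    for line in text.strip().splitlines():
        # each line lists the 1-based compound numbers belonging to one fold
        numbers = sorted(int(n) for n in line.strip().split(','))
        folds.append(np.array(numbers) - 1)  # shift to zero-based positions
    return folds


folds = parse_split_lines("3,1,2\n5, 4\n")
print([f.tolist() for f in folds])  # [[0, 1, 2], [3, 4]]
```

`int()` tolerates the surrounding whitespace left by `split(',')`, which is why entries such as `" 4"` parse without an explicit strip.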
#! /usr/bin/env python
# Scrutinizer (Coding Style): This module should have a docstring.
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import os
import zipfile
import logging
logger = logging.getLogger(__name__)
# Scrutinizer (Coding Style, Naming): The name logger does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

import pandas as pd
# Scrutinizer (Configuration): The import pandas could not be resolved (possible causes: a missing dependency in the Pylint environment, or missing __init__.py files).
import numpy as np
# Scrutinizer (Configuration): The import numpy could not be resolved (possible causes: a missing dependency in the Pylint environment, or missing __init__.py files).
import skchem

from .base import Converter

from ... import standardizers
# Scrutinizer (Unused Code): The import standardizers seems to be unused.

PATCHES = {
    '820-75-7': r'NNC(=O)CNC(=O)C=[N+]=[N-]',
    '2435-76-9': r'[N-]=[N+]=C1C=NC(=O)NC1=O',
    '817-99-2': r'NC(=O)CNC(=O)\C=[N+]=[N-]',
    '116539-70-9': r'CCCCN(CC(O)C1=C\C(=[N+]=[N-])\C(=O)C=C1)N=O',
    '115-02-6': r'NC(COC(=O)\C=[N+]=[N-])C(=O)O',
    '122341-55-3': r'NC(COC(=O)\C=[N+]=[N-])C(=O)O'
}


class MullerAmesConverter(Converter):
    # Scrutinizer (Coding Style): This class should have a docstring.

    def __init__(self, directory, output_directory, output_filename='muller_ames.h5'):
        # Scrutinizer (Bug): The __init__ method of the super-class Converter is not called.

        """
        Args:
            directory (str):
                Directory in which input files reside.
            output_directory (str):
                Directory in which to save the converted dataset.
            output_filename (str):
                Name of the saved dataset. Defaults to `muller_ames.h5`.

        Returns:
            tuple of str:
                Single-element tuple containing the path to the converted dataset.
        """

        zip_path = os.path.join(directory, 'ci900161g_si_001.zip')
        output_path = os.path.join(output_directory, output_filename)

        with zipfile.ZipFile(zip_path) as f:
            # Scrutinizer (Coding Style, Naming): The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            # extract into `directory`, where the files are read from below
            f.extractall(directory)

        # create dataframe
        data = pd.read_csv(os.path.join(directory, 'smiles_cas_N6512.smi'),
                           delimiter='\t', index_col=1,
                           converters={1: lambda s: s.strip()},
                           header=None, names=['structure', 'id', 'is_mutagen'])

        data = self.patch_data(data, PATCHES)

        data['structure'] = data.structure.apply(skchem.Mol.from_smiles)
        # Scrutinizer (Bug): The class Mol does not seem to have a member named from_smiles.

        data = self.standardize(data)

        # filter step
        keep = self.filter(data)

        ms, ys = keep.structure, keep.is_mutagen
        # Scrutinizer (Coding Style, Naming): The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        # Scrutinizer (Coding Style, Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

        # zero-based positions (in `data`) of the compounds removed by the filter step
        indices = np.flatnonzero(~data.index.isin(keep.index))

        train = self.parse_splits(os.path.join(directory, 'splits_train_N6512.csv'))
        train = self.drop_indices(train, indices)
        splits = self.create_split_dict(train, 'train')

        test = self.parse_splits(os.path.join(directory, 'splits_test_N6512.csv'))
        test = self.drop_indices(test, indices)
        splits.update(self.create_split_dict(test, 'test'))

        self.run(ms, ys, output_path, splits=splits)

    def patch_data(self, data, patches):
        # Scrutinizer (Coding Style): This method could be written as a function/class method.
        """Patch SMILES in a DataFrame with rewritten ones that specify diazo
        groups in an RDKit-friendly way."""

        logger.info('Patching data...')
        for cas, smiles in patches.items():
            data.loc[cas, 'structure'] = smiles

        return data

    def parse_splits(self, f_path):
        # Scrutinizer (Coding Style): This method should have a docstring.
        # Scrutinizer (Coding Style): This method could be written as a function/class method.
        logger.info('Parsing splits...')
        with open(f_path) as f:
            # Scrutinizer (Coding Style, Naming): The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            splits = f.read().strip().splitlines()

        splits = [split.strip().split(',') for split in splits]
        splits = [sorted(int(n) for n in split) for split in splits]  # sorted ints
        return [np.array(split) - 1 for split in splits]  # zero-based indexing

    def drop_indices(self, splits, indices):
        # Scrutinizer (Coding Style): This method should have a docstring.
        # Scrutinizer (Coding Style): This method could be written as a function/class method.
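        logger.info('Dropping failed compounds from split indices...')
        indices = np.sort(np.asarray(indices))
        for i, split in enumerate(splits):
            # drop the removed compounds, then shift each surviving index down
            # by the number of removed compounds that precede it
            split = split[~np.isin(split, indices)]
            splits[i] = split - np.searchsorted(indices, split)

        return splits
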
    def create_split_dict(self, splits, name):
        # Scrutinizer (Coding Style): This method should have a docstring.
        # Scrutinizer (Coding Style): This method could be written as a function/class method.
        return {'{}_{}'.format(name, i + 1): split
                for i, split in enumerate(splits)}


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    MullerAmesConverter.convert()
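The split bookkeeping after the filter step can be checked on toy data. This is a minimal sketch. It assumes the filtered frame keeps the original, unique index labels, and that each split holds zero-based row positions into the unfiltered frame. Under those assumptions, removed compounds are dropped from every split, and each surviving position is shifted down by the number of removed rows before it. The helpers `dropped_positions` and `remap_split` are hypothetical names, not part of skchem.

```python
import numpy as np
import pandas as pd


def dropped_positions(full, kept):
    """Zero-based row positions of `full` whose index labels are absent from `kept`."""
    return np.flatnonzero(~full.index.isin(kept.index))


def remap_split(split, dropped):
    """Remove dropped positions from a split and renumber the survivors.

    After rows are deleted, each surviving position moves down by the number
    of dropped positions that precede it.
    """
    dropped = np.sort(np.asarray(dropped))
    survivors = split[~np.isin(split, dropped)]
    # searchsorted counts how many dropped positions are smaller than each survivor
    return survivors - np.searchsorted(dropped, survivors)


# toy data: five compounds; the (hypothetical) filter removes rows 'b' and 'd'
data = pd.DataFrame({'is_mutagen': [1, 0, 1, 0, 1]}, index=list('abcde'))
keep = data.drop(['b', 'd'])

dropped = dropped_positions(data, keep)      # positions 1 and 3
split = np.array([0, 1, 2, 4])               # positions into `data`
print(remap_split(split, dropped).tolist())  # [0, 1, 2]
```

Running `np.searchsorted` against the sorted removed positions counts the preceding removals for all survivors in one vectorized call, rather than looping over every removed index.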