Passed
Push — main ( bfa577...eb6882 )
by Douglas
04:37
created

BindingSearch.should_include()   F

Complexity

Conditions 14

Size

Total Lines 36
Code Lines 24

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 14
eloc 24
nop 5
dl 0
loc 36
rs 3.6
c 0
b 0
f 0

How to fix: Complexity

Complex code such as mandos.search.chembl.binding_search.BindingSearch.should_include() often does many different things at once. To break it down, identify a cohesive component within the enclosing class. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
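As a rough illustration of Extract Class applied to this method, the sketch below groups the threshold fields (allowed_relations, min_pchembl, banned_flags, min_confidence_score) into one small class with a named predicate per check. BindingCriteria and its method names are hypothetical and not part of mandos; plain dicts stand in for NestedDotDict, and the taxonomy and target-type checks are left out to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class BindingCriteria:
    """Hypothetical holder for the thresholds that should_include() consults."""

    allowed_relations: FrozenSet[str]
    min_pchembl: float
    banned_flags: FrozenSet[str]
    min_confidence_score: Optional[int] = None

    def has_banned_flag(self, data: Mapping[str, Any]) -> bool:
        # Case-insensitive comparison, as in the original condition.
        flag = data.get("data_validity_comment")
        return flag is not None and flag.lower() in {s.lower() for s in self.banned_flags}

    def relation_ok(self, data: Mapping[str, Any]) -> bool:
        return data.get("standard_relation") in self.allowed_relations

    def pchembl_ok(self, data: Mapping[str, Any]) -> bool:
        # A missing pchembl_value excludes the record, as in the original.
        value = data.get("pchembl_value")
        return value is not None and float(value) >= self.min_pchembl

    def confidence_ok(self, score: Optional[int]) -> bool:
        # No minimum configured means every score (even None) passes.
        if self.min_confidence_score is None:
            return True
        return score is not None and score >= self.min_confidence_score

    def accepts(self, data: Mapping[str, Any], confidence_score: Optional[int] = None) -> bool:
        # Each predicate is one small condition, so no single method
        # carries the whole boolean expression.
        return (
            not self.has_banned_flag(data)
            and data.get("assay_type") == "B"
            and self.relation_ok(data)
            and self.pchembl_ok(data)
            and self.confidence_ok(confidence_score)
        )
```

With something like this in place, BindingSearch could accept a single criteria object instead of five separate arguments, which would also address the "Too many arguments" finding. Whether that split suits the rest of the code base is a judgment call for the maintainers.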

# Issue: Missing module docstring
import logging
from dataclasses import dataclass
from typing import Sequence, Set, Optional
import re

# Issue: Unable to import 'pocketutils.core.dot_dict'
from pocketutils.core.dot_dict import NestedDotDict

from mandos.model.chembl_api import ChemblApi
from mandos.model.chembl_support import ChemblCompound
from mandos.model.chembl_support.chembl_targets import ChemblTarget
# Issue (Unused Code): Unused Defaults imported from mandos.model.defaults
# Issue: Unable to import 'mandos.model.defaults'
# Issue (Bug): The name defaults does not seem to exist in module mandos.model.
from mandos.model.defaults import Defaults
from mandos.model.taxonomy import Taxonomy
from mandos.search.chembl._protein_search import ProteinHit, ProteinSearch
from mandos.search.chembl.target_traversal import (
    TargetTraversalStrategy,
    TargetTraversalStrategies,
)

logger = logging.getLogger("mandos")


@dataclass(frozen=True, order=True, repr=True)
class BindingHit(ProteinHit):
    """
    An "activity" hit for a compound.
    """

    taxon_id: int
    taxon_name: str
    pchembl: float
    std_type: str
    src_id: str
    exact_target_id: str

    @property
    def predicate(self) -> str:
        return "activity"


class BindingSearch(ProteinSearch[BindingHit]):
    """
    Search for ``activity`` of type "B".
    """

    # Issue (best-practice): Too many arguments (9/5)
    # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    #   Flagged on each of the nine parameter lines below.
    def __init__(
        self,
        chembl_api: ChemblApi,
        tax: Taxonomy,
        traversal_strategy: str,
        allowed_target_types: Set[str],
        min_confidence_score: Optional[int],
        allowed_relations: Set[str],
        min_pchembl: float,
        banned_flags: Set[str],
    ):
        super().__init__(chembl_api, tax, traversal_strategy)
        self.allowed_target_types = allowed_target_types
        self.min_confidence_score = min_confidence_score
        self.allowed_relations = allowed_relations
        self.min_pchembl = min_pchembl
        self.banned_flags = banned_flags

    # Issue: Missing function or method docstring
    @property
    def default_traversal_strategy(self) -> TargetTraversalStrategy:
        return TargetTraversalStrategies.strategy0(self.api)

    # Issue: Missing function or method docstring
    def query(self, parent_form: ChemblCompound) -> Sequence[NestedDotDict]:
        def set_to_regex(values) -> str:
            return "(" + "|".join([f"(?:{re.escape(v)})" for v in values]) + ")"

        filters = dict(
            parent_molecule_chembl_id=parent_form.chid,
            assay_type="B",
            standard_relation__iregex=set_to_regex(self.allowed_relations),
            pchembl_value__isnull=False,
            target_organism__isnull=None if self.taxonomy is None else False,
        )
        # I'd rather not figure out how the API interprets None, so remove them
        filters = {k: v for k, v in filters.items() if v is not None}
        return list(self.api.activity.filter(**filters))

    # Issue: Missing function or method docstring
    def should_include(
        # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        # Issue (Unused Code): The argument compound seems to be unused.
        self, lookup: str, compound: ChemblCompound, data: NestedDotDict, target: ChemblTarget
    ) -> bool:
        if (
            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            # Issue (best-practice): Too many boolean expressions in if statement (7/5)
            (
                data.get_as("data_validity_comment", lambda s: s.lower())
                in {s.lower() for s in self.banned_flags}
            )
            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            or (data.req_as("standard_relation", str) not in self.allowed_relations)
            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            or (data.req_as("assay_type", str) != "B")
            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            or (
                self.taxonomy is not None and data.get_as("target_tax_id", int) not in self.taxonomy
            )
            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            or (data.get("pchembl_value") is None)
            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            or (data.req_as("pchembl_value", float) < self.min_pchembl)
        ):
            return False
        if data.get("data_validity_comment") is not None:
            # Issue: Use lazy % formatting in logging functions
            logger.warning(
                # Issue (Coding Style): This line is too long as per the coding-style (102/100).
                #   This check looks for lines that are too long. You can specify the maximum line length.
                f"Activity annotation for {lookup} has flag '{data.get('data_validity_comment')} (ok)"
            )
        # The `target_organism` doesn't always match the `assay_organism`
        # Ex: see assay CHEMBL823141 / document CHEMBL1135642 for homo sapiens in xenopus laevis
        # However, it's often something like yeast expressing a human / mouse / etc receptor
        # So there's no need to filter by it
        assay = self.api.assay.get(data.req_as("assay_chembl_id", str))
        confidence_score = assay.get("confidence_score")
        if target.type.name.lower() not in {s.lower() for s in self.allowed_target_types}:
            # Issue: Use lazy % formatting in logging functions
            logger.warning(f"Excluding {target} with type {target.type}")
            return False
        if self.min_confidence_score is not None:
            if confidence_score is None or confidence_score < self.min_confidence_score:
                return False
            # Some of these are non-protein types
            # And if it's unknown, we don't know what to do with it
        return True

    # Issue: Missing function or method docstring
    def to_hit(
        # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        self, lookup: str, compound: ChemblCompound, data: NestedDotDict, target: ChemblTarget
    ) -> Sequence[BindingHit]:
        # these must match the constructor of the Hit,
        # EXCEPT for object_id and object_name, which come from traversal
        # Issue (Coding Style Naming): Variable name "x" doesn't conform to snake_case naming style
        #   ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        #   This check looks for invalid names for a range of different identifiers.
        #   You can set regular expressions to which the identifiers must conform
        #   if the defaults do not match your requirements. If your project includes
        #   a Pylint configuration file, the settings in that file take precedence.
        x = self._extract(lookup, compound, data)
        return [BindingHit(**x, object_id=target.chembl, object_name=target.name)]

    def _extract(self, lookup: str, compound: ChemblCompound, data: NestedDotDict) -> NestedDotDict:
        # we know these exist from the query
        if self.taxonomy is None:
            tax = None
        else:
            organism = data.req_as("target_organism", str)
            tax_id = data.req_as("target_tax_id", int)
            tax = self.taxonomy.req(tax_id)
            if organism != tax.name:
                # Issue: Use lazy % formatting in logging functions
                logger.warning(f"Target organism {organism} is not {tax.name}")
        return NestedDotDict(
            dict(
                record_id=data.req_as("activity_id", str),
                compound_id=compound.chid,
                inchikey=compound.inchikey,
                compound_name=compound.name,
                compound_lookup=lookup,
                taxon_id=None if tax is None else tax.id,
                taxon_name=None if tax is None else tax.name,
                pchembl=data.req_as("pchembl_value", float),
                std_type=data.req_as("standard_type", str),
                src_id=data.req_as("src_id", str),
                exact_target_id=data.req_as("target_chembl_id", str),
            )
        )


__all__ = ["BindingHit", "BindingSearch"]
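
Several findings in this listing read "Use lazy % formatting in logging functions". A minimal sketch of what that usually means is shown here: the format string and its arguments are passed to the logger separately, so the message is only interpolated when a record is actually formatted. The helper warn_excluded and the sample values are hypothetical; only the logger name "mandos" comes from the file.

```python
import logging

logger = logging.getLogger("mandos")


def warn_excluded(target_name: str, target_type: str) -> None:
    """Hypothetical helper mirroring the 'Excluding ...' warning in should_include()."""
    # Lazy style: the template and its arguments travel separately on the
    # log record; string interpolation is deferred until a handler formats it.
    logger.warning("Excluding %s with type %s", target_name, target_type)
```

The f-string form builds the full message before logger.warning() is even called, which costs a little when the level is disabled and hides the template from tools that group log records by message.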