Passed
Push — main ( ddff4b...7b3fbc )
by Douglas
04:33
created

mandos.model.apis.chembl_support.chembl_target_graphs   B

Complexity

Total Complexity 46

Size/Duplication

Total Lines 383
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 199
dl 0
loc 383
rs 8.72
c 0
b 0
f 0
wmc 46

26 Methods

Rating   Name   Duplication   Size   Complexity  
A ChemblTargetGraph.__init__() 0 4 2
A ChemblTargetGraph.__str__() 0 2 1
A ChemblTargetGraph.chembl() 0 3 1
A TargetEdgeReqs.cross() 0 33 4
A ChemblTargetGraph.__hash__() 0 2 1
A ChemblTargetGraph.links() 0 28 5
A ChemblTargetGraph.name() 0 3 1
A ChemblTargetGraph.target() 0 3 1
A ChemblTargetGraph.at_node() 0 5 2
B ChemblTargetGraph._traverse() 0 61 8
A ChemblTargetGraphFactory.at_node() 0 2 1
A ChemblTargetGraph.factory() 0 8 1
A AbstractTargetEdgeReqs.matches() 0 7 1
A ChemblTargetGraph.__repr__() 0 2 1
A ChemblTargetGraph.__eq__() 0 4 2
A TargetRelType.of() 0 3 1
A ChemblTargetGraphFactory.at_target() 0 4 1
A TargetEdgeReqs.matches() 0 30 1
A ChemblTargetGraphFactory.create() 0 12 1
A ChemblTargetGraphFactory.__init__() 0 2 1
A ChemblTargetGraph.traverse() 0 18 2
A TargetNode.is_start() 0 3 1
A ChemblTargetGraph.api() 0 8 1
A ChemblTargetGraph.at_target() 0 7 2
A ChemblTargetGraph.__lt__() 0 4 2
A ChemblTargetGraph.type() 0 3 1

How to fix   Complexity   

Complexity

Complex classes like mandos.model.apis.chembl_support.chembl_target_graphs often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
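As a rough illustration of the Extract Class advice above, the sketch below uses entirely hypothetical names (`OrderBefore`, `Address`, `OrderAfter`); none of them come from mandos. The shared `address_` prefix marks the cohesive component that gets pulled out.

```python
class OrderBefore:
    """Hypothetical class mixing order data with address handling."""

    def __init__(self, item: str, address_street: str, address_city: str):
        self.item = item
        # the shared ``address_`` prefix hints at a separate component
        self.address_street = address_street
        self.address_city = address_city

    def address_label(self) -> str:
        return f"{self.address_street}, {self.address_city}"


class Address:
    """The component extracted from ``OrderBefore``."""

    def __init__(self, street: str, city: str):
        self.street = street
        self.city = city

    def label(self) -> str:
        return f"{self.street}, {self.city}"


class OrderAfter:
    """Keeps the order data and delegates address concerns to ``Address``."""

    def __init__(self, item: str, address: Address):
        self.item = item
        self.address = address


before = OrderBefore("book", "1 Main St", "Springfield")
after = OrderAfter("book", Address("1 Main St", "Springfield"))
```

Both versions produce the same label; the second one simply gives the address logic its own home, which lowers the complexity counted against the original class.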

from __future__ import annotations
Issue: Missing module docstring
import abc
import enum
import re
from dataclasses import dataclass
from functools import total_ordering
from typing import Optional, Set, Sequence, Tuple as Tup, Type

from mandos.model.apis.chembl_api import ChemblApi
from mandos.model.apis.chembl_support.chembl_targets import ChemblTarget, TargetType, TargetFactory

@dataclass(frozen=True, order=True, repr=True)
class TargetNode:
    """
    A target with information about how we reached it from a traversal.

    Attributes:
        depth: The number of steps taken to get here, with 0 for the root
        is_end: If there was no edge to follow from here (that we hadn't already visited)
        target: Our target
        link_reqs: The set of requirements for the link that we matched to get here
        origin: The parent of our target node
    """

    depth: int
    is_end: bool
    target: ChemblTarget
    link_reqs: Optional[TargetEdgeReqs]
    origin: Optional[TargetNode]

    @property
    def is_start(self) -> bool:
Issue: Missing function or method docstring
        return self.depth == 0

class AbstractTargetEdgeReqs(metaclass=abc.ABCMeta):
    """
    A set of requirements for a (source, rel, dest) triple.
    This determines the edges we're allowed to follow in the graph.
    """

    def matches(
Issue: Missing function or method docstring
        self,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        src: TargetNode,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        rel_type: TargetRelType,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        dest: TargetNode,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> bool:
        raise NotImplementedError()

@dataclass(frozen=True, order=True, repr=True)
class TargetEdgeReqs(AbstractTargetEdgeReqs):
    """
    A set of requirements for a (source, rel, dest) triple.
    This determines the edges we're allowed to follow in the graph.
    """

    src_type: TargetType
    src_pattern: Optional[re.Pattern]
    rel_type: TargetRelType
    dest_type: TargetType
    dest_pattern: Optional[re.Pattern]

    @classmethod
    def cross(
        cls,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        source_types: Set[TargetType],
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        rel_types: Set[TargetRelType],
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        dest_types: Set[TargetType],
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> Set[TargetEdgeReqs]:
        """
        Returns a "cross-product" over the three types.
        Note that none will contain text patterns.

        Args:
            source_types: The source target types to allow
            rel_types: The relationship types to allow
            dest_types: The destination target types to allow

        Returns:
            One ``TargetEdgeReqs`` per (source, rel, dest) combination, with both patterns set to None
        """
        st = set()
Issue (Coding Style, Naming): Variable name "st" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern). This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence. To find out more about Pylint, please refer to their site.
        for source in source_types:
            for rel in rel_types:
                for dest in dest_types:
                    st.add(
                        TargetEdgeReqs(
                            src_type=source,
                            src_pattern=None,
                            rel_type=rel,
                            dest_type=dest,
                            dest_pattern=None,
                        )
                    )
        return st

    def matches(
        self,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        src: TargetNode,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        rel_type: TargetRelType,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        dest: TargetNode,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> bool:
        """
        Determines whether a (source, rel, dest) triple matches this set of requirements.

        Args:
            src: The node the edge starts from
            rel_type: The relationship type of the edge
            dest: The node the edge leads to

        Returns:
            Whether all three types are equal and every non-None pattern fully matches the target name
        """
        srcx = src.target
        destx = dest.target
        return (
            (
                self.src_pattern is None
                or (srcx.name is not None and self.src_pattern.fullmatch(srcx.name))
            )
            and (
                self.dest_pattern is None
                or (destx.name is not None and self.dest_pattern.fullmatch(destx.name))
            )
            and self.src_type == srcx.type
            and self.rel_type == rel_type
            and self.dest_type == destx.type
        )

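The behavior of `cross` and `matches` can be tried out without ChEMBL access. The following is a minimal sketch, not the module's code: `Kind`, `Rel`, `Reqs`, and `name_ok` are hypothetical stand-ins for `TargetType`, `TargetRelType`, and `TargetEdgeReqs`, and `matches` here takes plain types and names instead of `TargetNode` objects. It also uses `itertools.product` in place of the three nested loops.

```python
import enum
import itertools
import re
from dataclasses import dataclass
from typing import Optional, Set


class Kind(enum.Enum):
    """Hypothetical stand-in for TargetType."""

    single_protein = enum.auto()
    protein_complex = enum.auto()


class Rel(enum.Enum):
    """Hypothetical stand-in for TargetRelType."""

    subset_of = enum.auto()
    superset_of = enum.auto()


def name_ok(pattern: Optional[re.Pattern], name: Optional[str]) -> bool:
    """No pattern means no constraint; otherwise the whole name must match."""
    return pattern is None or (name is not None and pattern.fullmatch(name) is not None)


@dataclass(frozen=True)
class Reqs:
    """Simplified stand-in for TargetEdgeReqs."""

    src_type: Kind
    src_pattern: Optional[re.Pattern]
    rel_type: Rel
    dest_type: Kind
    dest_pattern: Optional[re.Pattern]

    @classmethod
    def cross(cls, sources: Set[Kind], rels: Set[Rel], dests: Set[Kind]) -> "Set[Reqs]":
        # itertools.product yields every (source, rel, dest) combination
        return {cls(s, None, r, d, None) for s, r, d in itertools.product(sources, rels, dests)}

    def matches(self, src_type, src_name, rel_type, dest_type, dest_name) -> bool:
        return (
            name_ok(self.src_pattern, src_name)
            and name_ok(self.dest_pattern, dest_name)
            and self.src_type == src_type
            and self.rel_type == rel_type
            and self.dest_type == dest_type
        )


# 1 source type x 2 relationship types x 2 destination types
reqs = Reqs.cross(
    {Kind.single_protein},
    {Rel.subset_of, Rel.superset_of},
    {Kind.single_protein, Kind.protein_complex},
)
gaba_only = Reqs(
    Kind.single_protein, re.compile("GABA.*"), Rel.subset_of, Kind.protein_complex, None
)
```

Because `fullmatch` is used, a pattern such as `GABA.*` accepts "GABA-A receptor" but rejects "The GABA receptor", and a target without a name never satisfies a non-None pattern.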
class TargetRelType(enum.Enum):
    """
    A relationship between two targets.

    Types:

        - subset_of, superset_of, overlaps_with, and equivalent_to are actual types in ChEMBL.
        - any_link means any of the ChEMBL-defined types
        - self_link is an implicit link from any target to itself
    """

    subset_of = enum.auto()
    superset_of = enum.auto()
    overlaps_with = enum.auto()
    equivalent_to = enum.auto()
    any_link = enum.auto()
    self_link = enum.auto()

    @classmethod
    def of(cls, s: str) -> TargetRelType:
Issue (Coding Style, Naming): Method name "of" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
Issue: Missing function or method docstring
Issue (Coding Style, Naming): Argument name "s" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        return TargetRelType[s.replace(" ", "_").replace("-", "_").lower()]

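Since `of` has no docstring, a small stand-alone sketch may help show what it accepts. `RelType` below is a hypothetical copy of the enum above, made only so the snippet runs by itself. The input strings follow the "SUPERSET OF" style quoted in the `ChemblTargetGraph` docstring.

```python
import enum


class RelType(enum.Enum):
    """Hypothetical stand-alone copy of TargetRelType for illustration."""

    subset_of = enum.auto()
    superset_of = enum.auto()
    overlaps_with = enum.auto()
    equivalent_to = enum.auto()
    any_link = enum.auto()
    self_link = enum.auto()

    @classmethod
    def of(cls, s: str) -> "RelType":
        # spaces and hyphens become underscores, then the name is lowercased
        return cls[s.replace(" ", "_").replace("-", "_").lower()]


upper = RelType.of("SUPERSET OF")
hyphenated = RelType.of("overlaps-with")

try:
    RelType.of("PARENT OF")
    unknown_failed = False
except KeyError:
    # looking up a missing member name by subscript raises KeyError
    unknown_failed = True
```

An unrecognized relationship string therefore surfaces as a `KeyError` rather than a `ValueError`, which callers may want to handle explicitly.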
@total_ordering
class ChemblTargetGraph(metaclass=abc.ABCMeta):
    # noinspection PyUnresolvedReferences
    """
    A target from ChEMBL, from the ``target`` table.
    ChEMBL targets form a DAG via the ``target_relation`` table using links of type "SUPERSET OF" and "SUBSET OF".
Issue (Coding Style): This line is too long as per the coding-style (114/100). This check looks for lines that are too long. You can specify the maximum line length.
    (There are additional link types ("OVERLAPS WITH", for example), which we are ignoring.)
    For some receptors the DAG happens to be a tree. This is not true in general. See the GABAA receptor, for example.
Issue (Coding Style): This line is too long as per the coding-style (118/100).
    To fetch a target, use the ``find`` factory method.
    """

    def __init__(self, node: TargetNode):
        if not isinstance(node, TargetNode):
            raise TypeError(f"Bad type {type(node)} for {node}")
        self.node = node

    def __repr__(self):
        return f"{self.__class__.__name__}({self.node})"

    def __str__(self):
        return f"{self.__class__.__name__}({self.node})"

    def __hash__(self):
        return hash(self.node)

    def __eq__(self, target):
        if not isinstance(target, ChemblTargetGraph):
            raise TypeError(f"Bad type {type(target)} for {target}")
        return self.node == target.node

    def __lt__(self, target):
        if not isinstance(target, ChemblTargetGraph):
            raise TypeError(f"Bad type {type(target)} for {target}")
        return self.node.__lt__(target.node)

    @classmethod
    def at_node(cls, target: TargetNode) -> ChemblTargetGraph:
Issue: Missing function or method docstring
        if not isinstance(target, TargetNode):
            raise TypeError(f"Bad type {type(target)} for {target}")
        return cls(target)

    @classmethod
    def at_target(cls, target: ChemblTarget) -> ChemblTargetGraph:
Issue: Missing function or method docstring
        # lie and fill in None -- we don't know because we haven't traversed
        if not isinstance(target, ChemblTarget):
            raise TypeError(f"Bad type {type(target)} for {target}")
        # noinspection PyTypeChecker
        return cls(TargetNode(0, None, target, None, None))

    @classmethod
    def api(cls) -> ChemblApi:
        """
        Subclasses must override this.

        Returns:
            The ``ChemblApi`` used to look up target relations
        """
        raise NotImplementedError()

    @classmethod
    def factory(cls) -> TargetFactory:
        """
        Subclasses must override this.

        Returns:
            The ``TargetFactory`` used to find linked targets
        """
        raise NotImplementedError()

    @property
    def target(self) -> ChemblTarget:
Issue: Missing function or method docstring
        return self.node.target

    @property
    def chembl(self) -> str:
Issue: Missing function or method docstring
        return self.target.chembl

    @property
    def name(self) -> Optional[str]:
Issue: Missing function or method docstring
        return self.target.name

    @property
    def type(self) -> TargetType:
Issue: Missing function or method docstring
        return self.target.type

    def links(
        self, rel_types: Set[TargetRelType]
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> Sequence[Tup[ChemblTargetGraph, TargetRelType]]:
        """
        Gets adjacent targets in the graph.

        Args:
            rel_types: Relationship types (e.g. "superset of") to include
                       If ``TargetRelType.self_link`` is included, will add a single self-link

        Returns:
            The (linked target, relationship type) pairs, sorted
        """
        api = self.__class__.api()
        relations = api.target_relation.filter(target_chembl_id=self.target.chembl)
        links = []
        # "subset" means "up" (it's reversed from what's on the website)
        for superset in relations:
            linked_id = superset["related_target_chembl_id"]
            rel_type = TargetRelType.of(superset["relationship"])
            if rel_type in rel_types or TargetRelType.any_link in rel_types:
                linked_target = self.__class__.at_target(self.factory().find(linked_id))
                links.append((linked_target, rel_type))
        # we need to add self-links separately
        if TargetRelType.self_link in rel_types:
            links.append(
                (self.at_target(self.factory().find(self.target.chembl)), TargetRelType.self_link)
            )
        return sorted(links)

    def traverse(self, permitting: Set[TargetEdgeReqs]) -> Set[TargetNode]:
        """
        Traverses the DAG from this node, hopping only to targets with type in the given set.

        Args:
            permitting: The edge requirements; a link is followed only if at least one of them matches

        Returns:
            The targets in the set, in a breadth-first order (then sorted by CHEMBL ID)
            The int is the depth, starting at 0 (this protein), going to +inf for the highest ancestors
Issue (Coding Style): This line is too long as per the coding-style (103/100).
        """
        results: Set[TargetNode] = set()
        # purposely use the invalid value None for is_end
        # noinspection PyTypeChecker
        self._traverse(TargetNode(0, None, self, None, None), permitting, results)
        if any((x.is_end is None for x in results)):
Issue (Comprehensibility, Best Practice): The variable x does not seem to be defined.
            raise AssertionError()
        return results

    @classmethod
    def _traverse(
        cls, source: TargetNode, permitting: Set[TargetEdgeReqs], results: Set[TargetNode]
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        # recursive method called from traverse
        # this got really complex
        # basically, we just want to:
        # for each link (relationship) to another target:
        # for every allowed link type (DagTargetLinkType), try:
        # if the link type is acceptable, add the found target and associated link type, and break
        # all good if we've already traversed this
        if source.target.chembl in {s.target.chembl for s in results}:
            return
        # find all links from ChEMBL, then filter to only the valid links
        # do not traverse yet -- we just want to find these links
        link_candidates = cls.at_node(source).links({q.rel_type for q in permitting})
        links = []
        for linked_target, rel_type in link_candidates:
            # try out all of the link types that could match
            # record ALL of the ones that matched, even for duplicate targets
            # that's because the caller might care about the edge type that matched, not just the dest target
Issue (Coding Style): This line is too long as per the coding-style (109/100).
            # The caller might also care about the src target
            for permitted in permitting:
                if permitted.matches(
                    src=source,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                    rel_type=rel_type,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                    dest=linked_target.node,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                ):
                    link_type = TargetEdgeReqs(
                        src_type=source.target.type,
                        src_pattern=permitted.src_pattern,
                        rel_type=rel_type,
                        dest_type=linked_target.type,
                        dest_pattern=permitted.dest_pattern,
                    )
                    # purposely use the invalid value None for is_end
                    # noinspection PyTypeChecker
                    linked = TargetNode(source.depth + 1, None, linked_target, link_type, source)
                    links.append(linked)
                    # now add a self-link
                    # don't worry -- we'll make sure not to traverse it
        # now, we'll add our own (breadth-first, remember)
        # we know whether we're at an "end" node by whether we found any links
        # note that this is an invariant of the node (and permitted link types): it doesn't depend on traversal order
Issue (Coding Style): This line is too long as per the coding-style (117/100).
        is_at_end = len(links) == 0
        # this is BASICALLY the same as ``results.add(source)``:
        # the only difference is we NOW know whether we're at the end (there's nowhere to go from there)
Issue (Coding Style): This line is too long as per the coding-style (104/100).
        # (we had no idea before checking all of its children)
        # source.origin is the parent DagTarget OF source; it's None *iff* this is the root (``self`` in ``traverse``)
Issue (Coding Style): This line is too long as per the coding-style (118/100).
        final_origin_target = TargetNode(
            source.depth, is_at_end, source.target, source.link_reqs, source.origin
        )
        results.add(final_origin_target)
        # alright! now traverse on the links
        for link in links:
            # this check is needed
            # otherwise we can go superset --- subset --- superset ---
            # or just --- overlaps with --- overlaps with ---
            # obviously also don't traverse self-links
            if link not in results and link.link_reqs.rel_type is not TargetRelType.self_link:
                cls._traverse(link, permitting, results)
        # we've added: ``source``, and then each of its children (with recursion)
        # we're done now

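The control flow of `_traverse` is easier to follow on a toy graph. The sketch below is a simplified model under stated assumptions, not the module's implementation: `GRAPH`, `walk`, and `toy_traverse` are hypothetical, nodes are plain strings, and relationship types are plain strings instead of `TargetRelType`. It keeps the three ideas from the comments above: skip nodes already recorded, decide "end" status from the permitted outgoing links before recursing, and recurse only afterwards.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy graph: node -> [(neighbor, relationship)]
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "A": [("B", "subset_of")],
    "B": [("A", "superset_of"), ("C", "subset_of")],
    "C": [("B", "superset_of")],
}


def walk(
    node: str, depth: int, allowed: FrozenSet[str], seen: Dict[str, Tuple[int, bool]]
) -> None:
    # all good if we've already recorded this node (prevents cycles)
    if node in seen:
        return
    # find the permitted links first, without following them yet
    links = [dest for dest, rel in GRAPH[node] if rel in allowed]
    # "end" depends only on the node and the allowed types, not on visit order
    seen[node] = (depth, len(links) == 0)
    for dest in links:
        walk(dest, depth + 1, allowed, seen)


def toy_traverse(start: str, allowed: FrozenSet[str]) -> Dict[str, Tuple[int, bool]]:
    seen: Dict[str, Tuple[int, bool]] = {}
    walk(start, 0, allowed, seen)
    return seen


up_only = toy_traverse("A", frozenset({"subset_of"}))
both_ways = toy_traverse("A", frozenset({"subset_of", "superset_of"}))
```

With only `subset_of` allowed, "C" is an end node. With both types allowed, "C" still has a permitted link back to "B", so it is not marked as an end even though that link is never followed. Note that the recursion visits nodes depth-first.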
class ChemblTargetGraphFactory:
Issue: Missing class docstring
    def __init__(self, graph_type: Type[ChemblTargetGraph]):
        self.graph_type = graph_type

    @classmethod
    def create(cls, api: ChemblApi, target_factory: TargetFactory) -> ChemblTargetGraphFactory:
Issue: Missing function or method docstring
        class CreatedChemblTargetGraph(ChemblTargetGraph):
Issue: Missing class docstring
            @classmethod
            def api(cls) -> ChemblApi:
                return api

            @classmethod
            def factory(cls) -> TargetFactory:
                return target_factory

        return ChemblTargetGraphFactory(CreatedChemblTargetGraph)

    def at_node(self, target: TargetNode) -> ChemblTargetGraph:
Issue: Missing function or method docstring
        return self.graph_type.at_node(target)

    def at_target(self, target: ChemblTarget) -> ChemblTargetGraph:
Issue: Missing function or method docstring
        # lie and fill in None -- we don't know because we haven't traversed
        # noinspection PyTypeChecker
        return self.graph_type.at_target(target)

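`create` binds its arguments by defining a subclass inside the method, so each factory carries its own graph class. The following is a minimal sketch of that pattern with hypothetical names (`Graph`, `GraphFactory`, `BoundGraph`) and a plain string standing in for the API object; it assumes nothing about `ChemblApi` itself.

```python
import abc


class Graph(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for ChemblTargetGraph."""

    def __init__(self, name: str):
        self.name = name

    @classmethod
    def api(cls) -> str:
        raise NotImplementedError()

    def describe(self) -> str:
        return f"{self.name} via {self.api()}"


class GraphFactory:
    """Hypothetical stand-in for ChemblTargetGraphFactory."""

    def __init__(self, graph_type: type):
        self.graph_type = graph_type

    @classmethod
    def create(cls, api: str) -> "GraphFactory":
        class BoundGraph(Graph):
            @classmethod
            def api(cls) -> str:
                # ``api`` here is the argument of ``create``, captured by closure;
                # class-level names are not visible inside the method body
                return api

        return cls(BoundGraph)

    def at(self, name: str) -> Graph:
        return self.graph_type(name)


first = GraphFactory.create("api-one")
second = GraphFactory.create("api-two")
```

Each call to `create` produces a distinct subclass, so graphs from different factories never share an API object. One consequence worth keeping in mind is that the two graph types are different classes, even though both are instances of the common base.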
__all__ = [
    "TargetNode",
    "TargetRelType",
    "TargetEdgeReqs",
    "ChemblTargetGraph",
    "ChemblTargetGraphFactory",
]