Passed
Push — main ( 5ce644...da77b5 )
by Douglas
01:50
created

mandos.entry.searchers.Searcher._search_one()   C

Complexity

Conditions 9

Size

Total Lines 51
Code Lines 38

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 9
eloc 38
nop 4
dl 0
loc 51
rs 6.6346
c 0
b 0
f 0

How to fix: Long Method

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
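As a minimal sketch of that advice, the block introduced by the `# logging, caching, and such:` comment in `_search_one` could become a small, named helper. The name `should_checkpoint` and its parameters are hypothetical and not part of mandos; only the arithmetic mirrors the `on_nth` / `is_last` lines in the listing, with `position` standing in for `cache.at`.

```python
from typing import Tuple


def should_checkpoint(position: int, total: int, save_every: int) -> Tuple[bool, bool]:
    """
    Decide whether results should be written after the item at ``position``.

    Hypothetical helper (an assumption, not the project's API).

    Returns:
        A pair ``(save_now, is_last)``.
    """
    # True on every ``save_every``-th item, counting from zero
    on_nth = position % save_every == save_every - 1
    # True only for the final item
    is_last = position == total - 1
    return on_nth or is_last, is_last
```

With a helper like this, the loop body would only need `save_now, is_last = should_checkpoint(cache.at, len(inchikeys), SETTINGS.save_every)`, and the comment becomes the function name.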

Commonly applied refactorings include:

"""
Run searches and write files.
"""

from __future__ import annotations

from pathlib import Path
from typing import Optional, Sequence

from pocketutils.core.dot_dict import NestedDotDict
Issue: Unable to import 'pocketutils.core.dot_dict'
from pocketutils.core.exceptions import IllegalStateError
Issue: Unable to import 'pocketutils.core.exceptions'
from pocketutils.tools.common_tools import CommonTools
Issue: Unable to import 'pocketutils.tools.common_tools'
from typeddfs import TypedDfs
Issue: Unable to import 'typeddfs'

from mandos.model import CompoundNotFoundError
from mandos.model.hit_dfs import HitDf
from mandos.model.hits import AbstractHit
from mandos.model.search_caches import SearchCache
from mandos.model.searches import Search, SearchError
from mandos.model.settings import SETTINGS
from mandos.model.utils.setup import logger


def _fix_cols(df):
Issue (Coding Style, Naming): Argument name "df" doesn't conform to snake_case naming style ('([^\W\dA-Z][^\WA-Z]{2,}|_[^\WA-Z]*|__[^\WA-Z\d_][^\WA-Z]+__)$' pattern)

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

    return df.rename(columns={s: s.lower() for s in df.columns})


InputCompoundsDf = (
    TypedDfs.typed("InputCompoundsDf")
    .require("inchikey")
    .reserve("inchi", "smiles", "compound_id", dtype=str)
    .post(_fix_cols)
    .strict(cols=False)
    .secure()
).build()


class Searcher:
    """
    Executes one or more searches and saves the results.
    Create and use once.
    """

    def __init__(self, searches: Sequence[Search], to: Sequence[Path], input_path: Path):
        self.what = searches
        self.input_path: Optional[Path] = input_path
        self.input_df: InputCompoundsDf = None
        self.output_paths = {what.key: path for what, path in CommonTools.zip_list(searches, to)}

    def search(self) -> Searcher:
        """
        Performs the search, and writes data.
        """
        if self.input_df is not None:
            raise IllegalStateError(f"Already ran a search")
Issue: Using an f-string that does not have any interpolated variables
        self.input_df = InputCompoundsDf.read_file(self.input_path)
        logger.info(f"Read {len(self.input_df)} input compounds")
        inchikeys = self.input_df["inchikey"].unique()
        for what in self.what:
            output_path = self.output_paths[what.key]
            self._search_one(what, inchikeys, output_path)
        return self

    def _search_one(self, search: Search, inchikeys: Sequence[str], path: Path) -> None:
        """
        Loops over every compound and calls ``find``.
        Comes with better logging.
        Writes a logging ERROR for each compound that was not found.

        Args:
            inchikeys: A list of InChI key strings
            path: Path to write to
        """
        logger.info(f"Will save every {SETTINGS.save_every} compounds")
        logger.info(f"Writing {search.key} to {path}")
        annotes = []
        compounds_run = set()
        cache = SearchCache(path, inchikeys)
        self._save_metadata(path, search)
        while True:
            try:
                compound = cache.next()
Issue (Bug): cache.next does not seem to be callable.
            except StopIteration:
                break
            try:
                with logger.contextualize(compound=compound):
                    x = search.find(compound)
Issue (Coding Style, Naming): Variable name "x" doesn't conform to snake_case naming style ('([^\W\dA-Z][^\WA-Z]{2,}|_[^\WA-Z]*|__[^\WA-Z\d_][^\WA-Z]+__)$' pattern)
                annotes.extend(x)
            except CompoundNotFoundError:
                logger.info(f"Compound {compound} not found for {search.key}")
                x = []
Issue (Coding Style, Naming): Variable name "x" doesn't conform to snake_case naming style ('([^\W\dA-Z][^\WA-Z]{2,}|_[^\WA-Z]*|__[^\WA-Z\d_][^\WA-Z]+__)$' pattern)
            except Exception:
                raise SearchError(
                    f"Failed {search.key} [{search.search_class}] on compound {compound}",
                    compound=compound,
                    search_key=search.key,
                    search_class=search.search_class,
                )
            compounds_run.add(compound)
            logger.debug(f"Found {len(x)} {search.search_name()} annotations for {compound}")
            # logging, caching, and such:
            on_nth = cache.at % SETTINGS.save_every == SETTINGS.save_every - 1
            is_last = cache.at == len(inchikeys) - 1
            if on_nth or is_last:
                logger.log(
                    "NOTICE" if is_last else "INFO",
                    f"Found {len(annotes)} {search.search_name()} annotations"
                    + f" for {cache.at} of {len(inchikeys)} compounds",
                )
                self._save_annotations(annotes, path, done=is_last)
            cache.save(*compounds_run)  # CRITICAL -- do this AFTER saving
        # done!
        cache.kill()
        logger.info(f"Wrote {search.key} to {path}")

    def _save_annotations(self, hits: Sequence[AbstractHit], output_path: Path, *, done: bool):
        df = HitDf.from_hits(hits)
Issue (Coding Style, Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\W\dA-Z][^\WA-Z]{2,}|_[^\WA-Z]*|__[^\WA-Z\d_][^\WA-Z]+__)$' pattern)
        # keep all of the original extra columns from the input
        # e.g. if the user had 'inchi' or 'smiles' or 'pretty_name'
        for extra_col in [c for c in self.input_df.columns if c != "inchikey"]:
            extra_mp = self.input_df.set_index("inchikey")[extra_col].to_dict()
            df[extra_col] = df["origin_inchikey"].map(extra_mp.get)
        # write the file
        df.write_file(output_path, mkdirs=True, dir_hash=done)

    def _save_metadata(self, output_path: Path, search: Search):
Issue (Coding Style): This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example:

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y

        metadata_path = output_path.with_suffix(".metadata.json")
        params = {k: str(v) for k, v in search.get_params().items() if k not in {"key", "api"}}
        metadata = NestedDotDict(dict(key=search.key, search=search.search_class, params=params))
        metadata.write_json(metadata_path, indent=True)


__all__ = ["Searcher", "InputCompoundsDf"]
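
One further way to lower the condition count reported for `_search_one` might be to replace the `while True` / `next()` / `except StopIteration` pattern with a plain `for` loop. The sketch below is only an illustration under stated assumptions: `ResumableQueue`, `mark_done`, and `process_all` are hypothetical names, and nothing here reflects the actual `SearchCache` API, which this listing does not show.

```python
from typing import Iterator, List, Sequence, Set


class ResumableQueue:
    """Hypothetical stand-in for a cache that yields items not yet processed."""

    def __init__(self, items: Sequence[str], already_done: Sequence[str] = ()):
        self._items: List[str] = list(items)
        self._done: Set[str] = set(already_done)

    def __iter__(self) -> Iterator[str]:
        # Yield each pending item in input order, skipping finished ones
        for item in self._items:
            if item not in self._done:
                yield item

    def mark_done(self, item: str) -> None:
        # Record the item so that a later pass does not repeat it
        self._done.add(item)


def process_all(queue: ResumableQueue) -> List[str]:
    """Visit every pending item once; the for loop handles exhaustion itself."""
    processed = []
    for item in queue:
        processed.append(item)
        queue.mark_done(item)
    return processed
```

Because the iterator protocol ends the loop, the explicit `try` / `except StopIteration` / `break` branch disappears from the calling method.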