Passed
Push — main ( da77b5...65730f )
by Douglas
02:28
created

Searcher._search_one()   C

Complexity

Conditions 9

Size

Total Lines 53
Code Lines 39

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 9
eloc 39
nop 4
dl 0
loc 53
rs 6.6106
c 0
b 0
f 0

How to fix   Long Method   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
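Extract Method is the refactoring most often applied to a long method such as `Searcher._search_one()`. As a minimal sketch (not the project's code), the block commented `# logging, caching, and such:` could move into a small helper whose name is derived from that comment. `SaveDecision` and `decide_save` are hypothetical names introduced here; `position` stands in for `cache.at`, `total` for `len(inchikeys)`, and `save_every` for `SETTINGS.save_every`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveDecision:
    """Hypothetical helper type; not part of the reviewed module."""

    should_save: bool
    is_last: bool


def decide_save(position: int, total: int, save_every: int) -> SaveDecision:
    """
    Decide whether results should be flushed after the compound at ``position``.

    Uses the same two expressions as the reviewed loop:
    every ``save_every``-th compound triggers a save, and so does the last one.
    """
    on_nth = position % save_every == save_every - 1
    is_last = position == total - 1
    return SaveDecision(should_save=on_nth or is_last, is_last=is_last)
```

Inside the loop, the call would then read roughly `decision = decide_save(cache.at, len(inchikeys), SETTINGS.save_every)`, which moves the `on_nth or is_last` expression and its two temporaries out of the method body.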

"""
Run searches and write files.
"""

from __future__ import annotations

from pathlib import Path
from typing import Optional, Sequence

from pocketutils.core.dot_dict import NestedDotDict
Issue: Unable to import 'pocketutils.core.dot_dict'
from pocketutils.core.exceptions import IllegalStateError
Issue: Unable to import 'pocketutils.core.exceptions'
from pocketutils.tools.common_tools import CommonTools
Issue: Unable to import 'pocketutils.tools.common_tools'
from typeddfs import TypedDfs
Issue: Unable to import 'typeddfs'
from typeddfs.checksums import Checksums
Issue: Unable to import 'typeddfs.checksums'

from mandos import logger
from mandos.model.hit_dfs import HitDf
from mandos.model.hits import AbstractHit
from mandos.model.search_caches import SearchCache
from mandos.model.searches import Search, SearchError
from mandos.model.settings import SETTINGS
from mandos.model.utils import CompoundNotFoundError


def _fix_cols(df):
Issue (Coding Style, Naming): Argument name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

    return df.rename(columns={s: s.lower() for s in df.columns})


InputCompoundsDf = (
    TypedDfs.typed("InputCompoundsDf")
    .require("inchikey")
    .reserve("inchi", "smiles", "compound_id", dtype=str)
    .post(_fix_cols)
    .strict(cols=False)
    .secure()
).build()


class Searcher:
    """
    Executes one or more searches and saves the results.
    Create and use once.
    """

    def __init__(self, searches: Sequence[Search], to: Sequence[Path], input_path: Path):
        self.what = searches
        self.input_path: Optional[Path] = input_path
        self.input_df: InputCompoundsDf = None
        self.output_paths = {what.key: path for what, path in CommonTools.zip_list(searches, to)}

    def search(self) -> Searcher:
        """
        Performs the search, and writes data.
        """
        if self.input_df is not None:
            raise IllegalStateError(f"Already ran a search")
Issue: Using an f-string that does not have any interpolated variables
        self.input_df = InputCompoundsDf.read_file(self.input_path)
        logger.info(f"Read {len(self.input_df)} input compounds")
        inchikeys = self.input_df["inchikey"].unique()
        for what in self.what:
            output_path = self.output_paths[what.key]
            self._search_one(what, inchikeys, output_path)
        return self

    def _search_one(self, search: Search, inchikeys: Sequence[str], path: Path) -> None:
        """
        Loops over every compound and calls ``find``.
        Comes with better logging.
        Writes a logging ERROR for each compound that was not found.

        Args:
            inchikeys: A list of InChI key strings
            path: Path to write to
        """
        logger.info(f"Will save every {SETTINGS.save_every} compounds")
        logger.info(f"Writing {search.key} to {path}")
        annotes = []
        compounds_run = set()
        cache = SearchCache(path, inchikeys)
        # refresh so we know it's (no longer) complete
        Checksums.delete_dir_hashes(Checksums.get_hash_dir(path), [path], missing_ok=True)
        self._save_metadata(path, search)
        while True:
            try:
                compound = cache.next()
Issue (Bug): cache.next does not seem to be callable.
            except StopIteration:
                break
            try:
                with logger.contextualize(compound=compound):
                    x = search.find(compound)
Issue (Coding Style, Naming): Variable name "x" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
                annotes.extend(x)
            except CompoundNotFoundError:
                logger.info(f"Compound {compound} not found for {search.key}")
                x = []
Issue (Coding Style, Naming): Variable name "x" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
            except Exception:
                raise SearchError(
                    f"Failed {search.key} [{search.search_class}] on compound {compound}",
                    compound=compound,
                    search_key=search.key,
                    search_class=search.search_class,
                )
            compounds_run.add(compound)
            logger.debug(f"Found {len(x)} {search.search_name()} annotations for {compound}")
            # logging, caching, and such:
            on_nth = cache.at % SETTINGS.save_every == SETTINGS.save_every - 1
            is_last = cache.at == len(inchikeys) - 1
            if on_nth or is_last:
                logger.log(
                    "NOTICE" if is_last else "INFO",
                    f"Found {len(annotes)} {search.search_name()} annotations"
                    + f" for {cache.at} of {len(inchikeys)} compounds",
                )
                self._save_annotations(annotes, path, done=is_last)
            cache.save(*compounds_run)  # CRITICAL -- do this AFTER saving
        # done!
        cache.kill()
        logger.info(f"Wrote {search.key} to {path}")

    def _save_annotations(self, hits: Sequence[AbstractHit], output_path: Path, *, done: bool):
        df = HitDf.from_hits(hits)
Issue (Coding Style, Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        # keep all of the original extra columns from the input
        # e.g. if the user had 'inchi' or 'smiles' or 'pretty_name'
        for extra_col in [c for c in self.input_df.columns if c != "inchikey"]:
            extra_mp = self.input_df.set_index("inchikey")[extra_col].to_dict()
            df[extra_col] = df["origin_inchikey"].map(extra_mp.get)
        # write the file
        df = HitDf.of(df)
Issue (Coding Style, Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        df.write_file(output_path, mkdirs=True, dir_hash=done)

    def _save_metadata(self, output_path: Path, search: Search):
Issue (Coding Style): This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example:

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y

        metadata_path = output_path.with_suffix(".metadata.json")
        params = {k: str(v) for k, v in search.get_params().items() if k not in {"key", "api"}}
        metadata = NestedDotDict(dict(key=search.key, search=search.search_class, params=params))
        metadata.write_json(metadata_path, indent=True)


__all__ = ["Searcher", "InputCompoundsDf"]
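
The last issue above suggests that `_save_metadata` need not be an instance method, since it never reads `self`. A rough sketch of the static-method variant follows. It is not the project's code: `MetadataWriter` is a hypothetical stand-in for `Searcher`, the search's key, class name, and parameters are passed in as plain values, and the standard-library `json` module replaces `NestedDotDict.write_json` so that the example runs on its own.

```python
import json
from pathlib import Path
from typing import Any, Mapping


class MetadataWriter:
    """Hypothetical stand-in for ``Searcher``; only the method shape matters."""

    @staticmethod
    def save_metadata(
        output_path: Path, key: str, search_class: str, params: Mapping[str, Any]
    ) -> Path:
        """Write ``<output stem>.metadata.json`` next to the output file and return its path."""
        metadata_path = output_path.with_suffix(".metadata.json")
        # Same filter as the reviewed method: drop "key" and "api", stringify the rest.
        kept = {k: str(v) for k, v in params.items() if k not in {"key", "api"}}
        metadata = {"key": key, "search": search_class, "params": kept}
        metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
        return metadata_path
```

Because the method takes no `self`, it can be called as `MetadataWriter.save_metadata(...)` without constructing an instance, which also makes it straightforward to test in isolation.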