Passed
Push — main ( a80564...ec3fe3 )
by Douglas
03:59
created

mandos.commands.MiscCommands.score()   A

Complexity

Conditions 1

Size

Total Lines 68
Code Lines 18

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 1
eloc 18
nop 5
dl 0
loc 68
rs 9.5
c 0
b 0
f 0

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
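As a small illustration of the advice above, here is a sketch in Python. Every name in it (`summarize_before`, `drop_missing`, `format_average`, `summarize_after`) is hypothetical and not taken from mandos; the point is only that each comment in the long version becomes the name of an extracted function.

```python
from statistics import mean
from typing import List, Optional


def summarize_before(values: List[Optional[float]]) -> str:
    # Long version: each comment marks a step that could be its own method.
    # drop missing readings
    kept = [v for v in values if v is not None]
    # format the average
    return f"mean={mean(kept):.2f} (n={len(kept)})"


def drop_missing(values: List[Optional[float]]) -> List[float]:
    """Extracted from the 'drop missing readings' comment."""
    return [v for v in values if v is not None]


def format_average(values: List[float]) -> str:
    """Extracted from the 'format the average' comment."""
    return f"mean={mean(values):.2f} (n={len(values)})"


def summarize_after(values: List[Optional[float]]) -> str:
    # The refactored method now reads as a sequence of named steps.
    return format_average(drop_missing(values))
```

Both versions return the same string for the same input; only the structure changes.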

Commonly applied refactorings include:

"""
Command-line interface for mandos.
"""

from __future__ import annotations

import re
Issue (Unused Code): The import re seems to be unused.
from pathlib import Path
from typing import Optional

import typer
Issue: Unable to import 'typer'

from mandos.analysis import SimilarityDf
from mandos.analysis.concordance import ConcordanceCalculation, TauConcordanceCalculator
Issue (Unused Code): Unused TauConcordanceCalculator imported from mandos.analysis.concordance
from mandos.analysis.distances import JPrimeMatrixCalculator, MatrixCalculation
Issue (Unused Code): Unused JPrimeMatrixCalculator imported from mandos.analysis.distances
from mandos.analysis.filtration import Filtration
from mandos.analysis.enrichment import EnrichmentAlg, EnrichmentCalculation, ScoreDf
from mandos.analysis.reification import Reifier
from mandos.entries.common_args import Arg
from mandos.entries.common_args import CommonArgs as Ca
from mandos.entries.common_args import Opt
from mandos.entries.multi_searches import MultiSearch
from mandos.entries.searcher import SearcherUtils
from mandos.model import START_TIMESTAMP, MiscUtils
from mandos.model.hits import HitFrame
from mandos.model.settings import MANDOS_SETTINGS
from mandos.model.taxonomy_caches import TaxonomyFactories


class MiscCommands:
Issue: Missing class docstring
    @staticmethod
    def search(
        path: Path = Ca.compounds,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        config: Path = Arg.in_file(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            r"""
            A TOML config file. See docs.
            """
        ),
        out_dir: Path = Ca.out_dir,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        """
        Run multiple searches.
        """
        MultiSearch.build(path, out_dir, config).run()

    @staticmethod
    def serve(
Issue (Coding Style Naming): Argument name "db" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
This check looks for invalid names for a range of different identifiers.
You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.
If your project includes a Pylint configuration file, the settings contained in that file take precedence.
To find out more about Pylint, please refer to their site.
        port: int = Opt.val("A port to serve on", default=1540),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        db: str = Opt.val("The name of the MySQL database", default="mandos"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        r"""
        Start the REST server.

        The connection information is stored in your global settings file.
        """

    @staticmethod
    def deposit(
Issue (best-practice): Too many arguments (6/5)
Issue (Coding Style Naming): Argument name "db" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.file_input,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        db: str = Opt.val("The name of the MySQL database", default="mandos"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        host: str = Opt.val(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            "Database hostname (ignored if ``--socket`` is passed)", default="127.0.0.1"
        ),
        socket: Optional[str] = Opt.val("Path to a Unix socket (if set, ``--host`` is ignored)"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        user: Optional[str] = Opt.val("Database username (empty if not set)"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        password: Optional[str] = Opt.val("Database password (empty if not set)"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        r"""
        Export to a relational database.

        Saves data from Mandos search commands to a database for serving via REST.

        See also: ``:serve``.
        """
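One common way to address a "Too many arguments" finding like the one on `deposit` is to group related parameters into a single object. The sketch below is only an illustration: `DbConnection`, its `address` method, and the simplified `deposit` function are hypothetical and are not part of mandos. In a typer command the options would likely still be declared individually, with the object built inside the command body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbConnection:
    """Hypothetical parameter object bundling the connection options."""

    db: str = "mandos"
    host: str = "127.0.0.1"
    socket: Optional[str] = None
    user: Optional[str] = None
    password: Optional[str] = None

    def address(self) -> str:
        # Mirrors the help text: if a socket is set, the host is ignored.
        return self.socket if self.socket is not None else self.host


def deposit(path: str, conn: DbConnection) -> str:
    # Two parameters instead of six; returns a description rather than
    # touching a real database, to keep the sketch self-contained.
    return f"would export {path} to {conn.db} at {conn.address()}"
```

The defaults above are copied from the option declarations in the listing; everything else is an assumption made for the example.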
    @staticmethod
    def find(
Issue (best-practice): Too many arguments (7/5)
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.compounds,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        to: Path = Opt.out_path(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            rf"""
            A table of compounds and their matching database IDs will be written here.

            {Ca.output_formats}

            [default: <path>-ids-<start-time>.{MANDOS_SETTINGS.default_table_suffix}]
            """
        ),
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        pubchem: bool = typer.Option(True, help="Download data from PubChem"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        chembl: bool = typer.Option(True, help="Download data from ChEMBL"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        hmdb: bool = typer.Option(True, help="Download data from HMDB"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        complain: bool = Opt.flag("Log each time a compound is not found"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        r"""
        Fetches and caches compound data.

        Useful to check what you can see before running a search.
        """
        default = str(path) + "-ids" + START_TIMESTAMP + MANDOS_SETTINGS.default_table_suffix
        to = MiscUtils.adjust_filename(to, default, replace)
        inchikeys = SearcherUtils.read(path)
        df = SearcherUtils.dl(
Issue (Coding Style Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
            inchikeys, pubchem=pubchem, chembl=chembl, hmdb=hmdb, complain=complain
        )
        df.write_file(to)
        typer.echo(f"Wrote to {to}")

    @staticmethod
    def build_taxonomy(
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        taxa: str = Ca.taxa,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        forbid: str = Opt.val(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            r"""Exclude descendants of these taxa IDs or names (comma-separated).""", default=""
        ),
        to: Path = typer.Option(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            None,
            help=rf"""
            Where to export a table of the taxonomy.

            {Ca.output_formats}

            [default: ./<taxa>-<datetime>.{MANDOS_SETTINGS.default_table_suffix}]
            """,
        ),
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ):
        """
        Exports a taxonomic tree to a table.

        Writes a taxonomy of given taxa and their descendants to a table.
        """
        concat = taxa + "-" + forbid
        taxa = Ca.parse_taxa(taxa)
        forbid = Ca.parse_taxa(forbid)
        default = concat + "-" + START_TIMESTAMP + MANDOS_SETTINGS.default_table_suffix
        to = MiscUtils.adjust_filename(to, default, replace)
        my_tax = TaxonomyFactories.get_smart_taxonomy(taxa, forbid)
        my_tax = my_tax.to_df()
        to.parent.mkdir(exist_ok=True, parents=True)
        my_tax.write_file(to)

    @staticmethod
    def dl_tax(
        taxon: int = Arg.x("The **ID** of the UniProt taxon"),
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        """
        Preps a new taxonomy file for use in mandos.
        Just returns if a corresponding file already exists in the resources dir or mandos cache (``~/.mandos``).
Issue (Coding Style): This line is too long as per the coding-style (113/100).
This check looks for lines that are too long. You can specify the maximum line length.
        Otherwise, downloads a tab-separated file from UniProt.
        (To find manually, follow the ``All lower taxonomy nodes`` link and click ``Download``.)
        Then applies fixes and reduces the file size, creating a new file alongside.
        Puts both the raw data and fixed data in the cache under ``~/.mandos/taxonomy/``.
        """
        TaxonomyFactories.from_uniprot(MANDOS_SETTINGS.taxonomy_cache_path).load(taxon)

    @staticmethod
    def concat(
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.input_dir,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        to: Optional[Path] = Ca.to_single,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        r"""
        Concatenates Mandos annotation files into one.

        Note that ``:search`` automatically performs this;
        this is needed only if you want to combine results from multiple independent searches.
        """
        default = path / ("concat" + MANDOS_SETTINGS.default_table_suffix)
        to = MiscUtils.adjust_filename(to, default, replace)
        for found in path.iterdir():
Issue (Unused Code): The variable found seems to be unused.
            pass

    @staticmethod
    def filter_taxa(
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.file_input,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        to: Path = Opt.out_path(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            f"""
            An output path (file or directory).

            {Ca.output_formats}

            [default: <path>/<filters>.feather]
            """
        ),
        allow: str = Ca.taxa,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        forbid: str = Ca.taxa,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ):
        """
        Filter by taxa.

        You can include any number of taxa to allow and any number to forbid.
        All descendants of the specified taxa are used.
        Taxa will be excluded if they fall under both.

        Note that the <path> argument need not be from Mandos.
        All that is required is a column called ``taxon``, ``taxon_id``, or ``taxon_name``.

        See also: :filter, which is more general.
        """
        concat = allow + "-" + forbid
        allow = Ca.parse_taxa(allow)
        forbid = Ca.parse_taxa(forbid)
        if to is None:
            to = path.parent / (concat + MANDOS_SETTINGS.default_table_suffix)
        default = str(path) + "-filter-taxa-" + concat + MANDOS_SETTINGS.default_table_suffix
        to = MiscUtils.adjust_filename(to, default, replace)
        df = HitFrame.read_file(path)
Issue (Coding Style Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        my_tax = TaxonomyFactories.get_smart_taxonomy(allow, forbid)
        cols = [c for c in ["taxon", "taxon_id", "taxon_name"] if c in df.columns]

        def permit(row) -> bool:
            return any((my_tax.get_by_id_or_name(getattr(row, c)) is not None for c in cols))
Issue (Comprehensibility Best Practice): The variable c does not seem to be defined.

        df = df[df.apply(permit, axis=1)]
Issue (Coding Style Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        df.write_file(to)

    @staticmethod
    def filter(
Issue (Coding Style Naming): Argument name "by" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.to_single,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        by: Optional[Path] = Arg.in_file(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            """
            The path to a TOML (.toml) file containing filters.

            The file contains a list of ``mandos.filter`` keys,
            each containing an expression on a single column.
            This is only meant for simple, quick-and-dirty filtration.

            See the docs for more info.
            """
        ),
        to: Optional[Path] = Ca.to_single,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        """
        Filters by simple expressions.
        """
        if to is None:
            to = path.parent / (by.stem + MANDOS_SETTINGS.default_table_suffix)
        default = str(path) + "-filter-" + by.stem + MANDOS_SETTINGS.default_table_suffix
        to = MiscUtils.adjust_filename(to, default, replace)
        df = HitFrame.read_file(path)
Issue (Coding Style Naming): Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        Filtration.from_file(by).apply(df).write_file(to)

    @staticmethod
    def state(
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.file_input,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        to: Optional[Path] = Opt.out_path(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            """
            The path to the output file.

            Valid formats and filename suffixes are .nt and .txt with an optional .gz, .zip, or .xz.
            If only a filename suffix is provided, will use that suffix with the default directory.
            If no suffix is provided, will interpret the path as a directory but use the default filename.
Issue (Coding Style): This line is too long as per the coding-style (106/100).
            Will fail if the file exists and ``--replace`` is not set.

            [default: <path>-statements.nt]
        """
        ),
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        """
        Outputs simple N-triples statements.

        Each statement is of this form, where the InChI Key refers to the input data:

        `"InChI Key" "predicate" "object" .`
        """
        default = str(path) + "-statements.nt"
        to = MiscUtils.adjust_filename(to, default, replace)
        hits = HitFrame.read_file(path).to_hits()
        with to.open("w") as f:
Issue (Coding Style Naming): Variable name "f" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
            for hit in hits:
                f.write(hit.to_triple.n_triples)
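A note on the file handling in `state`: `Path.open()` defaults to read mode, so an output file has to be opened with an explicit write mode before `write` can succeed. The sketch below shows the pattern in isolation. `write_statements` is a hypothetical helper, not a mandos function, and it assumes each statement is already a string without a trailing newline.

```python
from pathlib import Path
from typing import List


def write_statements(lines: List[str], to: Path) -> int:
    """Write one statement per line and return how many were written."""
    # Create the parent directory first so a fresh output path works.
    to.parent.mkdir(parents=True, exist_ok=True)
    # "w" is required: Path.open() with no arguments opens for reading.
    with to.open("w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")
    return len(lines)
```

A descriptive handle name such as `out` would also avoid the naming warning raised for `f`.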
    @staticmethod
    def reify(
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.file_input,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        to: Optional[Path] = Opt.out_path(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            r"""
            The path to the output file.

            The filename suffix should be either .nt (N-triples) or .ttl (Turtle),
            with an optional .gz, .zip, or .xz.
            If only a filename suffix is provided, will use that suffix with the default directory.
            If no suffix is provided, will interpret the path as a directory but use the default filename.
Issue (Coding Style): This line is too long as per the coding-style (106/100).
            Will fail if the file exists and ``--replace`` is not set.

            [default: <path>-reified.nt]
        """
        ),
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        """
        Outputs reified semantic triples.
        """
        default = str(path) + "-reified.nt"
        to = MiscUtils.adjust_filename(to, default, replace)
        hits = HitFrame.read_file(path).to_hits()
        with to.open("w") as f:
Issue (Coding Style Naming): Variable name "f" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
            for triple in Reifier().reify(hits):
                f.write(triple.n_triples)

    @staticmethod
    def copy(
Issue (Coding Style Naming): Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
        path: Path = Ca.file_input,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
        to: Optional[Path] = Opt.out_path(
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            rf"""
            The path to the output file.

            {Ca.output_formats}

            [default: <path.parent>/export{MANDOS_SETTINGS.default_table_suffix}]
        """
        ),
        replace: bool = Ca.replace,
Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
    ) -> None:
        """
        Copies and/or converts annotation files.

        Example: ``:export:copy --to .snappy`` to highly compress a data set.
        """

        default = str(path.parent / ("export" + MANDOS_SETTINGS.default_table_suffix))
        to = MiscUtils.adjust_filename(to, default, replace)

    @staticmethod
327
    def alpha(
0 ignored issues
show
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]2,|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

Loading history...
328
        path: Path = Ca.file_input,
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
329
        scores: Path = Ca.alpha_input,
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
330
        to: Optional[Path] = Ca.alpha_to,
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
331
        algorithm: Optional[str] = Opt.val(
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
332
            rf"""
333
            Algorithm to use.
334
335
            Will be applied to all scores / columns.
336
            Allowed values:
337
338
            {Ca.definition_list({a.name: a.description for a in EnrichmentAlg})}
339
            """,
340
            default="alpha",
341
        ),
342
        replace: bool = Ca.replace,
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
343
    ) -> None:
344
        """
345
        Compares annotations to user-supplied values.
346
347
        Calculates correlation between provided scores and object/predicate pairs.
348
349
        See the docs for more info.
350
        """
351
        default = str(path) + "-" + scores.name + MANDOS_SETTINGS.default_table_suffix
352
        to = MiscUtils.adjust_filename(to, default, replace)
353
        hits = HitFrame.read_file(path)
354
        scores = ScoreDf.read_file(scores)
355
        calculator = EnrichmentCalculation.create(algorithm)
356
        df = calculator.calc_many(hits, scores)
Coding Style Naming introduced by
Variable name "df" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
357
        df.write_file(to)
358
359
    @staticmethod
360
    def beta(
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
361
        path: Path = Ca.file_input,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
362
        scores: Path = Ca.beta_input,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
363
        how: str = Opt.val(
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Unused Code introduced by
The argument how seems to be unused.
364
            r"""
365
            Determines whether the resulting rows mark single predicate/object pairs,
366
            or sets of pairs.
367
368
            **If "choose"**, decides whether to use intersection or union based on the search type.
369
            For example, ``chembl:mechanism`` uses the intersection,
370
            while most others will use the union.
371
372
            **If "intersection"**, each compound will contribute to a single row
373
            for its associated set of pairs.
374
            For example, a compound annotated for ``increase dopamine`` and ``decrease serotonin``
375
            will increment the count for a single row:
376
            object ``["dopamine", "serotonin"]`` and predicate ``["increase", "decrease"]``.
377
            (Double quotes will be escaped.)
378
379
            **If "union"**, each compound will contribute to one row per associated pair.
380
            In the above example, the compound will increment the counts
381
            of two rows: object=``dopamine`` / predicate=``increase``
382
            and object=``serotonin`` / predicate=``decrease``.
383
384
            In general, this flag is useful for variables in which:
385
386
            - A *set of pairs* is needed to best describe a compound, AND
387
388
            - There are likely to be relatively few unique predicate/object pairs.
389
390
            For example, binding to a hand-selected list of 20 targets with high confidence
391
            may allow for multipharmacology. However, co-mentions of genes will likely result
392
            in a very large number of unique rows.
393
        """
394
        ),
395
        to: Optional[Path] = Ca.beta_to,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
396
        replace: bool = Ca.replace,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
397
    ) -> None:
398
        """
399
        Compares annotations for hits and non-hits.
400
401
        This is a very simple function.
402
        For each object/predicate pair, counts the annotations for hits and annotations for non-hits.
Coding Style introduced by
This line is too long as per the coding-style (101/100).

This check looks for lines that are too long. You can specify the maximum line length.
403
404
        See the docs for more info.
405
        """
406
        default = str(path) + "-" + scores.name + MANDOS_SETTINGS.default_table_suffix
407
        to = MiscUtils.adjust_filename(to, default, replace)
408
        hits = HitFrame.read_file(path)
Unused Code introduced by
The variable hits seems to be unused.
409
        scores = ScoreDf.read_file(scores)
410
        # calculator = EnrichmentCalculation.create(algorithm)
411
        # df = calculator.calc_many(hits, scores)
412
        # df.write_file(to)
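
The difference between the "union" and "intersection" modes described in the ``how`` help text can be illustrated with a small, standalone sketch. It is not part of mandos: the function name ``count_rows``, the ``(predicate, object)`` tuple layout, and the input dictionary are assumptions made only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # hypothetical layout: (predicate, object)


def count_rows(annotations: Dict[str, List[Pair]], how: str) -> Counter:
    """Count contributing compounds per row under the given mode."""
    counts: Counter = Counter()
    for pairs in annotations.values():
        unique = sorted(set(pairs))
        if how == "union":
            # Each compound contributes to one row per associated pair.
            counts.update(unique)
        elif how == "intersection":
            # Each compound contributes to a single row keyed by its whole set of pairs.
            counts[tuple(unique)] += 1
        else:
            raise ValueError(f"Unknown mode: {how}")
    return counts


annotations = {
    "compound-1": [("increase", "dopamine"), ("decrease", "serotonin")],
    "compound-2": [("increase", "dopamine")],
}
union_counts = count_rows(annotations, "union")
intersection_counts = count_rows(annotations, "intersection")
```

With "union", ``compound-1`` increments two rows; with "intersection", it increments one row keyed by both pairs, which is why that mode can produce many unique rows when annotation sets vary widely.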
413
414
    @staticmethod
415
    def psi(
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
416
        path: Path = Ca.file_input,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
417
        algorithm: str = Opt.val(
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
418
            r"""
419
            The algorithm for calculating similarity between annotation sets.
420
421
            Currently, only "j" (J') is supported. Refer to the docs for the equation.
422
            """,
423
            default="j",
424
        ),
425
        to: Optional[Path] = Opt.out_file(
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
426
            rf"""
427
            The path to a similarity matrix file.
428
429
            {Ca.output_formats}
430
431
            [default: <input-path.parent>/<algorithm>-similarity.{MANDOS_SETTINGS.default_table_suffix}]
Coding Style introduced by
This line is too long as per the coding-style (104/100).
432
            """
433
        ),
434
        replace: bool = Ca.replace,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
435
    ) -> None:
436
        r"""
437
        Calculates a similarity matrix from annotations.
438
439
        The data are output as a dataframe (CSV by default), where rows and columns correspond
440
        to compounds, and the cell i,j is the overlap J' in annotations between compounds i and j.
441
        """
442
        default = path.parent / (algorithm + MANDOS_SETTINGS.default_table_suffix)
443
        to = MiscUtils.adjust_filename(to, default, replace)
444
        hits = HitFrame.read_file(path).to_hits()
445
        calculator = MatrixCalculation.create(algorithm)
446
        matrix = calculator.calc_all(hits)
447
        matrix.write_file(to)
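
To make the shape of the output concrete, here is a minimal sketch of a compound-by-compound similarity matrix built from annotation sets. The J' equation is defined in the mandos docs and is not reproduced here; the plain Jaccard index below is a stand-in, and the names ``jaccard`` and ``similarity_matrix`` are hypothetical.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Plain Jaccard index; a stand-in for J', which the source does not define."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch only
    return len(a & b) / len(a | b)


def similarity_matrix(annotations: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Return a nested mapping where matrix[i][j] is the overlap between compounds i and j."""
    names = sorted(annotations)
    return {i: {j: jaccard(annotations[i], annotations[j]) for j in names} for i in names}


example = {
    "a": {"x", "y"},
    "b": {"y", "z"},
    "c": {"w"},
}
matrix = similarity_matrix(example)
```

Rows and columns both correspond to compounds, so the result is square and symmetric, with each non-empty compound fully similar to itself.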
448
449
    @staticmethod
450
    def tau(
best-practice introduced by
Too many arguments (9/5)
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
451
        phi_matrix: Path = Ca.input_matrix,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
452
        psi_matrix: Path = Ca.input_matrix,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
453
        algorithm: str = Opt.val(
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
454
            r"""
455
            The algorithm for calculating concordance.
456
457
            Currently, only "tau" is supported.
458
            This calculation is a modified Kendall’s τ-a in which the discordant count ignores ties.
459
            See the docs for more info.
460
            """,
461
            default="tau",
462
        ),
463
        phi: str = Opt.val("A name for phi", default="phi"),
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
464
        psi: str = Opt.val("A name for psi", default="psi"),
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
465
        seed: int = Ca.seed,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
466
        samples: int = Ca.n_samples,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
467
        to: Optional[Path] = Opt.out_file(
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
468
            rf"""
469
            The path to a dataframe file for output.
470
471
            {Ca.output_formats}
472
473
            [default: <input-path.parent>/<algorithm>-concordance.{MANDOS_SETTINGS.default_table_suffix}]
Coding Style introduced by
This line is too long as per the coding-style (105/100).
474
            """
475
        ),
476
        replace: bool = Ca.replace,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
477
    ) -> None:
478
        r"""
479
        Calculates correlation between matrices.
480
481
        Values are calculated over bootstrap samples and written to a dataframe (CSV by default).
482
483
        Phi is typically a phenotypic matrix, and psi a matrix from Mandos.
484
        Alternatively, these might be two matrices from Mandos.
485
486
        This command is designed to calculate the similarity between compound annotations
487
        (from Mandos) and some user-input compound–compound similarity matrix.
488
        (For example, vectors from a high-content cell screen.)
489
        See ``:calc:score`` if you have a single variable,
490
        such as a hit or lead-like score.
491
        """
492
        if to is None:
493
            to = phi_matrix.parent / (
494
                psi_matrix.stem + "-" + algorithm + MANDOS_SETTINGS.default_table_suffix
495
            )
496
        if to.exists() and not replace:
497
            raise FileExistsError(f"File {to} already exists")
498
        phi_matrix = SimilarityDf.read_file(phi_matrix)
Bug introduced by
The Class _SpecialForm does not seem to have a member named read_file.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.
499
        psi_matrix = SimilarityDf.read_file(psi_matrix)
Bug introduced by
The Class _SpecialForm does not seem to have a member named read_file.
500
        calculator = ConcordanceCalculation.create(algorithm, phi, psi, samples, seed)
501
        concordance = calculator.calc(phi_matrix, psi_matrix)
502
        concordance.write_file(to)
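
As a rough illustration of what a bootstrapped concordance between phi and psi values might look like, the sketch below computes a standard Kendall τ-a, where tied pairs count as neither concordant nor discordant, over resampled value pairs. The exact modification used by mandos and its resampling scheme are described in its docs and are not shown in this file, so the functions ``tau_a`` and ``bootstrap_tau`` and their behavior are assumptions.

```python
import random
from itertools import combinations
from typing import List, Sequence


def tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard Kendall tau-a: (concordant - discordant) / (n choose 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie and contributes to neither count
    return (concordant - discordant) / (n * (n - 1) / 2)


def bootstrap_tau(
    x: Sequence[float], y: Sequence[float], samples: int, seed: int
) -> List[float]:
    """Recompute tau-a on `samples` resamples (with replacement) of the paired values."""
    rng = random.Random(seed)
    n = len(x)
    results = []
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        results.append(tau_a([x[i] for i in idx], [y[i] for i in idx]))
    return results


phi_values = [0.1, 0.4, 0.35, 0.8]
psi_values = [0.2, 0.3, 0.5, 0.9]
point_estimate = tau_a(phi_values, psi_values)
distribution = bootstrap_tau(phi_values, psi_values, samples=50, seed=0)
```

Seeding the generator keeps the resampled distribution reproducible, which mirrors the role of the ``seed`` and ``samples`` arguments in the command signature.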
503
504
    @staticmethod
505
    def plot_umap(
best-practice introduced by
Too many arguments (7/5)
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
506
        psi_matrix: Path = Ca.input_matrix,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
507
        colors: Optional[Path] = Ca.colors,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
508
        markers: Optional[Path] = Ca.markers,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
509
        color_col: Optional[str] = Ca.color_col,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
510
        marker_col: Optional[str] = Ca.marker_col,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
511
        cols: int = Opt.val("""The number of columns to use (before going down a row)"""),
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
512
        to: Optional[Path] = Ca.plot_to,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
513
    ) -> None:
514
        r"""
515
        Plot UMAP of psi matrices.
516
517
        The input will probably be calculated from ``:calc:matrix``.
518
519
        Will plot each variable (psi) over a grid.
520
        """
521
522
    @staticmethod
523
    def plot_pairing_scatter(
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
524
        path: Path = Ca.input_matrix,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
525
        to: Optional[Path] = Ca.plot_to,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
526
    ) -> None:
527
        r"""
528
        Plots scatter plots of phi against psi.
529
530
        Plots scatter plots of (phi, psi) values, sorted by phi values.
531
532
        For each unique phi matrix and psi matrix, flattens the matrices and plots
533
        the flattened (n choose 2) pairs of each jointly, phi mapped to the x-axis
534
        and psi mapped to the y-axis.
535
536
        Will plot each (phi, psi) pair over a grid, one plot per cell:
537
        One row per phi and one column per psi.
538
        """
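
The flattening step described in this docstring can be sketched as follows. The function has no body in this file, so this is only an illustration under assumptions: matrices are represented as nested dictionaries keyed by compound, and ``flatten_pairs`` is a hypothetical helper name.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]  # hypothetical representation: matrix[i][j]


def flatten_pairs(phi: Matrix, psi: Matrix) -> List[Tuple[float, float]]:
    """Return (phi, psi) values for each unordered compound pair, sorted by phi."""
    shared = sorted(set(phi) & set(psi))
    # combinations(..., 2) skips the diagonal and visits each unordered pair once.
    points = [(phi[a][b], psi[a][b]) for a, b in combinations(shared, 2)]
    return sorted(points)


phi = {
    "a": {"a": 1.0, "b": 0.7, "c": 0.2},
    "b": {"a": 0.7, "b": 1.0, "c": 0.5},
    "c": {"a": 0.2, "b": 0.5, "c": 1.0},
}
psi = {
    "a": {"a": 1.0, "b": 0.6, "c": 0.1},
    "b": {"a": 0.6, "b": 1.0, "c": 0.9},
    "c": {"a": 0.1, "b": 0.9, "c": 1.0},
}
points = flatten_pairs(phi, psi)
```

For three compounds this yields three points, one per unordered pair, with the first element intended for the x-axis and the second for the y-axis.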
539
540
    @staticmethod
541
    def plot_pairing_violin(
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
542
        path: Path = Ca.input_matrix,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
543
        split: bool = Opt.flag(
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
544
            r"""
545
            Split each violin into phi_1 on the left and phi_2 on the right.
546
547
            Useful to compare two phi variables. Requires exactly 2.
548
            """
549
        ),
550
        to: Optional[Path] = Ca.plot_to,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
551
    ) -> None:
552
        r"""
553
        Plots violin plots from data generated by ``:calc:matrix-tau``.
554
555
        Will plot each (phi, psi) pair over a grid, one row per phi and one column per psi
556
        (unless ``--split`` is set).
557
        """
558
559
    @staticmethod
560
    def plot_score_correlation(
Coding Style Naming introduced by
Argument name "to" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
561
        path: Path = Ca.input_matrix,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
562
        to: Optional[Path] = Ca.plot_to,
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
563
    ) -> None:
564
        r"""
565
        Plots violin plots from data generated by ``:calc:matrix-tau``.
566
567
        Will plot (phi, psi) pairs over a grid, one row per phi and one column per psi.
568
        """
569
570
571
__all__ = ["MiscCommands"]
572