ocrd.cli.resmgr.download()   F

Complexity

Conditions 31

Size

Total Lines 117
Code Lines 92

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 92
dl 0
loc 117
rs 0
c 0
b 0
f 0
cc 31
nop 9

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
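
Extract Method is one such refactoring. As a minimal sketch, assuming the "Copying ..." branch of download() is the part worth naming, the size lookup and the URL test could be pulled out as shown here. The helper names is_remote_url and local_resource_size are hypothetical, and directory_size is a local stand-in for the function of the same name imported from ocrd_utils, whose exact behaviour is assumed.

```python
from pathlib import Path


def is_remote_url(url: str) -> bool:
    """Named replacement for the repeated startswith('https://') / startswith('http://') test."""
    return url.startswith(('https://', 'http://'))


def directory_size(path: Path) -> int:
    """Stand-in for ocrd_utils.directory_size (assumed: total bytes of all files below path)."""
    return sum(p.stat().st_size for p in path.rglob('*') if p.is_file())


def local_resource_size(url: str) -> int:
    """Extracted from the 'Copying ...' branch: size in bytes of a local file or directory."""
    urlpath = Path(url)
    if urlpath.is_dir():
        return directory_size(urlpath)
    return urlpath.stat().st_size
```

With helpers like these, the loop body in download() would read as a sequence of named steps, and each step could be tested without invoking the CLI.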

Complexity

Complex functions like ocrd.cli.resmgr.download() often do a lot of different things. To break such a function down, we need to identify a cohesive component within it. A common approach to find such a component is to look for fields, methods, or local variables that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
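
As a sketch of Extract Class, assuming the repeated resdict[...] lookups in download() form the cohesive component, the keys could be bundled into a small value object. ResourceSpec and all of its members are hypothetical and not part of OCR-D. The "registered" check mirrors the 'size' in resdict test only approximately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceSpec:
    """Hypothetical value object bundling the keys download() reads from each resdict."""
    name: str
    url: str
    size: Optional[int] = None
    type: str = 'file'
    path_in_archive: str = '.'

    @classmethod
    def from_dict(cls, resdict: dict) -> 'ResourceSpec':
        # Keep only the keys this sketch models; others (e.g. 'description') are ignored.
        fields = ('name', 'url', 'size', 'type', 'path_in_archive')
        return cls(**{k: resdict[k] for k in fields if k in resdict})

    @property
    def is_registered(self) -> bool:
        # download() labels an entry "registered" when it carries a size.
        return self.size is not None

    @property
    def is_remote(self) -> bool:
        return self.url.startswith(('https://', 'http://'))

    @property
    def is_downloadable(self) -> bool:
        # '???' is the placeholder URL download() assigns when no source is known.
        return self.url != '???'
```

The branches in download() that currently inspect the dictionary would then ask the object instead, for example spec.is_remote rather than two startswith() calls.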

Many Parameters

Methods with many parameters are not only hard to understand; their parameter lists also tend to become inconsistent as you need more or different data.

There are several approaches to avoid long parameter lists:
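
One such approach is Introduce Parameter Object. A minimal sketch, assuming the seven options of download() travel together: DownloadOptions and describe_download are hypothetical names, and the defaults are copied from the click decorators in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DownloadOptions:
    """Hypothetical parameter object for the options download() receives."""
    any_url: str = ''
    no_dynamic: bool = False
    resource_type: str = 'file'
    path_in_archive: str = '.'
    allow_uninstalled: bool = False
    overwrite: bool = False
    location: Optional[str] = None


def describe_download(executable: str, name: Optional[str], options: DownloadOptions) -> str:
    """Stand-in for a helper that takes three parameters instead of nine."""
    source = options.any_url or 'registry'
    mode = 'overwrite' if options.overwrite else 'keep existing'
    return f"{executable}/{name or '*'} from {source} ({options.resource_type}, {mode})"


print(describe_download('ocrd-dummy', 'model', DownloadOptions(overwrite=True)))
# prints: ocrd-dummy/model from registry (file, overwrite)
```

A click callback would still receive the individual options as keyword arguments. The object could be built once at the top of the callback and handed to any extracted helpers.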

"""
OCR-D CLI: management of processor resources

.. click:: ocrd.cli.resmgr:resmgr_cli
    :prog: ocrd resmgr
    :nested: full
"""
import sys
from pathlib import Path
from shutil import which
from yaml import safe_load, safe_dump

import requests
import click

from ocrd_utils import (
    directory_size,
    getLogger,
    get_moduledir,
    get_ocrd_tool_json,
    initLogging,
    RESOURCE_LOCATIONS,
)
from ocrd.constants import RESOURCE_USER_LIST_COMMENT

from ..resource_manager import OcrdResourceManager


def print_resources(executable, reslist, resmgr):
    print(f"{executable}")
    for resdict in reslist:
        res_loc = resmgr.resource_dir_to_location(resdict['path']) if 'path' in resdict else ''
        print(f"- {resdict['name']} @ {res_loc} ({resdict['url']})\n  {resdict['description']}")
    print()


@click.group("resmgr")
def resmgr_cli():
    """
    Managing processor resources
    """
    initLogging()


@resmgr_cli.command('list-available')
@click.option('-D', '--no-dynamic', is_flag=True, default=False,
              help="Whether to skip looking into each processor's --dump-{json,module-dir} for module-level resources")
@click.option('-e', '--executable', metavar='EXEC', default='ocrd-*',
              help='Show only resources for executable beginning with EXEC', )
def list_available(executable, no_dynamic):
    """
    List available resources
    """
    resmgr = OcrdResourceManager()
    for executable, reslist in resmgr.list_available(executable=executable, dynamic=not no_dynamic):
        print_resources(executable, reslist, resmgr)


@resmgr_cli.command('list-installed')
@click.option('-e', '--executable', help='Show only resources for executable EXEC', metavar='EXEC')
def list_installed(executable=None):
    """
    List installed resources
    """
    resmgr = OcrdResourceManager()
    for executable, reslist in resmgr.list_installed(executable):
        print_resources(executable, reslist, resmgr)


@resmgr_cli.command('download')
@click.option('-n', '--any-url', default='', help='URL of unregistered resource to download/copy from')
@click.option('-D', '--no-dynamic', default=False, is_flag=True,
              help="Whether to skip looking into each processor's --dump-{json,module-dir} for module-level resources")
@click.option('-t', '--resource-type', type=click.Choice(['file', 'directory', 'archive']), default='file',
              help='Type of resource',)
@click.option('-P', '--path-in-archive', default='.', help='Path to extract in case of archive type')
@click.option('-a', '--allow-uninstalled', is_flag=True,
              help="Allow installing resources for uninstalled processors",)
@click.option('-o', '--overwrite', help='Overwrite existing resources', is_flag=True)
@click.option('-l', '--location', type=click.Choice(RESOURCE_LOCATIONS),
              help="Where to store resources - defaults to first location in processor's 'resource_locations' "
                   "list or finally 'data'")
@click.argument('executable', required=True)
@click.argument('name', required=False)
def download(any_url, no_dynamic, resource_type, path_in_archive, allow_uninstalled, overwrite, location, executable,
             name):
    """
    Download resource NAME for processor EXECUTABLE.

    NAME is the name of the resource made available by downloading or copying.

    If NAME is '*' (asterisk), then download all known registered resources for this processor.

    If ``--any-url=URL`` or ``-n URL`` is given, then URL is accepted regardless of registered resources for ``NAME``.
    (This can be used for unknown resources or for replacing registered resources.)

    If ``--resource-type`` is set to ``archive``, then that archive gets unpacked after download,
    and its ``--path-in-archive`` will subsequently be renamed to NAME.
    """
    log = getLogger('ocrd.cli.resmgr')
    resmgr = OcrdResourceManager()
    # Normalize the wildcard arguments: '*' becomes None, i.e. "no filter".
    if executable != '*' and not name:
        log.error(f"Unless EXECUTABLE ('{executable}') is the '*' wildcard, NAME is required")
        sys.exit(1)
    elif executable == '*':
        executable = None
    if name == '*':
        name = None
    is_url = (any_url.startswith('https://') or any_url.startswith('http://')) if any_url else False
    is_filename = Path(any_url).exists() if any_url else False
    if executable and not which(executable):
        if not allow_uninstalled:
            log.error(f"Executable '{executable}' is not installed. "
                      f"To download resources anyway, use the -a/--allow-uninstalled flag")
            sys.exit(1)
        else:
            log.info(f"Executable '{executable}' is not installed, but downloading resources anyway")
    reslist = resmgr.list_available(executable=executable, dynamic=not no_dynamic, name=name)
    if not any(r[1] for r in reslist):
        log.info(f"No resources {name} found in registry for executable {executable}")
        if executable and name:
            # Fall back to a stub entry built from the command-line options.
            reslist = [(executable, [{
                'url': any_url or '???',
                'name': name,
                'type': resource_type,
                'path_in_archive': path_in_archive}]
            )]
    for this_executable, this_reslist in reslist:
        for resdict in this_reslist:
            if 'size' in resdict:
                registered = "registered"
            else:
                registered = "unregistered"
            if any_url:
                resdict['url'] = any_url
            if resdict['url'] == '???':
                log.warning(f"Cannot download user resource {resdict['name']}")
                continue
            # Determine the size up front so the progress bar has a length.
            if resdict['url'].startswith('https://') or resdict['url'].startswith('http://'):
                log.info(f"Downloading {registered} resource '{resdict['name']}' ({resdict['url']})")
                if 'size' not in resdict:
                    with requests.head(resdict['url']) as r:
                        resdict['size'] = int(r.headers.get('content-length', 0))
            else:
                log.info(f"Copying {registered} resource '{resdict['name']}' ({resdict['url']})")
                urlpath = Path(resdict['url'])
                resdict['url'] = str(urlpath.resolve())
                if Path(urlpath).is_dir():
                    resdict['size'] = directory_size(urlpath)
                else:
                    resdict['size'] = urlpath.stat().st_size
            # Resolve the target location and its base directory.
            if not location:
                location = get_ocrd_tool_json(this_executable)['resource_locations'][0]
            elif location not in get_ocrd_tool_json(this_executable)['resource_locations']:
                log.error(f"The selected --location {location} is not in the {this_executable}'s resource search path, "
                          f"refusing to install to invalid location")
                sys.exit(1)
            if location != 'module':
                basedir = resmgr.location_to_resource_dir(location)
            else:
                basedir = get_moduledir(this_executable)
                if not basedir:
                    basedir = resmgr.location_to_resource_dir('data')

            try:
                with click.progressbar(length=resdict['size']) as bar:
                    fpath = resmgr.download(
                        this_executable,
                        resdict['url'],
                        basedir,
                        name=resdict['name'],
                        resource_type=resdict.get('type', resource_type),
                        path_in_archive=resdict.get('path_in_archive', path_in_archive),
                        overwrite=overwrite,
                        no_subdir=location in ['cwd', 'module'],
                        progress_cb=lambda delta: bar.update(delta)
                    )
                if registered == 'unregistered':
                    log.info(f"{this_executable} resource '{name}' ({any_url}) not a known resource, creating stub "
                             f"in {resmgr.user_list}")
                    resmgr.add_to_user_database(this_executable, fpath, url=any_url)
                resmgr.save_user_list()
                log.info(f"Installed resource {resdict['url']} under {fpath}")
            except FileExistsError as exc:
                log.info(str(exc))
            log.info(f"Use in parameters as "
                     f"'{resmgr.parameter_usage(resdict['name'], usage=resdict.get('parameter_usage', 'as-is'))}'")


@resmgr_cli.command('migrate')
@click.argument('migration', type=click.Choice(['2.37.0']))
def migrate(migration):
    """
    Update the configuration after updating core to MIGRATION
    """
    resmgr = OcrdResourceManager(skip_init=True)
    log = getLogger('ocrd.resmgr.migrate')
    if not resmgr.user_list.exists():
        log.info(f'No configuration file found at {resmgr.user_list}, nothing to do')
        # Without a configuration file there is nothing to back up or rewrite.
        return
    if migration == '2.37.0':
        backup_file = resmgr.user_list.with_suffix(f'.yml.before-{migration}')
        yaml_in_str = resmgr.user_list.read_text()
        log.info(f'Backing up {resmgr.user_list} to {backup_file}')
        backup_file.write_text(yaml_in_str)
        log.info(f'Applying migration {migration} to {resmgr.user_list}')
        yaml_in = safe_load(yaml_in_str)
        yaml_out = {}
        for executable, reslist_in in yaml_in.items():
            yaml_out[executable] = []
            for resdict_in in reslist_in:
                resdict_out = {}
                for k_in, v_in in resdict_in.items():
                    k_out, v_out = k_in, v_in
                    if k_in == 'type' and v_in in ['github-dir', 'tarball']:
                        if v_in == 'github-dir':
                            v_out = 'directory'
                        elif v_in == 'tarball':
                            v_out = 'directory'
                    resdict_out[k_out] = v_out
                yaml_out[executable].append(resdict_out)
        # The f-prefix is needed so that {migration} is interpolated into the header comment.
        resmgr.user_list.write_text(
            RESOURCE_USER_LIST_COMMENT + f'\n# migrated with ocrd resmgr migrate {migration}\n' + safe_dump(yaml_out))
        log.info(f'Applied migration {migration} to {resmgr.user_list}')