Passed
Pull Request — master (#1079)
by Konstantin
02:29
created

ocrd.cli.workspace.workspace_find()   D

Complexity

Conditions 13

Size

Total Lines 66
Code Lines 57

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 57
dl 0
loc 66
rs 4.2
c 0
b 0
f 0
cc 13
nop 9

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
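
Extract Method is the classic case. As a minimal sketch (the helper names `normalize_output_fields` and `format_entry` are hypothetical and not part of the pull request), the field handling at the top of workspace_find() could be pulled out into two small functions. The special handling of page IDs in the original is deliberately left out, and a `SimpleNamespace` stands in for a real file object.

```python
from types import SimpleNamespace

# Mapping copied from workspace_find(); the helper names below are hypothetical.
SNAKE_TO_CAMEL = {"file_id": "ID", "page_id": "pageId", "file_grp": "fileGrp"}


def normalize_output_fields(output_field):
    """Translate snake_case field names to the attribute names used on file objects."""
    return [SNAKE_TO_CAMEL.get(name, name) for name in output_field]


def format_entry(file_obj, fields):
    """Build one output row; attributes that are missing or None become empty strings."""
    row = []
    for field in fields:
        value = getattr(file_obj, field, None)
        row.append('' if value is None else str(value))
    return row


# Stand-in for a file entry, for demonstration only
demo_file = SimpleNamespace(ID="FILE_0001", fileGrp="OCR-D-IMG", local_filename=None)
fields = normalize_output_fields(["file_id", "file_grp", "local_filename"])
row = format_entry(demo_file, fields)
print("\t".join(row))
```

Each helper can then be named, documented, and tested on its own, and the command body shrinks to the loop that calls them.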

Complexity

Complex code units like ocrd.cli.workspace.workspace_find() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to find such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
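
For workspace_find(), one candidate for such a component is the trio of download-related options (`download`, `undo_download`, `wait`). The following is only a sketch under that assumption: the class name `DownloadPolicy`, its `apply` method, and the `fetch` callable (standing in for something like `workspace.download_file`) are hypothetical, and the original's rewriting of the output row on undo is omitted.

```python
import time
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class DownloadPolicy:
    """Hypothetical class grouping the download-related options of workspace_find()."""
    download: bool = False
    undo_download: bool = False
    wait: int = 0

    def apply(self, file_obj, fetch):
        """Apply the policy to one file entry and report whether it was modified."""
        modified = False
        if self.download and not file_obj.local_filename:
            fetch(file_obj)
            modified = True
            if self.wait:
                # Pause between download requests, as --wait does
                time.sleep(self.wait)
        if self.undo_download and file_obj.local_filename:
            file_obj.local_filename = None
            modified = True
        return modified


def fake_fetch(file_obj):
    # Demonstration only: pretend the file was downloaded into the workspace
    file_obj.local_filename = f"{file_obj.fileGrp}/{file_obj.ID}.tif"


remote = SimpleNamespace(ID="FILE_0001", fileGrp="OCR-D-IMG", local_filename=None)
policy = DownloadPolicy(download=True)
changed = policy.apply(remote, fake_fetch)
print(changed, remote.local_filename)
```

With the branching moved into the class, the loop in the command would only need to call `policy.apply(...)` and remember whether any call returned True.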

Many Parameters

Methods with many parameters are not only hard to understand; their parameters also often become inconsistent when you need more or different data.

There are several approaches to avoid long parameter lists:
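
One such approach is to introduce a parameter object. The sketch below assumes that the four filter options of workspace_find() (`file_grp`, `mimetype`, `page_id`, `file_id`) always travel together; the `FileFilter` class and the `describe_query` function are hypothetical illustrations, not existing OCR-D API.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class FileFilter:
    """Hypothetical parameter object bundling the four METS filter options."""
    file_grp: Optional[str] = None
    mimetype: Optional[str] = None
    page_id: Optional[str] = None
    file_id: Optional[str] = None

    def as_kwargs(self):
        """Return only the filters that were set, ready to pass as keyword arguments."""
        return {key: value for key, value in asdict(self).items() if value is not None}


def describe_query(file_filter, output_field=("local_filename",)):
    """Toy stand-in for a command body: two arguments instead of five."""
    filters = ", ".join(f"{key}={value}" for key, value in sorted(file_filter.as_kwargs().items()))
    return f"find [{filters}] -> {','.join(output_field)}"


file_filter = FileFilter(file_grp="OCR-D-IMG", page_id="PHYS_0001")
print(describe_query(file_filter))
```

In the listing below, such an object could be unpacked at the call site with `workspace.find_files(**file_filter.as_kwargs())`, since `find_files` is called there with exactly these keyword names.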

1
"""
2
OCR-D CLI: workspace management
3
4
.. click:: ocrd.cli.workspace:workspace_cli
5
    :prog: ocrd workspace
6
    :nested: full
7
"""
8
import os
9
from os import getcwd
10
from os.path import relpath, exists, join, isabs
11
from pathlib import Path
12
from json import loads
13
import sys
14
from glob import glob   # XXX pathlib.Path.glob does not support absolute globs
15
import re
16
import time
17
18
import click
19
20
from ocrd import Resolver, Workspace, WorkspaceValidator, WorkspaceBackupManager
21
from ocrd.mets_server import OcrdMetsServer
22
from ocrd_utils import getLogger, initLogging, pushd_popd, EXT_TO_MIME, safe_filename, parse_json_string_or_file
23
from ocrd.decorators import mets_find_options
24
from . import command_with_replaced_help
25
26
27
class WorkspaceCtx():
28
29
    def __init__(self, directory, mets_url, mets_basename, mets_server_url, automatic_backup):
30
        self.log = getLogger('ocrd.cli.workspace')
31
        if mets_basename:
32
            self.log.warning(DeprecationWarning('--mets-basename is deprecated. Use --mets/--directory instead.'))
33
        self.resolver = Resolver()
34
        self.directory, self.mets_url, self.mets_basename, self.mets_server_url \
35
                = self.resolver.resolve_mets_arguments(directory, mets_url, mets_basename, mets_server_url)
36
        self.automatic_backup = automatic_backup
37
38
39
pass_workspace = click.make_pass_decorator(WorkspaceCtx)
40
41
# ----------------------------------------------------------------------
42
# ocrd workspace
43
# ----------------------------------------------------------------------
44
45
@click.group("workspace")
46
@click.option('-d', '--directory', envvar='WORKSPACE_DIR', type=click.Path(file_okay=False), metavar='WORKSPACE_DIR', help='Changes the workspace folder location [default: METS_URL directory or .]')
47
@click.option('-M', '--mets-basename', default=None, help='METS file basename. Deprecated, use --mets/--directory')
48
@click.option('-m', '--mets', default=None, help='The path/URL of the METS file [default: WORKSPACE_DIR/mets.xml]', metavar="METS_URL")
49
@click.option('-U', '--mets-server-url', 'mets_server_url', help="TCP host of METS server")
50
@click.option('--backup', default=False, help="Backup mets.xml whenever it is saved.", is_flag=True)
51
@click.pass_context
52
def workspace_cli(ctx, directory, mets, mets_basename, mets_server_url, backup):
53
    """
54
    Managing workspaces
55
56
    A workspace comprises a METS file and a directory as point of reference.
57
58
    Operates on the file system directly or via a METS server 
59
    (already running via some prior `server start` subcommand).
60
    """
61
    initLogging()
62
    ctx.obj = WorkspaceCtx(
63
        directory,
64
        mets_url=mets,
65
        mets_basename=mets_basename,
66
        mets_server_url=mets_server_url,
67
        automatic_backup=backup
68
    )
69
70
# ----------------------------------------------------------------------
71
# ocrd workspace validate
72
# ----------------------------------------------------------------------
73
74
@workspace_cli.command('validate', cls=command_with_replaced_help(
75
    (r' \[METS_URL\]', ''))) # XXX deprecated argument
76
@pass_workspace
77
@click.option('-a', '--download', is_flag=True, help="Download all files")
78
@click.option('-s', '--skip', help="Tests to skip", default=[], multiple=True, type=click.Choice(
79
    ['imagefilename', 'dimension', 'pixel_density', 'page', 'url', 'page_xsd', 'mets_fileid_page_pcgtsid',
80
     'mets_unique_identifier', 'mets_file_group_names', 'mets_files', 'mets_xsd']))
81
@click.option('--page-textequiv-consistency', '--page-strictness', help="How strict to check PAGE multi-level textequiv consistency", type=click.Choice(['strict', 'lax', 'fix', 'off']), default='strict')
82
@click.option('--page-coordinate-consistency', help="How strict to check PAGE multi-level coordinate consistency", type=click.Choice(['poly', 'baseline', 'both', 'off']), default='poly')
83
@click.argument('mets_url', default=None, required=False)
84
def workspace_validate(ctx, mets_url, download, skip, page_textequiv_consistency, page_coordinate_consistency):
85
    """
86
    Validate a workspace
87
88
    METS_URL can be a URL, an absolute path or a path relative to $PWD.
89
    If not given, use --mets accordingly.
90
91
    Check that the METS and its referenced file contents
92
    abide by the OCR-D specifications.
93
    """
94
    LOG = getLogger('ocrd.cli.workspace.validate')
95
    if mets_url:
96
        LOG.warning(DeprecationWarning("Use 'ocrd workspace --mets METS validate' instead of argument 'METS_URL' ('%s')" % mets_url))
97
    else:
98
        mets_url = ctx.mets_url
99
    report = WorkspaceValidator.validate(
100
        ctx.resolver,
101
        mets_url,
102
        src_dir=ctx.directory,
103
        skip=skip,
104
        download=download,
105
        page_strictness=page_textequiv_consistency,
106
        page_coordinate_consistency=page_coordinate_consistency
107
    )
108
    print(report.to_xml())
109
    if not report.is_valid:
110
        sys.exit(128)
111
112
# ----------------------------------------------------------------------
113
# ocrd workspace clone
114
# ----------------------------------------------------------------------
115
116
@workspace_cli.command('clone', cls=command_with_replaced_help(
117
    (r' \[WORKSPACE_DIR\]', ''))) # XXX deprecated argument
118
@click.option('-f', '--clobber-mets', help="Overwrite existing METS file", default=False, is_flag=True)
119
@click.option('-a', '--download', is_flag=True, help="Download all files and change location in METS file after cloning")
120
@click.argument('mets_url')
121
# XXX deprecated
122
@click.argument('workspace_dir', default=None, required=False)
123
@pass_workspace
124
def workspace_clone(ctx, clobber_mets, download, mets_url, workspace_dir):
125
    """
126
    Create a workspace from METS_URL and return the directory
127
128
    METS_URL can be a URL, an absolute path or a path relative to $PWD.
129
    If METS_URL is not provided, use --mets accordingly.
130
    METS_URL can also be an OAI-PMH GetRecord URL wrapping a METS file.
131
    """
132
    LOG = getLogger('ocrd.cli.workspace.clone')
133
    if workspace_dir:
134
        LOG.warning(DeprecationWarning("Use 'ocrd workspace --directory DIR clone' instead of argument 'WORKSPACE_DIR' ('%s')" % workspace_dir))
135
        ctx.directory = workspace_dir
136
137
    workspace = ctx.resolver.workspace_from_url(
138
        mets_url,
139
        dst_dir=ctx.directory,
140
        mets_basename=ctx.mets_basename,
141
        clobber_mets=clobber_mets,
142
        download=download,
143
    )
144
    workspace.save_mets()
145
    print(workspace.directory)
146
147
# ----------------------------------------------------------------------
148
# ocrd workspace init
149
# ----------------------------------------------------------------------
150
151
@workspace_cli.command('init', cls=command_with_replaced_help(
152
    (r' \[DIRECTORY\]', ''))) # XXX deprecated argument
153
@click.option('-f', '--clobber-mets', help="Clobber mets.xml if it exists", is_flag=True, default=False)
154
# XXX deprecated
155
@click.argument('directory', default=None, required=False)
156
@pass_workspace
157
def workspace_init(ctx, clobber_mets, directory):
158
    """
159
    Create a workspace with an empty METS file in --directory.
160
161
    """
162
    LOG = getLogger('ocrd.cli.workspace.init')
163
    if directory:
164
        LOG.warning(DeprecationWarning("Use 'ocrd workspace --directory DIR init' instead of argument 'DIRECTORY' ('%s')" % directory))
165
        ctx.directory = directory
166
    workspace = ctx.resolver.workspace_from_nothing(
167
        directory=ctx.directory,
168
        mets_basename=ctx.mets_basename,
169
        clobber_mets=clobber_mets
170
    )
171
    workspace.save_mets()
172
    print(workspace.directory)
173
174
# ----------------------------------------------------------------------
175
# ocrd workspace add
176
# ----------------------------------------------------------------------
177
178
@workspace_cli.command('add')
179
@click.option('-G', '--file-grp', help="fileGrp USE", required=True, metavar='FILE_GRP')
180
@click.option('-i', '--file-id', help="ID for the file", required=True, metavar='FILE_ID')
181
@click.option('-m', '--mimetype', help="Media type of the file. Guessed from extension if not provided", required=False, metavar='TYPE')
182
@click.option('-g', '--page-id', help="ID of the physical page", metavar='PAGE_ID')
183
@click.option('-C', '--check-file-exists', help="Whether to ensure FNAME exists", is_flag=True, default=False)
184
@click.option('--ignore', help="Do not check whether a file entry with this ID already exists in the METS.", default=False, is_flag=True)
185
@click.option('--force', help="If file with ID already exists, replace it. No effect if --ignore is set.", default=False, is_flag=True)
186
@click.argument('fname', required=True)
187
@pass_workspace
188
def workspace_add_file(ctx, file_grp, file_id, mimetype, page_id, ignore, check_file_exists, force, fname):
189
    """
190
    Add a file or http(s) URL FNAME to METS in a workspace.
191
    If FNAME is not an http(s) URL and is not a workspace-local existing file, try to copy to workspace.
192
    """
193
    workspace = Workspace(
194
        ctx.resolver,
195
        directory=ctx.directory,
196
        mets_basename=ctx.mets_basename,
197
        automatic_backup=ctx.automatic_backup,
198
        mets_server_url=ctx.mets_server_url,
199
    )
200
201
    log = getLogger('ocrd.cli.workspace.add')
202
    if not mimetype:
203
        try:
204
            mimetype = EXT_TO_MIME[Path(fname).suffix]
205
            log.info("Guessed mimetype to be %s" % mimetype)
206
        except KeyError:
207
            log.error("Cannot guess mimetype from extension '%s' for '%s'. Set --mimetype explicitly" % (Path(fname).suffix, fname))
208
209
    log.debug("Adding '%s'", fname)
210
    local_filename = None
211
    if not (fname.startswith('http://') or fname.startswith('https://')):
212
        if not fname.startswith(ctx.directory):
213
            if not isabs(fname) and exists(join(ctx.directory, fname)):
214
                fname = join(ctx.directory, fname)
215
            else:
216
                log.debug("File '%s' is not in workspace, copying", fname)
217
                try:
218
                    fname = ctx.resolver.download_to_directory(ctx.directory, fname, subdir=file_grp)
219
                except FileNotFoundError:
220
                    if check_file_exists:
221
                        log.error("File '%s' does not exist, halt execution!" % fname)
222
                        sys.exit(1)
223
        if check_file_exists and not exists(fname):
224
            log.error("File '%s' does not exist, halt execution!" % fname)
225
            sys.exit(1)
226
        if fname.startswith(ctx.directory):
227
            fname = relpath(fname, ctx.directory)
228
        local_filename = fname
229
230
    if not page_id:
231
        log.warning("You did not provide '--page-id/-g', so the file you added is not linked to a specific page.")
232
    kwargs = {
233
        'file_id': file_id,
234
        'mimetype': mimetype,
235
        'page_id': page_id,
236
        'force': force,
237
        'ignore': ignore,
238
        'local_filename': local_filename,
239
        'url': fname
240
    }
241
    workspace.add_file(file_grp, **kwargs)
242
    workspace.save_mets()
243
244
# ----------------------------------------------------------------------
245
# ocrd workspace bulk-add
246
# ----------------------------------------------------------------------
247
248
# pylint: disable=broad-except
249
@workspace_cli.command('bulk-add')
250
@click.option('-r', '--regex', help="Regular expression matching the FILE_GLOB filesystem paths to define named captures usable in the other parameters", required=True)
251
@click.option('-m', '--mimetype', help="Media type of the file. If not provided, guess from filename", required=False)
252
@click.option('-g', '--page-id', help="physical page ID of the file", required=False)
253
@click.option('-i', '--file-id', help="ID of the file. If not provided, derive from fileGrp and filename", required=False)
254
@click.option('-u', '--url', help="local filesystem path in the workspace directory (copied from source file if different)", required=False)
255
@click.option('-G', '--file-grp', help="File group USE of the file", required=True)
256
@click.option('-n', '--dry-run', help="Don't actually do anything to the METS or filesystem, just preview", default=False, is_flag=True)
257
@click.option('-S', '--source-path', 'src_path_option', help="File path to copy from (if different from FILE_GLOB values)", required=False)
258
@click.option('-I', '--ignore', help="Disable checking for existing file entries (faster)", default=False, is_flag=True)
259
@click.option('-f', '--force', help="Replace existing file entries with the same ID (no effect when --ignore is set, too)", default=False, is_flag=True)
260
@click.option('-s', '--skip', help="Skip files not matching --regex (instead of failing)", default=False, is_flag=True)
261
@click.argument('file_glob', nargs=-1, required=True)
262
@pass_workspace
263
def workspace_cli_bulk_add(ctx, regex, mimetype, page_id, file_id, url, file_grp, dry_run, file_glob, src_path_option, ignore, force, skip):
264
    """
265
    Add files in bulk to an OCR-D workspace.
266
267
    FILE_GLOB can either be a shell glob expression to match file names,
268
    or a list of expressions or '-', in which case expressions are read from STDIN.
269
270
    After globbing, --regex is matched against each expression resulting from FILE_GLOB, and can
271
    define named groups reusable in the --page-id, --file-id, --mimetype, --url, --source-path and
272
    --file-grp options, e.g. by referencing the group name 'grp' from the regex as '{{ grp }}'.
273
274
    If the FILE_GLOB expressions do not denote the file names themselves
275
    (but arbitrary strings for --regex matching), then use --source-path to set
276
    the actual file paths to use. (This could involve fixed strings or group references.)
277
278
    \b
279
    Examples:
280
        ocrd workspace bulk-add \\
281
                --regex '(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.[^.]+' \\
282
                --page-id 'PHYS_{{ pageid }}' \\
283
                --file-grp "{{ fileGrp }}" \\
284
                path/to/files/*/*.*
285
        \b
286
        echo "path/to/src/file.xml SEG/page_p0001.xml" \\
287
        | ocrd workspace bulk-add \\
288
                --regex '(?P<src>.*?) (?P<fileGrp>.+?)/page_(?P<pageid>.*)\.(?P<ext>[^\.]*)' \\
289
                --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \\
290
                --page-id 'PHYS_{{ pageid }}' \\
291
                --file-grp "{{ fileGrp }}" \\
292
                --url '{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}' \\
293
                -
294
295
        \b
296
        { echo PHYS_0001 BIN FILE_0001_BIN.IMG-wolf BIN/FILE_0001_BIN.IMG-wolf.png; \\
297
          echo PHYS_0001 BIN FILE_0001_BIN BIN/FILE_0001_BIN.xml; \\
298
          echo PHYS_0002 BIN FILE_0002_BIN.IMG-wolf BIN/FILE_0002_BIN.IMG-wolf.png; \\
299
          echo PHYS_0002 BIN FILE_0002_BIN BIN/FILE_0002_BIN.xml; \\
300
        } | ocrd workspace bulk-add -r '(?P<pageid>.*) (?P<filegrp>.*) (?P<fileid>.*) (?P<url>.*)' \\
301
          -G '{{ filegrp }}' -g '{{ pageid }}' -i '{{ fileid }}' -S '{{ url }}' -
302
    """
303
    log = getLogger('ocrd.cli.workspace.bulk-add') # pylint: disable=redefined-outer-name
304
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
305
306
    try:
307
        pat = re.compile(regex)
308
    except Exception as e:
309
        log.error("Invalid regex: %s" % e)
310
        sys.exit(1)
311
312
    file_paths = []
313
    from_stdin = file_glob == ('-',)
314
    if from_stdin:
315
        file_paths += [Path(x.strip('\n')) for x in sys.stdin.readlines()]
316
    else:
317
        for fglob in file_glob:
318
            expanded = glob(fglob)
319
            if not expanded:
320
                file_paths += [Path(fglob)]
321
            else:
322
                file_paths += [Path(x) for x in expanded]
323
324
    for i, file_path in enumerate(file_paths):
325
        log.info("[%4d/%d] %s" % (i + 1, len(file_paths), file_path))
326
327
        # match regex
328
        m = pat.match(str(file_path))
329
        if not m:
330
            if skip:
331
                continue
332
            log.error("File '%s' not matched by regex: '%s'" % (file_path, regex))
333
            sys.exit(1)
334
        group_dict = m.groupdict()
335
336
        # set up file info
337
        file_dict = {'url': url, 'mimetype': mimetype, 'file_id': file_id, 'page_id': page_id, 'file_grp': file_grp}
338
339
        # Flag to track whether 'url' should be 'src'
340
        url_is_src = False
341
342
        # expand templates
343
        for param_name in file_dict:
344
            if not file_dict[param_name]:
345
                if param_name == 'url':
346
                    url_is_src = True
347
                    continue
348
                elif param_name in ['mimetype', 'file_id']:
349
                    # auto-filled below once the other
350
                    # replacements have happened
351
                    continue
352
                raise ValueError(f"OcrdFile attribute '{param_name}' unset ({file_dict})")
353
            for group_name in group_dict:
354
                file_dict[param_name] = file_dict[param_name].replace('{{ %s }}' % group_name, group_dict[group_name])
355
356
        # Where to copy from
357
        if src_path_option:
358
            src_path = src_path_option
359
            for group_name in group_dict:
360
                src_path = src_path.replace('{{ %s }}' % group_name, group_dict[group_name])
361
            srcpath = Path(src_path)
362
        else:
363
            srcpath = file_path
364
365
        # derive --file-id from filename if --file-id is not explicitly set
366
        if not file_id:
367
            id_field = srcpath.stem
368
            file_dict['file_id'] = safe_filename('%s_%s' % (file_dict['file_grp'], id_field))
369
        if not mimetype:
370
            try:
371
                file_dict['mimetype'] = EXT_TO_MIME[srcpath.suffix]
372
            except KeyError:
373
                log.error("Cannot guess MIME type from extension '%s' for '%s'. Set --mimetype explicitly" % (srcpath.suffix, srcpath))
374
375
        # copy files if src != url
376
        if url_is_src:
377
            file_dict['url'] = str(srcpath)
378
        else:
379
            destpath = Path(workspace.directory, file_dict['url'])
380
            if srcpath != destpath and not destpath.exists():
381
                log.info("cp '%s' '%s'", srcpath, destpath)
382
                if not dry_run:
383
                    if not destpath.parent.is_dir():
384
                        destpath.parent.mkdir()
385
                    destpath.write_bytes(srcpath.read_bytes())
386
387
        # Add to workspace (or not)
388
        fileGrp = file_dict.pop('file_grp')
389
        if dry_run:
390
            log.info('workspace.add_file(%s)' % file_dict)
391
        else:
392
            workspace.add_file(fileGrp, ignore=ignore, force=force, **file_dict)
393
394
    # save changes to disk
395
    workspace.save_mets()
396
397
398
# ----------------------------------------------------------------------
399
# ocrd workspace find
400
# ----------------------------------------------------------------------
401
402
@workspace_cli.command('find')
403
@mets_find_options
404
@click.option('-k', '--output-field', help="Output field. Repeat for multiple fields, will be joined with tab",
405
        default=['local_filename'],
406
        multiple=True,
407
        type=click.Choice([
408
            'url',
409
            'mimetype',
410
            'page_id',
411
            'pageId',
412
            'file_id',
413
            'ID',
414
            'file_grp',
415
            'fileGrp',
416
            'basename',
417
            'basename_without_extension',
418
            'local_filename',
419
        ]))
420
@click.option('--download', is_flag=True, help="Download found files to workspace and change location in METS file ")
421
@click.option('--undo-download', is_flag=True, help="Remove all downloaded files from the METS")
422
@click.option('--wait', type=int, default=0, help="Wait this many seconds between download requests")
423
@pass_workspace
424
def workspace_find(ctx, file_grp, mimetype, page_id, file_id, output_field, download, undo_download, wait):
425
    """
426
    Find files.
427
428
    (If any ``FILTER`` starts with ``//``, then its remainder
429
     will be interpreted as a regular expression.)
430
    """
431
    snake_to_camel = {"file_id": "ID", "page_id": "pageId", "file_grp": "fileGrp"}
432
    output_field = [snake_to_camel.get(x, x) for x in output_field]
433
    modified_mets = False
434
    ret = list()
435
    workspace = Workspace(
436
        ctx.resolver,
437
        directory=ctx.directory,
438
        mets_basename=ctx.mets_basename,
439
        mets_server_url=ctx.mets_server_url,
440
    )
441
    for f in workspace.find_files(
442
            file_id=file_id,
443
            file_grp=file_grp,
444
            mimetype=mimetype,
445
            page_id=page_id,
446
        ):
447
        ret_entry = [f.ID if field == 'pageId' else str(getattr(f, field) or '') for field in output_field]
448
        if download and not f.local_filename:
449
            workspace.download_file(f)
450
            modified_mets = True
451
            if wait:
452
                time.sleep(wait)
453
        if undo_download and f.local_filename:
454
            ret_entry = [f'Removed local_filename {f.local_filename}']
455
            f.local_filename = None
456
            modified_mets = True
457
        ret.append(ret_entry)
458
    if modified_mets:
459
        workspace.save_mets()
460
    if 'pageId' in output_field:
461
        idx = output_field.index('pageId')
462
        fileIds = [fields[idx] for fields in ret]
The variable idx does not seem to be defined in case 'pageId' in output_field on line 460 is False. Are you sure this can never be the case?
463
        pages = workspace.mets.get_physical_pages(for_fileIds=fileIds)
464
        for fields, page in zip(ret, pages):
465
            fields[idx] = page or ''
466
    for fields in ret:
467
        print('\t'.join(fields))
468
469
# ----------------------------------------------------------------------
470
# ocrd workspace remove
471
# ----------------------------------------------------------------------
472
473
@workspace_cli.command('remove')
474
@click.option('-k', '--keep-file', help="Do not delete file from file system", default=False, is_flag=True)
475
@click.option('-f', '--force', help="Continue even if mets:file or file on file system does not exist", default=False, is_flag=True)
476
@click.argument('ID', nargs=-1)
477
@pass_workspace
478
def workspace_remove_file(ctx, id, force, keep_file):  # pylint: disable=redefined-builtin
479
    """
480
    Delete files (given by their ID attribute ``ID``).
481
482
    (If any ``ID`` starts with ``//``, then its remainder
483
     will be interpreted as a regular expression.)
484
    """
485
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
486
    for i in id:
487
        workspace.remove_file(i, force=force, keep_file=keep_file)
488
    workspace.save_mets()
489
490
491
# ----------------------------------------------------------------------
492
# ocrd workspace rename-group
493
# ----------------------------------------------------------------------
494
495
@workspace_cli.command('rename-group')
496
@click.argument('OLD', nargs=1)
497
@click.argument('NEW', nargs=1)
498
@pass_workspace
499
def rename_group(ctx, old, new):
500
    """
501
    Rename fileGrp (USE attribute ``OLD`` to ``NEW``).
502
    """
503
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
504
    workspace.rename_file_group(old, new)
505
    workspace.save_mets()
506
507
# ----------------------------------------------------------------------
508
# ocrd workspace remove-group
509
# ----------------------------------------------------------------------
510
511
@workspace_cli.command('remove-group')
512
@click.option('-r', '--recursive', help="Delete any files in the group before the group itself", default=False, is_flag=True)
513
@click.option('-f', '--force', help="Continue removing even if group or containing files not found in METS", default=False, is_flag=True)
514
@click.option('-k', '--keep-files', help="Do not delete files from file system", default=False, is_flag=True)
515
@click.argument('GROUP', nargs=-1)
516
@pass_workspace
517
def remove_group(ctx, group, recursive, force, keep_files):
518
    """
519
    Delete fileGrps (given by their USE attribute ``GROUP``).
520
521
    (If any ``GROUP`` starts with ``//``, then its remainder
522
     will be interpreted as a regular expression.)
523
    """
524
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
525
    for g in group:
526
        workspace.remove_file_group(g, recursive=recursive, force=force, keep_files=keep_files)
527
    workspace.save_mets()
528
529
# ----------------------------------------------------------------------
530
# ocrd workspace prune-files
531
# ----------------------------------------------------------------------
532
533
@workspace_cli.command('prune-files')
534
@click.option('-G', '--file-grp', help="fileGrp USE", metavar='FILTER')
535
@click.option('-m', '--mimetype', help="Media type to look for", metavar='FILTER')
536
@click.option('-g', '--page-id', help="Page ID", metavar='FILTER')
537
@click.option('-i', '--file-id', help="ID", metavar='FILTER')
538
@pass_workspace
539
def prune_files(ctx, file_grp, mimetype, page_id, file_id):
540
    """
541
    Removes mets:files that point to non-existing local files
542
543
    (If any ``FILTER`` starts with ``//``, then its remainder
544
     will be interpreted as a regular expression.)
545
    """
546
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
547
    with pushd_popd(workspace.directory):
548
        for f in workspace.find_files(
549
            file_id=file_id,
550
            file_grp=file_grp,
551
            mimetype=mimetype,
552
            page_id=page_id,
553
        ):
554
            try:
555
                if not f.local_filename or not exists(f.local_filename):
556
                    workspace.mets.remove_file(f.ID)
557
            except Exception as e:
558
                ctx.log.exception("Error removing %s: %s", f, e)
559
                raise
560
        workspace.save_mets()
561
562
# ----------------------------------------------------------------------
563
# ocrd workspace list-group
564
# ----------------------------------------------------------------------
565
566
@workspace_cli.command('list-group')
567
@pass_workspace
568
def list_groups(ctx):
569
    """
570
    List fileGrp USE attributes
571
    """
572
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
573
    print("\n".join(workspace.mets.file_groups))
574
575
# ----------------------------------------------------------------------
576
# ocrd workspace list-page
577
# ----------------------------------------------------------------------
578
579
@workspace_cli.command('list-page')
580
@pass_workspace
581
def list_pages(ctx):
582
    """
583
    List physical page IDs
584
    """
585
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
586
    print("\n".join(workspace.mets.physical_pages))
587
588
# ----------------------------------------------------------------------
589
# ocrd workspace get-id
590
# ----------------------------------------------------------------------
591
592
@workspace_cli.command('get-id')
593
@pass_workspace
594
def get_id(ctx):
595
    """
596
    Get METS id if any
597
    """
598
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
599
    ID = workspace.mets.unique_identifier
600
    if ID:
601
        print(ID)
602
603
# ----------------------------------------------------------------------
604
# ocrd workspace set-id
605
# ----------------------------------------------------------------------
606
607
@workspace_cli.command('set-id')
608
@click.argument('ID')
609
@pass_workspace
610
def set_id(ctx, id):   # pylint: disable=redefined-builtin
611
    """
612
    Set METS ID.
613
614
    If one of the supported identifier mechanisms is used, will set this identifier.
615
616
    Otherwise will create a new <mods:identifier type="purl">{{ ID }}</mods:identifier>.
617
    """
618
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
619
    workspace.mets.unique_identifier = id
620
    workspace.save_mets()
621
622
# ----------------------------------------------------------------------
623
# ocrd workspace merge
624
# ----------------------------------------------------------------------
625
626
def _handle_json_option(ctx, param, value):
627
    return parse_json_string_or_file(value) if value else None

@workspace_cli.command('merge')
@click.argument('METS_PATH')
@click.option('--overwrite/--no-overwrite', is_flag=True, default=False, help="Overwrite on-disk file in case of file name conflicts with data from METS_PATH")
@click.option('--force/--no-force', is_flag=True, default=False, help="Overwrite mets:file from --mets with mets:file from METS_PATH if IDs clash")
@click.option('--copy-files/--no-copy-files', is_flag=True, help="Copy files as well", default=True, show_default=True)
@click.option('--fileGrp-mapping', help="JSON object mapping src to dest fileGrp", callback=_handle_json_option)
@click.option('--fileId-mapping', help="JSON object mapping src to dest file ID", callback=_handle_json_option)
@click.option('--pageId-mapping', help="JSON object mapping src to dest page ID", callback=_handle_json_option)
@mets_find_options
@pass_workspace
def merge(ctx, overwrite, force, copy_files, filegrp_mapping, fileid_mapping, pageid_mapping, file_grp, file_id, page_id, mimetype, mets_path):   # pylint: disable=redefined-builtin
    """
    Merges this workspace with the workspace that contains ``METS_PATH``

    Pass a JSON string or file to ``--fileGrp-mapping``, ``--fileId-mapping`` or ``--pageId-mapping``
    in order to rename all fileGrp, file ID or page ID values, respectively.

    The ``--file-id``, ``--page-id``, ``--mimetype`` and ``--file-grp`` options have
    the same semantics as in ``ocrd workspace find``, see ``ocrd workspace find --help``
    for an explanation.
    """
    mets_path = Path(mets_path)
    # The option callback has normally parsed the mapping already; decode only
    # if it is still a string, since json.loads() raises TypeError on a dict.
    if isinstance(filegrp_mapping, str):
        filegrp_mapping = loads(filegrp_mapping)
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
    other_workspace = Workspace(ctx.resolver, directory=str(mets_path.parent), mets_basename=str(mets_path.name))
    workspace.merge(
        other_workspace,
        force=force,
        overwrite=overwrite,
        copy_files=copy_files,
        fileGrp_mapping=filegrp_mapping,
        fileId_mapping=fileid_mapping,
        pageId_mapping=pageid_mapping,
        file_grp=file_grp,
        file_id=file_id,
        page_id=page_id,
        mimetype=mimetype
    )
    workspace.save_mets()
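
# Illustration only, not part of ocrd.cli.workspace: a minimal sketch of how a
# src-to-dest mapping such as the one passed via --fileGrp-mapping could be
# applied when renaming values. apply_mapping is a hypothetical helper, and the
# pass-through of unmapped values is an assumption, not confirmed behaviour of
# Workspace.merge.

```python
def apply_mapping(value, mapping=None):
    """Return the mapped name for ``value``, or ``value`` itself if it has no entry."""
    if not mapping:
        return value
    return mapping.get(value, value)


# Example: rename one fileGrp and leave the other untouched.
file_grps = ['OCR-D-IMG', 'OCR-D-SEG']
renamed = [apply_mapping(grp, {'OCR-D-IMG': 'IMG'}) for grp in file_grps]
# renamed == ['IMG', 'OCR-D-SEG']
```

# The same lookup pattern would apply to file IDs and page IDs with their
# respective mappings.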

# ----------------------------------------------------------------------
# ocrd workspace backup
# ----------------------------------------------------------------------

@workspace_cli.group('backup')
@click.pass_context
def workspace_backup_cli(ctx): # pylint: disable=unused-argument
    """
    Backing up and restoring workspaces - dev edition
    """

@workspace_backup_cli.command('add')
@pass_workspace
def workspace_backup_add(ctx):
    """
    Create a new backup
    """
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
    backup_manager.add()

@workspace_backup_cli.command('list')
@pass_workspace
def workspace_backup_list(ctx):
    """
    List backups
    """
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
    for b in backup_manager.list():
        print(b)
699
700
@workspace_backup_cli.command('restore')
701
@click.option('-f', '--choose-first', help="Restore first matching version if more than one", is_flag=True)
702
@click.argument('bak') #, type=click.Path(dir_okay=False, readable=True, resolve_path=True))
703
@pass_workspace
704
def workspace_backup_restore(ctx, choose_first, bak):
705
    """
706
    Restore backup BAK
707
    """
708
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
709
    backup_manager.restore(bak, choose_first)
710
711
@workspace_backup_cli.command('undo')
712
@pass_workspace
713
def workspace_backup_undo(ctx):
714
    """
715
    Restore the last backup
716
    """
717
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
718
    backup_manager.undo()


# ----------------------------------------------------------------------
# ocrd workspace server
# ----------------------------------------------------------------------

@workspace_cli.group('server')
@pass_workspace
def workspace_serve_cli(ctx): # pylint: disable=unused-argument
    """Control a METS server for this workspace"""
    assert ctx.mets_server_url, "For METS server commands, you must provide '-U/--mets-server-url'"

@workspace_serve_cli.command('stop')
@pass_workspace
def workspace_serve_stop(ctx): # pylint: disable=unused-argument
    """Stop the METS server"""
    workspace = Workspace(
        ctx.resolver,
        directory=ctx.directory,
        mets_basename=ctx.mets_basename,
        mets_server_url=ctx.mets_server_url,
    )
    workspace.mets.stop()

@workspace_serve_cli.command('start')
@pass_workspace
def workspace_serve_start(ctx): # pylint: disable=unused-argument
    """
    Start a METS server

    (For TCP backend, pass a network interface to bind to as the '-U/--mets-server-url' parameter.)
    """
    OcrdMetsServer(
        workspace=Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename),
        url=ctx.mets_server_url,
    ).startup()