Passed
Pull Request — master (#1079)
by Konstantin
02:27
created

ocrd.cli.workspace.workspace_find()   D

Complexity

Conditions 13

Size

Total Lines 60
Code Lines 52

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 52
dl 0
loc 60
rs 4.2
c 0
b 0
f 0
cc 13
nop 9

How to fix: Long Method, Complexity, Many Parameters

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
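As a rough sketch of Extract Method on code shaped like workspace_find(), the field-name translation and the row formatting can each move into a small named helper. The helper names (normalize_output_fields, format_row, collect_rows) and the SimpleNamespace stand-ins are hypothetical and are not part of ocrd. The sketch also leaves out the download handling and the physical page lookup that the real function performs.

```python
from types import SimpleNamespace

# Mirrors the mapping at the top of workspace_find(); the constant name is an assumption
SNAKE_TO_CAMEL = {"file_id": "ID", "page_id": "pageId", "file_grp": "fileGrp"}


def normalize_output_fields(output_field):
    """Translate snake_case field names to the attribute names used on file objects."""
    return [SNAKE_TO_CAMEL.get(name, name) for name in output_field]


def format_row(f, output_field):
    """Build one output row for a file-like object; missing values become ''."""
    return [str(getattr(f, field, None) or '') for field in output_field]


def collect_rows(files, output_field):
    """Build the rows that would later be joined with tabs and printed."""
    fields = normalize_output_fields(output_field)
    return [format_row(f, fields) for f in files]


# Stand-ins for the objects returned by a file search (not the real OcrdFile class)
demo_files = [
    SimpleNamespace(ID="FILE_0001", fileGrp="OCR-D-IMG", local_filename=None),
    SimpleNamespace(ID="FILE_0002", fileGrp="OCR-D-IMG", local_filename="OCR-D-IMG/FILE_0002.png"),
]
rows = collect_rows(demo_files, ["file_id", "file_grp", "local_filename"])
for row in rows:
    print("\t".join(row))
```

Each helper now has a name that documents its purpose, and the loop body of the calling function shrinks to a single call.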

Complexity

Complex classes like ocrd.cli.workspace.workspace_find() (here actually a function) often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
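A minimal sketch of Extract Class, assuming a hypothetical owner class that used to carry three fields with a shared download_ prefix. Neither DownloadPolicy nor FindRequest exists in ocrd; they only illustrate how fields that belong together can move into their own component along with the logic that reads them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DownloadPolicy:
    """Hypothetical component built from fields that shared the 'download_' prefix."""
    enabled: bool = False
    undo: bool = False
    wait_seconds: int = 0

    def should_fetch(self, local_filename):
        """Fetch only when downloading is enabled and no local copy is recorded."""
        return self.enabled and not local_filename

    def should_forget(self, local_filename):
        """Drop the local reference only when undoing and a local copy is recorded."""
        return self.undo and bool(local_filename)


@dataclass
class FindRequest:
    """Hypothetical owner; before extraction it held download_enabled,
    download_undo and download_wait_seconds as direct fields."""
    file_grp: Optional[str] = None
    download: DownloadPolicy = field(default_factory=DownloadPolicy)


request = FindRequest(file_grp="OCR-D-IMG", download=DownloadPolicy(enabled=True, wait_seconds=2))
print(request.download.should_fetch(None))
print(request.download.should_fetch("OCR-D-IMG/FILE_0001.png"))
```

The decisions that were previously spread over several conditions now live next to the data they depend on, which removes branches from the calling code.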

Many Parameters

Methods with many parameters are not only hard to understand, but their parameters also often become inconsistent when you need more or different data.

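There are several approaches to avoid long parameter lists, including Replace Parameter with Method Call, Preserve Whole Object, and Introduce Parameter Object.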
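One such approach, usually called Introduce Parameter Object, is sketched here for the four filter values (file_grp, mimetype, page_id, file_id) that workspace_find() and prune_files() both accept. FileFilter, find and fake_search are hypothetical names used only for illustration; the real commands receive these values from click options.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class FileFilter:
    """Hypothetical parameter object bundling the four search filters."""
    file_grp: Optional[str] = None
    mimetype: Optional[str] = None
    page_id: Optional[str] = None
    file_id: Optional[str] = None

    def as_kwargs(self):
        """Return only the filters that were actually set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


def find(search, file_filter):
    """Call a search function with the filters unpacked as keyword arguments."""
    return search(**file_filter.as_kwargs())


def fake_search(**kwargs):
    """Stand-in for a real search routine; it just echoes the filters it received."""
    return sorted(kwargs.items())


result = find(fake_search, FileFilter(file_grp="OCR-D-IMG", mimetype="image/png"))
print(result)
```

Passing one object instead of four loose arguments keeps the filters consistent wherever they travel, and a new filter can be added in one place.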

1
"""
2
OCR-D CLI: workspace management
3
4
.. click:: ocrd.cli.workspace:workspace_cli
5
    :prog: ocrd workspace
6
    :nested: full
7
"""
8
import os
9
from os import getcwd
10
from os.path import relpath, exists, join, isabs
11
from pathlib import Path
12
from json import loads
13
import sys
14
from glob import glob   # XXX pathlib.Path.glob does not support absolute globs
15
import re
16
import time
17
18
import click
19
20
from ocrd import Resolver, Workspace, WorkspaceValidator, WorkspaceBackupManager
21
from ocrd_utils import getLogger, initLogging, pushd_popd, EXT_TO_MIME, safe_filename, parse_json_string_or_file
22
from ocrd.decorators import mets_find_options
23
from . import command_with_replaced_help
24
25
26
class WorkspaceCtx():
27
28
    def __init__(self, directory, mets_url, mets_basename, automatic_backup):
29
        self.log = getLogger('ocrd.cli.workspace')
30
        self.resolver = Resolver()
31
        if mets_basename:
32
            self.log.warning(DeprecationWarning('--mets-basename is deprecated. Use --mets/--directory instead.'))
33
        self.directory, self.mets_url, self.mets_basename = self.resolver.resolve_mets_arguments(directory, mets_url, mets_basename)
34
        self.automatic_backup = automatic_backup
35
36
pass_workspace = click.make_pass_decorator(WorkspaceCtx)
37
38
# ----------------------------------------------------------------------
39
# ocrd workspace
40
# ----------------------------------------------------------------------
41
42
@click.group("workspace")
43
@click.option('-d', '--directory', envvar='WORKSPACE_DIR', type=click.Path(file_okay=False), metavar='WORKSPACE_DIR', help='Changes the workspace folder location [default: METS_URL directory or .]')
44
@click.option('-M', '--mets-basename', default=None, help='METS file basename. Deprecated, use --mets/--directory')
45
@click.option('-m', '--mets', default=None, help='The path/URL of the METS file [default: WORKSPACE_DIR/mets.xml]', metavar="METS_URL")
46
@click.option('--backup', default=False, help="Backup mets.xml whenever it is saved.", is_flag=True)
47
@click.pass_context
48
def workspace_cli(ctx, directory, mets, mets_basename, backup):
49
    """
50
    Working with workspace
51
    """
52
    initLogging()
53
    ctx.obj = WorkspaceCtx(directory, mets_url=mets, mets_basename=mets_basename, automatic_backup=backup)
54
55
# ----------------------------------------------------------------------
56
# ocrd workspace validate
57
# ----------------------------------------------------------------------
58
59
@workspace_cli.command('validate', cls=command_with_replaced_help(
60
    (r' \[METS_URL\]', ''))) # XXX deprecated argument
61
@pass_workspace
62
@click.option('-a', '--download', is_flag=True, help="Download all files")
63
@click.option('-s', '--skip', help="Tests to skip", default=[], multiple=True, type=click.Choice(
64
    ['imagefilename', 'dimension', 'pixel_density', 'page', 'url', 'page_xsd', 'mets_fileid_page_pcgtsid',
65
     'mets_unique_identifier', 'mets_file_group_names', 'mets_files', 'mets_xsd']))
66
@click.option('--page-textequiv-consistency', '--page-strictness', help="How strict to check PAGE multi-level textequiv consistency", type=click.Choice(['strict', 'lax', 'fix', 'off']), default='strict')
67
@click.option('--page-coordinate-consistency', help="How fierce to check PAGE multi-level coordinate consistency", type=click.Choice(['poly', 'baseline', 'both', 'off']), default='poly')
68
@click.argument('mets_url', default=None, required=False)
69
def workspace_validate(ctx, mets_url, download, skip, page_textequiv_consistency, page_coordinate_consistency):
70
    """
71
    Validate a workspace
72
73
    METS_URL can be a URL, an absolute path or a path relative to $PWD.
74
    If not given, use --mets accordingly.
75
76
    Check that the METS and its referenced file contents
77
    abide by the OCR-D specifications.
78
    """
79
    LOG = getLogger('ocrd.cli.workspace.validate')
80
    if mets_url:
81
        LOG.warning(DeprecationWarning("Use 'ocrd workspace --mets METS validate' instead of argument 'METS_URL' ('%s')" % mets_url))
82
    else:
83
        mets_url = ctx.mets_url
84
    report = WorkspaceValidator.validate(
85
        ctx.resolver,
86
        mets_url,
87
        src_dir=ctx.directory,
88
        skip=skip,
89
        download=download,
90
        page_strictness=page_textequiv_consistency,
91
        page_coordinate_consistency=page_coordinate_consistency
92
    )
93
    print(report.to_xml())
94
    if not report.is_valid:
95
        sys.exit(128)
96
97
# ----------------------------------------------------------------------
98
# ocrd workspace clone
99
# ----------------------------------------------------------------------
100
101
@workspace_cli.command('clone', cls=command_with_replaced_help(
102
    (r' \[WORKSPACE_DIR\]', ''))) # XXX deprecated argument
103
@click.option('-f', '--clobber-mets', help="Overwrite existing METS file", default=False, is_flag=True)
104
@click.option('-a', '--download', is_flag=True, help="Download all files and change location in METS file after cloning")
105
@click.argument('mets_url')
106
# XXX deprecated
107
@click.argument('workspace_dir', default=None, required=False)
108
@pass_workspace
109
def workspace_clone(ctx, clobber_mets, download, mets_url, workspace_dir):
110
    """
111
    Create a workspace from METS_URL and return the directory
112
113
    METS_URL can be a URL, an absolute path or a path relative to $PWD.
114
    If METS_URL is not provided, use --mets accordingly.
115
    METS_URL can also be an OAI-PMH GetRecord URL wrapping a METS file.
116
    """
117
    LOG = getLogger('ocrd.cli.workspace.clone')
118
    if workspace_dir:
119
        LOG.warning(DeprecationWarning("Use 'ocrd workspace --directory DIR clone' instead of argument 'WORKSPACE_DIR' ('%s')" % workspace_dir))
120
        ctx.directory = workspace_dir
121
122
    workspace = ctx.resolver.workspace_from_url(
123
        mets_url,
124
        dst_dir=ctx.directory,
125
        mets_basename=ctx.mets_basename,
126
        clobber_mets=clobber_mets,
127
        download=download,
128
    )
129
    workspace.save_mets()
130
    print(workspace.directory)
131
132
# ----------------------------------------------------------------------
133
# ocrd workspace init
134
# ----------------------------------------------------------------------
135
136
@workspace_cli.command('init', cls=command_with_replaced_help(
137
    (r' \[DIRECTORY\]', ''))) # XXX deprecated argument
138
@click.option('-f', '--clobber-mets', help="Clobber mets.xml if it exists", is_flag=True, default=False)
139
# XXX deprecated
140
@click.argument('directory', default=None, required=False)
141
@pass_workspace
142
def workspace_init(ctx, clobber_mets, directory):
143
    """
144
    Create a workspace with an empty METS file in --directory.
145
146
    """
147
    LOG = getLogger('ocrd.cli.workspace.init')
148
    if directory:
149
        LOG.warning(DeprecationWarning("Use 'ocrd workspace --directory DIR init' instead of argument 'DIRECTORY' ('%s')" % directory))
150
        ctx.directory = directory
151
    workspace = ctx.resolver.workspace_from_nothing(
152
        directory=ctx.directory,
153
        mets_basename=ctx.mets_basename,
154
        clobber_mets=clobber_mets
155
    )
156
    workspace.save_mets()
157
    print(workspace.directory)
158
159
# ----------------------------------------------------------------------
160
# ocrd workspace add
161
# ----------------------------------------------------------------------
162
163
@workspace_cli.command('add')
164
@click.option('-G', '--file-grp', help="fileGrp USE", required=True, metavar='FILE_GRP')
165
@click.option('-i', '--file-id', help="ID for the file", required=True, metavar='FILE_ID')
166
@click.option('-m', '--mimetype', help="Media type of the file. Guessed from extension if not provided", required=False, metavar='TYPE')
167
@click.option('-g', '--page-id', help="ID of the physical page", metavar='PAGE_ID')
168
@click.option('-C', '--check-file-exists', help="Whether to ensure FNAME exists", is_flag=True, default=False)
169
@click.option('--ignore', help="Do not check whether file exists.", default=False, is_flag=True)
170
@click.option('--force', help="If file with ID already exists, replace it. No effect if --ignore is set.", default=False, is_flag=True)
171
@click.argument('fname', required=True)
172
@pass_workspace
173
def workspace_add_file(ctx, file_grp, file_id, mimetype, page_id, ignore, check_file_exists, force, fname):
174
    """
175
    Add a file or http(s) URL FNAME to METS in a workspace.
176
    If FNAME is not an http(s) URL and is not a workspace-local existing file, try to copy to workspace.
177
    """
178
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
179
180
    log = getLogger('ocrd.cli.workspace.add')
181
    if not mimetype:
182
        try:
183
            mimetype = EXT_TO_MIME[Path(fname).suffix]
184
            log.info("Guessed mimetype to be %s" % mimetype)
185
        except KeyError:
186
            log.error("Cannot guess mimetype from extension '%s' for '%s'. Set --mimetype explicitly" % (Path(fname).suffix, fname))
187
188
    log.debug("Adding '%s'", fname)
189
    local_filename = None
190
    if not (fname.startswith('http://') or fname.startswith('https://')):
191
        if not fname.startswith(ctx.directory):
192
            if not isabs(fname) and exists(join(ctx.directory, fname)):
193
                fname = join(ctx.directory, fname)
194
            else:
195
                log.debug("File '%s' is not in workspace, copying", fname)
196
                try:
197
                    fname = ctx.resolver.download_to_directory(ctx.directory, fname, subdir=file_grp)
198
                except FileNotFoundError:
199
                    if check_file_exists:
200
                        log.error("File '%s' does not exist, halt execution!" % fname)
201
                        sys.exit(1)
202
        if check_file_exists and not exists(fname):
203
            log.error("File '%s' does not exist, halt execution!" % fname)
204
            sys.exit(1)
205
        if fname.startswith(ctx.directory):
206
            fname = relpath(fname, ctx.directory)
207
        local_filename = fname
208
209
    if not page_id:
210
        log.warning("You did not provide '--page-id/-g', so the file you added is not linked to a specific page.")
211
    workspace.add_file(file_grp, file_id=file_id, mimetype=mimetype, page_id=page_id, force=force, ignore=ignore, local_filename=local_filename, url=fname)
212
    workspace.save_mets()
213
214
# ----------------------------------------------------------------------
215
# ocrd workspace bulk-add
216
# ----------------------------------------------------------------------
217
218
# pylint: disable=broad-except
219
@workspace_cli.command('bulk-add')
220
@click.option('-r', '--regex', help="Regular expression matching the FILE_GLOB filesystem paths to define named captures usable in the other parameters", required=True)
221
@click.option('-m', '--mimetype', help="Media type of the file. If not provided, guess from filename", required=False)
222
@click.option('-g', '--page-id', help="physical page ID of the file", required=False)
223
@click.option('-i', '--file-id', help="ID of the file. If not provided, derive from fileGrp and filename", required=False)
224
@click.option('-u', '--url', help="local filesystem path in the workspace directory (copied from source file if different)", required=False)
225
@click.option('-G', '--file-grp', help="File group USE of the file", required=True)
226
@click.option('-n', '--dry-run', help="Don't actually do anything to the METS or filesystem, just preview", default=False, is_flag=True)
227
@click.option('-S', '--source-path', 'src_path_option', help="File path to copy from (if different from FILE_GLOB values)", required=False)
228
@click.option('-I', '--ignore', help="Disable checking for existing file entries (faster)", default=False, is_flag=True)
229
@click.option('-f', '--force', help="Replace existing file entries with the same ID (no effect when --ignore is set, too)", default=False, is_flag=True)
230
@click.option('-s', '--skip', help="Skip files not matching --regex (instead of failing)", default=False, is_flag=True)
231
@click.argument('file_glob', nargs=-1, required=True)
232
@pass_workspace
233
def workspace_cli_bulk_add(ctx, regex, mimetype, page_id, file_id, url, file_grp, dry_run, file_glob, src_path_option, ignore, force, skip):
234
    """
235
    Add files in bulk to an OCR-D workspace.
236
237
    FILE_GLOB can either be a shell glob expression to match file names,
238
    or a list of expressions or '-', in which case expressions are read from STDIN.
239
240
    After globbing, --regex is matched against each expression resulting from FILE_GLOB, and can
241
    define named groups reusable in the --page-id, --file-id, --mimetype, --url, --source-path and
242
    --file-grp options, e.g. by referencing the group name 'grp' from the regex as '{{ grp }}'.
243
244
    If the FILE_GLOB expressions do not denote the file names themselves
245
    (but arbitrary strings for --regex matching), then use --source-path to set
246
    the actual file paths to use. (This could involve fixed strings or group references.)
247
248
    \b
249
    Examples:
250
        ocrd workspace bulk-add \\
251
                --regex '(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.[^.]+' \\
252
                --page-id 'PHYS_{{ pageid }}' \\
253
                --file-grp "{{ fileGrp }}" \\
254
                path/to/files/*/*.*
255
        \b
256
        echo "path/to/src/file.xml SEG/page_p0001.xml" \\
257
        | ocrd workspace bulk-add \\
258
                --regex '(?P<src>.*?) (?P<fileGrp>.+?)/page_(?P<pageid>.*)\.(?P<ext>[^\.]*)' \\
259
                --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \\
260
                --page-id 'PHYS_{{ pageid }}' \\
261
                --file-grp "{{ fileGrp }}" \\
262
                --url '{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}' \\
263
                -
264
265
        \b
266
        { echo PHYS_0001 BIN FILE_0001_BIN.IMG-wolf BIN/FILE_0001_BIN.IMG-wolf.png; \\
267
          echo PHYS_0001 BIN FILE_0001_BIN BIN/FILE_0001_BIN.xml; \\
268
          echo PHYS_0002 BIN FILE_0002_BIN.IMG-wolf BIN/FILE_0002_BIN.IMG-wolf.png; \\
269
          echo PHYS_0002 BIN FILE_0002_BIN BIN/FILE_0002_BIN.xml; \\
270
        } | ocrd workspace bulk-add -r '(?P<pageid>.*) (?P<filegrp>.*) (?P<fileid>.*) (?P<url>.*)' \\
271
          -G '{{ filegrp }}' -g '{{ pageid }}' -i '{{ fileid }}' -S '{{ url }}' -
272
    """
273
    log = getLogger('ocrd.cli.workspace.bulk-add') # pylint: disable=redefined-outer-name
274
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
275
276
    try:
277
        pat = re.compile(regex)
278
    except Exception as e:
279
        log.error("Invalid regex: %s" % e)
280
        sys.exit(1)
281
282
    file_paths = []
283
    from_stdin = file_glob == ('-',)
284
    if from_stdin:
285
        file_paths += [Path(x.strip('\n')) for x in sys.stdin.readlines()]
286
    else:
287
        for fglob in file_glob:
288
            expanded = glob(fglob)
289
            if not expanded:
290
                file_paths += [Path(fglob)]
291
            else:
292
                file_paths += [Path(x) for x in expanded]
293
294
    for i, file_path in enumerate(file_paths):
295
        log.info("[%4d/%d] %s" % (i + 1, len(file_paths), file_path))
296
297
        # match regex
298
        m = pat.match(str(file_path))
299
        if not m:
300
            if skip:
301
                continue
302
            log.error("File '%s' not matched by regex: '%s'" % (file_path, regex))
303
            sys.exit(1)
304
        group_dict = m.groupdict()
305
306
        # set up file info
307
        file_dict = {'url': url, 'mimetype': mimetype, 'file_id': file_id, 'page_id': page_id, 'file_grp': file_grp}
308
309
        # Flag to track whether 'url' should be 'src'
310
        url_is_src = False
311
312
        # expand templates
313
        for param_name in file_dict:
314
            if not file_dict[param_name]:
315
                if param_name == 'url':
316
                    url_is_src = True
317
                    continue
318
                elif param_name in ['mimetype', 'file_id']:
319
                    # auto-filled below once the other
320
                    # replacements have happened
321
                    continue
322
                raise ValueError(f"OcrdFile attribute '{param_name}' unset ({file_dict})")
323
            for group_name in group_dict:
324
                file_dict[param_name] = file_dict[param_name].replace('{{ %s }}' % group_name, group_dict[group_name])
325
326
        # Where to copy from
327
        if src_path_option:
328
            src_path = src_path_option
329
            for group_name in group_dict:
330
                src_path = src_path.replace('{{ %s }}' % group_name, group_dict[group_name])
331
            srcpath = Path(src_path)
332
        else:
333
            srcpath = file_path
334
335
        # derive --file-id from filename if --file-id is not explicitly set
336
        if not file_id:
337
            id_field = srcpath.stem if file_path != srcpath else file_path.stem
338
            file_dict['file_id'] = safe_filename('%s_%s' % (file_dict['file_grp'], id_field))
339
        if not mimetype:
340
            try:
341
                file_dict['mimetype'] = EXT_TO_MIME[srcpath.suffix]
342
            except KeyError:
343
                log.error("Cannot guess MIME type from extension '%s' for '%s'. Set --mimetype explicitly" % (srcpath.suffix, srcpath))
344
345
        # copy files if src != url
346
        if url_is_src:
347
            file_dict['url'] = str(srcpath)
348
        else:
349
            destpath = Path(workspace.directory, file_dict['url'])
350
            if srcpath != destpath and not destpath.exists():
351
                log.info("cp '%s' '%s'", srcpath, destpath)
352
                if not dry_run:
353
                    if not destpath.parent.is_dir():
354
                        destpath.parent.mkdir()
355
                    destpath.write_bytes(srcpath.read_bytes())
356
357
        # Add to workspace (or not)
358
        fileGrp = file_dict.pop('file_grp')
359
        if dry_run:
360
            log.info('workspace.add_file(%s)' % file_dict)
361
        else:
362
            workspace.add_file(fileGrp, ignore=ignore, force=force, **file_dict)
363
364
    # save changes to disk
365
    workspace.save_mets()
366
367
368
# ----------------------------------------------------------------------
369
# ocrd workspace find
370
# ----------------------------------------------------------------------
371
372
@workspace_cli.command('find')
373
@mets_find_options
374
@click.option('-k', '--output-field', help="Output field. Repeat for multiple fields, will be joined with tab",
375
        default=['local_filename'],
376
        multiple=True,
377
        type=click.Choice([
378
            'url',
379
            'mimetype',
380
            'page_id',
381
            'pageId',
382
            'file_id',
383
            'ID',
384
            'file_grp',
385
            'fileGrp',
386
            'basename',
387
            'basename_without_extension',
388
            'local_filename',
389
        ]))
390
@click.option('--download', is_flag=True, help="Download found files to workspace and change location in METS file ")
391
@click.option('--undo-download', is_flag=True, help="Remove all downloaded files from the METS")
392
@click.option('--wait', type=int, default=0, help="Wait this many seconds between download requests")
393
@pass_workspace
394
def workspace_find(ctx, file_grp, mimetype, page_id, file_id, output_field, download, undo_download, wait):
395
    """
396
    Find files.
397
398
    (If any ``FILTER`` starts with ``//``, then its remainder
399
     will be interpreted as a regular expression.)
400
    """
401
    snake_to_camel = {"file_id": "ID", "page_id": "pageId", "file_grp": "fileGrp"}
402
    output_field = [snake_to_camel.get(x, x) for x in output_field]
403
    modified_mets = False
404
    ret = list()
405
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
406
    for f in workspace.find_files(
407
            file_id=file_id,
408
            file_grp=file_grp,
409
            mimetype=mimetype,
410
            page_id=page_id,
411
        ):
412
        if download and not f.local_filename:
413
            workspace.download_file(f)
414
            modified_mets = True
415
            if wait:
416
                time.sleep(wait)
417
        if undo_download and f.local_filename:
418
            f.local_filename = None
419
            modified_mets = True
420
        ret.append([f.ID if field == 'pageId' else str(getattr(f, field) or '')
421
                    for field in output_field])
422
    if modified_mets:
423
        workspace.save_mets()
424
    if 'pageId' in output_field:
425
        idx = output_field.index('pageId')
426
        fileIds = list(map(lambda fields: fields[idx], ret))
The variable idx does not seem to be defined in case 'pageId' in output_field on line 424 is False. Are you sure this can never be the case?
427
        pages = workspace.mets.get_physical_pages(for_fileIds=fileIds)
428
        for fields, page in zip(ret, pages):
429
            fields[idx] = page or ''
430
    for fields in ret:
431
        print('\t'.join(fields))
432
433
# ----------------------------------------------------------------------
434
# ocrd workspace remove
435
# ----------------------------------------------------------------------
436
437
@workspace_cli.command('remove')
438
@click.option('-k', '--keep-file', help="Do not delete file from file system", default=False, is_flag=True)
439
@click.option('-f', '--force', help="Continue even if mets:file or file on file system does not exist", default=False, is_flag=True)
440
@click.argument('ID', nargs=-1)
441
@pass_workspace
442
def workspace_remove_file(ctx, id, force, keep_file):  # pylint: disable=redefined-builtin
443
    """
444
    Delete files (given by their ID attribute ``ID``).
445
446
    (If any ``ID`` starts with ``//``, then its remainder
447
     will be interpreted as a regular expression.)
448
    """
449
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
450
    for i in id:
451
        workspace.remove_file(i, force=force, keep_file=keep_file)
452
    workspace.save_mets()
453
454
455
# ----------------------------------------------------------------------
456
# ocrd workspace rename-group
457
# ----------------------------------------------------------------------
458
459
@workspace_cli.command('rename-group')
460
@click.argument('OLD', nargs=1)
461
@click.argument('NEW', nargs=1)
462
@pass_workspace
463
def rename_group(ctx, old, new):
464
    """
465
    Rename fileGrp (USE attribute ``OLD`` to ``NEW``).
466
    """
467
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
468
    workspace.rename_file_group(old, new)
469
    workspace.save_mets()
470
471
# ----------------------------------------------------------------------
472
# ocrd workspace remove-group
473
# ----------------------------------------------------------------------
474
475
@workspace_cli.command('remove-group')
476
@click.option('-r', '--recursive', help="Delete any files in the group before the group itself", default=False, is_flag=True)
477
@click.option('-f', '--force', help="Continue removing even if group or containing files not found in METS", default=False, is_flag=True)
478
@click.option('-k', '--keep-files', help="Do not delete files from file system", default=False, is_flag=True)
479
@click.argument('GROUP', nargs=-1)
480
@pass_workspace
481
def remove_group(ctx, group, recursive, force, keep_files):
482
    """
483
    Delete fileGrps (given by their USE attribute ``GROUP``).
484
485
    (If any ``GROUP`` starts with ``//``, then its remainder
486
     will be interpreted as a regular expression.)
487
    """
488
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
489
    for g in group:
490
        workspace.remove_file_group(g, recursive=recursive, force=force, keep_files=keep_files)
491
    workspace.save_mets()
492
493
# ----------------------------------------------------------------------
494
# ocrd workspace prune-files
495
# ----------------------------------------------------------------------
496
497
@workspace_cli.command('prune-files')
498
@click.option('-G', '--file-grp', help="fileGrp USE", metavar='FILTER')
499
@click.option('-m', '--mimetype', help="Media type to look for", metavar='FILTER')
500
@click.option('-g', '--page-id', help="Page ID", metavar='FILTER')
501
@click.option('-i', '--file-id', help="ID", metavar='FILTER')
502
@pass_workspace
503
def prune_files(ctx, file_grp, mimetype, page_id, file_id):
504
    """
505
    Removes mets:files that point to non-existing local files
506
507
    (If any ``FILTER`` starts with ``//``, then its remainder
508
     will be interpreted as a regular expression.)
509
    """
510
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
511
    with pushd_popd(workspace.directory):
512
        for f in workspace.find_files(
513
            file_id=file_id,
514
            file_grp=file_grp,
515
            mimetype=mimetype,
516
            page_id=page_id,
517
        ):
518
            try:
519
                if not f.local_filename or not exists(f.local_filename):
520
                    workspace.mets.remove_file(f.ID)
521
            except Exception as e:
522
                ctx.log.exception("Error removing %s: %s", f, e)
523
                raise
524
        workspace.save_mets()
525
526
# ----------------------------------------------------------------------
527
# ocrd workspace list-group
528
# ----------------------------------------------------------------------
529
530
@workspace_cli.command('list-group')
531
@pass_workspace
532
def list_groups(ctx):
533
    """
534
    List fileGrp USE attributes
535
    """
536
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
537
    print("\n".join(workspace.mets.file_groups))
538
539
# ----------------------------------------------------------------------
540
# ocrd workspace list-page
541
# ----------------------------------------------------------------------
542
543
@workspace_cli.command('list-page')
544
@pass_workspace
545
def list_pages(ctx):
546
    """
547
    List physical page IDs
548
    """
549
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
550
    print("\n".join(workspace.mets.physical_pages))
551
552
# ----------------------------------------------------------------------
553
# ocrd workspace get-id
554
# ----------------------------------------------------------------------
555
556
@workspace_cli.command('get-id')
557
@pass_workspace
558
def get_id(ctx):
559
    """
560
    Get METS id if any
561
    """
562
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename)
563
    ID = workspace.mets.unique_identifier
564
    if ID:
565
        print(ID)
566
567
# ----------------------------------------------------------------------
568
# ocrd workspace set-id
569
# ----------------------------------------------------------------------
570
571
@workspace_cli.command('set-id')
572
@click.argument('ID')
573
@pass_workspace
574
def set_id(ctx, id):   # pylint: disable=redefined-builtin
575
    """
576
    Set METS ID.
577
578
    If one of the supported identifier mechanisms is used, will set this identifier.
579
580
    Otherwise will create a new <mods:identifier type="purl">{{ ID }}</mods:identifier>.
581
    """
582
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
583
    workspace.mets.unique_identifier = id
584
    workspace.save_mets()
585
586
# ----------------------------------------------------------------------
587
# ocrd workspace merge
588
# ----------------------------------------------------------------------
589
590
def _handle_json_option(ctx, param, value):
591
    return parse_json_string_or_file(value) if value else None
592
593
@workspace_cli.command('merge')
594
@click.argument('METS_PATH')
595
@click.option('--overwrite/--no-overwrite', is_flag=True, default=False, help="Overwrite on-disk file in case of file name conflicts with data from METS_PATH")
596
@click.option('--force/--no-force', is_flag=True, default=False, help="Overwrite mets:file from --mets with mets:file from METS_PATH if IDs clash")
597
@click.option('--copy-files/--no-copy-files', is_flag=True, help="Copy files as well", default=True, show_default=True)
598
@click.option('--fileGrp-mapping', help="JSON object mapping src to dest fileGrp", callback=_handle_json_option)
599
@click.option('--fileId-mapping', help="JSON object mapping src to dest file ID", callback=_handle_json_option)
600
@click.option('--pageId-mapping', help="JSON object mapping src to dest page ID", callback=_handle_json_option)
601
@mets_find_options
602
@pass_workspace
603
def merge(ctx, overwrite, force, copy_files, filegrp_mapping, fileid_mapping, pageid_mapping, file_grp, file_id, page_id, mimetype, mets_path):   # pylint: disable=redefined-builtin
604
    """
605
    Merges this workspace with the workspace that contains ``METS_PATH``
606
607
    Pass a JSON string or file to ``--fileGrp-mapping``, ``--fileId-mapping`` or ``--pageId-mapping``
608
    in order to rename all fileGrp, file ID or page ID values, respectively.
609
610
    The ``--file-id``, ``--page-id``, ``--mimetype`` and ``--file-grp`` options have
611
    the same semantics as in ``ocrd workspace find``, see ``ocrd workspace find --help``
612
    for an explanation.
613
    """
614
    mets_path = Path(mets_path)
615
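    # --fileGrp-mapping has already been parsed into a dict by _handle_json_option,
616
    # so it must not be passed through json.loads() again (that would raise TypeError)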
617
    workspace = Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup)
618
    other_workspace = Workspace(ctx.resolver, directory=str(mets_path.parent), mets_basename=str(mets_path.name))
619
    workspace.merge(
620
        other_workspace,
621
        force=force,
622
        overwrite=overwrite,
623
        copy_files=copy_files,
624
        fileGrp_mapping=filegrp_mapping,
625
        fileId_mapping=fileid_mapping,
626
        pageId_mapping=pageid_mapping,
627
        file_grp=file_grp,
628
        file_id=file_id,
629
        page_id=page_id,
630
        mimetype=mimetype
631
    )
632
    workspace.save_mets()
633
634
# ----------------------------------------------------------------------
635
# ocrd workspace backup
636
# ----------------------------------------------------------------------
637
638
@workspace_cli.group('backup')
639
@click.pass_context
640
def workspace_backup_cli(ctx): # pylint: disable=unused-argument
641
    """
642
    Backing up and restoring workspaces - dev edition
643
    """
644
645
@workspace_backup_cli.command('add')
646
@pass_workspace
647
def workspace_backup_add(ctx):
648
    """
649
    Create a new backup
650
    """
651
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
652
    backup_manager.add()
653
654
@workspace_backup_cli.command('list')
655
@pass_workspace
656
def workspace_backup_list(ctx):
657
    """
658
    List backups
659
    """
660
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
661
    for b in backup_manager.list():
662
        print(b)
663
664
@workspace_backup_cli.command('restore')
665
@click.option('-f', '--choose-first', help="Restore first matching version if more than one", is_flag=True)
666
@click.argument('bak') #, type=click.Path(dir_okay=False, readable=True, resolve_path=True))
667
@pass_workspace
668
def workspace_backup_restore(ctx, choose_first, bak):
669
    """
670
    Restore backup BAK
671
    """
672
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
673
    backup_manager.restore(bak, choose_first)
674
675
@workspace_backup_cli.command('undo')
676
@pass_workspace
677
def workspace_backup_undo(ctx):
678
    """
679
    Restore the last backup
680
    """
681
    backup_manager = WorkspaceBackupManager(Workspace(ctx.resolver, directory=ctx.directory, mets_basename=ctx.mets_basename, automatic_backup=ctx.automatic_backup))
682
    backup_manager.undo()
683