Passed
Pull Request — master (#123)
by unknown, created 10:42

MetsDocument::_getPhysicalStructure()   F

Complexity: Conditions 21, Paths 130
Size: Total Lines 57, Code Lines 35
Duplication: Lines 0, Ratio 0 %
Importance: Changes 0

Metric   Value
cc   21
eloc   35
c   0
b   0
f   0
nc   130
nop   0
dl   0
loc   57
rs   3.9166

1 Method

Rating   Name   Duplication   Size   Complexity
A   MetsDocument::magicGetDmdSec()   0   4   1

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
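As a minimal sketch of that advice, the snippet below shows one function whose body is divided by comments, followed by the same logic with each commented step extracted into its own named function. All function names and the `$pages` data shape are hypothetical and chosen only for illustration; none of them come from the inspected `MetsDocument` class.

```php
<?php

declare(strict_types=1);

// Hypothetical example; not taken from the inspected class.

// Before: one function whose steps are separated only by comments.
function summarizeBefore(array $pages): string
{
    // Keep only pages that have a label.
    $labelled = [];
    foreach ($pages as $page) {
        if (!empty($page['label'])) {
            $labelled[] = $page;
        }
    }
    // Build a comma-separated list of the trimmed labels.
    $labels = [];
    foreach ($labelled as $page) {
        $labels[] = trim($page['label']);
    }
    return implode(', ', $labels);
}

// After: each comment became the name of a small function.
function filterLabelledPages(array $pages): array
{
    return array_values(array_filter($pages, fn (array $page): bool => !empty($page['label'])));
}

function joinLabels(array $pages): string
{
    return implode(', ', array_map(fn (array $page): string => trim($page['label']), $pages));
}

function summarizeAfter(array $pages): string
{
    return joinLabels(filterLabelledPages($pages));
}
```

Both variants return the same string for the same input; the second one simply lets the function names carry what the comments used to say.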

Commonly applied refactorings include:

<?php

/**
 * (c) Kitodo. Key to digital objects e.V. <[email protected]>
 *
 * This file is part of the Kitodo and TYPO3 projects.
 *
 * @license GNU General Public License version 3 or later.
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 */

namespace Kitodo\Dlf\Common;

use TYPO3\CMS\Core\Configuration\ExtensionConfiguration;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Database\Query\Restriction\HiddenRestriction;
use TYPO3\CMS\Core\Log\LogManager;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use Ubl\Iiif\Tools\IiifHelper;
use Ubl\Iiif\Services\AbstractImageService;

/**
 * MetsDocument class for the 'dlf' extension.
 *
 * @package TYPO3
 * @subpackage dlf
 *
 * @access public
 *
 * @property int $cPid this holds the PID for the configuration
 * @property-read array $formats this holds the configuration for all supported metadata encodings
 * @property bool $formatsLoaded flag with information if the available metadata formats are loaded
 * @property-read bool $hasFulltext flag with information if there are any fulltext files available
 * @property array $lastSearchedPhysicalPage the last searched logical and physical page
 * @property array $logicalUnits this holds the logical units
 * @property-read array $metadataArray this holds the documents' parsed metadata array
 * @property bool $metadataArrayLoaded flag with information if the metadata array is loaded
 * @property-read int $numPages this holds the total number of pages
 * @property-read int $parentId this holds the UID of the parent document or zero if not multi-volumed
 * @property-read array $physicalStructure this holds the physical structure
 * @property-read array $physicalStructureInfo this holds the physical structure metadata
 * @property bool $physicalStructureLoaded flag with information if the physical structure is loaded
 * @property-read int $pid this holds the PID of the document or zero if not in database
 * @property array $rawTextArray this holds the documents' raw text pages with their corresponding structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
 * @property-read bool $ready Is the document instantiated successfully?
 * @property-read string $recordId the METS file's / IIIF manifest's record identifier
 * @property-read int $rootId this holds the UID of the root document or zero if not multi-volumed
 * @property-read array $smLinks this holds the smLinks between logical and physical structMap
 * @property bool $smLinksLoaded flag with information if the smLinks are loaded
 * @property-read array $tableOfContents this holds the logical structure
 * @property bool $tableOfContentsLoaded flag with information if the table of contents is loaded
 * @property-read string $thumbnail this holds the document's thumbnail location
 * @property bool $thumbnailLoaded flag with information if the thumbnail is loaded
 * @property-read string $toplevelId this holds the toplevel structure's "@ID" (METS) or the manifest's "@id" (IIIF)
 * @property \SimpleXMLElement $xml this holds the whole XML file as \SimpleXMLElement object
 * @property-read array $mdSec associative array of METS metadata sections indexed by their IDs.
 * @property bool $mdSecLoaded flag with information if the array of METS metadata sections is loaded
 * @property-read array $dmdSec subset of `$mdSec` storing only the dmdSec entries; kept for compatibility.
 * @property-read array $fileGrps this holds the file ID -> USE concordance
 * @property bool $fileGrpsLoaded flag with information if file groups array is loaded
 * @property-read array $fileInfos additional information about files (e.g., ADMID), indexed by ID.
 * @property-read \SimpleXMLElement $mets this holds the XML file's METS part as \SimpleXMLElement object
 * @property-read string $parentHref URL of the parent document (determined via mptr element), or empty string if none is available
 */
final class MetsDocument extends AbstractDocument
{
    /**
     * @access protected
     * @var string[] Subsections / tags that may occur within `<mets:amdSec>`
     *
     * @link https://www.loc.gov/standards/mets/docs/mets.v1-9.html#amdSec
     * @link https://www.loc.gov/standards/mets/docs/mets.v1-9.html#mdSecType
     */
    protected const ALLOWED_AMD_SEC = ['techMD', 'rightsMD', 'sourceMD', 'digiprovMD'];

    /**
     * @access protected
     * @var string This holds the whole XML file as string for serialization purposes
     *
     * @see __sleep() / __wakeup()
     */
    protected string $asXML = '';

    /**
     * @access protected
     * @var array This maps the ID of each amdSec to the IDs of its children (techMD etc.). When an ADMID references an amdSec instead of techMD etc., this is used to iterate the child elements.
     */
    protected array $amdSecChildIds = [];

    /**
     * @access protected
     * @var array Associative array of METS metadata sections indexed by their IDs.
     */
    protected array $mdSec = [];

    /**
     * @access protected
     * @var bool Are the METS file's metadata sections loaded?
     *
     * @see MetsDocument::$mdSec
     */
    protected bool $mdSecLoaded = false;

    /**
     * @access protected
     * @var array Subset of $mdSec storing only the dmdSec entries; kept for compatibility.
     */
    protected array $dmdSec = [];

    /**
     * @access protected
     * @var array This holds the file ID -> USE concordance
     *
     * @see magicGetFileGrps()
     */
    protected array $fileGrps = [];

    /**
     * @access protected
     * @var bool Are the image file groups loaded?
     *
     * @see $fileGrps
     */
    protected bool $fileGrpsLoaded = false;

    /**
     * @access protected
     * @var \SimpleXMLElement This holds the XML file's METS part as \SimpleXMLElement object
     */
    protected \SimpleXMLElement $mets;

    /**
     * @access protected
     * @var string URL of the parent document (determined via mptr element), or empty string if none is available
     */
    protected string $parentHref = '';

    /**
     * @access protected
     * @var array the extension settings
     */
    protected array $settings = [];

    /**
     * This adds metadata from METS structural map to metadata array.
     *
     * @access public
     *
     * @param array &$metadata The metadata array to extend
     * @param string $id The "@ID" attribute of the logical structure node
     *
     * @return void
     */
    public function addMetadataFromMets(array &$metadata, string $id): void
    {
        $details = $this->getLogicalStructure($id);
        if (!empty($details)) {
            $metadata['mets_order'][0] = $details['order'];
            $metadata['mets_label'][0] = $details['label'];
            $metadata['mets_orderlabel'][0] = $details['orderlabel'];
        }
    }

    /**
     * @see AbstractDocument::establishRecordId()
     */
    protected function establishRecordId(int $pid): void
    {
        // Check for METS object @ID.
        if (!empty($this->mets['OBJID'])) {
            $this->recordId = (string) $this->mets['OBJID'];
Bug: The property recordId is declared read-only in Kitodo\Dlf\Common\MetsDocument.
        }
        // Get hook objects.
        $hookObjects = Helper::getHookObjects('Classes/Common/MetsDocument.php');
        // Apply hooks.
        foreach ($hookObjects as $hookObj) {
            if (method_exists($hookObj, 'postProcessRecordId')) {
                $hookObj->postProcessRecordId($this->xml, $this->recordId);
            }
        }
    }

    /**
     * @see AbstractDocument::getDownloadLocation()
     */
    public function getDownloadLocation(string $id): string
    {
        $file = $this->getFileInfo($id);
        if ($file['mimeType'] === 'application/vnd.kitodo.iiif') {
            // The last "/" of a string sits at index strlen() - 1 at most, so compare against that to detect a trailing slash.
            $file['location'] = (strrpos($file['location'], 'info.json') === strlen($file['location']) - 9) ? $file['location'] : (strrpos($file['location'], '/') === strlen($file['location']) - 1 ? $file['location'] . 'info.json' : $file['location'] . '/info.json');
            $conf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'iiif');
            IiifHelper::setUrlReader(IiifUrlReader::getInstance());
            IiifHelper::setMaxThumbnailHeight($conf['thumbnailHeight']);
            IiifHelper::setMaxThumbnailWidth($conf['thumbnailWidth']);
            $service = IiifHelper::loadIiifResource($file['location']);
            if ($service instanceof AbstractImageService) {
                return $service->getImageUrl();
            }
        } elseif ($file['mimeType'] === 'application/vnd.netfpx') {
            $baseURL = $file['location'] . (strpos($file['location'], '?') === false ? '?' : '');
            // TODO CVT is an optional IIP server capability; in theory, capabilities should be determined in the object request with '&obj=IIP-server'
            return $baseURL . '&CVT=jpeg';
        }
        return $file['location'];
    }

    /**
     * {@inheritDoc}
     * @see AbstractDocument::getFileInfo()
     */
    public function getFileInfo($id): ?array
    {
        $this->magicGetFileGrps();

        if (isset($this->fileInfos[$id]) && empty($this->fileInfos[$id]['location'])) {
            $this->fileInfos[$id]['location'] = $this->getFileLocation($id);
Bug: The property fileInfos is declared read-only in Kitodo\Dlf\Common\MetsDocument.
        }

        if (isset($this->fileInfos[$id]) && empty($this->fileInfos[$id]['mimeType'])) {
            $this->fileInfos[$id]['mimeType'] = $this->getFileMimeType($id);
        }

        return $this->fileInfos[$id] ?? null;
    }

    /**
     * @see AbstractDocument::getFileLocation()
     */
    public function getFileLocation(string $id): string
    {
        $location = $this->mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/mets:FLocat[@LOCTYPE="URL"]');
        if (
            !empty($id)
            && !empty($location)
        ) {
            return (string) $location[0]->attributes('http://www.w3.org/1999/xlink')->href;
        } else {
            $this->logger->warning('There is no file node with @ID "' . $id . '"');
            return '';
        }
    }

    /**
     * @see AbstractDocument::getFileMimeType()
     */
    public function getFileMimeType(string $id): string
    {
        $mimetype = $this->mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/@MIMETYPE');
        if (
            !empty($id)
            && !empty($mimetype)
        ) {
            return (string) $mimetype[0];
        } else {
            $this->logger->warning('There is no file node with @ID "' . $id . '" or no MIME type specified');
            return '';
        }
    }

    /**
     * @see AbstractDocument::getLogicalStructure()
     */
    public function getLogicalStructure(string $id, bool $recursive = false): array
    {
        $details = [];
        // Is the requested logical unit already loaded?
        if (
            !$recursive
            && !empty($this->logicalUnits[$id])
        ) {
            // Yes. Return it.
            return $this->logicalUnits[$id];
        } elseif (!empty($id)) {
            // Get specified logical unit.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]');
        } else {
            // Get all logical units at top level.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]/mets:div');
        }
        if (!empty($divs)) {
            if (!$recursive) {
                // Get the details for the first xpath hit.
                $details = $this->getLogicalStructureInfo($divs[0]);
            } else {
                // Walk the logical structure recursively and fill the whole table of contents.
                foreach ($divs as $div) {
                    $this->tableOfContents[] = $this->getLogicalStructureInfo($div, $recursive);
Bug: The property tableOfContents is declared read-only in Kitodo\Dlf\Common\MetsDocument.
                }
            }
        }
        return $details;
    }

    /**
     * This gets details about a logical structure element
     *
     * @access protected
     *
     * @param \SimpleXMLElement $structure The logical structure node
     * @param bool $recursive Whether to include the child elements
     *
     * @return array Array of the element's id, label, type and physical page indexes/mptr link
     */
    protected function getLogicalStructureInfo(\SimpleXMLElement $structure, bool $recursive = false): array
    {
        $attributes = $structure->attributes();

        // Extract identity information.
        $details = [
            'id' => (string) $attributes['ID'],
            'dmdId' => isset($attributes['DMDID']) ? (string) $attributes['DMDID'] : '',
            'admId' => isset($attributes['ADMID']) ? (string) $attributes['ADMID'] : '',
            'order' => isset($attributes['ORDER']) ? (string) $attributes['ORDER'] : '',
            'label' => isset($attributes['LABEL']) ? (string) $attributes['LABEL'] : '',
            'orderlabel' => isset($attributes['ORDERLABEL']) ? (string) $attributes['ORDERLABEL'] : '',
            'contentIds' => isset($attributes['CONTENTIDS']) ? (string) $attributes['CONTENTIDS'] : '',
            'volume' => '',
            'year' => '',
            'pagination' => '',
            'type' => isset($attributes['TYPE']) ? (string) $attributes['TYPE'] : '',
            'description' => '',
            'thumbnailId' => null,
            'files' => [],
        ];

        // Set volume and year information only if no label is set and this is the toplevel structure element.
        if (empty($details['label']) && empty($details['orderlabel'])) {
            $metadata = $this->getMetadata($details['id']);
            $details['volume'] = $metadata['volume'][0] ?? '';
            $details['year'] = $metadata['year'][0] ?? '';
        }

        // add description for 3D objects
        if ($details['type'] == 'object') {
            $metadata = $this->getMetadata($details['id']);
            $details['description'] = $metadata['description'][0] ?? '';
        }

        // Load smLinks.
        $this->magicGetSmLinks();
        // Load physical structure.
        $this->magicGetPhysicalStructure();
        // Get the physical page or external file this structure element is pointing at.
        // Is there a mptr node?
        if (count($structure->children('http://www.loc.gov/METS/')->mptr)) {
            // Yes. Get the file reference.
            $details['points'] = (string) $structure->children('http://www.loc.gov/METS/')->mptr[0]->attributes('http://www.w3.org/1999/xlink')->href;
        } elseif (
            !empty($this->physicalStructure)
            && array_key_exists($details['id'], $this->smLinks['l2p'])
        ) {
            // Link logical structure to the first corresponding physical page/track.
            $details['points'] = max((int) array_search($this->smLinks['l2p'][$details['id']][0], $this->physicalStructure, true), 1);
            $details['thumbnailId'] = $this->getThumbnail();
            // Get page/track number of the first page/track related to this structure element.
            $details['pagination'] = $this->physicalStructureInfo[$this->smLinks['l2p'][$details['id']][0]]['orderlabel'];
        } elseif ($details['id'] == $this->magicGetToplevelId()) {
            // Point to self if this is the toplevel structure.
            $details['points'] = 1;
            $details['thumbnailId'] = $this->getThumbnail();
        }
        if ($details['thumbnailId'] === null) {
            unset($details['thumbnailId']);
        }
        // Get the files this structure element is pointing at.
        $fileUse = $this->magicGetFileGrps();
        // Get the file representations from fileSec node.
        foreach ($structure->children('http://www.loc.gov/METS/')->fptr as $fptr) {
            // Check if file has valid @USE attribute.
            if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
Bug: The method attributes() does not exist on null. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:

            if (!empty($fileUse[(string) $fptr->/** @scrutinizer ignore-call */ attributes()->FILEID])) {

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

                $details['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
            }
        }
        // Keep for later usage.
        $this->logicalUnits[$details['id']] = $details;
        // Walk the structure recursively? And are there any children of the current element?
        if (
            $recursive
            && count($structure->children('http://www.loc.gov/METS/')->div)
        ) {
            $details['children'] = [];
            foreach ($structure->children('http://www.loc.gov/METS/')->div as $child) {
                // Repeat for all children.
                $details['children'][] = $this->getLogicalStructureInfo($child, true);
Bug: It seems like $child can also be of type null; however, parameter $structure of Kitodo\Dlf\Common\MetsDo...tLogicalStructureInfo() does only seem to accept SimpleXMLElement, maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                $details['children'][] = $this->getLogicalStructureInfo(/** @scrutinizer ignore-type */ $child, true);

            }
        }
        return $details;
    }

    /**
     * Get thumbnail for logical structure info.
     *
     * @access private
     *
     * @param string $id empty if top level document, else passed the id of parent document
     *
     * @return ?string thumbnail or null if not found
     */
    private function getThumbnail(string $id = '')
    {
        // Load plugin configuration.
        $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'files');
        $fileGrpsThumb = GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']);

        $thumbnail = null;

        while ($fileGrpThumb = array_shift($fileGrpsThumb)) {
            if (empty($id)) {
                $thumbnail = $this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb] ?? null;
            } else {
                $parentId = $this->smLinks['l2p'][$id][0] ?? null;
                $thumbnail = $this->physicalStructureInfo[$parentId]['files'][$fileGrpThumb] ?? null;
            }

            if (!empty($thumbnail)) {
                break;
            }
        }
        return $thumbnail;
    }

    /**
     * @see AbstractDocument::getMetadata()
     */
    public function getMetadata(string $id, int $cPid = 0): array
    {
        $cPid = $this->ensureValidPid($cPid);

        if ($cPid == 0) {
            $this->logger->warning('Invalid PID for metadata definitions');
            return [];
        }

        $metadata = $this->getMetadataFromArray($id, $cPid);

        if (empty($metadata)) {
            return [];
        }

        $metadata = $this->processMetadataSections($id, $cPid, $metadata);

        if (!empty($metadata)) {
            $metadata = $this->setDefaultTitleAndDate($metadata);
        }

        return $metadata;
    }

    /**
     * Ensure that pId is valid.
     *
     * @access private
     *
     * @param integer $cPid
     *
     * @return integer
     */
    private function ensureValidPid(int $cPid): int
    {
        $cPid = max($cPid, 0);
        if ($cPid == 0 && ($this->cPid || $this->pid)) {
            // Retain current PID.
            $cPid = $this->cPid ?: $this->pid;
        }
        return $cPid;
    }

    /**
     * Get metadata from array.
     *
     * @access private
     *
     * @param string $id
     * @param integer $cPid
     *
     * @return array
     */
    private function getMetadataFromArray(string $id, int $cPid): array
    {
        if (!empty($this->metadataArray[$id]) && $this->metadataArray[0] == $cPid) {
            return $this->metadataArray[$id];
        }
        return $this->initializeMetadata('METS');
    }

    /**
     * Process metadata sections.
     *
     * @access private
     *
     * @param string $id
     * @param integer $cPid
     * @param array $metadata
     *
     * @return array
     */
    private function processMetadataSections(string $id, int $cPid, array $metadata): array
    {
        $mdIds = $this->getMetadataIds($id);
        if (empty($mdIds)) {
            // There is no metadata section for this structure node.
            return [];
        }
        // Associative array used as set of available section types (dmdSec, techMD, ...)
        $hasMetadataSection = [];
        // Load available metadata formats and metadata sections.
        $this->loadFormats();
        $this->magicGetMdSec();

        $metadata['type'] = $this->getLogicalUnitType($id);

        foreach ($mdIds as $dmdId) {
            $mdSectionType = $this->mdSec[$dmdId]['section'];

            if ($mdSectionType === 'dmdSec' && isset($hasMetadataSection['dmdSec'])) {
                continue;
            }

            if (!$this->extractAndProcessMetadata($dmdId, $mdSectionType, $metadata, $cPid, $hasMetadataSection)) {
                continue;
            }

            $hasMetadataSection[$mdSectionType] = true;
        }

        // Files are not expected to reference a dmdSec
        if (isset($this->fileInfos[$id]) || isset($hasMetadataSection['dmdSec'])) {
            return $metadata;
        } else {
            $this->logger->warning('No supported descriptive metadata found for logical structure with @ID "' . $id . '"');
            return [];
        }
    }

    /**
     * Get logical unit type.
     *
     * @access private
     *
     * @param string $id
     *
     * @return array
     */
    private function getLogicalUnitType(string $id): array
    {
        if (!empty($this->logicalUnits[$id])) {
            return [$this->logicalUnits[$id]['type']];
        } else {
            $struct = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]/@TYPE');
            if (!empty($struct)) {
                return [(string) $struct[0]];
            }
        }
        return [];
    }

    /**
     * Extract and process metadata.
     *
     * @access private
     *
     * @param string $dmdId
     * @param string $mdSectionType
     * @param array $metadata
     * @param integer $cPid
     * @param array $hasMetadataSection
     *
     * @return boolean
     */
    private function extractAndProcessMetadata(string $dmdId, string $mdSectionType, array &$metadata, int $cPid, array $hasMetadataSection): bool
    {
        if ($mdSectionType === 'dmdSec' && isset($hasMetadataSection['dmdSec'])) {
            return true;
        }

        $metadataExtracted = $this->extractMetadataIfTypeSupported($dmdId, $mdSectionType, $metadata);

        if (!$metadataExtracted) {
            return false;
        }

        $additionalMetadata = $this->getAdditionalMetadataFromDatabase($cPid, $dmdId);
        // We need a \DOMDocument here, because SimpleXML doesn't support XPath functions properly.
        $domNode = dom_import_simplexml($this->mdSec[$dmdId]['xml']);
        $domXPath = new \DOMXPath($domNode->ownerDocument);
Bug: It seems like $domNode->ownerDocument can also be of type null; however, parameter $document of DOMXPath::__construct() does only seem to accept DOMDocument, maybe add an additional type check? (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

        $domXPath = new \DOMXPath(/** @scrutinizer ignore-type */ $domNode->ownerDocument);

        $this->registerNamespaces($domXPath);

        $this->processAdditionalMetadata($additionalMetadata, $domXPath, $domNode, $metadata);

        return true;
    }

    /**
     * Process additional metadata.
     *
     * @access private
     *
     * @param array $additionalMetadata
     * @param \DOMXPath $domXPath
     * @param \DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function processAdditionalMetadata(array $additionalMetadata, \DOMXPath $domXPath, \DOMElement $domNode, array &$metadata): void
    {
        foreach ($additionalMetadata as $resArray) {
            $this->setMetadataFieldValues($resArray, $domXPath, $domNode, $metadata);
            $this->setDefaultMetadataValue($resArray, $metadata);
            $this->setSortableMetadataValue($resArray, $domXPath, $domNode, $metadata);
        }
    }

    /**
     * Set metadata field values.
     *
     * @access private
     *
     * @param array $resArray
     * @param \DOMXPath $domXPath
     * @param \DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function setMetadataFieldValues(array $resArray, \DOMXPath $domXPath, \DOMElement $domNode, array &$metadata): void
    {
        if ($resArray['format'] > 0 && !empty($resArray['xpath'])) {
            $values = $domXPath->evaluate($resArray['xpath'], $domNode);
            if ($values instanceof \DOMNodeList && $values->length > 0) {
                $metadata[$resArray['index_name']] = [];
                foreach ($values as $value) {
                    $metadata[$resArray['index_name']][] = trim((string) $value->nodeValue);
                }
            } elseif (!($values instanceof \DOMNodeList)) {
                $metadata[$resArray['index_name']] = [trim((string) $values)];
            }
        }
    }

    /**
     * Set default metadata value.
     *
     * @access private
     *
     * @param array $resArray
     * @param array $metadata
     *
     * @return void
     */
    private function setDefaultMetadataValue(array $resArray, array &$metadata): void
    {
        if (empty($metadata[$resArray['index_name']][0]) && strlen($resArray['default_value']) > 0) {
            $metadata[$resArray['index_name']] = [$resArray['default_value']];
        }
    }

    /**
     * Set sortable metadata value.
     *
     * @access private
     *
     * @param array $resArray
     * @param \DOMXPath $domXPath
     * @param \DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function setSortableMetadataValue(array $resArray, \DOMXPath $domXPath, \DOMElement $domNode, array &$metadata): void
    {
        if (!empty($metadata[$resArray['index_name']]) && $resArray['is_sortable']) {
            if ($resArray['format'] > 0 && !empty($resArray['xpath_sorting'])) {
                $values = $domXPath->evaluate($resArray['xpath_sorting'], $domNode);
                if ($values instanceof \DOMNodeList && $values->length > 0) {
                    $metadata[$resArray['index_name'] . '_sorting'][0] = trim((string) $values->item(0)->nodeValue);
                } elseif (!($values instanceof \DOMNodeList)) {
                    $metadata[$resArray['index_name'] . '_sorting'][0] = trim((string) $values);
                }
            }
            if (empty($metadata[$resArray['index_name'] . '_sorting'][0])) {
                $metadata[$resArray['index_name'] . '_sorting'][0] = $metadata[$resArray['index_name']][0];
            }
        }
    }

688
    /**
689
     * Set default title and date if those metadata is not set.
690
     *
691
     * @access private
692
     *
693
     * @param array $metadata
694
     *
695
     * @return array
696
     */
697
    private function setDefaultTitleAndDate(array $metadata): array
698
    {
699
        // Set title to empty string if not present.
700
        if (empty($metadata['title'][0])) {
701
            $metadata['title'][0] = '';
702
            $metadata['title_sorting'][0] = '';
703
        }
704
705
        // Set title_sorting to title as default.
706
        if (empty($metadata['title_sorting'][0])) {
707
            $metadata['title_sorting'][0] = $metadata['title'][0];
708
        }
709
710
        // Set date to empty string if not present.
711
        if (empty($metadata['date'][0])) {
712
            $metadata['date'][0] = '';
713
        }
714
715
        return $metadata;
716
    }
717
718
    /**
719
     * Extract metadata if metadata type is supported.
720
     *
721
     * @access private
722
     *
723
     * @param string $dmdId descriptive metadata id
724
     * @param string $mdSectionType metadata section type
725
     * @param array &$metadata
726
     *
727
     * @return bool true if extraction successful, false otherwise
728
     */
729
    private function extractMetadataIfTypeSupported(string $dmdId, string $mdSectionType, array &$metadata): bool
730
    {
731
        // Is this metadata format supported?
732
        if (!empty($this->formats[$this->mdSec[$dmdId]['type']])) {
733
            if (!empty($this->formats[$this->mdSec[$dmdId]['type']]['class'])) {
734
                $class = $this->formats[$this->mdSec[$dmdId]['type']]['class'];
735
                // Get the metadata from class.
736
                if (class_exists($class)) {
737
                    $obj = GeneralUtility::makeInstance($class);
738
                    if ($obj instanceof MetadataInterface) {
739
                        $obj->extractMetadata($this->mdSec[$dmdId]['xml'], $metadata, GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'general')['useExternalApisForMetadata']);
740
                        return true;
741
                    }
742
                } else {
743
                    $this->logger->warning('Invalid class/method "' . $class . '->extractMetadata()" for metadata format "' . $this->mdSec[$dmdId]['type'] . '"');
744
                }
745
            }
746
        } else {
747
            $this->logger->notice('Unsupported metadata format "' . $this->mdSec[$dmdId]['type'] . '" in ' . $mdSectionType . ' with @ID "' . $dmdId . '"');
748
        }
749
        return false;
750
    }
751
752
    /**
753
     * Get additional data from database.
754
     *
755
     * @access private
756
     *
757
     * @param int $cPid page id
758
     * @param string $dmdId descriptive metadata id
759
     *
760
     * @return array additional metadata data queried from database
761
     */
762
    private function getAdditionalMetadataFromDatabase(int $cPid, string $dmdId): array
763
    {
764
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
765
            ->getQueryBuilderForTable('tx_dlf_metadata');
766
        // Get hidden records, too.
767
        $queryBuilder
768
            ->getRestrictions()
769
            ->removeByType(HiddenRestriction::class);
770
        // Get all metadata with configured xpath and applicable format first.
771
        $resultWithFormat = $queryBuilder
772
            ->select(
773
                'tx_dlf_metadata.index_name AS index_name',
774
                'tx_dlf_metadataformat_joins.xpath AS xpath',
775
                'tx_dlf_metadataformat_joins.xpath_sorting AS xpath_sorting',
776
                'tx_dlf_metadata.is_sortable AS is_sortable',
777
                'tx_dlf_metadata.default_value AS default_value',
778
                'tx_dlf_metadata.format AS format'
779
            )
780
            ->from('tx_dlf_metadata')
781
            ->innerJoin(
782
                'tx_dlf_metadata',
783
                'tx_dlf_metadataformat',
784
                'tx_dlf_metadataformat_joins',
785
                $queryBuilder->expr()->eq(
786
                    'tx_dlf_metadataformat_joins.parent_id',
787
                    'tx_dlf_metadata.uid'
788
                )
789
            )
790
            ->innerJoin(
791
                'tx_dlf_metadataformat_joins',
792
                'tx_dlf_formats',
793
                'tx_dlf_formats_joins',
794
                $queryBuilder->expr()->eq(
795
                    'tx_dlf_formats_joins.uid',
796
                    'tx_dlf_metadataformat_joins.encoded'
797
                )
798
            )
799
            ->where(
800
                $queryBuilder->expr()->eq('tx_dlf_metadata.pid', $cPid),
801
                $queryBuilder->expr()->eq('tx_dlf_metadata.l18n_parent', 0),
802
                $queryBuilder->expr()->eq('tx_dlf_metadataformat_joins.pid', $cPid),
803
                $queryBuilder->expr()->eq('tx_dlf_formats_joins.type', $queryBuilder->createNamedParameter($this->mdSec[$dmdId]['type']))
804
            )
805
            ->execute();
806
        // Get all metadata without a format, but with a default value next.
807
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
808
            ->getQueryBuilderForTable('tx_dlf_metadata');
809
        // Get hidden records, too.
810
        $queryBuilder
811
            ->getRestrictions()
812
            ->removeByType(HiddenRestriction::class);
813
        $resultWithoutFormat = $queryBuilder
814
            ->select(
815
                'tx_dlf_metadata.index_name AS index_name',
816
                'tx_dlf_metadata.is_sortable AS is_sortable',
817
                'tx_dlf_metadata.default_value AS default_value',
818
                'tx_dlf_metadata.format AS format'
819
            )
820
            ->from('tx_dlf_metadata')
821
            ->where(
822
                $queryBuilder->expr()->eq('tx_dlf_metadata.pid', $cPid),
823
                $queryBuilder->expr()->eq('tx_dlf_metadata.l18n_parent', 0),
824
                $queryBuilder->expr()->eq('tx_dlf_metadata.format', 0),
825
                $queryBuilder->expr()->neq('tx_dlf_metadata.default_value', $queryBuilder->createNamedParameter(''))
826
            )
827
            ->execute();
828
        // Merge both result sets.
829
        return array_merge($resultWithFormat->fetchAllAssociative(), $resultWithoutFormat->fetchAllAssociative());
830
    }
831
832
    /**
833
     * Get IDs of (descriptive and administrative) metadata sections
834
     * referenced by node of given $id. The $id may refer to either
835
     * a logical structure node or to a file.
836
     *
837
     * @access protected
838
     *
839
     * @param string $id The "@ID" attribute of the logical structure node or the file node
840
     *
841
     * @return array
842
     */
843
    protected function getMetadataIds(string $id): array
844
    {
845
        // Load amdSecChildIds concordance
846
        $this->magicGetMdSec();
847
        $fileInfo = $this->getFileInfo($id);
848
849
        // Get DMDID and ADMID of logical structure node
850
        if (!empty($this->logicalUnits[$id])) {
851
            $dmdIds = $this->logicalUnits[$id]['dmdId'] ?? '';
852
            $admIds = $this->logicalUnits[$id]['admId'] ?? '';
853
        } else {
854
            // xpath() may return no match, so fall back to null instead of reading a missing index.
            $mdSec = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]')[0] ?? null;
            if ($mdSec instanceof \SimpleXMLElement) {
856
                $dmdIds = (string) $mdSec->attributes()->DMDID;
857
                $admIds = (string) $mdSec->attributes()->ADMID;
858
            } elseif (isset($fileInfo)) {
859
                $dmdIds = $fileInfo['dmdId'];
860
                $admIds = $fileInfo['admId'];
861
            } else {
862
                $dmdIds = '';
863
                $admIds = '';
864
            }
865
        }
866
867
        // Handle multiple DMDIDs/ADMIDs
868
        $allMdIds = explode(' ', $dmdIds);
869
870
        foreach (explode(' ', $admIds) as $admId) {
871
            if (isset($this->mdSec[$admId])) {
872
                // $admId references an actual metadata section such as techMD
873
                $allMdIds[] = $admId;
874
            } elseif (isset($this->amdSecChildIds[$admId])) {
875
                // $admId references a <mets:amdSec> element. Resolve child elements.
876
                foreach ($this->amdSecChildIds[$admId] as $childId) {
877
                    $allMdIds[] = $childId;
878
                }
879
            }
880
        }
881
882
        return array_filter(
883
            $allMdIds,
884
            function ($element) {
885
                return !empty($element);
886
            }
887
        );
888
    }
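
The ID resolution in `getMetadataIds()` can be sketched without a METS file. The example below is an illustration under stated assumptions: `resolveMetadataIds()` and the sample IDs are hypothetical, and the two lookup arrays stand in for `$this->mdSec` and `$this->amdSecChildIds`. An ADMID that names a metadata section is kept as is, an ADMID that names a `<mets:amdSec>` is expanded to its children, and unknown or empty IDs are dropped.

```php
<?php

// Hypothetical stand-alone version of the ID resolution; names are illustrative.
function resolveMetadataIds(string $dmdIds, string $admIds, array $mdSec, array $amdSecChildIds): array
{
    // DMDID and ADMID are space-separated lists of IDs.
    $all = explode(' ', $dmdIds);
    foreach (explode(' ', $admIds) as $admId) {
        if (isset($mdSec[$admId])) {
            // The ID points at a concrete section such as techMD.
            $all[] = $admId;
        } elseif (isset($amdSecChildIds[$admId])) {
            // The ID points at an amdSec container; use its child sections.
            foreach ($amdSecChildIds[$admId] as $childId) {
                $all[] = $childId;
            }
        }
    }
    // Unlike the method above, this sketch also re-indexes the result.
    return array_values(array_filter($all, static fn ($id) => !empty($id)));
}

// Made-up lookup tables for illustration only.
$mdSec = ['DMD1' => [], 'TECH1' => [], 'RIGHTS1' => [], 'DIGIPROV1' => []];
$amdSecChildIds = ['AMD1' => ['RIGHTS1', 'DIGIPROV1']];

$ids = resolveMetadataIds('DMD1', 'TECH1 AMD1 UNKNOWN', $mdSec, $amdSecChildIds);
```

With this input, `AMD1` is replaced by its two children and `UNKNOWN` is dropped.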
889
890
    /**
891
     * @see AbstractDocument::getFullText()
892
     */
893
    public function getFullText(string $id): string
894
    {
895
        $fullText = '';
896
897
        // Load fileGrps and check for full text files.
898
        $this->magicGetFileGrps();
899
        if ($this->hasFulltext) {
900
            $fullText = $this->getFullTextFromXml($id);
901
        }
902
        return $fullText;
903
    }
904
905
    /**
906
     * @see AbstractDocument::getStructureDepth()
907
     */
908
    public function getStructureDepth(string $logId)
909
    {
910
        $ancestors = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $logId . '"]/ancestor::*');
911
        if (!empty($ancestors)) {
912
            return count($ancestors);
913
        } else {
914
            return 0;
915
        }
916
    }
917
918
    /**
919
     * @see AbstractDocument::init()
920
     */
921
    protected function init(string $location, array $settings): void
922
    {
923
        $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger(get_class($this));
924
        $this->settings = $settings;
925
        // Get METS node from XML file.
926
        $this->registerNamespaces($this->xml);
927
        $mets = $this->xml->xpath('//mets:mets');
928
        if (!empty($mets)) {
929
            $this->mets = $mets[0];
Bug: The property mets is declared read-only in Kitodo\Dlf\Common\MetsDocument.
930
            // Register namespaces.
931
            $this->registerNamespaces($this->mets);
932
        } else {
933
            if (!empty($location)) {
934
                $this->logger->error('No METS part found in document with location "' . $location . '".');
935
            } elseif (!empty($this->recordId)) {
936
                $this->logger->error('No METS part found in document with recordId "' . $this->recordId . '".');
937
            } else {
938
                $this->logger->error('No METS part found in current document.');
939
            }
940
        }
941
    }
942
943
    /**
944
     * @see AbstractDocument::loadLocation()
945
     */
946
    protected function loadLocation(string $location): bool
947
    {
948
        $fileResource = Helper::getUrl($location);
949
        if ($fileResource !== false) {
950
            $xml = Helper::getXmlFileAsString($fileResource);
951
            // Set some basic properties.
952
            if ($xml !== false) {
953
                $this->xml = $xml;
954
                return true;
955
            }
956
        }
957
        $this->logger->error('Could not load XML file from "' . $location . '"');
958
        return false;
959
    }
960
961
    /**
962
     * @see AbstractDocument::ensureHasFulltextIsSet()
963
     */
964
    protected function ensureHasFulltextIsSet(): void
965
    {
966
        // Are the fileGrps already loaded?
967
        if (!$this->fileGrpsLoaded) {
968
            $this->magicGetFileGrps();
969
        }
970
    }
971
972
    /**
973
     * @see AbstractDocument::setPreloadedDocument()
974
     */
975
    protected function setPreloadedDocument($preloadedDocument): bool
976
    {
977
978
        if ($preloadedDocument instanceof \SimpleXMLElement) {
979
            $this->xml = $preloadedDocument;
980
            return true;
981
        }
982
        return false;
983
    }
984
985
    /**
986
     * @see AbstractDocument::getDocument()
987
     */
988
    protected function getDocument(): \SimpleXMLElement
989
    {
990
        return $this->mets;
991
    }
992
993
    /**
994
     * This builds an array of the document's metadata sections
995
     *
996
     * @access protected
997
     *
998
     * @return array Array of metadata sections with their IDs as array key
999
     */
1000
    protected function magicGetMdSec(): array
1001
    {
1002
        if (!$this->mdSecLoaded) {
1003
            $this->loadFormats();
1004
1005
            foreach ($this->mets->xpath('./mets:dmdSec') as $dmdSecTag) {
1006
                $dmdSec = $this->processMdSec($dmdSecTag);
1007
1008
                if ($dmdSec !== null) {
1009
                    $this->mdSec[$dmdSec['id']] = $dmdSec;
Bug: The property mdSec is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1010
                    $this->dmdSec[$dmdSec['id']] = $dmdSec;
Bug: The property dmdSec is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1011
                }
1012
            }
1013
1014
            foreach ($this->mets->xpath('./mets:amdSec') as $amdSecTag) {
1015
                $childIds = [];
1016
1017
                foreach ($amdSecTag->children('http://www.loc.gov/METS/') as $mdSecTag) {
1018
                    if (!in_array($mdSecTag->getName(), self::ALLOWED_AMD_SEC)) {
1019
                        continue;
1020
                    }
1021
1022
                    // TODO: Should we check that the format may occur within this type (e.g., to ignore VIDEOMD within rightsMD)?
1023
                    $mdSec = $this->processMdSec($mdSecTag);
Bug: It seems like $mdSecTag can also be of type null; however, parameter $element of Kitodo\Dlf\Common\MetsDocument::processMdSec() only seems to accept SimpleXMLElement. Maybe add an additional type check? If this is a false positive, it can be ignored by placing a /** @scrutinizer ignore-type */ annotation in front of the argument.
1024
1025
                    if ($mdSec !== null) {
1026
                        $this->mdSec[$mdSec['id']] = $mdSec;
1027
1028
                        $childIds[] = $mdSec['id'];
1029
                    }
1030
                }
1031
1032
                $amdSecId = (string) $amdSecTag->attributes()->ID;
1033
                if (!empty($amdSecId)) {
1034
                    $this->amdSecChildIds[$amdSecId] = $childIds;
1035
                }
1036
            }
1037
1038
            $this->mdSecLoaded = true;
1039
        }
1040
        return $this->mdSec;
1041
    }
1042
1043
    /**
1044
     * Gets the document's descriptive metadata sections (dmdSec)
1045
     *
1046
     * @access protected
1047
     *
1048
     * @return array Array of metadata sections with their IDs as array key
1049
     */
1050
    protected function magicGetDmdSec(): array
1051
    {
1052
        $this->magicGetMdSec();
1053
        return $this->dmdSec;
1054
    }
1055
1056
    /**
1057
     * Processes an element of METS `mdSecType`.
1058
     *
1059
     * @access protected
1060
     *
1061
     * @param \SimpleXMLElement $element
1062
     *
1063
     * @return array|null The processed metadata section
1064
     */
1065
    protected function processMdSec(\SimpleXMLElement $element): ?array
1066
    {
1067
        $mdId = (string) $element->attributes()->ID;
1068
        if (empty($mdId)) {
1069
            return null;
1070
        }
1071
1072
        $this->registerNamespaces($element);
1073
1074
        $type = '';
1075
        $mdType = $element->xpath('./mets:mdWrap[not(@MDTYPE="OTHER")]/@MDTYPE');
1076
        $otherMdType = $element->xpath('./mets:mdWrap[@MDTYPE="OTHER"]/@OTHERMDTYPE');
1077
1078
        if (!empty($mdType) && !empty($this->formats[(string) $mdType[0]])) {
1079
            $type = (string) $mdType[0];
1080
            $xml = $element->xpath('./mets:mdWrap[@MDTYPE="' . $type . '"]/mets:xmlData/' . strtolower($type) . ':' . $this->formats[$type]['rootElement']);
1081
        } elseif (!empty($otherMdType) && !empty($this->formats[(string) $otherMdType[0]])) {
1082
            $type = (string) $otherMdType[0];
1083
            $xml = $element->xpath('./mets:mdWrap[@MDTYPE="OTHER"][@OTHERMDTYPE="' . $type . '"]/mets:xmlData/' . strtolower($type) . ':' . $this->formats[$type]['rootElement']);
1084
        }
1085
1086
        if (empty($xml)) {
1087
            return null;
1088
        }
1089
1090
        $this->registerNamespaces($xml[0]);
1091
1092
        return [
1093
            'id' => $mdId,
1094
            'section' => $element->getName(),
1095
            'type' => $type,
1096
            'xml' => $xml[0],
Comprehensibility Best Practice: The variable $xml does not seem to be defined for all execution paths leading up to this point.
1097
        ];
1098
    }
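
The type detection in `processMdSec()` relies on two XPath queries against `mets:mdWrap`. The sketch below is a reduced illustration, assuming a hypothetical `detectMdType()` function and a made-up two-section METS fragment. Unlike the method above, it does not check the detected type against the configured formats; it only shows how `@MDTYPE` and `@OTHERMDTYPE` are read.

```php
<?php

const METS_NS = 'http://www.loc.gov/METS/';

// Hypothetical helper: returns the effective metadata type of an mdSec element.
function detectMdType(\SimpleXMLElement $mdSecElement): string
{
    $mdSecElement->registerXPathNamespace('mets', METS_NS);
    // A regular MDTYPE wins if present.
    $mdType = $mdSecElement->xpath('./mets:mdWrap[not(@MDTYPE="OTHER")]/@MDTYPE');
    if (!empty($mdType)) {
        return (string) $mdType[0];
    }
    // Otherwise look for MDTYPE="OTHER" and read OTHERMDTYPE.
    $other = $mdSecElement->xpath('./mets:mdWrap[@MDTYPE="OTHER"]/@OTHERMDTYPE');
    return !empty($other) ? (string) $other[0] : '';
}

// Made-up METS fragment for illustration only.
$xml = simplexml_load_string(<<<XML
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DMD1"><mets:mdWrap MDTYPE="MODS"/></mets:dmdSec>
  <mets:dmdSec ID="DMD2"><mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="TEIHDR"/></mets:dmdSec>
</mets:mets>
XML);
$xml->registerXPathNamespace('mets', METS_NS);

$types = array_map('detectMdType', $xml->xpath('./mets:dmdSec'));
```

The first section resolves through `@MDTYPE`, the second through `@OTHERMDTYPE`.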
1099
1100
    /**
1101
     * This builds the file ID -> USE concordance
1102
     *
1103
     * @access protected
1104
     *
1105
     * @return array Array of file use groups with file IDs
1106
     */
1107
    protected function magicGetFileGrps(): array
1108
    {
1109
        if (!$this->fileGrpsLoaded) {
1110
            // Get configured USE attributes.
1111
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'files');
1112
            $useGrps = GeneralUtility::trimExplode(',', $extConf['fileGrpImages']);
1113
            if (!empty($extConf['fileGrpThumbs'])) {
1114
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']));
1115
            }
1116
            if (!empty($extConf['fileGrpDownload'])) {
1117
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpDownload']));
1118
            }
1119
            if (!empty($extConf['fileGrpFulltext'])) {
1120
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']));
1121
            }
1122
            if (!empty($extConf['fileGrpAudio'])) {
1123
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpAudio']));
1124
            }
1125
            // Get all file groups.
1126
            $fileGrps = $this->mets->xpath('./mets:fileSec/mets:fileGrp');
1127
            if (!empty($fileGrps)) {
1128
                // Build concordance for configured USE attributes.
1129
                foreach ($fileGrps as $fileGrp) {
1130
                    if (in_array((string) $fileGrp['USE'], $useGrps)) {
1131
                        foreach ($fileGrp->children('http://www.loc.gov/METS/')->file as $file) {
1132
                            $fileId = (string) $file->attributes()->ID;
Bug: The method attributes() does not exist on null. If this is a false positive, it can be ignored by placing a /** @scrutinizer ignore-call */ annotation in front of the call. This check looks for calls to methods that do not seem to exist on a given type, on the type itself as well as in inherited classes or implemented interfaces. The cause is most likely a typographical error or a renamed method.
1133
                            $this->fileGrps[$fileId] = (string) $fileGrp['USE'];
Bug: The property fileGrps is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1134
                            $this->fileInfos[$fileId] = [
Bug: The property fileInfos is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1135
                                'fileGrp' => (string) $fileGrp['USE'],
1136
                                'admId' => (string) $file->attributes()->ADMID,
1137
                                'dmdId' => (string) $file->attributes()->DMDID,
1138
                            ];
1139
                        }
1140
                    }
1141
                }
1142
            }
1143
            // Are there any fulltext files available?
1144
            if (
1145
                !empty($extConf['fileGrpFulltext'])
1146
                && array_intersect(GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']), $this->fileGrps) !== []
1147
            ) {
1148
                $this->hasFulltext = true;
Bug: The property hasFulltext is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1149
            }
1150
            $this->fileGrpsLoaded = true;
1151
        }
1152
        return $this->fileGrps;
1153
    }
1154
1155
    /**
1156
     * @see AbstractDocument::prepareMetadataArray()
1157
     */
1158
    protected function prepareMetadataArray(int $cPid): void
1159
    {
1160
        $ids = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID]/@ID');
1161
        // Get all logical structure nodes with metadata.
1162
        if (!empty($ids)) {
1163
            foreach ($ids as $id) {
1164
                $this->metadataArray[(string) $id] = $this->getMetadata((string) $id, $cPid);
Bug: The property metadataArray is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1165
            }
1166
        }
1167
        // Set current PID for metadata definitions.
1168
    }
1169
1170
    /**
1171
     * This returns $this->mets via __get()
1172
     *
1173
     * @access protected
1174
     *
1175
     * @return \SimpleXMLElement The XML's METS part as \SimpleXMLElement object
1176
     */
1177
    protected function magicGetMets(): \SimpleXMLElement
1178
    {
1179
        return $this->mets;
1180
    }
1181
1182
    /**
1183
     * @see AbstractDocument::magicGetPhysicalStructure()
1184
     */
1185
    protected function magicGetPhysicalStructure(): array
1186
    {
1187
        // Is there no physical structure array yet?
1188
        if (!$this->physicalStructureLoaded) {
1189
            // Does the document have a structMap node of type "PHYSICAL"?
1190
            $elementNodes = $this->mets->xpath('./mets:structMap[@TYPE="PHYSICAL"]/mets:div[@TYPE="physSequence"]/mets:div');
1191
            if (!empty($elementNodes)) {
1192
                // Get file groups.
1193
                $fileUse = $this->magicGetFileGrps();
1194
                // Get the physical sequence's metadata.
1195
                $physNode = $this->mets->xpath('./mets:structMap[@TYPE="PHYSICAL"]/mets:div[@TYPE="physSequence"]');
1196
                $firstNode = $physNode[0];
1197
                $id = (string) $firstNode['ID'];
1198
                $this->physicalStructureInfo[$id]['id'] = $id;
Bug: The property physicalStructureInfo is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1199
                $this->physicalStructureInfo[$id]['dmdId'] = isset($firstNode['DMDID']) ? (string) $firstNode['DMDID'] : '';
1200
                $this->physicalStructureInfo[$id]['admId'] = isset($firstNode['ADMID']) ? (string) $firstNode['ADMID'] : '';
1201
                $this->physicalStructureInfo[$id]['order'] = isset($firstNode['ORDER']) ? (string) $firstNode['ORDER'] : '';
1202
                $this->physicalStructureInfo[$id]['label'] = isset($firstNode['LABEL']) ? (string) $firstNode['LABEL'] : '';
1203
                $this->physicalStructureInfo[$id]['orderlabel'] = isset($firstNode['ORDERLABEL']) ? (string) $firstNode['ORDERLABEL'] : '';
1204
                $this->physicalStructureInfo[$id]['type'] = (string) $firstNode['TYPE'];
1205
                $this->physicalStructureInfo[$id]['contentIds'] = isset($firstNode['CONTENTIDS']) ? (string) $firstNode['CONTENTIDS'] : '';
1206
                // Get the file representations from fileSec node.
1207
                foreach ($physNode[0]->children('http://www.loc.gov/METS/')->fptr as $fptr) {
1208
                    // Check if file has valid @USE attribute.
1209
                    if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
1210
                        $this->physicalStructureInfo[$id]['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
1211
                    }
1212
                }
1213
                // Build the physical elements' array from the physical structMap node.
1214
                $elements = [];
1215
                foreach ($elementNodes as $elementNode) {
1216
                    $id = (string) $elementNode['ID'];
1217
                    $order = (int) $elementNode['ORDER'];
1218
                    $elements[$order] = $id;
1219
                    $this->physicalStructureInfo[$elements[$order]]['id'] = $id;
1220
                    $this->physicalStructureInfo[$elements[$order]]['dmdId'] = isset($elementNode['DMDID']) ? (string) $elementNode['DMDID'] : '';
1221
                    $this->physicalStructureInfo[$elements[$order]]['admId'] = isset($elementNode['ADMID']) ? (string) $elementNode['ADMID'] : '';
1222
                    $this->physicalStructureInfo[$elements[$order]]['order'] = isset($elementNode['ORDER']) ? (string) $elementNode['ORDER'] : '';
1223
                    $this->physicalStructureInfo[$elements[$order]]['label'] = isset($elementNode['LABEL']) ? (string) $elementNode['LABEL'] : '';
1224
                    $this->physicalStructureInfo[$elements[$order]]['orderlabel'] = isset($elementNode['ORDERLABEL']) ? (string) $elementNode['ORDERLABEL'] : '';
1225
                    $this->physicalStructureInfo[$elements[$order]]['type'] = (string) $elementNode['TYPE'];
1226
                    $this->physicalStructureInfo[$elements[$order]]['contentIds'] = isset($elementNode['CONTENTIDS']) ? (string) $elementNode['CONTENTIDS'] : '';
1227
                    // Get the file representations from fileSec node.
1228
                    foreach ($elementNode->children('http://www.loc.gov/METS/')->fptr as $fptr) {
1229
                        // Check if file has valid @USE attribute.
1230
                        if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
1231
                            $this->physicalStructureInfo[$elements[$order]]['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
1232
                        }
1233
                    }
1234
                }
1235
                // Sort array by keys (= @ORDER).
1236
                ksort($elements);
1237
                // Set total number of pages/tracks.
1238
                $this->numPages = count($elements);
Bug: The property numPages is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1239
                // Merge and re-index the array to get numeric indexes.
1240
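                // Prepend the physical sequence's ID; $id was overwritten inside the loop above.
                array_unshift($elements, (string) $firstNode['ID']);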
1241
                $this->physicalStructure = $elements;
Bug: The property physicalStructure is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1242
            }
1243
            $this->physicalStructureLoaded = true;
1244
        }
1245
        return $this->physicalStructure;
1246
    }
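
The ordering step at the end of `magicGetPhysicalStructure()` depends on how PHP handles integer keys. The sketch below uses made-up IDs to show it: `ksort()` puts the pages into `@ORDER` sequence, and `array_unshift()` then renumbers the integer keys from zero, so the physical sequence ends up at index 0 and the first page at index 1.

```php
<?php

// Hypothetical input: physical div IDs keyed by @ORDER, in document order.
$elements = [3 => 'PHYS_0003', 1 => 'PHYS_0001', 2 => 'PHYS_0002'];
// Hypothetical ID of the enclosing physSequence div.
$sequenceId = 'PHYS_0000';

// Sort by key (= @ORDER).
ksort($elements);
// Count pages before the sequence itself is added.
$numPages = count($elements);
// Prepend the sequence; integer keys are renumbered starting at 0.
array_unshift($elements, $sequenceId);
```

Because `@ORDER` is used as the array key, two divs with the same `@ORDER` value would overwrite each other, which is worth keeping in mind for documents with incomplete or duplicated ordering.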
1247
1248
    /**
1249
     * @see AbstractDocument::magicGetSmLinks()
1250
     */
1251
    protected function magicGetSmLinks(): array
1252
    {
1253
        if (!$this->smLinksLoaded) {
1254
            $smLinks = $this->mets->xpath('./mets:structLink/mets:smLink');
1255
            if (!empty($smLinks)) {
1256
                foreach ($smLinks as $smLink) {
1257
                    $this->smLinks['l2p'][(string) $smLink->attributes('http://www.w3.org/1999/xlink')->from][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->to;
Bug: The property smLinks is declared read-only in Kitodo\Dlf\Common\MetsDocument.
1258
                    $this->smLinks['p2l'][(string) $smLink->attributes('http://www.w3.org/1999/xlink')->to][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->from;
1259
                }
1260
            }
1261
            $this->smLinksLoaded = true;
1262
        }
1263
        return $this->smLinks;
1264
    }
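
The two concordances built by `magicGetSmLinks()` can be reproduced on a small fragment. This is a sketch with made-up IDs, not the method's actual code path: it reads the `xlink:from` and `xlink:to` attributes of each `mets:smLink` and fills a logical-to-physical (`l2p`) and a physical-to-logical (`p2l`) map.

```php
<?php

const XLINK_NS = 'http://www.w3.org/1999/xlink';

// Made-up structLink fragment for illustration only.
$xml = simplexml_load_string(<<<XML
<mets:structLink xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:smLink xlink:from="LOG_0000" xlink:to="PHYS_0000"/>
  <mets:smLink xlink:from="LOG_0000" xlink:to="PHYS_0001"/>
  <mets:smLink xlink:from="LOG_0001" xlink:to="PHYS_0001"/>
</mets:structLink>
XML);

// Both directions start as empty arrays so later lookups never hit a missing key.
$smLinks = ['l2p' => [], 'p2l' => []];
foreach ($xml->children('http://www.loc.gov/METS/')->smLink as $smLink) {
    // The from/to attributes live in the XLink namespace.
    $attributes = $smLink->attributes(XLINK_NS);
    $from = (string) $attributes->from;
    $to = (string) $attributes->to;
    $smLinks['l2p'][$from][] = $to;
    $smLinks['p2l'][$to][] = $from;
}
```

One logical unit may link to several physical pages and vice versa, which is why each map entry is a list.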
1265
1266
    /**
1267
     * @see AbstractDocument::magicGetThumbnail()
1268
     */
1269
    protected function magicGetThumbnail(bool $forceReload = false): string
1270
    {
1271
        if (
1272
            !$this->thumbnailLoaded
1273
            || $forceReload
1274
        ) {
1275
            // Retain current PID.
1276
            $cPid = $this->cPid ?: $this->pid;
1277
            if (!$cPid) {
1278
                $this->logger->error('Invalid PID ' . $cPid . ' for structure definitions');
1279
                $this->thumbnailLoaded = true;
1280
                return $this->thumbnail;
1281
            }
1282
            // Load extension configuration.
1283
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'files');
1284
            if (empty($extConf['fileGrpThumbs'])) {
1285
                $this->logger->warning('No fileGrp for thumbnails specified');
1286
                $this->thumbnailLoaded = true;
1287
                return $this->thumbnail;
1288
            }
1289
            $strctId = $this->magicGetToplevelId();
1290
            $metadata = $this->getToplevelMetadata($cPid);
1291
1292
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
1293
                ->getQueryBuilderForTable('tx_dlf_structures');
1294
1295
            // Get structure element to get thumbnail from.
1296
            $result = $queryBuilder
1297
                ->select('tx_dlf_structures.thumbnail AS thumbnail')
                ->from('tx_dlf_structures')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_structures.pid', $cPid),
                    $queryBuilder->expr()->eq('tx_dlf_structures.index_name', $queryBuilder->expr()->literal($metadata['type'][0])),
                    Helper::whereExpression('tx_dlf_structures')
                )
                ->setMaxResults(1)
                ->execute();

            $allResults = $result->fetchAllAssociative();

            if (count($allResults) == 1) {
                $resArray = $allResults[0];
                // Get desired thumbnail structure if not the toplevel structure itself.
                if (!empty($resArray['thumbnail'])) {
                    $strctType = Helper::getIndexNameFromUid($resArray['thumbnail'], 'tx_dlf_structures', $cPid);
                    // Check if this document has a structure element of the desired type.
                    $strctIds = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@TYPE="' . $strctType . '"]/@ID');
                    if (!empty($strctIds)) {
                        $strctId = (string) $strctIds[0];
                    }
                }
                // Load smLinks.
                $this->magicGetSmLinks();
                // Get thumbnail location.
                $fileGrpsThumb = GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']);
                while ($fileGrpThumb = array_shift($fileGrpsThumb)) {
                    if (
                        $this->magicGetPhysicalStructure()
                        && !empty($this->smLinks['l2p'][$strctId])
                        && !empty($this->physicalStructureInfo[$this->smLinks['l2p'][$strctId][0]]['files'][$fileGrpThumb])
                    ) {
                        $this->thumbnail = $this->getFileLocation($this->physicalStructureInfo[$this->smLinks['l2p'][$strctId][0]]['files'][$fileGrpThumb]);
Bug: The property thumbnail is declared read-only in Kitodo\Dlf\Common\MetsDocument.
                        break;
                    } elseif (!empty($this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb])) {
                        $this->thumbnail = $this->getFileLocation($this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb]);
                        break;
                    }
                }
            } else {
                $this->logger->error('No structure of type "' . $metadata['type'][0] . '" found in database');
            }
            $this->thumbnailLoaded = true;
        }
        return $this->thumbnail;
    }

    /**
     * @see AbstractDocument::magicGetToplevelId()
     */
    protected function magicGetToplevelId(): string
    {
        if (empty($this->toplevelId)) {
            // Get all logical structure nodes with metadata, but without associated METS-Pointers.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID and not(./mets:mptr)]');
            if (!empty($divs)) {
                // Load smLinks.
                $this->magicGetSmLinks();
                foreach ($divs as $div) {
                    $id = (string) $div['ID'];
                    // Are there physical structure nodes for this logical structure?
                    if (array_key_exists($id, $this->smLinks['l2p'])) {
                        // Yes. That's what we're looking for.
                        $this->toplevelId = $id;
Bug: The property toplevelId is declared read-only in Kitodo\Dlf\Common\MetsDocument.
                        break;
                    } elseif (empty($this->toplevelId)) {
                        // No. Remember this anyway, but keep looking for a better one.
                        $this->toplevelId = $id;
                    }
                }
            }
        }
        return $this->toplevelId;
    }

    /**
     * Try to determine URL of parent document.
     *
     * @access public
     *
     * @return string
     */
    public function magicGetParentHref(): string
    {
        if (empty($this->parentHref)) {
            // Get the closest ancestor of the current document which has a MPTR child.
            $parentMptr = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $this->toplevelId . '"]/ancestor::mets:div[./mets:mptr][1]/mets:mptr');
            if (!empty($parentMptr)) {
                $this->parentHref = (string) $parentMptr[0]->attributes('http://www.w3.org/1999/xlink')->href;
Bug: The property parentHref is declared read-only in Kitodo\Dlf\Common\MetsDocument.
            }
        }

        return $this->parentHref;
    }

    /**
     * This magic method is executed prior to any serialization of the object
     * @see __wakeup()
     *
     * @access public
     *
     * @return array Properties to be serialized
     */
    public function __sleep(): array
    {
        // \SimpleXMLElement objects can't be serialized, thus save the XML as string for serialization
        $this->asXML = $this->xml->asXML();
Documentation Bug: It seems like $this->xml->asXML() can also be of type true. However, the property $asXML is declared as type string. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $accountId that can either hold an Id object or false (if there is no account id yet). Your code now assigns that value to the id property of an instance of the Account class. This class holds a proper account, so the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }
}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
// Type check: assign only once $account_id really holds an Id object.
if ($account_id instanceof Id) {
    $account->id = $account_id;
}
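
Applied to the flagged assignment in `__sleep()`, the suggested type check could look roughly like the sketch below. It assumes only that `SimpleXMLElement::asXML()` may return a non-string value; the helper name `xmlToString()` and the sample XML are hypothetical and not part of the Kitodo code base.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Kitodo): returns the serialized XML,
 * or throws if asXML() did not yield a string.
 */
function xmlToString(\SimpleXMLElement $xml): string
{
    $result = $xml->asXML();
    // asXML() is typed string|bool, so narrow the type before using it as a string.
    if (!is_string($result)) {
        throw new \RuntimeException('Could not serialize XML to a string');
    }
    return $result;
}

// Illustrative document only; the real class would pass $this->xml.
$xml = new \SimpleXMLElement('<mets><div ID="LOG_0001"/></mets>');
$serialized = xmlToString($xml);
```

Whether to throw, log, or fall back to an empty string on failure is a design choice for the project; the point is only that the value is narrowed to `string` before it reaches the `$asXML` property.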
        return ['pid', 'recordId', 'parentId', 'asXML'];
    }

    /**
     * This magic method is used for setting a string value for the object
     *
     * @access public
     *
     * @return string String representing the METS object
     */
    public function __toString(): string
    {
        $xml = new \DOMDocument('1.0', 'utf-8');
        $xml->appendChild($xml->importNode(dom_import_simplexml($this->mets), true));
        $xml->formatOutput = true;
        return $xml->saveXML();
    }

    /**
     * This magic method is executed after the object is deserialized
     * @see __sleep()
     *
     * @access public
     *
     * @return void
     */
    public function __wakeup(): void
    {
        $xml = Helper::getXmlFileAsString($this->asXML);
        if ($xml !== false) {
            $this->asXML = '';
            $this->xml = $xml;
            // Rebuild the unserializable properties.
            $this->init('', $this->settings);
        } else {
            $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger(static::class);
            $this->logger->error('Could not load XML after deserialization');
        }
    }
}