Passed. Pull Request — master (#123) by unknown, created 04:16

MetsDocument::getMetadata()   A

Complexity:  Conditions 4, Paths 4
Size:        Total Lines 22, Code Lines 11
Duplication: Lines 0, Ratio 0 %
Importance:  Changes 6, Bugs 0, Features 0

Metric  Value
cc      4
eloc    11
c       6
b       0
f       0
nc      4
nop     2
dl      0
loc     22
rs      9.9
<?php

/**
 * (c) Kitodo. Key to digital objects e.V. <[email protected]>
 *
 * This file is part of the Kitodo and TYPO3 projects.
 *
 * @license GNU General Public License version 3 or later.
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 */

namespace Kitodo\Dlf\Common;

use \DOMElement;
Bug: The type \DOMElement was not found. Maybe you did not declare it correctly or list all dependencies?

The issue could also be caused by a filter entry in the build configuration. If the path has been excluded in your configuration, e.g. excluded_paths: ["lib/*"], you can move it to the dependency path list as follows:

filter:
    dependency_paths: ["lib/*"]

For further information see https://scrutinizer-ci.com/docs/tools/php/php-scrutinizer/#list-dependency-paths

use \DOMXPath;
Bug: The type \DOMXPath was not found. Maybe you did not declare it correctly or list all dependencies?
use \SimpleXMLElement;
Bug: The type \SimpleXMLElement was not found. Maybe you did not declare it correctly or list all dependencies?
use TYPO3\CMS\Core\Configuration\ExtensionConfiguration;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Database\Query\Restriction\HiddenRestriction;
use TYPO3\CMS\Core\Log\LogManager;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use Ubl\Iiif\Tools\IiifHelper;
use Ubl\Iiif\Services\AbstractImageService;

/**
 * MetsDocument class for the 'dlf' extension.
 *
 * @package TYPO3
 * @subpackage dlf
 *
 * @access public
 *
 * @property int $cPid this holds the PID for the configuration
 * @property-read array $formats this holds the configuration for all supported metadata encodings
 * @property bool $formatsLoaded flag with information if the available metadata formats are loaded
 * @property-read bool $hasFulltext flag with information if there are any fulltext files available
 * @property array $lastSearchedPhysicalPage the last searched logical and physical page
 * @property array $logicalUnits this holds the logical units
 * @property-read array $metadataArray this holds the documents' parsed metadata array
 * @property bool $metadataArrayLoaded flag with information if the metadata array is loaded
 * @property-read int $numPages this holds the total number of pages
 * @property-read int $parentId this holds the UID of the parent document or zero if not multi-volumed
 * @property-read array $physicalStructure this holds the physical structure
 * @property-read array $physicalStructureInfo this holds the physical structure metadata
 * @property bool $physicalStructureLoaded flag with information if the physical structure is loaded
 * @property-read int $pid this holds the PID of the document or zero if not in database
 * @property array $rawTextArray this holds the documents' raw text pages with their corresponding structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
 * @property-read bool $ready Is the document instantiated successfully?
 * @property-read string $recordId the METS file's / IIIF manifest's record identifier
 * @property-read int $rootId this holds the UID of the root document or zero if not multi-volumed
 * @property-read array $smLinks this holds the smLinks between logical and physical structMap
 * @property bool $smLinksLoaded flag with information if the smLinks are loaded
 * @property-read array $tableOfContents this holds the logical structure
 * @property bool $tableOfContentsLoaded flag with information if the table of contents is loaded
 * @property-read string $thumbnail this holds the document's thumbnail location
 * @property bool $thumbnailLoaded flag with information if the thumbnail is loaded
 * @property-read string $toplevelId this holds the toplevel structure's "@ID" (METS) or the manifest's "@id" (IIIF)
 * @property SimpleXMLElement $xml this holds the whole XML file as SimpleXMLElement object
 * @property-read array $mdSec associative array of METS metadata sections indexed by their IDs.
 * @property bool $mdSecLoaded flag with information if the array of METS metadata sections is loaded
 * @property-read array $dmdSec subset of `$mdSec` storing only the dmdSec entries; kept for compatibility.
 * @property-read array $fileGrps this holds the file ID -> USE concordance
 * @property bool $fileGrpsLoaded flag with information if file groups array is loaded
 * @property-read array $fileInfos additional information about files (e.g., ADMID), indexed by ID.
 * @property-read SimpleXMLElement $mets this holds the XML file's METS part as SimpleXMLElement object
 * @property-read string $parentHref URL of the parent document (determined via mptr element), or empty string if none is available
 */
final class MetsDocument extends AbstractDocument
{
    /**
     * @access protected
     * @var string[] Subsections / tags that may occur within `<mets:amdSec>`
     *
     * @link https://www.loc.gov/standards/mets/docs/mets.v1-9.html#amdSec
     * @link https://www.loc.gov/standards/mets/docs/mets.v1-9.html#mdSecType
     */
    protected const ALLOWED_AMD_SEC = ['techMD', 'rightsMD', 'sourceMD', 'digiprovMD'];

    /**
     * @access protected
     * @var string This holds the whole XML file as string for serialization purposes
     *
     * @see __sleep() / __wakeup()
     */
    protected string $asXML = '';

    /**
     * @access protected
     * @var array This maps the ID of each amdSec to the IDs of its children (techMD etc.). When an ADMID references an amdSec instead of techMD etc., this is used to iterate the child elements.
     */
    protected array $amdSecChildIds = [];

    /**
     * @access protected
     * @var array Associative array of METS metadata sections indexed by their IDs.
     */
    protected array $mdSec = [];

    /**
     * @access protected
     * @var bool Are the METS file's metadata sections loaded?
     *
     * @see MetsDocument::$mdSec
     */
    protected bool $mdSecLoaded = false;

    /**
     * @access protected
     * @var array Subset of $mdSec storing only the dmdSec entries; kept for compatibility.
     */
    protected array $dmdSec = [];

    /**
     * @access protected
     * @var array This holds the file ID -> USE concordance
     *
     * @see magicGetFileGrps()
     */
    protected array $fileGrps = [];

    /**
     * @access protected
     * @var bool Are the image file groups loaded?
     *
     * @see $fileGrps
     */
    protected bool $fileGrpsLoaded = false;

    /**
     * @access protected
     * @var SimpleXMLElement This holds the XML file's METS part as SimpleXMLElement object
     */
    protected SimpleXMLElement $mets;

    /**
     * @access protected
     * @var string URL of the parent document (determined via mptr element), or empty string if none is available
     */
    protected string $parentHref = '';

    /**
     * @access protected
     * @var array the extension settings
     */
    protected array $settings = [];

    /**
     * This adds metadata from METS structural map to metadata array.
     *
     * @access public
     *
     * @param array &$metadata The metadata array to extend
     * @param string $id The "@ID" attribute of the logical structure node
     *
     * @return void
     */
    public function addMetadataFromMets(array &$metadata, string $id): void
    {
        $details = $this->getLogicalStructure($id);
        if (!empty($details)) {
            $metadata['mets_order'][0] = $details['order'];
            $metadata['mets_label'][0] = $details['label'];
            $metadata['mets_orderlabel'][0] = $details['orderlabel'];
        }
    }

    /**
     * @see AbstractDocument::establishRecordId()
     */
    protected function establishRecordId(int $pid): void
    {
        // Check for METS object @ID.
        if (!empty($this->mets['OBJID'])) {
            $this->recordId = (string) $this->mets['OBJID'];
Bug: The property recordId is declared read-only in Kitodo\Dlf\Common\MetsDocument.
        }
        // Get hook objects.
        $hookObjects = Helper::getHookObjects('Classes/Common/MetsDocument.php');
        // Apply hooks.
        foreach ($hookObjects as $hookObj) {
            if (method_exists($hookObj, 'postProcessRecordId')) {
                $hookObj->postProcessRecordId($this->xml, $this->recordId);
            }
        }
    }

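Note on the "declared read-only" findings: the class docblock marks several properties as `@property-read`, and the analyzer flags every write to them, including writes made from inside the class. The following is a minimal sketch of how such a property can be read-only to callers yet writable internally. It assumes the parent class exposes protected properties through magic accessors; `ExampleDocument` and `ExampleMetsDocument` are hypothetical stand-ins, not the actual `AbstractDocument` implementation.

```php
<?php

/**
 * Hypothetical base class (assumption, not Kitodo code).
 *
 * @property-read string $recordId
 */
abstract class ExampleDocument
{
    protected string $recordId = '';

    // Reads from outside the class hierarchy are routed through __get().
    public function __get(string $name)
    {
        return property_exists($this, $name) ? $this->$name : null;
    }

    // Writes from outside the class hierarchy are rejected.
    public function __set(string $name, $value): void
    {
        throw new \LogicException('Property "' . $name . '" is read-only.');
    }
}

final class ExampleMetsDocument extends ExampleDocument
{
    public function establishRecordId(string $objId): void
    {
        // Allowed: the protected property is written from inside the hierarchy.
        $this->recordId = $objId;
    }
}

$doc = new ExampleMetsDocument();
$doc->establishRecordId('oai:example:1');
echo $doc->recordId, PHP_EOL; // prints "oai:example:1"
```

Under that assumption, the flagged assignments would be false positives rather than runtime errors.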
    /**
     * @see AbstractDocument::getDownloadLocation()
     */
    public function getDownloadLocation(string $id): string
    {
        $file = $this->getFileInfo($id);
        if ($file['mimeType'] === 'application/vnd.kitodo.iiif') {
            // Append "info.json" unless already present; a trailing slash is the last character, i.e. at index strlen() - 1.
            $file['location'] = (strrpos($file['location'], 'info.json') === strlen($file['location']) - 9) ? $file['location'] : (strrpos($file['location'], '/') === strlen($file['location']) - 1 ? $file['location'] . 'info.json' : $file['location'] . '/info.json');
            $conf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'iiif');
            IiifHelper::setUrlReader(IiifUrlReader::getInstance());
            IiifHelper::setMaxThumbnailHeight($conf['thumbnailHeight']);
            IiifHelper::setMaxThumbnailWidth($conf['thumbnailWidth']);
            $service = IiifHelper::loadIiifResource($file['location']);
            if ($service instanceof AbstractImageService) {
                return $service->getImageUrl();
            }
        } elseif ($file['mimeType'] === 'application/vnd.netfpx') {
            $baseURL = $file['location'] . (strpos($file['location'], '?') === false ? '?' : '');
            // TODO CVT is an optional IIP server capability; in theory, capabilities should be determined in the object request with '&obj=IIP-server'
            return $baseURL . '&CVT=jpeg';
        }
        return $file['location'];
    }

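Note on the IIIF branch of getDownloadLocation(): the nested ternary is meant to make sure the service URL ends in `info.json`, whether or not the stored location has a trailing slash. A more readable equivalent could look like the sketch below. `ensureInfoJsonSuffix()` is a hypothetical helper name, and unlike the original expression it collapses several trailing slashes into one.

```php
<?php

/**
 * Hypothetical helper: return the IIIF location with "info.json" appended
 * exactly once.
 */
function ensureInfoJsonSuffix(string $location): string
{
    // Already points at the info.json document: nothing to do.
    if (substr($location, -9) === 'info.json') {
        return $location;
    }
    // Strip trailing slashes, then append the document name.
    return rtrim($location, '/') . '/info.json';
}

echo ensureInfoJsonSuffix('https://example.org/iiif/0001'), PHP_EOL;
echo ensureInfoJsonSuffix('https://example.org/iiif/0001/'), PHP_EOL;
```

Both calls print `https://example.org/iiif/0001/info.json`.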
    /**
     * {@inheritDoc}
     * @see AbstractDocument::getFileInfo()
     */
    public function getFileInfo($id): ?array
    {
        $this->magicGetFileGrps();

        if (isset($this->fileInfos[$id]) && empty($this->fileInfos[$id]['location'])) {
            $this->fileInfos[$id]['location'] = $this->getFileLocation($id);
Bug: The property fileInfos is declared read-only in Kitodo\Dlf\Common\MetsDocument.
        }

        if (isset($this->fileInfos[$id]) && empty($this->fileInfos[$id]['mimeType'])) {
            $this->fileInfos[$id]['mimeType'] = $this->getFileMimeType($id);
        }

        return $this->fileInfos[$id] ?? null;
    }

    /**
     * @see AbstractDocument::getFileLocation()
     */
    public function getFileLocation(string $id): string
    {
        $location = $this->mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/mets:FLocat[@LOCTYPE="URL"]');
        if (
            !empty($id)
            && !empty($location)
        ) {
            return (string) $location[0]->attributes('http://www.w3.org/1999/xlink')->href;
        } else {
            $this->logger->warning('There is no file node with @ID "' . $id . '"');
            return '';
        }
    }

    /**
     * @see AbstractDocument::getFileMimeType()
     */
    public function getFileMimeType(string $id): string
    {
        $mimetype = $this->mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/@MIMETYPE');
        if (
            !empty($id)
            && !empty($mimetype)
        ) {
            return (string) $mimetype[0];
        } else {
            $this->logger->warning('There is no file node with @ID "' . $id . '" or no MIME type specified');
            return '';
        }
    }

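Note on getFileLocation() and getFileMimeType(): both look up a `mets:file` node by its `@ID` and then read either the namespaced `xlink:href` attribute of `mets:FLocat` or the plain `@MIMETYPE` attribute. The standalone sketch below runs the same two XPath expressions against a made-up METS fragment. The file ID, URL and MIME type are illustrative values only, and the namespace is registered explicitly here to keep the example self-contained.

```php
<?php

$xml = <<<XML
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0001.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
XML;

$mets = new SimpleXMLElement($xml);
$mets->registerXPathNamespace('mets', 'http://www.loc.gov/METS/');

$id = 'FILE_0001';

// Location: select the FLocat element, then read the attribute in the xlink namespace.
$hits = $mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/mets:FLocat[@LOCTYPE="URL"]');
$href = empty($hits) ? '' : (string) $hits[0]->attributes('http://www.w3.org/1999/xlink')->href;

// MIME type: select the attribute node directly and cast it to string.
$mimeHits = $mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/@MIMETYPE');
$mimeType = empty($mimeHits) ? '' : (string) $mimeHits[0];

echo $href, ' (', $mimeType, ')', PHP_EOL;
```

The `attributes()` call needs the namespace URI because `href` is not in the default namespace; a plain `$hits[0]['href']` would not find it.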
    /**
     * @see AbstractDocument::getLogicalStructure()
     */
    public function getLogicalStructure(string $id, bool $recursive = false): array
    {
        $details = [];
        // Is the requested logical unit already loaded?
        if (
            !$recursive
            && !empty($this->logicalUnits[$id])
        ) {
            // Yes. Return it.
            return $this->logicalUnits[$id];
        } elseif (!empty($id)) {
            // Get specified logical unit.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]');
        } else {
            // Get all logical units at top level.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]/mets:div');
        }
        if (!empty($divs)) {
            if (!$recursive) {
                // Get the details for the first xpath hit.
                $details = $this->getLogicalStructureInfo($divs[0]);
            } else {
                // Walk the logical structure recursively and fill the whole table of contents.
                foreach ($divs as $div) {
                    $this->tableOfContents[] = $this->getLogicalStructureInfo($div, $recursive);
Bug: The property tableOfContents is declared read-only in Kitodo\Dlf\Common\MetsDocument.
                }
            }
        }
        return $details;
    }

    /**
     * This gets details about a logical structure element
     *
     * @access protected
     *
     * @param SimpleXMLElement $structure The logical structure node
     * @param bool $recursive Whether to include the child elements
     *
     * @return array Array of the element's id, label, type and physical page indexes/mptr link
     */
    protected function getLogicalStructureInfo(SimpleXMLElement $structure, bool $recursive = false): array
    {
        $attributes = $structure->attributes();

        // Extract identity information.
        $details = [
            'id' => (string) $attributes['ID'],
            'dmdId' => isset($attributes['DMDID']) ? (string) $attributes['DMDID'] : '',
            'admId' => isset($attributes['ADMID']) ? (string) $attributes['ADMID'] : '',
            'order' => isset($attributes['ORDER']) ? (string) $attributes['ORDER'] : '',
            'label' => isset($attributes['LABEL']) ? (string) $attributes['LABEL'] : '',
            'orderlabel' => isset($attributes['ORDERLABEL']) ? (string) $attributes['ORDERLABEL'] : '',
            'contentIds' => isset($attributes['CONTENTIDS']) ? (string) $attributes['CONTENTIDS'] : '',
            'volume' => '',
            'year' => '',
            'pagination' => '',
            'type' => isset($attributes['TYPE']) ? (string) $attributes['TYPE'] : '',
            'description' => '',
            'thumbnailId' => null,
            'files' => [],
        ];

        // Set volume and year information only if neither label nor orderlabel is set.
        if (empty($details['label']) && empty($details['orderlabel'])) {
            $metadata = $this->getMetadata($details['id']);
            $details['volume'] = $metadata['volume'][0] ?? '';
            $details['year'] = $metadata['year'][0] ?? '';
        }

        // Add description for 3D objects.
        if ($details['type'] === 'object') {
            $metadata = $this->getMetadata($details['id']);
            $details['description'] = $metadata['description'][0] ?? '';
        }

        // Load smLinks.
        $this->magicGetSmLinks();
        // Load physical structure.
        $this->magicGetPhysicalStructure();

        $this->getPage($details, $structure->children('http://www.loc.gov/METS/')->mptr);
        $this->getFiles($details, $structure->children('http://www.loc.gov/METS/')->fptr);

        // Keep for later usage.
        $this->logicalUnits[$details['id']] = $details;
        // Walk the structure recursively? And are there any children of the current element?
        if (
            $recursive
            && count($structure->children('http://www.loc.gov/METS/')->div)
        ) {
            $details['children'] = [];
            foreach ($structure->children('http://www.loc.gov/METS/')->div as $child) {
                // Repeat for all children.
                $details['children'][] = $this->getLogicalStructureInfo($child, true);
            }
        }
        return $details;
    }

    /**
     * Get the files this structure element is pointing at.
     *
     * @access private
     *
     * @param array $details passed as reference
     * @param ?SimpleXMLElement $filePointers
     *
     * @return void
     */
    private function getFiles(array &$details, ?SimpleXMLElement $filePointers): void
    {
        $fileUse = $this->magicGetFileGrps();
        // Get the file representations from fileSec node.
        foreach ($filePointers as $filePointer) {
            $fileId = (string) $filePointer->attributes()->FILEID;
            // Check if file has valid @USE attribute.
            if (!empty($fileUse[$fileId])) {
                $details['files'][$fileUse[$fileId]] = $fileId;
            }
        }
    }

    /**
     * Get the physical page or external file this structure element is pointing at.
     *
     * @access private
     *
     * @param array $details passed as reference
     * @param ?SimpleXMLElement $metsPointers
     *
     * @return void
     */
    private function getPage(array &$details, ?SimpleXMLElement $metsPointers): void
    {
        if (count($metsPointers)) {
Bug: It seems like $metsPointers can also be of type null; however, parameter $value of count() does only seem to accept Countable|array, maybe add an additional type check?
If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:
        if (count(/** @scrutinizer ignore-type */ $metsPointers)) {
            // Yes. Get the file reference.
            $details['points'] = (string) $metsPointers[0]->attributes('http://www.w3.org/1999/xlink')->href;
        } elseif (
            !empty($this->physicalStructure)
            && array_key_exists($details['id'], $this->smLinks['l2p'])
        ) {
            // Link logical structure to the first corresponding physical page/track.
            $details['points'] = max((int) array_search($this->smLinks['l2p'][$details['id']][0], $this->physicalStructure, true), 1);
            $details['thumbnailId'] = $this->getThumbnail();
            // Get page/track number of the first page/track related to this structure element.
            $details['pagination'] = $this->physicalStructureInfo[$this->smLinks['l2p'][$details['id']][0]]['orderlabel'];
        } elseif ($details['id'] == $this->magicGetToplevelId()) {
            // Point to self if this is the toplevel structure.
            $details['points'] = 1;
            $details['thumbnailId'] = $this->getThumbnail();
        }
        if ($details['thumbnailId'] === null) {
            unset($details['thumbnailId']);
        }
    }

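Note on the count() finding in getPage(): the parameter is declared as `?SimpleXMLElement`, and on PHP 8 `count(null)` throws a TypeError, so the analyzer asks for a type check. One possible guard is sketched below as an alternative to the ignore-type annotation. `firstPointerHref()` is a hypothetical helper written for this note and is not part of the class.

```php
<?php

/**
 * Hypothetical helper: return the xlink:href of the first mptr element,
 * or null when no pointer is available.
 */
function firstPointerHref(?SimpleXMLElement $metsPointers): ?string
{
    // Check for null before calling count(), which requires Countable|array.
    if ($metsPointers === null || count($metsPointers) === 0) {
        return null;
    }
    return (string) $metsPointers[0]->attributes('http://www.w3.org/1999/xlink')->href;
}

$div = new SimpleXMLElement(
    '<mets:div xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" ID="LOG_0001">'
    . '<mets:mptr LOCTYPE="URL" xlink:href="https://example.org/parent.xml"/>'
    . '</mets:div>'
);

$withPointer = firstPointerHref($div->children('http://www.loc.gov/METS/')->mptr);
$withoutPointer = firstPointerHref($div->children('http://www.loc.gov/METS/')->fptr);
$withNull = firstPointerHref(null);
```

Here `$withPointer` holds the example URL, while the other two calls return null because the element list is empty or missing.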
    /**
     * Get thumbnail for logical structure info.
     *
     * @access private
     *
     * @param string $id empty for the top-level document, otherwise the ID of the parent document
     *
     * @return ?string thumbnail or null if not found
     */
    private function getThumbnail(string $id = ''): ?string
    {
        // Load plugin configuration.
        $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'files');
        $fileGrpsThumb = GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']);

        $thumbnail = null;

        while ($fileGrpThumb = array_shift($fileGrpsThumb)) {
            if (empty($id)) {
                $thumbnail = $this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb] ?? null;
            } else {
                $parentId = $this->smLinks['l2p'][$id][0] ?? null;
                $thumbnail = $this->physicalStructureInfo[$parentId]['files'][$fileGrpThumb] ?? null;
            }

            if (!empty($thumbnail)) {
                break;
            }
        }
        return $thumbnail;
    }

    /**
     * @see AbstractDocument::getMetadata()
     */
    public function getMetadata(string $id, int $cPid = 0): array
    {
        $cPid = $this->ensureValidPid($cPid);

        if ($cPid == 0) {
            $this->logger->warning('Invalid PID for metadata definitions');
            return [];
        }

        $metadata = $this->getMetadataFromArray($id, $cPid);

        if (empty($metadata)) {
            return [];
        }

        $metadata = $this->processMetadataSections($id, $cPid, $metadata);

        if (!empty($metadata)) {
            $metadata = $this->setDefaultTitleAndDate($metadata);
        }

        return $metadata;
    }

    /**
     * Ensure that the PID is valid: negative values are clamped to zero, and
     * zero falls back to the configuration PID or the document PID.
     *
     * @access private
     *
     * @param integer $cPid
     *
     * @return integer
     */
    private function ensureValidPid(int $cPid): int
    {
        $cPid = max($cPid, 0);
        if ($cPid == 0 && ($this->cPid || $this->pid)) {
            // Retain current PID.
            $cPid = $this->cPid ?: $this->pid;
        }
        return $cPid;
    }

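Note on ensureValidPid(): the method clamps negative input to zero and, when the result is zero, falls back first to the configuration PID and then to the document PID. The standalone sketch below reproduces that precedence with plain arguments so it can be checked in isolation. `resolvePid()` and its parameter names are hypothetical.

```php
<?php

/**
 * Hypothetical stand-in for the PID fallback: requested PID first,
 * then configuration PID, then document PID, else 0.
 */
function resolvePid(int $requested, int $configPid, int $documentPid): int
{
    // Negative PIDs are treated like "not given".
    $requested = max($requested, 0);
    if ($requested === 0) {
        // The short ternary picks the first non-zero fallback.
        return $configPid ?: $documentPid;
    }
    return $requested;
}

echo resolvePid(5, 1, 2), PHP_EOL;  // prints 5
echo resolvePid(-3, 0, 7), PHP_EOL; // prints 7
echo resolvePid(0, 4, 7), PHP_EOL;  // prints 4
echo resolvePid(0, 0, 0), PHP_EOL;  // prints 0
```

A result of 0 corresponds to the case where getMetadata() logs "Invalid PID for metadata definitions" and returns an empty array.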
    /**
     * Get metadata from array.
     *
     * @access private
     *
     * @param string $id
     * @param integer $cPid
     *
     * @return array
     */
    private function getMetadataFromArray(string $id, int $cPid): array
    {
        if (!empty($this->metadataArray[$id]) && $this->metadataArray[0] == $cPid) {
            return $this->metadataArray[$id];
        }
        return $this->initializeMetadata('METS');
    }

    /**
     * Process metadata sections.
     *
     * @access private
     *
     * @param string $id
     * @param integer $cPid
     * @param array $metadata
     *
     * @return array
     */
    private function processMetadataSections(string $id, int $cPid, array $metadata): array
    {
        $mdIds = $this->getMetadataIds($id);
        if (empty($mdIds)) {
            // There is no metadata section for this structure node.
            return [];
        }
        // Array used as set of available section types (dmdSec, techMD, ...)
        $metadataSections = [];
        // Load available metadata formats and metadata sections.
        $this->loadFormats();
        $this->magicGetMdSec();

        $metadata['type'] = $this->getLogicalUnitType($id);

        foreach ($mdIds as $dmdId) {
            $mdSectionType = $this->mdSec[$dmdId]['section'];

            if ($this->hasMetadataSection($metadataSections, $mdSectionType, 'dmdSec')) {
                continue;
            }

            if (!$this->extractAndProcessMetadata($dmdId, $mdSectionType, $metadata, $cPid, $metadataSections)) {
                continue;
            }

            $metadataSections[] = $mdSectionType;
        }

        // Files are not expected to reference a dmdSec
        if (isset($this->fileInfos[$id]) || in_array('dmdSec', $metadataSections)) {
            return $metadata;
        } else {
            $this->logger->warning('No supported descriptive metadata found for logical structure with @ID "' . $id . '"');
            return [];
        }
    }

    /**
     * Get logical unit type.
     *
     * @access private
     *
     * @param string $id
     *
     * @return array
     */
    private function getLogicalUnitType(string $id): array
    {
        if (!empty($this->logicalUnits[$id])) {
            return [$this->logicalUnits[$id]['type']];
        } else {
            $struct = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]/@TYPE');
            if (!empty($struct)) {
                return [(string) $struct[0]];
            }
        }
        return [];
    }

    /**
     * Extract and process metadata.
     *
     * @access private
     *
     * @param string $dmdId
     * @param string $mdSectionType
     * @param array $metadata
     * @param integer $cPid
     * @param array $metadataSections
     *
     * @return boolean
     */
    private function extractAndProcessMetadata(string $dmdId, string $mdSectionType, array &$metadata, int $cPid, array $metadataSections): bool
    {
        if ($this->hasMetadataSection($metadataSections, $mdSectionType, 'dmdSec')) {
            return true;
        }

        $metadataExtracted = $this->extractMetadataIfTypeSupported($dmdId, $mdSectionType, $metadata);

        if (!$metadataExtracted) {
            return false;
        }

        $additionalMetadata = $this->getAdditionalMetadataFromDatabase($cPid, $dmdId);
        // We need a \DOMDocument here, because SimpleXML doesn't support XPath functions properly.
        $domNode = dom_import_simplexml($this->mdSec[$dmdId]['xml']);
        $domXPath = new DOMXPath($domNode->ownerDocument);
        $this->registerNamespaces($domXPath);

        $this->processAdditionalMetadata($additionalMetadata, $domXPath, $domNode, $metadata);

        return true;
    }

    /**
     * Check if searched metadata section is stored in the array.
     *
     * @access private
     *
     * @param array $metadataSections
     * @param string $currentMetadataSection
     * @param string $searchedMetadataSection
     *
     * @return boolean
     */
    private function hasMetadataSection(array $metadataSections, string $currentMetadataSection, string $searchedMetadataSection): bool
    {
        return $currentMetadataSection === $searchedMetadataSection && in_array($searchedMetadataSection, $metadataSections);
    }

    /**
     * Process additional metadata.
     *
     * @access private
     *
     * @param array $additionalMetadata
     * @param DOMXPath $domXPath
     * @param DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function processAdditionalMetadata(array $additionalMetadata, DOMXPath $domXPath, DOMElement $domNode, array &$metadata): void
    {
        foreach ($additionalMetadata as $resArray) {
            $this->setMetadataFieldValues($resArray, $domXPath, $domNode, $metadata);
            $this->setDefaultMetadataValue($resArray, $metadata);
            $this->setSortableMetadataValue($resArray, $domXPath, $domNode, $metadata);
        }
    }

    /**
     * Set metadata field values.
     *
     * @access private
     *
     * @param array $resArray
     * @param DOMXPath $domXPath
     * @param DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function setMetadataFieldValues(array $resArray, DOMXPath $domXPath, DOMElement $domNode, array &$metadata): void
    {
        if ($resArray['format'] > 0 && !empty($resArray['xpath'])) {
            $values = $domXPath->evaluate($resArray['xpath'], $domNode);
            if ($values instanceof \DOMNodeList && $values->length > 0) {
                $metadata[$resArray['index_name']] = [];
                foreach ($values as $value) {
                    $metadata[$resArray['index_name']][] = trim((string) $value->nodeValue);
                }
            } elseif (!($values instanceof \DOMNodeList)) {
                $metadata[$resArray['index_name']] = [trim((string) $values)];
            }
        }
    }

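Note on setMetadataFieldValues(): the two branches exist because `DOMXPath::evaluate()` returns a `DOMNodeList` for location paths but a scalar (string, float or bool) when the expression is an XPath function call. The sketch below shows both cases on a made-up MODS fragment, converted from SimpleXML with `dom_import_simplexml()` as in extractAndProcessMetadata(). The sample XML and the `mods` prefix registration are assumptions made for illustration.

```php
<?php

$sxe = new SimpleXMLElement(
    '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo><title> Example title </title></titleInfo></mods>'
);

// Convert to DOM so that XPath functions can be evaluated.
$domNode = dom_import_simplexml($sxe);
$domXPath = new DOMXPath($domNode->ownerDocument);
$domXPath->registerNamespace('mods', 'http://www.loc.gov/mods/v3');

// A location path yields a DOMNodeList.
$nodes = $domXPath->evaluate('./mods:titleInfo/mods:title', $domNode);

// XPath functions yield scalars instead.
$text = $domXPath->evaluate('normalize-space(./mods:titleInfo/mods:title)', $domNode);
$count = $domXPath->evaluate('count(./mods:titleInfo/mods:title)', $domNode);

$values = [];
if ($nodes instanceof DOMNodeList && $nodes->length > 0) {
    foreach ($nodes as $node) {
        $values[] = trim((string) $node->nodeValue);
    }
}
```

After this runs, `$values` is `['Example title']`, `$text` is the string `'Example title'`, and `$count` is the float `1.0`, which is why the method checks `instanceof \DOMNodeList` before iterating.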
688
    /**
689
     * Set default metadata value.
690
     *
691
     * @access private
692
     *
693
     * @param array $resArray
694
     * @param array $metadata
695
     *
696
     * @return void
697
     */
698
    private function setDefaultMetadataValue(array $resArray, array &$metadata): void
699
    {
700
        if (empty($metadata[$resArray['index_name']][0]) && strlen($resArray['default_value']) > 0) {
701
            $metadata[$resArray['index_name']] = [$resArray['default_value']];
702
        }
703
    }

    /**
     * Set sortable metadata value.
     *
     * @access private
     *
     * @param array $resArray
     * @param DOMXPath $domXPath
     * @param DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function setSortableMetadataValue(array $resArray, DOMXPath $domXPath, DOMElement $domNode, array &$metadata): void
    {
        if (!empty($metadata[$resArray['index_name']]) && $resArray['is_sortable']) {
            if ($resArray['format'] > 0 && !empty($resArray['xpath_sorting'])) {
                $values = $domXPath->evaluate($resArray['xpath_sorting'], $domNode);
                if ($values instanceof \DOMNodeList && $values->length > 0) {
                    $metadata[$resArray['index_name'] . '_sorting'][0] = trim((string) $values->item(0)->nodeValue);
                } elseif (!($values instanceof \DOMNodeList)) {
                    $metadata[$resArray['index_name'] . '_sorting'][0] = trim((string) $values);
                }
            }
            if (empty($metadata[$resArray['index_name'] . '_sorting'][0])) {
                $metadata[$resArray['index_name'] . '_sorting'][0] = $metadata[$resArray['index_name']][0];
            }
        }
    }

    /**
     * Set default title and date if they are not set.
     *
     * @access private
     *
     * @param array $metadata
     *
     * @return array
     */
    private function setDefaultTitleAndDate(array $metadata): array
    {
        // Set title to empty string if not present.
        if (empty($metadata['title'][0])) {
            $metadata['title'][0] = '';
            $metadata['title_sorting'][0] = '';
        }

        // Set title_sorting to title as default.
        if (empty($metadata['title_sorting'][0])) {
            $metadata['title_sorting'][0] = $metadata['title'][0];
        }

        // Set date to empty string if not present.
        if (empty($metadata['date'][0])) {
            $metadata['date'][0] = '';
        }

        return $metadata;
    }

    /**
     * Extract metadata if metadata type is supported.
     *
     * @access private
     *
     * @param string $dmdId descriptive metadata id
     * @param string $mdSectionType metadata section type
     * @param array &$metadata
     *
     * @return bool true if extraction successful, false otherwise
     */
    private function extractMetadataIfTypeSupported(string $dmdId, string $mdSectionType, array &$metadata): bool
    {
        // Is this metadata format supported?
        if (!empty($this->formats[$this->mdSec[$dmdId]['type']])) {
            if (!empty($this->formats[$this->mdSec[$dmdId]['type']]['class'])) {
                $class = $this->formats[$this->mdSec[$dmdId]['type']]['class'];
                // Get the metadata from class.
                if (class_exists($class)) {
                    $obj = GeneralUtility::makeInstance($class);
                    if ($obj instanceof MetadataInterface) {
                        $obj->extractMetadata($this->mdSec[$dmdId]['xml'], $metadata, GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'general')['useExternalApisForMetadata']);
                        return true;
                    }
                } else {
                    $this->logger->warning('Invalid class/method "' . $class . '->extractMetadata()" for metadata format "' . $this->mdSec[$dmdId]['type'] . '"');
                }
            }
        } else {
            $this->logger->notice('Unsupported metadata format "' . $this->mdSec[$dmdId]['type'] . '" in ' . $mdSectionType . ' with @ID "' . $dmdId . '"');
        }
        return false;
    }

    /**
     * Get additional data from database.
     *
     * @access private
     *
     * @param int $cPid page id
     * @param string $dmdId descriptive metadata id
     *
     * @return array additional metadata data queried from database
     */
    private function getAdditionalMetadataFromDatabase(int $cPid, string $dmdId): array
    {
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getQueryBuilderForTable('tx_dlf_metadata');
        // Get hidden records, too.
        $queryBuilder
            ->getRestrictions()
            ->removeByType(HiddenRestriction::class);
        // Get all metadata with configured xpath and applicable format first.
        $resultWithFormat = $queryBuilder
            ->select(
                'tx_dlf_metadata.index_name AS index_name',
                'tx_dlf_metadataformat_joins.xpath AS xpath',
                'tx_dlf_metadataformat_joins.xpath_sorting AS xpath_sorting',
                'tx_dlf_metadata.is_sortable AS is_sortable',
                'tx_dlf_metadata.default_value AS default_value',
                'tx_dlf_metadata.format AS format'
            )
            ->from('tx_dlf_metadata')
            ->innerJoin(
                'tx_dlf_metadata',
                'tx_dlf_metadataformat',
                'tx_dlf_metadataformat_joins',
                $queryBuilder->expr()->eq(
                    'tx_dlf_metadataformat_joins.parent_id',
                    'tx_dlf_metadata.uid'
                )
            )
            ->innerJoin(
                'tx_dlf_metadataformat_joins',
                'tx_dlf_formats',
                'tx_dlf_formats_joins',
                $queryBuilder->expr()->eq(
                    'tx_dlf_formats_joins.uid',
                    'tx_dlf_metadataformat_joins.encoded'
                )
            )
            ->where(
                $queryBuilder->expr()->eq('tx_dlf_metadata.pid', $cPid),
                $queryBuilder->expr()->eq('tx_dlf_metadata.l18n_parent', 0),
                $queryBuilder->expr()->eq('tx_dlf_metadataformat_joins.pid', $cPid),
                $queryBuilder->expr()->eq('tx_dlf_formats_joins.type', $queryBuilder->createNamedParameter($this->mdSec[$dmdId]['type']))
            )
            ->execute();
        // Get all metadata without a format, but with a default value next.
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getQueryBuilderForTable('tx_dlf_metadata');
        // Get hidden records, too.
        $queryBuilder
            ->getRestrictions()
            ->removeByType(HiddenRestriction::class);
        $resultWithoutFormat = $queryBuilder
            ->select(
                'tx_dlf_metadata.index_name AS index_name',
                'tx_dlf_metadata.is_sortable AS is_sortable',
                'tx_dlf_metadata.default_value AS default_value',
                'tx_dlf_metadata.format AS format'
            )
            ->from('tx_dlf_metadata')
            ->where(
                $queryBuilder->expr()->eq('tx_dlf_metadata.pid', $cPid),
                $queryBuilder->expr()->eq('tx_dlf_metadata.l18n_parent', 0),
                $queryBuilder->expr()->eq('tx_dlf_metadata.format', 0),
                $queryBuilder->expr()->neq('tx_dlf_metadata.default_value', $queryBuilder->createNamedParameter(''))
            )
            ->execute();
        // Merge both result sets.
        return array_merge($resultWithFormat->fetchAllAssociative(), $resultWithoutFormat->fetchAllAssociative());
    }

    /**
     * Get IDs of (descriptive and administrative) metadata sections
     * referenced by node of given $id. The $id may refer to either
     * a logical structure node or to a file.
     *
     * @access protected
     *
     * @param string $id The "@ID" attribute of the logical structure node or file node
     *
     * @return array
     */
    protected function getMetadataIds(string $id): array
    {
        // Load amdSecChildIds concordance
        $this->magicGetMdSec();
        $fileInfo = $this->getFileInfo($id);

        // Get DMDID and ADMID of logical structure node
        if (!empty($this->logicalUnits[$id])) {
            $dmdIds = $this->logicalUnits[$id]['dmdId'] ?? '';
            $admIds = $this->logicalUnits[$id]['admId'] ?? '';
        } else {
            // Fall back to null if no matching node exists, so that no undefined offset is accessed.
            $mdSec = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]')[0] ?? null;
            if ($mdSec) {
                $dmdIds = (string) $mdSec->attributes()->DMDID;
                $admIds = (string) $mdSec->attributes()->ADMID;
            } elseif (isset($fileInfo)) {
                $dmdIds = $fileInfo['dmdId'];
                $admIds = $fileInfo['admId'];
            } else {
                $dmdIds = '';
                $admIds = '';
            }
        }

        // Handle multiple DMDIDs/ADMIDs
        $allMdIds = explode(' ', $dmdIds);

        foreach (explode(' ', $admIds) as $admId) {
            if (isset($this->mdSec[$admId])) {
                // $admId references an actual metadata section such as techMD
                $allMdIds[] = $admId;
            } elseif (isset($this->amdSecChildIds[$admId])) {
                // $admId references a <mets:amdSec> element. Resolve child elements.
                foreach ($this->amdSecChildIds[$admId] as $childId) {
                    $allMdIds[] = $childId;
                }
            }
        }

        return array_filter(
            $allMdIds,
            function ($element) {
                return !empty($element);
            }
        );
    }

    /**
     * @see AbstractDocument::getFullText()
     */
    public function getFullText(string $id): string
    {
        $fullText = '';

        // Load fileGrps and check for full text files.
        $this->magicGetFileGrps();
        if ($this->hasFulltext) {
            $fullText = $this->getFullTextFromXml($id);
        }
        return $fullText;
    }

    /**
     * @see AbstractDocument::getStructureDepth()
     */
    public function getStructureDepth(string $logId)
    {
        $ancestors = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $logId . '"]/ancestor::*');
        if (!empty($ancestors)) {
            return count($ancestors);
        } else {
            return 0;
        }
    }

    /**
     * @see AbstractDocument::init()
     */
    protected function init(string $location, array $settings): void
    {
        $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger(get_class($this));
        $this->settings = $settings;
        // Get METS node from XML file.
        $this->registerNamespaces($this->xml);
        $mets = $this->xml->xpath('//mets:mets');
        if (!empty($mets)) {
            $this->mets = $mets[0];
            // Register namespaces.
            $this->registerNamespaces($this->mets);
        } else {
            if (!empty($location)) {
                $this->logger->error('No METS part found in document with location "' . $location . '".');
            } elseif (!empty($this->recordId)) {
                $this->logger->error('No METS part found in document with recordId "' . $this->recordId . '".');
            } else {
                $this->logger->error('No METS part found in current document.');
            }
        }
    }

    /**
     * @see AbstractDocument::loadLocation()
     */
    protected function loadLocation(string $location): bool
    {
        $fileResource = Helper::getUrl($location);
        if ($fileResource !== false) {
            $xml = Helper::getXmlFileAsString($fileResource);
            // Set some basic properties.
            if ($xml !== false) {
                $this->xml = $xml;
                return true;
            }
        }
        $this->logger->error('Could not load XML file from "' . $location . '"');
        return false;
    }

    /**
     * @see AbstractDocument::ensureHasFulltextIsSet()
     */
    protected function ensureHasFulltextIsSet(): void
    {
        // Are the fileGrps already loaded?
        if (!$this->fileGrpsLoaded) {
            $this->magicGetFileGrps();
        }
    }

    /**
     * @see AbstractDocument::setPreloadedDocument()
     */
    protected function setPreloadedDocument($preloadedDocument): bool
    {
        if ($preloadedDocument instanceof SimpleXMLElement) {
            $this->xml = $preloadedDocument;
            return true;
        }
        return false;
    }

    /**
     * @see AbstractDocument::getDocument()
     */
    protected function getDocument(): SimpleXMLElement
    {
        return $this->mets;
    }

    /**
     * This builds an array of the document's metadata sections
     *
     * @access protected
     *
     * @return array Array of metadata sections with their IDs as array key
     */
    protected function magicGetMdSec(): array
    {
        if (!$this->mdSecLoaded) {
            $this->loadFormats();

            foreach ($this->mets->xpath('./mets:dmdSec') as $dmdSecTag) {
                $dmdSec = $this->processMdSec($dmdSecTag);

                if ($dmdSec !== null) {
                    $this->mdSec[$dmdSec['id']] = $dmdSec;
                    $this->dmdSec[$dmdSec['id']] = $dmdSec;
                }
            }

            foreach ($this->mets->xpath('./mets:amdSec') as $amdSecTag) {
                $childIds = [];

                foreach ($amdSecTag->children('http://www.loc.gov/METS/') as $mdSecTag) {
                    if (!in_array($mdSecTag->getName(), self::ALLOWED_AMD_SEC)) {
                        continue;
                    }

                    // TODO: Should we check that the format may occur within this type (e.g., to ignore VIDEOMD within rightsMD)?
                    $mdSec = $this->processMdSec($mdSecTag);

                    if ($mdSec !== null) {
                        $this->mdSec[$mdSec['id']] = $mdSec;

                        $childIds[] = $mdSec['id'];
                    }
                }

                $amdSecId = (string) $amdSecTag->attributes()->ID;
                if (!empty($amdSecId)) {
                    $this->amdSecChildIds[$amdSecId] = $childIds;
                }
            }

            $this->mdSecLoaded = true;
        }
        return $this->mdSec;
    }

    /**
     * Gets the document's descriptive metadata sections
     *
     * @access protected
     *
     * @return array Array of descriptive metadata sections with their IDs as array key
     */
    protected function magicGetDmdSec(): array
    {
        $this->magicGetMdSec();
        return $this->dmdSec;
    }

    /**
     * Processes an element of METS `mdSecType`.
     *
     * @access protected
     *
     * @param SimpleXMLElement $element
     *
     * @return array|null The processed metadata section
     */
    protected function processMdSec(SimpleXMLElement $element): ?array
    {
        $mdId = (string) $element->attributes()->ID;
        if (empty($mdId)) {
            return null;
        }

        $this->registerNamespaces($element);

        $type = '';
        // Initialize $xml so that it is defined even if no supported format matches.
        $xml = null;
        $mdType = $element->xpath('./mets:mdWrap[not(@MDTYPE="OTHER")]/@MDTYPE');
        $otherMdType = $element->xpath('./mets:mdWrap[@MDTYPE="OTHER"]/@OTHERMDTYPE');

        if (!empty($mdType) && !empty($this->formats[(string) $mdType[0]])) {
            $type = (string) $mdType[0];
            $xml = $element->xpath('./mets:mdWrap[@MDTYPE="' . $type . '"]/mets:xmlData/' . strtolower($type) . ':' . $this->formats[$type]['rootElement']);
        } elseif (!empty($otherMdType) && !empty($this->formats[(string) $otherMdType[0]])) {
            $type = (string) $otherMdType[0];
            $xml = $element->xpath('./mets:mdWrap[@MDTYPE="OTHER"][@OTHERMDTYPE="' . $type . '"]/mets:xmlData/' . strtolower($type) . ':' . $this->formats[$type]['rootElement']);
        }

        if (empty($xml)) {
            return null;
        }

        $this->registerNamespaces($xml[0]);

        return [
            'id' => $mdId,
            'section' => $element->getName(),
            'type' => $type,
            'xml' => $xml[0],
        ];
    }

    /**
     * This builds the file ID -> USE concordance
     *
     * @access protected
     *
     * @return array Array of file use groups with file IDs
     */
    protected function magicGetFileGrps(): array
    {
        if (!$this->fileGrpsLoaded) {
            // Get configured USE attributes.
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'files');
            $useGrps = GeneralUtility::trimExplode(',', $extConf['fileGrpImages']);
            if (!empty($extConf['fileGrpThumbs'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']));
            }
            if (!empty($extConf['fileGrpDownload'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpDownload']));
            }
            if (!empty($extConf['fileGrpFulltext'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']));
            }
            if (!empty($extConf['fileGrpAudio'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpAudio']));
            }
            // Get all file groups.
            $fileGrps = $this->mets->xpath('./mets:fileSec/mets:fileGrp');
            if (!empty($fileGrps)) {
                // Build concordance for configured USE attributes.
                foreach ($fileGrps as $fileGrp) {
                    if (in_array((string) $fileGrp['USE'], $useGrps)) {
                        foreach ($fileGrp->children('http://www.loc.gov/METS/')->file as $file) {
                            $fileId = (string) $file->attributes()->ID;
                            $this->fileGrps[$fileId] = (string) $fileGrp['USE'];
                            $this->fileInfos[$fileId] = [
                                'fileGrp' => (string) $fileGrp['USE'],
                                'admId' => (string) $file->attributes()->ADMID,
                                'dmdId' => (string) $file->attributes()->DMDID,
                            ];
                        }
                    }
                }
            }
            // Are there any fulltext files available?
            if (
                !empty($extConf['fileGrpFulltext'])
                && array_intersect(GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']), $this->fileGrps) !== []
            ) {
                $this->hasFulltext = true;
            }
            $this->fileGrpsLoaded = true;
        }
        return $this->fileGrps;
    }

    /**
     * @see AbstractDocument::prepareMetadataArray()
     */
    protected function prepareMetadataArray(int $cPid): void
    {
        // Get all logical structure nodes with metadata.
        $ids = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID]/@ID');
        if (!empty($ids)) {
            foreach ($ids as $id) {
                $this->metadataArray[(string) $id] = $this->getMetadata((string) $id, $cPid);
            }
        }
    }

    /**
     * This returns $this->mets via __get()
     *
     * @access protected
     *
     * @return SimpleXMLElement The XML's METS part as SimpleXMLElement object
     */
    protected function magicGetMets(): SimpleXMLElement
    {
        return $this->mets;
    }

    /**
     * @see AbstractDocument::magicGetPhysicalStructure()
     */
    protected function magicGetPhysicalStructure(): array
    {
        // Is there no physical structure array yet?
        if (!$this->physicalStructureLoaded) {
            // Does the document have a structMap node of type "PHYSICAL"?
            $elementNodes = $this->mets->xpath('./mets:structMap[@TYPE="PHYSICAL"]/mets:div[@TYPE="physSequence"]/mets:div');
            if (!empty($elementNodes)) {
                // Get file groups.
                $fileUse = $this->magicGetFileGrps();
                // Get the physical sequence's metadata.
                $physNode = $this->mets->xpath('./mets:structMap[@TYPE="PHYSICAL"]/mets:div[@TYPE="physSequence"]');
                $firstNode = $physNode[0];
                $id = (string) $firstNode['ID'];
                $this->physicalStructureInfo[$id]['id'] = $id;
                $this->physicalStructureInfo[$id]['dmdId'] = isset($firstNode['DMDID']) ? (string) $firstNode['DMDID'] : '';
                $this->physicalStructureInfo[$id]['admId'] = isset($firstNode['ADMID']) ? (string) $firstNode['ADMID'] : '';
                $this->physicalStructureInfo[$id]['order'] = isset($firstNode['ORDER']) ? (string) $firstNode['ORDER'] : '';
                $this->physicalStructureInfo[$id]['label'] = isset($firstNode['LABEL']) ? (string) $firstNode['LABEL'] : '';
                $this->physicalStructureInfo[$id]['orderlabel'] = isset($firstNode['ORDERLABEL']) ? (string) $firstNode['ORDERLABEL'] : '';
                $this->physicalStructureInfo[$id]['type'] = (string) $firstNode['TYPE'];
                $this->physicalStructureInfo[$id]['contentIds'] = isset($firstNode['CONTENTIDS']) ? (string) $firstNode['CONTENTIDS'] : '';

                $this->getFileRepresentation($id, $firstNode);

                $this->physicalStructure = $this->getPhysicalElements($elementNodes, $fileUse);
            }
            $this->physicalStructureLoaded = true;
        }
        return $this->physicalStructure;
    }

    /**
     * Get the file representations from fileSec node.
     *
     * @access private
     *
     * @param string $id
     * @param SimpleXMLElement $physicalNode
     *
     * @return void
     */
    private function getFileRepresentation(string $id, SimpleXMLElement $physicalNode): void
    {
        // Get file groups.
        $fileUse = $this->magicGetFileGrps();

        foreach ($physicalNode->children('http://www.loc.gov/METS/')->fptr as $fptr) {
            $fileId = (string) $fptr->attributes()->FILEID;
            // Check if file has valid @USE attribute.
            if (!empty($fileUse[$fileId])) {
                $this->physicalStructureInfo[$id]['files'][$fileUse[$fileId]] = $fileId;
            }
        }
    }

    /**
     * Build the physical elements' array from the physical structMap node.
     *
     * @access private
     *
     * @param array $elementNodes
     * @param array $fileUse
     *
     * @return array
     */
    private function getPhysicalElements(array $elementNodes, array $fileUse): array
    {
        $elements = [];
        $id = '';

        foreach ($elementNodes as $elementNode) {
            $id = (string) $elementNode['ID'];
            $order = (int) $elementNode['ORDER'];
            $elements[$order] = $id;
            $this->physicalStructureInfo[$elements[$order]]['id'] = $id;
            $this->physicalStructureInfo[$elements[$order]]['dmdId'] = isset($elementNode['DMDID']) ? (string) $elementNode['DMDID'] : '';
            $this->physicalStructureInfo[$elements[$order]]['admId'] = isset($elementNode['ADMID']) ? (string) $elementNode['ADMID'] : '';
            $this->physicalStructureInfo[$elements[$order]]['order'] = isset($elementNode['ORDER']) ? (string) $elementNode['ORDER'] : '';
            $this->physicalStructureInfo[$elements[$order]]['label'] = isset($elementNode['LABEL']) ? (string) $elementNode['LABEL'] : '';
            $this->physicalStructureInfo[$elements[$order]]['orderlabel'] = isset($elementNode['ORDERLABEL']) ? (string) $elementNode['ORDERLABEL'] : '';
            $this->physicalStructureInfo[$elements[$order]]['type'] = (string) $elementNode['TYPE'];
            $this->physicalStructureInfo[$elements[$order]]['contentIds'] = isset($elementNode['CONTENTIDS']) ? (string) $elementNode['CONTENTIDS'] : '';
            // Get the file representations from fileSec node.
            foreach ($elementNode->children('http://www.loc.gov/METS/')->fptr as $fptr) {
                // Check if file has valid @USE attribute.
                if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
                    $this->physicalStructureInfo[$elements[$order]]['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
                }
            }
        }

        // Sort array by keys (= @ORDER).
        ksort($elements);
        // Set total number of pages/tracks.
        $this->numPages = count($elements);
        // Merge and re-index the array to get numeric indexes.
        array_unshift($elements, $id);

        return $elements;
    }

    /**
     * @see AbstractDocument::magicGetSmLinks()
     */
    protected function magicGetSmLinks(): array
    {
        if (!$this->smLinksLoaded) {
            $smLinks = $this->mets->xpath('./mets:structLink/mets:smLink');
            if (!empty($smLinks)) {
                foreach ($smLinks as $smLink) {
                    $this->smLinks['l2p'][(string) $smLink->attributes('http://www.w3.org/1999/xlink')->from][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->to;
                    $this->smLinks['p2l'][(string) $smLink->attributes('http://www.w3.org/1999/xlink')->to][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->from;
                }
            }
            $this->smLinksLoaded = true;
        }
        return $this->smLinks;
    }
1349
1350
    /**
1351
     * @see AbstractDocument::magicGetThumbnail()
1352
     */
1353
    protected function magicGetThumbnail(bool $forceReload = false): string
1354
    {
1355
        if (
1356
            !$this->thumbnailLoaded
1357
            || $forceReload
1358
        ) {
1359
            // Retain current PID.
1360
            $cPid = $this->cPid ?: $this->pid;
1361
            if (!$cPid) {
1362
                $this->logger->error('Invalid PID ' . $cPid . ' for structure definitions');
1363
                $this->thumbnailLoaded = true;
1364
                return $this->thumbnail;
1365
            }
1366
            // Load extension configuration.
1367
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey, 'files');
1368
            if (empty($extConf['fileGrpThumbs'])) {
1369
                $this->logger->warning('No fileGrp for thumbnails specified');
1370
                $this->thumbnailLoaded = true;
1371
                return $this->thumbnail;
1372
            }
1373
            $strctId = $this->magicGetToplevelId();
1374
            $metadata = $this->getToplevelMetadata($cPid);
1375
1376
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
1377
                ->getQueryBuilderForTable('tx_dlf_structures');
1378
1379
            // Get structure element to get thumbnail from.
1380
            $result = $queryBuilder
1381
                ->select('tx_dlf_structures.thumbnail AS thumbnail')
1382
                ->from('tx_dlf_structures')
1383
                ->where(
1384
                    $queryBuilder->expr()->eq('tx_dlf_structures.pid', $cPid),
1385
                    $queryBuilder->expr()->eq('tx_dlf_structures.index_name', $queryBuilder->expr()->literal($metadata['type'][0])),
1386
                    Helper::whereExpression('tx_dlf_structures')
1387
                )
1388
                ->setMaxResults(1)
1389
                ->execute();
1390
1391
            $allResults = $result->fetchAllAssociative();
1392
1393
            if (count($allResults) == 1) {
                $resArray = $allResults[0];
                // Get desired thumbnail structure if not the toplevel structure itself.
                if (!empty($resArray['thumbnail'])) {
                    $strctType = Helper::getIndexNameFromUid($resArray['thumbnail'], 'tx_dlf_structures', $cPid);
                    // Check if this document has a structure element of the desired type.
                    $strctIds = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@TYPE="' . $strctType . '"]/@ID');
                    if (!empty($strctIds)) {
                        $strctId = (string) $strctIds[0];
                    }
                }
                // Load smLinks.
                $this->magicGetSmLinks();
                // Get thumbnail location.
                $fileGrpsThumb = GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']);
                while ($fileGrpThumb = array_shift($fileGrpsThumb)) {
                    if (
                        $this->magicGetPhysicalStructure()
                        && !empty($this->smLinks['l2p'][$strctId])
                        && !empty($this->physicalStructureInfo[$this->smLinks['l2p'][$strctId][0]]['files'][$fileGrpThumb])
                    ) {
                        $this->thumbnail = $this->getFileLocation($this->physicalStructureInfo[$this->smLinks['l2p'][$strctId][0]]['files'][$fileGrpThumb]);
                        break;
                    } elseif (!empty($this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb])) {
                        $this->thumbnail = $this->getFileLocation($this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb]);
                        break;
                    }
                }
            } else {
                $this->logger->error('No structure of type "' . $metadata['type'][0] . '" found in database');
            }
            $this->thumbnailLoaded = true;
        }
        return $this->thumbnail;
    }

    /**
     * @see AbstractDocument::magicGetToplevelId()
     */
    protected function magicGetToplevelId(): string
    {
        if (empty($this->toplevelId)) {
            // Get all logical structure nodes with metadata, but without associated METS-Pointers.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID and not(./mets:mptr)]');
            if (!empty($divs)) {
                // Load smLinks.
                $this->magicGetSmLinks();
                foreach ($divs as $div) {
                    $id = (string) $div['ID'];
                    // Are there physical structure nodes for this logical structure?
                    if (array_key_exists($id, $this->smLinks['l2p'])) {
                        // Yes. That's what we're looking for.
                        $this->toplevelId = $id;
                        break;
                    } elseif (empty($this->toplevelId)) {
                        // No. Remember this anyway, but keep looking for a better one.
                        $this->toplevelId = $id;
                    }
                }
            }
        }
        return $this->toplevelId;
    }

    /**
     * Try to determine the URL of the parent document.
     *
     * @access public
     *
     * @return string
     */
    public function magicGetParentHref(): string
    {
        if (empty($this->parentHref)) {
            // Get the closest ancestor of the current document which has an MPTR child.
            $parentMptr = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $this->toplevelId . '"]/ancestor::mets:div[./mets:mptr][1]/mets:mptr');
            if (!empty($parentMptr)) {
                $this->parentHref = (string) $parentMptr[0]->attributes('http://www.w3.org/1999/xlink')->href;
            }
        }

        return $this->parentHref;
    }

    /**
     * This magic method is executed prior to any serialization of the object
     * @see __wakeup()
     *
     * @access public
     *
     * @return array Properties to be serialized
     */
    public function __sleep(): array
    {
        // SimpleXMLElement objects can't be serialized, thus save the XML as string for serialization
        $this->asXML = $this->xml->asXML();
        return ['pid', 'recordId', 'parentId', 'asXML'];
    }

    /**
     * This magic method returns a string representation of the object
     *
     * @access public
     *
     * @return string String representing the METS object
     */
    public function __toString(): string
    {
        $xml = new \DOMDocument('1.0', 'utf-8');
        $xml->appendChild($xml->importNode(dom_import_simplexml($this->mets), true));
        $xml->formatOutput = true;
        return $xml->saveXML();
    }

    /**
     * This magic method is executed after the object is deserialized
     * @see __sleep()
     *
     * @access public
     *
     * @return void
     */
    public function __wakeup(): void
    {
        $xml = Helper::getXmlFileAsString($this->asXML);
        if ($xml !== false) {
            $this->asXML = '';
            $this->xml = $xml;
            // Rebuild the unserializable properties.
            $this->init('', $this->settings);
        } else {
            $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger(static::class);
            $this->logger->error('Could not load XML after deserialization');
        }
    }
}