Passed
Push — master ( cfb46e...3801b6 )

MetsDocument::magicGetPhysicalStructure()   D

Complexity

Conditions 20
Paths 66

Size

Total Lines 58
Code Lines 37

Duplication

Lines 0
Ratio 0 %

Importance

Changes 2
Bugs 0
Features 0

Metric  Value
cc      20
eloc    37
c       2
b       0
f       0
nc      66
nop     0
dl      0
loc     58
rs      4.1666

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
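The advice above can be sketched in a few lines. The example below is hypothetical and is not taken from MetsDocument: the function names (summarizeBefore, filterLabelledPages, formatSummary, summarizeAfter) and the page array shape are assumptions made only for illustration. It shows a method whose comments mark two steps, and the same logic after each commented step has been extracted into a function named after its comment.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration of "Extract Method"; none of these names exist in the reviewed code.

/** Before: one function whose comments mark two separate steps. */
function summarizeBefore(array $pages): string
{
    // Keep only pages that have a label.
    $labelled = [];
    foreach ($pages as $page) {
        if (!empty($page['label'])) {
            $labelled[] = $page;
        }
    }
    // Format the summary line.
    return count($labelled) . ' of ' . count($pages) . ' pages labelled';
}

/** After: the first commented step, extracted and named after its comment. */
function filterLabelledPages(array $pages): array
{
    return array_values(array_filter($pages, static fn (array $page): bool => !empty($page['label'])));
}

/** After: the second commented step, extracted the same way. */
function formatSummary(int $labelled, int $total): string
{
    return $labelled . ' of ' . $total . ' pages labelled';
}

/** After: the original function now only wires the two steps together. */
function summarizeAfter(array $pages): string
{
    return formatSummary(count(filterLabelledPages($pages)), count($pages));
}

$pages = [['label' => 'Title page'], ['label' => ''], ['label' => 'Chapter 1']];
echo summarizeAfter($pages), PHP_EOL; // prints "2 of 3 pages labelled"
```

Both versions return the same string. The refactored one has fewer branches per function, which is what the complexity metrics in this report measure.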

Commonly applied refactorings include:

<?php

/**
 * (c) Kitodo. Key to digital objects e.V. <[email protected]>
 *
 * This file is part of the Kitodo and TYPO3 projects.
 *
 * @license GNU General Public License version 3 or later.
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 */

namespace Kitodo\Dlf\Common;

use TYPO3\CMS\Core\Configuration\ExtensionConfiguration;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Database\Query\Restriction\HiddenRestriction;
use TYPO3\CMS\Core\Log\LogManager;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use Ubl\Iiif\Tools\IiifHelper;
use Ubl\Iiif\Services\AbstractImageService;

/**
 * MetsDocument class for the 'dlf' extension.
 *
 * @package TYPO3
 * @subpackage dlf
 *
 * @access public
 *
 * @property int $cPid this holds the PID for the configuration
 * @property-read array $formats this holds the configuration for all supported metadata encodings
 * @property bool $formatsLoaded flag with information if the available metadata formats are loaded
 * @property-read bool $hasFulltext flag with information if there are any fulltext files available
 * @property array $lastSearchedPhysicalPage the last searched logical and physical page
 * @property array $logicalUnits this holds the logical units
 * @property-read array $metadataArray this holds the documents' parsed metadata array
 * @property bool $metadataArrayLoaded flag with information if the metadata array is loaded
 * @property-read int $numPages this holds the total number of pages
 * @property-read int $parentId this holds the UID of the parent document or zero if not multi-volumed
 * @property-read array $physicalStructure this holds the physical structure
 * @property-read array $physicalStructureInfo this holds the physical structure metadata
 * @property bool $physicalStructureLoaded flag with information if the physical structure is loaded
 * @property-read int $pid this holds the PID of the document or zero if not in database
 * @property array $rawTextArray this holds the documents' raw text pages with their corresponding structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
 * @property-read bool $ready Is the document instantiated successfully?
 * @property-read string $recordId the METS file's / IIIF manifest's record identifier
 * @property array $registry this holds the singleton object of the document
 * @property-read int $rootId this holds the UID of the root document or zero if not multi-volumed
 * @property-read array $smLinks this holds the smLinks between logical and physical structMap
 * @property bool $smLinksLoaded flag with information if the smLinks are loaded
 * @property-read array $tableOfContents this holds the logical structure
 * @property bool $tableOfContentsLoaded flag with information if the table of contents is loaded
 * @property-read string $thumbnail this holds the document's thumbnail location
 * @property bool $thumbnailLoaded flag with information if the thumbnail is loaded
 * @property-read string $toplevelId this holds the toplevel structure's "@ID" (METS) or the manifest's "@id" (IIIF)
 * @property \SimpleXMLElement $xml this holds the whole XML file as \SimpleXMLElement object
 * @property-read array $mdSec associative array of METS metadata sections indexed by their IDs.
 * @property bool $mdSecLoaded flag with information if the array of METS metadata sections is loaded
 * @property-read array $dmdSec subset of `$mdSec` storing only the dmdSec entries; kept for compatibility.
 * @property-read array $fileGrps this holds the file ID -> USE concordance
 * @property bool $fileGrpsLoaded flag with information if file groups array is loaded
 * @property-read array $fileInfos additional information about files (e.g., ADMID), indexed by ID.
 * @property-read \SimpleXMLElement $mets this holds the XML file's METS part as \SimpleXMLElement object
 * @property-read string $parentHref URL of the parent document (determined via mptr element), or empty string if none is available
 */
final class MetsDocument extends AbstractDocument
{
    /**
     * @access protected
     * @var string[] Subsections / tags that may occur within `<mets:amdSec>`
     *
     * @link https://www.loc.gov/standards/mets/docs/mets.v1-9.html#amdSec
     * @link https://www.loc.gov/standards/mets/docs/mets.v1-9.html#mdSecType
     */
    protected const ALLOWED_AMD_SEC = ['techMD', 'rightsMD', 'sourceMD', 'digiprovMD'];

    /**
     * @access protected
     * @var string This holds the whole XML file as string for serialization purposes
     *
     * @see __sleep() / __wakeup()
     */
    protected string $asXML = '';

    /**
     * @access protected
     * @var array This maps the ID of each amdSec to the IDs of its children (techMD etc.). When an ADMID references an amdSec instead of techMD etc., this is used to iterate the child elements.
     */
    protected array $amdSecChildIds = [];

    /**
     * @access protected
     * @var array Associative array of METS metadata sections indexed by their IDs.
     */
    protected array $mdSec = [];

    /**
     * @access protected
     * @var bool Are the METS file's metadata sections loaded?
     *
     * @see MetsDocument::$mdSec
     */
    protected bool $mdSecLoaded = false;

    /**
     * @access protected
     * @var array Subset of $mdSec storing only the dmdSec entries; kept for compatibility.
     */
    protected array $dmdSec = [];

    /**
     * @access protected
     * @var array This holds the file ID -> USE concordance
     *
     * @see magicGetFileGrps()
     */
    protected array $fileGrps = [];

    /**
     * @access protected
     * @var bool Are the image file groups loaded?
     *
     * @see $fileGrps
     */
    protected bool $fileGrpsLoaded = false;

    /**
     * @access protected
     * @var \SimpleXMLElement This holds the XML file's METS part as \SimpleXMLElement object
     */
    protected \SimpleXMLElement $mets;

    /**
     * @access protected
     * @var string URL of the parent document (determined via mptr element), or empty string if none is available
     */
    protected string $parentHref = '';

    /**
     * @access protected
     * @var array the extension settings
     */
    protected array $settings = [];

    /**
     * This adds metadata from METS structural map to metadata array.
     *
     * @access public
     *
     * @param array &$metadata The metadata array to extend
     * @param string $id The "@ID" attribute of the logical structure node
     *
     * @return void
     */
    public function addMetadataFromMets(array &$metadata, string $id): void
    {
        $details = $this->getLogicalStructure($id);
        if (!empty($details)) {
            $metadata['mets_order'][0] = $details['order'];
            $metadata['mets_label'][0] = $details['label'];
            $metadata['mets_orderlabel'][0] = $details['orderlabel'];
        }
    }

    /**
     * @see AbstractDocument::establishRecordId()
     */
    protected function establishRecordId(int $pid): void
    {
        // Check for METS object @ID.
        if (!empty($this->mets['OBJID'])) {
            $this->recordId = (string) $this->mets['OBJID'];
        }
        // Get hook objects.
        $hookObjects = Helper::getHookObjects('Classes/Common/MetsDocument.php');
        // Apply hooks.
        foreach ($hookObjects as $hookObj) {
            if (method_exists($hookObj, 'postProcessRecordId')) {
                $hookObj->postProcessRecordId($this->xml, $this->recordId);
            }
        }
    }

    /**
     * @see AbstractDocument::getDownloadLocation()
     */
    public function getDownloadLocation(string $id): string
    {
        $file = $this->getFileInfo($id);
        if ($file['mimeType'] === 'application/vnd.kitodo.iiif') {
            // Append "info.json" unless the location already ends with it, adding a slash only if one is missing.
            $file['location'] = (strrpos($file['location'], 'info.json') === strlen($file['location']) - 9) ? $file['location'] : (strrpos($file['location'], '/') === strlen($file['location']) - 1 ? $file['location'] . 'info.json' : $file['location'] . '/info.json');
            $conf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
            IiifHelper::setUrlReader(IiifUrlReader::getInstance());
            IiifHelper::setMaxThumbnailHeight($conf['iiifThumbnailHeight']);
            IiifHelper::setMaxThumbnailWidth($conf['iiifThumbnailWidth']);
            $service = IiifHelper::loadIiifResource($file['location']);
            if ($service instanceof AbstractImageService) {
                return $service->getImageUrl();
            }
        } elseif ($file['mimeType'] === 'application/vnd.netfpx') {
            $baseURL = $file['location'] . (strpos($file['location'], '?') === false ? '?' : '');
            // TODO CVT is an optional IIP server capability; in theory, capabilities should be determined in the object request with '&obj=IIP-server'
            return $baseURL . '&CVT=jpeg';
        }
        return $file['location'];
    }

    /**
     * {@inheritDoc}
     * @see AbstractDocument::getFileInfo()
     */
    public function getFileInfo($id): ?array
    {
        $this->magicGetFileGrps();

        if (isset($this->fileInfos[$id]) && empty($this->fileInfos[$id]['location'])) {
            // Scrutinizer issue: The property fileInfos is declared read-only in Kitodo\Dlf\Common\MetsDocument.
            $this->fileInfos[$id]['location'] = $this->getFileLocation($id);
        }

        if (isset($this->fileInfos[$id]) && empty($this->fileInfos[$id]['mimeType'])) {
            $this->fileInfos[$id]['mimeType'] = $this->getFileMimeType($id);
        }

        return $this->fileInfos[$id];
    }

    /**
     * @see AbstractDocument::getFileLocation()
     */
    public function getFileLocation(string $id): string
    {
        $location = $this->mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/mets:FLocat[@LOCTYPE="URL"]');
        if (
            !empty($id)
            && !empty($location)
        ) {
            return (string) $location[0]->attributes('http://www.w3.org/1999/xlink')->href;
        } else {
            $this->logger->warning('There is no file node with @ID "' . $id . '"');
            return '';
        }
    }

    /**
     * @see AbstractDocument::getFileMimeType()
     */
    public function getFileMimeType(string $id): string
    {
        $mimetype = $this->mets->xpath('./mets:fileSec/mets:fileGrp/mets:file[@ID="' . $id . '"]/@MIMETYPE');
        if (
            !empty($id)
            && !empty($mimetype)
        ) {
            return (string) $mimetype[0];
        } else {
            $this->logger->warning('There is no file node with @ID "' . $id . '" or no MIME type specified');
            return '';
        }
    }

    /**
     * @see AbstractDocument::getLogicalStructure()
     */
    public function getLogicalStructure(string $id, bool $recursive = false): array
    {
        $details = [];
        // Is the requested logical unit already loaded?
        if (
            !$recursive
            && !empty($this->logicalUnits[$id])
        ) {
            // Yes. Return it.
            return $this->logicalUnits[$id];
        } elseif (!empty($id)) {
            // Get specified logical unit.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]');
        } else {
            // Get all logical units at top level.
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]/mets:div');
        }
        if (!empty($divs)) {
            if (!$recursive) {
                // Get the details for the first xpath hit.
                $details = $this->getLogicalStructureInfo($divs[0]);
            } else {
                // Walk the logical structure recursively and fill the whole table of contents.
                foreach ($divs as $div) {
                    $this->tableOfContents[] = $this->getLogicalStructureInfo($div, $recursive);
                }
            }
        }
        return $details;
    }

    /**
     * This gets details about a logical structure element
     *
     * @access protected
     *
     * @param \SimpleXMLElement $structure The logical structure node
     * @param bool $recursive Whether to include the child elements
     *
     * @return array Array of the element's id, label, type and physical page indexes/mptr link
     */
    protected function getLogicalStructureInfo(\SimpleXMLElement $structure, bool $recursive = false): array
    {
        $attributes = $structure->attributes();

        // Extract identity information.
        $details = [
            'id' => (string) $attributes['ID'],
            'dmdId' => isset($attributes['DMDID']) ? (string) $attributes['DMDID'] : '',
            'admId' => isset($attributes['ADMID']) ? (string) $attributes['ADMID'] : '',
            'order' => isset($attributes['ORDER']) ? (string) $attributes['ORDER'] : '',
            'label' => isset($attributes['LABEL']) ? (string) $attributes['LABEL'] : '',
            'orderlabel' => isset($attributes['ORDERLABEL']) ? (string) $attributes['ORDERLABEL'] : '',
            'contentIds' => isset($attributes['CONTENTIDS']) ? (string) $attributes['CONTENTIDS'] : '',
            'volume' => '',
            'year' => '',
            'pagination' => '',
            'type' => isset($attributes['TYPE']) ? (string) $attributes['TYPE'] : '',
            'description' => '',
            'thumbnailId' => null,
            'files' => [],
        ];

        // Set volume and year information only if no label is set and this is the toplevel structure element.
        if (empty($details['label']) && empty($details['orderlabel'])) {
            $metadata = $this->getMetadata($details['id']);
            $details['volume'] = $metadata['volume'][0] ?? '';
            $details['year'] = $metadata['year'][0] ?? '';
        }

        // add description for 3D objects
        if ($details['type'] == 'object') {
            $metadata = $this->getMetadata($details['id']);
            $details['description'] = $metadata['description'][0] ?? '';
        }

        // Load smLinks.
        $this->magicGetSmLinks();
        // Load physical structure.
        $this->magicGetPhysicalStructure();
        // Get the physical page or external file this structure element is pointing at.
        // Is there a mptr node?
        if (count($structure->children('http://www.loc.gov/METS/')->mptr)) {
            // Yes. Get the file reference.
            $details['points'] = (string) $structure->children('http://www.loc.gov/METS/')->mptr[0]->attributes('http://www.w3.org/1999/xlink')->href;
        } elseif (
            !empty($this->physicalStructure)
            && array_key_exists($details['id'], $this->smLinks['l2p'])
        ) {
            // Link logical structure to the first corresponding physical page/track.
            $details['points'] = max((int) array_search($this->smLinks['l2p'][$details['id']][0], $this->physicalStructure, true), 1);
            $details['thumbnailId'] = $this->getThumbnail();
            // Get page/track number of the first page/track related to this structure element.
            $details['pagination'] = $this->physicalStructureInfo[$this->smLinks['l2p'][$details['id']][0]]['orderlabel'];
        } elseif ($details['id'] == $this->magicGetToplevelId()) {
            // Point to self if this is the toplevel structure.
            $details['points'] = 1;
            $details['thumbnailId'] = $this->getThumbnail();
        }
        if ($details['thumbnailId'] === null) {
            unset($details['thumbnailId']);
        }
        // Get the files this structure element is pointing at.
        $fileUse = $this->magicGetFileGrps();
        // Get the file representations from fileSec node.
        foreach ($structure->children('http://www.loc.gov/METS/')->fptr as $fptr) {
            // Check if file has valid @USE attribute.
            // Scrutinizer issue: The method attributes() does not exist on null.
            // If this is a false positive, it can be ignored in code via the
            // "@scrutinizer ignore-call" annotation.
            if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
                $details['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
            }
        }
        // Keep for later usage.
        $this->logicalUnits[$details['id']] = $details;
        // Walk the structure recursively? And are there any children of the current element?
        if (
            $recursive
            && count($structure->children('http://www.loc.gov/METS/')->div)
        ) {
            $details['children'] = [];
            foreach ($structure->children('http://www.loc.gov/METS/')->div as $child) {
                // Repeat for all children.
                // Scrutinizer issue: $child can also be of type null; however, parameter $structure of
                // Kitodo\Dlf\Common\MetsDo...tLogicalStructureInfo() only seems to accept SimpleXMLElement.
                // Consider an additional type check, or, if this is a false positive, the
                // "@scrutinizer ignore-type" annotation.
                $details['children'][] = $this->getLogicalStructureInfo($child, true);
            }
        }
        return $details;
    }

    /**
     * Get thumbnail for logical structure info.
     *
     * @access private
     *
     * @param string $id empty if top level document, else passed the id of parent document
     *
     * @return ?string thumbnail or null if not found
     */
    private function getThumbnail(string $id = '')
    {
        // Load plugin configuration.
        $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
        $fileGrpsThumb = GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']);

        $thumbnail = null;

        while ($fileGrpThumb = array_shift($fileGrpsThumb)) {
            if (empty($id)) {
                $thumbnail = $this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb] ?? null;
            } else {
                $parentId = $this->smLinks['l2p'][$id][0] ?? null;
                $thumbnail = $this->physicalStructureInfo[$parentId]['files'][$fileGrpThumb] ?? null;
            }

            if (!empty($thumbnail)) {
                break;
            }
        }
        return $thumbnail;
    }

    /**
     * @see AbstractDocument::getMetadata()
     */
    public function getMetadata(string $id, int $cPid = 0): array
    {
        $cPid = $this->ensureValidPid($cPid);

        if ($cPid == 0) {
            $this->logger->warning('Invalid PID for metadata definitions');
            return [];
        }

        $metadata = $this->getMetadataFromArray($id, $cPid);

        if (empty($metadata)) {
            return [];
        }

        $metadata = $this->processMetadataSections($id, $cPid, $metadata);

        if (!empty($metadata)) {
            $metadata = $this->setDefaultTitleAndDate($metadata);
        }

        return $metadata;
    }

    /**
     * Ensure that the PID is valid.
     *
     * @access private
     *
     * @param integer $cPid
     *
     * @return integer
     */
    private function ensureValidPid(int $cPid): int
    {
        $cPid = max($cPid, 0);
        if ($cPid == 0 && ($this->cPid || $this->pid)) {
            // Retain current PID.
            $cPid = $this->cPid ?: $this->pid;
        }
        return $cPid;
    }

    /**
     * Get metadata from array.
     *
     * @access private
     *
     * @param string $id
     * @param integer $cPid
     *
     * @return array
     */
    private function getMetadataFromArray(string $id, int $cPid): array
    {
        if (!empty($this->metadataArray[$id]) && $this->metadataArray[0] == $cPid) {
            return $this->metadataArray[$id];
        }
        return $this->initializeMetadata('METS');
    }

    /**
     * Process metadata sections.
     *
     * @access private
     *
     * @param string $id
     * @param integer $cPid
     * @param array $metadata
     *
     * @return array
     */
    private function processMetadataSections(string $id, int $cPid, array $metadata): array
    {
        $mdIds = $this->getMetadataIds($id);
        if (empty($mdIds)) {
            // There is no metadata section for this structure node.
            return [];
        }
        // Associative array used as set of available section types (dmdSec, techMD, ...)
        $hasMetadataSection = [];
        // Load available metadata formats and metadata sections.
        $this->loadFormats();
        $this->magicGetMdSec();

        $metadata['type'] = $this->getLogicalUnitType($id);

        foreach ($mdIds as $dmdId) {
            $mdSectionType = $this->mdSec[$dmdId]['section'];

            if ($mdSectionType === 'dmdSec' && isset($hasMetadataSection['dmdSec'])) {
                continue;
            }

            if (!$this->extractAndProcessMetadata($dmdId, $mdSectionType, $metadata, $cPid, $hasMetadataSection)) {
                continue;
            }

            $hasMetadataSection[$mdSectionType] = true;
        }

        // Files are not expected to reference a dmdSec
        if (isset($this->fileInfos[$id]) || isset($hasMetadataSection['dmdSec'])) {
            return $metadata;
        } else {
            $this->logger->warning('No supported descriptive metadata found for logical structure with @ID "' . $id . '"');
            return [];
        }
    }

    /**
     * Get logical unit type.
     *
     * @access private
     *
     * @param string $id
     *
     * @return array
     */
    private function getLogicalUnitType(string $id): array
    {
        if (!empty($this->logicalUnits[$id])) {
            return [$this->logicalUnits[$id]['type']];
        } else {
            $struct = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]/@TYPE');
            if (!empty($struct)) {
                return [(string) $struct[0]];
            }
        }
        return [];
    }

    /**
     * Extract and process metadata.
     *
     * @access private
     *
     * @param string $dmdId
     * @param string $mdSectionType
     * @param array $metadata
     * @param integer $cPid
     * @param array $hasMetadataSection
     *
     * @return boolean
     */
    private function extractAndProcessMetadata(string $dmdId, string $mdSectionType, array &$metadata, int $cPid, array $hasMetadataSection): bool
    {
        if ($mdSectionType === 'dmdSec' && isset($hasMetadataSection['dmdSec'])) {
            return true;
        }

        $metadataExtracted = $this->extractMetadataIfTypeSupported($dmdId, $mdSectionType, $metadata);

        if (!$metadataExtracted) {
            return false;
        }

        $additionalMetadata = $this->getAdditionalMetadataFromDatabase($cPid, $dmdId);
        // We need a \DOMDocument here, because SimpleXML doesn't support XPath functions properly.
        $domNode = dom_import_simplexml($this->mdSec[$dmdId]['xml']);
        // Scrutinizer issue: $domNode->ownerDocument can also be of type null; however, parameter $document
        // of DOMXPath::__construct() only seems to accept DOMDocument. Consider an additional type check,
        // or, if this is a false positive, the "@scrutinizer ignore-type" annotation.
        $domXPath = new \DOMXPath($domNode->ownerDocument);
        $this->registerNamespaces($domXPath);

        $this->processAdditionalMetadata($additionalMetadata, $domXPath, $domNode, $metadata);

        return true;
    }

    /**
     * Process additional metadata.
     *
     * @access private
     *
     * @param array $additionalMetadata
     * @param \DOMXPath $domXPath
     * @param \DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function processAdditionalMetadata(array $additionalMetadata, \DOMXPath $domXPath, \DOMElement $domNode, array &$metadata): void
    {
        foreach ($additionalMetadata as $resArray) {
            $this->setMetadataFieldValues($resArray, $domXPath, $domNode, $metadata);
            $this->setDefaultMetadataValue($resArray, $metadata);
            $this->setSortableMetadataValue($resArray, $domXPath, $domNode, $metadata);
        }
    }

    /**
     * Set metadata field values.
     *
     * @access private
     *
     * @param array $resArray
     * @param \DOMXPath $domXPath
     * @param \DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function setMetadataFieldValues(array $resArray, \DOMXPath $domXPath, \DOMElement $domNode, array &$metadata): void
    {
        if ($resArray['format'] > 0 && !empty($resArray['xpath'])) {
            $values = $domXPath->evaluate($resArray['xpath'], $domNode);
            if ($values instanceof \DOMNodeList && $values->length > 0) {
                $metadata[$resArray['index_name']] = [];
                foreach ($values as $value) {
                    $metadata[$resArray['index_name']][] = trim((string) $value->nodeValue);
                }
            } elseif (!($values instanceof \DOMNodeList)) {
                $metadata[$resArray['index_name']] = [trim((string) $values)];
            }
        }
    }

    /**
     * Set default metadata value.
     *
     * @access private
     *
     * @param array $resArray
     * @param array $metadata
     *
     * @return void
     */
    private function setDefaultMetadataValue(array $resArray, array &$metadata): void
    {
        if (empty($metadata[$resArray['index_name']][0]) && strlen($resArray['default_value']) > 0) {
            $metadata[$resArray['index_name']] = [$resArray['default_value']];
        }
    }

    /**
     * Set sortable metadata value.
     *
     * @access private
     *
     * @param array $resArray
     * @param \DOMXPath $domXPath
     * @param \DOMElement $domNode
     * @param array $metadata
     *
     * @return void
     */
    private function setSortableMetadataValue(array $resArray, \DOMXPath $domXPath, \DOMElement $domNode, array &$metadata): void
    {
        if (!empty($metadata[$resArray['index_name']]) && $resArray['is_sortable']) {
            if ($resArray['format'] > 0 && !empty($resArray['xpath_sorting'])) {
                $values = $domXPath->evaluate($resArray['xpath_sorting'], $domNode);
                if ($values instanceof \DOMNodeList && $values->length > 0) {
                    $metadata[$resArray['index_name'] . '_sorting'][0] = trim((string) $values->item(0)->nodeValue);
                } elseif (!($values instanceof \DOMNodeList)) {
                    $metadata[$resArray['index_name'] . '_sorting'][0] = trim((string) $values);
                }
            }
            if (empty($metadata[$resArray['index_name'] . '_sorting'][0])) {
                $metadata[$resArray['index_name'] . '_sorting'][0] = $metadata[$resArray['index_name']][0];
            }
        }
    }

    /**
     * Set default title and date if that metadata is not set.
     *
     * @access private
     *
     * @param array $metadata
     *
     * @return array
     */
698
    private function setDefaultTitleAndDate(array $metadata): array
699
    {
700
        // Set title to empty string if not present.
701
        if (empty($metadata['title'][0])) {
702
            $metadata['title'][0] = '';
703
            $metadata['title_sorting'][0] = '';
704
        }
705
706
        // Set title_sorting to title as default.
707
        if (empty($metadata['title_sorting'][0])) {
708
            $metadata['title_sorting'][0] = $metadata['title'][0];
709
        }
710
711
        // Set date to empty string if not present.
712
        if (empty($metadata['date'][0])) {
713
            $metadata['date'][0] = '';
714
        }
715
716
        return $metadata;
717
    }

    /**
     * Extract metadata if metadata type is supported.
     *
     * @access private
     *
     * @param string $dmdId descriptive metadata id
     * @param string $mdSectionType metadata section type
     * @param array &$metadata
     *
     * @return bool true if extraction successful, false otherwise
     */
    private function extractMetadataIfTypeSupported(string $dmdId, string $mdSectionType, array &$metadata): bool
    {
        // Is this metadata format supported?
        if (!empty($this->formats[$this->mdSec[$dmdId]['type']])) {
            if (!empty($this->formats[$this->mdSec[$dmdId]['type']]['class'])) {
                $class = $this->formats[$this->mdSec[$dmdId]['type']]['class'];
                // Get the metadata from class.
                if (class_exists($class)) {
                    $obj = GeneralUtility::makeInstance($class);
                    if ($obj instanceof MetadataInterface) {
                        $obj->extractMetadata($this->mdSec[$dmdId]['xml'], $metadata, GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey)['useExternalApisForMetadata']);
                        return true;
                    }
                } else {
                    $this->logger->warning('Invalid class/method "' . $class . '->extractMetadata()" for metadata format "' . $this->mdSec[$dmdId]['type'] . '"');
                }
            }
        } else {
            $this->logger->notice('Unsupported metadata format "' . $this->mdSec[$dmdId]['type'] . '" in ' . $mdSectionType . ' with @ID "' . $dmdId . '"');
        }
        return false;
    }

    /**
     * Get additional data from database.
     *
     * @access private
     *
     * @param int $cPid page id
     * @param string $dmdId descriptive metadata id
     *
     * @return array additional metadata data queried from database
     */
    private function getAdditionalMetadataFromDatabase(int $cPid, string $dmdId): array
    {
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getQueryBuilderForTable('tx_dlf_metadata');
        // Get hidden records, too.
        $queryBuilder
            ->getRestrictions()
            ->removeByType(HiddenRestriction::class);
        // Get all metadata with configured xpath and applicable format first.
        $resultWithFormat = $queryBuilder
            ->select(
                'tx_dlf_metadata.index_name AS index_name',
                'tx_dlf_metadataformat_joins.xpath AS xpath',
                'tx_dlf_metadataformat_joins.xpath_sorting AS xpath_sorting',
                'tx_dlf_metadata.is_sortable AS is_sortable',
                'tx_dlf_metadata.default_value AS default_value',
                'tx_dlf_metadata.format AS format'
            )
            ->from('tx_dlf_metadata')
            ->innerJoin(
                'tx_dlf_metadata',
                'tx_dlf_metadataformat',
                'tx_dlf_metadataformat_joins',
                $queryBuilder->expr()->eq(
                    'tx_dlf_metadataformat_joins.parent_id',
                    'tx_dlf_metadata.uid'
                )
            )
            ->innerJoin(
                'tx_dlf_metadataformat_joins',
                'tx_dlf_formats',
                'tx_dlf_formats_joins',
                $queryBuilder->expr()->eq(
                    'tx_dlf_formats_joins.uid',
                    'tx_dlf_metadataformat_joins.encoded'
                )
            )
            ->where(
                $queryBuilder->expr()->eq('tx_dlf_metadata.pid', $cPid),
                $queryBuilder->expr()->eq('tx_dlf_metadata.l18n_parent', 0),
                $queryBuilder->expr()->eq('tx_dlf_metadataformat_joins.pid', $cPid),
                $queryBuilder->expr()->eq('tx_dlf_formats_joins.type', $queryBuilder->createNamedParameter($this->mdSec[$dmdId]['type']))
            )
            ->execute();
        // Get all metadata without a format, but with a default value next.
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getQueryBuilderForTable('tx_dlf_metadata');
        // Get hidden records, too.
        $queryBuilder
            ->getRestrictions()
            ->removeByType(HiddenRestriction::class);
        $resultWithoutFormat = $queryBuilder
            ->select(
                'tx_dlf_metadata.index_name AS index_name',
                'tx_dlf_metadata.is_sortable AS is_sortable',
                'tx_dlf_metadata.default_value AS default_value',
                'tx_dlf_metadata.format AS format'
            )
            ->from('tx_dlf_metadata')
            ->where(
                $queryBuilder->expr()->eq('tx_dlf_metadata.pid', $cPid),
                $queryBuilder->expr()->eq('tx_dlf_metadata.l18n_parent', 0),
                $queryBuilder->expr()->eq('tx_dlf_metadata.format', 0),
                $queryBuilder->expr()->neq('tx_dlf_metadata.default_value', $queryBuilder->createNamedParameter(''))
            )
            ->execute();
        // Merge both result sets.
        return array_merge($resultWithFormat->fetchAllAssociative(), $resultWithoutFormat->fetchAllAssociative());
    }

    /**
     * Get IDs of (descriptive and administrative) metadata sections
     * referenced by node of given $id. The $id may refer to either
     * a logical structure node or to a file.
     *
     * @access protected
     *
     * @param string $id The "@ID" attribute of the logical structure node or file node
     *
     * @return array
     */
    protected function getMetadataIds(string $id): array
    {
        // Load amdSecChildIds concordance
        $this->magicGetMdSec();
        $fileInfo = $this->getFileInfo($id);

        // Get DMDID and ADMID of logical structure node
        if (!empty($this->logicalUnits[$id])) {
            $dmdIds = $this->logicalUnits[$id]['dmdId'] ?? '';
            $admIds = $this->logicalUnits[$id]['admId'] ?? '';
        } else {
            // The XPath query may match nothing, so fall back to null instead of reading an undefined offset.
            $mdSec = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $id . '"]')[0] ?? null;
            if ($mdSec !== null) {
                $dmdIds = (string) $mdSec->attributes()->DMDID;
                $admIds = (string) $mdSec->attributes()->ADMID;
            } elseif (isset($fileInfo)) {
                $dmdIds = $fileInfo['dmdId'];
                $admIds = $fileInfo['admId'];
            } else {
                $dmdIds = '';
                $admIds = '';
            }
        }

        // Handle multiple DMDIDs/ADMIDs
        $allMdIds = explode(' ', $dmdIds);

        foreach (explode(' ', $admIds) as $admId) {
            if (isset($this->mdSec[$admId])) {
                // $admId references an actual metadata section such as techMD
                $allMdIds[] = $admId;
            } elseif (isset($this->amdSecChildIds[$admId])) {
                // $admId references a <mets:amdSec> element. Resolve child elements.
                foreach ($this->amdSecChildIds[$admId] as $childId) {
                    $allMdIds[] = $childId;
                }
            }
        }

        return array_filter(
            $allMdIds,
            function ($element) {
                return !empty($element);
            }
        );
    }

    /**
     * @see AbstractDocument::getFullText()
     */
    public function getFullText(string $id): string
    {
        $fullText = '';

        // Load fileGrps and check for full text files.
        $this->magicGetFileGrps();
        if ($this->hasFulltext) {
            $fullText = $this->getFullTextFromXml($id);
        }
        return $fullText;
    }

    /**
     * @see AbstractDocument::getStructureDepth()
     */
    public function getStructureDepth(string $logId)
    {
        $ancestors = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $logId . '"]/ancestor::*');
        if (!empty($ancestors)) {
            return count($ancestors);
        } else {
            return 0;
        }
    }

    /**
     * @see AbstractDocument::init()
     */
    protected function init(string $location, array $settings): void
    {
        $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger(get_class($this));
        $this->settings = $settings;
        // Get METS node from XML file.
        $this->registerNamespaces($this->xml);
        $mets = $this->xml->xpath('//mets:mets');
        if (!empty($mets)) {
            $this->mets = $mets[0];
            // Register namespaces.
            $this->registerNamespaces($this->mets);
        } else {
            if (!empty($location)) {
                $this->logger->error('No METS part found in document with location "' . $location . '".');
            } elseif (!empty($this->recordId)) {
                $this->logger->error('No METS part found in document with recordId "' . $this->recordId . '".');
            } else {
                $this->logger->error('No METS part found in current document.');
            }
        }
    }

    /**
     * @see AbstractDocument::loadLocation()
     */
    protected function loadLocation(string $location): bool
    {
        $fileResource = Helper::getUrl($location);
        if ($fileResource !== false) {
            $xml = Helper::getXmlFileAsString($fileResource);
            // Set some basic properties.
            if ($xml !== false) {
                $this->xml = $xml;
                return true;
            }
        }
        $this->logger->error('Could not load XML file from "' . $location . '"');
        return false;
    }

    /**
     * @see AbstractDocument::ensureHasFulltextIsSet()
     */
    protected function ensureHasFulltextIsSet(): void
    {
        // Are the fileGrps already loaded?
        if (!$this->fileGrpsLoaded) {
            $this->magicGetFileGrps();
        }
    }

    /**
     * @see AbstractDocument::setPreloadedDocument()
     */
    protected function setPreloadedDocument($preloadedDocument): bool
    {
        if ($preloadedDocument instanceof \SimpleXMLElement) {
            $this->xml = $preloadedDocument;
            return true;
        }
        return false;
    }

    /**
     * @see AbstractDocument::getDocument()
     */
    protected function getDocument(): \SimpleXMLElement
    {
        return $this->mets;
    }

    /**
     * This builds an array of the document's metadata sections
     *
     * @access protected
     *
     * @return array Array of metadata sections with their IDs as array key
     */
    protected function magicGetMdSec(): array
    {
        if (!$this->mdSecLoaded) {
            $this->loadFormats();

            foreach ($this->mets->xpath('./mets:dmdSec') as $dmdSecTag) {
                $dmdSec = $this->processMdSec($dmdSecTag);

                if ($dmdSec !== null) {
                    $this->mdSec[$dmdSec['id']] = $dmdSec;
                    $this->dmdSec[$dmdSec['id']] = $dmdSec;
                }
            }

            foreach ($this->mets->xpath('./mets:amdSec') as $amdSecTag) {
                $childIds = [];

                foreach ($amdSecTag->children('http://www.loc.gov/METS/') as $mdSecTag) {
                    if (!in_array($mdSecTag->getName(), self::ALLOWED_AMD_SEC)) {
                        continue;
                    }

                    // TODO: Should we check that the format may occur within this type (e.g., to ignore VIDEOMD within rightsMD)?
                    $mdSec = $this->processMdSec($mdSecTag);

                    if ($mdSec !== null) {
                        $this->mdSec[$mdSec['id']] = $mdSec;

                        $childIds[] = $mdSec['id'];
                    }
                }

                $amdSecId = (string) $amdSecTag->attributes()->ID;
                if (!empty($amdSecId)) {
                    $this->amdSecChildIds[$amdSecId] = $childIds;
                }
            }

            $this->mdSecLoaded = true;
        }
        return $this->mdSec;
    }

    /**
     * Gets the document's descriptive metadata sections (dmdSec)
     *
     * @access protected
     *
     * @return array Array of descriptive metadata sections with their IDs as array key
     */
    protected function magicGetDmdSec(): array
    {
        $this->magicGetMdSec();
        return $this->dmdSec;
    }

    /**
     * Processes an element of METS `mdSecType`.
     *
     * @access protected
     *
     * @param \SimpleXMLElement $element
     *
     * @return array|null The processed metadata section
     */
    protected function processMdSec(\SimpleXMLElement $element): ?array
    {
        $mdId = (string) $element->attributes()->ID;
        if (empty($mdId)) {
            return null;
        }

        $this->registerNamespaces($element);

        $type = '';
        // Initialize explicitly so that $xml is defined even if no known format matches.
        $xml = [];
        $mdType = $element->xpath('./mets:mdWrap[not(@MDTYPE="OTHER")]/@MDTYPE');
        $otherMdType = $element->xpath('./mets:mdWrap[@MDTYPE="OTHER"]/@OTHERMDTYPE');

        if (!empty($mdType) && !empty($this->formats[(string) $mdType[0]])) {
            $type = (string) $mdType[0];
            $xml = $element->xpath('./mets:mdWrap[@MDTYPE="' . $type . '"]/mets:xmlData/' . strtolower($type) . ':' . $this->formats[$type]['rootElement']);
        } elseif (!empty($otherMdType) && !empty($this->formats[(string) $otherMdType[0]])) {
            $type = (string) $otherMdType[0];
            $xml = $element->xpath('./mets:mdWrap[@MDTYPE="OTHER"][@OTHERMDTYPE="' . $type . '"]/mets:xmlData/' . strtolower($type) . ':' . $this->formats[$type]['rootElement']);
        }

        if (empty($xml)) {
            return null;
        }

        $this->registerNamespaces($xml[0]);

        return [
            'id' => $mdId,
            'section' => $element->getName(),
            'type' => $type,
            'xml' => $xml[0],
        ];
    }

    /**
     * This builds the file ID -> USE concordance
     *
     * @access protected
     *
     * @return array Array of file use groups with file IDs
     */
    protected function magicGetFileGrps(): array
    {
        if (!$this->fileGrpsLoaded) {
            // Get configured USE attributes.
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
            $useGrps = GeneralUtility::trimExplode(',', $extConf['fileGrpImages']);
            if (!empty($extConf['fileGrpThumbs'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']));
            }
            if (!empty($extConf['fileGrpDownload'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpDownload']));
            }
            if (!empty($extConf['fileGrpFulltext'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']));
            }
            if (!empty($extConf['fileGrpAudio'])) {
                $useGrps = array_merge($useGrps, GeneralUtility::trimExplode(',', $extConf['fileGrpAudio']));
            }
            // Get all file groups.
            $fileGrps = $this->mets->xpath('./mets:fileSec/mets:fileGrp');
            if (!empty($fileGrps)) {
                // Build concordance for configured USE attributes.
                foreach ($fileGrps as $fileGrp) {
                    if (in_array((string) $fileGrp['USE'], $useGrps)) {
                        foreach ($fileGrp->children('http://www.loc.gov/METS/')->file as $file) {
                            $fileId = (string) $file->attributes()->ID;
                            $this->fileGrps[$fileId] = (string) $fileGrp['USE'];
                            $this->fileInfos[$fileId] = [
                                'fileGrp' => (string) $fileGrp['USE'],
                                'admId' => (string) $file->attributes()->ADMID,
                                'dmdId' => (string) $file->attributes()->DMDID,
                            ];
                        }
                    }
                }
            }
            // Are there any fulltext files available?
            if (
                !empty($extConf['fileGrpFulltext'])
                && array_intersect(GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']), $this->fileGrps) !== []
            ) {
                $this->hasFulltext = true;
            }
            $this->fileGrpsLoaded = true;
        }
        return $this->fileGrps;
    }

    /**
     * @see AbstractDocument::prepareMetadataArray()
     */
    protected function prepareMetadataArray(int $cPid): void
    {
        // Get all logical structure nodes with metadata.
        $ids = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID]/@ID');
        if (!empty($ids)) {
            foreach ($ids as $id) {
                $this->metadataArray[(string) $id] = $this->getMetadata((string) $id, $cPid);
            }
        }
    }

    /**
     * This returns $this->mets via __get()
     *
     * @access protected
     *
     * @return \SimpleXMLElement The XML's METS part as \SimpleXMLElement object
     */
    protected function magicGetMets(): \SimpleXMLElement
    {
        return $this->mets;
    }
1183
    /**
1184
     * @see AbstractDocument::magicGetPhysicalStructure()
1185
     */
1186
    protected function magicGetPhysicalStructure(): array
1187
    {
1188
        // Is there no physical structure array yet?
1189
        if (!$this->physicalStructureLoaded) {
1190
            // Does the document have a structMap node of type "PHYSICAL"?
1191
            $elementNodes = $this->mets->xpath('./mets:structMap[@TYPE="PHYSICAL"]/mets:div[@TYPE="physSequence"]/mets:div');
1192
            if (!empty($elementNodes)) {
1193
                // Get file groups.
1194
                $fileUse = $this->magicGetFileGrps();
1195
                // Get the physical sequence's metadata.
1196
                $physNode = $this->mets->xpath('./mets:structMap[@TYPE="PHYSICAL"]/mets:div[@TYPE="physSequence"]');
1197
                $id = (string) $physNode[0]['ID'];
1198
                $this->physicalStructureInfo[$id]['id'] = (string) $physNode[0]['ID'];
1199
                $this->physicalStructureInfo[$id]['dmdId'] = (isset($physNode[0]['DMDID']) ? (string) $physNode[0]['DMDID'] : '');
1200
                $this->physicalStructureInfo[$id]['admId'] = (isset($physNode[0]['ADMID']) ? (string) $physNode[0]['ADMID'] : '');
1201
                $this->physicalStructureInfo[$id]['order'] = (isset($physNode[0]['ORDER']) ? (string) $physNode[0]['ORDER'] : '');
1202
                $this->physicalStructureInfo[$id]['label'] = (isset($physNode[0]['LABEL']) ? (string) $physNode[0]['LABEL'] : '');
1203
                $this->physicalStructureInfo[$id]['orderlabel'] = (isset($physNode[0]['ORDERLABEL']) ? (string) $physNode[0]['ORDERLABEL'] : '');
1204
                $this->physicalStructureInfo[$id]['type'] = (string) $physNode[0]['TYPE'];
1205
                $this->physicalStructureInfo[$id]['contentIds'] = (isset($physNode[0]['CONTENTIDS']) ? (string) $physNode[0]['CONTENTIDS'] : '');
1206
                // Get the file representations from fileSec node.
1207
                foreach ($physNode[0]->children('http://www.loc.gov/METS/')->fptr as $fptr) {
1208
                    // Check if file has valid @USE attribute.
1209
                    if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
1210
                        $this->physicalStructureInfo[$id]['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
1211
                    }
1212
                }
1213
                // Build the physical elements' array from the physical structMap node.
1214
                $elements = [];
1215
                foreach ($elementNodes as $elementNode) {
1216
                    $elements[(int) $elementNode['ORDER']] = (string) $elementNode['ID'];
1217
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['id'] = (string) $elementNode['ID'];
1218
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['dmdId'] = (isset($elementNode['DMDID']) ? (string) $elementNode['DMDID'] : '');
1219
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['admId'] = (isset($elementNode['ADMID']) ? (string) $elementNode['ADMID'] : '');
1220
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['order'] = (isset($elementNode['ORDER']) ? (string) $elementNode['ORDER'] : '');
1221
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['label'] = (isset($elementNode['LABEL']) ? (string) $elementNode['LABEL'] : '');
1222
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['orderlabel'] = (isset($elementNode['ORDERLABEL']) ? (string) $elementNode['ORDERLABEL'] : '');
1223
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['type'] = (string) $elementNode['TYPE'];
1224
                    $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['contentIds'] = (isset($elementNode['CONTENTIDS']) ? (string) $elementNode['CONTENTIDS'] : '');
1225
                    // Get the file representations from fileSec node.
1226
                    foreach ($elementNode->children('http://www.loc.gov/METS/')->fptr as $fptr) {
1227
                        // Check if file has valid @USE attribute.
1228
                        if (!empty($fileUse[(string) $fptr->attributes()->FILEID])) {
1229
                            $this->physicalStructureInfo[$elements[(int) $elementNode['ORDER']]]['files'][$fileUse[(string) $fptr->attributes()->FILEID]] = (string) $fptr->attributes()->FILEID;
1230
                        }
1231
                    }
1232
                }
1233
                // Sort array by keys (= @ORDER).
1234
                ksort($elements);
1235
                // Set total number of pages/tracks.
1236
                $this->numPages = count($elements);
1237
                // Merge and re-index the array to get numeric indexes.
1238
                array_unshift($elements, $id);
1239
                $this->physicalStructure = $elements;
1240
            }
1241
            $this->physicalStructureLoaded = true;
1242
        }
1243
        return $this->physicalStructure;
1244
    }

    /**
     * @see AbstractDocument::magicGetSmLinks()
     */
    protected function magicGetSmLinks(): array
    {
        if (!$this->smLinksLoaded) {
            $smLinks = $this->mets->xpath('./mets:structLink/mets:smLink');
            if (!empty($smLinks)) {
                foreach ($smLinks as $smLink) {
                    $this->smLinks['l2p'][(string) $smLink->attributes('http://www.w3.org/1999/xlink')->from][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->to;
                    $this->smLinks['p2l'][(string) $smLink->attributes('http://www.w3.org/1999/xlink')->to][] = (string) $smLink->attributes('http://www.w3.org/1999/xlink')->from;
                }
            }
            $this->smLinksLoaded = true;
        }
        return $this->smLinks;
    }
1264
    /**
1265
     * @see AbstractDocument::magicGetThumbnail()
1266
     */
1267
    protected function magicGetThumbnail(bool $forceReload = false): string
1268
    {
1269
        if (
1270
            !$this->thumbnailLoaded
1271
            || $forceReload
1272
        ) {
1273
            // Retain current PID.
1274
            $cPid = ($this->cPid ? $this->cPid : $this->pid);
1275
            if (!$cPid) {
1276
                $this->logger->error('Invalid PID ' . $cPid . ' for structure definitions');
1277
                $this->thumbnailLoaded = true;
1278
                return $this->thumbnail;
1279
            }
1280
            // Load extension configuration.
1281
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
1282
            if (empty($extConf['fileGrpThumbs'])) {
1283
                $this->logger->warning('No fileGrp for thumbnails specified');
1284
                $this->thumbnailLoaded = true;
1285
                return $this->thumbnail;
1286
            }
1287
            $strctId = $this->magicGetToplevelId();
1288
            $metadata = $this->getToplevelMetadata($cPid);
1289
1290
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
1291
                ->getQueryBuilderForTable('tx_dlf_structures');
1292
1293
            // Get structure element to get thumbnail from.
1294
            $result = $queryBuilder
1295
                ->select('tx_dlf_structures.thumbnail AS thumbnail')
1296
                ->from('tx_dlf_structures')
1297
                ->where(
1298
                    $queryBuilder->expr()->eq('tx_dlf_structures.pid', (int) $cPid),
1299
                    $queryBuilder->expr()->eq('tx_dlf_structures.index_name', $queryBuilder->expr()->literal($metadata['type'][0])),
1300
                    Helper::whereExpression('tx_dlf_structures')
1301
                )
1302
                ->setMaxResults(1)
1303
                ->execute();
1304
1305
            $allResults = $result->fetchAllAssociative();
1306
1307
            if (count($allResults) == 1) {
1308
                $resArray = $allResults[0];
1309
                // Get desired thumbnail structure if not the toplevel structure itself.
1310
                if (!empty($resArray['thumbnail'])) {
1311
                    $strctType = Helper::getIndexNameFromUid($resArray['thumbnail'], 'tx_dlf_structures', $cPid);
1312
                    // Check if this document has a structure element of the desired type.
1313
                    $strctIds = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@TYPE="' . $strctType . '"]/@ID');
1314
                    if (!empty($strctIds)) {
1315
                        $strctId = (string) $strctIds[0];
1316
                    }
1317
                }
1318
                // Load smLinks.
1319
                $this->magicGetSmLinks();
1320
                // Get thumbnail location.
1321
                $fileGrpsThumb = GeneralUtility::trimExplode(',', $extConf['fileGrpThumbs']);
1322
                while ($fileGrpThumb = array_shift($fileGrpsThumb)) {
1323
                    if (
1324
                        $this->magicGetPhysicalStructure()
1325
                        && !empty($this->smLinks['l2p'][$strctId])
1326
                        && !empty($this->physicalStructureInfo[$this->smLinks['l2p'][$strctId][0]]['files'][$fileGrpThumb])
1327
                    ) {
1328
                        $this->thumbnail = $this->getFileLocation($this->physicalStructureInfo[$this->smLinks['l2p'][$strctId][0]]['files'][$fileGrpThumb]);
1329
                        break;
1330
                    } elseif (!empty($this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb])) {
1331
                        $this->thumbnail = $this->getFileLocation($this->physicalStructureInfo[$this->physicalStructure[1]]['files'][$fileGrpThumb]);
1332
                        break;
1333
                    }
1334
                }
1335
            } else {
1336
                $this->logger->error('No structure of type "' . $metadata['type'][0] . '" found in database');
1337
            }
1338
            $this->thumbnailLoaded = true;
1339
        }
1340
        return $this->thumbnail;
1341
    }
1342
1343
    /**
1344
     * @see AbstractDocument::magicGetToplevelId()
1345
     */
1346
    protected function magicGetToplevelId(): string
1347
    {
1348
        if (empty($this->toplevelId)) {
1349
            // Get all logical structure nodes with metadata, but without associated METS-Pointers.
1350
            $divs = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@DMDID and not(./mets:mptr)]');
1351
            if (!empty($divs)) {
1352
                // Load smLinks.
1353
                $this->magicGetSmLinks();
1354
                foreach ($divs as $div) {
1355
                    $id = (string) $div['ID'];
1356
                    // Are there physical structure nodes for this logical structure?
1357
                    if (array_key_exists($id, $this->smLinks['l2p'])) {
1358
                        // Yes. That's what we're looking for.
1359
                        $this->toplevelId = $id;
1360
                        break;
1361
                    } elseif (empty($this->toplevelId)) {
1362
                        // No. Remember this anyway, but keep looking for a better one.
1363
                        $this->toplevelId = $id;
1364
                    }
1365
                }
1366
            }
1367
        }
1368
        return $this->toplevelId;
1369
    }
1370
1371
    /**
1372
     * Try to determine URL of parent document.
1373
     *
1374
     * @access public
1375
     *
1376
     * @return string
1377
     */
1378
    public function magicGetParentHref(): string
1379
    {
1380
        if (empty($this->parentHref)) {
1381
            // Get the closest ancestor of the current document which has a MPTR child.
1382
            $parentMptr = $this->mets->xpath('./mets:structMap[@TYPE="LOGICAL"]//mets:div[@ID="' . $this->toplevelId . '"]/ancestor::mets:div[./mets:mptr][1]/mets:mptr');
1383
            if (!empty($parentMptr)) {
1384
                $this->parentHref = (string) $parentMptr[0]->attributes('http://www.w3.org/1999/xlink')->href;
Bug: The property parentHref is declared read-only in Kitodo\Dlf\Common\MetsDocument.
            }
        }

        return $this->parentHref;
    }

    /**
     * This magic method is executed prior to any serialization of the object
     * @see __wakeup()
     *
     * @access public
     *
     * @return array Properties to be serialized
     */
    public function __sleep(): array
    {
        // \SimpleXMLElement objects can't be serialized, thus save the XML as string for serialization
        $this->asXML = $this->xml->asXML();
Documentation Bug: It seems like $this->xml->asXML() can also be of type true. However, the property $asXML is declared as type string. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $account_id that can either hold an Id object or false (if there is no account ID yet). Your code now assigns that value to the id property of an instance of the Account class. This class holds a proper account, so the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }

}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
if ($account_id instanceof Id)
{
    $account->id = $account_id;
}
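
Applied to the flagged assignment, such a type check could look like the sketch below. It is only an illustration: `xmlToString()` is a hypothetical helper that does not exist in Kitodo, and throwing a `\RuntimeException` is an assumed failure policy (the real class might prefer to log and fall back to an empty string).

```php
<?php
// Hypothetical helper (not part of Kitodo): narrows the return value of
// SimpleXMLElement::asXML() to a string before it is stored anywhere.
function xmlToString(\SimpleXMLElement $xml): string
{
    $result = $xml->asXML();
    // Without a filename argument, asXML() returns a string on success
    // and false on failure, so anything that is not a string is rejected.
    if (!is_string($result)) {
        throw new \RuntimeException('Could not convert XML to string');
    }
    return $result;
}

// Illustrative document, not real METS data.
$xml = new \SimpleXMLElement('<mets><structMap TYPE="LOGICAL"/></mets>');
$asXML = xmlToString($xml);
```

With a guard like this, the value assigned to a property declared as string is guaranteed to be a string, which is the condition the inspection asks for.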
        return ['uid', 'pid', 'recordId', 'parentId', 'asXML'];
    }

    /**
     * This magic method is used for setting a string value for the object
     *
     * @access public
     *
     * @return string String representing the METS object
     */
    public function __toString(): string
    {
        $xml = new \DOMDocument('1.0', 'utf-8');
        $xml->appendChild($xml->importNode(dom_import_simplexml($this->mets), true));
        $xml->formatOutput = true;
        return $xml->saveXML();
    }

    /**
     * This magic method is executed after the object is deserialized
     * @see __sleep()
     *
     * @access public
     *
     * @return void
     */
    public function __wakeup(): void
    {
        $xml = Helper::getXmlFileAsString($this->asXML);
        if ($xml !== false) {
            $this->asXML = '';
            $this->xml = $xml;
            // Rebuild the unserializable properties.
            $this->init('', $this->settings);
        } else {
            $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger(static::class);
            $this->logger->error('Could not load XML after deserialization');
        }
    }
}
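
The __sleep()/__wakeup() pair in the listing works around the fact that \SimpleXMLElement objects cannot be serialized: the XML is stored as a string before serialization and parsed again afterwards. The round trip can be sketched in isolation as follows. `XmlHolder` is a hypothetical stand-in class, and `simplexml_load_string()` is used here as an assumed substitute for `Helper::getXmlFileAsString()`, whose implementation is not shown in this listing.

```php
<?php
// Hypothetical stand-in for MetsDocument; names are illustrative only.
final class XmlHolder
{
    public string $asXML = '';
    public ?\SimpleXMLElement $xml = null;

    public function __construct(string $source)
    {
        $this->xml = new \SimpleXMLElement($source);
    }

    public function __sleep(): array
    {
        // Keep the XML as a string; the SimpleXMLElement itself is left out
        // of the list of serialized properties.
        $this->asXML = (string) $this->xml->asXML();
        return ['asXML'];
    }

    public function __wakeup(): void
    {
        // Rebuild the XML object from the stored string.
        $xml = simplexml_load_string($this->asXML);
        if ($xml !== false) {
            $this->xml = $xml;
            $this->asXML = '';
        }
    }
}

$holder = new XmlHolder('<mets><div ID="LOG_0001"/></mets>');
$copy = unserialize(serialize($holder));
```

After the round trip, `$copy->xml` is a freshly parsed \SimpleXMLElement and the temporary string has been cleared, mirroring what __wakeup() does in the listing before it calls init().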