<?php

/**
 * (c) Kitodo. Key to digital objects e.V. <[email protected]>
 *
 * This file is part of the Kitodo and TYPO3 projects.
 *
 * @license GNU General Public License version 3 or later.
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 */

namespace Kitodo\Dlf\Common;

use TYPO3\CMS\Core\Cache\CacheManager;
use TYPO3\CMS\Core\Configuration\ExtensionConfiguration;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Log\Logger;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use Ubl\Iiif\Presentation\Common\Model\Resources\IiifResourceInterface;
use Ubl\Iiif\Tools\IiifHelper;

/**
 * Document class for the 'dlf' extension
 *
 * @author Sebastian Meyer <[email protected]>
 * @author Henrik Lochmann <[email protected]>
 * @package TYPO3
 * @subpackage dlf
 * @access public
 * @property int $cPid This holds the PID for the configuration
 * @property-read bool $hasFulltext Are there any fulltext files available?
 * @property-read array $metadataArray This holds the document's parsed metadata array
 * @property-read int $numPages This holds the total number of pages
 * @property-read int $parentId This holds the UID of the parent document or zero if not multi-volumed
 * @property-read array $physicalStructure This holds the physical structure
 * @property-read array $physicalStructureInfo This holds the physical structure metadata
 * @property-read int $pid This holds the PID of the document or zero if not in database
 * @property-read bool $ready Is the document instantiated successfully?
 * @property-read string $recordId The METS file's / IIIF manifest's record identifier
 * @property-read int $rootId This holds the UID of the root document or zero if not multi-volumed
 * @property-read array $smLinks This holds the smLinks between logical and physical structMap
 * @property-read array $tableOfContents This holds the logical structure
 * @property-read string $thumbnail This holds the document's thumbnail location
 * @property-read string $toplevelId This holds the toplevel structure's "@ID" (METS) or the manifest's "@id" (IIIF)
 * @abstract
 */
abstract class AbstractDocument
{
    /**
     * This holds the logger
     *
     * @var Logger
     * @access protected
     */
    protected $logger;

    /**
     * This holds the PID for the configuration
     *
     * @var int
     * @access protected
     */
    protected $cPid = 0;

    /**
     * The extension key
     *
     * @var string
     * @access public
     */
    public static $extKey = 'dlf';

    /**
     * This holds the configuration for all supported metadata encodings
     * @see loadFormats()
     *
     * @var array
     * @access protected
     */
    protected $formats = [
        'OAI' => [
            'rootElement' => 'OAI-PMH',
            'namespaceURI' => 'http://www.openarchives.org/OAI/2.0/',
        ],
        'METS' => [
            'rootElement' => 'mets',
            'namespaceURI' => 'http://www.loc.gov/METS/',
        ],
        'XLINK' => [
            'rootElement' => 'xlink',
            'namespaceURI' => 'http://www.w3.org/1999/xlink',
        ]
    ];

    /**
     * Are the available metadata formats loaded?
     * @see $formats
     *
     * @var bool
     * @access protected
     */
    protected $formatsLoaded = false;

    /**
     * Are there any fulltext files available? This also includes IIIF text annotations
     * with motivation 'painting' if Kitodo.Presentation is configured to store text
     * annotations as fulltext.
     *
     * @var bool
     * @access protected
     */
    protected $hasFulltext = false;

    /**
     * Last searched logical and physical page
     *
     * @var array
     * @access protected
     */
    protected $lastSearchedPhysicalPage = ['logicalPage' => null, 'physicalPage' => null];

    /**
     * This holds the logical units
     *
     * @var array
     * @access protected
     */
    protected $logicalUnits = [];

    /**
     * This holds the document's parsed metadata arrays with their corresponding
     * structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
     *
     * @var array
     * @access protected
     */
    protected $metadataArray = [];

    /**
     * Is the metadata array loaded?
     * @see $metadataArray
     *
     * @var bool
     * @access protected
     */
    protected $metadataArrayLoaded = false;

    /**
     * This holds the total number of pages
     *
     * @var int
     * @access protected
     */
    protected $numPages = 0;

    /**
     * This holds the UID of the parent document or zero if not multi-volumed
     *
     * @var int
     * @access protected
     */
    protected $parentId = 0;

    /**
     * This holds the physical structure
     *
     * @var array
     * @access protected
     */
    protected $physicalStructure = [];

    /**
     * This holds the physical structure metadata
     *
     * @var array
     * @access protected
     */
    protected $physicalStructureInfo = [];

    /**
     * Is the physical structure loaded?
     * @see $physicalStructure
     *
     * @var bool
     * @access protected
     */
    protected $physicalStructureLoaded = false;

    /**
     * This holds the PID of the document or zero if not in database
     *
     * @var int
     * @access protected
     */
    protected $pid = 0;

    /**
     * This holds the document's raw text pages with their corresponding
     * structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
     *
     * @var array
     * @access protected
     */
    protected $rawTextArray = [];

    /**
     * Is the document instantiated successfully?
     *
     * @var bool
     * @access protected
     */
    protected $ready = false;

    /**
     * The METS file's / IIIF manifest's record identifier
     *
     * @var string
     * @access protected
     */
    protected $recordId;

    /**
     * This holds the singleton object of the document
     *
     * @var array (AbstractDocument)
     * @static
     * @access protected
     */
    protected static $registry = [];

    /**
     * This holds the UID of the root document or zero if not multi-volumed
     *
     * @var int
     * @access protected
     */
    protected $rootId = 0;

    /**
     * Is the root id loaded?
     * @see $rootId
     *
     * @var bool
     * @access protected
     */
    protected $rootIdLoaded = false;

    /**
     * This holds the smLinks between logical and physical structMap
     *
     * @var array
     * @access protected
     */
    protected $smLinks = ['l2p' => [], 'p2l' => []];

    /**
     * Are the smLinks loaded?
     * @see $smLinks
     *
     * @var bool
     * @access protected
     */
    protected $smLinksLoaded = false;

    /**
     * This holds the logical structure
     *
     * @var array
     * @access protected
     */
    protected $tableOfContents = [];

    /**
     * Is the table of contents loaded?
     * @see $tableOfContents
     *
     * @var bool
     * @access protected
     */
    protected $tableOfContentsLoaded = false;

    /**
     * This holds the document's thumbnail location
     *
     * @var string
     * @access protected
     */
    protected $thumbnail = '';

    /**
     * Is the document's thumbnail location loaded?
     * @see $thumbnail
     *
     * @var bool
     * @access protected
     */
    protected $thumbnailLoaded = false;

    /**
     * This holds the toplevel structure's "@ID" (METS) or the manifest's "@id" (IIIF)
     *
     * @var string
     * @access protected
     */
    protected $toplevelId = '';

    /**
     * This holds the whole XML file as \SimpleXMLElement object
     *
     * @var \SimpleXMLElement
     * @access protected
     */
    protected $xml;

    /**
     * This clears the static registry to prevent memory exhaustion
     *
     * @access public
     *
     * @static
     *
     * @return void
     */
    public static function clearRegistry()
    {
        // Reset registry array.
        self::$registry = [];
    }

    /**
     * This ensures that the recordId, if existent, is retrieved from the document
     *
     * @access protected
     *
     * @abstract
     *
     * @param int $pid: ID of the configuration page with the recordId config
     */
    protected abstract function establishRecordId($pid);

    /**
     * Source document PHP object which is represented by a Document instance
     *
     * @access protected
     *
     * @abstract
     *
     * @return \SimpleXMLElement|IiifResourceInterface A PHP object representation of
     * the current document. SimpleXMLElement for METS, IiifResourceInterface for IIIF
     */
    protected abstract function getDocument();

    /**
     * This gets the location of a downloadable file for a physical page or track
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The "@ID" attribute of the file node (METS) or the "@id" property of the IIIF resource
     *
     * @return string The file's location as URL
     */
    public abstract function getDownloadLocation($id);

    /**
     * This gets the location of a file representing a physical page or track
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The "@ID" attribute of the file node (METS) or the "@id" property of the IIIF resource
     *
     * @return string The file's location as URL
     */
    public abstract function getFileLocation($id);

    /**
     * This gets the MIME type of a file representing a physical page or track
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The "@ID" attribute of the file node
     *
     * @return string The file's MIME type
     */
    public abstract function getFileMimeType($id);

    /**
     * This is a singleton class, thus an instance must be created by this method
     *
     * @access public
     *
     * @static
     *
     * @param string $location: The URL of the XML file or the IRI of the IIIF resource
     * @param array $settings: Settings array; only the 'storagePid' key is read here
     * @param bool $forceReload: Force reloading the document instead of returning the cached instance
     *
     * @return AbstractDocument|null Instance of this class, either MetsDocument or IiifManifest
     */
    public static function &getInstance($location, $settings = [], $forceReload = false)
    {
        // Create new instance depending on format (METS or IIIF) ...
        $documentFormat = null;
        $xml = null;
        $iiif = null;

        // The assignment is parenthesized so that $instance receives the cached document
        // itself rather than the boolean result of the "&&" expression.
        if (($instance = self::getDocCache($location)) && !$forceReload) {
            return $instance;
        } else {
            $instance = null;
        }

        // Try to load a file from the url
        if (GeneralUtility::isValidUrl($location)) {
            // Load extension configuration
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);

            $content = Helper::getUrl($location);
            if ($content !== false) {
                $xml = Helper::getXmlFileAsString($content);
                if ($xml !== false) {
                    /* @var $xml \SimpleXMLElement */
                    $xml->registerXPathNamespace('mets', 'http://www.loc.gov/METS/');
                    $xpathResult = $xml->xpath('//mets:mets');
                    $documentFormat = !empty($xpathResult) ? 'METS' : null;
                } else {
                    // Try to load file as IIIF resource instead.
                    $contentAsJsonArray = json_decode($content, true);
                    if ($contentAsJsonArray !== null) {
                        IiifHelper::setUrlReader(IiifUrlReader::getInstance());
                        IiifHelper::setMaxThumbnailHeight($extConf['iiifThumbnailHeight']);
                        IiifHelper::setMaxThumbnailWidth($extConf['iiifThumbnailWidth']);
                        $iiif = IiifHelper::loadIiifResource($contentAsJsonArray);
                        if ($iiif instanceof IiifResourceInterface) {
                            $documentFormat = 'IIIF';
                        }
                    }
                }
            }
        }

        // Sanitize input (a missing 'storagePid' is treated as 0).
        $pid = max(intval($settings['storagePid'] ?? 0), 0);
        if ($documentFormat == 'METS') {
            $instance = new MetsDocument($location, $pid, $xml);
        } elseif ($documentFormat == 'IIIF') {
            $instance = new IiifManifest($location, $pid, $iiif);
        }

        if ($instance) {
            self::setDocCache($location, $instance);
        }

        return $instance;
    }

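    // Usage sketch (not part of the original file). Assumptions: the URL and the
    // structure ID 'LOG_0001' are placeholders, 42 is a hypothetical storage page ID,
    // and $ready is readable through the magic getter implied by the @property-read
    // annotations in the class docblock.
    //
    //     $doc = AbstractDocument::getInstance('https://example.org/mets.xml', ['storagePid' => 42]);
    //     if ($doc !== null && $doc->ready) {
    //         $depth = $doc->getStructureDepth('LOG_0001');
    //     }
    //     // Long-running jobs can reset the static registry between documents.
    //     AbstractDocument::clearRegistry();
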
    /**
     * This gets details about a logical structure element
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The "@ID" attribute of the logical structure node (METS) or
     * the "@id" property of the Manifest / Range (IIIF)
     * @param bool $recursive: Whether to include the child elements / resources
     *
     * @return array Array of the element's id, label, type and physical page indexes/mptr link
     */
    public abstract function getLogicalStructure($id, $recursive = false);

    /**
     * This extracts all the metadata for a logical structure node
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The "@ID" attribute of the logical structure node (METS) or the "@id" property
     * of the Manifest / Range (IIIF)
     * @param int $cPid: The PID for the metadata definitions
     *                       (defaults to $this->cPid or $this->pid)
     *
     * @return array The logical structure node's / the IIIF resource's parsed metadata array
     */
    public abstract function getMetadata($id, $cPid = 0);

    /**
     * This returns the first corresponding physical page number of a given logical page label
     *
     * @access public
     *
     * @param string $logicalPage: The label (or a part of the label) of the logical page
     *
     * @return int The physical page number
     */
    public function getPhysicalPage($logicalPage)
    {
        if (
            !empty($this->lastSearchedPhysicalPage['logicalPage'])
            && $this->lastSearchedPhysicalPage['logicalPage'] == $logicalPage
        ) {
            // Same label as in the previous call: reuse the remembered result.
            return $this->lastSearchedPhysicalPage['physicalPage'];
        } else {
            $physicalPage = 0;
            foreach ($this->physicalStructureInfo as $page) {
                if (strpos($page['orderlabel'], $logicalPage) !== false) {
                    $this->lastSearchedPhysicalPage['logicalPage'] = $logicalPage;
                    $this->lastSearchedPhysicalPage['physicalPage'] = $physicalPage;
                    return $physicalPage;
                }
                $physicalPage++;
            }
        }
        // No matching label found: fall back to page 1.
        return 1;
    }

    /**
     * This extracts the OCR full text for a physical structure node / IIIF Manifest / Canvas. Text might be
     * given as ALTO for METS or as annotations or ALTO for IIIF resources.
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The "@ID" attribute of the physical structure node (METS) or the "@id" property
     * of the Manifest / Range (IIIF)
     *
     * @return string The OCR full text
     */
    public abstract function getFullText($id);

    /**
     * This extracts the OCR full text for a physical structure node / IIIF Manifest / Canvas from an
     * XML full text representation (currently only ALTO). For IIIF manifests, ALTO documents have
     * to be given in the Canvas' / Manifest's "seeAlso" property.
     *
     * @param string $id: The "@ID" attribute of the physical structure node (METS) or the "@id" property
     * of the Manifest / Range (IIIF)
     *
     * @return string The OCR full text
     */
    protected function getFullTextFromXml($id)
    {
        $fullText = '';
        // Load available text formats, ...
        $this->loadFormats();
        // ... physical structure ...
        $this->_getPhysicalStructure();
        // ... and extension configuration.
        $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
        $fileGrpsFulltext = GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']);
        $textFormat = '';
        if (!empty($this->physicalStructureInfo[$id])) {
            while ($fileGrpFulltext = array_shift($fileGrpsFulltext)) {
                if (!empty($this->physicalStructureInfo[$id]['files'][$fileGrpFulltext])) {
                    // Get full text file.
                    $fileContent = GeneralUtility::getUrl($this->getFileLocation($this->physicalStructureInfo[$id]['files'][$fileGrpFulltext]));
                    if ($fileContent !== false) {
                        $textFormat = $this->getTextFormat($fileContent);
                    } else {
                        $this->logger->warning('Couldn\'t load full text file for structure node @ID "' . $id . '"');
                        return $fullText;
                    }
                    break;
                }
            }
        } else {
            $this->logger->warning('Invalid structure node @ID "' . $id . '"');
            return $fullText;
        }
        // Is this text format supported?
        // This part actually differs from previous version of indexed OCR
        if (!empty($fileContent) && !empty($this->formats[$textFormat])) {
            $textMiniOcr = '';
            if (!empty($this->formats[$textFormat]['class'])) {
                $class = $this->formats[$textFormat]['class'];
                // Get the raw text from class.
                if (
                    class_exists($class)
                    && ($obj = GeneralUtility::makeInstance($class)) instanceof FulltextInterface
                ) {
                    // Load XML from file.
                    $ocrTextXml = Helper::getXmlFileAsString($fileContent);
                    $textMiniOcr = $obj->getTextAsMiniOcr($ocrTextXml);
                    $this->rawTextArray[$id] = $textMiniOcr;
                } else {
                    $this->logger->warning('Invalid class/method "' . $class . '->getTextAsMiniOcr()" for text format "' . $textFormat . '"');
                }
            }
            $fullText = $textMiniOcr;
        } else {
            $this->logger->warning('Unsupported text format "' . $textFormat . '" in physical node with @ID "' . $id . '"');
        }
        return $fullText;
    }

    /**
     * Get format of the OCR full text
     *
     * @access private
     *
     * @param string $fileContent: content of the XML file
     *
     * @return string The format of the OCR full text (the upper-cased root element name),
     * or an empty string if the content cannot be parsed as XML
     */
    private function getTextFormat($fileContent)
    {
        $xml = Helper::getXmlFileAsString($fileContent);

        if ($xml !== false) {
            // Get the root element's name as text format.
            return strtoupper($xml->getName());
        } else {
            return '';
        }
    }

    /**
     * This determines a title for the given document
     *
     * @access public
     *
     * @static
     *
     * @param int $uid: The UID of the document
     * @param bool $recursive: Search superior documents for a title, too?
     *
     * @return string The title of the document itself or a parent document
     */
    public static function getTitle($uid, $recursive = false)
    {
        $title = '';
        // Sanitize input.
        $uid = max(intval($uid), 0);
        if ($uid) {
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');

            $result = $queryBuilder
                ->select(
                    'tx_dlf_documents.title',
                    'tx_dlf_documents.partof'
                )
                ->from('tx_dlf_documents')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_documents.uid', $uid),
                    Helper::whereExpression('tx_dlf_documents')
                )
                ->setMaxResults(1)
                ->execute();

            if ($resArray = $result->fetchAssociative()) {
                // Get title information.
                $title = $resArray['title'];
                $partof = $resArray['partof'];
                // Search parent documents recursively for a title?
                if (
                    $recursive
                    && empty($title)
                    && intval($partof)
                    && $partof != $uid
                ) {
                    $title = self::getTitle($partof, true);
                }
            } else {
                Helper::log('No document with UID ' . $uid . ' found or document not accessible', LOG_SEVERITY_WARNING);
            }
        } else {
            Helper::log('Invalid UID ' . $uid . ' for document', LOG_SEVERITY_ERROR);
        }
        return $title;
    }

    /**
     * This extracts all the metadata for the toplevel logical structure node / resource
     *
     * @access public
     *
     * @param int $cPid: The PID for the metadata definitions
     *
     * @return array The logical structure node's / resource's parsed metadata array
     */
    public function getTitledata($cPid = 0)
    {
        $titledata = $this->getMetadata($this->_getToplevelId(), $cPid);
        // Add information from METS structural map to titledata array.
        if ($this instanceof MetsDocument) {
            $this->addMetadataFromMets($titledata, $this->_getToplevelId());
        }
        // Set record identifier for METS file / IIIF manifest if not present.
        if (
            is_array($titledata)
            && array_key_exists('record_id', $titledata)
        ) {
            if (
                !empty($this->recordId)
                && !in_array($this->recordId, $titledata['record_id'])
            ) {
                array_unshift($titledata['record_id'], $this->recordId);
            }
        }
        return $titledata;
    }

    /**
     * Traverse a logical (sub-) structure tree to find the structure with the requested logical id and return its depth.
     *
     * @access protected
     *
     * @param array $structure: logical structure array
     * @param int $depth: current tree depth
     * @param string $logId: ID of the logical structure whose depth is requested
     *
     * @return int|bool false if the structure with $logId is not a child of this substructure,
     * or the actual depth.
     */
    protected function getTreeDepth($structure, $depth, $logId)
    {
        foreach ($structure as $element) {
            if ($element['id'] == $logId) {
                return $depth;
            } elseif (array_key_exists('children', $element)) {
                $foundInChildren = $this->getTreeDepth($element['children'], $depth + 1, $logId);
                if ($foundInChildren !== false) {
                    return $foundInChildren;
                }
            }
        }
        return false;
    }

    /**
     * Get the tree depth of a logical structure element within the table of contents
     *
     * @access public
     *
     * @param string $logId: The id of the logical structure element whose depth is requested
     *
     * @return int|bool tree depth as integer or false if no element with $logId exists within the TOC.
     */
    public function getStructureDepth($logId)
    {
        return $this->getTreeDepth($this->_getTableOfContents(), 1, $logId);
    }

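    // Worked example (illustrative data, not taken from a real document). Assuming
    // _getTableOfContents() returned a structure shaped like
    //
    //     [
    //         ['id' => 'LOG_0000', 'children' => [
    //             ['id' => 'LOG_0001'],
    //             ['id' => 'LOG_0002', 'children' => [
    //                 ['id' => 'LOG_0003'],
    //             ]],
    //         ]],
    //     ]
    //
    // getStructureDepth('LOG_0000') would yield 1, getStructureDepth('LOG_0002') would
    // yield 2, getStructureDepth('LOG_0003') would yield 3, and an unknown ID would
    // yield false. Callers should therefore compare the result with "=== false".
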
    /**
     * This sets some basic class properties
     *
     * @access protected
     *
     * @abstract
     *
     * @param string $location: The location URL of the XML file to parse
     *
     * @return void
     */
    protected abstract function init($location);

    /**
     * Reuse any document object that might have been already loaded to determine whether document is METS or IIIF
     *
     * @access protected
     *
     * @abstract
     *
     * @param \SimpleXMLElement|IiifResourceInterface $preloadedDocument: any instance that has already been loaded
     *
     * @return bool true if $preloadedDocument can actually be reused, false if it has to be loaded again
     */
    protected abstract function setPreloadedDocument($preloadedDocument);

    /**
     * METS/IIIF specific part of loading a location
     *
     * @access protected
     *
     * @abstract
     *
     * @param string $location: The URL of the file to load
     *
     * @return bool true on success or false on failure
     */
    protected abstract function loadLocation($location);

    /**
     * Load XML file / IIIF resource from URL
     *
     * @access protected
     *
     * @param string $location: The URL of the file to load
     *
     * @return bool true on success or false on failure
     */
    protected function load($location)
    {
        // Load XML / JSON-LD file.
        if (GeneralUtility::isValidUrl($location)) {
            // the actual loading is format specific
            return $this->loadLocation($location);
        } else {
            $this->logger->error('Invalid file location "' . $location . '" for document loading');
        }
        return false;
    }

    /**
     * Analyze whether the document contains any fulltext that needs to be indexed.
     *
     * @access protected
     *
     * @abstract
     */
    protected abstract function ensureHasFulltextIsSet();

    /**
     * Register all available data formats
     *
     * @access protected
     *
     * @return void
     */
    protected function loadFormats()
    {
        if (!$this->formatsLoaded) {
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_formats');

            // Get available data formats from database.
            $result = $queryBuilder
                ->select(
                    'tx_dlf_formats.type AS type',
                    'tx_dlf_formats.root AS root',
                    'tx_dlf_formats.namespace AS namespace',
                    'tx_dlf_formats.class AS class'
                )
                ->from('tx_dlf_formats')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_formats.pid', 0)
                )
                ->execute();

            while ($resArray = $result->fetchAssociative()) {
                // Update format registry.
                $this->formats[$resArray['type']] = [
                    'rootElement' => $resArray['root'],
                    'namespaceURI' => $resArray['namespace'],
                    'class' => $resArray['class']
                ];
            }
            $this->formatsLoaded = true;
        }
    }

    /**
     * Register all available namespaces for a \SimpleXMLElement or \DOMXPath object
     *
     * @access public
     *
     * @param \SimpleXMLElement|\DOMXPath &$obj: \SimpleXMLElement or \DOMXPath object
     *
     * @return void
     */
    public function registerNamespaces(&$obj)
    {
        // TODO Check usage. XML specific method does not seem to be used anywhere outside this class within the project, but it is public and may be used by extensions.
        $this->loadFormats();
        // Do we have a \SimpleXMLElement or \DOMXPath object?
        if ($obj instanceof \SimpleXMLElement) {
            $method = 'registerXPathNamespace';
        } elseif ($obj instanceof \DOMXPath) {
            $method = 'registerNamespace';
        } else {
            $this->logger->error('Given object is neither a SimpleXMLElement nor a DOMXPath instance');
            return;
        }
        // Register metadata format's namespaces.
        foreach ($this->formats as $enc => $conf) {
            $obj->$method(strtolower($enc), $conf['namespaceURI']);
        }
    }

889
    /**
     * Initialize metadata array with empty values.
     *
     * @access protected
     *
     * @param string $format: Format of the document, e.g. METS
     *
     * @return array
     */
    protected function initializeMetadata($format)
    {
        return [
            'title' => [],
            'title_sorting' => [],
            'description' => [],
            'author' => [],
            'holder' => [],
            'place' => [],
            'year' => [],
            'prod_id' => [],
            'record_id' => [],
            'opac_id' => [],
            'union_id' => [],
            'urn' => [],
            'purl' => [],
            'type' => [],
            'volume' => [],
            'volume_sorting' => [],
            'date' => [],
            'license' => [],
            'terms' => [],
            'restrictions' => [],
            'out_of_print' => [],
            'rights_info' => [],
            'collection' => [],
            'owner' => [],
            'mets_label' => [],
            'mets_orderlabel' => [],
            'document_format' => [$format]
        ];
    }

    /**
     * This returns $this->cPid via __get()
     *
     * @access protected
     *
     * @return int The PID of the metadata definitions
     */
    protected function _getCPid()
    {
        return $this->cPid;
    }

    /**
     * This returns $this->hasFulltext via __get()
     *
     * @access protected
     *
     * @return bool Are there any fulltext files available?
     */
    protected function _getHasFulltext()
    {
        $this->ensureHasFulltextIsSet();
        return $this->hasFulltext;
    }

    /**
     * Format specific part of building the document's metadata array
     *
     * @access protected
     *
     * @abstract
     *
     * @param int $cPid: The PID of the metadata definitions
     */
    abstract protected function prepareMetadataArray($cPid);

    /**
     * This builds an array of the document's metadata
     *
     * @access protected
     *
     * @return array Array of metadata with their corresponding logical structure node ID as key
     */
    protected function _getMetadataArray()
    {
        // Set metadata definitions' PID, falling back to the document's own PID.
        $cPid = $this->cPid ?: $this->pid;
        if (!$cPid) {
            $this->logger->error('Invalid PID ' . $cPid . ' for metadata definitions');
            return [];
        }
        if (
            !$this->metadataArrayLoaded
            || $this->metadataArray[0] != $cPid
        ) {
            $this->prepareMetadataArray($cPid);
            $this->metadataArray[0] = $cPid;
            $this->metadataArrayLoaded = true;
        }
        return $this->metadataArray;
    }

    /**
     * This returns $this->numPages via __get()
     *
     * @access protected
     *
     * @return int The total number of pages and/or tracks
     */
    protected function _getNumPages()
    {
        $this->_getPhysicalStructure();
        return $this->numPages;
    }

    /**
     * This returns $this->parentId via __get()
     *
     * @access protected
     *
     * @return int The UID of the parent document or zero if not applicable
     */
    protected function _getParentId()
    {
        return $this->parentId;
    }

    /**
     * This builds an array of the document's physical structure
     *
     * @access protected
     *
     * @abstract
     *
     * @return array Array of physical elements' id, type, label and file representations ordered
     * by "@ORDER" attribute / IIIF Sequence's Canvases
     */
    abstract protected function _getPhysicalStructure();

    /**
     * This gives an array of the document's physical structure metadata
     *
     * @access protected
     *
     * @return array Array of elements' type, label and file representations ordered by "@ID" attribute / Canvas order
     */
    protected function _getPhysicalStructureInfo()
    {
        // Is there no physical structure array yet?
        if (!$this->physicalStructureLoaded) {
            // Build physical structure array.
            $this->_getPhysicalStructure();
        }
        return $this->physicalStructureInfo;
    }

    /**
     * This returns $this->pid via __get()
     *
     * @access protected
     *
     * @return int The PID of the document or zero if not in database
     */
    protected function _getPid()
    {
        return $this->pid;
    }

    /**
     * This returns $this->ready via __get()
     *
     * @access protected
     *
     * @return bool Is the document instantiated successfully?
     */
    protected function _getReady()
    {
        return $this->ready;
    }

    /**
     * This returns $this->recordId via __get()
     *
     * @access protected
     *
     * @return mixed The METS file's / IIIF manifest's record identifier
     */
    protected function _getRecordId()
    {
        return $this->recordId;
    }

    /**
     * This returns $this->rootId via __get()
     *
     * @access protected
     *
     * @return int The UID of the root document or zero if not applicable
     */
    protected function _getRootId()
    {
        if (!$this->rootIdLoaded) {
            if ($this->parentId) {
                $parent = self::getInstance($this->parentId, ['storagePid' => $this->pid]);
                $this->rootId = $parent->rootId;
            }
            $this->rootIdLoaded = true;
        }
        return $this->rootId;
    }

    /**
     * This returns the smLinks between logical and physical structMap (METS) and models the
     * relation between IIIF Canvases and Manifests / Ranges in the same way
     *
     * @access protected
     *
     * @abstract
     *
     * @return array The links between logical and physical nodes / Range, Manifest and Canvas
     */
    abstract protected function _getSmLinks();

    /**
     * This builds an array of the document's logical structure
     *
     * @access protected
     *
     * @return array Array of structure nodes' id, label, type and physical page indexes/mptr / Canvas link with original hierarchy preserved
     */
    protected function _getTableOfContents()
    {
        // Is there no logical structure array yet?
        if (!$this->tableOfContentsLoaded) {
            // Get all logical structures.
            $this->getLogicalStructure('', true);
            $this->tableOfContentsLoaded = true;
        }
        return $this->tableOfContents;
    }

    /**
     * This returns the document's thumbnail location
     *
     * @access protected
     *
     * @abstract
     *
     * @param bool $forceReload: Force reloading the thumbnail instead of returning the cached value
     *
     * @return string The document's thumbnail location
     */
    abstract protected function _getThumbnail($forceReload = false);

    /**
     * This returns the ID of the toplevel logical structure node
     *
     * @access protected
     *
     * @abstract
     *
     * @return string The logical structure node's ID
     */
    abstract protected function _getToplevelId();

    /**
     * This sets $this->cPid via __set()
     *
     * @access protected
     *
     * @param int $value: The new PID for the metadata definitions
     *
     * @return void
     */
    protected function _setCPid($value)
    {
        $this->cPid = max(intval($value), 0);
    }

    /**
     * This is a singleton class, thus the constructor should be private/protected
     * (Get an instance of this class by calling AbstractDocument::getInstance())
     *
     * @access protected
     *
     * @param string $location: The location URL of the XML file to parse
     * @param int $pid: If > 0, then only document with this PID gets loaded
     * @param \SimpleXMLElement|IiifResourceInterface|null $preloadedDocument: Either null or the \SimpleXMLElement
     * or IiifResourceInterface that has been loaded to determine the basic document format.
     *
     * @return void
     */
    protected function __construct($location, $pid, $preloadedDocument)
    {
        $this->pid = $pid;
        $this->setPreloadedDocument($preloadedDocument);
        $this->init($location);
        $this->establishRecordId($pid);
    }

    /**
     * This magic method is called each time an inaccessible property is read from the object
     *
     * @access public
     *
     * @param string $var: Name of variable to get
     *
     * @return mixed Value of $this->$var, or null if there is no getter for it
     */
    public function __get($var)
    {
        $method = '_get' . ucfirst($var);
        if (
            !property_exists($this, $var)
            || !method_exists($this, $method)
        ) {
            $this->logger->warning('There is no getter function for property "' . $var . '"');
            return null;
        } else {
            return $this->$method();
        }
    }

    /**
     * This magic method is called each time an inaccessible property is checked for isset() or empty()
     *
     * @access public
     *
     * @param string $var: Name of variable to check
     *
     * @return bool true if variable is set and not empty, false otherwise
     */
    public function __isset($var)
    {
        return !empty($this->__get($var));
    }

    /**
     * This magic method is called each time an inaccessible property is assigned on the object
     *
     * @access public
     *
     * @param string $var: Name of variable to set
     * @param mixed $value: New value of variable
     *
     * @return void
     */
    public function __set($var, $value)
    {
        $method = '_set' . ucfirst($var);
        if (
            !property_exists($this, $var)
            || !method_exists($this, $method)
        ) {
            $this->logger->warning('There is no setter function for property "' . $var . '"');
        } else {
            $this->$method($value);
        }
    }

    /**
     * Get the cached document for the given location
     *
     * @param string $location
     * @return AbstractDocument|false
     */
    private static function getDocCache(string $location)
    {
        $cacheIdentifier = md5($location);
        $cache = GeneralUtility::makeInstance(CacheManager::class)->getCache('tx_dlf_doc');
        $cacheHit = $cache->get($cacheIdentifier);

        return $cacheHit;
    }

    /**
     * Store the document in the cache under its location
     *
     * @param string $location
     * @param AbstractDocument $currentDocument
     * @return void
     */
    private static function setDocCache(string $location, AbstractDocument $currentDocument)
    {
        $cacheIdentifier = md5($location);
        $cache = GeneralUtility::makeInstance(CacheManager::class)->getCache('tx_dlf_doc');

        // Save value in cache
        $cache->set($cacheIdentifier, $currentDocument);
    }

}