Passed
Pull Request — dev-extbase-fluid (#746)
by Alexander
03:49 queued 01:03
created

Doc   F

Complexity

Total Complexity 90

Size/Duplication

Total Lines 1234
Duplicated Lines 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
eloc 277
c 1
b 0
f 0
dl 0
loc 1234
rs 2
wmc 90

32 Methods

Rating   Name   Duplication   Size   Complexity  
A clearRegistry() 0 4 1
A setDocCache() 0 7 1
A _getTableOfContents() 0 9 2
A getDocCache() 0 7 1
A getTreeDepth() 0 13 5
A _setCPid() 0 3 1
A _getRootId() 0 10 3
A _getUid() 0 3 1
A _getPid() 0 3 1
A _getLocation() 0 3 1
A loadFormats() 0 29 3
A _getNumPages() 0 4 1
A load() 0 16 3
A __construct() 0 8 1
B getFullTextFromXml() 0 52 10
A getStructureDepth() 0 3 1
A _getRecordId() 0 3 1
A registerNamespaces() 0 16 4
A getTextFormat() 0 4 1
B getTitle() 0 42 7
A _getMetadataArray() 0 17 5
A _getHasFulltext() 0 4 1
C getInstance() 0 60 12
A _getParentId() 0 3 1
A getTitledata() 0 20 6
A _getCPid() 0 3 1
A _getReady() 0 3 1
A _getPhysicalStructureInfo() 0 8 2
A __set() 0 10 3
A getPhysicalPage() 0 19 5
A __isset() 0 3 1
A __get() 0 11 3

How to fix   Complexity   

Complex Class

Complex classes like Doc often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
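
As a minimal sketch of Extract Class, assume the format-related members of Doc (`$formats`, `$formatsLoaded`, `loadFormats()`) form one cohesive component. They could then move into a small class of their own. The class name `FormatRegistry`, its methods, and the injected loader callable are hypothetical and not part of the Kitodo code base; the callable stands in for the `tx_dlf_formats` database query, and the sample row is illustrative only.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical class extracted from Doc: it owns the format
 * configuration and the flag that guards lazy loading.
 */
final class FormatRegistry
{
    /** @var array Format configuration keyed by format type */
    private $formats;

    /** @var bool Have the additional formats been loaded yet? */
    private $loaded = false;

    /** @var callable Returns rows with the keys type, root, namespace, class */
    private $loader;

    public function __construct(array $defaults, callable $loader)
    {
        $this->formats = $defaults;
        $this->loader = $loader;
    }

    /**
     * Return the configuration for one format type, or null if unknown.
     */
    public function get(string $type): ?array
    {
        $this->load();
        return $this->formats[$type] ?? null;
    }

    /**
     * Merge the loader's rows into the registry, at most once.
     */
    private function load(): void
    {
        if ($this->loaded) {
            return;
        }
        foreach (($this->loader)() as $row) {
            $this->formats[$row['type']] = [
                'rootElement' => $row['root'],
                'namespaceURI' => $row['namespace'],
                'class' => $row['class'],
            ];
        }
        $this->loaded = true;
    }
}

// Usage: the closure replaces the database query; its values are made up.
$registry = new FormatRegistry(
    ['METS' => ['rootElement' => 'mets', 'namespaceURI' => 'http://www.loc.gov/METS/']],
    function (): array {
        return [[
            'type' => 'ALTO',
            'root' => 'alto',
            'namespace' => 'http://example.org/alto',
            'class' => 'ExampleAlto',
        ]];
    }
);
```

With this split, Doc would keep a single `FormatRegistry` field instead of two properties and one method, and the registry could be tested without a database.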

While breaking up the class, it is a good idea to analyze how other classes use Doc, and based on these observations, apply Extract Interface, too.
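
A sketch of Extract Interface, assuming that some callers only need the three public file accessors that Doc declares (`getDownloadLocation()`, `getFileLocation()`, `getFileMimeType()`). The interface name `FileLocator`, the `InMemoryFileLocator` stand-in, the `describeFile()` helper, and the sample data are all hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical interface covering only the file-related part of Doc.
 */
interface FileLocator
{
    public function getDownloadLocation(string $id): string;

    public function getFileLocation(string $id): string;

    public function getFileMimeType(string $id): string;
}

/**
 * Stand-in implementation backed by an array, for illustration and tests.
 */
final class InMemoryFileLocator implements FileLocator
{
    /** @var array File records keyed by file id, each with 'url' and 'mime' */
    private $files;

    public function __construct(array $files)
    {
        $this->files = $files;
    }

    public function getDownloadLocation(string $id): string
    {
        // In this sketch the download location equals the file location.
        return $this->getFileLocation($id);
    }

    public function getFileLocation(string $id): string
    {
        return $this->files[$id]['url'] ?? '';
    }

    public function getFileMimeType(string $id): string
    {
        return $this->files[$id]['mime'] ?? '';
    }
}

/**
 * A caller that depends on the narrow interface rather than on Doc.
 */
function describeFile(FileLocator $locator, string $id): string
{
    return $locator->getFileLocation($id) . ' (' . $locator->getFileMimeType($id) . ')';
}

$locator = new InMemoryFileLocator([
    'FILE_0001' => ['url' => 'https://example.org/0001.tif', 'mime' => 'image/tiff'],
]);
```

Callers typed against such an interface would no longer depend on the full Doc class, which makes it easier to move methods out of Doc later.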

<?php

/**
 * (c) Kitodo. Key to digital objects e.V. <[email protected]>
 *
 * This file is part of the Kitodo and TYPO3 projects.
 *
 * @license GNU General Public License version 3 or later.
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 */

namespace Kitodo\Dlf\Common;

use TYPO3\CMS\Core\Cache\CacheManager;
use TYPO3\CMS\Core\Configuration\ExtensionConfiguration;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Database\Query\Restriction\HiddenRestriction;
use TYPO3\CMS\Core\Log\LogManager;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use TYPO3\CMS\Core\Utility\MathUtility;
use TYPO3\CMS\Extbase\Configuration\ConfigurationManager;
use Ubl\Iiif\Presentation\Common\Model\Resources\IiifResourceInterface;
use Ubl\Iiif\Tools\IiifHelper;

/**
 * Document class for the 'dlf' extension
 *
 * @author Sebastian Meyer <[email protected]>
 * @author Henrik Lochmann <[email protected]>
 * @package TYPO3
 * @subpackage dlf
 * @access public
 * @property int $cPid This holds the PID for the configuration
 * @property-read bool $hasFulltext Are there any fulltext files available?
 * @property-read string $location This holds the document's location
 * @property-read array $metadataArray This holds the document's parsed metadata array
 * @property-read int $numPages This holds the total number of pages
 * @property-read int $parentId This holds the UID of the parent document or zero if not multi-volumed
 * @property-read array $physicalStructure This holds the physical structure
 * @property-read array $physicalStructureInfo This holds the physical structure metadata
 * @property-read int $pid This holds the PID of the document or zero if not in database
 * @property-read bool $ready Is the document instantiated successfully?
 * @property-read string $recordId The METS file's / IIIF manifest's record identifier
 * @property-read int $rootId This holds the UID of the root document or zero if not multi-volumed
 * @property-read array $smLinks This holds the smLinks between logical and physical structMap
 * @property-read array $tableOfContents This holds the logical structure
 * @property-read string $thumbnail This holds the document's thumbnail location
 * @property-read string $toplevelId This holds the toplevel structure's @ID (METS) or the manifest's @id (IIIF)
 * @property-read mixed $uid This holds the UID or the URL of the document
 * @abstract
 */
abstract class Doc
{
    /**
     * This holds the logger
     *
     * @var LogManager
     * @access protected
     */
    protected $logger;

    /**
     * This holds the PID for the configuration
     *
     * @var int
     * @access protected
     */
    protected $cPid = 0;

    /**
     * The extension key
     *
     * @var string
     * @access public
     */
    public static $extKey = 'dlf';

    /**
     * This holds the configuration for all supported metadata encodings
     * @see loadFormats()
     *
     * @var array
     * @access protected
     */
    protected $formats = [
        'OAI' => [
            'rootElement' => 'OAI-PMH',
            'namespaceURI' => 'http://www.openarchives.org/OAI/2.0/',
        ],
        'METS' => [
            'rootElement' => 'mets',
            'namespaceURI' => 'http://www.loc.gov/METS/',
        ],
        'XLINK' => [
            'rootElement' => 'xlink',
            'namespaceURI' => 'http://www.w3.org/1999/xlink',
        ]
    ];

    /**
     * Are the available metadata formats loaded?
     * @see $formats
     *
     * @var bool
     * @access protected
     */
    protected $formatsLoaded = false;

    /**
     * Are there any fulltext files available? This also includes IIIF text annotations
     * with motivation 'painting' if Kitodo.Presentation is configured to store text
     * annotations as fulltext.
     *
     * @var bool
     * @access protected
     */
    protected $hasFulltext = false;

    /**
     * Last searched logical and physical page
     *
     * @var array
     * @access protected
     */
    protected $lastSearchedPhysicalPage = ['logicalPage' => null, 'physicalPage' => null];

    /**
     * This holds the document's location
     *
     * @var string
     * @access protected
     */
    protected $location = '';

    /**
     * This holds the logical units
     *
     * @var array
     * @access protected
     */
    protected $logicalUnits = [];

    /**
     * This holds the document's parsed metadata array with their corresponding
     * structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
     *
     * @var array
     * @access protected
     */
    protected $metadataArray = [];

    /**
     * Is the metadata array loaded?
     * @see $metadataArray
     *
     * @var bool
     * @access protected
     */
    protected $metadataArrayLoaded = false;

    /**
     * This holds the total number of pages
     *
     * @var int
     * @access protected
     */
    protected $numPages = 0;

    /**
     * This holds the UID of the parent document or zero if not multi-volumed
     *
     * @var int
     * @access protected
     */
    protected $parentId = 0;

    /**
     * This holds the physical structure
     *
     * @var array
     * @access protected
     */
    protected $physicalStructure = [];

    /**
     * This holds the physical structure metadata
     *
     * @var array
     * @access protected
     */
    protected $physicalStructureInfo = [];

    /**
     * Is the physical structure loaded?
     * @see $physicalStructure
     *
     * @var bool
     * @access protected
     */
    protected $physicalStructureLoaded = false;

    /**
     * This holds the PID of the document or zero if not in database
     *
     * @var int
     * @access protected
     */
    protected $pid = 0;

    /**
     * This holds the document's raw text pages with their corresponding
     * structMap//div's ID (METS) or Range / Manifest / Sequence ID (IIIF) as array key
     *
     * @var array
     * @access protected
     */
    protected $rawTextArray = [];

    /**
     * Is the document instantiated successfully?
     *
     * @var bool
     * @access protected
     */
    protected $ready = false;

    /**
     * The METS file's / IIIF manifest's record identifier
     *
     * @var string
     * @access protected
     */
    protected $recordId;

    /**
     * This holds the singleton object of the document
     *
     * @var array (\Kitodo\Dlf\Common\Doc)
     * @static
     * @access protected
     */
    protected static $registry = [];

    /**
     * This holds the UID of the root document or zero if not multi-volumed
     *
     * @var int
     * @access protected
     */
    protected $rootId = 0;

    /**
     * Is the root id loaded?
     * @see $rootId
     *
     * @var bool
     * @access protected
     */
    protected $rootIdLoaded = false;

    /**
     * This holds the smLinks between logical and physical structMap
     *
     * @var array
     * @access protected
     */
    protected $smLinks = ['l2p' => [], 'p2l' => []];

    /**
     * Are the smLinks loaded?
     * @see $smLinks
     *
     * @var bool
     * @access protected
     */
    protected $smLinksLoaded = false;

    /**
     * This holds the logical structure
     *
     * @var array
     * @access protected
     */
    protected $tableOfContents = [];

    /**
     * Is the table of contents loaded?
     * @see $tableOfContents
     *
     * @var bool
     * @access protected
     */
    protected $tableOfContentsLoaded = false;

    /**
     * This holds the document's thumbnail location
     *
     * @var string
     * @access protected
     */
    protected $thumbnail = '';

    /**
     * Is the document's thumbnail location loaded?
     * @see $thumbnail
     *
     * @var bool
     * @access protected
     */
    protected $thumbnailLoaded = false;

    /**
     * This holds the toplevel structure's @ID (METS) or the manifest's @id (IIIF)
     *
     * @var string
     * @access protected
     */
    protected $toplevelId = '';

    /**
     * This holds the UID or the URL of the document
     *
     * @var mixed
     * @access protected
     */
    protected $uid = 0;

    /**
     * This holds the whole XML file as \SimpleXMLElement object
     *
     * @var \SimpleXMLElement
     * @access protected
     */
    protected $xml;

    /**
     * This clears the static registry to prevent memory exhaustion
     *
     * @access public
     *
     * @static
     *
     * @return void
     */
    public static function clearRegistry()
    {
        // Reset registry array.
        self::$registry = [];
    }

    /**
     * This ensures that the recordId, if existent, is retrieved from the document
     *
     * @access protected
     *
     * @abstract
     *
     * @param int $pid: ID of the configuration page with the recordId config
     *
     */
    protected abstract function establishRecordId($pid);

    /**
     * Source document PHP object which is represented by a Document instance
     *
     * @access protected
     *
     * @abstract
     *
     * @return \SimpleXMLElement|IiifResourceInterface A PHP object representation of
     * the current document. SimpleXMLElement for METS, IiifResourceInterface for IIIF
     */
    protected abstract function getDocument();

    /**
     * This gets the location of a downloadable file for a physical page or track
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The @ID attribute of the file node (METS) or the @id property of the IIIF resource
     *
     * @return string The file's location as URL
     */
    public abstract function getDownloadLocation($id);

    /**
     * This gets the location of a file representing a physical page or track
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The @ID attribute of the file node (METS) or the @id property of the IIIF resource
     *
     * @return string The file's location as URL
     */
    public abstract function getFileLocation($id);

    /**
     * This gets the MIME type of a file representing a physical page or track
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The @ID attribute of the file node
     *
     * @return string The file's MIME type
     */
    public abstract function getFileMimeType($id);

    /**
     * This is a singleton class, thus an instance must be created by this method
     *
     * @access public
     *
     * @static
     *
     * @param string $location: The URL of XML file or the IRI of the IIIF resource
     * @param array $settings
     * @param bool $forceReload: Force reloading the document instead of returning the cached instance
     *
     * @return \Kitodo\Dlf\Common\Doc|null Instance of this class, either MetsDocument or IiifManifest
     */
    public static function &getInstance($location, $settings = [], $forceReload = false)
    {
        // Create new instance depending on format (METS or IIIF) ...
        $documentFormat = null;
        $xml = null;
        $iiif = null;

        if ($instance = self::getDocCache($location)) {
            return $instance;
        } else {
            $instance = null;
        }

        // Try to load a file from the url
        if (GeneralUtility::isValidUrl($location)) {
            // Load extension configuration
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
            // Set user-agent to identify self when fetching XML data.
            if (!empty($extConf['useragent'])) {
                @ini_set('user_agent', $extConf['useragent']);
            }
            $content = GeneralUtility::getUrl($location);
            if ($content !== false) {
                $xml = Helper::getXmlFileAsString($content);
                if ($xml !== false) {
                    /* @var $xml \SimpleXMLElement */
                    $xml->registerXPathNamespace('mets', 'http://www.loc.gov/METS/');
                    $xpathResult = $xml->xpath('//mets:mets');
                    $documentFormat = !empty($xpathResult) ? 'METS' : null;
                } else {
                    // Try to load file as IIIF resource instead.
                    $contentAsJsonArray = json_decode($content, true);
                    if ($contentAsJsonArray !== null) {
                        // Load plugin configuration.
                        $conf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
                        IiifHelper::setUrlReader(IiifUrlReader::getInstance());
                        IiifHelper::setMaxThumbnailHeight($conf['iiifThumbnailHeight']);
                        IiifHelper::setMaxThumbnailWidth($conf['iiifThumbnailWidth']);
                        $iiif = IiifHelper::loadIiifResource($contentAsJsonArray);
                        if ($iiif instanceof IiifResourceInterface) {
                            $documentFormat = 'IIIF';
                        }
                    }
                }
            }
        }

        // Sanitize input.
        $pid = max(intval($settings['storagePid']), 0);
        if ($documentFormat == 'METS') {
            $instance = new MetsDocument($location, $pid, $xml);
        } elseif ($documentFormat == 'IIIF') {
            $instance = new IiifManifest($location, $pid, $iiif);
        }

        if ($instance) {
            self::setDocCache($location, $instance);
        }

        return $instance;
    }

    /**
     * This gets details about a logical structure element
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The @ID attribute of the logical structure node (METS) or
     * the @id property of the Manifest / Range (IIIF)
     * @param bool $recursive: Whether to include the child elements / resources
     *
     * @return array Array of the element's id, label, type and physical page indexes/mptr link
     */
    public abstract function getLogicalStructure($id, $recursive = false);

    /**
     * This extracts all the metadata for a logical structure node
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The @ID attribute of the logical structure node (METS) or the @id property
     * of the Manifest / Range (IIIF)
     * @param int $cPid: The PID for the metadata definitions
     *                       (defaults to $this->cPid or $this->pid)
     *
     * @return array The logical structure node's / the IIIF resource's parsed metadata array
     */
    public abstract function getMetadata($id, $cPid = 0);

    /**
     * This returns the first corresponding physical page number of a given logical page label
     *
     * @access public
     *
     * @param string $logicalPage: The label (or a part of the label) of the logical page
     *
     * @return int The physical page number
     */
    public function getPhysicalPage($logicalPage)
    {
        if (
            !empty($this->lastSearchedPhysicalPage['logicalPage'])
            && $this->lastSearchedPhysicalPage['logicalPage'] == $logicalPage
        ) {
            return $this->lastSearchedPhysicalPage['physicalPage'];
        } else {
            $physicalPage = 0;
            foreach ($this->physicalStructureInfo as $page) {
                if (strpos($page['orderlabel'], $logicalPage) !== false) {
                    $this->lastSearchedPhysicalPage['logicalPage'] = $logicalPage;
                    $this->lastSearchedPhysicalPage['physicalPage'] = $physicalPage;
                    return $physicalPage;
                }
                $physicalPage++;
            }
        }
        return 1;
    }

    /**
     * This extracts the OCR full text for a physical structure node / IIIF Manifest / Canvas. Text might be
     * given as ALTO for METS or as annotations or ALTO for IIIF resources.
     *
     * @access public
     *
     * @abstract
     *
     * @param string $id: The @ID attribute of the physical structure node (METS) or the @id property
     * of the Manifest / Range (IIIF)
     *
     * @return string The OCR full text
     */
    public abstract function getFullText($id);

    /**
     * This extracts the OCR full text for a physical structure node / IIIF Manifest / Canvas from an
     * XML full text representation (currently only ALTO). For IIIF manifests, ALTO documents have
     * to be given in the Canvas' / Manifest's "seeAlso" property.
     *
     * @param string $id: The @ID attribute of the physical structure node (METS) or the @id property
     * of the Manifest / Range (IIIF)
     *
     * @return string The OCR full text
     */
    protected function getFullTextFromXml($id)
    {
        $fullText = '';
        // Load available text formats, ...
        $this->loadFormats();
        // ... physical structure ...
        $this->_getPhysicalStructure();
        // ... and extension configuration.
        $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
        $fileGrpsFulltext = GeneralUtility::trimExplode(',', $extConf['fileGrpFulltext']);
        if (!empty($this->physicalStructureInfo[$id])) {
            while ($fileGrpFulltext = array_shift($fileGrpsFulltext)) {
                if (!empty($this->physicalStructureInfo[$id]['files'][$fileGrpFulltext])) {
                    // Get full text file.
                    $fileContent = GeneralUtility::getUrl($this->getFileLocation($this->physicalStructureInfo[$id]['files'][$fileGrpFulltext]));
                    if ($fileContent !== false) {
                        $textFormat = $this->getTextFormat($fileContent);
                    } else {
                        $this->logger->warning('Couldn\'t load full text file for structure node @ID "' . $id . '"');
                        return $fullText;
                    }
                    break;
                }
            }
        } else {
            $this->logger->warning('Invalid structure node @ID "' . $id . '"');
            return $fullText;
        }
        // Is this text format supported?
        // This part actually differs from previous version of indexed OCR
        if (!empty($fileContent) && !empty($this->formats[$textFormat])) {
            // Scrutinizer issue (Comprehensibility, Best Practice): The variable $textFormat does not
            // seem to be defined for all execution paths leading up to this point.
            $textMiniOcr = '';
            if (!empty($this->formats[$textFormat]['class'])) {
                $class = $this->formats[$textFormat]['class'];
                // Get the raw text from class.
                if (
                    class_exists($class)
                    && ($obj = GeneralUtility::makeInstance($class)) instanceof FulltextInterface
                ) {
                    // Load XML from file.
                    $ocrTextXml = Helper::getXmlFileAsString($fileContent);
                    $textMiniOcr = $obj->getTextAsMiniOcr($ocrTextXml);
                    $this->rawTextArray[$id] = $textMiniOcr;
                } else {
                    $this->logger->warning('Invalid class/method "' . $class . '->getRawText()" for text format "' . $textFormat . '"');
                }
            }
            $fullText = $textMiniOcr;
        } else {
            $this->logger->warning('Unsupported text format "' . $textFormat . '" in physical node with @ID "' . $id . '"');
        }
        return $fullText;
    }

    /**
     * Get format of the OCR full text
     *
     * @access private
     *
     * @param string $fileContent: content of the XML file
     *
     * @return string The format of the OCR full text
     */
    private function getTextFormat($fileContent)
    {
        // Get the root element's name as text format.
        return strtoupper(Helper::getXmlFileAsString($fileContent)->getName());
    }

    /**
     * This determines a title for the given document
     *
     * @access public
     *
     * @static
     *
     * @param int $uid: The UID of the document
     * @param bool $recursive: Search superior documents for a title, too?
     *
     * @return string The title of the document itself or a parent document
     */
    public static function getTitle($uid, $recursive = false)
    {
        $title = '';
        // Sanitize input.
        $uid = max(intval($uid), 0);
        if ($uid) {
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');

            $result = $queryBuilder
                ->select(
                    'tx_dlf_documents.title',
                    'tx_dlf_documents.partof'
                )
                ->from('tx_dlf_documents')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_documents.uid', $uid),
                    Helper::whereExpression('tx_dlf_documents')
                )
                ->setMaxResults(1)
                ->execute();

            if ($resArray = $result->fetch()) {
                // Get title information.
                $title = $resArray['title'];
                $partof = $resArray['partof'];
                // Search parent documents recursively for a title?
                if (
                    $recursive
                    && empty($title)
                    && intval($partof)
                    && $partof != $uid
                ) {
                    $title = self::getTitle($partof, true);
                }
            } else {
                Helper::log('No document with UID ' . $uid . ' found or document not accessible', LOG_SEVERITY_WARNING);
            }
        } else {
            Helper::log('Invalid UID ' . $uid . ' for document', LOG_SEVERITY_ERROR);
        }
        return $title;
    }

    /**
     * This extracts all the metadata for the toplevel logical structure node / resource
     *
     * @access public
     *
     * @param int $cPid: The PID for the metadata definitions
     *
     * @return array The logical structure node's / resource's parsed metadata array
     */
    public function getTitledata($cPid = 0)
    {
        $titledata = $this->getMetadata($this->_getToplevelId(), $cPid);
        // Add information from METS structural map to titledata array.
        if ($this instanceof MetsDocument) {
            $this->addMetadataFromMets($titledata, $this->_getToplevelId());
        }
        // Set record identifier for METS file / IIIF manifest if not present.
        if (
            is_array($titledata)
            && array_key_exists('record_id', $titledata)
        ) {
            if (
                !empty($this->recordId)
                && !in_array($this->recordId, $titledata['record_id'])
            ) {
                array_unshift($titledata['record_id'], $this->recordId);
            }
        }
        return $titledata;
    }

    /**
     * Traverse a logical (sub-) structure tree to find the structure with the requested logical id and return its depth.
     *
     * @access protected
     *
     * @param array $structure: logical structure array
     * @param int $depth: current tree depth
     * @param string $logId: ID of the logical structure whose depth is requested
     *
     * @return int|bool: false if structure with $logId is not a child of this substructure,
     * or the actual depth.
     */
    protected function getTreeDepth($structure, $depth, $logId)
    {
        foreach ($structure as $element) {
            if ($element['id'] == $logId) {
                return $depth;
            } elseif (array_key_exists('children', $element)) {
                $foundInChildren = $this->getTreeDepth($element['children'], $depth + 1, $logId);
                if ($foundInChildren !== false) {
                    return $foundInChildren;
                }
            }
        }
        return false;
    }

    /**
     * Get the tree depth of a logical structure element within the table of contents
     *
     * @access public
     *
     * @param string $logId: The id of the logical structure element whose depth is requested
     * @return int|bool tree depth as integer or false if no element with $logId exists within the TOC.
     */
    public function getStructureDepth($logId)
    {
        return $this->getTreeDepth($this->_getTableOfContents(), 1, $logId);
    }

    /**
     * This sets some basic class properties
     *
     * @access protected
     *
     * @abstract
     *
     * @return void
     */
    protected abstract function init();

    /**
     * Reuse any document object that might have been already loaded to determine whether document is METS or IIIF
     *
     * @access protected
     *
     * @abstract
     *
     * @param \SimpleXMLElement|IiifResourceInterface $preloadedDocument: any instance that has already been loaded
     *
     * @return bool true if $preloadedDocument can actually be reused, false if it has to be loaded again
     */
    protected abstract function setPreloadedDocument($preloadedDocument);

    /**
     * METS/IIIF specific part of loading a location
     *
     * @access protected
     *
     * @abstract
     *
     * @param string $location: The URL of the file to load
     *
     * @return bool true on success or false on failure
     */
    protected abstract function loadLocation($location);

    /**
     * Load XML file / IIIF resource from URL
     *
     * @access protected
     *
     * @param string $location: The URL of the file to load
     *
     * @return bool true on success or false on failure
     */
    protected function load($location)
    {
        // Load XML / JSON-LD file.
        if (GeneralUtility::isValidUrl($location)) {
            // Load extension configuration
            $extConf = GeneralUtility::makeInstance(ExtensionConfiguration::class)->get(self::$extKey);
            // Set user-agent to identify self when fetching XML / JSON-LD data.
            if (!empty($extConf['useragent'])) {
                @ini_set('user_agent', $extConf['useragent']);
            }
            // the actual loading is format specific
            return $this->loadLocation($location);
        } else {
            $this->logger->error('Invalid file location "' . $location . '" for document loading');
        }
        return false;
    }

836
    /**
837
     * Analyze the document if it contains any fulltext that needs to be indexed.
838
     *
839
     * @access protected
840
     *
841
     * @abstract
842
     */
    protected abstract function ensureHasFulltextIsSet();

    /**
     * Register all available data formats
     *
     * @access protected
     *
     * @return void
     */
    protected function loadFormats()
    {
        if (!$this->formatsLoaded) {
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_formats');

            // Get available data formats from database.
            $result = $queryBuilder
                ->select(
                    'tx_dlf_formats.type AS type',
                    'tx_dlf_formats.root AS root',
                    'tx_dlf_formats.namespace AS namespace',
                    'tx_dlf_formats.class AS class'
                )
                ->from('tx_dlf_formats')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_formats.pid', 0)
                )
                ->execute();

            while ($resArray = $result->fetch()) {
                // Update format registry.
                $this->formats[$resArray['type']] = [
                    'rootElement' => $resArray['root'],
                    'namespaceURI' => $resArray['namespace'],
                    'class' => $resArray['class']
                ];
            }
            $this->formatsLoaded = true;
        }
    }

    /**
     * Register all available namespaces for a \SimpleXMLElement or \DOMXPath object
     *
     * @access public
     *
     * @param \SimpleXMLElement|\DOMXPath &$obj: \SimpleXMLElement or \DOMXPath object
     *
     * @return void
     */
    public function registerNamespaces(&$obj)
    {
        // TODO Check usage. XML specific method does not seem to be used anywhere outside this class within the project, but it is public and may be used by extensions.
        $this->loadFormats();
        // Do we have a \SimpleXMLElement or \DOMXPath object?
        if ($obj instanceof \SimpleXMLElement) {
            $method = 'registerXPathNamespace';
        } elseif ($obj instanceof \DOMXPath) {
            $method = 'registerNamespace';
        } else {
            $this->logger->error('Given object is neither a SimpleXMLElement nor a DOMXPath instance');
            return;
        }
        // Register metadata format's namespaces.
        foreach ($this->formats as $enc => $conf) {
            $obj->$method(strtolower($enc), $conf['namespaceURI']);
        }
    }

    /**
     * This returns $this->cPid via __get()
     *
     * @access protected
     *
     * @return int The PID of the metadata definitions
     */
    protected function _getCPid()
    {
        return $this->cPid;
    }

    /**
     * This returns $this->hasFulltext via __get()
     *
     * @access protected
     *
     * @return bool Are there any fulltext files available?
     */
    protected function _getHasFulltext()
    {
        $this->ensureHasFulltextIsSet();
        return $this->hasFulltext;
    }

    /**
     * This returns $this->location via __get()
     *
     * @access protected
     *
     * @return string The location of the document
     */
    protected function _getLocation()
    {
        return $this->location;
    }

    /**
     * Format specific part of building the document's metadata array
     *
     * @access protected
     *
     * @abstract
     *
     * @param int $cPid
     */
    protected abstract function prepareMetadataArray($cPid);

    /**
     * This builds an array of the document's metadata
     *
     * @access protected
     *
     * @return array Array of metadata with their corresponding logical structure node ID as key
     */
    protected function _getMetadataArray()
    {
        // Set metadata definitions' PID.
        $cPid = ($this->cPid ? $this->cPid : $this->pid);
        if (!$cPid) {
            $this->logger->error('Invalid PID ' . $cPid . ' for metadata definitions');
            return [];
        }
        if (
            !$this->metadataArrayLoaded
            || $this->metadataArray[0] != $cPid
        ) {
            $this->prepareMetadataArray($cPid);
            $this->metadataArray[0] = $cPid;
            // Scrutinizer issue: The property metadataArray is declared read-only in Kitodo\Dlf\Common\Doc.
            $this->metadataArrayLoaded = true;
        }
        return $this->metadataArray;
    }

    /**
     * This returns $this->numPages via __get()
     *
     * @access protected
     *
     * @return int The total number of pages and/or tracks
     */
    protected function _getNumPages()
    {
        $this->_getPhysicalStructure();
        return $this->numPages;
    }

    /**
     * This returns $this->parentId via __get()
     *
     * @access protected
     *
     * @return int The UID of the parent document or zero if not applicable
     */
    protected function _getParentId()
    {
        return $this->parentId;
    }

    /**
     * This builds an array of the document's physical structure
     *
     * @access protected
     *
     * @abstract
     *
     * @return array Array of physical elements' id, type, label and file representations ordered
     * by @ORDER attribute / IIIF Sequence's Canvases
     */
    protected abstract function _getPhysicalStructure();

    /**
     * This gives an array of the document's physical structure metadata
     *
     * @access protected
     *
     * @return array Array of elements' type, label and file representations ordered by @ID attribute / Canvas order
     */
    protected function _getPhysicalStructureInfo()
    {
        // Is there no physical structure array yet?
        if (!$this->physicalStructureLoaded) {
            // Build physical structure array.
            $this->_getPhysicalStructure();
        }
        return $this->physicalStructureInfo;
    }

    /**
     * This returns $this->pid via __get()
     *
     * @access protected
     *
     * @return int The PID of the document or zero if not in database
     */
    protected function _getPid()
    {
        return $this->pid;
    }

    /**
     * This returns $this->ready via __get()
     *
     * @access protected
     *
     * @return bool Is the document instantiated successfully?
     */
    protected function _getReady()
    {
        return $this->ready;
    }

    /**
     * This returns $this->recordId via __get()
     *
     * @access protected
     *
     * @return mixed The METS file's / IIIF manifest's record identifier
     */
    protected function _getRecordId()
    {
        return $this->recordId;
    }

    /**
     * This returns $this->rootId via __get()
     *
     * @access protected
     *
     * @return int The UID of the root document or zero if not applicable
     */
    protected function _getRootId()
    {
        if (!$this->rootIdLoaded) {
            if ($this->parentId) {
                $parent = self::getInstance($this->parentId, ['storagePid' => $this->pid]);
                $this->rootId = $parent->rootId;
                // Scrutinizer issue: The property rootId is declared read-only in Kitodo\Dlf\Common\Doc.
            }
            $this->rootIdLoaded = true;
        }
        return $this->rootId;
    }

    /**
     * This returns the smLinks between logical and physical structMap (METS) and models the
     * relation between IIIF Canvases and Manifests / Ranges in the same way
     *
     * @access protected
     *
     * @abstract
     *
     * @return array The links between logical and physical nodes / Range, Manifest and Canvas
     */
    protected abstract function _getSmLinks();

    /**
     * This builds an array of the document's logical structure
     *
     * @access protected
     *
     * @return array Array of structure nodes' id, label, type and physical page indexes/mptr / Canvas link with original hierarchy preserved
     */
    protected function _getTableOfContents()
    {
        // Is there no logical structure array yet?
        if (!$this->tableOfContentsLoaded) {
            // Get all logical structures.
            $this->getLogicalStructure('', true);
            $this->tableOfContentsLoaded = true;
        }
        return $this->tableOfContents;
    }

    /**
     * This returns the document's thumbnail location
     *
     * @access protected
     *
     * @abstract
     *
     * @param bool $forceReload: Force reloading the thumbnail instead of returning the cached value
     *
     * @return string The document's thumbnail location
     */
    protected abstract function _getThumbnail($forceReload = false);

    /**
     * This returns the ID of the toplevel logical structure node
     *
     * @access protected
     *
     * @abstract
     *
     * @return string The logical structure node's ID
     */
    protected abstract function _getToplevelId();

    /**
     * This returns $this->uid via __get()
     *
     * @access protected
     *
     * @return mixed The UID or the URL of the document
     */
    protected function _getUid()
    {
        return $this->uid;
    }

    /**
     * This sets $this->cPid via __set()
     *
     * @access protected
     *
     * @param int $value: The new PID for the metadata definitions
     *
     * @return void
     */
    protected function _setCPid($value)
    {
        $this->cPid = max(intval($value), 0);
    }

    /**
     * This is a singleton class, thus the constructor should be private/protected
     * (Get an instance of this class by calling \Kitodo\Dlf\Common\Doc::getInstance())
     *
     * @access protected
     *
     * @param string $location: The location URL of the XML file to parse
     * @param int $pid: If > 0, then only document with this PID gets loaded
     * @param \SimpleXMLElement|IiifResourceInterface $preloadedDocument: Either null or the \SimpleXMLElement
     * or IiifResourceInterface that has been loaded to determine the basic document format.
     *
     * @return void
     */
    protected function __construct($location, $pid, $preloadedDocument)
    {
        $this->setPreloadedDocument($preloadedDocument);
        $this->init();
        $this->establishRecordId($pid);
        $this->logger = GeneralUtility::makeInstance(LogManager::class)->getLogger();

        return;
    }

    /**
     * This magic method is called each time an inaccessible property is read from the object
     *
     * @access public
     *
     * @param string $var: Name of variable to get
     *
     * @return mixed Value of $this->$var
     */
    public function __get($var)
    {
        $method = '_get' . ucfirst($var);
        if (
            !property_exists($this, $var)
            || !method_exists($this, $method)
        ) {
            $this->logger->warning('There is no getter function for property "' . $var . '"');
            return;
        } else {
            return $this->$method();
        }
    }

    /**
     * This magic method is called each time an inaccessible property is checked for isset() or empty()
     *
     * @access public
     *
     * @param string $var: Name of variable to check
     *
     * @return bool true if variable is set and not empty, false otherwise
     */
    public function __isset($var)
    {
        return !empty($this->__get($var));
    }

    /**
     * This magic method is called each time an inaccessible property of the object is assigned a value
     *
     * @access public
     *
     * @param string $var: Name of variable to set
     * @param mixed $value: New value of variable
     *
     * @return void
     */
    public function __set($var, $value)
    {
        $method = '_set' . ucfirst($var);
        if (
            !property_exists($this, $var)
            || !method_exists($this, $method)
        ) {
            $this->logger->warning('There is no setter function for property "' . $var . '"');
        } else {
            $this->$method($value);
        }
    }

    /**
     * Get the cached Doc instance for the given location
     *
     * @param string $location
     * @return Doc|false
     */
    private static function getDocCache(string $location)
    {
        $cacheIdentifier = md5($location);
        $cache = GeneralUtility::makeInstance(CacheManager::class)->getCache('tx_dlf_doc');
        $cacheHit = $cache->get($cacheIdentifier);

        return $cacheHit;
    }

    /**
     * Store the given Doc instance in the cache under its location
     *
     * @param string $location
     * @param Doc $doc
     * @return void
     */
    private static function setDocCache(string $location, Doc $doc)
    {
        $cacheIdentifier = md5($location);
        $cache = GeneralUtility::makeInstance(CacheManager::class)->getCache('tx_dlf_doc');

        // Save value in cache
        $cache->set($cacheIdentifier, $doc);
    }

}