Completed
Push — master ( 9a2a94...744aa1 )

DocumentRepository   F

Complexity

Total Complexity 106

Size/Duplication

Total Lines 849
Duplicated Lines 0 %

Importance

Changes 14
Bugs 0 Features 0
Metric Value
eloc 428
c 14
b 0
f 0
dl 0
loc 849
rs 2
wmc 106

14 Methods

Rating  Name                                  Duplication  Size  Complexity
A       findOldestDocument()                  0            8     1
F       findSolrByCollection()                0            201   48
A       findOneByIdAndSettings()              0            5     1
A       findAllByCollectionsLimited()         0            25    4
B       fetchMetadataFromSolr()               0            38    10
A       getOaiRecord()                        0            31    2
B       findOneByParameters()                 0            38    11
C       searchSolr()                          0            94    16
A       getTableOfContentsFromDb()            0            39    2
B       getStatisticsForSelectedCollection()  0            124   2
A       findDocumentsBySettings()             0            21    5
A       findAllByUids()                       0            34    2
A       getChildrenOfYearAnchor()             0            12    1
A       getOaiDocumentList()                  0            26    1

How to fix   Complexity   

Complex Class

Complex classes like DocumentRepository often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share a prefix or suffix.

Once you have determined the fields and methods that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use DocumentRepository and, based on these observations, apply Extract Interface, too.
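
The following is a minimal sketch of Extract Class combined with Extract Interface, not a proposal for the actual refactoring. The names DocumentSearchInterface, SolrDocumentSearch and SlimDocumentRepository are hypothetical and do not exist in the analyzed code. The methods that share the Solr suffix (findSolrByCollection(), fetchMetadataFromSolr(), searchSolr()) are one plausible candidate for such a component. The sketch omits query escaping and the actual Solr request.

```php
<?php

declare(strict_types=1);

/** Hypothetical interface describing what callers need from the search component. */
interface DocumentSearchInterface
{
    /** @return array<string, mixed> request parameters for a search */
    public function buildParams(array $searchParams): array;
}

/** Hypothetical extracted class: it owns only the search-related behaviour. */
final class SolrDocumentSearch implements DocumentSearchInterface
{
    public function buildParams(array $searchParams): array
    {
        $query = trim((string) ($searchParams['query'] ?? ''));

        return [
            // Fall back to a match-all query when no search term is given.
            'query' => $query !== '' ? $query : '*',
            'start' => 0,
            'rows'  => 10000,
        ];
    }
}

/** The slimmed-down repository delegates search concerns to the extracted class. */
final class SlimDocumentRepository
{
    private DocumentSearchInterface $search;

    public function __construct(DocumentSearchInterface $search)
    {
        $this->search = $search;
    }

    public function findSolrByCollection(array $searchParams): array
    {
        // A real implementation would send these parameters to Solr;
        // this sketch returns them so that the delegation stays visible.
        return $this->search->buildParams($searchParams);
    }
}

$repository = new SlimDocumentRepository(new SolrDocumentSearch());
$params = $repository->findSolrByCollection(['query' => '  goethe ']);
```

Because the repository depends only on the interface, the search component can be replaced by a test double without touching the database-related methods.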

<?php

/**
 * (c) Kitodo. Key to digital objects e.V. <[email protected]>
 *
 * This file is part of the Kitodo and TYPO3 projects.
 *
 * @license GNU General Public License version 3 or later.
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 */

namespace Kitodo\Dlf\Domain\Repository;

use Kitodo\Dlf\Common\Doc;
use Kitodo\Dlf\Common\Helper;
use Kitodo\Dlf\Common\Indexer;
use Kitodo\Dlf\Common\Solr;
use Kitodo\Dlf\Domain\Model\Document;
use Kitodo\Dlf\Common\SolrSearchResult\ResultDocument;
use TYPO3\CMS\Core\Cache\CacheManager;
use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Database\Connection;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use TYPO3\CMS\Core\Utility\MathUtility;
use TYPO3\CMS\Extbase\Persistence\QueryInterface;

class DocumentRepository extends \TYPO3\CMS\Extbase\Persistence\Repository
{
    /**
     * The controller settings passed to the repository for some special actions.
     *
     * @var array
     * @access protected
     */
    protected $settings;

    /**
     * Find one document by given parameters
     *
     * GET parameters may be:
     *
     * - 'id': the uid of the document
     * - 'location': the URL of the location of the XML file
     * - 'recordId': the record_id of the document
     *
     * Currently used by EXT:slub_digitalcollections
     *
     * @param array $parameters
     *
     * @return \Kitodo\Dlf\Domain\Model\Document|null
     */
    public function findOneByParameters($parameters)
    {
        $doc = null;
        $document = null;

        if (isset($parameters['id']) && MathUtility::canBeInterpretedAsInteger($parameters['id'])) {

            $document = $this->findOneByIdAndSettings($parameters['id']);

        } else if (isset($parameters['recordId'])) {

            $document = $this->findOneByRecordId($parameters['recordId']);
Issue (Bug): The method findOneByRecordId() does not exist on Kitodo\Dlf\Domain\Repository\DocumentRepository. Since you implemented __call, consider adding a @method annotation. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            /** @scrutinizer ignore-call */
            $document = $this->findOneByRecordId($parameters['recordId']);
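
As a hedged illustration of the @method suggestion in the issue above, the sketch below declares a magic finder in the class docblock so that static analysis can resolve it. MagicRepository and its in-memory rows are hypothetical stand-ins; in the analyzed code the magic call is resolved by __call in the Extbase base repository, and the real return type would be the Document model rather than an array.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical repository that resolves findOneBy<Property>() via __call.
 *
 * The @method tag documents the magic finder for IDEs and static analysis.
 *
 * @method array|null findOneByRecordId(string $recordId)
 */
class MagicRepository
{
    /** @var array<int, array<string, mixed>> in-memory rows used instead of a database */
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function __call(string $name, array $arguments)
    {
        if (strpos($name, 'findOneBy') === 0) {
            // "findOneByRecordId" becomes the property name "recordId".
            $property = lcfirst(substr($name, strlen('findOneBy')));
            $value = $arguments[0] ?? null;

            foreach ($this->rows as $row) {
                if (($row[$property] ?? null) === $value) {
                    return $row;
                }
            }

            return null;
        }

        throw new BadMethodCallException('Unknown method ' . $name);
    }
}

$repository = new MagicRepository([
    ['uid' => 1, 'recordId' => 'oai:example:1'],
]);
$found = $repository->findOneByRecordId('oai:example:1');
```

The annotation changes nothing at runtime; it only tells tools which magic methods exist and what they return.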
        } else if (isset($parameters['location']) && GeneralUtility::isValidUrl($parameters['location'])) {

            $doc = Doc::getInstance($parameters['location'], [], true);

            if ($doc->recordId) {
                $document = $this->findOneByRecordId($doc->recordId);
            }

            if ($document === null) {
                // create new (dummy) Document object
                $document = GeneralUtility::makeInstance(Document::class);
                $document->setLocation($parameters['location']);
            }

        }

        if ($document !== null && $doc === null) {
            $doc = Doc::getInstance($document->getLocation(), [], true);
Issue (Bug): The method getLocation() does not exist on TYPO3\CMS\Extbase\Persistence\QueryResultInterface. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            $doc = Doc::getInstance($document->/** @scrutinizer ignore-call */ getLocation(), [], true);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
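
One way to address this kind of finding without an ignore annotation is to narrow the type before calling the method. The sketch below assumes a hypothetical stand-in Document class and a hypothetical helper locationOf(); it is not taken from the analyzed code.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in for the domain model. */
final class Document
{
    private string $location;

    public function __construct(string $location)
    {
        $this->location = $location;
    }

    public function getLocation(): string
    {
        return $this->location;
    }
}

/**
 * Returns the location only when the value really is a Document.
 *
 * @param mixed $candidate a Document, a query result, or null
 */
function locationOf($candidate): ?string
{
    // instanceof narrows the type, so the call to getLocation() is known to exist.
    if ($candidate instanceof Document) {
        return $candidate->getLocation();
    }

    return null;
}

$location = locationOf(new Document('https://example.org/mets.xml'));
```

The same guard would also keep a value of another type, such as a query result, from being returned where a Document or null is documented.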
        }

        if ($doc !== null) {
            $document->setDoc($doc);
Issue (Bug): The method setDoc() does not exist on TYPO3\CMS\Extbase\Persistence\QueryResultInterface. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            $document->/** @scrutinizer ignore-call */
                       setDoc($doc);
        }

        return $document;
Issue (Bug, Best Practice): The expression return $document could also return the type TYPO3\CMS\Extbase\Persis...y<mixed,object>|integer which is incompatible with the documented return type Kitodo\Dlf\Domain\Model\Document|null.
    }

    /**
     * Find the oldest document
     *
     * @return \Kitodo\Dlf\Domain\Model\Document|null
     */
    public function findOldestDocument()
    {
        $query = $this->createQuery();

        $query->setOrderings(['tstamp' => QueryInterface::ORDER_ASCENDING]);
        $query->setLimit(1);

        return $query->execute()->getFirst();
    }

    /**
     * Find all children of a year anchor that have the given structure
     *
     * @param int $partOf
     * @param \Kitodo\Dlf\Domain\Model\Structure $structure
     *
     * @return array|\TYPO3\CMS\Extbase\Persistence\QueryResultInterface
     */
    public function getChildrenOfYearAnchor($partOf, $structure)
    {
        $query = $this->createQuery();

        // matching() replaces any previously set constraint, so both
        // conditions are combined into a single constraint.
        $query->matching(
            $query->logicalAnd(
                $query->equals('structure', $structure),
                $query->equals('partof', $partOf)
            )
        );

        $query->setOrderings([
            'mets_orderlabel' => QueryInterface::ORDER_ASCENDING
        ]);

        return $query->execute();
    }

    /**
     * Find one document by its uid
     *
     * @param int $uid
     * @param array $settings
     *
     * @return \Kitodo\Dlf\Domain\Model\Document|null
     */
    public function findOneByIdAndSettings($uid, $settings = [])
Issue (Unused Code): The parameter $settings is not used and could be removed. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-unused annotation:

    public function findOneByIdAndSettings($uid, /** @scrutinizer ignore-unused */ $settings = [])

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
    {
        $settings = ['documentSets' => $uid];

        return $this->findDocumentsBySettings($settings)->getFirst();
    }

    /**
     * Finds all documents for the given settings
     *
     * @param array $settings
     *
     * @return array|\TYPO3\CMS\Extbase\Persistence\QueryResultInterface
     */
    public function findDocumentsBySettings($settings = [])
    {
        $query = $this->createQuery();

        $constraints = [];

        // The key is optional; !empty() avoids a notice when it is missing.
        if (!empty($settings['documentSets'])) {
            $constraints[] = $query->in('uid', GeneralUtility::intExplode(',', $settings['documentSets']));
        }

        if (isset($settings['excludeOther']) && (int) $settings['excludeOther'] === 0) {
            $query->getQuerySettings()->setRespectStoragePage(false);
        }

        if (count($constraints)) {
            $query->matching(
                $query->logicalAnd($constraints)
            );
        }

        return $query->execute();
    }

    /**
     * Finds all documents for the given collections
     *
     * @param array $collections
     * @param int $limit
     *
     * @return array|\TYPO3\CMS\Extbase\Persistence\QueryResultInterface
     */
    public function findAllByCollectionsLimited($collections, $limit = 50)
    {
        $query = $this->createQuery();

        // Order by timestamp, newest first.
        $query->setOrderings(
            ['tstamp' => QueryInterface::ORDER_DESCENDING]
        );

        $constraints = [];
        if ($collections) {
Issue (Bug, Best Practice): The expression $collections of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
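
A small sketch of the explicit check recommended above. The helper buildCollectionConstraints() and its string-based constraint are hypothetical and only stand in for the query constraint objects used in the analyzed method.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns a constraint only for a non-empty list of uids.
 *
 * @param array $collections collection uids
 * @return string[] constraint descriptions
 */
function buildCollectionConstraints(array $collections): array
{
    $constraints = [];

    // The explicit !empty() states the intent: skip arrays without elements.
    if (!empty($collections)) {
        $uids = array_map('intval', $collections);
        $constraints[] = 'collections.uid IN (' . implode(',', $uids) . ')';
    }

    return $constraints;
}

$none = buildCollectionConstraints([]);
$some = buildCollectionConstraints([3, '7']);
```

Both forms behave the same for arrays; the explicit one only makes the check easier to read.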
            $constraints[] = $query->in('collections.uid', $collections);
        }

        if (count($constraints)) {
            $query->matching(
                $query->logicalAnd($constraints)
            );
        }

        if ($limit > 0) {
            $query->setLimit((int) $limit);
        }

        return $query->execute();
    }

    /**
     * Count the titles and volumes for statistics
     *
     * Volumes are documents that are either
     *  a) "leaf" elements, i.e. partof != 0, or
     *  b) "root" elements that are not referenced by other documents ("root" elements that have no descendants)
     *
     * @param array $settings
     *
     * @return array
     */
    public function getStatisticsForSelectedCollection($settings)
    {
        if ($settings['collections']) {
            // Include only selected collections.
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');

            $countTitles = $queryBuilder
                ->count('tx_dlf_documents.uid')
                ->from('tx_dlf_documents')
                ->innerJoin(
                    'tx_dlf_documents',
                    'tx_dlf_relations',
                    'tx_dlf_relations_joins',
                    $queryBuilder->expr()->eq(
                        'tx_dlf_relations_joins.uid_local',
                        'tx_dlf_documents.uid'
                    )
                )
                ->innerJoin(
                    'tx_dlf_relations_joins',
                    'tx_dlf_collections',
                    'tx_dlf_collections_join',
                    $queryBuilder->expr()->eq(
                        'tx_dlf_relations_joins.uid_foreign',
                        'tx_dlf_collections_join.uid'
                    )
                )
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_documents.pid', intval($settings['storagePid'])),
                    $queryBuilder->expr()->eq('tx_dlf_collections_join.pid', intval($settings['storagePid'])),
                    $queryBuilder->expr()->eq('tx_dlf_documents.partof', 0),
                    $queryBuilder->expr()->in('tx_dlf_collections_join.uid', $queryBuilder->createNamedParameter(GeneralUtility::intExplode(',', $settings['collections']), Connection::PARAM_INT_ARRAY)),
                    $queryBuilder->expr()->eq('tx_dlf_relations_joins.ident', $queryBuilder->createNamedParameter('docs_colls'))
                )
                ->execute()
                ->fetchColumn(0);

            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');
            $subQueryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');

            $subQuery = $subQueryBuilder
                ->select('tx_dlf_documents.partof')
                ->from('tx_dlf_documents')
                ->where(
                    $subQueryBuilder->expr()->neq('tx_dlf_documents.partof', 0)
                )
                ->groupBy('tx_dlf_documents.partof')
                ->getSQL();

            $countVolumes = $queryBuilder
                ->count('tx_dlf_documents.uid')
                ->from('tx_dlf_documents')
                ->innerJoin(
                    'tx_dlf_documents',
                    'tx_dlf_relations',
                    'tx_dlf_relations_joins',
                    $queryBuilder->expr()->eq(
                        'tx_dlf_relations_joins.uid_local',
                        'tx_dlf_documents.uid'
                    )
                )
                ->innerJoin(
                    'tx_dlf_relations_joins',
                    'tx_dlf_collections',
                    'tx_dlf_collections_join',
                    $queryBuilder->expr()->eq(
                        'tx_dlf_relations_joins.uid_foreign',
                        'tx_dlf_collections_join.uid'
                    )
                )
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_documents.pid', intval($settings['storagePid'])),
                    $queryBuilder->expr()->eq('tx_dlf_collections_join.pid', intval($settings['storagePid'])),
                    $queryBuilder->expr()->notIn('tx_dlf_documents.uid', $subQuery),
                    $queryBuilder->expr()->in('tx_dlf_collections_join.uid', $queryBuilder->createNamedParameter(GeneralUtility::intExplode(',', $settings['collections']), Connection::PARAM_INT_ARRAY)),
                    $queryBuilder->expr()->eq('tx_dlf_relations_joins.ident', $queryBuilder->createNamedParameter('docs_colls'))
                )
                ->execute()
                ->fetchColumn(0);
        } else {
            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');

            // Include all collections.
            $countTitles = $queryBuilder
                ->count('tx_dlf_documents.uid')
                ->from('tx_dlf_documents')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_documents.pid', intval($settings['storagePid'])),
                    $queryBuilder->expr()->eq('tx_dlf_documents.partof', 0),
                    Helper::whereExpression('tx_dlf_documents')
                )
                ->execute()
                ->fetchColumn(0);

            $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');
            $subQueryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
                ->getQueryBuilderForTable('tx_dlf_documents');

            $subQuery = $subQueryBuilder
                ->select('tx_dlf_documents.partof')
                ->from('tx_dlf_documents')
                ->where(
                    $subQueryBuilder->expr()->neq('tx_dlf_documents.partof', 0)
                )
                ->groupBy('tx_dlf_documents.partof')
                ->getSQL();

            $countVolumes = $queryBuilder
                ->count('tx_dlf_documents.uid')
                ->from('tx_dlf_documents')
                ->where(
                    $queryBuilder->expr()->eq('tx_dlf_documents.pid', intval($settings['storagePid'])),
                    $queryBuilder->expr()->notIn('tx_dlf_documents.uid', $subQuery)
                )
                ->execute()
                ->fetchColumn(0);
        }

        return ['titles' => $countTitles, 'volumes' => $countVolumes];
    }

    /**
     * Build table of contents
     *
     * @param int $uid
     * @param int $pid
     * @param array $settings
     *
     * @return \TYPO3\CMS\Extbase\Persistence\QueryResultInterface
     */
    public function getTableOfContentsFromDb($uid, $pid, $settings)
    {
        // Build table of contents from database.
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getQueryBuilderForTable('tx_dlf_documents');

        $excludeOtherWhere = '';
        if ($settings['excludeOther']) {
            $excludeOtherWhere = 'tx_dlf_documents.pid=' . intval($settings['storagePid']);
        }
        // Fetch the child documents together with their structure type.
        $result = $queryBuilder
            ->select(
                'tx_dlf_documents.uid AS uid',
                'tx_dlf_documents.title AS title',
                'tx_dlf_documents.volume AS volume',
                'tx_dlf_documents.mets_label AS mets_label',
                'tx_dlf_documents.mets_orderlabel AS mets_orderlabel',
                'tx_dlf_structures_join.index_name AS type'
            )
            ->innerJoin(
                'tx_dlf_documents',
                'tx_dlf_structures',
                'tx_dlf_structures_join',
                $queryBuilder->expr()->eq(
                    'tx_dlf_structures_join.uid',
                    'tx_dlf_documents.structure'
                )
            )
            ->from('tx_dlf_documents')
            ->where(
                $queryBuilder->expr()->eq('tx_dlf_documents.partof', intval($uid)),
                $queryBuilder->expr()->eq('tx_dlf_structures_join.pid', intval($pid)),
                $excludeOtherWhere
            )
            ->addOrderBy('tx_dlf_documents.volume_sorting')
            ->addOrderBy('tx_dlf_documents.mets_orderlabel')
            ->execute();
        return $result;
    }

    /**
     * Find one document by given settings and identifier
     *
     * @param array $settings
     * @param array $parameters
     *
     * @return array The found document object
     */
    public function getOaiRecord($settings, $parameters)
    {
        $where = '';

        if (!$settings['show_userdefined']) {
            $where .= 'AND tx_dlf_collections.fe_cruser_id=0 ';
        }

        $connection = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getConnectionForTable('tx_dlf_documents');

        $sql = 'SELECT `tx_dlf_documents`.*, GROUP_CONCAT(DISTINCT `tx_dlf_collections`.`oai_name` ORDER BY `tx_dlf_collections`.`oai_name` SEPARATOR " ") AS `collections` ' .
            'FROM `tx_dlf_documents` ' .
            'INNER JOIN `tx_dlf_relations` ON `tx_dlf_relations`.`uid_local` = `tx_dlf_documents`.`uid` ' .
            'INNER JOIN `tx_dlf_collections` ON `tx_dlf_collections`.`uid` = `tx_dlf_relations`.`uid_foreign` ' .
            'WHERE `tx_dlf_documents`.`record_id` = ? ' .
            'AND `tx_dlf_relations`.`ident`="docs_colls" ' .
            $where;

        $values = [
            $parameters['identifier']
        ];

        $types = [
            Connection::PARAM_STR
        ];

        // Create a prepared statement for the passed SQL query, bind the given params with their binding types and execute the query
        $statement = $connection->executeQuery($sql, $values, $types);

        return $statement->fetch();
    }

    /**
     * Finds all documents with the given uids, including their OAI collection names
     *
     * @param array $settings
     * @param array $documentsToProcess
     *
     * @return array The found document objects
     */
    public function getOaiDocumentList($settings, $documentsToProcess)
Issue (Unused Code): The parameter $settings is not used and could be removed. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-unused annotation:

    public function getOaiDocumentList(/** @scrutinizer ignore-unused */ $settings, $documentsToProcess)
    {
        $connection = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getConnectionForTable('tx_dlf_documents');

        $sql = 'SELECT `tx_dlf_documents`.*, GROUP_CONCAT(DISTINCT `tx_dlf_collections`.`oai_name` ORDER BY `tx_dlf_collections`.`oai_name` SEPARATOR " ") AS `collections` ' .
            'FROM `tx_dlf_documents` ' .
            'INNER JOIN `tx_dlf_relations` ON `tx_dlf_relations`.`uid_local` = `tx_dlf_documents`.`uid` ' .
            'INNER JOIN `tx_dlf_collections` ON `tx_dlf_collections`.`uid` = `tx_dlf_relations`.`uid_foreign` ' .
            'WHERE `tx_dlf_documents`.`uid` IN ( ? ) ' .
            'AND `tx_dlf_relations`.`ident`="docs_colls" ' .
            'AND ' . Helper::whereExpression('tx_dlf_collections') . ' ' .
            'GROUP BY `tx_dlf_documents`.`uid` ';

        $values = [
            $documentsToProcess,
        ];

        $types = [
            Connection::PARAM_INT_ARRAY,
        ];

        // Create a prepared statement for the passed SQL query, bind the given params with their binding types and execute the query
        $documents = $connection->executeQuery($sql, $values, $types);

        return $documents;
    }

    /**
     * Finds all documents with given uids
     *
     * @param array $uids
     *
     * @return array
     */
    private function findAllByUids($uids)
    {
        // get all documents from db we are talking about
        $connectionPool = GeneralUtility::makeInstance(ConnectionPool::class);
        $queryBuilder = $connectionPool->getQueryBuilderForTable('tx_dlf_documents');
        // Fetch document info for UIDs in $documentSet from DB
        $kitodoDocuments = $queryBuilder
            ->select(
                'tx_dlf_documents.uid AS uid',
                'tx_dlf_documents.title AS title',
                'tx_dlf_documents.structure AS structure',
                'tx_dlf_documents.thumbnail AS thumbnail',
                'tx_dlf_documents.volume_sorting AS volumeSorting',
                'tx_dlf_documents.mets_orderlabel AS metsOrderlabel',
                'tx_dlf_documents.partof AS partOf'
            )
            ->from('tx_dlf_documents')
            ->where(
                $queryBuilder->expr()->in('tx_dlf_documents.pid', $this->settings['storagePid']),
                $queryBuilder->expr()->in('tx_dlf_documents.uid', $uids)
            )
            ->addOrderBy('tx_dlf_documents.volume_sorting', 'asc')
            ->addOrderBy('tx_dlf_documents.mets_orderlabel', 'asc')
            ->execute();

        $allDocuments = [];
        $documentStructures = Helper::getDocumentStructures($this->settings['storagePid']);
        // Process documents in a usable array structure
        while ($resArray = $kitodoDocuments->fetch()) {
            $resArray['structure'] = $documentStructures[$resArray['structure']];
            $allDocuments[$resArray['uid']] = $resArray;
        }

        return $allDocuments;
    }

    /**
     * Find all documents with given collection from Solr
     *
     * @param \Kitodo\Dlf\Domain\Model\Collection $collection
     * @param array $settings
     * @param array $searchParams
     * @param \TYPO3\CMS\Extbase\Persistence\Generic\QueryResult $listedMetadata
     * @return array
     */
    public function findSolrByCollection($collection, $settings, $searchParams, $listedMetadata = null)
    {
        // set settings global inside this repository
        $this->settings = $settings;

        // Prepare query parameters.
        $params = [];
        $matches = [];
        $fields = Solr::getFields();

        // Set search query.
        if (
            (!empty($searchParams['fulltext']))
            || preg_match('/' . $fields['fulltext'] . ':\((.*)\)/', trim($searchParams['query']), $matches)
        ) {
            // If the query already is a fulltext query e.g using the facets
            $searchParams['query'] = empty($matches[1]) ? $searchParams['query'] : $matches[1];
            // Search in fulltext field if applicable. Query must not be empty!
            if (!empty($searchParams['query'])) {
                $query = $fields['fulltext'] . ':(' . Solr::escapeQuery(trim($searchParams['query'])) . ')';
            }
            $params['fulltext'] = true;
        } else {
            // Retain given search field if valid.
            if (!empty($searchParams['query'])) {
                $query = Solr::escapeQueryKeepField(trim($searchParams['query']), $this->settings['storagePid']);
            }
        }

        // Add extended search query.
        if (
            !empty($searchParams['extQuery'])
            && is_array($searchParams['extQuery'])
        ) {
            $allowedOperators = ['AND', 'OR', 'NOT'];
            $numberOfExtQueries = count($searchParams['extQuery']);
            for ($i = 0; $i < $numberOfExtQueries; $i++) {
                if (!empty($searchParams['extQuery'][$i])) {
                    if (
                        in_array($searchParams['extOperator'][$i], $allowedOperators)
                    ) {
                        if (!empty($query)) {
                            $query .= ' ' . $searchParams['extOperator'][$i] . ' ';
                        }
                        $query .= Indexer::getIndexFieldName($searchParams['extField'][$i], $this->settings['storagePid']) . ':(' . Solr::escapeQuery($searchParams['extQuery'][$i]) . ')';
Issue (Comprehensibility, Best Practice): The variable $query does not seem to be defined for all execution paths leading up to this point.
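
A hedged sketch of one way to resolve this finding: initialise the variable before the first conditional append so that it is defined on every path. The function buildExtendedQuery() is hypothetical, and the sketch leaves out the field-name lookup and the escaping that the analyzed code performs through Indexer and Solr.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper that joins the extended search terms into one query string.
 *
 * @param array $searchParams expects the keys extQuery, extOperator and extField
 */
function buildExtendedQuery(array $searchParams): string
{
    // Defined up front, so every later append works on an existing string.
    $query = '';
    $allowedOperators = ['AND', 'OR', 'NOT'];

    foreach ($searchParams['extQuery'] ?? [] as $i => $term) {
        $operator = $searchParams['extOperator'][$i] ?? '';
        if (empty($term) || !in_array($operator, $allowedOperators, true)) {
            continue;
        }

        // The operator is only needed between two parts of the query.
        if ($query !== '') {
            $query .= ' ' . $operator . ' ';
        }

        $query .= ($searchParams['extField'][$i] ?? '') . ':(' . $term . ')';
    }

    return $query;
}

$extended = buildExtendedQuery([
    'extQuery'    => ['faust', 'goethe'],
    'extOperator' => ['AND', 'OR'],
    'extField'    => ['title', 'author'],
]);
```

With the initialisation in place, the later fallback to a match-all query can test for an empty string instead of relying on empty() to hide an undefined variable.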
                    }
                }
            }
        }

        // Add filter query for faceting.
        if (isset($searchParams['fq']) && is_array($searchParams['fq'])) {
            foreach ($searchParams['fq'] as $filterQuery) {
                $params['filterquery'][]['query'] = $filterQuery;
            }
        }

        // Add filter query for in-document searching.
        if (
            !empty($searchParams['documentId'])
            && MathUtility::canBeInterpretedAsInteger($searchParams['documentId'])
        ) {
            // Search in document and all subordinates (valid for up to three levels of hierarchy).
            $params['filterquery'][]['query'] = '_query_:"{!join from='
                . $fields['uid'] . ' to=' . $fields['partof'] . '}'
                . $fields['uid'] . ':{!join from=' . $fields['uid'] . ' to=' . $fields['partof'] . '}'
                . $fields['uid'] . ':' . $searchParams['documentId'] . '"' . ' OR {!join from='
                . $fields['uid'] . ' to=' . $fields['partof'] . '}'
                . $fields['uid'] . ':' . $searchParams['documentId'] . ' OR '
                . $fields['uid'] . ':' . $searchParams['documentId'];
        }

        // if a collection is given, we prepare the collection query string
        if ($collection) {
Issue: $collection is of type Kitodo\Dlf\Domain\Model\Collection, thus it always evaluates to true.
597
            $collecionsQueryString = $collection->getIndexName();
598
            $params['filterquery'][]['query'] = 'toplevel:true';
599
            $params['filterquery'][]['query'] = 'partof:0';
600
            $params['filterquery'][]['query'] = 'collection_faceting:("' . $collecionsQueryString . '")';
601
        }
602
603
        // Set some query parameters.
604
        $params['query'] = !empty($query) ? $query : '*';
605
        $params['start'] = 0;
606
        $params['rows'] = 10000;
607
608
        // order the results as given or by title as default
609
        if (!empty($searchParams['orderBy'])) {
610
            $querySort = [
611
                $searchParams['orderBy'] => $searchParams['order']
612
            ];
613
        } else {
614
            $querySort = [
615
                'year_sorting' => 'asc',
616
                'title_sorting' => 'asc'
617
            ];
618
        }
619
620
        $params['sort'] = $querySort;
621
        $params['listMetadataRecords'] = [];

        // Restrict the fields to the required ones.
        $params['fields'] = 'uid,id,page,title,thumbnail,partof,toplevel,type';

        if ($listedMetadata) {
            foreach ($listedMetadata as $metadata) {
                if ($metadata->getIndexStored() || $metadata->getIndexIndexed()) {
                    $listMetadataRecord = $metadata->getIndexName() . '_' . ($metadata->getIndexTokenized() ? 't' : 'u') . ($metadata->getIndexStored() ? 's' : 'u') . ($metadata->getIndexIndexed() ? 'i' : 'u');
                    $params['fields'] .= ',' . $listMetadataRecord;
                    $params['listMetadataRecords'][$metadata->getIndexName()] = $listMetadataRecord;
                }
            }
        }

        // Perform search.
        $result = $this->searchSolr($params, true);

        // Initialize values
        $numberOfToplevels = 0;
        $documents = [];

        if ($result['numFound'] > 0) {
            // flat array with uids from Solr search
            $documentSet = array_unique(array_column($result['documents'], 'uid'));

            if (empty($documentSet)) {
                // return nothing found (same keys as the regular return value)
                return ['solrResults' => [], 'numberOfToplevels' => 0, 'documents' => []];
            }

            // get the Extbase document objects for all uids
            $allDocuments = $this->findAllByUids($documentSet);

            foreach ($result['documents'] as $doc) {
                if (empty($documents[$doc['uid']]) && $allDocuments[$doc['uid']]) {
                    $documents[$doc['uid']] = $allDocuments[$doc['uid']];
                }
                if ($documents[$doc['uid']]) {
                    if ($doc['toplevel'] === false) {
                        // this may be a chapter, article, ..., year
                        if ($doc['type'] === 'year') {
                            continue;
                        }
                        if (!empty($doc['page'])) {
                            // it's probably a fulltext or metadata search
                            $searchResult = [];
                            $searchResult['page'] = $doc['page'];
                            $searchResult['thumbnail'] = $doc['thumbnail'];
                            $searchResult['structure'] = $doc['type'];
                            $searchResult['title'] = $doc['title'];
                            foreach ($params['listMetadataRecords'] as $indexName => $solrField) {
                                if (isset($doc['metadata'][$indexName])) {
                                    $documents[$doc['uid']]['metadata'][$indexName] = $doc['metadata'][$indexName];
                                    $searchResult['metadata'][$indexName] = $doc['metadata'][$indexName];
                                }
                            }
                            if ($searchParams['fulltext'] == '1') {
                                $searchResult['snippet'] = $doc['snippet'];
                                $searchResult['highlight'] = $doc['highlight'];
                                $searchResult['highlight_word'] = $searchParams['query'];
                            }
                            $documents[$doc['uid']]['searchResults'][] = $searchResult;
                        }
                    } elseif ($doc['toplevel'] === true) {
                        $numberOfToplevels++;
                        foreach ($params['listMetadataRecords'] as $indexName => $solrField) {
                            if (isset($doc['metadata'][$indexName])) {
                                $documents[$doc['uid']]['metadata'][$indexName] = $doc['metadata'][$indexName];
                            }
                        }
                        if ($searchParams['fulltext'] != '1') {
                            $documents[$doc['uid']]['page'] = 1;
                            // Scrutinizer issue: findByPartof() does not exist on DocumentRepository (it is handled by __call); consider adding a @method annotation.
                            $children = $this->findByPartof($doc['uid']);
                            foreach ($children as $docChild) {
                                // We need only a few fields from the children, but we need them as array.
                                $childDocument = [
                                    'thumbnail' => $docChild->getThumbnail(),
                                    'title' => $docChild->getTitle(),
                                    'structure' => Helper::getIndexNameFromUid($docChild->getStructure(), 'tx_dlf_structures'),
                                    'metsOrderlabel' => $docChild->getMetsOrderlabel(),
                                    'uid' => $docChild->getUid(),
                                    'metadata' => $this->fetchMetadataFromSolr($docChild->getUid(), $listedMetadata)
                                ];
                                $documents[$doc['uid']]['children'][$docChild->getUid()] = $childDocument;
                            }
                        }
                    }
                    if (empty($documents[$doc['uid']]['metadata'])) {
                        $documents[$doc['uid']]['metadata'] = $this->fetchMetadataFromSolr($doc['uid'], $listedMetadata);
                    }
                    // get title of parent if empty
                    if (empty($documents[$doc['uid']]['title']) && ($documents[$doc['uid']]['partOf'] > 0)) {
                        $parentDocument = $this->findByUid($documents[$doc['uid']]['partOf']);
                        if ($parentDocument) {
                            $documents[$doc['uid']]['title'] = '[' . $parentDocument->getTitle() . ']';
                        }
                    }
                }
            }
        }

        return ['solrResults' => $result, 'numberOfToplevels' => $numberOfToplevels, 'documents' => $documents];
    }

    /**
     * Find all listed metadata for the given document
     *
     * @param int $uid the uid of the document
     * @param \TYPO3\CMS\Extbase\Persistence\Generic\QueryResult $listedMetadata
     * @return array
     */
    protected function fetchMetadataFromSolr($uid, $listedMetadata = [])
    {
        // Prepare query parameters.
        $params = [];
        $metadataArray = [];

        // Set some query parameters.
        $params['query'] = 'uid:' . $uid;
        $params['start'] = 0;
        $params['rows'] = 1;
        $params['sort'] = ['score' => 'desc'];
        $params['listMetadataRecords'] = [];

        // Restrict the fields to the required ones.
        $params['fields'] = 'uid,toplevel';

        if ($listedMetadata) {
            foreach ($listedMetadata as $metadata) {
                if ($metadata->getIndexStored() || $metadata->getIndexIndexed()) {
                    $listMetadataRecord = $metadata->getIndexName() . '_' . ($metadata->getIndexTokenized() ? 't' : 'u') . ($metadata->getIndexStored() ? 's' : 'u') . ($metadata->getIndexIndexed() ? 'i' : 'u');
                    $params['fields'] .= ',' . $listMetadataRecord;
                    $params['listMetadataRecords'][$metadata->getIndexName()] = $listMetadataRecord;
                }
            }
        }
        // Set filter query to just get toplevel documents.
        $params['filterquery'][] = ['query' => 'toplevel:true'];

        // Perform search.
        $result = $this->searchSolr($params, true);

        if ($result['numFound'] > 0) {
            // There is only one result found because of toplevel:true.
            if (isset($result['documents'][0]['metadata'])) {
                $metadataArray = $result['documents'][0]['metadata'];
            }
        }
        return $metadataArray;
    }

    /**
     * Processes a search request
     *
     * @access protected
     *
     * @param array $parameters: Additional search parameters
     * @param boolean $enableCache: Enable caching of Solr requests
     *
     * @return array The Apache Solr Documents that were fetched
     */
    protected function searchSolr($parameters = [], $enableCache = true)
    {
        // Set additional query parameters.
        $parameters['start'] = 0;
        // Set query.
        $parameters['query'] = isset($parameters['query']) ? $parameters['query'] : '*';
        $parameters['filterquery'] = isset($parameters['filterquery']) ? $parameters['filterquery'] : [];

        // Perform Solr query.
        // Instantiate search object.
        $solr = Solr::getInstance($this->settings['solrcore']);
        if (!$solr->ready) {
            Helper::log('Apache Solr not available', LOG_SEVERITY_ERROR);
            return [];
        }

        $cacheIdentifier = '';
        $cache = null;
        // Calculate cache identifier.
        if ($enableCache === true) {
            // Scrutinizer issue: print_r($parameters, true) is typed string|true; is it safe to use in concatenation?
            $cacheIdentifier = Helper::digest($solr->core . print_r($parameters, true));
            $cache = GeneralUtility::makeInstance(CacheManager::class)->getCache('tx_dlf_solr');
        }
        $resultSet = [
            'documents' => [],
            'numFound' => 0,
        ];
        if ($enableCache === false || ($entry = $cache->get($cacheIdentifier)) === false) {
            $selectQuery = $solr->service->createSelect($parameters);

            if ($parameters['fulltext'] === true) {
                // get highlighting component and apply settings
                $selectQuery->getHighlighting();
            }

            $solrRequest = $solr->service->createRequest($selectQuery);

            if ($parameters['fulltext'] === true) {
                // If it is a fulltext search, enable highlighting.
                // field for which highlighting is going to be performed,
                // is required if you want to have OCR highlighting
                $solrRequest->addParam('hl.ocr.fl', 'fulltext');
                // return the coordinates of highlighted search as absolute coordinates
                $solrRequest->addParam('hl.ocr.absoluteHighlights', 'on');
                // max amount of snippets for a single page
                $solrRequest->addParam('hl.snippets', 20);
                // we store the fulltext on page level and can disable this option
                $solrRequest->addParam('hl.ocr.trackPages', 'off');
            }

            // Perform search for all documents with the same uid that either match the search or are marked as toplevel.
            $response = $solr->service->executeRequest($solrRequest);
            $result = $solr->service->createResult($selectQuery, $response);

            /** @scrutinizer ignore-call */
            $resultSet['numFound'] = $result->getNumFound();
            $highlighting = [];
            if ($parameters['fulltext'] === true) {
                $data = $result->getData();
                $highlighting = $data['ocrHighlighting'];
            }
            $fields = Solr::getFields();

            foreach ($result as $record) {
                $resultDocument = new ResultDocument($record, $highlighting, $fields);

                $document = [
                    'id' => $resultDocument->getId(),
                    'page' => $resultDocument->getPage(),
                    'snippet' => $resultDocument->getSnippets(),
                    'thumbnail' => $resultDocument->getThumbnail(),
                    'title' => $resultDocument->getTitle(),
                    'toplevel' => $resultDocument->getToplevel(),
                    'type' => $resultDocument->getType(),
                    'uid' => !empty($resultDocument->getUid()) ? $resultDocument->getUid() : $parameters['uid'],
                    'highlight' => $resultDocument->getHighlightsIds(),
                ];
                foreach ($parameters['listMetadataRecords'] as $indexName => $solrField) {
                    if (!empty($record->$solrField)) {
                        $document['metadata'][$indexName] = $record->$solrField;
                    }
                }
                $resultSet['documents'][] = $document;
            }

            // Save value in cache.
            if (!empty($resultSet) && $enableCache === true) {
                $cache->set($cacheIdentifier, $resultSet);
            }
        } else {
            // Return cache hit.
            $resultSet = $entry;
        }
        return $resultSet;
    }

}
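
Both findSolrByCollection() and fetchMetadataFromSolr() build the Solr field name for each listed metadata entry with the same expression. As one possible first step toward the extraction this report recommends, that logic could be moved into a small helper. The following is only a sketch under stated assumptions: the function names solrFieldName() and buildFieldParams() and the plain-array shape of the metadata entries are hypothetical and not part of the repository, which works on metadata objects with getIndexName(), getIndexTokenized(), getIndexStored() and getIndexIndexed().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: build the Solr field name for one metadata entry.
 * The suffix encodes tokenized / stored / indexed as t|u, s|u, i|u.
 */
function solrFieldName(string $indexName, bool $tokenized, bool $stored, bool $indexed): string
{
    return $indexName . '_'
        . ($tokenized ? 't' : 'u')
        . ($stored ? 's' : 'u')
        . ($indexed ? 'i' : 'u');
}

/**
 * Hypothetical helper: extend a comma-separated field list and collect
 * the map of index name => Solr field name.
 *
 * @param string $baseFields  e.g. 'uid,toplevel'
 * @param array  $listedMetadata  entries with keys name, tokenized, stored, indexed (assumed shape)
 * @return array  with keys 'fields' and 'listMetadataRecords'
 */
function buildFieldParams(string $baseFields, array $listedMetadata): array
{
    $params = ['fields' => $baseFields, 'listMetadataRecords' => []];
    foreach ($listedMetadata as $metadata) {
        // Skip entries that are neither stored nor indexed, as both repository methods do.
        if (!$metadata['stored'] && !$metadata['indexed']) {
            continue;
        }
        $field = solrFieldName(
            $metadata['name'],
            $metadata['tokenized'],
            $metadata['stored'],
            $metadata['indexed']
        );
        $params['fields'] .= ',' . $field;
        $params['listMetadataRecords'][$metadata['name']] = $field;
    }
    return $params;
}

// Usage example with made-up metadata entries.
$params = buildFieldParams('uid,toplevel', [
    ['name' => 'title', 'tokenized' => true, 'stored' => true, 'indexed' => true],
    ['name' => 'hidden', 'tokenized' => false, 'stored' => false, 'indexed' => false],
]);
echo $params['fields'], PHP_EOL; // prints "uid,toplevel,title_tsi"
```

With a helper like this, each of the two methods would only need to pass its own base field list, which removes the duplicated loop without changing the resulting query parameters.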