Completed
Pull Request — master (#100)
by Robin
02:31
created

SearchMappingService   C

Complexity

Total Complexity 57

Size/Duplication

Total Lines 477
Duplicated Lines 11.32 %

Coupling/Cohesion

Components 1
Dependencies 5

Importance

Changes 0
Metric Value
wmc 57
lcom 1
cbo 5
dl 54
loc 477
rs 5.04
c 0
b 0
f 0

19 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 5 1
A generateSearchQuery() 0 7 1
A generateSearchQueryParams() 0 35 2
A improveSearchQuerying() 0 5 1
A improveSearchWildcardFilters() 13 13 3
A improveSearchRegexFilters() 13 13 3
A generateSearchQueryContent() 0 19 4
A generateQueryContent() 0 9 2
A generateSearchQueryFromQueryContent() 0 17 3
B generateQueryContentFields() 0 35 8
A generateSearchQueryAccess() 0 22 4
A getExternalFileShares() 0 6 2
A getExternalFilesConditions() 0 26 5
A fieldIsOutLimit() 0 12 3
A generateSearchQueryTags() 0 9 2
B generateSearchSimpleQuery() 28 44 9
A generateSearchHighlighting() 0 14 2
A getDocumentQuery() 0 7 1
A getPartsFields() 0 7 1


Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems, and corresponding solutions are:
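One duplication flagged in this class is the pair improveSearchWildcardFilters() and improveSearchRegexFilters(), which differ only in the getter they call and the clause name they emit. A minimal sketch of extracting the shared loop follows; the function name appendShouldFilters() and the sample filter values are hypothetical and not part of the reviewed code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wraps every entry of every filter group in the given
 * Elasticsearch clause type and appends each group to $arr as one more
 * "should" filter, as both reviewed methods do.
 *
 * @param array[] $filters groups of entries, e.g. [[['title' => 'foo*']]]
 * @param string $clause 'wildcard' or 'regexp'
 * @param array $arr query array, modified in place
 */
function appendShouldFilters(array $filters, string $clause, array &$arr): void {
	foreach ($filters as $filter) {
		$should = [];
		foreach ($filter as $entry) {
			$should[] = [$clause => $entry];
		}

		$arr['bool']['filter'][]['bool']['should'] = $should;
	}
}

// Sample data standing in for getWildcardFilters() and getRegexFilters().
$query = ['bool' => ['filter' => []]];
appendShouldFilters([[['title' => 'report*']]], 'wildcard', $query);
appendShouldFilters([[['title' => 'rep.*']]], 'regexp', $query);
```

With such a helper, each of the two original methods would shrink to a single call that passes its own getter result and clause name.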

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.
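In this class, the largest duplicated span is reported for generateSearchSimpleQuery() (duplication 28 in the method table), where seven if blocks differ only in the clause they emit. The sketch below shows one way to collapse them, under some assumptions: the TYPE_* constants are stand-ins for the ISearchRequestSimpleQuery::COMPARE_TYPE_* constants, and buildSimpleClause() is a hypothetical name.

```php
<?php
declare(strict_types=1);

// Stand-in constants for illustration only; the reviewed code uses the
// ISearchRequestSimpleQuery::COMPARE_TYPE_* constants instead.
const TYPE_KEYWORD = 'keyword';
const TYPE_WILDCARD = 'wildcard';
const TYPE_INT_EQ = 'int_eq';
const TYPE_INT_GTE = 'int_gte';
const TYPE_INT_LTE = 'int_lte';
const TYPE_INT_GT = 'int_gt';
const TYPE_INT_LT = 'int_lt';

/**
 * Hypothetical replacement for the seven near-identical if blocks: builds one
 * Elasticsearch clause from a comparison type, a field and a value.
 * Returns null for an unknown type, so the caller can skip it the way the
 * original if chain does.
 */
function buildSimpleClause(string $type, string $field, $value): ?array {
	// The four range comparisons differ only in the operator key.
	$rangeOperators = [
		TYPE_INT_GTE => 'gte',
		TYPE_INT_LTE => 'lte',
		TYPE_INT_GT  => 'gt',
		TYPE_INT_LT  => 'lt',
	];

	if (isset($rangeOperators[$type])) {
		return ['range' => [$field => [$rangeOperators[$type] => $value]]];
	}

	if ($type === TYPE_WILDCARD) {
		return ['wildcard' => [$field => $value]];
	}

	if ($type === TYPE_KEYWORD || $type === TYPE_INT_EQ) {
		return ['term' => [$field => $value]];
	}

	return null;
}

// Usage: each triple stands in for getType(), getField() and getValues()[0].
$clauses = [];
foreach ([[TYPE_KEYWORD, 'tag', 'draft'], [TYPE_INT_GTE, 'size', 10]] as [$type, $field, $value]) {
	$clause = buildSimpleClause($type, $field, $value);
	if ($clause !== null) {
		$clauses[] = $clause;
	}
}
```

Removing this duplication first would also lower the complexity score of the method, since the branch count drops with it.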

Complex classes like SearchMappingService often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
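As an illustration of Extract Class on this file: getExternalFileShares() and getExternalFilesConditions() share a name prefix and only touch $userStoragesService, so they are a plausible candidate. The sketch below is an assumption-laden example, not the project's design; the UserStorages interface is a stand-in for IUserStoragesService and the class name ExternalFilesConditions is hypothetical.

```php
<?php
declare(strict_types=1);

// Stand-in for the IUserStoragesService dependency of the reviewed class;
// only the one method used there is declared, and its return type is assumed.
interface UserStorages {
	public function getAllStoragesForUser(): array;
}

// Hypothetical extracted class holding the logic of getExternalFileShares()
// and getExternalFilesConditions().
final class ExternalFilesConditions {
	/** @var UserStorages|null */
	private $storages;

	public function __construct(?UserStorages $storages = null) {
		$this->storages = $storages;
	}

	public function build(): array {
		$shares = $this->storages ? $this->storages->getAllStoragesForUser() : [];
		if (empty($shares)) {
			return [];
		}

		$allowed = [];
		foreach ($shares as $share) {
			// A share mounted as root allows every path, so no prefix filter is needed.
			if ($share === '/') {
				$allowed = [];
				break;
			}
			$allowed[] = ['prefix' => ['title' => $share]];
		}

		$conditions = [
			['term' => ['source' => 'files_external']],
			['term' => ['owner' => '']],
		];
		if (!empty($allowed)) {
			$conditions[] = ['bool' => ['should' => $allowed]];
		}

		return $conditions;
	}
}

// Usage with an anonymous stub standing in for the real service.
$stub = new class implements UserStorages {
	public function getAllStoragesForUser(): array {
		return ['/projects', '/archive'];
	}
};
$conditions = (new ExternalFilesConditions($stub))->build();
```

SearchMappingService would then receive this object instead of the storages service and call build() from generateSearchQueryAccess().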

While breaking up the class, it is a good idea to analyze how other classes use SearchMappingService, and based on these observations, apply Extract Interface, too.
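A minimal sketch of Extract Interface, assuming callers only need getDocumentQuery(): the interface name DocumentQueryProvider, the class SimpleMappingService and the function describeDocument() are hypothetical, and the index name is passed in directly rather than read from ConfigService::getElasticIndex().

```php
<?php
declare(strict_types=1);

// Hypothetical interface covering one of the public methods callers use.
interface DocumentQueryProvider {
	public function getDocumentQuery(string $providerId, string $documentId): array;
}

// Simplified stand-in for SearchMappingService.
final class SimpleMappingService implements DocumentQueryProvider {
	/** @var string */
	private $index;

	public function __construct(string $index) {
		$this->index = $index;
	}

	public function getDocumentQuery(string $providerId, string $documentId): array {
		return [
			'index' => $this->index,
			'type'  => 'standard',
			'id'    => $providerId . ':' . $documentId
		];
	}
}

// A caller that type-hints the interface can be handed any implementation,
// including a test double.
function describeDocument(DocumentQueryProvider $provider): string {
	$query = $provider->getDocumentQuery('files', '42');

	return $query['index'] . '/' . $query['id'];
}

$description = describeDocument(new SimpleMappingService('my_index'));
```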

<?php
declare(strict_types=1);


/**
 * FullTextSearch_ElasticSearch - Use Elasticsearch to index the content of your nextcloud
 *
 * This file is licensed under the Affero General Public License version 3 or
 * later. See the COPYING file.
 *
 * @author Maxence Lange <[email protected]>
 * @copyright 2018
 * @license GNU AGPL version 3 or any later version
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as
 * published by the Free Software Foundation, either version 3 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 */


namespace OCA\FullTextSearch_ElasticSearch\Service;


use OCA\FullTextSearch_ElasticSearch\Exceptions\ConfigurationException;
use OCA\FullTextSearch_ElasticSearch\Exceptions\QueryContentGenerationException;
use OCA\FullTextSearch_ElasticSearch\Exceptions\SearchQueryGenerationException;
use OCA\FullTextSearch_ElasticSearch\Model\QueryContent;
use OCP\FullTextSearch\Model\IDocumentAccess;
use OCP\FullTextSearch\Model\ISearchRequest;
use OCP\FullTextSearch\Model\ISearchRequestSimpleQuery;
use stdClass;

/**
 * Class SearchMappingService
 *
 * @package OCA\FullTextSearch_ElasticSearch\Service
 */
class SearchMappingService {

	/** @var ConfigService */
	private $configService;

	/** @var MiscService */
	private $miscService;

	/** @var IUserStoragesService */
	private $userStoragesService;

	/**
	 * SearchMappingService constructor.
	 *
	 * @param ConfigService $configService
	 * @param MiscService $miscService
	 * @param null|IUserStoragesService $userStoragesService
	 */
	public function __construct(ConfigService $configService, MiscService $miscService, IUserStoragesService $userStoragesService = null) {
		$this->configService = $configService;
		$this->miscService = $miscService;
		$this->userStoragesService = $userStoragesService;
	}


	/**
	 * @param ISearchRequest $request
	 * @param IDocumentAccess $access
	 * @param string $providerId
	 *
	 * @return array
	 * @throws ConfigurationException
	 * @throws SearchQueryGenerationException
	 */
	public function generateSearchQuery(
		ISearchRequest $request, IDocumentAccess $access, string $providerId
	): array {
		$query['params'] = $this->generateSearchQueryParams($request, $access, $providerId);
Issue: Coding Style, Comprehensibility
$query was never initialized. Although PHP does not strictly require it, it is good practice to add $query = array(); before this line.

Adding an explicit array definition is generally preferable to implicit array definition as it guarantees a stable state of the code.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is implicitly created the first time the foreach loop is entered. The value of the bar key is only written conditionally, so its value might carry over from a previous iteration.

This might or might not be intended. To make your intention clear, make your code more readable, and avoid accidental bugs, we recommend adding an explicit initialization $myArray = array() either outside or inside the foreach loop.
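A minimal runnable sketch of the explicit initialization, placed inside the loop: the sample $collection and the $results accumulator are hypothetical, and plain arrays stand in for the objects with getFoo() and getBar() used in the example.

```php
<?php
declare(strict_types=1);

// Hypothetical input: only the first item carries a 'bar' value.
$collection = [
	['foo' => 'a', 'bar' => 'x'],
	['foo' => 'b'],
];

$results = [];
foreach ($collection as $item) {
	// Reset on every iteration so 'bar' cannot leak from the previous item.
	$myArray = [];
	$myArray['foo'] = $item['foo'];

	if (isset($item['bar'])) {
		$myArray['bar'] = $item['bar'];
	}

	$results[] = $myArray;
}
```

Without the reset, the second entry would still carry the 'bar' value written for the first one. For the flagged line in generateSearchQuery(), the equivalent change is a single $query = []; before the assignment.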

		return $query;
	}


	/**
	 * @param ISearchRequest $request
	 * @param IDocumentAccess $access
	 * @param string $providerId
	 *
	 * @return array
	 * @throws ConfigurationException
	 * @throws SearchQueryGenerationException
	 */
	public function generateSearchQueryParams(
		ISearchRequest $request, IDocumentAccess $access, string $providerId
	): array {
		$params = [
			'index' => $this->configService->getElasticIndex(),
			'type'  => 'standard',
			'size'  => $request->getSize(),
			'from'  => (($request->getPage() - 1) * $request->getSize())
		];

		$bool = [];
		if ($request->getSearch() !== '') {
			$bool['must']['bool']['should'] = $this->generateSearchQueryContent($request);
		}

		$bool['filter'][]['bool']['must'] = ['term' => ['provider' => $providerId]];
		$bool['filter'][]['bool']['should'] = $this->generateSearchQueryAccess($access);
		$bool['filter'][]['bool']['should'] =
			$this->generateSearchQueryTags('metatags', $request->getMetaTags());

		$bool['filter'][]['bool']['must'] =
			$this->generateSearchQueryTags('subtags', $request->getSubTags(true));

		$bool['filter'][]['bool']['must'] =
			$this->generateSearchSimpleQuery($request->getSimpleQueries());

//		$bool['filter'][]['bool']['should'] = $this->generateSearchQueryTags($request->getTags());

		$params['body']['query']['bool'] = $bool;
		$params['body']['highlight'] = $this->generateSearchHighlighting($request);

		$this->improveSearchQuerying($request, $params['body']['query']);

		return $params;
	}


	/**
	 * @param ISearchRequest $request
	 * @param array $arr
	 */
	private function improveSearchQuerying(ISearchRequest $request, array &$arr) {
//		$this->improveSearchWildcardQueries($request, $arr);
		$this->improveSearchWildcardFilters($request, $arr);
		$this->improveSearchRegexFilters($request, $arr);
	}


//	/**
//	 * @param SearchRequest $request
//	 * @param array $arr
//	 */
//	private function improveSearchWildcardQueries(SearchRequest $request, &$arr) {
//
//		$queries = $request->getWildcardQueries();
//		foreach ($queries as $query) {
//			$wildcards = [];
//			foreach ($query as $entry) {
//				$wildcards[] = ['wildcard' => $entry];
//			}
//
//			array_push($arr['bool']['must']['bool']['should'], $wildcards);
//		}
//
//	}


	/**
	 * @param ISearchRequest $request
	 * @param array $arr
	 */
	private function improveSearchWildcardFilters(ISearchRequest $request, array &$arr) {
Issue: Duplication
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

		$filters = $request->getWildcardFilters();
		foreach ($filters as $filter) {
			$wildcards = [];
			foreach ($filter as $entry) {
				$wildcards[] = ['wildcard' => $entry];
			}

			$arr['bool']['filter'][]['bool']['should'] = $wildcards;
		}

	}


	/**
	 * @param ISearchRequest $request
	 * @param array $arr
	 */
	private function improveSearchRegexFilters(ISearchRequest $request, array &$arr) {
Issue: Duplication
This method seems to be duplicated in your project.

		$filters = $request->getRegexFilters();
		foreach ($filters as $filter) {
			$regex = [];
			foreach ($filter as $entry) {
				$regex[] = ['regexp' => $entry];
			}

			$arr['bool']['filter'][]['bool']['should'] = $regex;
		}

	}


	/**
	 * @param ISearchRequest $request
	 *
	 * @return array
	 * @throws SearchQueryGenerationException
	 */
	private function generateSearchQueryContent(ISearchRequest $request): array {
		$str = strtolower($request->getSearch());

		preg_match_all('/[^?]"(?:\\\\.|[^\\\\"])*"|\S+/', " $str ", $words);
		$queryContent = [];
		foreach ($words[0] as $word) {
			try {
				$queryContent[] = $this->generateQueryContent(trim($word));
			} catch (QueryContentGenerationException $e) {
				continue;
			}
		}

		if (sizeof($queryContent) === 0) {
			throw new SearchQueryGenerationException();
		}

		return $this->generateSearchQueryFromQueryContent($request, $queryContent);
	}


	/**
	 * @param string $word
	 *
	 * @return QueryContent
	 * @throws QueryContentGenerationException
	 */
	private function generateQueryContent(string $word): QueryContent {

		$searchQueryContent = new QueryContent($word);
		if (strlen($searchQueryContent->getWord()) === 0) {
			throw new QueryContentGenerationException();
		}

		return $searchQueryContent;
	}


	/**
	 * @param ISearchRequest $request
	 * @param QueryContent[] $queryContents
	 *
	 * @return array
	 */
	private function generateSearchQueryFromQueryContent(
		ISearchRequest $request, array $queryContents
	): array {

		$query = $queryWords = [];
		foreach ($queryContents as $queryContent) {
			$queryWords[$queryContent->getShould()][] =
				$this->generateQueryContentFields($request, $queryContent);
		}

		$listShould = array_keys($queryWords);
		foreach ($listShould as $itemShould) {
			$query[$itemShould][] = $queryWords[$itemShould];
		}

		return ['bool' => $query];
	}


	/**
	 * @param ISearchRequest $request
	 * @param QueryContent $content
	 *
	 * @return array
	 */
	private function generateQueryContentFields(ISearchRequest $request, QueryContent $content
	): array {
		$queryFields = [];

		$fields = array_merge(['content', 'title'], $request->getFields());
		foreach ($fields as $field) {
			if (!$this->fieldIsOutLimit($request, $field)) {
				$queryFields[] = [$content->getMatch() => [$field => $content->getWord()]];
			}
		}

		foreach ($request->getWildcardFields() as $field) {
			if (!$this->fieldIsOutLimit($request, $field)) {
				$queryFields[] = ['wildcard' => [$field => '*' . $content->getWord() . '*']];
			}
		}

		$parts = [];
		foreach ($this->getPartsFields($request) as $field) {
			if (!$this->fieldIsOutLimit($request, $field)) {
				$parts[] = $field;
			}
		}

		if (sizeof($parts) > 0) {
			$queryFields[] = [
				'query_string' => [
					'fields' => $parts,
					'query'  => $content->getWord()
				]
			];
		}

		return ['bool' => ['should' => $queryFields]];
	}


	/**
	 * @param IDocumentAccess $access
	 *
	 * @return array
	 */
	private function generateSearchQueryAccess(IDocumentAccess $access): array {

		$query = [];
		$query[] = ['term' => ['owner' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => '__all']];

		foreach ($access->getGroups() as $group) {
			$query[] = ['term' => ['groups' => $group]];
		}

		foreach ($access->getCircles() as $circle) {
			$query[] = ['term' => ['circles' => $circle]];
		}

		$externalFilesConditions = $this->getExternalFilesConditions();
		if (!empty($externalFilesConditions)) {
			$query[] = ['bool' => ['must' => $externalFilesConditions]];
		}

		return $query;
	}

	/**
	 * @return array
	 */
	private function getExternalFileShares() : array {
		if (!$this->userStoragesService) {
			return [];
		}
		return $this->userStoragesService->getAllStoragesForUser();
	}

	/**
	 * Generates condition array for external files
	 * @return array
	 */
	private function getExternalFilesConditions(): array {
		// TODO :: normally we should check if user want's to search
		// external files with "$request->getOption('files_external', '1') === '1'"
		$externalFileShares = $this->getExternalFileShares();
		if (empty($externalFileShares)) {
			return [];
		}
		$allowedExternalShares = [];
		foreach ($externalFileShares as $fileShare) {
			// If any external share is mounted as root, every
			// path is allowed
			if ($fileShare === '/') {
				$allowedExternalShares = [];
				break;
			}
			$allowedExternalShares[] = ['prefix' => ['title' => $fileShare]];
		}
		$externalFilesConditions = [];
		$externalFilesConditions[] = ['term' => ['source' => 'files_external']];
		$externalFilesConditions[] = ['term' => ['owner' => '']];
		if (!empty($allowedExternalShares)) {
			$externalFilesConditions[] = ['bool' => ['should' => $allowedExternalShares]];
		}

		return $externalFilesConditions;
	}

	/**
	 * @param ISearchRequest $request
	 * @param string $field
	 *
	 * @return bool
	 */
	private function fieldIsOutLimit(ISearchRequest $request, string $field): bool {
		$limit = $request->getLimitFields();
		if (sizeof($limit) === 0) {
			return false;
		}

		if (in_array($field, $limit)) {
Issue: Unused Code
This if statement and the following return statement can be replaced with return !in_array($field, $limit);.
			return false;
		}

		return true;
	}


	/**
	 * @param string $k
	 * @param array $tags
	 *
	 * @return array
	 */
	private function generateSearchQueryTags(string $k, array $tags): array {

		$query = [];
		foreach ($tags as $t) {
			$query[] = ['term' => [$k => $t]];
		}

		return $query;
	}


	/**
	 * @param ISearchRequestSimpleQuery[] $queries
	 *
	 * @return array
	 */
	private function generateSearchSimpleQuery(array $queries): array {
		$simpleQuery = [];
		foreach ($queries as $query) {
			// TODO: manage multiple entries array

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_KEYWORD) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['term' => [$query->getField() => $value]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_WILDCARD) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['wildcard' => [$query->getField() => $value]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_EQ) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['term' => [$query->getField() => $value]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_GTE) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['gte' => $value]]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_LTE) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['lte' => $value]]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_GT) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['gt' => $value]]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_LT) {
Issue: Duplication
This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['lt' => $value]]];
			}

		}

		return $simpleQuery;
	}


	/**
	 * @param ISearchRequest $request
	 *
	 * @return array
	 */
	private function generateSearchHighlighting(ISearchRequest $request): array {

		$parts = $this->getPartsFields($request);
		$fields = ['content' => new stdClass()];
		foreach ($parts as $part) {
			$fields[$part] = new stdClass();
		}

		return [
			'fields'    => $fields,
			'pre_tags'  => [''],
			'post_tags' => ['']
		];
	}


	/**
	 * @param string $providerId
	 * @param string $documentId
	 *
	 * @return array
	 * @throws ConfigurationException
	 */
	public function getDocumentQuery(string $providerId, string $documentId): array {
		return [
			'index' => $this->configService->getElasticIndex(),
			'type'  => 'standard',
			'id'    => $providerId . ':' . $documentId
		];
	}


	/**
	 * @param ISearchRequest $request
	 *
	 * @return array
	 */
	private function getPartsFields(ISearchRequest $request) {
		return array_map(
			function($value) {
				return 'parts.' . $value;
			}, $request->getParts()
		);
	}

}