Completed
Pull Request — master (#100)
by Robin
01:27
created

SearchMappingService::generateSearchHighlighting()   A

Complexity
    Conditions: 2
    Paths: 2

Size
    Total Lines: 14

Duplication
    Lines: 0
    Ratio: 0 %

Importance
    Changes: 0

Metric  Value
dl      0
loc     14
rs      9.7998
c       0
b       0
f       0
cc      2
nc      2
nop     1
<?php
declare(strict_types=1);


/**
 * FullTextSearch_ElasticSearch - Use Elasticsearch to index the content of your nextcloud
 *
 * This file is licensed under the Affero General Public License version 3 or
 * later. See the COPYING file.
 *
 * @author Maxence Lange <[email protected]>
 * @copyright 2018
 * @license GNU AGPL version 3 or any later version
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as
 * published by the Free Software Foundation, either version 3 of the
 * License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 */


namespace OCA\FullTextSearch_ElasticSearch\Service;


use OCA\FullTextSearch_ElasticSearch\Exceptions\ConfigurationException;
use OCA\FullTextSearch_ElasticSearch\Exceptions\QueryContentGenerationException;
use OCA\FullTextSearch_ElasticSearch\Exceptions\SearchQueryGenerationException;
use OCA\FullTextSearch_ElasticSearch\Model\QueryContent;
use OCP\FullTextSearch\Model\IDocumentAccess;
use OCP\FullTextSearch\Model\ISearchRequest;
use OCP\FullTextSearch\Model\ISearchRequestSimpleQuery;
use stdClass;

/**
 * Class SearchMappingService
 *
 * @package OCA\FullTextSearch_ElasticSearch\Service
 */
class SearchMappingService {

	/** @var ConfigService */
	private $configService;

	/** @var MiscService */
	private $miscService;

	/** @var IUserStoragesService */
	private $userStoragesService;

	/**
	 * SearchMappingService constructor.
	 *
	 * @param ConfigService $configService
	 * @param MiscService $miscService
	 * @param IUserStoragesService $userStoragesService
Documentation: Should the type for parameter $userStoragesService not be null|IUserStoragesService?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.
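
A minimal sketch of the nullable declaration this check is hinting at. The interface below is a stand-in declared only to keep the example self-contained (its single method mirrors the getAllStoragesForUser() call used later in this class), and ExampleService is a hypothetical class, not part of the reviewed code.

```php
<?php
declare(strict_types=1);

// Stand-in for the real interface, declared here only for this example.
interface IUserStoragesService {
	public function getAllStoragesForUser(): array;
}

// Hypothetical class used to illustrate the annotation and type hint.
class ExampleService {

	/** @var IUserStoragesService|null */
	private $userStoragesService;

	/**
	 * @param IUserStoragesService|null $userStoragesService
	 */
	public function __construct(?IUserStoragesService $userStoragesService = null) {
		$this->userStoragesService = $userStoragesService;
	}

	// True only when an implementation was actually injected.
	public function hasUserStorages(): bool {
		return $this->userStoragesService !== null;
	}
}
```

Declaring the parameter as ?IUserStoragesService and documenting it as IUserStoragesService|null keeps the docblock and the signature in agreement about the optional dependency.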

	 */
	public function __construct(ConfigService $configService, MiscService $miscService, IUserStoragesService $userStoragesService = null) {
		$this->configService = $configService;
		$this->miscService = $miscService;
		$this->userStoragesService = $userStoragesService;
	}


	/**
	 * @param ISearchRequest $request
	 * @param IDocumentAccess $access
	 * @param string $providerId
	 *
	 * @return array
	 * @throws ConfigurationException
	 * @throws SearchQueryGenerationException
	 */
	public function generateSearchQuery(
		ISearchRequest $request, IDocumentAccess $access, string $providerId
	): array {
		$query['params'] = $this->generateSearchQueryParams($request, $access, $providerId);
Coding Style, Comprehensibility: $query was never initialized. Although not strictly required by PHP, it is generally a good practice to add $query = array(); before regardless.

Adding an explicit array definition is generally preferable to implicit array definition as it guarantees a stable state of the code.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is initialized the first time when the foreach loop is entered. You can also see that the value of the bar key is only written conditionally; thus, its value might result from a previous iteration.

This might or might not be intended. To make your intention clear, your code more readable, and to avoid accidental bugs, we recommend adding an explicit initialization $myArray = array() either outside or inside the foreach loop.
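
The stale-value effect described above can be reproduced in a few lines. This is only an illustration: the function names and the sample data are made up and do not come from the reviewed project.

```php
<?php
declare(strict_types=1);

// Implicit definition: $row is created on the first write and survives across iterations.
function collectWithoutInit(array $items): array {
	$out = [];
	foreach ($items as $item) {
		$row['foo'] = $item['foo'];
		if (isset($item['bar'])) {
			$row['bar'] = $item['bar'];
		}
		$out[] = $row;
	}

	return $out;
}

// Explicit definition: $row starts empty on every iteration.
function collectWithInit(array $items): array {
	$out = [];
	foreach ($items as $item) {
		$row = [];
		$row['foo'] = $item['foo'];
		if (isset($item['bar'])) {
			$row['bar'] = $item['bar'];
		}
		$out[] = $row;
	}

	return $out;
}

// The second item has no 'bar' key.
$items = [['foo' => 1, 'bar' => 'x'], ['foo' => 2]];
$leaky = collectWithoutInit($items);
$clean = collectWithInit($items);
```

In the first variant the second row still carries the 'bar' value of the first item; in the second variant it does not.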

		return $query;
	}


	/**
	 * @param ISearchRequest $request
	 * @param IDocumentAccess $access
	 * @param string $providerId
	 *
	 * @return array
	 * @throws ConfigurationException
	 * @throws SearchQueryGenerationException
	 */
	public function generateSearchQueryParams(
		ISearchRequest $request, IDocumentAccess $access, string $providerId
	): array {
		$params = [
			'index' => $this->configService->getElasticIndex(),
			'type'  => 'standard',
			'size'  => $request->getSize(),
			'from'  => (($request->getPage() - 1) * $request->getSize())
		];

		$bool = [];
		if ($request->getSearch() !== '') {
			$bool['must']['bool']['should'] = $this->generateSearchQueryContent($request);
		}

		$bool['filter'][]['bool']['must'] = ['term' => ['provider' => $providerId]];
		$bool['filter'][]['bool']['should'] = $this->generateSearchQueryAccess($access);
		$bool['filter'][]['bool']['should'] =
			$this->generateSearchQueryTags('metatags', $request->getMetaTags());

		$bool['filter'][]['bool']['must'] =
			$this->generateSearchQueryTags('subtags', $request->getSubTags(true));

		$bool['filter'][]['bool']['must'] =
			$this->generateSearchSimpleQuery($request->getSimpleQueries());

//		$bool['filter'][]['bool']['should'] = $this->generateSearchQueryTags($request->getTags());

		$params['body']['query']['bool'] = $bool;
		$params['body']['highlight'] = $this->generateSearchHighlighting($request);

		$this->improveSearchQuerying($request, $params['body']['query']);

		return $params;
	}


	/**
	 * @param ISearchRequest $request
	 * @param array $arr
	 */
	private function improveSearchQuerying(ISearchRequest $request, array &$arr) {
//		$this->improveSearchWildcardQueries($request, $arr);
		$this->improveSearchWildcardFilters($request, $arr);
		$this->improveSearchRegexFilters($request, $arr);
	}


//	/**
//	 * @param SearchRequest $request
//	 * @param array $arr
//	 */
//	private function improveSearchWildcardQueries(SearchRequest $request, &$arr) {
//
//		$queries = $request->getWildcardQueries();
//		foreach ($queries as $query) {
//			$wildcards = [];
//			foreach ($query as $entry) {
//				$wildcards[] = ['wildcard' => $entry];
//			}
//
//			array_push($arr['bool']['must']['bool']['should'], $wildcards);
//		}
//
//	}


	/**
	 * @param ISearchRequest $request
	 * @param array $arr
	 */
	private function improveSearchWildcardFilters(ISearchRequest $request, array &$arr) {
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
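
One way the extraction could look for the two filter methods flagged here, as a sketch only. The helper name appendShouldFilters and its signature are hypothetical; the nested array layout is copied from improveSearchWildcardFilters() and improveSearchRegexFilters(), and the sample filter values are invented.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical shared helper: wraps every entry of every filter in the given
 * clause type and appends one "should" group per filter to the query array.
 */
function appendShouldFilters(array &$arr, array $filters, string $clause): void {
	foreach ($filters as $filter) {
		$group = [];
		foreach ($filter as $entry) {
			$group[] = [$clause => $entry];
		}

		$arr['bool']['filter'][]['bool']['should'] = $group;
	}
}

// Usage with invented sample filters: one wildcard filter, one regexp filter.
$query = [];
appendShouldFilters($query, [[['title' => '*.pdf']]], 'wildcard');
appendShouldFilters($query, [[['title' => 'report_[0-9]+']]], 'regexp');
```

With such a helper, each of the two existing methods would reduce to a single call that passes its own filter list and clause name.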

		$filters = $request->getWildcardFilters();
		foreach ($filters as $filter) {
			$wildcards = [];
			foreach ($filter as $entry) {
				$wildcards[] = ['wildcard' => $entry];
			}

			$arr['bool']['filter'][]['bool']['should'] = $wildcards;
		}

	}


	/**
	 * @param ISearchRequest $request
	 * @param array $arr
	 */
	private function improveSearchRegexFilters(ISearchRequest $request, array &$arr) {
Duplication: This method seems to be duplicated in your project.

		$filters = $request->getRegexFilters();
		foreach ($filters as $filter) {
			$regex = [];
			foreach ($filter as $entry) {
				$regex[] = ['regexp' => $entry];
			}

			$arr['bool']['filter'][]['bool']['should'] = $regex;
		}

	}


	/**
	 * @param ISearchRequest $request
	 *
	 * @return array
	 * @throws SearchQueryGenerationException
	 */
	private function generateSearchQueryContent(ISearchRequest $request): array {
		$str = strtolower($request->getSearch());

		preg_match_all('/[^?]"(?:\\\\.|[^\\\\"])*"|\S+/', " $str ", $words);
		$queryContent = [];
		foreach ($words[0] as $word) {
			try {
				$queryContent[] = $this->generateQueryContent(trim($word));
			} catch (QueryContentGenerationException $e) {
				continue;
			}
		}

		if (sizeof($queryContent) === 0) {
			throw new SearchQueryGenerationException();
		}

		return $this->generateSearchQueryFromQueryContent($request, $queryContent);
	}


	/**
	 * @param string $word
	 *
	 * @return QueryContent
	 * @throws QueryContentGenerationException
	 */
	private function generateQueryContent(string $word): QueryContent {

		$searchQueryContent = new QueryContent($word);
		if (strlen($searchQueryContent->getWord()) === 0) {
			throw new QueryContentGenerationException();
		}

		return $searchQueryContent;
	}


	/**
	 * @param ISearchRequest $request
	 * @param QueryContent[] $queryContents
	 *
	 * @return array
	 */
	private function generateSearchQueryFromQueryContent(
		ISearchRequest $request, array $queryContents
	): array {

		$query = $queryWords = [];
		foreach ($queryContents as $queryContent) {
			$queryWords[$queryContent->getShould()][] =
				$this->generateQueryContentFields($request, $queryContent);
		}

		$listShould = array_keys($queryWords);
		foreach ($listShould as $itemShould) {
			$query[$itemShould][] = $queryWords[$itemShould];
		}

		return ['bool' => $query];
	}


	/**
	 * @param ISearchRequest $request
	 * @param QueryContent $content
	 *
	 * @return array
	 */
	private function generateQueryContentFields(ISearchRequest $request, QueryContent $content
	): array {
		$queryFields = [];

		$fields = array_merge(['content', 'title'], $request->getFields());
		foreach ($fields as $field) {
			if (!$this->fieldIsOutLimit($request, $field)) {
				$queryFields[] = [$content->getMatch() => [$field => $content->getWord()]];
			}
		}

		foreach ($request->getWildcardFields() as $field) {
			if (!$this->fieldIsOutLimit($request, $field)) {
				$queryFields[] = ['wildcard' => [$field => '*' . $content->getWord() . '*']];
			}
		}

		$parts = [];
		foreach ($this->getPartsFields($request) as $field) {
			if (!$this->fieldIsOutLimit($request, $field)) {
				$parts[] = $field;
			}
		}

		if (sizeof($parts) > 0) {
			$queryFields[] = [
				'query_string' => [
					'fields' => $parts,
					'query'  => $content->getWord()
				]
			];
		}

		return ['bool' => ['should' => $queryFields]];
	}


	/**
	 * @param IDocumentAccess $access
	 *
	 * @return array
	 */
	private function generateSearchQueryAccess(IDocumentAccess $access): array {

		$query = [];
		$query[] = ['term' => ['owner' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => '__all']];

		foreach ($access->getGroups() as $group) {
			$query[] = ['term' => ['groups' => $group]];
		}

		foreach ($access->getCircles() as $circle) {
			$query[] = ['term' => ['circles' => $circle]];
		}

		// TODO :: normally we should check if the user wants to search
		// external files with "$request->getOption('files_external', '1') === '1'"
		$externalFileShares = $this->getExternalFileShares();

		if ($externalFileShares){
Bug, Best Practice: The expression $externalFileShares of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
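
A small sketch of the explicit form, with a hypothetical helper name and invented share paths. It also shows the "equal but not identical" point made above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: the intent "the array has at least one element" is spelled out.
function describeShares(array $shares): string {
	if (!empty($shares)) {
		return count($shares) . ' external share(s)';
	}

	return 'no external shares';
}

// An empty array compares equal to false, but is not identical to it.
$looselyEqual = ([] == false);
$identical = ([] === false);
```

Both spellings take the same branch for an array; the explicit one only makes the emptiness check visible to the reader.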

			$allowedExternalShares = [];
			foreach($externalFileShares as $fileShare){
				$allowedExternalShares[] = ['prefix' => ['title' => $fileShare]];
			}
			$externalFilesConditions = [];
			$externalFilesConditions[] = ['term' => ['source' => 'files_external']];
			$externalFilesConditions[] = ['term' => ['owner' => '']];
			$externalFilesConditions[] = ['bool' => ['should' => $allowedExternalShares]];
			$query[] = ['bool' => ['must' => $externalFilesConditions]];
		}

		return $query;
	}

	/**
	 * @return array
	 */
	private function getExternalFileShares() : array {
		if (!$this->userStoragesService) {
			return [];
		}
		return $this->userStoragesService->getAllStoragesForUser();
	}


	/**
	 * @param ISearchRequest $request
	 * @param string $field
	 *
	 * @return bool
	 */
	private function fieldIsOutLimit(ISearchRequest $request, string $field): bool {
		$limit = $request->getLimitFields();
		if (sizeof($limit) === 0) {
			return false;
		}

		if (in_array($field, $limit)) {
Unused Code: This if statement, and the following return statement can be replaced with return !in_array($field, $limit);.
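
A sketch of the simplified form as a standalone function. The name isFieldOutOfLimit and the plain-array parameter are assumptions made so the example runs on its own; the reviewed method reads the limit list from the request object instead.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone variant of fieldIsOutLimit().
function isFieldOutOfLimit(array $limit, string $field): bool {
	// No limit configured: every field is allowed.
	if (count($limit) === 0) {
		return false;
	}

	// A field is "out of limit" exactly when it is not listed.
	return !in_array($field, $limit);
}
```

The early return for an empty limit list has to stay, because in_array() on an empty array is always false and would otherwise mark every field as out of limit.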
			return false;
		}

		return true;
	}


	/**
	 * @param string $k
	 * @param array $tags
	 *
	 * @return array
	 */
	private function generateSearchQueryTags(string $k, array $tags): array {

		$query = [];
		foreach ($tags as $t) {
			$query[] = ['term' => [$k => $t]];
		}

		return $query;
	}


	/**
	 * @param ISearchRequestSimpleQuery[] $queries
	 *
	 * @return array
	 */
	private function generateSearchSimpleQuery(array $queries): array {
		$simpleQuery = [];
		foreach ($queries as $query) {
			// TODO: manage multiple entries array

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_KEYWORD) {
Duplication: This code seems to be duplicated across your project.
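
For the repeated if blocks in this method, a lookup-based helper is one possible way to remove the repetition. The sketch below is hedged: buildSimpleClause is a hypothetical function, and the TYPE_* constants are placeholders standing in for the ISearchRequestSimpleQuery::COMPARE_TYPE_* constants, whose real values are not shown on this page.

```php
<?php
declare(strict_types=1);

// Placeholder identifiers; the reviewed code uses
// ISearchRequestSimpleQuery::COMPARE_TYPE_* instead.
const TYPE_KEYWORD = 'keyword';
const TYPE_WILDCARD = 'wildcard';
const TYPE_INT_EQ = 'int_eq';
const TYPE_INT_GTE = 'int_gte';
const TYPE_INT_LTE = 'int_lte';
const TYPE_INT_GT = 'int_gt';
const TYPE_INT_LT = 'int_lt';

/**
 * Hypothetical helper: maps one compare type to its query clause.
 * Returns null for a type it does not know.
 */
function buildSimpleClause(string $type, string $field, $value): ?array {
	$rangeOperators = [
		TYPE_INT_GTE => 'gte',
		TYPE_INT_LTE => 'lte',
		TYPE_INT_GT  => 'gt',
		TYPE_INT_LT  => 'lt',
	];

	if ($type === TYPE_KEYWORD || $type === TYPE_INT_EQ) {
		return ['term' => [$field => $value]];
	}

	if ($type === TYPE_WILDCARD) {
		return ['wildcard' => [$field => $value]];
	}

	if (isset($rangeOperators[$type])) {
		return ['range' => [$field => [$rangeOperators[$type] => $value]]];
	}

	return null;
}
```

The loop in generateSearchSimpleQuery() could then read the first value once and append the helper's result whenever it is not null.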
				$value = $query->getValues()[0];
				$simpleQuery[] = ['term' => [$query->getField() => $value]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_WILDCARD) {
Duplication: This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['wildcard' => [$query->getField() => $value]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_EQ) {
Duplication: This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['term' => [$query->getField() => $value]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_GTE) {
Duplication: This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['gte' => $value]]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_LTE) {
Duplication: This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['lte' => $value]]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_GT) {
Duplication: This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['gt' => $value]]];
			}

			if ($query->getType() === ISearchRequestSimpleQuery::COMPARE_TYPE_INT_LT) {
Duplication: This code seems to be duplicated across your project.
				$value = $query->getValues()[0];
				$simpleQuery[] = ['range' => [$query->getField() => ['lt' => $value]]];
			}

		}

		return $simpleQuery;
	}


	/**
	 * @param ISearchRequest $request
	 *
	 * @return array
	 */
	private function generateSearchHighlighting(ISearchRequest $request): array {

		$parts = $this->getPartsFields($request);
		$fields = ['content' => new stdClass()];
		foreach ($parts as $part) {
			$fields[$part] = new stdClass();
		}

		return [
			'fields'    => $fields,
			'pre_tags'  => [''],
			'post_tags' => ['']
		];
	}


	/**
	 * @param string $providerId
	 * @param string $documentId
	 *
	 * @return array
	 * @throws ConfigurationException
	 */
	public function getDocumentQuery(string $providerId, string $documentId): array {
		return [
			'index' => $this->configService->getElasticIndex(),
			'type'  => 'standard',
			'id'    => $providerId . ':' . $documentId
		];
	}


	/**
	 * @param ISearchRequest $request
	 *
	 * @return array
	 */
	private function getPartsFields(ISearchRequest $request) {
		return array_map(
			function($value) {
				return 'parts.' . $value;
			}, $request->getParts()
		);
	}

}