Completed
Push — master ( 15e850...27125e )
by mw
02:06
created

SampleTextStopwordTest::testByLanguage()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 17
Code Lines 10

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
c 1
b 0
f 0
dl 0
loc 17
rs 9.4286
cc 1
eloc 10
nc 1
nop 3
<?php

namespace Onoi\Tesa\Tests\Integration;

use Onoi\Tesa\StopwordAnalyzer;
use Onoi\Tesa\Sanitizer;

/**
 * @group onoi-tesa
 *
 * @license GNU GPL v2+
 * @since 0.1
 *
 * @author mwjames
 */
class SampleTextStopwordTest extends \PHPUnit_Framework_TestCase {

	/**
	 * @dataProvider textByLanguageProvider
	 */
	public function testByLanguage( $languageCode, $text, $expected ) {

		$stopwordAnalyzer = new StopwordAnalyzer();
		$stopwordAnalyzer->loadListByLanguage( $languageCode );

		$sanitizer = new Sanitizer( $text );
		$sanitizer->toLowercase();

		$string = $sanitizer->sanitizeBy(
			$stopwordAnalyzer
		);

		$this->assertEquals(
			$expected,
			$string
		);
	}

	public function textByLanguageProvider() {

		// https://en.wikipedia.org/wiki/Stop_words
		$provider[] = array(
Coding Style / Comprehensibility
$provider was never initialized. Although PHP does not strictly require it, it is good practice to add $provider = array(); before the first use.

Adding an explicit array definition is generally preferable to implicit array definition as it guarantees a stable state of the code.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is created implicitly the first time the loop body runs. You can also see that the value of the bar key is only written conditionally; thus, its value might be left over from a previous iteration.

This might or might not be intended. To make your intention clear, keep your code readable, and avoid accidental bugs, we recommend adding an explicit initialization $myArray = array() either outside or inside the foreach loop.
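The recommendation above can be sketched as follows. This is a minimal illustration, not part of the reviewed code: the `Item` class and the `collectRows()` function are hypothetical stand-ins for whatever `$collection` holds in the example. Initializing `$myArray` inside the loop ensures that a `bar` value from one iteration never leaks into the next.

```php
<?php

// Hypothetical stand-in for the items in $collection (an assumption for this sketch).
class Item {
	private $foo;
	private $bar;

	public function __construct( $foo, $bar = null ) {
		$this->foo = $foo;
		$this->bar = $bar;
	}

	public function getFoo() {
		return $this->foo;
	}

	public function hasBar() {
		return $this->bar !== null;
	}

	public function getBar() {
		return $this->bar;
	}
}

// Hypothetical helper that collects one array per item.
function collectRows( array $collection ) {
	$rows = array();

	foreach ( $collection as $item ) {
		// Explicit initialization: each iteration starts from an empty array,
		// so a 'bar' key set for a previous item cannot carry over.
		$myArray = array();
		$myArray['foo'] = $item->getFoo();

		if ( $item->hasBar() ) {
			$myArray['bar'] = $item->getBar();
		}

		$rows[] = $myArray;
	}

	return $rows;
}
```

If the values are instead meant to accumulate across iterations, the initialization would go before the `foreach` rather than inside it; either way, the intent is visible in the code.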

			'en',
			'In computing, stop words are words which are filtered out before or after processing of natural language data (text).[1] Though stop words usually refer to the most common words in a language, there is no single universal list of stop words used by all natural language processing tools, and indeed not all tools even use such a list. Some tools specifically avoid removing these stop words to support phrase search.',
			'computing stop words words filtered processing natural language data text 1 stop words refer common words language single universal list stop words natural language processing tools tools list tools specifically avoid removing stop words support phrase search'
		);

		// https://en.wikipedia.org/wiki/Query_expansion
		$provider[] = array(
			'en',
			"The goal of query expansion in this regard is by increasing recall, precision can potentially increase (rather than decrease as mathematically equated), by including in the result set pages which are more relevant (of higher quality), or at least equally relevant. Pages which would not be included in the result set, which have the potential to be more relevant to the user's desired query, are included, and without query expansion would not have, regardless of relevance.",
			"goal query expansion regard increasing recall precision potentially increase decrease mathematically equated including result set pages relevant quality equally relevant pages included result set potential relevant user's desired query included query expansion relevance"
		);

		return $provider;
	}

}
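
Applied to the data provider in this file, the flagged issue could be addressed roughly as shown below. This is only a sketch: it is written as a standalone function so it runs on its own, and the sample row is a shortened placeholder rather than the original test data. In the test class the same one-line initialization would go at the top of `textByLanguageProvider()`.

```php
<?php

// Sketch of the provider with the explicit initialization the review suggests.
function textByLanguageProvider() {

	// Declare the array before appending to it.
	$provider = array();

	// Placeholder row (assumption): language code, input text, expected output.
	$provider[] = array(
		'en',
		'placeholder input text',
		'placeholder expected text'
	);

	return $provider;
}
```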