NGramTokenizerTest::stringProvider()   B

Complexity

Conditions 1
Paths 1

Size

Total Lines 104
Code Lines 78

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 104
rs 8.2857
c 0
b 0
f 0
cc 1
eloc 78
nc 1
nop 0


Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
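The report flags stringProvider() as a Long Method (104 lines). One way to shorten it is to extract groups of related cases into separately named helpers and have the provider merge them.

The sketch below is hypothetical:
- The class NGramProviderSketch and the helpers asciiCases() and multibyteCases() are invented for illustration.
- Only a few of the original cases are reproduced.
- In a real PHPUnit test the data provider must stay public, while the helpers can be private.

```php
<?php

// Hypothetical sketch: stringProvider() shortened via Extract Method.
// Class and helper names are illustrative, not part of Onoi\Tesa.
class NGramProviderSketch {

	// The data provider only assembles the named groups of cases.
	public function stringProvider() {
		return array_merge(
			$this->asciiCases(),
			$this->multibyteCases()
		);
	}

	// Latin-script input: array( string, ngram size, expected n-grams ).
	private function asciiCases() {
		return array(
			array( 'TEXT', '4', array( 'text' ) ),
			array( 'hello', '3', array( 'hel', 'ell', 'llo' ) ),
		);
	}

	// Multibyte input (Cyrillic, Japanese) exercises character-based slicing.
	private function multibyteCases() {
		return array(
			array( 'Новости', '3', array( 'нов', 'ово', 'вос', 'ост', 'сти' ) ),
			array( 'こんにちは世界!', '2', array( 'こん', 'んに', 'にち', 'ちは', 'は世', '世界', '界!' ) ),
		);
	}
}

$cases = ( new NGramProviderSketch() )->stringProvider();
```

Each helper name now does the job a comment inside the long provider would otherwise do.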

Commonly applied refactorings include:

<?php

namespace Onoi\Tesa\Tests;

use Onoi\Tesa\Tokenizer\NGramTokenizer;

/**
 * @covers \Onoi\Tesa\Tokenizer\NGramTokenizer
 * @group onoi-tesa
 *
 * @license GNU GPL v2+
 * @since 0.1
 *
 * @author mwjames
 */
class NGramTokenizerTest extends \PHPUnit_Framework_TestCase {

	public function testCanConstruct() {

		$this->assertInstanceOf(
			'\Onoi\Tesa\Tokenizer\NGramTokenizer',
			new NGramTokenizer()
		);
	}

	/**
	 * @dataProvider stringProvider
	 */
	public function testTokenize( $string, $ngram, $expected ) {

		if ( version_compare( phpversion(), '5.4', '<' ) ) {
			$this->markTestSkipped(
				"Ehh, PHP 5.3 returns with unexpected results"
			);
		}

		$instance = new NGramTokenizer( null, $ngram );

		$this->assertEquals(
			$expected,
			$instance->tokenize( $string )
		);

		$this->assertFalse(
			$instance->isWordTokenizer()
		);
	}

	public function testTokenizeWithStartEndMarker() {

		// http://cloudmark.github.io/Language-Detection
		$string = 'TEXT';

		$expected = array(
			'_tex',
			'text',
			'ext_',
			'xt__',
			't___'
		);

		$instance = new NGramTokenizer( null, 4 );
		$instance->withMarker( true );

		$this->assertEquals(
			$expected,
			$instance->tokenize( $string )
		);
	}

	public function testTokenizeWithStartEndMarker2() {

		$string = '教授は';

		$expected = array(
			'_教授',
			'教授は',
			'授は_',
			'は__'
		);

		$instance = new NGramTokenizer( null, 3 );
		$instance->withMarker( true );

		$this->assertEquals(
			$expected,
			$instance->tokenize( $string )
		);
	}

	public function testTokenizeWithOption() {

		$string = '红色中华';

		$tokenizer = $this->getMockBuilder( '\Onoi\Tesa\Tokenizer\Tokenizer' )
			->disableOriginalConstructor()
			->getMockForAbstractClass();

		$tokenizer->expects( $this->once() )
			->method( 'setOption' );

		$tokenizer->expects( $this->once() )
			->method( 'tokenize' )
			->with( $this->equalTo( $string ) )
			->will( $this->returnValue( array( $string ) ) );

		$instance = new NGramTokenizer( $tokenizer );

		$instance->setOption(
			NGramTokenizer::REGEX_EXEMPTION,
			array( 'Foo' )
		);

		$this->assertEquals(
			array( '红色', '色中', '中华' ),
			$instance->tokenize( $string )
		);
	}

	public function stringProvider() {

		$provider[] = array(
Coding Style Comprehensibility introduced by
$provider was never initialized. Although PHP does not strictly require it, it is generally good practice to add $provider = array(); before the first $provider[] = ... assignment.

Adding an explicit array definition is generally preferable to relying on an implicit one, as it guarantees that the code starts from a stable state.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is created implicitly the first time the foreach loop body runs. You can also see that the value of the bar key is only written conditionally; thus, its value might be left over from a previous iteration.

This might or might not be intended. To make your intention clear, keep your code readable, and avoid accidental bugs, we recommend adding an explicit initialization $myArray = array() either outside or inside the foreach loop.
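Here is a minimal sketch of the recommended fix applied to the foreach example. $myArray is reset at the top of each iteration, so a bar value written for one item cannot leak into the next. The Item class and the collectRows() function are hypothetical stand-ins for the objects in $collection and for the "do something with $myArray" step.

```php
<?php

// Hypothetical stand-in for the objects in $collection.
class Item {
	private $foo;
	private $bar;

	public function __construct( $foo, $bar = null ) {
		$this->foo = $foo;
		$this->bar = $bar;
	}

	public function getFoo() { return $this->foo; }
	public function hasBar() { return $this->bar !== null; }
	public function getBar() { return $this->bar; }
}

// Hypothetical helper: collects one row per item.
function collectRows( array $collection ) {
	$rows = array(); // explicit initialization of the result array

	foreach ( $collection as $item ) {
		// Reset per iteration so 'bar' from a previous item cannot carry over.
		$myArray = array();
		$myArray['foo'] = $item->getFoo();

		if ( $item->hasBar() ) {
			$myArray['bar'] = $item->getBar();
		}

		// "Do something with $myArray": here, keep a copy of it.
		$rows[] = $myArray;
	}

	return $rows;
}

$rows = collectRows( array( new Item( 'a', 'x' ), new Item( 'b' ) ) );
```

Without the per-iteration reset, the second row would still contain 'bar' => 'x' from the first item.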

			'TEXT',
			'4',
			array(
				'text'
			)
		);

		$provider[] = array(
			'12345678',
			'2',
			array(
				'12',
				'23',
				'34',
				'45',
				'56',
				'67',
				'78'
			)
		);

		$provider[] = array(
			'12345678',
			'3',
			array(
				'123',
				'234',
				'345',
				'456',
				'567',
				'678'
			)
		);

		$provider[] = array(
			'hello',
			'3',
			array(
				'hel',
				'ell',
				'llo'
			)
		);

		$provider[] = array(
			'Hello World!',
			'3',
			array(
				'hel',
				'ell',
				'llo',
				'lo ',
				'o w',
				' wo',
				'wor',
				'orl',
				'rld',
				'ld!'
			)
		);

		$provider[] = array(
			'Новости',
			'3',
			array(
				'нов',
				'ово',
				'вос',
				'ост',
				'сти'
			)
		);

		$provider[] = array(
			'1時36分更新',
			'3',
			array(
				'1時3',
				'時36',
				'36分',
				'6分更',
				'分更新'
			)
		);

		$provider[] = array(
			'こんにちは世界!',
			'2',
			array(
				'こん',
				'んに',
				'にち',
				'ちは',
				'は世',
				'世界',
				'界!'
			)
		);

		return $provider;
	}

}
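The expected values in these tests follow a simple rule. The input is lowercased, and every window of n consecutive characters becomes a token. With markers enabled, the test data is consistent with padding one leading '_' and n-1 trailing '_'.

The sketch below reproduces the expected arrays under that reading. It is a hypothetical illustration, not the NGramTokenizer implementation: the function name characterNGrams() is invented. It relies on the mbstring extension so that Cyrillic and Japanese input are sliced by character rather than by byte.

```php
<?php

// Hypothetical sketch, not the Onoi\Tesa NGramTokenizer implementation:
// character n-grams over a UTF-8 string, with optional '_' boundary markers.
function characterNGrams( $text, $n, $withMarker = false ) {
	$text = mb_strtolower( $text, 'UTF-8' );

	if ( $withMarker ) {
		// Padding inferred from the expected test data: one leading marker
		// and n - 1 trailing markers, so the last character starts its own n-gram.
		$text = '_' . $text . str_repeat( '_', $n - 1 );
	}

	$length = mb_strlen( $text, 'UTF-8' );
	$ngrams = array();

	// Slide a window of $n characters (not bytes) across the text.
	for ( $i = 0; $i + $n <= $length; $i++ ) {
		$ngrams[] = mb_substr( $text, $i, $n, 'UTF-8' );
	}

	return $ngrams;
}
```

For example, characterNGrams( 'TEXT', 4, true ) yields '_tex', 'text', 'ext_', 'xt__', 't___', matching the expectation in testTokenizeWithStartEndMarker.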