Passed
Push — master ( b1a1af...f07e26 )
by Alexander
25:59 queued 19:47
created

BaseTokenizer   B

Complexity

Total Complexity 49

Size/Duplication

Total Lines 452
Duplicated Lines 0 %

Test Coverage

Coverage 94.34%

Importance

Changes 0
Metric Value
wmc 49
eloc 153
dl 0
loc 452
ccs 150
cts 159
cp 0.9434
rs 8.48
c 0
b 0
f 0

11 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 3 1
A setSql() 0 3 1
A advance() 0 8 2
B tokenize() 0 46 8
A addTokenFromBuffer() 0 17 4
A isEof() 0 3 1
A substring() 0 20 6
A indexAfter() 0 19 4
B startsWithAnyLongest() 0 34 7
B tokenizeOperator() 0 78 9
A tokenizeDelimitedString() 0 20 6

How to fix   Complexity   

Complex Class

Complex classes like BaseTokenizer often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use BaseTokenizer, and based on these observations, apply Extract Interface, too.
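As a hedged illustration of Extract Class for this file: `$sql`, `$length`, `$offset` and `$substrings` are mostly used together by `substring()`, `advance()` and `isEof()`, so they are one candidate component. The sketch below moves them into a hypothetical `SqlCursor` class. The class name and its API are assumptions made for illustration and are not part of the package; the tokenizer would delegate to such an object rather than hold the fields itself.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical component extracted from BaseTokenizer: it owns the SQL string,
 * the read offset and the substring cache. Not part of the real package.
 */
final class SqlCursor
{
    private string $sql;
    private int $length;
    private int $offset = 0;
    /** @var array<string, string> substrings cached for the current offset */
    private array $substrings = [];

    public function __construct(string $sql)
    {
        $this->sql = $sql;
        $this->length = mb_strlen($sql, 'UTF-8');
    }

    public function isEof(): bool
    {
        return $this->offset >= $this->length;
    }

    /** Moves the offset forward and drops the cache, as BaseTokenizer::advance() does. */
    public function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /** Returns $length characters at the current offset, or '' past the end. */
    public function substring(int $length, bool $caseSensitive = true): string
    {
        if ($this->offset + $length > $this->length) {
            return '';
        }

        $key = $length . ',' . (int) $caseSensitive;

        if (!isset($this->substrings[$key])) {
            $value = mb_substr($this->sql, $this->offset, $length, 'UTF-8');
            $this->substrings[$key] = $caseSensitive ? $value : mb_strtoupper($value, 'UTF-8');
        }

        return $this->substrings[$key];
    }
}

$cursor = new SqlCursor('select 1');
echo $cursor->substring(6, false), PHP_EOL; // SELECT
$cursor->advance(7);
echo $cursor->substring(1), PHP_EOL; // 1
```

With the cursor extracted, `tokenize()` would be left with token-tree concerns only (the stack, the buffer and the current token), which is where most of the reported complexity sits.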

<?php

declare(strict_types=1);

namespace Yiisoft\Db\Sqlite;

use SplStack;
use Yiisoft\Db\Exception\InvalidArgumentException;

use function is_array;
use function is_string;
use function mb_strlen;
use function mb_strpos;
use function mb_strtoupper;
use function mb_substr;
use function reset;
use function usort;

/**
 * BaseTokenizer splits an SQL query into individual SQL tokens.
 *
 * It can be used to obtain additional information from SQL code.
 *
 * Usage example:
 *
 * ```php
 * $tokenizer = new SqlTokenizer("SELECT * FROM user WHERE id = 1");
 * $root = $tokenizer->tokenize();
 * $sqlTokens = $root->getChildren();
 * ```
 *
 * Tokens are instances of {@see SqlToken}.
 */
abstract class BaseTokenizer
{
    /**
     * @var string SQL code.
     */
    private string $sql;

    /**
     * @var int SQL code string length.
     */
    protected int $length;

    /**
     * @var int SQL code string current offset.
     */
    protected int $offset;

    /**
     * @var SplStack stack of active tokens.
     */
    private SplStack $tokenStack;

    /**
     * @var SqlToken|null active token. It's usually the top of the token stack.
     */
    private ?SqlToken $currentToken = null;

    /**
     * @var string[] cached substrings.
     */
    private array $substrings;

    /**
     * @var string current buffer value.
     */
    private string $buffer = '';

    /**
     * @var SqlToken|null resulting token of the last {@see tokenize()} call.
     */
    private ?SqlToken $token = null;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    /**
     * Tokenizes and returns a code type token.
     *
     * @throws InvalidArgumentException
     *
     * @return SqlToken code type token.
     */
    public function tokenize(): SqlToken
    {
        $this->length = mb_strlen($this->sql, 'UTF-8');
        $this->offset = 0;
        $this->substrings = [];
        $this->buffer = '';

        $this->token = (new SqlToken())
            ->type(SqlToken::TYPE_CODE)
            ->content($this->sql);

        $this->tokenStack = new SplStack();
        $this->tokenStack->push($this->token);

        $tk = (new SqlToken())
            ->type(SqlToken::TYPE_STATEMENT);

        $this->token[] = $tk;

        $this->tokenStack->push($this->token[0]);
        $this->currentToken = $this->tokenStack->top();

        while (!$this->isEof()) {
            if ($this->isWhitespace($length) || $this->isComment($length)) {
                $this->addTokenFromBuffer();
                $this->advance($length);

                continue;
            }

            /** @psalm-suppress ConflictingReferenceConstraint */
            if ($this->tokenizeOperator($length) || $this->tokenizeDelimitedString($length)) {
                $this->advance($length);

                continue;
            }

            $this->buffer .= $this->substring(1);
            $this->advance(1);
        }
        $this->addTokenFromBuffer();
        if ($this->token->getHasChildren() && !$this->token[-1]->getHasChildren()) {
            // Scrutinizer issue (bug, ignorable by annotation): the method getHasChildren() does not exist on null.
            // If this is a false positive, the issue can be ignored in code via the ignore-call annotation:
            //     if ($this->token->getHasChildren() && !$this->token[-1]->/** @scrutinizer ignore-call */ getHasChildren()) {
            // This check looks for calls to methods that do not seem to exist on a given type. It looks for the
            // method on the type itself as well as in inherited classes or implemented interfaces. Such a call is
            // most likely a typographical error, or the method has been renamed.
            unset($this->token[-1]);
        }

        return $this->token;
    }

    /**
     * Returns whether there's whitespace at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int|null $length length of the matched string.
     *
     * @return bool whether there's whitespace at the current offset.
     */
    abstract protected function isWhitespace(?int &$length): bool;

    /**
     * Returns whether there's a comment at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's a comment at the current offset.
     */
    abstract protected function isComment(int &$length): bool;

    /**
     * Returns whether there's an operator at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an operator at the current offset.
     */
    abstract protected function isOperator(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's an identifier at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an identifier at the current offset.
     */
    abstract protected function isIdentifier(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's a string literal at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's a string literal at the current offset.
     */
    abstract protected function isStringLiteral(int &$length, ?string &$content): bool;

    /**
     * Returns whether the given string is a keyword.
     *
     * The method may set `$content` to a string that will be used as a token content.
     *
     * @param string $string string to be matched.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether the given string is a keyword.
     */
    abstract protected function isKeyword(string $string, ?string &$content): bool;

    /**
     * Sets the SQL code to be tokenized.
     *
     * @param string $sql
     */
    public function setSql(string $sql): void
    {
        $this->sql = $sql;
    }

    /**
     * Returns whether one of the given strings matches the SQL code at the current offset, trying longer strings
     * first.
     *
     * @param string[] $with strings to be tested. The method **will** modify this parameter to speed up lookups.
     * @param bool $caseSensitive whether to perform a case sensitive comparison.
     * @param int|null $length length of the matched string.
     * @param string|null $content matched string.
     *
     * @return bool whether a match is found.
     */
    protected function startsWithAnyLongest(
        array &$with,
        bool $caseSensitive,
        ?int &$length = null,
        ?string &$content = null
    ): bool {
        if (empty($with)) {
            return false;
        }

        if (!is_array(reset($with))) {
            usort($with, static function ($string1, $string2) {
                return mb_strlen($string2, 'UTF-8') - mb_strlen($string1, 'UTF-8');
            });

            $map = [];

            foreach ($with as $string) {
                $map[mb_strlen($string, 'UTF-8')][$caseSensitive ? $string : mb_strtoupper($string, 'UTF-8')] = true;
            }

            $with = $map;
        }
        foreach ($with as $testLength => $testValues) {
            $content = $this->substring($testLength, $caseSensitive);

            if (isset($testValues[$content])) {
                $length = $testLength;

                return true;
            }
        }

        return false;
    }

    /**
     * Returns a string of the given length starting at the specified offset.
     *
     * @param int $length string length to be returned.
     * @param bool $caseSensitive if it's `false`, the string will be uppercased.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return string result string, it may be empty if there's nothing to return.
     */
    protected function substring(int $length, bool $caseSensitive = true, ?int $offset = null): string
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + $length > $this->length) {
            return '';
        }

        $cacheKey = $offset . ',' . $length;

        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }
        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /**
     * Returns the index just after the first occurrence of the given string in the SQL code, searching from the
     * specified offset.
     *
     * @param string $string string to be found.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return int index after the given string or end of string index.
     */
    protected function indexAfter(string $string, ?int $offset = null): int
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + mb_strlen($string, 'UTF-8') > $this->length) {
            return $this->length;
        }

        $afterIndexOf = mb_strpos($this->sql, $string, $offset, 'UTF-8');

        if ($afterIndexOf === false) {
            $afterIndexOf = $this->length;
        } else {
            $afterIndexOf += mb_strlen($string, 'UTF-8');
        }

        return $afterIndexOf;
    }

    /**
     * Determines whether there is a delimited string at the current offset and adds it to the token children.
     *
     * @param int $length
     *
     * @return bool
     */
    private function tokenizeDelimitedString(int &$length): bool
    {
        $isIdentifier = $this->isIdentifier($length, $content);
        $isStringLiteral = !$isIdentifier && $this->isStringLiteral($length, $content);

        if (!$isIdentifier && !$isStringLiteral) {
            return false;
        }

        $this->addTokenFromBuffer();

        $tk = (new SqlToken())
            ->type($isIdentifier ? SqlToken::TYPE_IDENTIFIER : SqlToken::TYPE_STRING_LITERAL)
            ->content(is_string($content) ? $content : $this->substring($length))
            ->startOffset($this->offset)
            ->endOffset($this->offset + $length);

        $this->currentToken[] = $tk;

        return true;
    }

    /**
     * Determines whether there is an operator at the current offset and adds it to the token children.
     *
     * @param int $length
     *
     * @return bool
     */
    private function tokenizeOperator(int &$length): bool
    {
        if (!$this->isOperator($length, $content)) {
            return false;
        }

        $this->addTokenFromBuffer();

        switch ($this->substring($length)) {
            case '(':
                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                $tk1 = (new SqlToken())
                    ->type(SqlToken::TYPE_PARENTHESIS);

                $this->currentToken[] = $tk1;

                $this->tokenStack->push($this->currentToken[-1]);
                $this->currentToken = $this->tokenStack->top();

                break;

            case ')':
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();

                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(')')
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                break;
            case ';':
                if (!$this->currentToken->getHasChildren()) {
                    // Scrutinizer issue (bug, ignorable by annotation): the method getHasChildren() does not exist on
                    // null. If this is a false positive, the issue can be ignored via the ignore-call annotation:
                    //     if (!$this->currentToken->/** @scrutinizer ignore-call */ getHasChildren()) {
                    break;
                }

                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();

                $tk1 = (new SqlToken())
                    ->type(SqlToken::TYPE_STATEMENT);

                $this->currentToken[] = $tk1;
                $this->tokenStack->push($this->currentToken[-1]);
                $this->currentToken = $this->tokenStack->top();

                break;
            default:
                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                break;
        }

        return true;
    }

    /**
     * Determines the type of the text in the buffer, tokenizes it and adds it to the token children.
     */
    private function addTokenFromBuffer(): void
    {
        if ($this->buffer === '') {
            return;
        }

        $isKeyword = $this->isKeyword($this->buffer, $content);

        $tk = (new SqlToken())
            ->type($isKeyword ? SqlToken::TYPE_KEYWORD : SqlToken::TYPE_TOKEN)
            ->content(is_string($content) ? $content : $this->buffer)
            ->startOffset($this->offset - mb_strlen($this->buffer, 'UTF-8'))
            ->endOffset($this->offset);

        $this->currentToken[] = $tk;

        $this->buffer = '';
    }

    /**
     * Adds the specified length to the current offset.
     *
     * @param int $length
     *
     * @throws InvalidArgumentException
     */
    private function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /**
     * Returns whether the SQL code is completely traversed.
     *
     * @return bool
     */
    private function isEof(): bool
    {
        return $this->offset >= $this->length;
    }
}
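
The length-keyed lookup used by `startsWithAnyLongest()` can be tried in isolation. The following is a minimal sketch, not the class's own code: `buildLengthMap()` and `matchLongest()` are hypothetical helper names, the functions work on a plain string instead of the tokenizer state, and `krsort()` is used to order the map by length where the original sorts the input with `usort()` before grouping it.

```php
<?php

declare(strict_types=1);

/**
 * Groups candidate strings by character length, longest length first.
 * Hypothetical helper that mirrors the map startsWithAnyLongest() builds.
 *
 * @param string[] $candidates
 *
 * @return array<int, array<string, true>>
 */
function buildLengthMap(array $candidates, bool $caseSensitive): array
{
    $map = [];

    foreach ($candidates as $candidate) {
        $key = $caseSensitive ? $candidate : mb_strtoupper($candidate, 'UTF-8');
        $map[mb_strlen($candidate, 'UTF-8')][$key] = true;
    }

    krsort($map); // longest lengths are tested first

    return $map;
}

/**
 * Returns the longest candidate found at $offset in $sql, or null if none matches.
 * In case-insensitive mode the returned string is uppercased.
 */
function matchLongest(string $sql, int $offset, array $map, bool $caseSensitive): ?string
{
    foreach ($map as $length => $values) {
        $slice = mb_substr($sql, $offset, $length, 'UTF-8');

        if (!$caseSensitive) {
            $slice = mb_strtoupper($slice, 'UTF-8');
        }

        if (isset($values[$slice])) {
            return $slice;
        }
    }

    return null;
}

$map = buildLengthMap(['<', '<=', '<>', '='], true);

var_dump(matchLongest('a <= b', 2, $map, true)); // string(2) "<="
var_dump(matchLongest('a < b', 2, $map, true));  // string(1) "<"
var_dump(matchLongest('a + b', 2, $map, true));  // NULL
```

Grouping by length means each offset costs one substring and one hash lookup per distinct candidate length, rather than one comparison per candidate.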