Passed
Branch dev (ea35c5)
by Wilmer
17:29 queued 12:43
created

BaseTokenizer (rating: C)

Complexity

Total Complexity 53

Size/Duplication

Total Lines 446
Duplicated Lines 0%

Test Coverage

Coverage 93.79%

Importance

Changes 0
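Metric  Value   Description
wmc     53      weighted method count (sum of the method complexities; Total Complexity)
eloc    145     executable lines of code
dl      0       duplicated lines
loc     446     lines of code (Total Lines)
ccs     136     covered statements
cts     145     total statements
cp      0.9379  coverage, ccs / cts (93.79%)
rs      6.96
c       0
b       0
f       0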

11 Methods

Rating  Name                       Duplication  Size  Complexity
A       isEof()                    0            3     1
A       setSql()                   0            3     1
A       advance()                  0            8     2
A       substring()                0            21    6
A       __construct()              0            3     1
A       indexAfter()               0            19    4
B       tokenize()                 0            46    9
C       tokenizeOperator()         0            67    12
A       addTokenFromBuffer()       0            15    4
A       tokenizeDelimitedString()  0            18    6
B       startsWithAnyLongest()     0            35    7
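The class totals can be cross-checked against the methods table. The eleven per-method complexities add up to the reported Total Complexity (wmc 53), and cp is ccs divided by cts. Below is a minimal PHP sketch of that arithmetic. It assumes that ccs and cts mean covered statements and total statements; that reading is inferred from the numbers and is not stated by the report.

```php
<?php

declare(strict_types=1);

// Complexity column of the methods table, keyed by method name.
$complexity = [
    'isEof' => 1,
    'setSql' => 1,
    'advance' => 2,
    'substring' => 6,
    '__construct' => 1,
    'indexAfter' => 4,
    'tokenize' => 9,
    'tokenizeOperator' => 12,
    'addTokenFromBuffer' => 4,
    'tokenizeDelimitedString' => 6,
    'startsWithAnyLongest' => 7,
];

// Weighted method count: the sum of the per-method complexities.
$wmc = array_sum($complexity);

// Coverage ratio, assuming ccs = covered statements and cts = total statements.
$ccs = 136;
$cts = 145;
$cp = round($ccs / $cts, 4);

printf("wmc = %d, cp = %.4f\n", $wmc, $cp); // wmc = 53, cp = 0.9379
```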

How to fix: Complexity

Complex Class

Complex classes like BaseTokenizer often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use BaseTokenizer, and based on these observations, apply Extract Interface, too.
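In BaseTokenizer, one candidate component shows up from the names alone: the `$substrings` field and the `substring()` method, together with the `$sql` string they read from. The sketch below shows what Extract Class could produce. It assumes PHP 8.0+ with the mbstring extension. The class name `SubstringCache`, its constructor and its `clear()` method are hypothetical and are not part of yiisoft/db-sqlite. The `substring()` body mirrors the caching logic in the listing below, except that the offset is passed in explicitly instead of being read from tokenizer state.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical component extracted from BaseTokenizer: the SQL string plus the
 * substring cache that is currently spread across the tokenizer's fields.
 */
final class SubstringCache
{
    /** @var array<string, string> keyed by "offset,length,caseFlag", like BaseTokenizer::$substrings */
    private array $substrings = [];

    private int $length;

    public function __construct(private string $sql)
    {
        $this->length = mb_strlen($sql, 'UTF-8');
    }

    /** Returns `$length` characters starting at `$offset`; uppercased when `$caseSensitive` is false. */
    public function substring(int $offset, int $length, bool $caseSensitive = true): string
    {
        if ($offset + $length > $this->length) {
            return '';
        }

        $cacheKey = $offset . ',' . $length;

        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }

        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /** Drops all cached entries (BaseTokenizer::advance() currently resets its own cache the same way). */
    public function clear(): void
    {
        $this->substrings = [];
    }
}

$cache = new SubstringCache('select * from user');
echo $cache->substring(0, 6, false), PHP_EOL; // SELECT
echo $cache->substring(14, 4), PHP_EOL;       // user
```

BaseTokenizer would then hold one such object per `tokenize()` call and delegate to it. `advance()` would call `clear()` where it currently resets `$this->substrings`.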

<?php

declare(strict_types=1);

namespace Yiisoft\Db\Sqlite;

use SplStack;
use Yiisoft\Db\Exception\InvalidArgumentException;

use function is_array;
use function is_string;
use function mb_strlen;
use function mb_strpos;
use function mb_strtoupper;
use function mb_substr;
use function reset;
use function usort;

/**
 * BaseTokenizer splits an SQL query into individual SQL tokens.
 *
 * It can be used to obtain additional information from SQL code.
 *
 * Usage example:
 *
 * ```php
 * $tokenizer = new SqlTokenizer("SELECT * FROM user WHERE id = 1");
 * $root = $tokenizer->tokenize();
 * $sqlTokens = $root->getChildren();
 * ```
 *
 * Tokens are instances of {@see SqlToken}.
 */
abstract class BaseTokenizer
{
    /**
     * @var string SQL code.
     */
    private string $sql;

    /**
     * @var int SQL code string length.
     */
    protected int $length = 0;

    /**
     * @var int current offset in the SQL code string.
     */
    protected int $offset = 0;

    /**
     * @var SplStack stack of active tokens.
     *
     * @psalm-var SplStack<SqlToken>
     * @psalm-suppress PropertyNotSetInConstructor
     */
    private SplStack $tokenStack;

    /**
     * @psalm-var SqlToken|SqlToken[] active token. It's usually the top of the token stack.
     *
     * @psalm-suppress PropertyNotSetInConstructor
     */
    private array|SqlToken $currentToken;

    /**
     * @var string[] cached substrings.
     */
    private array $substrings = [];

    /**
     * @var string current buffer value.
     */
    private string $buffer = '';

    /**
     * @var SqlToken|null resulting token of the last {@see tokenize()} call.
     */
    private ?SqlToken $token = null;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    /**
     * Tokenizes and returns a code type token.
     *
     * @throws InvalidArgumentException
     *
     * @return SqlToken code type token.
     */
    public function tokenize(): SqlToken
    {
        $this->length = mb_strlen($this->sql, 'UTF-8');
        $this->offset = 0;
        $this->substrings = [];
        $this->buffer = '';
        $this->token = (new SqlToken())->type(SqlToken::TYPE_CODE)->content($this->sql);
        $this->tokenStack = new SplStack();
        $this->tokenStack->push($this->token);
        $this->token[] = (new SqlToken())->type(SqlToken::TYPE_STATEMENT);
        $this->tokenStack->push($this->token[0]);
        /** @var SqlToken */
        $this->currentToken = $this->tokenStack->top();
Bug introduced by
The property currentToken does not seem to exist on Yiisoft\Db\Sqlite\SqlToken.

        $length = 0;

        while (!$this->isEof()) {
            if ($this->isWhitespace($length) || $this->isComment($length)) {
                $this->addTokenFromBuffer();
                $this->advance($length);

                continue;
            }

            /** @psalm-suppress ConflictingReferenceConstraint */
            if ($this->tokenizeOperator($length) || $this->tokenizeDelimitedString($length)) {
                $this->advance($length);

                continue;
            }

            $this->buffer .= $this->substring(1);
            $this->advance(1);
        }

        $this->addTokenFromBuffer();

        if (
            $this->token->getHasChildren() &&
            $this->token[-1] instanceof SqlToken &&
            !$this->token[-1]->getHasChildren()
Bug introduced by
The method getHasChildren() does not exist on null. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            !$this->token[-1]->/** @scrutinizer ignore-call */ getHasChildren()

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
        ) {
            unset($this->token[-1]);
        }

        return $this->token;
    }

    /**
     * Returns whether there's whitespace at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's whitespace at the current offset.
     */
    abstract protected function isWhitespace(int &$length): bool;

    /**
     * Returns whether there's a comment at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's a comment at the current offset.
     */
    abstract protected function isComment(int &$length): bool;

    /**
     * Returns whether there's an operator at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an operator at the current offset.
     */
    abstract protected function isOperator(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's an identifier at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an identifier at the current offset.
     */
    abstract protected function isIdentifier(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's a string literal at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's a string literal at the current offset.
     */
    abstract protected function isStringLiteral(int &$length, ?string &$content): bool;

    /**
     * Returns whether the given string is a keyword.
     *
     * The method may set `$content` to a string that will be used as the token content.
     *
     * @param string $string string to be matched.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether the given string is a keyword.
     */
    abstract protected function isKeyword(string $string, ?string &$content): bool;

    /**
     * Sets the SQL code to be tokenized.
     *
     * @param string $sql SQL code.
     */
    public function setSql(string $sql): void
    {
        $this->sql = $sql;
    }

    /**
     * Returns whether any of the given strings matches the SQL code at the current offset, longest strings first.
     *
     * @param array $with strings to be tested. To speed up lookups, the method converts them into a map indexed by
     * string length (an array already in that form is used as-is); since `$with` is passed by value, the caller's
     * array isn't modified.
     * @param bool $caseSensitive whether to perform a case-sensitive comparison.
     * @param int $length length of the matched string.
     * @param string|null $content matched string.
     *
     * @return bool whether a match is found.
     *
     * @psalm-param array<array-key, string> $with
     */
    protected function startsWithAnyLongest(
        array $with,
        bool $caseSensitive,
        int &$length,
        ?string &$content = null
    ): bool {
        if (empty($with)) {
            return false;
        }

        if (!is_array(reset($with))) {
            usort($with, static function (string $string1, string $string2) {
                return mb_strlen($string2, 'UTF-8') - mb_strlen($string1, 'UTF-8');
            });

            $map = [];

            foreach ($with as $string) {
                $map[mb_strlen($string, 'UTF-8')][$caseSensitive ? $string : mb_strtoupper($string, 'UTF-8')] = true;
            }

            $with = $map;
        }

        /** @psalm-var array<int, array> $with */
        foreach ($with as $testLength => $testValues) {
            $content = $this->substring($testLength, $caseSensitive);

            if (isset($testValues[$content])) {
                $length = $testLength;
                return true;
            }
        }

        return false;
    }

    /**
     * Returns a string of the given length starting at the specified offset.
     *
     * @param int $length string length to be returned.
     * @param bool $caseSensitive if it's `false`, the string is converted to uppercase.
     * @param int|null $offset SQL code offset, defaults to the current offset if `null` is passed.
     *
     * @return string result string; it may be empty if there's nothing to return.
     */
    protected function substring(int $length, bool $caseSensitive = true, ?int $offset = null): string
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + $length > $this->length) {
            return '';
        }

        $cacheKey = $offset . ',' . $length;

        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }

        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /**
     * Returns the index right after the first occurrence of the given string in the SQL code, searching from the
     * specified offset.
     *
     * @param string $string string to be found.
     * @param int|null $offset SQL code offset, defaults to the current offset if `null` is passed.
     *
     * @return int index after the given string, or the SQL code length if the string isn't found.
     */
    protected function indexAfter(string $string, ?int $offset = null): int
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + mb_strlen($string, 'UTF-8') > $this->length) {
            return $this->length;
        }

        $afterIndexOf = mb_strpos($this->sql, $string, $offset, 'UTF-8');

        if ($afterIndexOf === false) {
            $afterIndexOf = $this->length;
        } else {
            $afterIndexOf += mb_strlen($string, 'UTF-8');
        }

        return $afterIndexOf;
    }

    /**
     * Determines whether there is a delimited string at the current offset and adds it to the token children.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether a delimited string (identifier or string literal) was found.
     */
    private function tokenizeDelimitedString(int &$length): bool
    {
        $isIdentifier = $this->isIdentifier($length, $content);
        $isStringLiteral = !$isIdentifier && $this->isStringLiteral($length, $content);

        if (!$isIdentifier && !$isStringLiteral) {
            return false;
        }

        $this->addTokenFromBuffer();

        $this->currentToken[] = (new SqlToken())
            ->type($isIdentifier ? SqlToken::TYPE_IDENTIFIER : SqlToken::TYPE_STRING_LITERAL)
            ->content(is_string($content) ? $content : $this->substring($length))
            ->startOffset($this->offset)
            ->endOffset($this->offset + $length);

        return true;
    }

    /**
     * Determines whether there is an operator at the current offset and adds it to the token children.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether an operator was found.
     */
    private function tokenizeOperator(int &$length): bool
    {
        if (!$this->isOperator($length, $content)) {
            return false;
        }

        $this->addTokenFromBuffer();

        switch ($this->substring($length)) {
            case '(':
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);
                $this->currentToken[] = (new SqlToken())->type(SqlToken::TYPE_PARENTHESIS);

                if ($this->currentToken[-1] !== null) {
                    $this->tokenStack->push($this->currentToken[-1]);
                }

                $this->currentToken = $this->tokenStack->top();

                break;

            case ')':
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(')')
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                break;
            case ';':
                if ($this->currentToken instanceof SqlToken && !$this->currentToken->getHasChildren()) {
                    break;
                }

                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();
                $this->currentToken[] = (new SqlToken())->type(SqlToken::TYPE_STATEMENT);

                if ($this->currentToken[-1] instanceof SqlToken) {
                    $this->tokenStack->push($this->currentToken[-1]);
                }

                $this->currentToken = $this->tokenStack->top();

                break;
            default:
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                break;
        }

        return true;
    }

    /**
     * Determines the type of the text in the buffer (keyword or plain token), creates a token from it and adds it to
     * the token children.
     */
    private function addTokenFromBuffer(): void
    {
        if ($this->buffer === '') {
            return;
        }

        $isKeyword = $this->isKeyword($this->buffer, $content);

        $this->currentToken[] = (new SqlToken())
            ->type($isKeyword ? SqlToken::TYPE_KEYWORD : SqlToken::TYPE_TOKEN)
            ->content(is_string($content) ? $content : $this->buffer)
            ->startOffset($this->offset - mb_strlen($this->buffer, 'UTF-8'))
            ->endOffset($this->offset);

        $this->buffer = '';
    }

    /**
     * Adds the specified length to the current offset and clears the substring cache.
     *
     * @param int $length number of characters to advance by; must be greater than 0.
     *
     * @throws InvalidArgumentException if `$length` is not greater than 0.
     */
    private function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /**
     * Returns whether the SQL code is completely traversed.
     *
     * @return bool whether the current offset has reached the end of the SQL code.
     */
    private function isEof(): bool
    {
        return $this->offset >= $this->length;
    }
}
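The "getHasChildren() does not exist on null" finding in tokenize() most likely has two causes. `$this->token` is declared `?SqlToken`, and `$this->token[-1]` is read twice, so the `instanceof` check on the first read does not carry over to the second array access. A common way to satisfy analyzers is to read each value once into a local variable and test that local. The sketch below shows this pattern on a hypothetical `Node` class that stands in for SqlToken, whose code is not shown in this report. It assumes SqlToken exposes its children through `ArrayAccess`, with negative offsets counting from the end, as the `[-1]` accesses suggest. `Node` and `dropEmptyTail()` are illustrative names only, and the sketch needs PHP 8.0+.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in for SqlToken: children are reachable through ArrayAccess. */
final class Node implements ArrayAccess
{
    /** @var list<Node> */
    private array $children = [];

    public function getHasChildren(): bool
    {
        return $this->children !== [];
    }

    public function offsetExists(mixed $offset): bool
    {
        return $this->offsetGet($offset) !== null;
    }

    /** Negative offsets count from the end, so -1 is the last child; returns null when absent. */
    public function offsetGet(mixed $offset): ?Node
    {
        return $this->children[$this->normalize($offset)] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->children[] = $value;
        } else {
            $this->children[$this->normalize($offset)] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->children[$this->normalize($offset)]);
        $this->children = array_values($this->children); // keep indexes contiguous
    }

    private function normalize(int $offset): int
    {
        return $offset < 0 ? count($this->children) + $offset : $offset;
    }
}

/** Removes an empty trailing child, testing locals so the narrowed value is the one that gets used. */
function dropEmptyTail(?Node $root): void
{
    if ($root === null || !$root->getHasChildren()) {
        return;
    }

    $last = $root[-1]; // read the last child once

    if ($last instanceof Node && !$last->getHasChildren()) {
        unset($root[-1]);
    }
}

$root = new Node();
$statement = new Node();
$statement[] = new Node();
$root[] = $statement; // non-empty child
$root[] = new Node(); // empty trailing child
dropEmptyTail($root);
```

In tokenize(), the same idea would mean keeping the new root token in a local `$token`, assigning it to `$this->token`, and then testing `$last = $token[-1]` instead of indexing `$this->token` twice.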
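The listing for startsWithAnyLongest() depends on two details that are easy to miss. First, the candidates are grouped into a map keyed by length. Second, PHP arrays keep insertion order, so sorting the candidates longest-first before building the map makes the lookup try the longest lengths first. Below is a simplified standalone sketch of the same idea, assuming PHP 7.4+ with mbstring. `buildLongestFirstMap()` and `longestMatchAt()` are hypothetical helpers, not library functions. They take the SQL string and offset as arguments instead of using tokenizer state.

```php
<?php

declare(strict_types=1);

/**
 * Groups candidate strings as [length => [string => true]], longest length first.
 *
 * @param string[] $candidates
 * @return array<int, array<string, true>>
 */
function buildLongestFirstMap(array $candidates, bool $caseSensitive): array
{
    // Sort longest first; PHP arrays keep insertion order, so the map's length keys follow this order.
    usort(
        $candidates,
        static fn (string $a, string $b): int => mb_strlen($b, 'UTF-8') <=> mb_strlen($a, 'UTF-8'),
    );

    $map = [];

    foreach ($candidates as $candidate) {
        $key = $caseSensitive ? $candidate : mb_strtoupper($candidate, 'UTF-8');
        $map[mb_strlen($candidate, 'UTF-8')][$key] = true;
    }

    return $map;
}

/** Returns the length of the longest candidate found in $sql at $offset, or 0 when none matches. */
function longestMatchAt(string $sql, int $offset, array $map, bool $caseSensitive): int
{
    $sqlLength = mb_strlen($sql, 'UTF-8');

    foreach ($map as $length => $values) {
        if ($offset + $length > $sqlLength) {
            continue; // this candidate length would run past the end of the SQL
        }

        $slice = mb_substr($sql, $offset, $length, 'UTF-8');

        if (!$caseSensitive) {
            $slice = mb_strtoupper($slice, 'UTF-8');
        }

        if (isset($values[$slice])) {
            return $length;
        }
    }

    return 0;
}

$operators = buildLongestFirstMap(['<', '<=', '<>', '=', '||'], true);
echo longestMatchAt('a <> b', 2, $operators, true), PHP_EOL; // 2 ("<>" wins over "<")
```

BaseTokenizer::startsWithAnyLongest() takes `$with` by value and skips the conversion when its first element is already an array. Its `is_array(reset($with))` check therefore suggests that a map built once in this shape could be passed on every call, instead of re-sorting the candidate list each time.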