Passed
Push — master ( 1265f7...db8617 )
by Alexander
01:41
created

BaseTokenizer   B

Complexity

Total Complexity 49

Size/Duplication

Total Lines 452
Duplicated Lines 0 %

Test Coverage

Coverage 93.67%

Importance

Changes 1
Bugs 0 Features 0
Metric Value
eloc 154
c 1
b 0
f 0
dl 0
loc 452
ccs 148
cts 158
cp 0.9367
rs 8.48
wmc 49

11 Methods

Rating   Name   Duplication   Size   Complexity  
A tokenizeDelimitedString() 0 21 6
B tokenizeOperator() 0 81 9
B tokenize() 0 44 8
A __construct() 0 3 1
A addTokenFromBuffer() 0 18 4
A advance() 0 8 2
A isEof() 0 3 1
A setSql() 0 3 1
B startsWithAnyLongest() 0 34 7
A indexAfter() 0 19 4
A substring() 0 20 6

How to fix   Complexity   

Complex Class

Complex classes like BaseTokenizer often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use BaseTokenizer, and based on these observations, apply Extract Interface, too.
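
As a rough illustration of the Extract Class advice above, the sketch below pulls the substring cache out of the tokenizer into its own class. The class name `SqlSubstringReader` and its methods `length()` and `clearCache()` are hypothetical and do not exist in the library; the caching logic mirrors what `substring()` does in the listing, but treat this as one possible split under those assumptions, not the project's design.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical class extracted from BaseTokenizer (not part of the library).
 * It owns the SQL string, its length, and the substring cache.
 */
final class SqlSubstringReader
{
    private string $sql;
    private int $length;

    /** @var string[] cached substrings keyed by "offset,length,caseFlag". */
    private array $cache = [];

    public function __construct(string $sql)
    {
        $this->sql = $sql;
        $this->length = mb_strlen($sql, 'UTF-8');
    }

    public function length(): int
    {
        return $this->length;
    }

    /**
     * Returns $length characters starting at $offset, uppercased when
     * $caseSensitive is false, or '' if the range runs past the end.
     */
    public function substring(int $offset, int $length, bool $caseSensitive = true): string
    {
        if ($offset + $length > $this->length) {
            return '';
        }

        $key = $offset . ',' . $length;

        if (!isset($this->cache[$key . ',1'])) {
            $this->cache[$key . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }

        if (!$caseSensitive && !isset($this->cache[$key . ',0'])) {
            $this->cache[$key . ',0'] = mb_strtoupper($this->cache[$key . ',1'], 'UTF-8');
        }

        return $this->cache[$key . ',' . (int) $caseSensitive];
    }

    /** Drops all cached substrings, e.g. after the offset has advanced. */
    public function clearCache(): void
    {
        $this->cache = [];
    }
}

$reader = new SqlSubstringReader('select 1');
echo $reader->substring(0, 6, false), PHP_EOL; // prints "SELECT"
```

With such a class in place, the tokenizer's own `substring()` could delegate to the reader and `advance()` could call `clearCache()`, which would move the `$sql`, `$length` and `$substrings` fields out of BaseTokenizer.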

<?php

declare(strict_types=1);

namespace Yiisoft\Db\Sqlite\Token;

/**
 * BaseTokenizer splits an SQL query into individual SQL tokens.
 *
 * It can be used to obtain additional information from SQL code.
 *
 * Usage example:
 *
 * ```php
 * $tokenizer = new SqlTokenizer("SELECT * FROM user WHERE id = 1");
 * $root = $tokenizer->tokenize();
 * $sqlTokens = $root->getChildren();
 * ```
 *
 * Tokens are instances of {@see SqlToken}.
 */
abstract class BaseTokenizer
{
    /**
     * @var string SQL code.
     */
    private string $sql;

    /**
     * @var int SQL code string length.
     */
    protected int $length;

    /**
     * @var int SQL code string current offset.
     */
    protected int $offset;

    /**
     * @var \SplStack stack of active tokens.
     */
    private $tokenStack;

    /**
     * @var SqlToken|null active token. It is usually the top of the token stack.
     */
    private ?SqlToken $currentToken = null;

    /**
     * @var string[] cached substrings.
     */
    private array $substrings;

    /**
     * @var string current buffer value.
     */
    private string $buffer = '';

    /**
     * @var SqlToken|null resulting token of the last {@see tokenize()} call.
     */
    private ?SqlToken $token = null;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    /**
     * Tokenizes and returns a code type token.
     *
     * @return SqlToken code type token.
     */
    public function tokenize(): SqlToken
    {
        $this->length = mb_strlen($this->sql, 'UTF-8');
        $this->offset = 0;
        $this->substrings = [];
        $this->buffer = '';

        $this->token = new SqlToken();
        $this->token->setType(SqlToken::TYPE_CODE);
        $this->token->setContent($this->sql);

        $this->tokenStack = new \SplStack();
        $this->tokenStack->push($this->token);

        $tk = new SqlToken();
        $tk->setType(SqlToken::TYPE_STATEMENT);
        $this->token[] = $tk;

        $this->tokenStack->push($this->token[0]);
        $this->currentToken = $this->tokenStack->top();

        while (!$this->isEof()) {
            if ($this->isWhitespace($length) || $this->isComment($length)) {
                $this->addTokenFromBuffer();
                $this->advance($length);

                continue;
            }

            if ($this->tokenizeOperator($length) || $this->tokenizeDelimitedString($length)) {
                $this->advance($length);

                continue;
            }

            $this->buffer .= $this->substring(1);
            $this->advance(1);
        }
        $this->addTokenFromBuffer();
        // Scrutinizer issue (bug, ignorable by annotation): the method getHasChildren() does not exist on null.
        // If this is a false positive, it can be suppressed with the `@scrutinizer ignore-call` annotation.
        if ($this->token->getHasChildren() && !$this->token[-1]->getHasChildren()) {
            unset($this->token[-1]);
        }

        return $this->token;
    }

    /**
     * Returns whether there's whitespace at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's whitespace at the current offset.
     */
    abstract protected function isWhitespace(int &$length): bool;

    /**
     * Returns whether there's a comment at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's a comment at the current offset.
     */
    abstract protected function isComment(int &$length): bool;

    /**
     * Returns whether there's an operator at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     * It may also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an operator at the current offset.
     */
    abstract protected function isOperator(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's an identifier at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     * It may also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an identifier at the current offset.
     */
    abstract protected function isIdentifier(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's a string literal at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     * It may also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's a string literal at the current offset.
     */
    abstract protected function isStringLiteral(int &$length, ?string &$content): bool;

    /**
     * Returns whether the given string is a keyword.
     *
     * The method may set `$content` to a string that will be used as a token content.
     *
     * @param string $string string to be matched.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether the given string is a keyword.
     */
    abstract protected function isKeyword(string $string, ?string &$content): bool;

    /**
     * Sets the SQL code to be tokenized.
     *
     * @param string $sql SQL code.
     */
    public function setSql(string $sql): void
    {
        $this->sql = $sql;
    }

    /**
     * Returns whether the SQL code at the current offset starts with any of the given strings,
     * preferring the longest match.
     *
     * @param string[] $with strings to be tested. The method **will** modify this parameter to speed up lookups.
     * @param bool $caseSensitive whether to perform a case-sensitive comparison.
     * @param int|null $length length of the matched string.
     * @param string|null $content matched string.
     *
     * @return bool whether a match is found.
     */
    protected function startsWithAnyLongest(
        array &$with,
        bool $caseSensitive,
        ?int &$length = null,
        ?string &$content = null
    ): bool {
        if (empty($with)) {
            return false;
        }

        if (!is_array(reset($with))) {
            usort($with, function ($string1, $string2) {
                return mb_strlen($string2, 'UTF-8') - mb_strlen($string1, 'UTF-8');
            });

            $map = [];

            foreach ($with as $string) {
                $map[mb_strlen($string, 'UTF-8')][$caseSensitive ? $string : mb_strtoupper($string, 'UTF-8')] = true;
            }

            $with = $map;
        }
        foreach ($with as $testLength => $testValues) {
            $content = $this->substring($testLength, $caseSensitive);

            if (isset($testValues[$content])) {
                $length = $testLength;

                return true;
            }
        }

        return false;
    }

    /**
     * Returns a string of the given length starting with the specified offset.
     *
     * @param int $length string length to be returned.
     * @param bool $caseSensitive if it's `false`, the string will be uppercased.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return string result string, it may be empty if there's nothing to return.
     */
    protected function substring(int $length, bool $caseSensitive = true, ?int $offset = null)
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + $length > $this->length) {
            return '';
        }

        $cacheKey = $offset . ',' . $length;

        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }
        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /**
     * Returns an index after the given string in the SQL code starting with the specified offset.
     *
     * @param string $string string to be found.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return int index after the given string or end of string index.
     */
    protected function indexAfter(string $string, ?int $offset = null): int
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + mb_strlen($string, 'UTF-8') > $this->length) {
            return $this->length;
        }

        $afterIndexOf = mb_strpos($this->sql, $string, $offset, 'UTF-8');

        if ($afterIndexOf === false) {
            $afterIndexOf = $this->length;
        } else {
            $afterIndexOf += mb_strlen($string, 'UTF-8');
        }

        return $afterIndexOf;
    }

    /**
     * Determines whether there is a delimited string at the current offset and adds it to the token children.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether a delimited string (identifier or string literal) was found and added.
     */
    private function tokenizeDelimitedString(int &$length): bool
    {
        $isIdentifier = $this->isIdentifier($length, $content);
        $isStringLiteral = !$isIdentifier && $this->isStringLiteral($length, $content);

        if (!$isIdentifier && !$isStringLiteral) {
            return false;
        }

        $this->addTokenFromBuffer();

        $tk = new SqlToken();

        $tk->setType($isIdentifier ? SqlToken::TYPE_IDENTIFIER : SqlToken::TYPE_STRING_LITERAL);
        $tk->setContent(\is_string($content) ? $content : $this->substring($length));
        $tk->setStartOffset($this->offset);
        $tk->setEndOffset($this->offset + $length);

        $this->currentToken[] = $tk;

        return true;
    }

    /**
     * Determines whether there is an operator at the current offset and adds it to the token children.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether an operator was found and added.
     */
    private function tokenizeOperator(int &$length): bool
    {
        if (!$this->isOperator($length, $content)) {
            return false;
        }

        $this->addTokenFromBuffer();

        switch ($this->substring($length)) {
            case '(':
                $tk = new SqlToken();

                $tk->setType(SqlToken::TYPE_OPERATOR);
                $tk->setContent(\is_string($content) ? $content : $this->substring($length));
                $tk->setStartOffset($this->offset);
                $tk->setEndOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                $tk1 = new SqlToken();
                $tk1->setType(SqlToken::TYPE_PARENTHESIS);
                $this->currentToken[] = $tk1;

                $this->tokenStack->push($this->currentToken[-1]);
                $this->currentToken = $this->tokenStack->top();

                break;

            case ')':
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();

                $tk = new SqlToken();

                $tk->setType(SqlToken::TYPE_OPERATOR);
                $tk->setContent(')');
                $tk->setStartOffset($this->offset);
                $tk->setEndOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                break;
            case ';':
                // Scrutinizer issue (bug, ignorable by annotation): the method getHasChildren() does not exist on null.
                // If this is a false positive, it can be suppressed with the `@scrutinizer ignore-call` annotation.
                if (!$this->currentToken->getHasChildren()) {
                    break;
                }

                $tk = new SqlToken();

                $tk->setType(SqlToken::TYPE_OPERATOR);
                $tk->setContent(\is_string($content) ? $content : $this->substring($length));
                $tk->setStartOffset($this->offset);
                $tk->setEndOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();

                $tk1 = new SqlToken();
                $tk1->setType(SqlToken::TYPE_STATEMENT);

                $this->currentToken[] = $tk1;
                $this->tokenStack->push($this->currentToken[-1]);
                $this->currentToken = $this->tokenStack->top();

                break;
            default:
                $tk = new SqlToken();

                $tk->setType(SqlToken::TYPE_OPERATOR);
                $tk->setContent(\is_string($content) ? $content : $this->substring($length));
                $tk->setStartOffset($this->offset);
                $tk->setEndOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                break;
        }

        return true;
    }

    /**
     * Determines the type of the text in the buffer, tokenizes it and adds it to the token children.
     */
    private function addTokenFromBuffer(): void
    {
        if ($this->buffer === '') {
            return;
        }

        $isKeyword = $this->isKeyword($this->buffer, $content);

        $tk = new SqlToken();

        $tk->setType($isKeyword ? SqlToken::TYPE_KEYWORD : SqlToken::TYPE_TOKEN);
        $tk->setContent(\is_string($content) ? $content : $this->buffer);
        $tk->setStartOffset($this->offset - mb_strlen($this->buffer, 'UTF-8'));
        $tk->setEndOffset($this->offset);

        $this->currentToken[] = $tk;

        $this->buffer = '';
    }

    /**
     * Adds the specified length to the current offset.
     *
     * @param int $length number of characters to advance by.
     *
     * @throws \InvalidArgumentException if the length is not greater than 0.
     */
    private function advance(int $length): void
    {
        if ($length <= 0) {
            throw new \InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /**
     * Returns whether the SQL code is completely traversed.
     *
     * @return bool whether the current offset is at or past the end of the SQL code.
     */
    private function isEof(): bool
    {
        return $this->offset >= $this->length;
    }
}