Passed
Push — master ( 21771a...749afb )
by Wilmer
18:31 queued 03:34
created

BaseTokenizer   B

Complexity

Total Complexity 49

Size/Duplication

Total Lines 449
Duplicated Lines 0 %

Importance

Changes 0
Metric  Value
wmc     49
eloc    153
dl      0
loc     449
rs      8.48
c       0
b       0
f       0

11 Methods

Rating  Name                       Duplication  Size  Complexity
A       isEof()                    0            3     1
A       setSql()                   0            3     1
A       advance()                  0            8     2
A       substring()                0            20    6
A       __construct()              0            3     1
A       indexAfter()               0            19    4
B       tokenize()                 0            45    8
B       startsWithAnyLongest()     0            34    7
B       tokenizeOperator()         0            78    9
A       addTokenFromBuffer()       0            17    4
A       tokenizeDelimitedString()  0            20    6

How to fix: Complexity

Complex Class

Complex classes like BaseTokenizer often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes. In BaseTokenizer, for example, tokenize(), tokenizeOperator() and tokenizeDelimitedString() share the "tokenize" prefix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
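As a minimal sketch of Extract Class, assume the SQL string, its length, the current offset and the substring cache form the cohesive component. The `SqlCursor` class and its `getOffset()` method are hypothetical names, not part of the analyzed package; the method bodies only mirror the `substring()`, `advance()` and `isEof()` logic from the listing, and the global `\InvalidArgumentException` stands in for the package's own exception class.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extracted class: owns the SQL string, the read offset and the substring cache.
 */
final class SqlCursor
{
    private string $sql;
    private int $length;
    private int $offset = 0;

    /** @var array<string, string> substrings cached for the current offset */
    private array $substrings = [];

    public function __construct(string $sql)
    {
        $this->sql = $sql;
        $this->length = mb_strlen($sql, 'UTF-8');
    }

    public function getOffset(): int
    {
        return $this->offset;
    }

    public function isEof(): bool
    {
        return $this->offset >= $this->length;
    }

    /** Returns `$length` characters at the current offset, uppercased when `$caseSensitive` is false. */
    public function substring(int $length, bool $caseSensitive = true): string
    {
        if ($this->offset + $length > $this->length) {
            return '';
        }

        $key = $length . ',' . (int) $caseSensitive;

        if (!isset($this->substrings[$key])) {
            $value = mb_substr($this->sql, $this->offset, $length, 'UTF-8');
            $this->substrings[$key] = $caseSensitive ? $value : mb_strtoupper($value, 'UTF-8');
        }

        return $this->substrings[$key];
    }

    /** Moves the offset forward and drops the cache, which is only valid for one offset. */
    public function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }
}

$cursor = new SqlCursor('select 1');
$first = $cursor->substring(6, false); // case-insensitive read at offset 0
$cursor->advance(7);
$rest = $cursor->substring(1);         // read at offset 7
```

With a class like this, the tokenizer would hold one cursor object instead of four loosely related properties, which is the kind of split the Extract Class advice describes.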

While breaking up the class, it is a good idea to analyze how other classes use BaseTokenizer, and based on these observations, apply Extract Interface, too.
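A minimal sketch of Extract Interface, assuming callers only need `setSql()` and `tokenize()`. `TokenizerInterface`, `WhitespaceTokenizer` and `countTokens()` are hypothetical names introduced for illustration. To stay self-contained, `tokenize()` returns a plain list of strings here, whereas the real method returns an `SqlToken`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical interface that exposes only what callers use.
 */
interface TokenizerInterface
{
    public function setSql(string $sql): void;

    /**
     * @return string[] simplified stand-in for the SqlToken tree.
     */
    public function tokenize(): array;
}

/**
 * Toy implementation, present only to make the sketch runnable.
 */
final class WhitespaceTokenizer implements TokenizerInterface
{
    private string $sql = '';

    public function setSql(string $sql): void
    {
        $this->sql = $sql;
    }

    public function tokenize(): array
    {
        $parts = preg_split('/\s+/u', $this->sql, -1, PREG_SPLIT_NO_EMPTY);

        return $parts === false ? [] : $parts;
    }
}

/**
 * The caller depends on the interface, not on a concrete tokenizer class.
 */
function countTokens(TokenizerInterface $tokenizer, string $sql): int
{
    $tokenizer->setSql($sql);

    return count($tokenizer->tokenize());
}

$count = countTokens(new WhitespaceTokenizer(), 'SELECT * FROM user');
```

Type-hinting against the interface lets a caller swap tokenizers without touching the abstract base class.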

<?php

declare(strict_types=1);

namespace Yiisoft\Db\Sqlite;

use SplStack;
use Yiisoft\Db\Exception\InvalidArgumentException;

use function is_array;
use function is_string;
use function mb_strlen;
use function mb_strpos;
use function mb_strtoupper;
use function mb_substr;
use function reset;
use function usort;

/**
 * BaseTokenizer splits an SQL query into individual SQL tokens.
 *
 * It can be used to obtain additional information from SQL code.
 *
 * Usage example:
 *
 * ```php
 * $tokenizer = new SqlTokenizer("SELECT * FROM user WHERE id = 1");
 * $root = $tokenizer->tokenize();
 * $sqlTokens = $root->getChildren();
 * ```
 *
 * Tokens are instances of {@see SqlToken}.
 */
abstract class BaseTokenizer
{
    /**
     * @var string SQL code.
     */
    private string $sql;

    /**
     * @var int SQL code string length.
     */
    protected int $length;

    /**
     * @var int SQL code string current offset.
     */
    protected int $offset;

    /**
     * @var SplStack stack of active tokens.
     */
    private SplStack $tokenStack;

    /**
     * @var SqlToken|null active token. It's usually the top of the token stack.
     */
    private ?SqlToken $currentToken = null;

    /**
     * @var string[] cached substrings.
     */
    private array $substrings;

    /**
     * @var string current buffer value.
     */
    private string $buffer = '';

    /**
     * @var SqlToken|null resulting token of the last {@see tokenize()} call.
     */
    private ?SqlToken $token = null;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }

    /**
     * Tokenizes and returns a code type token.
     *
     * @return SqlToken code type token.
     */
    public function tokenize(): SqlToken
    {
        $this->length = mb_strlen($this->sql, 'UTF-8');
        $this->offset = 0;
        $this->substrings = [];
        $this->buffer = '';

        $this->token = (new SqlToken())
            ->type(SqlToken::TYPE_CODE)
            ->content($this->sql);

        $this->tokenStack = new SplStack();
        $this->tokenStack->push($this->token);

        $tk = (new SqlToken())
            ->type(SqlToken::TYPE_STATEMENT);

        $this->token[] = $tk;

        $this->tokenStack->push($this->token[0]);
        $this->currentToken = $this->tokenStack->top();

        while (!$this->isEof()) {
            if ($this->isWhitespace($length) || $this->isComment($length)) {
                $this->addTokenFromBuffer();
                $this->advance($length);

                continue;
            }

            if ($this->tokenizeOperator($length) || $this->tokenizeDelimitedString($length)) {
                $this->advance($length);

                continue;
            }

            $this->buffer .= $this->substring(1);
            $this->advance(1);
        }
        $this->addTokenFromBuffer();
        // Scrutinizer reports "The method getHasChildren() does not exist on null" for this line.
        // The ignore-call annotation below is the suppression it suggests for a false positive.
        if ($this->token->getHasChildren() && !$this->token[-1]->/** @scrutinizer ignore-call */ getHasChildren()) {
            unset($this->token[-1]);
        }

        return $this->token;
    }

    /**
     * Returns whether there's whitespace at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int|null $length length of the matched string.
     *
     * @return bool whether there's whitespace at the current offset.
     */
    abstract protected function isWhitespace(?int &$length): bool;

    /**
     * Returns whether there's a comment at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's a comment at the current offset.
     */
    abstract protected function isComment(int &$length): bool;

    /**
     * Returns whether there's an operator at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an operator at the current offset.
     */
    abstract protected function isOperator(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's an identifier at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an identifier at the current offset.
     */
    abstract protected function isIdentifier(int &$length, ?string &$content): bool;

    /**
     * Returns whether there's a string literal at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's a string literal at the current offset.
     */
    abstract protected function isStringLiteral(int &$length, ?string &$content): bool;

    /**
     * Returns whether the given string is a keyword.
     *
     * The method may set `$content` to a string that will be used as the token content.
     *
     * @param string $string string to be matched.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether the given string is a keyword.
     */
    abstract protected function isKeyword(string $string, ?string &$content): bool;

    /**
     * Sets the SQL code to be tokenized.
     *
     * @param string $sql SQL code.
     */
    public function setSql(string $sql): void
    {
        $this->sql = $sql;
    }

    /**
     * Returns whether any of the given strings matches the SQL code at the current offset, trying the longest
     * strings first.
     *
     * @param string[] $with strings to be tested. The method **will** modify this parameter to speed up lookups.
     * @param bool $caseSensitive whether to perform a case sensitive comparison.
     * @param int|null $length length of the matched string.
     * @param string|null $content matched string.
     *
     * @return bool whether a match is found.
     */
    protected function startsWithAnyLongest(
        array &$with,
        bool $caseSensitive,
        ?int &$length = null,
        ?string &$content = null
    ): bool {
        if (empty($with)) {
            return false;
        }

        // On the first call, convert the flat list into a `[length => [string => true]]` map, longest first.
        if (!is_array(reset($with))) {
            usort($with, static function ($string1, $string2) {
                return mb_strlen($string2, 'UTF-8') - mb_strlen($string1, 'UTF-8');
            });

            $map = [];

            foreach ($with as $string) {
                $map[mb_strlen($string, 'UTF-8')][$caseSensitive ? $string : mb_strtoupper($string, 'UTF-8')] = true;
            }

            $with = $map;
        }

        // Probe each candidate length, longest first, against the SQL code at the current offset.
        foreach ($with as $testLength => $testValues) {
            $content = $this->substring($testLength, $caseSensitive);

            if (isset($testValues[$content])) {
                $length = $testLength;

                return true;
            }
        }

        return false;
    }

    /**
     * Returns a string of the given length starting at the specified offset.
     *
     * @param int $length string length to be returned.
     * @param bool $caseSensitive if it's `false`, the string will be uppercased.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return string result string, it may be empty if there's nothing to return.
     */
    protected function substring(int $length, bool $caseSensitive = true, ?int $offset = null): string
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + $length > $this->length) {
            return '';
        }

        // Cache keys end in `,1` for the original case and in `,0` for the uppercased variant.
        $cacheKey = $offset . ',' . $length;

        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }
        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /**
     * Returns the index after the given string in the SQL code, searching from the specified offset.
     *
     * @param string $string string to be found.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return int index after the given string or end of string index.
     */
    protected function indexAfter(string $string, ?int $offset = null): int
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + mb_strlen($string, 'UTF-8') > $this->length) {
            return $this->length;
        }

        $afterIndexOf = mb_strpos($this->sql, $string, $offset, 'UTF-8');

        if ($afterIndexOf === false) {
            $afterIndexOf = $this->length;
        } else {
            $afterIndexOf += mb_strlen($string, 'UTF-8');
        }

        return $afterIndexOf;
    }

    /**
     * Determines whether there is a delimited string at the current offset and adds it to the token children.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether a delimited string was found.
     */
    private function tokenizeDelimitedString(int &$length): bool
    {
        $isIdentifier = $this->isIdentifier($length, $content);
        $isStringLiteral = !$isIdentifier && $this->isStringLiteral($length, $content);

        if (!$isIdentifier && !$isStringLiteral) {
            return false;
        }

        $this->addTokenFromBuffer();

        $tk = (new SqlToken())
            ->type($isIdentifier ? SqlToken::TYPE_IDENTIFIER : SqlToken::TYPE_STRING_LITERAL)
            ->content(is_string($content) ? $content : $this->substring($length))
            ->startOffset($this->offset)
            ->endOffset($this->offset + $length);

        $this->currentToken[] = $tk;

        return true;
    }

    /**
     * Determines whether there is an operator at the current offset and adds it to the token children.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether an operator was found.
     */
    private function tokenizeOperator(int &$length): bool
    {
        if (!$this->isOperator($length, $content)) {
            return false;
        }

        $this->addTokenFromBuffer();

        switch ($this->substring($length)) {
            case '(':
                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                $tk1 = (new SqlToken())
                    ->type(SqlToken::TYPE_PARENTHESIS);

                $this->currentToken[] = $tk1;

                $this->tokenStack->push($this->currentToken[-1]);
                $this->currentToken = $this->tokenStack->top();

                break;

            case ')':
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();

                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(')')
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                break;

            case ';':
                // Scrutinizer reports "The method getHasChildren() does not exist on null" for this line.
                // The ignore-call annotation below is the suppression it suggests for a false positive.
                if (!$this->currentToken->/** @scrutinizer ignore-call */ getHasChildren()) {
                    break;
                }

                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();

                $tk1 = (new SqlToken())
                    ->type(SqlToken::TYPE_STATEMENT);

                $this->currentToken[] = $tk1;
                $this->tokenStack->push($this->currentToken[-1]);
                $this->currentToken = $this->tokenStack->top();

                break;

            default:
                $tk = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                $this->currentToken[] = $tk;

                break;
        }

        return true;
    }

    /**
     * Determines the type of the text in the buffer, tokenizes it and adds it to the token children.
     */
    private function addTokenFromBuffer(): void
    {
        if ($this->buffer === '') {
            return;
        }

        $isKeyword = $this->isKeyword($this->buffer, $content);

        $tk = (new SqlToken())
            ->type($isKeyword ? SqlToken::TYPE_KEYWORD : SqlToken::TYPE_TOKEN)
            ->content(is_string($content) ? $content : $this->buffer)
            ->startOffset($this->offset - mb_strlen($this->buffer, 'UTF-8'))
            ->endOffset($this->offset);

        $this->currentToken[] = $tk;

        $this->buffer = '';
    }

    /**
     * Adds the specified length to the current offset.
     *
     * @param int $length number of characters to advance, must be greater than 0.
     *
     * @throws InvalidArgumentException if `$length` is not greater than 0.
     */
    private function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /**
     * Returns whether the SQL code is completely traversed.
     *
     * @return bool
     */
    private function isEof(): bool
    {
        return $this->offset >= $this->length;
    }
}
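
The longest-match lookup in `startsWithAnyLongest()` can be tried in isolation with a sketch like the one below. `matchLongestPrefix()` is a hypothetical standalone function, not part of the class. It groups candidates by length as the listing does, but sorts the groups with `krsort()` instead of `usort()` and reads from a plain string rather than from the tokenizer's offset and cache.

```php
<?php

declare(strict_types=1);

/**
 * Returns the longest candidate that `$subject` starts with, or null if none matches.
 * With `$caseSensitive` set to false, the uppercased match is returned.
 *
 * @param string[] $candidates
 */
function matchLongestPrefix(string $subject, array $candidates, bool $caseSensitive = true): ?string
{
    // Group candidates by character length: [length => [candidate => true]].
    $byLength = [];

    foreach ($candidates as $candidate) {
        $key = $caseSensitive ? $candidate : mb_strtoupper($candidate, 'UTF-8');
        $byLength[mb_strlen($candidate, 'UTF-8')][$key] = true;
    }

    // Probe the longest lengths first so that "<=" is preferred over "<".
    krsort($byLength);

    foreach ($byLength as $length => $values) {
        $prefix = mb_substr($subject, 0, $length, 'UTF-8');

        if (!$caseSensitive) {
            $prefix = mb_strtoupper($prefix, 'UTF-8');
        }

        if (isset($values[$prefix])) {
            return $prefix;
        }
    }

    return null;
}

$operator = matchLongestPrefix('<= 10', ['<', '<=', '<>', '=']);
$keyword = matchLongestPrefix('not null', ['NOT', 'NOT NULL'], false);
$none = matchLongestPrefix('abc', ['<', '=']);
```

Grouping by length means each probe costs one substring read and one hash lookup per distinct length, rather than one comparison per candidate.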