Passed
Branch master (e82d75)
by Wilmer
07:17 queued 02:54
created

BaseTokenizer   C

Complexity

Total Complexity 53

Size/Duplication

Total Lines 429
Duplicated Lines 0 %

Test Coverage

Coverage 93.75%

Importance

Changes 0
Metric Value
wmc 53
eloc 142
c 0
b 0
f 0
dl 0
loc 429
ccs 135
cts 144
cp 0.9375
rs 6.96

11 Methods

Rating  Name                       Duplication  Size  Complexity
B       tokenize()                 0            46    9
A       isEof()                    0            3     1
A       setSql()                   0            3     1
A       advance()                  0            8     2
A       substring()                0            21    6
A       __construct()              0            6     1
A       indexAfter()               0            19    4
B       startsWithAnyLongest()     0            33    7
C       tokenizeOperator()         0            67    12
A       addTokenFromBuffer()       0            15    4
A       tokenizeDelimitedString()  0            18    6

How to fix   Complexity   

Complex Class

Complex classes like BaseTokenizer often do a lot of different things. To break such a class down, identify a cohesive component within it. A common way to find one is to look for fields and methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
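
As an illustration of Extract Class, the sketch below pulls the fields that travel together in BaseTokenizer (the SQL string, its length, the offset, and the substring cache) into a separate class. The name `SqlCursor` and its method set are hypothetical and not part of the package; the sketch only assumes the mbstring extension is available.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical class extracted from BaseTokenizer. It owns the SQL string,
 * the current offset and the substring cache, and nothing else.
 */
final class SqlCursor
{
    private int $length;
    private int $offset = 0;

    /** @var array<string, string> substrings cached for the current offset. */
    private array $substrings = [];

    public function __construct(private string $sql)
    {
        $this->length = mb_strlen($sql, 'UTF-8');
    }

    public function isEof(): bool
    {
        return $this->offset >= $this->length;
    }

    public function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = []; // cached values are only valid for one offset
    }

    public function substring(int $length, bool $caseSensitive = true): string
    {
        if ($this->offset + $length > $this->length) {
            return '';
        }

        $key = $length . ',' . (int) $caseSensitive;

        if (!isset($this->substrings[$key])) {
            $value = mb_substr($this->sql, $this->offset, $length, 'UTF-8');
            $this->substrings[$key] = $caseSensitive ? $value : mb_strtoupper($value, 'UTF-8');
        }

        return $this->substrings[$key];
    }
}

$cursor = new SqlCursor('select 1');
echo $cursor->substring(6, false), PHP_EOL; // SELECT
$cursor->advance(7);
echo $cursor->substring(1), PHP_EOL; // 1
```

With such a class in place, the tokenizer would hold one cursor object instead of four loosely related properties, which is the kind of reduction in responsibilities the refactoring aims for.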

While breaking up the class, it is a good idea to analyze how other classes use BaseTokenizer, and based on these observations, apply Extract Interface, too.
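
A minimal sketch of Extract Interface follows. It assumes that callers only need a `tokenize()` method; `TokenizerInterface`, `WhitespaceTokenizer` and `countTokens()` are hypothetical names introduced for the example, and the toy implementation returns plain strings rather than the `SqlToken` tree the real class produces.

```php
<?php

declare(strict_types=1);

/** Hypothetical interface: the subset of the tokenizer that callers rely on. */
interface TokenizerInterface
{
    /** @return string[] token contents (the real class returns a SqlToken tree). */
    public function tokenize(): array;
}

/** Toy implementation that splits on whitespace, only to make the sketch runnable. */
final class WhitespaceTokenizer implements TokenizerInterface
{
    public function __construct(private string $sql)
    {
    }

    public function tokenize(): array
    {
        $parts = preg_split('/\s+/', trim($this->sql), -1, PREG_SPLIT_NO_EMPTY);

        return $parts === false ? [] : $parts;
    }
}

/** A caller that depends on the interface, not on a concrete tokenizer class. */
function countTokens(TokenizerInterface $tokenizer): int
{
    return count($tokenizer->tokenize());
}

echo countTokens(new WhitespaceTokenizer('SELECT * FROM user')), PHP_EOL; // 4
```

Callers typed against the interface keep working when the concrete tokenizer is later split into smaller classes.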

<?php

declare(strict_types=1);

namespace Yiisoft\Db\Sqlite;

use SplStack;
use Yiisoft\Db\Exception\InvalidArgumentException;

use function is_array;
use function is_string;
use function mb_strlen;
use function mb_strpos;
use function mb_strtoupper;
use function mb_substr;
use function reset;
use function usort;

/**
 * BaseTokenizer splits an SQL query into individual SQL tokens.
 *
 * It can be used to obtain additional information from SQL code.
 *
 * Usage example:
 *
 * ```php
 * $tokenizer = new SqlTokenizer("SELECT * FROM user WHERE id = 1");
 * $root = $tokenizer->tokenize();
 * $sqlTokens = $root->getChildren();
 * ```
 *
 * Tokens are instances of {@see SqlToken}.
 */
abstract class BaseTokenizer
{
    /**
     * @var int SQL code string length.
     */
    protected int $length = 0;

    /**
     * @var int SQL code string current offset.
     */
    protected int $offset = 0;

    /**
     * @var SplStack of active tokens.
     *
     * @psalm-var SplStack<SqlToken>
     * @psalm-suppress PropertyNotSetInConstructor
     */
    private SplStack $tokenStack;

    /**
     * @psalm-var SqlToken|SqlToken[] active token. It's usually the top of the token stack.
     *
     * @psalm-suppress PropertyNotSetInConstructor
     */
    private array|SqlToken $currentToken;

    /**
     * @var string[] cached substrings.
     */
    private array $substrings = [];

    /**
     * @var string current buffer value.
     */
    private string $buffer = '';

    /**
     * @var SqlToken|null resulting token of the last {@see tokenize()} call.
     */
    private SqlToken|null $token = null;

    public function __construct(
        /**
         * @var string SQL code.
         */
        private string $sql
    ) {
    }

    /**
     * Tokenizes and returns a code type token.
     *
     * @throws InvalidArgumentException
     *
     * @return SqlToken code type token.
     *
     * @psalm-suppress MixedPropertyTypeCoercion
     */
    public function tokenize(): SqlToken
    {
        $this->length = mb_strlen($this->sql, 'UTF-8');
        $this->offset = 0;
        $this->substrings = [];
        $this->buffer = '';
        $this->token = (new SqlToken())->type(SqlToken::TYPE_CODE)->content($this->sql);
        $this->tokenStack = new SplStack();
        $this->tokenStack->push($this->token);
        $this->token[] = (new SqlToken())->type(SqlToken::TYPE_STATEMENT);
        $this->tokenStack->push($this->token[0]);
        /** @var SqlToken */
        $this->currentToken = $this->tokenStack->top();
Bug: The property currentToken does not seem to exist on Yiisoft\Db\Sqlite\SqlToken.
        $length = 0;

        while (!$this->isEof()) {
            if ($this->isWhitespace($length) || $this->isComment($length)) {
                $this->addTokenFromBuffer();
                $this->advance($length);

                continue;
            }

            /** @psalm-suppress ConflictingReferenceConstraint */
            if ($this->tokenizeOperator($length) || $this->tokenizeDelimitedString($length)) {
                $this->advance($length);

                continue;
            }

            $this->buffer .= $this->substring(1);
            $this->advance(1);
        }

        $this->addTokenFromBuffer();

        if (
            $this->token->getHasChildren() &&
            $this->token[-1] instanceof SqlToken &&
            !$this->token[-1]->getHasChildren()
Bug: The method getHasChildren() does not exist on null. (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

            !$this->token[-1]->/** @scrutinizer ignore-call */ getHasChildren()

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
        ) {
            unset($this->token[-1]);
        }

        return $this->token;
    }

    /**
     * Returns whether there's whitespace at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's whitespace at the current offset.
     */
    abstract protected function isWhitespace(int &$length): bool;

    /**
     * Returns whether there's a comment at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length length of the matched string.
     *
     * @return bool whether there's a comment at the current offset.
     */
    abstract protected function isComment(int &$length): bool;

    /**
     * Returns whether there's an operator at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an operator at the current offset.
     */
    abstract protected function isOperator(int &$length, string|null &$content): bool;

    /**
     * Returns whether there's an identifier at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's an identifier at the current offset.
     */
    abstract protected function isIdentifier(int &$length, string|null &$content): bool;

    /**
     * Returns whether there's a string literal at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as a token content.
     *
     * @param int $length length of the matched string.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether there's a string literal at the current offset.
     */
    abstract protected function isStringLiteral(int &$length, string|null &$content): bool;

    /**
     * Returns whether the given string is a keyword.
     *
     * The method may set `$content` to a string that will be used as a token content.
     *
     * @param string $string string to be matched.
     * @param string|null $content optional content instead of the matched string.
     *
     * @return bool whether the given string is a keyword.
     */
    abstract protected function isKeyword(string $string, string|null &$content): bool;

    /**
     * Sets the SQL code that the next {@see tokenize()} call will process.
     */
    public function setSql(string $sql): void
    {
        $this->sql = $sql;
    }

    /**
     * Returns whether the longest common prefix is equal to the SQL code of the same length at the current offset.
     *
     * @param array $with strings to be tested. The method reindexes its local copy by string length to speed up lookups.
     * @param bool $caseSensitive whether to perform a case-sensitive comparison.
     * @param int $length length of the matched string.
     * @param string|null $content matched string.
     *
     * @return bool whether a match is found.
     *
     * @psalm-param array<array-key, string> $with
     */
    protected function startsWithAnyLongest(
        array $with,
        bool $caseSensitive,
        int &$length,
        string|null &$content = null
    ): bool {
        if (empty($with)) {
            return false;
        }

        if (!is_array(reset($with))) {
            usort($with, static fn (string $string1, string $string2) => mb_strlen($string2, 'UTF-8') - mb_strlen($string1, 'UTF-8'));

            $map = [];

            foreach ($with as $string) {
                $map[mb_strlen($string, 'UTF-8')][$caseSensitive ? $string : mb_strtoupper($string, 'UTF-8')] = true;
            }

            $with = $map;
        }

        /** @psalm-var array<int, array> $with */
        foreach ($with as $testLength => $testValues) {
            $content = $this->substring($testLength, $caseSensitive);

            if (isset($testValues[$content])) {
                $length = $testLength;
                return true;
            }
        }

        return false;
    }

    /**
     * Returns a string of the given length starting at the specified offset.
     *
     * @param int $length string length to be returned.
     * @param bool $caseSensitive if it's `false`, the string will be uppercase.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return string result string, it may be empty if there's nothing to return.
     */
    protected function substring(int $length, bool $caseSensitive = true, int|null $offset = null): string
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + $length > $this->length) {
            return '';
        }

        $cacheKey = $offset . ',' . $length;

        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }

        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /**
     * Returns an index after the given string in the SQL code starting at the specified offset.
     *
     * @param string $string string to be found.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return int index after the given string or end of string index.
     */
    protected function indexAfter(string $string, int|null $offset = null): int
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + mb_strlen($string, 'UTF-8') > $this->length) {
            return $this->length;
        }

        $afterIndexOf = mb_strpos($this->sql, $string, $offset, 'UTF-8');

        if ($afterIndexOf === false) {
            $afterIndexOf = $this->length;
        } else {
            $afterIndexOf += mb_strlen($string, 'UTF-8');
        }

        return $afterIndexOf;
    }

    /**
     * Determines whether there is a delimited string at the current offset and adds it to the token children.
     */
    private function tokenizeDelimitedString(int &$length): bool
    {
        $isIdentifier = $this->isIdentifier($length, $content);
        $isStringLiteral = !$isIdentifier && $this->isStringLiteral($length, $content);

        if (!$isIdentifier && !$isStringLiteral) {
            return false;
        }

        $this->addTokenFromBuffer();

        $this->currentToken[] = (new SqlToken())
            ->type($isIdentifier ? SqlToken::TYPE_IDENTIFIER : SqlToken::TYPE_STRING_LITERAL)
            ->content(is_string($content) ? $content : $this->substring($length))
            ->startOffset($this->offset)
            ->endOffset($this->offset + $length);

        return true;
    }

    /**
     * Determines whether there is an operator at the current offset and adds it to the token children.
     */
    private function tokenizeOperator(int &$length): bool
    {
        if (!$this->isOperator($length, $content)) {
            return false;
        }

        $this->addTokenFromBuffer();

        switch ($this->substring($length)) {
            case '(':
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);
                $this->currentToken[] = (new SqlToken())->type(SqlToken::TYPE_PARENTHESIS);

                if ($this->currentToken[-1] !== null) {
                    $this->tokenStack->push($this->currentToken[-1]);
                }

                $this->currentToken = $this->tokenStack->top();

                break;

            case ')':
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(')')
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                break;
            case ';':
                if ($this->currentToken instanceof SqlToken && !$this->currentToken->getHasChildren()) {
                    break;
                }

                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();
                $this->currentToken[] = (new SqlToken())->type(SqlToken::TYPE_STATEMENT);

                if ($this->currentToken[-1] instanceof SqlToken) {
                    $this->tokenStack->push($this->currentToken[-1]);
                }

                $this->currentToken = $this->tokenStack->top();

                break;
            default:
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                break;
        }

        return true;
    }

    /**
     * Determines the type of the text in the buffer, tokenizes it and adds it to the token children.
     */
    private function addTokenFromBuffer(): void
    {
        if ($this->buffer === '') {
            return;
        }

        $isKeyword = $this->isKeyword($this->buffer, $content);

        $this->currentToken[] = (new SqlToken())
            ->type($isKeyword ? SqlToken::TYPE_KEYWORD : SqlToken::TYPE_TOKEN)
            ->content(is_string($content) ? $content : $this->buffer)
            ->startOffset($this->offset - mb_strlen($this->buffer, 'UTF-8'))
            ->endOffset($this->offset);

        $this->buffer = '';
    }

    /**
     * Adds the specified length to the current offset.
     *
     * @throws InvalidArgumentException
     */
    private function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /**
     * Returns whether the SQL code is completely traversed.
     */
    private function isEof(): bool
    {
        return $this->offset >= $this->length;
    }
}
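
The longest-match lookup performed by startsWithAnyLongest() can be tried in isolation. The sketch below is a simplified standalone variant, not the class's own code: the function names `buildLengthMap()` and `longestMatchAt()` are hypothetical, and it orders the length buckets with `krsort()` instead of sorting the candidate strings with `usort()` as the listing does. It assumes the mbstring extension.

```php
<?php

declare(strict_types=1);

/**
 * Groups candidates by character length, longest bucket first, so that each
 * lookup is a single hash check per length.
 *
 * @param string[] $candidates
 *
 * @return array<int, array<string, true>>
 */
function buildLengthMap(array $candidates, bool $caseSensitive): array
{
    $map = [];

    foreach ($candidates as $candidate) {
        $key = $caseSensitive ? $candidate : mb_strtoupper($candidate, 'UTF-8');
        $map[mb_strlen($candidate, 'UTF-8')][$key] = true;
    }

    krsort($map); // try the longest candidates first

    return $map;
}

/**
 * Returns the longest candidate found at $offset in $text, or null if none matches.
 *
 * @param array<int, array<string, true>> $map result of buildLengthMap().
 */
function longestMatchAt(string $text, int $offset, array $map, bool $caseSensitive): string|null
{
    $textLength = mb_strlen($text, 'UTF-8');

    foreach ($map as $length => $values) {
        if ($offset + $length > $textLength) {
            continue; // not enough characters left for this length
        }

        $slice = mb_substr($text, $offset, $length, 'UTF-8');

        if (!$caseSensitive) {
            $slice = mb_strtoupper($slice, 'UTF-8');
        }

        if (isset($values[$slice])) {
            return $slice;
        }
    }

    return null;
}

$map = buildLengthMap(['<', '<=', '<>', '!='], true);
echo longestMatchAt('a <= b', 2, $map, true), PHP_EOL; // <=
echo longestMatchAt('a < b', 2, $map, true), PHP_EOL;  // <
```

Checking the longer buckets first is what keeps `<=` from being split into `<` followed by `=`.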