Passed
Push — master ( f6ff88...4766fa )
by Wilmer
03:54
created

AbstractTokenizer::tokenizeOperator()   C

Complexity

Conditions 12
Paths 8

Size

Total Lines 67
Code Lines 47

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 46
CRAP Score 12.0013

Importance

Changes 0
Metric  Value
eloc    47
dl      0
loc     67
ccs     46
cts     47
cp      0.9787
rs      6.9666
c       0
b       0
f       0
cc      12
nc      8
nop     1
crap    12.0013
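
The CRAP score in the table can be roughly reproduced from the complexity (`cc`) and coverage (`cp`) values. The sketch below assumes the report uses the standard CRAP formula, complexity² × (1 − coverage)³ + complexity, and that `cp` is `ccs / cts`; the function name `crapScore` is hypothetical and not part of any tool.

```php
<?php

declare(strict_types=1);

/**
 * Standard CRAP formula (assumed, not confirmed by the report):
 * complexity^2 * (1 - coverage)^3 + complexity.
 *
 * @param int   $complexity Cyclomatic complexity of the method (`cc`).
 * @param float $coverage   Covered fraction between 0.0 and 1.0 (`cp`).
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// cc = 12, ccs / cts = 46 / 47 as listed in the metrics table.
printf("%.4f\n", crapScore(12, 46 / 47));
```

This yields about 12.0014, which matches the reported 12.0013 up to rounding in the last digit. With coverage this high, the score is dominated by the complexity term, so reducing the number of branches matters far more here than adding tests.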

How to fix

Long Method

Small methods make your code easier to understand, particularly when combined with a good name. A small method is also usually much easier to name well.

For example, if you find yourself adding comments to a method's body, that is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for its name.

Commonly applied refactorings include:
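
Extract Method is one such refactoring. In `tokenizeOperator()` the same operator-token construction is repeated in several `case` branches, which makes it a natural candidate. The following is a minimal, self-contained sketch of the idea only: `Token`, `OperatorTokenFactory` and `createOperatorToken()` are hypothetical stand-ins, not part of the library, and the real `SqlToken` API is not reproduced here.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for SqlToken, reduced to what the sketch needs.
final class Token
{
    public function __construct(
        public string $type,
        public string $content,
        public int $startOffset,
        public int $endOffset,
    ) {
    }
}

// Hypothetical holder for the extracted method; in the real class this
// would be a private method on the tokenizer itself.
final class OperatorTokenFactory
{
    public function __construct(private string $sql)
    {
    }

    /**
     * Extracted method: builds the operator token that each branch
     * would otherwise construct inline.
     */
    public function createOperatorToken(int $offset, int $length, ?string $content = null): Token
    {
        return new Token(
            'operator',
            // Fall back to the matched substring when no content override is given.
            $content ?? mb_substr($this->sql, $offset, $length, 'UTF-8'),
            $offset,
            $offset + $length,
        );
    }
}

$factory = new OperatorTokenFactory('a >= 1');
$token = $factory->createOperatorToken(2, 2);

echo $token->content, ' [', $token->startOffset, ', ', $token->endOffset, ")\n";
```

With a helper like this, each branch of the `switch` shrinks to a single call plus its stack handling, which reduces the line count and leaves the control flow as the only thing left to read.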

<?php

declare(strict_types=1);

namespace Yiisoft\Db\Sqlite;

use SplStack;
use Yiisoft\Db\Exception\InvalidArgumentException;

use function is_array;
use function is_string;
use function mb_strlen;
use function mb_strpos;
use function mb_strtoupper;
use function mb_substr;
use function reset;
use function usort;

/**
 * Splits an SQL query into individual SQL tokens.
 *
 * It can be used to obtain additional information from SQL code.
 *
 * Usage example:
 *
 * ```php
 * $tokenizer = new SqlTokenizer("SELECT * FROM {{%user}} WHERE [[id]] = 1");
 * $root = $tokenizer->tokenize();
 * $sqlTokens = $root->getChildren();
 * ```
 *
 * Tokens are instances of {@see SqlToken}.
 */
abstract class AbstractTokenizer
{
    /**
     * @var int SQL code string length.
     */
    protected int $length = 0;

    /**
     * @var int SQL code string current offset.
     */
    protected int $offset = 0;

    /**
     * @var SplStack Stack of active tokens.
     *
     * @psalm-var SplStack<SqlToken>
     * @psalm-suppress PropertyNotSetInConstructor
     */
    private SplStack $tokenStack;

    /**
     * @var array|SqlToken Active token. It's usually the top of the token stack.
     *
     * @psalm-var SqlToken|SqlToken[]
     * @psalm-suppress PropertyNotSetInConstructor
     */
    private array|SqlToken $currentToken;

    /**
     * @var array Cached substrings.
     *
     * @psalm-var string[]
     */
    private array $substrings = [];

    /**
     * @var string Buffer for the current token.
     */
    private string $buffer = '';

    public function __construct(private string $sql)
    {
    }

    /**
     * Tokenizes and returns a code type token.
     *
     * @throws InvalidArgumentException If the SQL code is invalid.
     *
     * @return SqlToken Code type token.
     *
     * @psalm-suppress MixedPropertyTypeCoercion
     */
    public function tokenize(): SqlToken
    {
        $this->length = mb_strlen($this->sql, 'UTF-8');
        $this->offset = 0;
        $this->substrings = [];
        $this->buffer = '';

        $token = (new SqlToken())->type(SqlToken::TYPE_CODE)->content($this->sql);

        $this->tokenStack = new SplStack();
        $this->tokenStack->push($token);

        $token[] = (new SqlToken())->type(SqlToken::TYPE_STATEMENT);

        $this->tokenStack->push($token[0]);
        /** @psalm-var SqlToken */
        $this->currentToken = $this->tokenStack->top();
        $length = 0;

        while (!$this->isEof()) {
            if ($this->isWhitespace($length) || $this->isComment($length)) {
                $this->addTokenFromBuffer();
                $this->advance($length);

                continue;
            }

            /** @psalm-suppress ConflictingReferenceConstraint */
            if ($this->tokenizeOperator($length) || $this->tokenizeDelimitedString($length)) {
                $this->advance($length);

                continue;
            }

            $this->buffer .= $this->substring(1);
            $this->advance(1);
        }

        $this->addTokenFromBuffer();

        // Scrutinizer reports on the last condition below: "The method getHasChildren() does not
        // exist on null." The preceding `instanceof SqlToken` check appears to guard the call, so
        // this looks like a false positive; it can be silenced with `/** @scrutinizer ignore-call */`.
        if (
            $token->getHasChildren() &&
            $token[-1] instanceof SqlToken &&
            !$token[-1]->getHasChildren()
        ) {
            unset($token[-1]);
        }

        return $token;
    }

    /**
     * Returns whether there's a space or blank at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length Length of the matched string.
     *
     * @return bool Whether there's a space or blank at the current offset.
     */
    abstract protected function isWhitespace(int &$length): bool;

    /**
     * Returns whether there's a comment at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string.
     *
     * @param int $length Length of the matched string.
     *
     * @return bool Whether there's a comment at the current offset.
     */
    abstract protected function isComment(int &$length): bool;

    /**
     * Returns whether there's an operator at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length Length of the matched string.
     * @param string|null $content Optional content instead of the matched string.
     *
     * @return bool Whether there's an operator at the current offset.
     */
    abstract protected function isOperator(int &$length, string|null &$content): bool;

    /**
     * Returns whether there's an identifier at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length Length of the matched string.
     * @param string|null $content Optional content instead of the matched string.
     *
     * @return bool Whether there's an identifier at the current offset.
     */
    abstract protected function isIdentifier(int &$length, string|null &$content): bool;

    /**
     * Returns whether there's a string literal at the current offset.
     *
     * If this method returns `true`, it has to set the `$length` parameter to the length of the matched string. It may
     * also set `$content` to a string that will be used as the token content.
     *
     * @param int $length Length of the matched string.
     * @param string|null $content Optional content instead of the matched string.
     *
     * @return bool Whether there's a string literal at the current offset.
     */
    abstract protected function isStringLiteral(int &$length, string|null &$content): bool;

    /**
     * Returns whether the given string is a keyword.
     *
     * The method may set `$content` to a string that will be used as the token content.
     *
     * @param string $string String to be matched.
     * @param string|null $content Optional content instead of the matched string.
     *
     * @return bool Whether the given string is a keyword.
     */
    abstract protected function isKeyword(string $string, string|null &$content): bool;

    /**
     * Returns whether any of the given strings matches the SQL code at the current offset, preferring the longest.
     *
     * @param array $with Strings to be tested. The method will modify this parameter to speed up lookups.
     * @param bool $caseSensitive Whether to perform a case-sensitive comparison.
     * @param int $length Length of the matched string.
     * @param string|null $content Matched string.
     *
     * @return bool Whether a match is found.
     *
     * @psalm-param array<array-key, string> $with
     */
    protected function startsWithAnyLongest(
        array $with,
        bool $caseSensitive,
        int &$length,
        string|null &$content = null
    ): bool {
        if (empty($with)) {
            return false;
        }

        if (!is_array(reset($with))) {
            // Sort by descending length so that the longest candidates are tested first.
            usort($with, static fn (string $string1, string $string2) => mb_strlen($string2, 'UTF-8') - mb_strlen($string1, 'UTF-8'));

            $map = [];

            // Group candidates by length: one substring lookup per distinct length is enough.
            foreach ($with as $string) {
                $map[mb_strlen($string, 'UTF-8')][$caseSensitive ? $string : mb_strtoupper($string, 'UTF-8')] = true;
            }

            $with = $map;
        }

        /** @psalm-var array<int, array> $with */
        foreach ($with as $testLength => $testValues) {
            $content = $this->substring($testLength, $caseSensitive);

            if (isset($testValues[$content])) {
                $length = $testLength;
                return true;
            }
        }

        return false;
    }

    /**
     * Returns a string of the given length starting at the specified offset.
     *
     * @param int $length String length to be returned.
     * @param bool $caseSensitive If it's `false`, the string will be converted to uppercase.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return string Result string, it may be empty if there's nothing to return.
     */
    protected function substring(int $length, bool $caseSensitive = true, int|null $offset = null): string
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + $length > $this->length) {
            return '';
        }

        $cacheKey = $offset . ',' . $length;

        // The ",1" suffix caches the original substring, ",0" its uppercase variant.
        if (!isset($this->substrings[$cacheKey . ',1'])) {
            $this->substrings[$cacheKey . ',1'] = mb_substr($this->sql, $offset, $length, 'UTF-8');
        }

        if (!$caseSensitive && !isset($this->substrings[$cacheKey . ',0'])) {
            $this->substrings[$cacheKey . ',0'] = mb_strtoupper($this->substrings[$cacheKey . ',1'], 'UTF-8');
        }

        return $this->substrings[$cacheKey . ',' . (int) $caseSensitive];
    }

    /**
     * Returns the index after the given string in the SQL code, searching from the specified offset.
     *
     * @param string $string String to be found.
     * @param int|null $offset SQL code offset, defaults to current if `null` is passed.
     *
     * @return int Index after the given string or end of string index.
     */
    protected function indexAfter(string $string, int|null $offset = null): int
    {
        if ($offset === null) {
            $offset = $this->offset;
        }

        if ($offset + mb_strlen($string, 'UTF-8') > $this->length) {
            return $this->length;
        }

        $afterIndexOf = mb_strpos($this->sql, $string, $offset, 'UTF-8');

        if ($afterIndexOf === false) {
            $afterIndexOf = $this->length;
        } else {
            $afterIndexOf += mb_strlen($string, 'UTF-8');
        }

        return $afterIndexOf;
    }

    /**
     * Determines whether there is a delimited string at the current offset and adds it to the token children.
     */
    private function tokenizeDelimitedString(int &$length): bool
    {
        $isIdentifier = $this->isIdentifier($length, $content);
        $isStringLiteral = !$isIdentifier && $this->isStringLiteral($length, $content);

        if (!$isIdentifier && !$isStringLiteral) {
            return false;
        }

        $this->addTokenFromBuffer();

        $this->currentToken[] = (new SqlToken())
            ->type($isIdentifier ? SqlToken::TYPE_IDENTIFIER : SqlToken::TYPE_STRING_LITERAL)
            ->content(is_string($content) ? $content : $this->substring($length))
            ->startOffset($this->offset)
            ->endOffset($this->offset + $length);

        return true;
    }

    /**
     * Determines whether there is an operator at the current offset and adds it to the token children.
     */
    private function tokenizeOperator(int &$length): bool
    {
        if (!$this->isOperator($length, $content)) {
            return false;
        }

        $this->addTokenFromBuffer();

        switch ($this->substring($length)) {
            case '(':
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);
                $this->currentToken[] = (new SqlToken())->type(SqlToken::TYPE_PARENTHESIS);

                if ($this->currentToken[-1] !== null) {
                    $this->tokenStack->push($this->currentToken[-1]);
                }

                $this->currentToken = $this->tokenStack->top();

                break;

            case ')':
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(')')
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                break;
            case ';':
                if ($this->currentToken instanceof SqlToken && !$this->currentToken->getHasChildren()) {
                    break;
                }

                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);
                $this->tokenStack->pop();
                $this->currentToken = $this->tokenStack->top();
                $this->currentToken[] = (new SqlToken())->type(SqlToken::TYPE_STATEMENT);

                if ($this->currentToken[-1] instanceof SqlToken) {
                    $this->tokenStack->push($this->currentToken[-1]);
                }

                $this->currentToken = $this->tokenStack->top();

                break;

            default:
                $this->currentToken[] = (new SqlToken())
                    ->type(SqlToken::TYPE_OPERATOR)
                    ->content(is_string($content) ? $content : $this->substring($length))
                    ->startOffset($this->offset)
                    ->endOffset($this->offset + $length);

                break;
        }

        return true;
    }

    /**
     * Determines a type of text in the buffer, tokenizes it and adds it to the token children.
     */
    private function addTokenFromBuffer(): void
    {
        if ($this->buffer === '') {
            return;
        }

        $isKeyword = $this->isKeyword($this->buffer, $content);

        $this->currentToken[] = (new SqlToken())
            ->type($isKeyword ? SqlToken::TYPE_KEYWORD : SqlToken::TYPE_TOKEN)
            ->content(is_string($content) ? $content : $this->buffer)
            ->startOffset($this->offset - mb_strlen($this->buffer, 'UTF-8'))
            ->endOffset($this->offset);

        $this->buffer = '';
    }

    /**
     * Adds the specified length to the current offset.
     *
     * @throws InvalidArgumentException If the length is less than or equal to 0.
     */
    private function advance(int $length): void
    {
        if ($length <= 0) {
            throw new InvalidArgumentException('Length must be greater than 0.');
        }

        $this->offset += $length;
        $this->substrings = [];
    }

    /**
     * Returns whether the SQL code is completely traversed.
     */
    private function isEof(): bool
    {
        return $this->offset >= $this->length;
    }
}
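
The longest-match lookup performed by `startsWithAnyLongest()` can be illustrated in isolation. The sketch below is a simplified, case-sensitive variant under stated assumptions: `longestPrefixMatch()` is a hypothetical free function, it uses `krsort()` on the length buckets instead of the `usort()` call in the listing, and it omits the substring cache.

```php
<?php

declare(strict_types=1);

/**
 * Returns the longest candidate that $subject starts with at $offset,
 * or null when none matches. Hypothetical helper for illustration only.
 *
 * @param string[] $candidates
 */
function longestPrefixMatch(string $subject, int $offset, array $candidates): ?string
{
    // Group candidates by character length so that each distinct length
    // needs only one substring extraction.
    $byLength = [];
    foreach ($candidates as $candidate) {
        $byLength[mb_strlen($candidate, 'UTF-8')][$candidate] = true;
    }

    // Try the longest lengths first so that ">=" wins over ">".
    krsort($byLength);

    foreach ($byLength as $length => $values) {
        $slice = mb_substr($subject, $offset, $length, 'UTF-8');

        if (isset($values[$slice])) {
            return $slice;
        }
    }

    return null;
}

$operators = ['>', '>=', '<', '<=', '<>', '='];

var_dump(longestPrefixMatch('a >= 1', 2, $operators));
var_dump(longestPrefixMatch('a > 1', 2, $operators));
```

Testing lengths in descending order is what keeps a two-character operator from being split into two one-character tokens.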