Passed
Pull Request — master (#506)
by
unknown
04:51 queued 02:06
created

Lexer::parseNumber()   D

Complexity

Conditions 64
Paths 62

Size

Total Lines 157
Code Lines 85

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 77
CRAP Score 64.0085

Importance

Changes 0
Metric Value
cc 64
eloc 85
nc 62
nop 0
dl 0
loc 157
ccs 77
cts 78
cp 0.9872
crap 64.0085
rs 4.1666
c 0
b 0
f 0

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
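
As a rough illustration of the advice above, the sketch below extracts two commented steps into named functions. All function names here are hypothetical and are not part of the library under review; the example only shows the shape of the refactoring.

```php
<?php

declare(strict_types=1);

// Before: one function whose steps are labelled by comments.
function summarizeBefore(string $input): string
{
    // Count the digits.
    $digits = 0;
    for ($i = 0, $len = strlen($input); $i < $len; ++$i) {
        if ($input[$i] >= '0' && $input[$i] <= '9') {
            ++$digits;
        }
    }

    // Build the report line.
    return sprintf('%d digit(s) in "%s"', $digits, $input);
}

// After: each comment became the name of a small function.
function countDigits(string $input): int
{
    $digits = 0;
    for ($i = 0, $len = strlen($input); $i < $len; ++$i) {
        if ($input[$i] >= '0' && $input[$i] <= '9') {
            ++$digits;
        }
    }

    return $digits;
}

function formatReport(int $digits, string $input): string
{
    return sprintf('%d digit(s) in "%s"', $digits, $input);
}

function summarizeAfter(string $input): string
{
    return formatReport(countDigits($input), $input);
}
```

Both versions return the same string for the same input; the second one simply gives each step a name that can be tested and reused on its own.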

Commonly applied refactorings include:

<?php

declare(strict_types=1);

namespace PhpMyAdmin\SqlParser;

use PhpMyAdmin\SqlParser\Exceptions\LexerException;

use function in_array;
use function mb_strlen;
use function sprintf;
use function str_ends_with;
use function strlen;
use function substr;

/**
 * Defines the lexer of the library.
 *
 * This is one of the most important components, along with the parser.
 *
 * Depends on context to extract lexemes.
 *
 * Performs lexical analysis over a SQL statement and splits it into multiple tokens.
 *
 * The output of the lexer is affected by the context of the SQL statement.
 *
 * @see Context
 */
class Lexer extends Core
{
    /**
     * A list of methods that are used in lexing the SQL query.
     *
     * @var string[]
     */
    public static $parserMethods = [
        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over delimiter. The delimiter may be a keyword, a
        // number or almost any character which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 2. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 3. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 4. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 5. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        'parseDelimiter',
        'parseWhitespace',
        'parseNumber',
        'parseComment',
        'parseOperator',
        'parseBool',
        'parseString',
        'parseSymbol',
        'parseKeyword',
        'parseLabel',
        'parseUnknown',
    ];

    /**
     * A list of keywords that indicate that the function keyword
     * is not used as a function
     *
     * @var string[]
     */
    public $keywordNameIndicators = [
        'FROM',
        'SET',
        'WHERE',
    ];

    /**
     * A list of operators that indicate that the function keyword
     * is not used as a function
     *
     * @var string[]
     */
    public $operatorNameIndicators = [
        ',',
        '.',
    ];

    /**
     * The string to be parsed.
     *
     * @var string|UtfString
     */
    public $str = '';

    /**
     * The length of `$str`.
     *
     * By storing its length, a lot of time is saved, because parsing methods
     * would otherwise call `strlen` every time.
     *
     * @var int
     */
    public $len = 0;

    /**
     * The index of the last parsed character.
     *
     * @var int
     */
    public $last = 0;

    /**
     * Tokens extracted from given strings.
     *
     * @var TokensList
     */
    public $list;

    /**
     * The default delimiter. This is used, by default, in all new instances.
     *
     * @var string
     */
    public static $defaultDelimiter = ';';

    /**
     * Statements delimiter.
     * This may change during lexing.
     *
     * @var string
     */
    public $delimiter;

    /**
     * The length of the delimiter.
     *
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
     * calls to `strlen`, which might affect performance when the delimiter is
     * big.
     *
     * @var int
     */
    public $delimiterLen;

    /**
     * Gets the tokens list parsed by a new instance of a lexer.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
     */
    public static function getTokens($str, $strict = false, $delimiter = null): TokensList
    {
        $lexer = new self($str, $strict, $delimiter);

        return $lexer->list;
    }

    /**
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
     */
    public function __construct($str, $strict = false, $delimiter = null)
    {
        parent::__construct();

        // `strlen` is used instead of `mb_strlen` because the lexer needs to
        // parse each byte of the input.
        $len = $str instanceof UtfString ? $str->length() : strlen($str);

        // For multi-byte strings, a new instance of `UtfString` is initialized.
        if (! $str instanceof UtfString && $len !== mb_strlen($str, 'UTF-8')) {
            $str = new UtfString($str);
        }

        $this->str = $str;
        $this->len = $str instanceof UtfString ? $str->length() : $len;

        $this->strict = $strict;

        // Setting the delimiter.
        $this->setDelimiter(! empty($delimiter) ? $delimiter : static::$defaultDelimiter);

        $this->lex();
    }

    /**
     * Sets the delimiter.
     *
     * @param string $delimiter the new delimiter
     */
    public function setDelimiter($delimiter): void
    {
        $this->delimiter = $delimiter;
        $this->delimiterLen = strlen($delimiter);
    }
209
210
    /**
211
     * Parses the string and extracts lexemes.
212
     */
213 1418
    public function lex(): void
214
    {
215
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
216
        // is* functions. For a better performance, some rules can be deduced
217
        // from context.
218
        // For example, in `parseBool` there is no need to compare the token
219
        // every time with `true` and `false`. The first step would be to
220
        // compare with 'true' only and just after that add another letter from
221
        // context and compare again with `false`.
222
        // Another example is `parseComment`.
223
224 1418
        $list = new TokensList();
225
226
        /**
227
         * Last processed token.
228
         *
229
         * @var Token
230
         */
231 1418
        $lastToken = null;
232
233 1418
        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
234
            /**
235
             * The new token.
236
             *
237
             * @var Token
238
             */
239 1408
            $token = null;
240
241 1408
            foreach (static::$parserMethods as $method) {
242 1408
                $token = $this->$method();
243
244 1408
                if ($token) {
245 1408
                    break;
246
                }
247
            }
248
249 1408
            if ($token === null) {
250
                // @assert($this->last === $lastIdx);
251 4
                $token = new Token($this->str[$this->last]);
252 4
                $this->error('Unexpected character.', $this->str[$this->last], $this->last);
253
            } elseif (
254 1408
                $lastToken !== null
255 1408
                && $token->type === Token::TYPE_SYMBOL
256 1408
                && $token->flags & Token::FLAG_SYMBOL_VARIABLE
257
                && (
258 1408
                    $lastToken->type === Token::TYPE_STRING
259 1408
                    || (
260 1408
                        $lastToken->type === Token::TYPE_SYMBOL
261 1408
                        && $lastToken->flags & Token::FLAG_SYMBOL_BACKTICK
262 1408
                    )
263
                )
264
            ) {
265
                // Handles ```... FROM 'user'@'%' ...```.
266 46
                $lastToken->token .= $token->token;
267 46
                $lastToken->type = Token::TYPE_SYMBOL;
268 46
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
269 46
                $lastToken->value .= '@' . $token->value;
270 46
                continue;
271
            } elseif (
272 1408
                $lastToken !== null
273 1408
                && $token->type === Token::TYPE_KEYWORD
274 1408
                && $lastToken->type === Token::TYPE_OPERATOR
275 1408
                && $lastToken->value === '.'
276
            ) {
277
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
278
                // a reserved word.
279 30
                $token->type = Token::TYPE_NONE;
280 30
                $token->flags = 0;
281 30
                $token->value = $token->token;
282
            }
283
284 1408
            $token->position = $lastIdx;
285
286 1408
            $list->tokens[$list->count++] = $token;
287
288
            // Handling delimiters.
289 1408
            if ($token->type === Token::TYPE_NONE && $token->value === 'DELIMITER') {
290 36
                if ($this->last + 1 >= $this->len) {
291 2
                    $this->error('Expected whitespace(s) before delimiter.', '', $this->last + 1);
292 2
                    continue;
293
                }
294
295
                // Skipping last R (from `delimiteR`) and whitespaces between
296
                // the keyword `DELIMITER` and the actual delimiter.
297 34
                $pos = ++$this->last;
298 34
                $token = $this->parseWhitespace();
299
300 34
                if ($token !== null) {
301 32
                    $token->position = $pos;
302 32
                    $list->tokens[$list->count++] = $token;
303
                }
304
305
                // Preparing the token that holds the new delimiter.
306 34
                if ($this->last + 1 >= $this->len) {
307 2
                    $this->error('Expected delimiter.', '', $this->last + 1);
308 2
                    continue;
309
                }
310
311 32
                $pos = $this->last + 1;
312
313
                // Parsing the delimiter.
314 32
                $this->delimiter = null;
315 32
                $delimiterLen = 0;
316
                while (
317 32
                    ++$this->last < $this->len
318 32
                    && ! Context::isWhitespace($this->str[$this->last])
Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

                    && ! Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last])
                    && $delimiterLen < 15
                ) {
                    $this->delimiter .= $this->str[$this->last];
                    ++$delimiterLen;
                }

                if (empty($this->delimiter)) {
                    $this->error('Expected delimiter.', '', $this->last);
                    $this->delimiter = ';';
                }

                --$this->last;

                // Saving the delimiter and its token.
                $this->delimiterLen = strlen($this->delimiter);
                $token = new Token($this->delimiter, Token::TYPE_DELIMITER);
                $token->position = $pos;
                $list->tokens[$list->count++] = $token;
            }

            $lastToken = $token;
        }

        // Adding a final delimiter to mark the ending.
        $list->tokens[$list->count++] = new Token(null, Token::TYPE_DELIMITER);

        // Saving the tokens list.
        $this->list = $list;

        $this->solveAmbiguityOnStarOperator();
        $this->solveAmbiguityOnFunctionKeywords();
    }

    /**
     * Resolves the ambiguity when dealing with the "*" operator.
     *
     * In SQL statements, the "*" operator can be an arithmetic operator (like in 2*3) or an SQL wildcard (like in
     * SELECT a.* FROM ...). To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
     * comments, right after the "*" position. The "*" is for sure an SQL wildcard if the next token found is any of:
     * - "FROM" (the FROM keyword like in "SELECT * FROM...");
     * - "USING" (the USING keyword like in "DELETE table_name.* USING...");
     * - "," (a comma separator like in "SELECT *, field FROM...");
     * - ")" (a closing parenthesis like in "COUNT(*)").
     * This method will change the flag of the "*" tokens when any of the conditions above is true. Otherwise, the
     * default flag (arithmetic) will be kept.
     */
    private function solveAmbiguityOnStarOperator(): void
    {
        $iBak = $this->list->idx;
        while (($starToken = $this->list->getNextOfTypeAndValue(Token::TYPE_OPERATOR, '*')) !== null) {
            // getNext() already gets rid of whitespaces and comments.
            $next = $this->list->getNext();

            if ($next === null) {
                continue;
            }

            if (
                ($next->type !== Token::TYPE_KEYWORD || ! in_array($next->value, ['FROM', 'USING'], true))
                && ($next->type !== Token::TYPE_OPERATOR || ! in_array($next->value, [',', ')'], true))
            ) {
                continue;
            }

            $starToken->flags = Token::FLAG_OPERATOR_SQL;
        }

        $this->list->idx = $iBak;
    }

    /**
     * Resolves the ambiguity when dealing with function keywords.
     *
     * In SQL statements, function keywords might be used as table names or column names.
     * To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
     * comments, right after the function keyword position. The function keyword is for sure used
     * as column name or table name if the next token found is any of:
     *
     * - "FROM" (the FROM keyword like in "SELECT Country x, AverageSalary avg FROM...");
     * - "WHERE" (the WHERE keyword like in "DELETE FROM emp x WHERE x.salary = 20");
     * - "SET" (the SET keyword like in "UPDATE Country x, City y set x.Name=x.Name");
     * - "," (a comma separator like 'x,' in "UPDATE Country x, City y set x.Name=x.Name");
     * - "." (a dot separator like in "x.asset_id FROM (SELECT evt.asset_id FROM evt)");
     * - "NULL" (when used as a table alias like in "avg.col FROM (SELECT ev.col FROM ev) avg").
     *
     * This method will change the flag of the function keyword tokens when any of the
     * conditions above is true. Otherwise, the default flag (function keyword) will be kept.
     */
    private function solveAmbiguityOnFunctionKeywords(): void
    {
        $iBak = $this->list->idx;
        $keywordFunction = Token::TYPE_KEYWORD | Token::FLAG_KEYWORD_FUNCTION;
        while (($keywordToken = $this->list->getNextOfTypeAndFlag(Token::TYPE_KEYWORD, $keywordFunction)) !== null) {
            $next = $this->list->getNext();
            if (
                ($next->type !== Token::TYPE_KEYWORD
                    || ! in_array($next->value, $this->keywordNameIndicators, true)
                )
                && ($next->type !== Token::TYPE_OPERATOR
                    || ! in_array($next->value, $this->operatorNameIndicators, true)
                )
                && ($next->value !== null)
            ) {
                continue;
            }

            $keywordToken->type = Token::TYPE_NONE;
            $keywordToken->flags = Token::TYPE_NONE;
            $keywordToken->keyword = $keywordToken->value;
        }

        $this->list->idx = $iBak;
    }

    /**
     * Creates a new error log.
     *
     * @param string $msg  the error message
     * @param string $str  the character that produced the error
     * @param int    $pos  the position of the character
     * @param int    $code the code of the error
     *
     * @throws LexerException throws the exception, if strict mode is enabled.
     */
    public function error($msg, $str = '', $pos = 0, $code = 0): void
    {
        $error = new LexerException(
            Translator::gettext($msg),
            $str,
            $pos,
            $code
        );
        parent::error($error);
    }

    /**
     * Parses a keyword.
     */
    public function parseKeyword(): Token|null
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

            if (Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                }

                $lastSpace = true;
            } else {
                $lastSpace = false;
            }

            $token .= $this->str[$this->last];
            $flags = Context::isKeyword($token);

            if (($this->last + 1 !== $this->len && ! Context::isSeparator($this->str[$this->last + 1])) || ! $flags) {
Bug: It seems like $this->str[$this->last + 1] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

            if (($this->last + 1 !== $this->len && ! Context::isSeparator(/** @scrutinizer ignore-type */ $this->str[$this->last + 1])) || ! $flags) {

Bug, Best Practice: The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case; in particular, the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false
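
To make the loose-comparison warning concrete, here is a minimal sketch with a hypothetical lookup that returns `int|null`. It assumes, purely for illustration, that `0` is a legitimate flag value; whether `Context::isKeyword()` can ever return `0` is not shown in this listing. `EXAMPLE_FLAGS`, `lookupFlags()`, `isKnownLoose()` and `isKnownStrict()` are invented names.

```php
<?php

declare(strict_types=1);

// Hypothetical flag table: by assumption, 0 is a valid flag value.
const EXAMPLE_FLAGS = ['+' => 0, '<=' => 2];

/** Returns the flags of a known token, or null when it is unknown. */
function lookupFlags(string $token): ?int
{
    return EXAMPLE_FLAGS[$token] ?? null;
}

/** Loose check: a flag value of 0 is treated like "not found". */
function isKnownLoose(string $token): bool
{
    return (bool) lookupFlags($token);
}

/** Strict check: only null means "not found". */
function isKnownStrict(string $token): bool
{
    return lookupFlags($token) !== null;
}
```

With this table, `isKnownLoose('+')` reports the token as unknown while `isKnownStrict('+')` recognises it, which is the ambiguity the annotation describes.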
                continue;
            }

            $ret = new Token($token, Token::TYPE_KEYWORD, $flags);
            $iEnd = $this->last;

            // We don't break so we find longest keyword.
            // For example, `OR` and `ORDER` have a common prefix `OR`.
            // If we stopped at `OR`, the parsing would be invalid.
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a label.
     */
    public function parseLabel(): Token|null
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         */
        $iEnd = $this->last;
        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            if ($this->str[$this->last] === ':' && $j > 1) {
                // End of label
                $token .= $this->str[$this->last];
                $ret = new Token($token, Token::TYPE_LABEL);
                $iEnd = $this->last;
                break;
            }

            if (Context::isWhitespace($this->str[$this->last]) && $j > 1) {
Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

            if (Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last]) && $j > 1) {
                // Whitespace between label and :
                // The size of the keyword didn't increase.
                --$j;
            } elseif (Context::isSeparator($this->str[$this->last])) {
Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

            } elseif (Context::isSeparator(/** @scrutinizer ignore-type */ $this->str[$this->last])) {
                // Any other separator
                break;
            }

            $token .= $this->str[$this->last];
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses an operator.
     */
    public function parseOperator(): Token|null
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         */
        $iEnd = $this->last;

        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            $token .= $this->str[$this->last];
            $flags = Context::isOperator($token);

            if (! $flags) {
Bug, Best Practice: The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.
                continue;
            }

            $ret = new Token($token, Token::TYPE_OPERATOR, $flags);
            $iEnd = $this->last;
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a whitespace.
     */
    public function parseWhitespace(): Token|null
    {
        $token = $this->str[$this->last];

        if (! Context::isWhitespace($token)) {
Bug: It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        if (! Context::isWhitespace(/** @scrutinizer ignore-type */ $token)) {
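
As an alternative to silencing the warning, the "additional type check" it suggests could look roughly like the sketch below. `ExampleChars` is a hypothetical stand-in for a string wrapper whose offsets may yield null, and `isSpaceAt()` is an invented helper; neither is part of the library, and the real `UtfString` and `Context::isWhitespace()` may behave differently.

```php
<?php

declare(strict_types=1);

// Hypothetical wrapper: reading past the end yields null instead of a string.
final class ExampleChars implements ArrayAccess
{
    /** @param string[] $chars */
    public function __construct(private array $chars)
    {
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->chars[$offset]);
    }

    public function offsetGet(mixed $offset): ?string
    {
        return $this->chars[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('ExampleChars is read-only.');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('ExampleChars is read-only.');
    }
}

/** Checks for whitespace only after ruling out the null case. */
function isSpaceAt(ExampleChars $chars, int $index): bool
{
    $char = $chars[$index];

    return $char !== null && ($char === ' ' || $char === "\t" || $char === "\n");
}
```

The explicit `$char !== null` guard means the string-only comparison is never reached with a null value, so the caller needs no suppression annotation.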
            return null;
        }

        while (++$this->last < $this->len && Context::isWhitespace($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }

        --$this->last;

        return new Token($token, Token::TYPE_WHITESPACE);
    }

    /**
     * Parses a comment.
     */
    public function parseComment(): Token|null
    {
        $iBak = $this->last;
        $token = $this->str[$this->last];

        // Bash style comments. (#comment\n)
        if (Context::isComment($token)) {
Bug, Best Practice: The expression PhpMyAdmin\SqlParser\Context::isComment($token) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.

Bug: It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isComment() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        if (Context::isComment(/** @scrutinizer ignore-type */ $token)) {
            while (++$this->last < $this->len && $this->str[$this->last] !== "\n") {
                $token .= $this->str[$this->last];
            }

            // Include trailing \n as whitespace token
            if ($this->last < $this->len) {
                --$this->last;
            }

            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_BASH);
        }

        // C style comments. (/*comment*\/)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
Bug, Best Practice: The expression PhpMyAdmin\SqlParser\Context::isComment($token) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.
                // There might be a conflict with "*" operator here, when string is "*/*".
                // This can occur in the following statements:
                // - "SELECT */* comment */ FROM ..."
                // - "SELECT 2*/* comment */3 AS `six`;"
                $next = $this->last + 1;
                if (($next < $this->len) && $this->str[$next] === '*') {
                    // Conflict in "*/*": first "*" was not for ending a comment.
                    // Stop here and let other parsing method define the true behavior of that first star.
                    $this->last = $iBak;

                    return null;
                }

                $flags = Token::FLAG_COMMENT_C;

                // This comment already ended. It may be a part of a
                // previous MySQL specific command.
                if ($token === '*/') {
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Checking if this is a MySQL-specific command.
                if ($this->last + 1 < $this->len && $this->str[$this->last + 1] === '!') {
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
                    $token .= $this->str[++$this->last];

                    while (
                        ++$this->last < $this->len
                        && $this->str[$this->last] >= '0'
                        && $this->str[$this->last] <= '9'
                    ) {
                        $token .= $this->str[$this->last];
                    }

                    --$this->last;

                    // We split this comment and parse only its beginning
                    // here.
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Parsing the comment.
                while (
                    ++$this->last < $this->len
                    && (
                        $this->str[$this->last - 1] !== '*'
                        || $this->str[$this->last] !== '/'
                    )
                ) {
                    $token .= $this->str[$this->last];
                }

                // Adding the ending.
                if ($this->last < $this->len) {
                    $token .= $this->str[$this->last];
                }

                return new Token($token, Token::TYPE_COMMENT, $flags);
            }
        }

        // SQL style comments. (-- comment\n)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            $end = false;
        } else {
            --$this->last;
            $end = true;
        }

        if (Context::isComment($token, $end)) {
Bug, Best Practice:
The expression PhpMyAdmin\SqlParser\Con...isComment($token, $end) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case, in particular the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false
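
The difference matters for lookups that return `integer|null`, where `0` can be a legitimate flag value. The following is a minimal sketch under that assumption; `lookupFlags()`, `matchesLoosely()`, `matchesStrictly()` and the flag values are hypothetical and are not part of the library.

```php
<?php

// Hypothetical stand-in for a lookup such as Context::isComment():
// returns an integer flag set on a match and null otherwise.
// The flag values are invented for illustration only.
function lookupFlags(string $token): int|null
{
    $known = ['#' => 0, '--' => 4, '/*' => 2];

    return $known[$token] ?? null;
}

function matchesLoosely(string $token): bool
{
    // Truthiness check: a valid flag value of 0 is treated as "no match".
    return (bool) lookupFlags($token);
}

function matchesStrictly(string $token): bool
{
    // Explicit null check: only a missing entry counts as "no match".
    return lookupFlags($token) !== null;
}

var_dump(matchesLoosely('#'));   // bool(false)
var_dump(matchesStrictly('#'));  // bool(true)
var_dump(matchesStrictly('??')); // bool(false)
```

Whether a flag value of `0` can actually occur depends on the real lookup; the explicit `!== null` check simply removes the ambiguity either way.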
            // Checking if this comment did not end already (```--\n```).
            if ($this->str[$this->last] !== "\n") {
                while (++$this->last < $this->len && $this->str[$this->last] !== "\n") {
                    $token .= $this->str[$this->last];
                }
            }

            // Include trailing \n as whitespace token
            if ($this->last < $this->len) {
                --$this->last;
            }

            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_SQL);
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a boolean.
     */
    public function parseBool(): Token|null
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, Token::TYPE_BOOL);
        }

        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, Token::TYPE_BOOL, 1);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a number.
     */
    public function parseNumber(): Token|null
    {
        // A rudimentary state machine is being used to parse numbers due to
        // the various forms of their notation.
        //
        // Below are the states of the machine and the conditions to change
        // the state.
        //
        //      1 --------------------[ + or - ]-------------------> 1
        //      1 -------------------[ 0x or 0X ]------------------> 2
        //      1 --------------------[ 0 to 9 ]-------------------> 3
        //      1 -----------------------[ . ]---------------------> 4
        //      1 -----------------------[ b ]---------------------> 7
        //
        //      2 --------------------[ 0 to F ]-------------------> 2
        //
        //      3 --------------------[ 0 to 9 ]-------------------> 3
        //      3 -----------------------[ . ]---------------------> 4
        //      3 --------------------[ e or E ]-------------------> 5
        //
        //      4 --------------------[ 0 to 9 ]-------------------> 4
        //      4 --------------------[ e or E ]-------------------> 5
        //
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
        //
        //      6 --------------------[ 0 to 9 ]-------------------> 6
        //
        //      7 -----------------------[ ' ]---------------------> 8
        //
        //      8 --------------------[ 0 or 1 ]-------------------> 8
        //      8 -----------------------[ ' ]---------------------> 9
        //
        // State 1 may be reached by negative numbers.
        // State 2 is reached only by hex numbers.
        // State 4 is reached only by float numbers.
        // State 5 is reached only by numbers in approximate form.
        // State 7 is reached only by numbers in bit representation.
        //
        // Valid final states are: 2, 3, 4, 6 and 9 (state 4 only if the token
        // is more than a lone `.`). Any parsing that finished in a state
        // other than these is invalid.
        // Also, negative states are invalid states.
        $iBak = $this->last;
        $token = '';
        $flags = 0;
        $state = 1;
        for (; $this->last < $this->len; ++$this->last) {
            if ($state === 1) {
                if ($this->str[$this->last] === '-') {
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
                } elseif (
                    $this->last + 1 < $this->len
                    && $this->str[$this->last] === '0'
                    && (
                        $this->str[$this->last + 1] === 'x'
                        || $this->str[$this->last + 1] === 'X'
                    )
                ) {
                    $token .= $this->str[$this->last++];
                    $state = 2;
                } elseif ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9') {
                    $state = 3;
                } elseif ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'b') {
                    $state = 7;
                } elseif ($this->str[$this->last] !== '+') {
                    // `+` is a valid character in a number.
                    break;
                }
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (
                    ! (
                        ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                        || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'F')
                        || ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'f')
                    )
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits, `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if (
                    $this->str[$this->last] === '+' || $this->str[$this->last] === '-'
                    || ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                ) {
                    $state = 6;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] !== '\'') {
                    break;
                }

                $state = 8;
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif ($this->str[$this->last] !== '0' && $this->str[$this->last] !== '1') {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }

            $token .= $this->str[$this->last];
        }

        if ($state === 2 || $state === 3 || ($token !== '.' && $state === 4) || $state === 6 || $state === 9) {
            --$this->last;

            return new Token($token, Token::TYPE_NUMBER, $flags);
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a string.
     *
     * @param string $quote additional starting symbol
     *
     * @throws LexerException
     */
    public function parseString($quote = ''): Token|null
    {
        $token = $this->str[$this->last];
        $flags = Context::isString($token);
Bug:
It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isString() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        $flags = Context::isString(/** @scrutinizer ignore-type */ $token);
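
As an alternative to the annotation, the "additional type check" mentioned above could look roughly like the sketch below. Both `charAt()` and `isQuoteAt()` are hypothetical helpers written for illustration; they are not part of the library, and the real lexer reads characters directly from `$this->str`.

```php
<?php

// Hypothetical helper: returns the character at $pos, or null when $pos is
// outside the input. Not part of the library.
function charAt(string $str, int $pos): string|null
{
    return $pos >= 0 && $pos < strlen($str) ? $str[$pos] : null;
}

// Hypothetical caller: the explicit null check narrows string|null to string
// before the value reaches code that only accepts a string.
function isQuoteAt(string $str, int $pos): bool
{
    $token = charAt($str, $pos);
    if ($token === null) {
        return false;
    }

    return in_array($token, ['"', "'", '`'], true);
}

var_dump(isQuoteAt("'a'", 0));  // bool(true)
var_dump(isQuoteAt('abc', 10)); // bool(false)
```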

        if (! $flags && $token !== $quote) {
Bug, Best Practice:
The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.
            return null;
        }

        $quote = $token;

        while (++$this->last < $this->len) {
            if (
                $this->last + 1 < $this->len
                && (
                    ($this->str[$this->last] === $quote && $this->str[$this->last + 1] === $quote)
                    || ($this->str[$this->last] === '\\' && $quote !== '`')
                )
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }

                $token .= $this->str[$this->last];
            }
        }

        if ($this->last >= $this->len || $this->str[$this->last] !== $quote) {
            $this->error(
                sprintf(
                    Translator::gettext('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }

        return new Token($token, Token::TYPE_STRING, $flags);
    }

    /**
     * Parses a symbol.
     *
     * @throws LexerException
     */
    public function parseSymbol(): Token|null
    {
        $token = $this->str[$this->last];
        $flags = Context::isSymbol($token);
Bug:
It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSymbol() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        $flags = Context::isSymbol(/** @scrutinizer ignore-type */ $token);

        if (! $flags) {
Bug, Best Practice:
The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.
            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            if ($this->last + 1 < $this->len && $this->str[++$this->last] === '@') {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } elseif ($flags & Token::FLAG_SYMBOL_PARAMETER) {
            if ($token !== '?' && $this->last + 1 < $this->len) {
                ++$this->last;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            $str = $this->parseString('`');

            if ($str === null) {
                $str = $this->parseUnknown();

                if ($str === null) {
                    $this->error('Variable name was expected.', $this->str[$this->last], $this->last);
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, Token::TYPE_SYMBOL, $flags);
    }

    /**
     * Parses unknown parts of the query.
     */
    public function parseUnknown(): Token|null
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
Bug:
It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if (Context::isSeparator(/** @scrutinizer ignore-type */ $token)) {
            return null;
        }

        while (++$this->last < $this->len && ! Context::isSeparator($this->str[$this->last])) {
            $token .= $this->str[$this->last];

            // Test if end of token equals the current delimiter. If so, remove it from the token.
            if (str_ends_with($token, $this->delimiter)) {
                $token = substr($token, 0, -$this->delimiterLen);
                $this->last -= $this->delimiterLen - 1;
                break;
            }
        }

        --$this->last;

        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     */
    public function parseDelimiter(): Token|null
    {
        $idx = 0;

        while ($idx < $this->delimiterLen && $this->last + $idx < $this->len) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }

            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;

        return new Token($this->delimiter, Token::TYPE_DELIMITER);
    }
}
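
Since `parseNumber()` is flagged for its length and complexity, it may help to see the core of its state machine in isolation. The sketch below is a simplified, standalone illustration and not the library's implementation: `scanNumber()` is a hypothetical function that covers only states 1, 3, 4, 5 and 6 (signed decimal, float and exponent forms). It deliberately omits hex literals (state 2), bit literals (states 7 to 9), token flags, and the negative "number followed by a letter" states.

```php
<?php

// Simplified sketch: returns the numeric prefix of $s, or null if $s does
// not start with a number. Hypothetical function, not part of the library.
function scanNumber(string $s): string|null
{
    $state = 1;
    $token = '';
    $len = strlen($s);

    for ($i = 0; $i < $len; ++$i) {
        $c = $s[$i];
        $isDigit = $c >= '0' && $c <= '9';

        if ($state === 1) {
            // Start: optional signs, then a digit or a leading dot.
            if ($isDigit) {
                $state = 3;
            } elseif ($c === '.') {
                $state = 4;
            } elseif ($c !== '+' && $c !== '-') {
                break;
            }
        } elseif ($state === 3) {
            // Integer part.
            if ($c === '.') {
                $state = 4;
            } elseif ($c === 'e' || $c === 'E') {
                $state = 5;
            } elseif (! $isDigit) {
                break;
            }
        } elseif ($state === 4) {
            // Fractional part.
            if ($c === 'e' || $c === 'E') {
                $state = 5;
            } elseif (! $isDigit) {
                break;
            }
        } elseif ($state === 5) {
            // Just after the exponent marker: a sign or a digit must follow.
            if ($c === '+' || $c === '-' || $isDigit) {
                $state = 6;
            } else {
                break;
            }
        } elseif (! $isDigit) {
            // State 6: only digits may follow the exponent sign or digit.
            break;
        }

        $token .= $c;
    }

    // Accept only in final states; a lone "." is not a number.
    $accepting = $state === 3 || ($state === 4 && $token !== '.') || $state === 6;

    return $accepting ? $token : null;
}

var_dump(scanNumber('12.5e+3 ')); // string(7) "12.5e+3"
var_dump(scanNumber('1e'));       // NULL
```

Keeping each state in its own branch mirrors the original layout. One possible direction for the "Long Method" finding would be to extract each branch into a small, named method, though that is a design choice for the maintainers rather than something this page prescribes.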