Passed
Pull Request — master (#514)
by Maurício
03:54
created

Lexer::lex()   D

Complexity

Conditions 22
Paths 26

Size

Total Lines 122
Code Lines 65

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 63
CRAP Score 22

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 22
eloc 65
c 1
b 0
f 0
nc 26
nop 0
dl 0
loc 122
ccs 63
cts 63
cp 1
crap 22
rs 4.1666

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
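As a minimal sketch of this advice, the first function below has two commented sections, and the refactored version turns each comment into the name of a small helper. All function names here are hypothetical and unrelated to the library under review.

```php
<?php

declare(strict_types=1);

// Hypothetical example: none of these names belong to the library under review.

/** Before: one function whose body is divided into commented sections. */
function summarizeBefore(array $words): string
{
    // Drop empty entries.
    $kept = [];
    foreach ($words as $word) {
        if ($word !== '') {
            $kept[] = $word;
        }
    }

    // Join with commas.
    return implode(', ', $kept);
}

/** After: the comment "Drop empty entries" became the helper's name. */
function dropEmptyEntries(array $words): array
{
    return array_values(array_filter($words, static fn (string $w): bool => $w !== ''));
}

/** After: the comment "Join with commas" became the helper's name. */
function joinWithCommas(array $words): string
{
    return implode(', ', $words);
}

/** The original function now reads as a sequence of named steps. */
function summarizeAfter(array $words): string
{
    return joinWithCommas(dropEmptyEntries($words));
}
```

Both versions are meant to return the same string for the same input; only the structure changes.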

Commonly applied refactorings include:

1
<?php
2
3
declare(strict_types=1);
4
5
namespace PhpMyAdmin\SqlParser;
6
7
use PhpMyAdmin\SqlParser\Exceptions\LexerException;
8
9
use function in_array;
10
use function mb_strlen;
11
use function sprintf;
12
use function str_ends_with;
13
use function strlen;
14
use function substr;
15
16
/**
17
 * Defines the lexer of the library.
18
 *
19
 * This is one of the most important components, along with the parser.
20
 *
21
 * Depends on context to extract lexemes.
22
 *
23
 * Performs lexical analysis over a SQL statement and splits it into multiple tokens.
24
 *
25
 * The output of the lexer is affected by the context of the SQL statement.
26
 *
27
 * @see Context
28
 */
29
class Lexer extends Core
30
{
31
    /**
32
     * A list of keywords that indicate that the function keyword
33
     * is not used as a function
34
     */
35
    private const KEYWORD_NAME_INDICATORS = [
36
        'FROM',
37
        'SET',
38
        'WHERE',
39
    ];
40
41
    /**
42
     * A list of operators that indicate that the function keyword
43
     * is not used as a function
44
     */
45
    private const OPERATOR_NAME_INDICATORS = [
46
        ',',
47
        '.',
48
    ];
49
50
    /**
51
     * The string to be parsed.
52
     *
53
     * @var string|UtfString
54
     */
55
    public $str = '';
56
57
    /**
58
     * The length of `$str`.
59
     *
60
     * By storing its length, a lot of time is saved, because parsing methods
61
     * would call `strlen` every time.
62
     *
63
     * @var int
64
     */
65
    public $len = 0;
66
67
    /**
68
     * The index of the last parsed character.
69
     *
70
     * @var int
71
     */
72
    public $last = 0;
73
74
    /**
75
     * Tokens extracted from given strings.
76
     *
77
     * @var TokensList
78
     */
79
    public $list;
80
81
    /**
82
     * The default delimiter. This is used, by default, in all new instances.
83
     *
84
     * @var string
85
     */
86
    public static $defaultDelimiter = ';';
87
88
    /**
89
     * Statements delimiter.
90
     * This may change during lexing.
91
     *
92
     * @var string
93
     */
94
    public $delimiter;
95
96
    /**
97
     * The length of the delimiter.
98
     *
99
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
100
     * calls to `strlen`, which might affect performance when the delimiter is
101
     * big.
102
     *
103
     * @var int
104
     */
105
    public $delimiterLen;
106
107
    /**
108
     * @param string|UtfString $str       the query to be lexed
109
     * @param bool             $strict    whether strict mode should be
110
     *                                    enabled or not
111
     * @param string           $delimiter the delimiter to be used
112
     */
113 1432
    public function __construct($str, $strict = false, $delimiter = null)
114
    {
115 1432
        parent::__construct();
116
117
        // `strlen` is used instead of `mb_strlen` because the lexer needs to
118
        // parse each byte of the input.
119 1432
        $len = $str instanceof UtfString ? $str->length() : strlen($str);
120
121
        // For multi-byte strings, a new instance of `UtfString` is initialized.
122 1432
        if (! $str instanceof UtfString && $len !== mb_strlen($str, 'UTF-8')) {
123 10
            $str = new UtfString($str);
124
        }
125
126 1432
        $this->str = $str;
127 1432
        $this->len = $str instanceof UtfString ? $str->length() : $len;
128
129 1432
        $this->strict = $strict;
130
131
        // Setting the delimiter.
132 1432
        $this->setDelimiter(! empty($delimiter) ? $delimiter : static::$defaultDelimiter);
133
134 1432
        $this->lex();
135
    }
136
137
    /**
138
     * Sets the delimiter.
139
     *
140
     * @param string $delimiter the new delimiter
141
     */
142 1432
    public function setDelimiter($delimiter): void
143
    {
144 1432
        $this->delimiter = $delimiter;
145 1432
        $this->delimiterLen = strlen($delimiter);
146
    }
147
148
    /**
149
     * Parses the string and extracts lexemes.
150
     */
151 1432
    public function lex(): void
152
    {
153
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
154
        // is* functions. For a better performance, some rules can be deduced
155
        // from context.
156
        // For example, in `parseBool` there is no need to compare the token
157
        // every time with `true` and `false`. The first step would be to
158
        // compare with 'true' only and just after that add another letter from
159
        // context and compare again with `false`.
160
        // Another example is `parseComment`.
161
162 1432
        $list = new TokensList();
163
164
        /**
165
         * Last processed token.
166
         */
167 1432
        $lastToken = null;
168
169 1432
        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
170 1422
            $token = $this->parse();
171
172 1422
            if ($token === null) {
173
                // @assert($this->last === $lastIdx);
174 6
                $token = new Token($this->str[$this->last]);
175 6
                $this->error('Unexpected character.', $this->str[$this->last], $this->last);
176
            } elseif (
177 1422
                $lastToken !== null
178 1422
                && $token->type === TokenType::Symbol
179 1422
                && $token->flags & Token::FLAG_SYMBOL_VARIABLE
180
                && (
181 1422
                    $lastToken->type === TokenType::String
182 1422
                    || (
183 1422
                        $lastToken->type === TokenType::Symbol
184 1422
                        && $lastToken->flags & Token::FLAG_SYMBOL_BACKTICK
185 1422
                    )
186
                )
187
            ) {
188
                // Handles ```... FROM 'user'@'%' ...```.
189 46
                $lastToken->token .= $token->token;
190 46
                $lastToken->type = TokenType::Symbol;
191 46
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
192 46
                $lastToken->value .= '@' . $token->value;
193 46
                continue;
194
            } elseif (
195 1422
                $lastToken !== null
196 1422
                && $token->type === TokenType::Keyword
197 1422
                && $lastToken->type === TokenType::Operator
198 1422
                && $lastToken->value === '.'
199
            ) {
200
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
201
                // a reserved word.
202 30
                $token->type = TokenType::None;
203 30
                $token->flags = 0;
204 30
                $token->value = $token->token;
205
            }
206
207 1422
            $token->position = $lastIdx;
208
209 1422
            $list->tokens[$list->count++] = $token;
210
211
            // Handling delimiters.
212 1422
            if ($token->type === TokenType::None && $token->value === 'DELIMITER') {
213 36
                if ($this->last + 1 >= $this->len) {
214 2
                    $this->error('Expected whitespace(s) before delimiter.', '', $this->last + 1);
215 2
                    continue;
216
                }
217
218
                // Skipping last R (from `delimiteR`) and whitespaces between
219
                // the keyword `DELIMITER` and the actual delimiter.
220 34
                $pos = ++$this->last;
221 34
                $token = $this->parseWhitespace();
222
223 34
                if ($token !== null) {
224 32
                    $token->position = $pos;
225 32
                    $list->tokens[$list->count++] = $token;
226
                }
227
228
                // Preparing the token that holds the new delimiter.
229 34
                if ($this->last + 1 >= $this->len) {
230 2
                    $this->error('Expected delimiter.', '', $this->last + 1);
231 2
                    continue;
232
                }
233
234 32
                $pos = $this->last + 1;
235
236
                // Parsing the delimiter.
237 32
                $this->delimiter = null;
238 32
                $delimiterLen = 0;
239
                while (
240 32
                    ++$this->last < $this->len
241 32
                    && ! Context::isWhitespace($this->str[$this->last])
Bug introduced by
It seems that $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

241
                    && ! Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last])
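One way to read the suggested "additional type check" is sketched below. It is self-contained and does not use the library: `isWhitespaceChar()`, `charAt()` and `isWhitespaceAt()` are hypothetical stand-ins, with `charAt()` imitating an accessor that may return null for a missing offset.

```php
<?php

declare(strict_types=1);

// Stand-in for a predicate that only accepts strings (hypothetical; this is
// not the library's Context::isWhitespace()).
function isWhitespaceChar(string $char): bool
{
    return $char === ' ' || $char === "\t" || $char === "\n" || $char === "\r";
}

// Hypothetical accessor that returns null when the offset does not exist,
// which is the situation the analyzer warns about.
function charAt(string $str, int $offset): ?string
{
    return $offset >= 0 && $offset < strlen($str) ? $str[$offset] : null;
}

// Explicit null check before calling the string-only predicate:
// a missing character is treated as "not whitespace".
function isWhitespaceAt(string $str, int $offset): bool
{
    $char = charAt($str, $offset);

    return $char !== null && isWhitespaceChar($char);
}
```

Whether a guard like this or the `ignore-type` annotation is the better fit depends on whether the null case can really occur at that call site, which the analyzer cannot always tell.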
242 32
                    && $delimiterLen < 15
243
                ) {
244 30
                    $this->delimiter .= $this->str[$this->last];
245 30
                    ++$delimiterLen;
246
                }
247
248 32
                if (empty($this->delimiter)) {
249 2
                    $this->error('Expected delimiter.', '', $this->last);
250 2
                    $this->delimiter = ';';
251
                }
252
253 32
                --$this->last;
254
255
                // Saving the delimiter and its token.
256 32
                $this->delimiterLen = strlen($this->delimiter);
257 32
                $token = new Token($this->delimiter, TokenType::Delimiter);
258 32
                $token->position = $pos;
259 32
                $list->tokens[$list->count++] = $token;
260
            }
261
262 1418
            $lastToken = $token;
263
        }
264
265
        // Adding a final delimiter to mark the ending.
266 1432
        $list->tokens[$list->count++] = new Token(null, TokenType::Delimiter);
267
268
        // Saving the tokens list.
269 1432
        $this->list = $list;
270
271 1432
        $this->solveAmbiguityOnStarOperator();
272 1432
        $this->solveAmbiguityOnFunctionKeywords();
273
    }
274
275
    /**
276
     * Resolves the ambiguity when dealing with the "*" operator.
277
     *
278
     * In SQL statements, the "*" operator can be an arithmetic operator (like in 2*3) or an SQL wildcard (like in
279
     * SELECT a.* FROM ...). To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
280
     * comments, right after the "*" position. The "*" is for sure an SQL wildcard if the next token found is any of:
281
     * - "FROM" (the FROM keyword like in "SELECT * FROM...");
282
     * - "USING" (the USING keyword like in "DELETE table_name.* USING...");
283
     * - "," (a comma separator like in "SELECT *, field FROM...");
284
     * - ")" (a closing parenthesis like in "COUNT(*)").
285
     * This method will change the flag of the "*" tokens when any of the conditions above is true. Otherwise, the
286
     * default flag (arithmetic) will be kept.
287
     */
288 1432
    private function solveAmbiguityOnStarOperator(): void
289
    {
290 1432
        $iBak = $this->list->idx;
291 1432
        while (($starToken = $this->list->getNextOfTypeAndValue(TokenType::Operator, '*')) !== null) {
292
            // getNext() already gets rid of whitespaces and comments.
293 198
            $next = $this->list->getNext();
294
295 198
            if ($next === null) {
296
                continue;
297
            }
298
299
            if (
300 198
                ($next->type !== TokenType::Keyword || ! in_array($next->value, ['FROM', 'USING'], true))
301 198
                && ($next->type !== TokenType::Operator || ! in_array($next->value, [',', ')'], true))
302
            ) {
303 16
                continue;
304
            }
305
306 184
            $starToken->flags = Token::FLAG_OPERATOR_SQL;
307
        }
308
309 1432
        $this->list->idx = $iBak;
310
    }
311
312
    /**
313
     * Resolves the ambiguity when dealing with function keywords.
314
     *
315
     * In SQL statements, the function keywords might be used as table names or column names.
316
     * To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
317
     * comments, right after the function keyword position. The function keyword is for sure used
318
     * as column name or table name if the next token found is any of:
319
     *
320
     * - "FROM" (the FROM keyword like in "SELECT Country x, AverageSalary avg FROM...");
321
     * - "WHERE" (the WHERE keyword like in "DELETE FROM emp x WHERE x.salary = 20");
322
     * - "SET" (the SET keyword like in "UPDATE Country x, City y set x.Name=x.Name");
323
     * - "," (a comma separator like 'x,' in "UPDATE Country x, City y set x.Name=x.Name");
324
     * - "." (a dot separator like in "x.asset_id FROM (SELECT evt.asset_id FROM evt)");
325
     * - "NULL" (when used as a table alias like in "avg.col FROM (SELECT ev.col FROM ev) avg").
326
     *
327
     * This method will change the flag of the function keyword tokens when any of the
328
     * conditions above is true. Otherwise, the
329
     * default flag (function keyword) will be kept.
330
     */
331 1432
    private function solveAmbiguityOnFunctionKeywords(): void
332
    {
333 1432
        $iBak = $this->list->idx;
334 1432
        $keywordFunction = TokenType::Keyword->value | Token::FLAG_KEYWORD_FUNCTION;
335 1432
        while (($keywordToken = $this->list->getNextOfTypeAndFlag(TokenType::Keyword, $keywordFunction)) !== null) {
336 214
            $next = $this->list->getNext();
337
            if (
338 214
                ($next->type !== TokenType::Keyword
339 214
                    || ! in_array($next->value, self::KEYWORD_NAME_INDICATORS, true)
340
                )
341 214
                && ($next->type !== TokenType::Operator
342 214
                    || ! in_array($next->value, self::OPERATOR_NAME_INDICATORS, true)
343
                )
344 214
                && ($next->value !== null)
345
            ) {
346 204
                continue;
347
            }
348
349 12
            $keywordToken->type = TokenType::None;
350 12
            $keywordToken->flags = Token::FLAG_NONE;
351 12
            $keywordToken->keyword = $keywordToken->value;
352
        }
353
354 1432
        $this->list->idx = $iBak;
355
    }
356
357
    /**
358
     * Creates a new error log.
359
     *
360
     * @param string $msg  the error message
361
     * @param string $str  the character that produced the error
362
     * @param int    $pos  the position of the character
363
     * @param int    $code the code of the error
364
     *
365
     * @throws LexerException throws the exception, if strict mode is enabled.
366
     */
367 36
    public function error($msg, $str = '', $pos = 0, $code = 0): void
368
    {
369 36
        $error = new LexerException(
370 36
            Translator::gettext($msg),
371 36
            $str,
372 36
            $pos,
373 36
            $code
374 36
        );
375 36
        parent::error($error);
376
    }
377
378
    /**
379
     * Parses a keyword.
380
     */
381 1404
    public function parseKeyword(): Token|null
382
    {
383 1404
        $token = '';
384
385
        /**
386
         * Value to be returned.
387
         *
388
         * @var Token
389
         */
390 1404
        $ret = null;
391
392
        /**
393
         * The value of `$this->last` where `$token` ends in `$this->str`.
394
         */
395 1404
        $iEnd = $this->last;
396
397
        /**
398
         * Whether last parsed character is a whitespace.
399
         *
400
         * @var bool
401
         */
402 1404
        $lastSpace = false;
403
404 1404
        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
405
            // Composed keywords shouldn't have more than one whitespace between
406
            // keywords.
407 1404
            if (Context::isWhitespace($this->str[$this->last])) {
Bug introduced by
It seems that $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

407
            if (Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last])) {
408 1368
                if ($lastSpace) {
409 264
                    --$j; // The size of the keyword didn't increase.
410 264
                    continue;
411
                }
412
413 1368
                $lastSpace = true;
414
            } else {
415 1404
                $lastSpace = false;
416
            }
417
418 1404
            $token .= $this->str[$this->last];
419 1404
            $flags = Context::isKeyword($token);
420
421 1404
            if (($this->last + 1 !== $this->len && ! Context::isSeparator($this->str[$this->last + 1])) || ! $flags) {
Bug introduced by
It seems that $this->str[$this->last + 1] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

421
            if (($this->last + 1 !== $this->len && ! Context::isSeparator(/** @scrutinizer ignore-type */ $this->str[$this->last + 1])) || ! $flags) {
Bug Best Practice introduced by
The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case, in particular the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false
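The ambiguity can be made concrete with a small sketch. It assumes a hypothetical lookup, `lookupFlags()`, that returns integer flags or null for an unknown word, and in which 0 is a valid flag value. Whether the library's own lookups can ever return 0 is not established here.

```php
<?php

declare(strict_types=1);

// Hypothetical lookup: returns int flags, or null when the word is unknown.
// A flag value of 0 is a legitimate result in this sketch.
function lookupFlags(string $word): ?int
{
    $table = ['PLAIN' => 0, 'RESERVED' => 2];

    return $table[$word] ?? null;
}

// Loose check: a known word whose flags are 0 is treated like an unknown word.
function isKnownLoose(string $word): bool
{
    return (bool) lookupFlags($word);
}

// Strict check: only a null result means "unknown".
function isKnownStrict(string $word): bool
{
    return lookupFlags($word) !== null;
}
```

With this table, the loose check rejects `PLAIN` even though it is present, while the strict check accepts it. If 0 can never be returned, the two checks behave the same and the warning is a matter of style.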
422 1404
                continue;
423
            }
424
425 1368
            $ret = new Token($token, TokenType::Keyword, $flags);
426 1368
            $iEnd = $this->last;
427
428
            // We don't break so we find longest keyword.
429
            // For example, `OR` and `ORDER` have a common prefix `OR`.
430
            // If we stopped at `OR`, the parsing would be invalid.
431
        }
432
433 1404
        $this->last = $iEnd;
434
435 1404
        return $ret;
436
    }
437
438
    /**
439
     * Parses a label.
440
     */
441 1056
    public function parseLabel(): Token|null
442
    {
443 1056
        $token = '';
444
445
        /**
446
         * Value to be returned.
447
         *
448
         * @var Token
449
         */
450 1056
        $ret = null;
451
452
        /**
453
         * The value of `$this->last` where `$token` ends in `$this->str`.
454
         */
455 1056
        $iEnd = $this->last;
456 1056
        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
457 1056
            if ($this->str[$this->last] === ':' && $j > 1) {
458
                // End of label
459 4
                $token .= $this->str[$this->last];
460 4
                $ret = new Token($token, TokenType::Label);
461 4
                $iEnd = $this->last;
462 4
                break;
463
            }
464
465 1056
            if (Context::isWhitespace($this->str[$this->last]) && $j > 1) {
Bug introduced by
It seems that $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

465
            if (Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last]) && $j > 1) {
466
                // Whitespace between label and :
467
                // The size of the keyword didn't increase.
468 820
                --$j;
469 1056
            } elseif (Context::isSeparator($this->str[$this->last])) {
Bug introduced by
It seems that $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

469
            } elseif (Context::isSeparator(/** @scrutinizer ignore-type */ $this->str[$this->last])) {
470
                // Any other separator
471 804
                break;
472
            }
473
474 1052
            $token .= $this->str[$this->last];
475
        }
476
477 1056
        $this->last = $iEnd;
478
479 1056
        return $ret;
480
    }
481
482
    /**
483
     * Parses an operator.
484
     */
485 1422
    public function parseOperator(): Token|null
486
    {
487 1422
        $token = '';
488
489
        /**
490
         * Value to be returned.
491
         *
492
         * @var Token
493
         */
494 1422
        $ret = null;
495
496
        /**
497
         * The value of `$this->last` where `$token` ends in `$this->str`.
498
         */
499 1422
        $iEnd = $this->last;
500
501 1422
        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
502 1422
            $token .= $this->str[$this->last];
503 1422
            $flags = Context::isOperator($token);
504
505 1422
            if (! $flags) {
Bug Best Practice introduced by
The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.
506 1418
                continue;
507
            }
508
509 1014
            $ret = new Token($token, TokenType::Operator, $flags);
510 1014
            $iEnd = $this->last;
511
        }
512
513 1422
        $this->last = $iEnd;
514
515 1422
        return $ret;
516
    }
517
518
    /**
519
     * Parses a whitespace.
520
     */
521 1422
    public function parseWhitespace(): Token|null
522
    {
523 1422
        $token = $this->str[$this->last];
524
525 1422
        if (! Context::isWhitespace($token)) {
Bug introduced by
It seems that $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

525
        if (! Context::isWhitespace(/** @scrutinizer ignore-type */ $token)) {
526 1422
            return null;
527
        }
528
529 1384
        while (++$this->last < $this->len && Context::isWhitespace($this->str[$this->last])) {
530 268
            $token .= $this->str[$this->last];
531
        }
532
533 1384
        --$this->last;
534
535 1384
        return new Token($token, TokenType::Whitespace);
536
    }
537
538
    /**
539
     * Parses a comment.
540
     */
541 1422
    public function parseComment(): Token|null
542
    {
543 1422
        $iBak = $this->last;
544 1422
        $token = $this->str[$this->last];
545
546
        // Bash style comments. (#comment\n)
547 1422
        if (Context::isComment($token)) {
Bug Best Practice introduced by
The expression PhpMyAdmin\SqlParser\Context::isComment($token) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.
Bug introduced by
It seems that $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isComment() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

547
        if (Context::isComment(/** @scrutinizer ignore-type */ $token)) {
548 6
            while (++$this->last < $this->len && $this->str[$this->last] !== "\n") {
549 6
                $token .= $this->str[$this->last];
550
            }
551
552
            // Include trailing \n as whitespace token
553 6
            if ($this->last < $this->len) {
554 6
                --$this->last;
555
            }
556
557 6
            return new Token($token, TokenType::Comment, Token::FLAG_COMMENT_BASH);
558
        }
559
560
        // C style comments. (/*comment*\/)
561 1422
        if (++$this->last < $this->len) {
562 1418
            $token .= $this->str[$this->last];
563 1418
            if (Context::isComment($token)) {
Bug Best Practice introduced by
The expression PhpMyAdmin\SqlParser\Context::isComment($token) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.
564
                // There might be a conflict with "*" operator here, when string is "*/*".
565
                // This can occur in the following statements:
566
                // - "SELECT */* comment */ FROM ..."
567
                // - "SELECT 2*/* comment */3 AS `six`;"
568 100
                $next = $this->last + 1;
569 100
                if (($next < $this->len) && $this->str[$next] === '*') {
570
                    // Conflict in "*/*": first "*" was not for ending a comment.
571
                    // Stop here and let other parsing method define the true behavior of that first star.
572 2
                    $this->last = $iBak;
573
574 2
                    return null;
575
                }
576
577 100
                $flags = Token::FLAG_COMMENT_C;
578
579
                // This comment already ended. It may be a part of a
580
                // previous MySQL specific command.
581 100
                if ($token === '*/') {
582 36
                    return new Token($token, TokenType::Comment, $flags);
583
                }
584
585
                // Checking if this is a MySQL-specific command.
586 98
                if ($this->last + 1 < $this->len && $this->str[$this->last + 1] === '!') {
587 34
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
588 34
                    $token .= $this->str[++$this->last];
589
590
                    while (
591 34
                        ++$this->last < $this->len
592 34
                        && $this->str[$this->last] >= '0'
593 34
                        && $this->str[$this->last] <= '9'
594
                    ) {
595 32
                        $token .= $this->str[$this->last];
596
                    }
597
598 34
                    --$this->last;
599
600
                    // We split this comment and parse only its beginning
601
                    // here.
602 34
                    return new Token($token, TokenType::Comment, $flags);
603
                }
604
605
                // Parsing the comment.
606
                while (
607 68
                    ++$this->last < $this->len
608 68
                    && (
609 68
                        $this->str[$this->last - 1] !== '*'
610 68
                        || $this->str[$this->last] !== '/'
611 68
                    )
612
                ) {
613 68
                    $token .= $this->str[$this->last];
614
                }
615
616
                // Adding the ending.
617 68
                if ($this->last < $this->len) {
618 68
                    $token .= $this->str[$this->last];
619
                }
620
621 68
                return new Token($token, TokenType::Comment, $flags);
622
            }
623
        }
624
625
        // SQL style comments. (-- comment\n)
626 1422
        if (++$this->last < $this->len) {
627 1416
            $token .= $this->str[$this->last];
628 1416
            $end = false;
629
        } else {
630 412
            --$this->last;
631 412
            $end = true;
632
        }
633
634 1422
        if (Context::isComment($token, $end)) {
Bug Best Practice introduced by
The expression PhpMyAdmin\SqlParser\Con...isComment($token, $end) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.
635
            // Checking if this comment did not end already (```--\n```).
636 70
            if ($this->str[$this->last] !== "\n") {
637 70
                while (++$this->last < $this->len && $this->str[$this->last] !== "\n") {
638 70
                    $token .= $this->str[$this->last];
639
                }
640
            }
641
642
            // Include trailing \n as whitespace token
643 70
            if ($this->last < $this->len) {
644 62
                --$this->last;
645
            }
646
647 70
            return new Token($token, TokenType::Comment, Token::FLAG_COMMENT_SQL);
648
        }
649
650 1422
        $this->last = $iBak;
651
652 1422
        return null;
653
    }

    /**
     * Parses a boolean.
     */
    public function parseBool(): Token|null
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, TokenType::Bool);
        }

        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, TokenType::Bool, 1);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a number.
     */
    public function parseNumber(): Token|null
    {
        // A rudimentary state machine is being used to parse numbers due to
        // the various forms of their notation.
        //
        // Below are the states of the machine and the conditions to change
        // the state.
        //
        //      1 --------------------[ + or - ]-------------------> 1
        //      1 ----------------------[ 0x ]---------------------> 2
        //      1 --------------------[ 0 to 9 ]-------------------> 3
        //      1 -----------------------[ . ]---------------------> 4
        //      1 -----------------------[ b ]---------------------> 7
        //
        //      2 --------------------[ 0 to F ]-------------------> 2
        //
        //      3 --------------------[ 0 to 9 ]-------------------> 3
        //      3 -----------------------[ . ]---------------------> 4
        //      3 --------------------[ e or E ]-------------------> 5
        //
        //      4 --------------------[ 0 to 9 ]-------------------> 4
        //      4 --------------------[ e or E ]-------------------> 5
        //
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
        //
        //      6 --------------------[ 0 to 9 ]-------------------> 6
        //
        //      7 -----------------------[ ' ]---------------------> 8
        //
        //      8 --------------------[ 0 or 1 ]-------------------> 8
        //      8 -----------------------[ ' ]---------------------> 9
        //
        // State 1 may be reached by negative numbers.
        // State 2 is reached only by hex numbers.
        // State 4 is reached only by float numbers.
        // State 5 is reached only by numbers in approximate form.
        // State 7 is reached only by numbers in bit representation.
        //
        // Valid final states are: 2, 3, 4, 6 and 9. Any parsing that finished in a
        // state other than these is invalid.
        // Also, negative states are invalid states.
        $iBak = $this->last;
        $token = '';
        $flags = 0;
        $state = 1;
        for (; $this->last < $this->len; ++$this->last) {
            if ($state === 1) {
                if ($this->str[$this->last] === '-') {
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
                } elseif (
                    $this->last + 1 < $this->len
                    && $this->str[$this->last] === '0'
                    && $this->str[$this->last + 1] === 'x'
                ) {
                    $token .= $this->str[$this->last++];
                    $state = 2;
                } elseif ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9') {
                    $state = 3;
                } elseif ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'b') {
                    $state = 7;
                } elseif ($this->str[$this->last] !== '+') {
                    // `+` is a valid character in a number.
                    break;
                }
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (
                    ! (
                        ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                        || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'F')
                        || ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'f')
                    )
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits and `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if (
                    $this->str[$this->last] === '+' || $this->str[$this->last] === '-'
                    || ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                ) {
                    $state = 6;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] !== '\'') {
                    break;
                }

                $state = 8;
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif ($this->str[$this->last] !== '0' && $this->str[$this->last] !== '1') {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }

            $token .= $this->str[$this->last];
        }

        if ($state === 2 || $state === 3 || ($token !== '.' && $state === 4) || $state === 6 || $state === 9) {
            --$this->last;

            return new Token($token, TokenType::Number, $flags);
        }

        $this->last = $iBak;

        return null;
    }
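
The state machine in `parseNumber()` can be easier to follow in a reduced form. The sketch below is a standalone illustration, not the library's code: `scanDecimal()` is a hypothetical name, it covers only the decimal states (1, 3, 4, 5 and 6), it leaves out the hex and bit states and the token flags, and it returns `null` right away where the method above switches to a negative state.

```php
<?php

declare(strict_types=1);

/**
 * Reduced sketch of the number state machine, decimal states only:
 * 1 = sign, 3 = integer digits, 4 = fraction, 5 = exponent marker,
 * 6 = exponent digits.
 *
 * Returns the literal found at $pos, or null if no number starts there.
 */
function scanDecimal(string $str, int $pos = 0): string|null
{
    $len = strlen($str);
    $token = '';
    $state = 1;

    for ($i = $pos; $i < $len; ++$i) {
        $c = $str[$i];
        $isDigit = $c >= '0' && $c <= '9';
        $isLetter = ($c >= 'a' && $c <= 'z') || ($c >= 'A' && $c <= 'Z');
        $isExp = $c === 'e' || $c === 'E';

        if ($state === 1) {
            if ($isDigit) {
                $state = 3;
            } elseif ($c === '.') {
                $state = 4;
            } elseif ($c !== '+' && $c !== '-') {
                break;
            }
        } elseif ($state === 3 || $state === 4) {
            if ($state === 3 && $c === '.') {
                $state = 4;
            } elseif ($isExp) {
                $state = 5;
            } elseif ($isLetter) {
                return null; // a number cannot be directly followed by a letter
            } elseif (! $isDigit) {
                break;
            }
        } elseif ($state === 5) {
            if ($c === '+' || $c === '-' || $isDigit) {
                $state = 6;
            } elseif ($isLetter) {
                return null;
            } else {
                break;
            }
        } elseif ($state === 6 && ! $isDigit) {
            break;
        }

        $token .= $c;
    }

    // Accept only the valid final states; a lone "." is not a number.
    $valid = $state === 3 || $state === 6 || ($state === 4 && $token !== '.');

    return $valid ? $token : null;
}
```

For instance, `scanDecimal('-1.5e3,')` stops at the comma in state 6 and yields `-1.5e3`, while `scanDecimal('12abc')` is rejected because a letter directly follows the digits.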

    /**
     * Parses a string.
     *
     * @param string $quote additional starting symbol
     *
     * @throws LexerException
     */
    public function parseString($quote = ''): Token|null
    {
        $token = $this->str[$this->last];
        $flags = Context::isString($token);
Issue (Bug):
It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isString() seems to accept only string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        $flags = Context::isString(/** @scrutinizer ignore-type */ $token);

        if (! $flags && $token !== $quote) {
Issue (Bug, Best Practice):
The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

            return null;
        }

        $quote = $token;

        while (++$this->last < $this->len) {
            if (
                $this->last + 1 < $this->len
                && (
                    ($this->str[$this->last] === $quote && $this->str[$this->last + 1] === $quote)
                    || ($this->str[$this->last] === '\\' && $quote !== '`')
                )
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }

                $token .= $this->str[$this->last];
            }
        }

        if ($this->last >= $this->len || $this->str[$this->last] !== $quote) {
            $this->error(
                sprintf(
                    Translator::gettext('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }

        return new Token($token, TokenType::String, $flags);
    }
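
The escape handling in `parseString()` comes down to two rules: a doubled quote character stays inside the literal, and a backslash escapes the next character unless the quote is a backtick. The sketch below isolates those rules in a hypothetical standalone function, `scanQuoted()`. It is an illustration under simplifying assumptions: it returns `null` for an unterminated literal, whereas the method above reports an error and still returns a token.

```php
<?php

declare(strict_types=1);

/**
 * Sketch of quote scanning with two escape forms: a doubled quote
 * character and, except inside backticks, a backslash escape.
 *
 * Returns the raw literal including both quotes, or null when the
 * closing quote is missing. Assumes $str[$pos] is the opening quote.
 */
function scanQuoted(string $str, int $pos): string|null
{
    $len = strlen($str);
    $quote = $str[$pos];
    $token = $quote;

    for ($i = $pos + 1; $i < $len; ++$i) {
        $c = $str[$i];
        $hasNext = $i + 1 < $len;
        $isDoubledQuote = $hasNext && $c === $quote && $str[$i + 1] === $quote;
        $isBackslashEscape = $hasNext && $c === '\\' && $quote !== '`';

        if ($isDoubledQuote || $isBackslashEscape) {
            // Escape pair: keep both characters and skip past the second one.
            $token .= $c . $str[++$i];
            continue;
        }

        $token .= $c;
        if ($c === $quote) {
            return $token; // closing quote reached
        }
    }

    return null; // ran off the end without a closing quote
}
```

With this sketch, `scanQuoted("'it''s' rest", 0)` consumes the doubled quote as part of the literal and stops at the single quote that follows the `s`.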

    /**
     * Parses a symbol.
     *
     * @throws LexerException
     */
    public function parseSymbol(): Token|null
    {
        $token = $this->str[$this->last];
        $flags = Context::isSymbol($token);
Issue (Bug):
It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSymbol() seems to accept only string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        $flags = Context::isSymbol(/** @scrutinizer ignore-type */ $token);

        if (! $flags) {
Issue (Bug, Best Practice):
The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            if ($this->last + 1 < $this->len && $this->str[++$this->last] === '@') {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } elseif ($flags & Token::FLAG_SYMBOL_PARAMETER) {
            if ($token !== '?' && $this->last + 1 < $this->len) {
                ++$this->last;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            $str = $this->parseString('`');

            if ($str === null) {
                $str = $this->parseUnknown();

                if ($str === null) {
                    $this->error('Variable name was expected.', $this->str[$this->last], $this->last);
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, TokenType::Symbol, $flags);
    }

    /**
     * Parses unknown parts of the query.
     */
    public function parseUnknown(): Token|null
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
Issue (Bug):
It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() seems to accept only string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if (Context::isSeparator(/** @scrutinizer ignore-type */ $token)) {
            return null;
        }

        while (++$this->last < $this->len && ! Context::isSeparator($this->str[$this->last])) {
            $token .= $this->str[$this->last];

            // Test if end of token equals the current delimiter. If so, remove it from the token.
            if (str_ends_with($token, $this->delimiter)) {
                $token = substr($token, 0, -$this->delimiterLen);
                $this->last -= $this->delimiterLen - 1;
                break;
            }
        }

        --$this->last;

        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     */
    public function parseDelimiter(): Token|null
    {
        $idx = 0;

        while ($idx < $this->delimiterLen && $this->last + $idx < $this->len) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }

            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;

        return new Token($this->delimiter, TokenType::Delimiter);
    }
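
As written, the loop in `parseDelimiter()` also ends without a mismatch when the input runs out partway through a multi-character delimiter, so a truncated delimiter at the very end of the string appears to be accepted. A possible stricter check, sketched here with a hypothetical standalone helper `delimiterAt()` (not part of the library), compares the whole slice at once:

```php
<?php

declare(strict_types=1);

/**
 * Sketch: does $delimiter occur in $str exactly at offset $pos?
 * Comparing the full slice rejects a delimiter that is cut off by
 * the end of the string.
 */
function delimiterAt(string $str, int $pos, string $delimiter): bool
{
    return $delimiter !== ''
        && substr($str, $pos, strlen($delimiter)) === $delimiter;
}
```

Whether the truncated case matters in practice depends on how callers treat the end of input, which this section does not show.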

    private function parse(): Token|null
    {
        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over the delimiter. The delimiter may be a keyword, a
        // number or almost any character which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 2. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 3. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 4. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 5. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        return $this->parseDelimiter()
            ?? $this->parseWhitespace()
            ?? $this->parseNumber()
            ?? $this->parseComment()
            ?? $this->parseOperator()
Issue (Bug):
Are you sure about the usage of $this->parseOperator()? The targeted method PhpMyAdmin\SqlParser\Lexer::parseOperator() seems to always return null.

This check looks for function or method calls that always return null and whose return value is used.

class A
{
    function getObject()
    {
        return null;
    }
}

$a = new A();
if ($a->getObject()) {
}

The method getObject() can return nothing but null, so it makes no sense to use the return value.

The reason is most likely that a function or method is incomplete or has been reduced for debug purposes.
            ?? $this->parseBool()
            ?? $this->parseString()
            ?? $this->parseSymbol()
            ?? $this->parseKeyword()
Issue (Bug):
Are you sure about the usage of $this->parseKeyword()? The targeted method PhpMyAdmin\SqlParser\Lexer::parseKeyword() seems to always return null.

            ?? $this->parseLabel()
            ?? $this->parseUnknown();
    }
}
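
The ordering comments in `parse()` rely on how the null coalescing operator behaves: operands are evaluated left to right, and evaluation stops at the first one that is not `null`. The sketch below shows that first-match-wins dispatch with three hypothetical recognizers that are far simpler than the lexer's own methods; the function names and return values are illustrative only.

```php
<?php

declare(strict_types=1);

/** Hypothetical recognizer: unsigned integers only. */
function recognizeNumber(string $s): string|null
{
    return preg_match('/\A\d+\z/', $s) === 1 ? 'number' : null;
}

/** Hypothetical recognizer: the words TRUE and FALSE, case-insensitive. */
function recognizeBool(string $s): string|null
{
    return in_array(strtoupper($s), ['TRUE', 'FALSE'], true) ? 'bool' : null;
}

/** Hypothetical fallback: anything non-empty is a word. */
function recognizeWord(string $s): string|null
{
    return $s !== '' ? 'word' : null;
}

function classify(string $s): string|null
{
    // Earlier recognizers take priority; later ones are not called
    // once a non-null result has been produced.
    return recognizeNumber($s)
        ?? recognizeBool($s)
        ?? recognizeWord($s);
}
```

Here `classify('true')` yields `'bool'` only because `recognizeBool()` comes before the catch-all `recognizeWord()`, which mirrors why `parseBool()` is listed before `parseKeyword()` and `parseUnknown()` in the chain above.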