Passed
Pull Request — master (#521)
by unknown, created 11:34

Lexer::setDelimiter()   A

Complexity: Conditions 1, Paths 1
Size: Total Lines 4, Code Lines 2
Duplication: Lines 0, Ratio 0 %
Code Coverage: Tests 1, CRAP Score 1
Importance: Changes 0

Metric  Value
cc      1
eloc    2
nc      1
nop     1
dl      0
loc     4
ccs     1
cts     1
cp      1
crap    1
rs      10
c       0
b       0
f       0

<?php

declare(strict_types=1);

namespace PhpMyAdmin\SqlParser;

use Exception;
use PhpMyAdmin\SqlParser\Exceptions\LexerException;

use function in_array;
use function mb_strlen;
use function sprintf;
use function str_ends_with;
use function strlen;
use function substr;

/**
 * Defines the lexer of the library.
 *
 * This is one of the most important components, along with the parser.
 *
 * Depends on context to extract lexemes.
 *
 * Performs lexical analysis over a SQL statement and splits it into multiple tokens.
 *
 * The output of the lexer is affected by the context of the SQL statement.
 *
 * @see Context
 */
class Lexer
{
    /**
     * Whether errors should throw exceptions or just be stored.
     */
    private bool $strict = false;

    /**
     * List of errors that occurred during lexing.
     *
     * Usually, the lexing does not stop once an error occurred because that
     * error might be a false positive or a partial result (even a bad one)
     * might be needed.
     *
     * @var Exception[]
     */
    public array $errors = [];

    /**
     * A list of keywords that indicate that the function keyword
     * is not used as a function
     */
    private const KEYWORD_NAME_INDICATORS = [
        'FROM',
        'SET',
        'WHERE',
    ];

    /**
     * A list of operators that indicate that the function keyword
     * is not used as a function
     */
    private const OPERATOR_NAME_INDICATORS = [
        ',',
        '.',
    ];

    /**
     * The string to be parsed.
     *
     * @var string|UtfString
     */
    public $str = '';

    /**
     * The length of `$str`.
     *
     * By storing its length, a lot of time is saved, because parsing methods
     * would call `strlen` every time.
     *
     * @var int
     */
    public $len = 0;

    /**
     * The index of the last parsed character.
     *
     * @var int
     */
    public $last = 0;

    /**
     * Tokens extracted from given strings.
     *
     * @var TokensList
     */
    public $list;

    /**
     * The default delimiter. This is used, by default, in all new instances.
     *
     * @var string
     */
    public static $defaultDelimiter = ';';

    /**
     * Statements delimiter.
     * This may change during lexing.
     *
     * @var string
     */
    public $delimiter;

    /**
     * The length of the delimiter.
     *
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
     * calls to `strlen`, which might affect performance when the delimiter is
     * big.
     *
     * @var int
     */
    public $delimiterLen;

    /**
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
     */
    public function __construct(string|UtfString $str, bool $strict = false, string|null $delimiter = null)
    {
        if (Context::$keywords === []) {
            Context::load();
        }

        // `strlen` is used instead of `mb_strlen` because the lexer needs to
        // parse each byte of the input.
        $len = $str instanceof UtfString ? $str->length() : strlen($str);

        // For multi-byte strings, a new instance of `UtfString` is initialized.
        if (! $str instanceof UtfString && $len !== mb_strlen($str, 'UTF-8')) {
            $str = new UtfString($str);
        }

        $this->str = $str;
        $this->len = $str instanceof UtfString ? $str->length() : $len;

        $this->strict = $strict;

        // Setting the delimiter.
        $this->setDelimiter(! empty($delimiter) ? $delimiter : static::$defaultDelimiter);

        $this->lex();
    }
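
The multi-byte check in the constructor compares the byte length of the input with its UTF-8 character length. A minimal standalone sketch of that comparison follows; the helper name `needsUtfString` is hypothetical and not part of the library, and the sketch assumes the mbstring extension is available.

```php
<?php

// Hypothetical helper, for illustration only: returns true when the input
// contains multi-byte UTF-8 characters, i.e. when its byte length differs
// from its character length.
function needsUtfString(string $str): bool
{
    return strlen($str) !== mb_strlen($str, 'UTF-8');
}

// ASCII-only input: byte length and character length agree.
var_dump(needsUtfString('SELECT 1'));

// "\u{00E9}" (e with acute accent) takes two bytes in UTF-8, so the lengths differ.
var_dump(needsUtfString("SELECT '\u{00E9}'"));
```

When the two lengths differ, the constructor shown above wraps the input in a `UtfString`, so that indexing works per character rather than per byte.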

    /**
     * Sets the delimiter.
     *
     * @param string $delimiter the new delimiter
     */
    public function setDelimiter(string $delimiter): void
    {
        $this->delimiter = $delimiter;
        $this->delimiterLen = strlen($delimiter);
    }

    /**
     * Parses the string and extracts lexemes.
     */
    public function lex(): void
    {
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
        // is* functions. For better performance, some rules can be deduced
        // from context.
        // For example, in `parseBool` there is no need to compare the token
        // every time with `true` and `false`. The first step would be to
        // compare with 'true' only and just after that add another letter from
        // context and compare again with `false`.
        // Another example is `parseComment`.

        $list = new TokensList();

        /**
         * Last processed token.
         */
        $lastToken = null;

        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
            $token = $this->parse();

            if ($token === null) {
                // @assert($this->last === $lastIdx);
                $token = new Token($this->str[$this->last]);
                $this->error('Unexpected character.', $this->str[$this->last], $this->last);

Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $str of PhpMyAdmin\SqlParser\Lexer::error() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $this->error('Unexpected character.', /** @scrutinizer ignore-type */ $this->str[$this->last], $this->last);

            } elseif (
                $lastToken !== null
                && $token->type === TokenType::Symbol
                && $token->flags & Token::FLAG_SYMBOL_VARIABLE
                && (
                    $lastToken->type === TokenType::String
                    || (
                        $lastToken->type === TokenType::Symbol
                        && $lastToken->flags & Token::FLAG_SYMBOL_BACKTICK
                    )
                )
            ) {
                // Handles ```... FROM 'user'@'%' ...```.
                $lastToken->token .= $token->token;
                $lastToken->type = TokenType::Symbol;
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
                $lastToken->value .= '@' . $token->value;
                continue;
            } elseif (
                $lastToken !== null
                && $token->type === TokenType::Keyword
                && $lastToken->type === TokenType::Operator
                && $lastToken->value === '.'
            ) {
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
                // a reserved word.
                $token->type = TokenType::None;
                $token->flags = 0;
                $token->value = $token->token;
            }

            $token->position = $lastIdx;

            $list->tokens[$list->count++] = $token;

            // Handling delimiters.
            if ($token->type === TokenType::None && $token->value === 'DELIMITER') {
                if ($this->last + 1 >= $this->len) {
                    $this->error('Expected whitespace(s) before delimiter.', '', $this->last + 1);
                    continue;
                }

                // Skipping last R (from `delimiteR`) and whitespaces between
                // the keyword `DELIMITER` and the actual delimiter.
                $pos = ++$this->last;
                $token = $this->parseWhitespace();

                if ($token !== null) {
                    $token->position = $pos;
                    $list->tokens[$list->count++] = $token;
                }

                // Preparing the token that holds the new delimiter.
                if ($this->last + 1 >= $this->len) {
                    $this->error('Expected delimiter.', '', $this->last + 1);
                    continue;
                }

                $pos = $this->last + 1;

                // Parsing the delimiter.
                $this->delimiter = null;
                $delimiterLen = 0;
                while (
                    ++$this->last < $this->len
                    && ! Context::isWhitespace($this->str[$this->last])

Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                    && ! Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last])

                    && $delimiterLen < 15
                ) {
                    $this->delimiter .= $this->str[$this->last];
                    ++$delimiterLen;
                }

                if (empty($this->delimiter)) {
                    $this->error('Expected delimiter.', '', $this->last);
                    $this->delimiter = ';';
                }

                --$this->last;

                // Saving the delimiter and its token.
                $this->delimiterLen = strlen($this->delimiter);
                $token = new Token($this->delimiter, TokenType::Delimiter);
                $token->position = $pos;
                $list->tokens[$list->count++] = $token;
            }

            $lastToken = $token;
        }

        // Adding a final delimiter to mark the ending.
        $list->tokens[$list->count++] = new Token(null, TokenType::Delimiter);

        // Saving the tokens list.
        $this->list = $list;

        $this->solveAmbiguityOnStarOperator();
        $this->solveAmbiguityOnFunctionKeywords();
    }
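
The `DELIMITER` handling in `lex()` reads the new delimiter character by character until it reaches whitespace or the 15-character cap, and falls back to `;` when nothing was read. A simplified standalone sketch of that rule follows. The function `readDelimiter` and its whitespace set are assumptions made for illustration; the library itself relies on `Context::isWhitespace()` and its own position bookkeeping.

```php
<?php

// Hypothetical standalone helper (not part of the library): reads a custom
// delimiter starting at byte offset $start, stopping at the first whitespace
// character or after $maxLen bytes, and falls back to ';' when nothing was read.
function readDelimiter(string $sql, int $start, int $maxLen = 15): string
{
    $delimiter = '';
    $len = strlen($sql);

    for ($i = $start; $i < $len && strlen($delimiter) < $maxLen; ++$i) {
        // Assumed whitespace set: space, carriage return, newline, tab.
        if (strpos(" \r\n\t", $sql[$i]) !== false) {
            break;
        }

        $delimiter .= $sql[$i];
    }

    return $delimiter === '' ? ';' : $delimiter;
}

$sql = 'DELIMITER $$' . "\n" . 'SELECT 1$$';

// Start reading right after the keyword and the single space that follows it.
echo readDelimiter($sql, strlen('DELIMITER ')), "\n";
```

The cap keeps a malformed statement from turning an arbitrarily long run of text into a delimiter.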

    /**
     * Resolves the ambiguity when dealing with the "*" operator.
     *
     * In SQL statements, the "*" operator can be an arithmetic operator (like in 2*3) or an SQL wildcard (like in
     * SELECT a.* FROM ...). To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
     * comments, right after the "*" position. The "*" is for sure an SQL wildcard if the next token found is any of:
     * - "FROM" (the FROM keyword like in "SELECT * FROM...");
     * - "USING" (the USING keyword like in "DELETE table_name.* USING...");
     * - "," (a comma separator like in "SELECT *, field FROM...");
     * - ")" (a closing parenthesis like in "COUNT(*)").
     * This method will change the flag of the "*" tokens when any of the conditions above is true. Otherwise, the
     * default flag (arithmetic) will be kept.
     */
    private function solveAmbiguityOnStarOperator(): void
    {
        $iBak = $this->list->idx;
        while (($starToken = $this->list->getNextOfTypeAndValue(TokenType::Operator, '*')) !== null) {
            // getNext() already gets rid of whitespaces and comments.
            $next = $this->list->getNext();

            if ($next === null) {
                continue;
            }

            if (
                ($next->type !== TokenType::Keyword || ! in_array($next->value, ['FROM', 'USING'], true))
                && ($next->type !== TokenType::Operator || ! in_array($next->value, [',', ')'], true))
            ) {
                continue;
            }

            $starToken->flags = Token::FLAG_OPERATOR_SQL;
        }

        $this->list->idx = $iBak;
    }
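
The look-ahead rule described in the docblock above can be sketched on a plain array of token strings. This is a simplification under stated assumptions: `isWildcardStar` is a hypothetical function, tokens are bare strings rather than the library's `Token` objects, token types are not checked, and comments are not skipped.

```php
<?php

// Illustrative only: decides whether the "*" at index $i is an SQL wildcard
// by inspecting the next non-whitespace token.
function isWildcardStar(array $tokens, int $i): bool
{
    $count = count($tokens);

    for ($j = $i + 1; $j < $count; ++$j) {
        if (trim($tokens[$j]) === '') {
            continue; // Skip whitespace tokens.
        }

        return in_array(strtoupper($tokens[$j]), ['FROM', 'USING', ',', ')'], true);
    }

    // No following token: keep the default (arithmetic) interpretation.
    return false;
}

var_dump(isWildcardStar(['SELECT', ' ', '*', ' ', 'FROM', ' ', 't'], 2)); // wildcard case
var_dump(isWildcardStar(['2', '*', '3'], 1));                             // arithmetic case
```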

    /**
     * Resolves the ambiguity when dealing with function keywords.
     *
     * In SQL statements, function keywords might be used as table names or column names.
     * To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
     * comments, right after the function keyword position. The function keyword is for sure used
     * as column name or table name if the next token found is any of:
     *
     * - "FROM" (the FROM keyword like in "SELECT Country x, AverageSalary avg FROM...");
     * - "WHERE" (the WHERE keyword like in "DELETE FROM emp x WHERE x.salary = 20");
     * - "SET" (the SET keyword like in "UPDATE Country x, City y set x.Name=x.Name");
     * - "," (a comma separator like 'x,' in "UPDATE Country x, City y set x.Name=x.Name");
     * - "." (a dot separator like in "x.asset_id FROM (SELECT evt.asset_id FROM evt)");
     * - "NULL" (when used as a table alias like in "avg.col FROM (SELECT ev.col FROM ev) avg").
     *
     * This method will change the flag of the function keyword tokens when any of the
     * conditions above is true. Otherwise, the
     * default flag (function keyword) will be kept.
     */
    private function solveAmbiguityOnFunctionKeywords(): void
    {
        $iBak = $this->list->idx;
        $keywordFunction = TokenType::Keyword->value | Token::FLAG_KEYWORD_FUNCTION;
        while (($keywordToken = $this->list->getNextOfTypeAndFlag(TokenType::Keyword, $keywordFunction)) !== null) {
            $next = $this->list->getNext();
            if (
                ($next->type !== TokenType::Keyword
                    || ! in_array($next->value, self::KEYWORD_NAME_INDICATORS, true)
                )
                && ($next->type !== TokenType::Operator
                    || ! in_array($next->value, self::OPERATOR_NAME_INDICATORS, true)
                )
                && ($next->value !== null)
            ) {
                continue;
            }

            $keywordToken->type = TokenType::None;
            $keywordToken->flags = Token::FLAG_NONE;
            $keywordToken->keyword = $keywordToken->value;
        }

        $this->list->idx = $iBak;
    }

    /**
     * Creates a new error log.
     *
     * @param string $msg  the error message
     * @param string $str  the character that produced the error
     * @param int    $pos  the position of the character
     * @param int    $code the code of the error
     *
     * @throws LexerException throws the exception, if strict mode is enabled.
     */
    public function error(string $msg, string $str = '', int $pos = 0, int $code = 0): void
    {
        $error = new LexerException(
            Translator::gettext($msg),
            $str,
            $pos,
            $code
        );

        if ($this->strict) {
            throw $error;
        }

        $this->errors[] = $error;
    }

    /**
     * Parses a keyword.
     */
    public function parseKeyword(): Token|null
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {

Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            if (Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last])) {

                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                }

                $lastSpace = true;
            } else {
                $lastSpace = false;
            }

            $token .= $this->str[$this->last];
            $flags = Context::isKeyword($token);

            if (($this->last + 1 !== $this->len && ! Context::isSeparator($this->str[$this->last + 1])) || ! $flags) {

Bug: It seems like $this->str[$this->last + 1] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            if (($this->last + 1 !== $this->len && ! Context::isSeparator(/** @scrutinizer ignore-type */ $this->str[$this->last + 1])) || ! $flags) {

Best Practice: The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case, in particular the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false

                continue;
            }

            $ret = new Token($token, TokenType::Keyword, $flags);
            $iEnd = $this->last;

            // We don't break so we find longest keyword.
            // For example, `OR` and `ORDER` have a common prefix `OR`.
            // If we stopped at `OR`, the parsing would be invalid.
        }

        $this->last = $iEnd;

        return $ret;
    }
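
`parseKeyword()` keeps scanning after a match so that the longest keyword wins. The following standalone sketch shows the same longest-match idea. The function `longestKeyword`, the `$keywords` list, and the `\w`-based separator test are assumptions for illustration and stand in for the library's `Context::isKeyword()` and `Context::isSeparator()`.

```php
<?php

// Illustrative sketch: returns the longest keyword from $keywords that starts
// at $start and ends at a word boundary, or null when there is none.
function longestKeyword(string $sql, int $start, array $keywords): ?string
{
    $best = null;
    $candidate = '';
    $len = strlen($sql);

    for ($i = $start; $i < $len; ++$i) {
        $candidate .= strtoupper($sql[$i]);

        // A match only counts at the end of input or before a non-word character.
        $atBoundary = $i + 1 === $len || preg_match('/\w/', $sql[$i + 1]) !== 1;

        if ($atBoundary && in_array($candidate, $keywords, true)) {
            $best = $candidate; // Do not stop: a longer keyword may still follow.
        }
    }

    return $best;
}

$keywords = ['OR', 'ORDER', 'ORDER BY'];

var_dump(longestKeyword('ORDER BY x', 0, $keywords));
var_dump(longestKeyword('OR 1', 0, $keywords));
```

Unlike the method above, this sketch does not cap the scan at a maximum keyword length and does not collapse repeated whitespace.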

    /**
     * Parses a label.
     */
    public function parseLabel(): Token|null
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         */
        $iEnd = $this->last;
        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            if ($this->str[$this->last] === ':' && $j > 1) {
                // End of label
                $token .= $this->str[$this->last];
                $ret = new Token($token, TokenType::Label);
                $iEnd = $this->last;
                break;
            }

            if (Context::isWhitespace($this->str[$this->last]) && $j > 1) {

Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            if (Context::isWhitespace(/** @scrutinizer ignore-type */ $this->str[$this->last]) && $j > 1) {

                // Whitespace between label and :
                // The size of the keyword didn't increase.
                --$j;
            } elseif (Context::isSeparator($this->str[$this->last])) {

Bug: It seems like $this->str[$this->last] can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            } elseif (Context::isSeparator(/** @scrutinizer ignore-type */ $this->str[$this->last])) {

                // Any other separator
                break;
            }

            $token .= $this->str[$this->last];
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses an operator.
     */
    public function parseOperator(): Token|null
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         */
        $iEnd = $this->last;

        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            $token .= $this->str[$this->last];
            $flags = Context::isOperator($token);

            if (! $flags) {

Best Practice: The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

                continue;
            }

            $ret = new Token($token, TokenType::Operator, $flags);
            $iEnd = $this->last;
        }

        $this->last = $iEnd;

        return $ret;
    }
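
The loose comparison of `$flags` flagged in `parseOperator()` matters only if a valid flag value can be 0. The sketch below makes that case concrete. The function `operatorFlags` and its lookup table are hypothetical, and whether the library's own tables ever contain a 0 flag is not established here.

```php
<?php

// Hypothetical lookup, for illustration: returns the flags of a known
// operator, or null when $token is not an operator. The 0 flag for '+'
// is an assumption chosen to expose the ambiguity.
function operatorFlags(string $token): ?int
{
    $table = ['+' => 0, '=' => 2];

    return $table[$token] ?? null;
}

$flags = operatorFlags('+');

var_dump(! $flags);        // bool(true): the loose check treats 0 like "not found"
var_dump($flags === null); // bool(false): the strict check keeps 0 and null apart
```

With a strict `=== null` test, an operator whose flags happen to be 0 would still be recognised instead of being skipped.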

    /**
     * Parses a whitespace.
     */
    public function parseWhitespace(): Token|null
    {
        $token = $this->str[$this->last];

        if (! Context::isWhitespace($token)) {

Bug: It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isWhitespace() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if (! Context::isWhitespace(/** @scrutinizer ignore-type */ $token)) {

            return null;
        }

        while (++$this->last < $this->len && Context::isWhitespace($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }

        --$this->last;

        return new Token($token, TokenType::Whitespace);
    }

    /**
     * Parses a comment.
     */
    public function parseComment(): Token|null
    {
        $iBak = $this->last;
        $token = $this->str[$this->last];

        // Bash style comments. (#comment\n)
        if (Context::isComment($token)) {

Bug: It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isComment() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if (Context::isComment(/** @scrutinizer ignore-type */ $token)) {

Best Practice: The expression PhpMyAdmin\SqlParser\Context::isComment($token) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.

            while (++$this->last < $this->len && $this->str[$this->last] !== "\n") {
                $token .= $this->str[$this->last];
            }

            // Include trailing \n as whitespace token
            if ($this->last < $this->len) {
                --$this->last;
            }

            return new Token($token, TokenType::Comment, Token::FLAG_COMMENT_BASH);
        }

        // C style comments. (/*comment*\/)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {

Best Practice: The expression PhpMyAdmin\SqlParser\Context::isComment($token) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.

                // There might be a conflict with "*" operator here, when string is "*/*".
                // This can occur in the following statements:
                // - "SELECT */* comment */ FROM ..."
                // - "SELECT 2*/* comment */3 AS `six`;"
                $next = $this->last + 1;
                if (($next < $this->len) && $this->str[$next] === '*') {
                    // Conflict in "*/*": first "*" was not for ending a comment.
                    // Stop here and let other parsing method define the true behavior of that first star.
                    $this->last = $iBak;

                    return null;
                }

                $flags = Token::FLAG_COMMENT_C;

                // This comment already ended. It may be a part of a
                // previous MySQL specific command.
                if ($token === '*/') {
                    return new Token($token, TokenType::Comment, $flags);
                }

                // Checking if this is a MySQL-specific command.
                if ($this->last + 1 < $this->len && $this->str[$this->last + 1] === '!') {
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
                    $token .= $this->str[++$this->last];

                    while (
                        ++$this->last < $this->len
                        && $this->str[$this->last] >= '0'
                        && $this->str[$this->last] <= '9'
                    ) {
                        $token .= $this->str[$this->last];
                    }

                    --$this->last;

                    // We split this comment and parse only its beginning
                    // here.
                    return new Token($token, TokenType::Comment, $flags);
                }

                // Parsing the comment.
                while (
                    ++$this->last < $this->len
                    && (
                        $this->str[$this->last - 1] !== '*'
                        || $this->str[$this->last] !== '/'
                    )
                ) {
                    $token .= $this->str[$this->last];
                }

                // Adding the ending.
                if ($this->last < $this->len) {
                    $token .= $this->str[$this->last];
                }

                return new Token($token, TokenType::Comment, $flags);
            }
        }

        // SQL style comments. (-- comment\n)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            $end = false;
        } else {
            --$this->last;
            $end = true;
        }

        if (Context::isComment($token, $end)) {

Best Practice: The expression PhpMyAdmin\SqlParser\Con...isComment($token, $end) of type integer|null is loosely compared to true; this is ambiguous if the integer can be 0. You might want to explicitly use !== null instead.

            // Checking if this comment did not end already (```--\n```).
            if ($this->str[$this->last] !== "\n") {
                while (++$this->last < $this->len && $this->str[$this->last] !== "\n") {
                    $token .= $this->str[$this->last];
                }
            }

            // Include trailing \n as whitespace token
            if ($this->last < $this->len) {
                --$this->last;
            }

            return new Token($token, TokenType::Comment, Token::FLAG_COMMENT_SQL);
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a boolean.
     */
    public function parseBool(): Token|null
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, TokenType::Bool);
        }

        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, TokenType::Bool, 1);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a number.
     */
    public function parseNumber(): Token|null
    {
        // A rudimentary state machine is being used to parse numbers due to
        // the various forms of their notation.
        //
        // Below are the states of the machine and the conditions to change
        // the state.
        //
        //      1 --------------------[ + or - ]-------------------> 1
        //      1 ----------------------[ 0x ]---------------------> 2
        //      1 --------------------[ 0 to 9 ]-------------------> 3
        //      1 -----------------------[ . ]---------------------> 4
        //      1 -----------------------[ b ]---------------------> 7
        //
        //      2 --------------------[ 0 to F ]-------------------> 2
        //
        //      3 --------------------[ 0 to 9 ]-------------------> 3
        //      3 -----------------------[ . ]---------------------> 4
        //      3 --------------------[ e or E ]-------------------> 5
        //
        //      4 --------------------[ 0 to 9 ]-------------------> 4
        //      4 --------------------[ e or E ]-------------------> 5
        //
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
        //
        //      7 -----------------------[ ' ]---------------------> 8
        //
        //      8 --------------------[ 0 or 1 ]-------------------> 8
        //      8 -----------------------[ ' ]---------------------> 9
        //
        // State 1 may be reached by negative numbers.
        // State 2 is reached only by hex numbers.
        // State 4 is reached only by float numbers.
        // State 5 is reached only by numbers in approximate form.
        // State 7 is reached only by numbers in bit representation.
        //
        // Valid final states are: 2, 3, 4, 6 and 9. Any parsing that finished
        // in a state other than these is invalid.
        // Also, negative states are invalid states.
        $iBak = $this->last;
        $token = '';
        $flags = 0;
        $state = 1;
        for (; $this->last < $this->len; ++$this->last) {
            if ($state === 1) {
                if ($this->str[$this->last] === '-') {
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
                } elseif (
                    $this->last + 1 < $this->len
                    && $this->str[$this->last] === '0'
                    && $this->str[$this->last + 1] === 'x'
                ) {
                    $token .= $this->str[$this->last++];
                    $state = 2;
                } elseif ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9') {
                    $state = 3;
                } elseif ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'b') {
                    $state = 7;
                } elseif ($this->str[$this->last] !== '+') {
                    // `+` is a valid character in a number.
                    break;
                }
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (
                    ! (
                        ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                        || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'F')
                        || ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'f')
                    )
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits and `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if (
                    $this->str[$this->last] === '+' || $this->str[$this->last] === '-'
                    || ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                ) {
                    $state = 6;
                } elseif (
                    ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'z')
                    || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'Z')
                ) {
                    // A number can't be directly followed by a letter
                    $state = -$state;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] !== '\'') {
                    break;
                }

                $state = 8;
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif ($this->str[$this->last] !== '0' && $this->str[$this->last] !== '1') {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }

            $token .= $this->str[$this->last];
        }

        if ($state === 2 || $state === 3 || ($token !== '.' && $state === 4) || $state === 6 || $state === 9) {
            --$this->last;

            return new Token($token, TokenType::Number, $flags);
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a string.
     *
     * @param string $quote additional starting symbol
     *
     * @throws LexerException
     */
    public function parseString(string $quote = ''): Token|null
    {
        $token = $this->str[$this->last];
        $flags = Context::isString($token);
Issue (Bug): It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isString() only seems to accept string. Consider adding a type check.

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        $flags = Context::isString(/** @scrutinizer ignore-type */ $token);

        if (! $flags && $token !== $quote) {
Issue (Bug, Best Practice): The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

            return null;
        }

        $quote = $token;

        while (++$this->last < $this->len) {
            if (
                $this->last + 1 < $this->len
                && (
                    ($this->str[$this->last] === $quote && $this->str[$this->last + 1] === $quote)
                    || ($this->str[$this->last] === '\\' && $quote !== '`')
                )
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }

                $token .= $this->str[$this->last];
            }
        }

        if ($this->last >= $this->len || $this->str[$this->last] !== $quote) {
            $this->error(
                sprintf(
                    Translator::gettext('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }

        return new Token($token, TokenType::String, $flags);
    }

    /**
     * Parses a symbol.
     *
     * @throws LexerException
     */
    public function parseSymbol(): Token|null
    {
        $token = $this->str[$this->last];
        $flags = Context::isSymbol($token);
Issue (Bug): It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSymbol() only seems to accept string. Consider adding a type check.

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        $flags = Context::isSymbol(/** @scrutinizer ignore-type */ $token);

        if (! $flags) {
Issue (Bug, Best Practice): The expression $flags of type integer|null is loosely compared to false; this is ambiguous if the integer can be 0. You might want to explicitly use === null instead.

            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            if ($this->last + 1 < $this->len && $this->str[++$this->last] === '@') {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } elseif ($flags & Token::FLAG_SYMBOL_PARAMETER) {
            if ($token !== '?' && $this->last + 1 < $this->len) {
                ++$this->last;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            $str = $this->parseString('`');

            if ($str === null) {
                $str = $this->parseUnknown();

                if ($str === null && ! ($flags & Token::FLAG_SYMBOL_PARAMETER)) {
                    $this->error('Variable name was expected.', $this->str[$this->last], $this->last);
Issue (Bug): It seems like $this->str[$this->last] can also be of type null; however, parameter $str of PhpMyAdmin\SqlParser\Lexer::error() only seems to accept string. Consider adding a type check.

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

                    $this->error('Variable name was expected.', /** @scrutinizer ignore-type */ $this->str[$this->last], $this->last);
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, TokenType::Symbol, $flags);
    }

    /**
     * Parses unknown parts of the query.
     */
    public function parseUnknown(): Token|null
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
Issue (Bug): It seems like $token can also be of type null; however, parameter $string of PhpMyAdmin\SqlParser\Context::isSeparator() only seems to accept string. Consider adding a type check.

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        if (Context::isSeparator(/** @scrutinizer ignore-type */ $token)) {
            return null;
        }

        while (++$this->last < $this->len && ! Context::isSeparator($this->str[$this->last])) {
            $token .= $this->str[$this->last];

            // Test if end of token equals the current delimiter. If so, remove it from the token.
            if (str_ends_with($token, $this->delimiter)) {
                $token = substr($token, 0, -$this->delimiterLen);
                $this->last -= $this->delimiterLen - 1;
                break;
            }
        }

        --$this->last;

        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     */
    public function parseDelimiter(): Token|null
    {
        $idx = 0;

        while ($idx < $this->delimiterLen && $this->last + $idx < $this->len) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }

            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;

        return new Token($this->delimiter, TokenType::Delimiter);
    }

    private function parse(): Token|null
    {
        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over the delimiter. The delimiter may be a keyword, a
        // number or almost any character, which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 2. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 3. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 4. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 5. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        return $this->parseDelimiter()
            ?? $this->parseWhitespace()
            ?? $this->parseNumber()
            ?? $this->parseComment()
            ?? $this->parseOperator()
Issue (Bug): Are you sure about the usage of $this->parseOperator()? The targeted method PhpMyAdmin\SqlParser\Lexer::parseOperator() seems to always return null.

This check looks for function or method calls that always return null and whose return value is used.

class A
{
    function getObject()
    {
        return null;
    }
}

$a = new A();
if ($a->getObject()) {
    // Never reached: getObject() always returns null.
}

The method getObject() can return nothing but null, so it makes no sense to use the return value.

The reason is most likely that a function or method is incomplete or has been reduced for debug purposes.
            ?? $this->parseBool()
            ?? $this->parseString()
            ?? $this->parseSymbol()
            ?? $this->parseKeyword()
Issue (Bug): Are you sure about the usage of $this->parseKeyword()? The targeted method PhpMyAdmin\SqlParser\Lexer::parseKeyword() seems to always return null.

            ?? $this->parseLabel()
            ?? $this->parseUnknown();
    }
}
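
The state machine documented in the parseNumber() comment can be tried out in isolation. The following is a simplified, hypothetical sketch and not the library's implementation: scanDecimal() is an invented name, it covers only states 1, 3, 4, 5 and 6 (hex and bit literals are left out), it works on a plain string instead of the lexer's cursor, and it returns null directly where the method above switches to a negative state.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the decimal part of the number state machine.
 * Returns the numeric prefix of $s, or null if no valid number starts there.
 */
function scanDecimal(string $s): string|null
{
    $state = 1;
    $token = '';
    $len = strlen($s);

    for ($i = 0; $i < $len; ++$i) {
        $c = $s[$i];
        $isDigit = $c >= '0' && $c <= '9';
        $isLetter = ($c >= 'a' && $c <= 'z') || ($c >= 'A' && $c <= 'Z');

        if ($state === 1) {
            if ($isDigit) {
                $state = 3;
            } elseif ($c === '.') {
                $state = 4;
            } elseif ($c !== '+' && $c !== '-') {
                break;
            }
        } elseif ($state === 3 || $state === 4) {
            if ($c === '.' && $state === 3) {
                $state = 4;
            } elseif ($c === 'e' || $c === 'E') {
                $state = 5;
            } elseif ($isLetter) {
                // A number can't be directly followed by a letter.
                return null;
            } elseif (! $isDigit) {
                break;
            }
        } elseif ($state === 5) {
            if ($c !== '+' && $c !== '-' && ! $isDigit) {
                break;
            }

            $state = 6;
        } elseif (! $isDigit) {
            // State 6: only digits may follow the exponent sign or digit.
            break;
        }

        $token .= $c;
    }

    // Accept only the final states 3, 4 (but not a lone ".") and 6.
    $accepted = $state === 3 || ($state === 4 && $token !== '.') || $state === 6;

    return $accepted ? $token : null;
}

var_dump(scanDecimal('12.5e-3 FROM t'));
var_dump(scanDecimal('12abc'));
```

Under these assumptions the first call yields the prefix 12.5e-3 and the second yields null, which mirrors the rule that a number must not be directly followed by a letter.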