Test Failed
Pull Request — master (#291)
by William
12:25
created

Lexer::error()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 10

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 5
CRAP Score 1

Importance

Changes 0
Metric Value
dl 0
loc 10
ccs 5
cts 5
cp 1
rs 9.9332
c 0
b 0
f 0
cc 1
nc 1
nop 4
crap 1
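
The CRAP score reported above can be reproduced by hand. The sketch below uses the commonly cited formula CRAP = comp^2 * (1 - cov)^3 + comp. It assumes that `cc` in the table is the cyclomatic complexity and `cp` is the covered fraction (`ccs` / `cts`). The helper name `crapScore` is hypothetical and is not part of the report or the library.

```php
<?php
// Hypothetical helper for illustration only; not part of phpmyadmin/sql-parser.
// $complexity: cyclomatic complexity of the method (assumed to be `cc` above).
// $coverage:   covered fraction between 0.0 and 1.0 (assumed to be `cp` above).
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values taken from the metrics table for Lexer::error(): cc = 1, cp = 1.
echo crapScore(1, 1.0), PHP_EOL;

// The same method with no coverage at all, for comparison.
echo crapScore(1, 0.0), PHP_EOL;
```

With full coverage the uncovered term vanishes and the score equals the complexity, which matches the value of 1 shown for this method.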
<?php

/**
 * Defines the lexer of the library.
 *
 * This is one of the most important components, along with the parser.
 *
 * Depends on context to extract lexemes.
 */

namespace PhpMyAdmin\SqlParser;

use PhpMyAdmin\SqlParser\Exceptions\LexerException;

if (! defined('USE_UTF_STRINGS')) {
    // NOTE: In previous versions of PHP (5.5 and older) the default
    // internal encoding is "ISO-8859-1".
    // All `mb_` functions must specify the correct encoding, which is
    // 'UTF-8' in order to work properly.

    /*
     * Forces usage of `UtfString` if the string is multibyte.
     * `UtfString` may be slower, but it gives better results.
     *
     * @var bool
     */
    define('USE_UTF_STRINGS', true);
}

/**
 * Performs lexical analysis over a SQL statement and splits it into multiple
 * tokens.
 *
 * The output of the lexer is affected by the context of the SQL statement.
 *
 * @category Lexer
 *
 * @license  https://www.gnu.org/licenses/gpl-2.0.txt GPL-2.0+
 *
 * @see      Context
 */
class Lexer extends Core
{
    /**
     * A list of methods that are used in lexing the SQL query.
     *
     * @var array
     */
    public static $PARSER_METHODS = array(
        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over the delimiter. The delimiter may be a keyword, a
        // number or almost any character, which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 2. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 3. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 4. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 5. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        'parseDelimiter',
        'parseWhitespace',
        'parseNumber',
        'parseComment',
        'parseOperator',
        'parseBool',
        'parseString',
        'parseSymbol',
        'parseKeyword',
        'parseLabel',
        'parseUnknown'
    );

    /**
     * The string to be parsed.
     *
     * @var string|UtfString
     */
    public $str = '';

    /**
     * The length of `$str`.
     *
     * By storing its length, a lot of time is saved, because parsing methods
     * would otherwise call `strlen` every time.
     *
     * @var int
     */
    public $len = 0;

    /**
     * The index of the last parsed character.
     *
     * @var int
     */
    public $last = 0;

    /**
     * Tokens extracted from given strings.
     *
     * @var TokensList
     */
    public $list;

    /**
     * The default delimiter. This is used, by default, in all new instances.
     *
     * @var string
     */
    public static $DEFAULT_DELIMITER = ';';

    /**
     * Statements delimiter.
     * This may change during lexing.
     *
     * @var string
     */
    public $delimiter;

    /**
     * The length of the delimiter.
     *
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
     * calls to `strlen`, which might affect performance when the delimiter is
     * big.
     *
     * @var int
     */
    public $delimiterLen;

    /**
     * Gets the tokens list parsed by a new instance of a lexer.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string|null      $delimiter the delimiter to be used
     *
     * @return TokensList
     */
    public static function getTokens($str, $strict = false, $delimiter = null)
    {
        $lexer = new self($str, $strict, $delimiter);

        return $lexer->list;
    }

    /**
     * Constructor.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string|null      $delimiter the delimiter to be used
     */
    public function __construct($str, $strict = false, $delimiter = null)
    {
        // `strlen` is used instead of `mb_strlen` because the lexer needs to
        // parse each byte of the input.
        $len = $str instanceof UtfString ? $str->length() : strlen($str);

        // For multi-byte strings, a new instance of `UtfString` is
        // initialized (only if `UtfString` usage is forced).
        if (! $str instanceof UtfString && USE_UTF_STRINGS && $len !== mb_strlen($str, 'UTF-8')) {
            $str = new UtfString($str);
        }

        $this->str = $str;
        $this->len = $str instanceof UtfString ? $str->length() : $len;

        $this->strict = $strict;

        // Setting the delimiter.
        $this->setDelimiter(
            ! empty($delimiter) ? $delimiter : static::$DEFAULT_DELIMITER
        );

        $this->lex();
    }

    /**
     * Sets the delimiter.
     *
     * @param string $delimiter the new delimiter
     */
    public function setDelimiter($delimiter)
    {
        $this->delimiter = $delimiter;
        $this->delimiterLen = strlen($delimiter);
    }

    /**
     * Parses the string and extracts lexemes.
     */
    public function lex()
    {
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
        // is* functions. For better performance, some rules can be deduced
        // from context.
        // For example, in `parseBool` there is no need to compare the token
        // every time with `true` and `false`. The first step would be to
        // compare with 'true' only and just after that add another letter from
        // context and compare again with `false`.
        // Another example is `parseComment`.

        $list = new TokensList();

        /**
         * Last processed token.
         *
         * @var Token
         */
        $lastToken = null;

        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
            /**
             * The new token.
             *
             * @var Token
             */
            $token = null;

            foreach (static::$PARSER_METHODS as $method) {
                if ($token = $this->$method()) {
                    break;
                }
            }

            if ($token === null) {
                // @assert($this->last === $lastIdx);
                $token = new Token($this->str[$this->last]);
                $this->error(
                    'Unexpected character.',
                    $this->str[$this->last],
                    $this->last
                );
            } elseif ($lastToken !== null
                && $token->type === Token::TYPE_SYMBOL
                && $token->flags & Token::FLAG_SYMBOL_VARIABLE
                && (
                    $lastToken->type === Token::TYPE_STRING
                    || (
                        $lastToken->type === Token::TYPE_SYMBOL
                        && $lastToken->flags & Token::FLAG_SYMBOL_BACKTICK
                    )
                )
            ) {
                // Handles ```... FROM 'user'@'%' ...```.
                $lastToken->token .= $token->token;
                $lastToken->type = Token::TYPE_SYMBOL;
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
                $lastToken->value .= '@' . $token->value;
                continue;
            } elseif ($lastToken !== null
                && $token->type === Token::TYPE_KEYWORD
                && $lastToken->type === Token::TYPE_OPERATOR
                && $lastToken->value === '.'
            ) {
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
                // a reserved word.
                $token->type = Token::TYPE_NONE;
                $token->flags = 0;
                $token->value = $token->token;
            }

            $token->position = $lastIdx;

            $list->tokens[$list->count++] = $token;

            // Handling delimiters.
            if ($token->type === Token::TYPE_NONE && $token->value === 'DELIMITER') {
                if ($this->last + 1 >= $this->len) {
                    $this->error(
                        'Expected whitespace(s) before delimiter.',
                        '',
                        $this->last + 1
                    );
                    continue;
                }

                // Skipping last R (from `delimiteR`) and whitespaces between
                // the keyword `DELIMITER` and the actual delimiter.
                $pos = ++$this->last;
                if (($token = $this->parseWhitespace()) !== null) {
                    $token->position = $pos;
                    $list->tokens[$list->count++] = $token;
                }

                // Preparing the token that holds the new delimiter.
                if ($this->last + 1 >= $this->len) {
                    $this->error(
                        'Expected delimiter.',
                        '',
                        $this->last + 1
                    );
                    continue;
                }
                $pos = $this->last + 1;

                // Parsing the delimiter.
                $this->delimiter = null;
                $delimiterLen = 0;
                while (++$this->last < $this->len && ! Context::isWhitespace($this->str[$this->last]) && $delimiterLen < 15) {
                    $this->delimiter .= $this->str[$this->last];
                    ++$delimiterLen;
                }

                if (empty($this->delimiter)) {
                    $this->error(
                        'Expected delimiter.',
                        '',
                        $this->last
                    );
                    $this->delimiter = ';';
                }

                --$this->last;

                // Saving the delimiter and its token.
                $this->delimiterLen = strlen($this->delimiter);
                $token = new Token($this->delimiter, Token::TYPE_DELIMITER);
                $token->position = $pos;
                $list->tokens[$list->count++] = $token;
            }

            $lastToken = $token;
        }

        // Adding a final delimiter to mark the ending.
        $list->tokens[$list->count++] = new Token(null, Token::TYPE_DELIMITER);

        // Saving the tokens list.
        $this->list = $list;

        $this->solveAmbiguityOnStarOperator();
    }

    /**
     * Resolves the ambiguity when dealing with the "*" operator.
     *
     * In SQL statements, the "*" operator can be an arithmetic operator (like in 2*3) or an SQL wildcard (like in
     * SELECT a.* FROM ...). To solve this ambiguity, the solution is to find the next token, excluding whitespaces and
     * comments, right after the "*" position. The "*" is for sure an SQL wildcard if the next token found is any of:
     * - "FROM" (the FROM keyword like in "SELECT * FROM...");
     * - "USING" (the USING keyword like in "DELETE table_name.* USING...");
     * - "," (a comma separator like in "SELECT *, field FROM...");
     * - ")" (a closing parenthesis like in "COUNT(*)").
     * This method changes the flag of the "*" tokens when any of the conditions above is true. Otherwise, the
     * default flag (arithmetic) is kept.
     *
     * @return void
     */
    private function solveAmbiguityOnStarOperator()
    {
        $iBak = $this->list->idx;
        while (null !== ($starToken = $this->list->getNextOfTypeAndValue(Token::TYPE_OPERATOR, '*'))) {
            // ::getNext already gets rid of whitespaces and comments.
            if (($next = $this->list->getNext()) !== null) {
                if (($next->type === Token::TYPE_KEYWORD && in_array($next->value, array('FROM', 'USING'), true))
                    || ($next->type === Token::TYPE_OPERATOR && in_array($next->value, array(',', ')'), true))
                ) {
                    $starToken->flags = Token::FLAG_OPERATOR_SQL;
                }
            }
        }
        $this->list->idx = $iBak;
    }

    /**
     * Creates a new error log.
     *
     * @param string $msg  the error message
     * @param string $str  the character that produced the error
     * @param int    $pos  the position of the character
     * @param int    $code the code of the error
     *
     * @throws LexerException throws the exception, if strict mode is enabled
     */
    public function error($msg, $str = '', $pos = 0, $code = 0)
    {
        $error = new LexerException(
            Translator::gettext($msg),
            $str,
            $pos,
            $code
        );
        parent::error($error);
    }

    /**
     * Parses a keyword.
     *
     * @return null|Token
     */
    public function parseKeyword()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                }
                $lastSpace = true;
            } else {
                $lastSpace = false;
            }

            $token .= $this->str[$this->last];
            if (($this->last + 1 === $this->len || Context::isSeparator($this->str[$this->last + 1]))
                && $flags = Context::isKeyword($token)
            ) {
                $ret = new Token($token, Token::TYPE_KEYWORD, $flags);
                $iEnd = $this->last;

                // We don't break, so that we find the longest keyword.
                // For example, `OR` and `ORDER` have a common prefix `OR`.
                // If we stopped at `OR`, the parsing would be invalid.
            }
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a label.
     *
     * @return null|Token
     */
    public function parseLabel()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;
        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            if ($this->str[$this->last] === ':' && $j > 1) {
                // End of label
                $token .= $this->str[$this->last];
                $ret = new Token($token, Token::TYPE_LABEL);
                $iEnd = $this->last;
                break;
            } elseif (Context::isWhitespace($this->str[$this->last]) && $j > 1) {
                // Whitespace between label and :
                // The size of the label didn't increase.
                --$j;
            } elseif (Context::isSeparator($this->str[$this->last])) {
                // Any other separator
                break;
            }
            $token .= $this->str[$this->last];
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses an operator.
     *
     * @return null|Token
     */
    public function parseOperator()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            $token .= $this->str[$this->last];
            if ($flags = Context::isOperator($token)) {
                $ret = new Token($token, Token::TYPE_OPERATOR, $flags);
                $iEnd = $this->last;
            }
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a whitespace.
     *
     * @return null|Token
     */
    public function parseWhitespace()
    {
        // Review note (duplication): this method seems to be duplicated
        // elsewhere in the project; consider extracting the shared code
        // into a single class or operation.
        $token = $this->str[$this->last];

        if (! Context::isWhitespace($token)) {
            return null;
        }

        while (++$this->last < $this->len && Context::isWhitespace($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }

        --$this->last;

        return new Token($token, Token::TYPE_WHITESPACE);
    }

    /**
     * Parses a comment.
     *
     * @return null|Token
     */
    public function parseComment()
    {
        $iBak = $this->last;
        $token = $this->str[$this->last];

        // Bash style comments. (#comment\n)
        // Review note (bug best practice): `Context::isComment()` is of type
        // integer|null and is loosely compared to true here, which is
        // ambiguous if the integer can be zero. An explicit `!== null`
        // comparison may be preferable.
        if (Context::isComment($token)) {
            while (++$this->last < $this->len
                && $this->str[$this->last] !== "\n"
            ) {
                $token .= $this->str[$this->last];
            }
            // Include trailing \n as whitespace token
            if ($this->last < $this->len) {
                --$this->last;
            }

            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_BASH);
        }

        // C style comments. (/*comment*\/)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            // Review note: same loose comparison of integer|null as above.
            if (Context::isComment($token)) {
                // There might be a conflict with "*" operator here, when string is "*/*".
                // This can occur in the following statements:
                // - "SELECT */* comment */ FROM ..."
                // - "SELECT 2*/* comment */3 AS `six`;"
                $next = $this->last + 1;
                if (($next < $this->len) && $this->str[$next] === '*') {
                    // Conflict in "*/*": first "*" was not for ending a comment.
                    // Stop here and let other parsing method define the true behavior of that first star.
                    $this->last = $iBak;

                    return null;
                }

                $flags = Token::FLAG_COMMENT_C;

                // This comment already ended. It may be a part of a
                // previous MySQL specific command.
                if ($token === '*/') {
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Checking if this is a MySQL-specific command.
                if ($this->last + 1 < $this->len
                    && $this->str[$this->last + 1] === '!'
                ) {
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
                    $token .= $this->str[++$this->last];

                    while (++$this->last < $this->len
                        && $this->str[$this->last] >= '0'
                        && $this->str[$this->last] <= '9'
                    ) {
                        $token .= $this->str[$this->last];
                    }
                    --$this->last;

                    // We split this comment and parse only its beginning
                    // here.
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Parsing the comment.
                while (++$this->last < $this->len
                    && (
                        $this->str[$this->last - 1] !== '*'
                        || $this->str[$this->last] !== '/'
                    )
                ) {
                    $token .= $this->str[$this->last];
                }

                // Adding the ending.
                if ($this->last < $this->len) {
                    $token .= $this->str[$this->last];
                }

                return new Token($token, Token::TYPE_COMMENT, $flags);
            }
        }

        // SQL style comments. (-- comment\n)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            $end = false;
        } else {
            --$this->last;
            $end = true;
        }
        // Review note: same loose comparison of integer|null as above.
        if (Context::isComment($token, $end)) {
            // Checking if this comment did not end already (```--\n```).
            if ($this->str[$this->last] !== "\n") {
                while (++$this->last < $this->len
                    && $this->str[$this->last] !== "\n"
                ) {
                    $token .= $this->str[$this->last];
                }
            }
            // Include trailing \n as whitespace token
            if ($this->last < $this->len) {
                --$this->last;
            }

            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_SQL);
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a boolean.
     *
     * @return null|Token
     */
    public function parseBool()
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, Token::TYPE_BOOL);
        } elseif (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, Token::TYPE_BOOL, 1);
            }
        }

        $this->last = $iBak;

        return null;
    }

716
    /**
717
     * Parses a number.
718
     *
719
     * @return null|Token
720
     */
721 467
    public function parseNumber()
722
    {
723
        // A rudimentary state machine is being used to parse numbers due to
724
        // the various forms of their notation.
725
        //
726
        // Below are the states of the machines and the conditions to change
727
        // the state.
728
        //
729
        //      1 --------------------[ + or - ]-------------------> 1
730
        //      1 -------------------[ 0x or 0X ]------------------> 2
731
        //      1 --------------------[ 0 to 9 ]-------------------> 3
732
        //      1 -----------------------[ . ]---------------------> 4
733
        //      1 -----------------------[ b ]---------------------> 7
734
        //
735
        //      2 --------------------[ 0 to F ]-------------------> 2
736
        //
737
        //      3 --------------------[ 0 to 9 ]-------------------> 3
738
        //      3 -----------------------[ . ]---------------------> 4
739
        //      3 --------------------[ e or E ]-------------------> 5
740
        //
741
        //      4 --------------------[ 0 to 9 ]-------------------> 4
742
        //      4 --------------------[ e or E ]-------------------> 5
743
        //
744
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
745
        //
746
        //      7 -----------------------[ ' ]---------------------> 8
747
        //
748
        //      8 --------------------[ 0 or 1 ]-------------------> 8
749
        //      8 -----------------------[ ' ]---------------------> 9
750
        //
751
        // State 1 may be reached by negative numbers.
752
        // State 2 is reached only by hex numbers.
753
        // State 4 is reached only by float numbers.
754
        // State 5 is reached only by numbers in approximate form.
755
        // State 7 is reached only by numbers in bit representation.
756
        //
757
        // Valid final states are: 2, 3, 4 and 6. Any parsing that finished in a
758
        // state other than these is invalid.
        $iBak = $this->last;
        $token = '';
        $flags = 0;
        $state = 1;
        for (; $this->last < $this->len; ++$this->last) {
            if ($state === 1) {
                if ($this->str[$this->last] === '-') {
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
                } elseif ($this->last + 1 < $this->len
                    && $this->str[$this->last] === '0'
                    && (
                        $this->str[$this->last + 1] === 'x'
                        || $this->str[$this->last + 1] === 'X'
                    )
                ) {
                    $token .= $this->str[$this->last++];
                    $state = 2;
                } elseif ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9') {
                    $state = 3;
                } elseif ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'b') {
                    $state = 7;
                } elseif ($this->str[$this->last] !== '+') {
                    // `+` is a valid character in a number.
                    break;
                }
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (! (
                        ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                        || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'F')
                        || ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'f')
                    )
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits, `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if ($this->str[$this->last] === '+' || $this->str[$this->last] === '-'
                    || ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                ) {
                    $state = 6;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] === '\'') {
                    $state = 8;
                } else {
                    break;
                }
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif ($this->str[$this->last] !== '0'
                    && $this->str[$this->last] !== '1'
                ) {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }
            $token .= $this->str[$this->last];
        }
        if ($state === 2 || $state === 3
            || ($token !== '.' && $state === 4)
            || $state === 6 || $state === 9
        ) {
            --$this->last;

            return new Token($token, Token::TYPE_NUMBER, $flags);
        }
        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a string.
     *
     * @param string $quote additional starting symbol
     *
     * @return null|Token
     * @throws LexerException
     */
    public function parseString($quote = '')
    {
        $token = $this->str[$this->last];
        if (! ($flags = Context::isString($token)) && $token !== $quote) {
            return null;
        }
        $quote = $token;

        while (++$this->last < $this->len) {
            // A doubled quote, or a backslash escape outside backticks, is
            // copied together with the character that follows it.
            if ($this->last + 1 < $this->len
                && (
                    ($this->str[$this->last] === $quote && $this->str[$this->last + 1] === $quote)
                    || ($this->str[$this->last] === '\\' && $quote !== '`')
                )
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }
                $token .= $this->str[$this->last];
            }
        }

        if ($this->last >= $this->len || $this->str[$this->last] !== $quote) {
            $this->error(
                sprintf(
                    Translator::gettext('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }

        return new Token($token, Token::TYPE_STRING, $flags);
    }

    /**
     * Parses a symbol.
     *
     * @return null|Token
     * @throws LexerException
     */
    public function parseSymbol()
    {
        $token = $this->str[$this->last];
        if (! ($flags = Context::isSymbol($token))) {
            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            if ($this->last + 1 < $this->len && $this->str[++$this->last] === '@') {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } elseif ($flags & Token::FLAG_SYMBOL_PARAMETER) {
            if ($token !== '?' && $this->last + 1 < $this->len) {
                ++$this->last;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            if (($str = $this->parseString(Context::getIdentifierQuote())) === null) {
                if (($str = $this->parseUnknown()) === null) {
                    $this->error(
                        'Variable name was expected.',
                        $this->str[$this->last],
                        $this->last
                    );
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, Token::TYPE_SYMBOL, $flags);
    }

    /**
     * Parses unknown parts of the query.
     *
     * @return null|Token
     */
    public function parseUnknown()
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
            return null;
        }

        while (++$this->last < $this->len && ! Context::isSeparator($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }
        --$this->last;

        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     *
     * @return null|Token
     */
    public function parseDelimiter()
    {
        $idx = 0;

        while ($idx < $this->delimiterLen && $this->last + $idx < $this->len) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }
            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;

        return new Token($this->delimiter, Token::TYPE_DELIMITER);
    }
}
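The number-scanning state machine in the listing above can be tried in isolation. The following is a minimal sketch, not the library's API: `scanNumber()` is a hypothetical free function that assumes PHP 7.1 or later, works on a plain string instead of the lexer's `$str`/`$last` cursor, and drops the `Token` flags. It follows the same states and accepts in states 2, 3, 4 (unless the token is a lone `.`), 6 and 9, as the return condition in the listing does.

```php
<?php

/**
 * Hypothetical helper, not part of the library: scans a numeric literal at
 * the start of $s and returns its text, or null if no valid number starts there.
 */
function scanNumber(string $s): ?string
{
    $isDigit = static function (string $c): bool {
        return $c >= '0' && $c <= '9';
    };
    $isHex = static function (string $c) use ($isDigit): bool {
        return $isDigit($c) || ($c >= 'A' && $c <= 'F') || ($c >= 'a' && $c <= 'f');
    };

    $token = '';
    $state = 1;
    $len = strlen($s);
    for ($i = 0; $i < $len; ++$i) {
        $c = $s[$i];
        if ($state === 1) {
            if ($c === '0' && $i + 1 < $len && ($s[$i + 1] === 'x' || $s[$i + 1] === 'X')) {
                $token .= $s[$i++]; // keep `0`; the `x` is appended at the loop end
                $state = 2;
            } elseif ($isDigit($c)) {
                $state = 3;
            } elseif ($c === '.') {
                $state = 4;
            } elseif ($c === 'b') {
                $state = 7;
            } elseif ($c !== '-' && $c !== '+') {
                break; // no number starts here
            }
        } elseif ($state === 2) {
            if (! $isHex($c)) {
                break;
            }
        } elseif ($state === 3) {
            if ($c === '.') {
                $state = 4;
            } elseif ($c === 'e' || $c === 'E') {
                $state = 5;
            } elseif (! $isDigit($c)) {
                break;
            }
        } elseif ($state === 4) {
            if ($c === 'e' || $c === 'E') {
                $state = 5;
            } elseif (! $isDigit($c)) {
                break;
            }
        } elseif ($state === 5) {
            if ($c === '+' || $c === '-' || $isDigit($c)) {
                $state = 6;
            } else {
                break;
            }
        } elseif ($state === 6) {
            if (! $isDigit($c)) {
                break;
            }
        } elseif ($state === 7) {
            if ($c === "'") {
                $state = 8;
            } else {
                break;
            }
        } elseif ($state === 8) {
            if ($c === "'") {
                $state = 9;
            } elseif ($c !== '0' && $c !== '1') {
                break;
            }
        } else {
            break; // state 9: the closing quote was already consumed
        }
        $token .= $s[$i];
    }

    $accepting = $state === 2 || $state === 3 || $state === 6 || $state === 9
        || ($state === 4 && $token !== '.');

    return $accepting ? $token : null;
}
```

Under these assumptions, `scanNumber('0x1F')` returns `'0x1F'`, `scanNumber("b'101'")` returns `"b'101'"`, and `scanNumber('.')` returns `null`, because a lone dot ends in state 4 without any digits. Unlike the method in the listing, the sketch does not restore a cursor on failure, since it keeps no state between calls.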