Completed
Push — master ( 65f66e...428edc )
by Michal
04:14
created

Lexer::lex()   D

Complexity

Conditions 23
Paths 76

Size

Total Lines 138
Code Lines 71

Duplication

Lines 16
Ratio 11.59 %

Code Coverage

Tests 86
CRAP Score 23

Importance

Changes 0
Metric Value
cc 23
eloc 71
nc 76
nop 0
dl 16
loc 138
ccs 86
cts 86
cp 1
crap 23
rs 4.6303
c 0
b 0
f 0
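
The CRAP value above can be cross-checked against the other metrics. This is a minimal sketch: it assumes the commonly cited CRAP formula (complexity squared, times the uncovered fraction cubed, plus complexity), and it assumes that `cc` is the cyclomatic complexity and `cp` the covered fraction. The function name `crapScore` is hypothetical and not part of any tool.

```php
<?php

/**
 * Hypothetical helper: computes a CRAP score from a complexity value and
 * a coverage fraction in the range 0.0 to 1.0 (assumed formula, see above).
 */
function crapScore(int $complexity, float $coverage): float
{
    // Untested code is penalised quadratically in the complexity;
    // fully covered code scores exactly its complexity.
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// With cc = 23 and cp = 1 (full coverage), as reported for Lexer::lex():
echo crapScore(23, 1.0), PHP_EOL; // prints 23
```

Under these assumptions the reported score of 23 comes entirely from the complexity term, so only reducing complexity, not adding tests, would lower it.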

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
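
One widely used refactoring of this kind is Extract Method. The sketch below is not the library's code: `DelimiterReader` and its methods are hypothetical, and `ctype_space` stands in for the library's own whitespace check. It only shows how commented steps of a long method can become small, named helpers (PHP 7.1+ assumed for the nullable return type).

```php
<?php

/**
 * Hypothetical, simplified reader for a `DELIMITER <value>` directive.
 * It only illustrates Extract Method; it is not the library's Lexer.
 */
final class DelimiterReader
{
    /** @var string[] messages collected while reading */
    public $errors = array();

    /**
     * Reads the delimiter that follows the keyword whose last character
     * is at index $pos. Each step is delegated to a named helper.
     */
    public function read(string $str, int $pos): ?string
    {
        $pos = $this->skipWhitespace($str, $pos + 1);
        $delimiter = $this->readUntilWhitespace($str, $pos);

        if ($delimiter === '') {
            $this->errors[] = 'Expected delimiter.';

            return null;
        }

        return $delimiter;
    }

    // Formerly a commented step: "skip whitespace after the keyword".
    private function skipWhitespace(string $str, int $pos): int
    {
        $len = strlen($str);
        while ($pos < $len && ctype_space($str[$pos])) {
            ++$pos;
        }

        return $pos;
    }

    // Formerly a commented step: "parse the delimiter".
    private function readUntilWhitespace(string $str, int $pos): string
    {
        $len = strlen($str);
        $out = '';
        while ($pos < $len && !ctype_space($str[$pos])) {
            $out .= $str[$pos++];
        }

        return $out;
    }
}
```

Calling `(new DelimiterReader())->read('DELIMITER $$', 8)` returns the string `$$`; the comments that used to mark each step now live on as method names.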

<?php

/**
 * Defines the lexer of the library.
 *
 * This is one of the most important components, along with the parser.
 *
 * Depends on context to extract lexemes.
 */

namespace PhpMyAdmin\SqlParser;

require_once 'common.php';

use PhpMyAdmin\SqlParser\Exceptions\LexerException;

if (!defined('USE_UTF_STRINGS')) {
    // NOTE: In previous versions of PHP (5.5 and older) the default
    // internal encoding is "ISO-8859-1".
    // All `mb_` functions must specify the correct encoding, which is
    // 'UTF-8' in order to work properly.

    /*
     * Forces usage of `UtfString` if the string is multibyte.
     * `UtfString` may be slower, but it gives better results.
     *
     * @var bool
     */
    define('USE_UTF_STRINGS', true);
}

/**
 * Performs lexical analysis over a SQL statement and splits it into multiple
 * tokens.
 *
 * The output of the lexer is affected by the context of the SQL statement.
 *
 * @category Lexer
 *
 * @license  https://www.gnu.org/licenses/gpl-2.0.txt GPL-2.0+
 *
 * @see      Context
 */
class Lexer
{
    /**
     * A list of methods that are used in lexing the SQL query.
     *
     * @var array
     */
    public static $PARSER_METHODS = array(
        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over the delimiter. The delimiter may be a keyword, a
        // number or almost any character which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 2. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 3. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 4. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 5. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        'parseDelimiter', 'parseWhitespace', 'parseNumber',
        'parseComment', 'parseOperator', 'parseBool', 'parseString',
        'parseSymbol', 'parseKeyword', 'parseLabel', 'parseUnknown',
    );

    /**
     * Whether errors should throw exceptions or just be stored.
     *
     * @var bool
     *
     * @see static::$errors
     */
    public $strict = false;

    /**
     * The string to be parsed.
     *
     * @var string|UtfString
     */
    public $str = '';

    /**
     * The length of `$str`.
     *
     * By storing its length, a lot of time is saved, because parsing methods
     * would otherwise call `strlen` every time.
     *
     * @var int
     */
    public $len = 0;

    /**
     * The index of the last parsed character.
     *
     * @var int
     */
    public $last = 0;

    /**
     * Tokens extracted from given strings.
     *
     * @var TokensList
     */
    public $list;

    /**
     * The default delimiter. This is used, by default, in all new instances.
     *
     * @var string
     */
    public static $DEFAULT_DELIMITER = ';';

    /**
     * Statements delimiter.
     * This may change during lexing.
     *
     * @var string
     */
    public $delimiter;

    /**
     * The length of the delimiter.
     *
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
     * calls to `strlen`, which might affect performance when the delimiter is
     * big.
     *
     * @var int
     */
    public $delimiterLen;

    /**
     * List of errors that occurred during lexing.
     *
     * Usually, the lexing does not stop once an error occurred because that
     * error might be a false positive or a partial result (even a bad one)
     * might be needed.
     *
     * @var LexerException[]
     *
     * @see Lexer::error()
     */
    public $errors = array();

    /**
     * Gets the tokens list parsed by a new instance of a lexer.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
Documentation: Should the type for parameter $delimiter not be string|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.
     *
     * @return TokensList
     */
    public static function getTokens($str, $strict = false, $delimiter = null)
    {
        $lexer = new self($str, $strict, $delimiter);

        return $lexer->list;
    }

    /**
     * Constructor.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
Documentation: Should the type for parameter $delimiter not be string|null?
     */
    public function __construct($str, $strict = false, $delimiter = null)
    {
        // `strlen` is used instead of `mb_strlen` because the lexer needs to
        // parse each byte of the input.
        $len = $str instanceof UtfString ? $str->length() : strlen($str);

        // For multi-byte strings, a new instance of `UtfString` is
        // initialized (only if `UtfString` usage is forced).
        if (!$str instanceof UtfString && USE_UTF_STRINGS && $len !== mb_strlen($str, 'UTF-8')) {
            $str = new UtfString($str);
        }

        $this->str = $str;
        $this->len = $str instanceof UtfString ? $str->length() : $len;

        $this->strict = $strict;

        // Setting the delimiter.
        $this->setDelimiter(
            !empty($delimiter) ? $delimiter : static::$DEFAULT_DELIMITER
        );

        $this->lex();
    }

    /**
     * Sets the delimiter.
     *
     * @param string $delimiter the new delimiter
     */
    public function setDelimiter($delimiter)
    {
        $this->delimiter = $delimiter;
        $this->delimiterLen = strlen($delimiter);
    }

    /**
     * Parses the string and extracts lexemes.
     */
    public function lex()
    {
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
Coding Style Best Practice: Comments for TODO tasks are often forgotten in the code; it might be better to use a dedicated issue tracker.
        // is* functions. For better performance, some rules can be deduced
        // from context.
        // For example, in `parseBool` there is no need to compare the token
        // every time with `true` and `false`. The first step would be to
        // compare with 'true' only and just after that add another letter from
        // context and compare again with `false`.
        // Another example is `parseComment`.

        $list = new TokensList();

        /**
         * Last processed token.
         *
         * @var Token
         */
        $lastToken = null;

        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
            /**
             * The new token.
             *
             * @var Token
             */
            $token = null;

            foreach (static::$PARSER_METHODS as $method) {
                if ($token = $this->$method()) {
                    break;
                }
            }

            if ($token === null) {
                // @assert($this->last === $lastIdx);
                $token = new Token($this->str[$this->last]);
                $this->error(
                    __('Unexpected character.'),
                    $this->str[$this->last],
                    $this->last
                );
            } elseif ($lastToken !== null
                && $token->type === Token::TYPE_SYMBOL
                && $token->flags & Token::FLAG_SYMBOL_VARIABLE
                && (
                    $lastToken->type === Token::TYPE_STRING
                    || (
                        $lastToken->type === Token::TYPE_SYMBOL
                        && $lastToken->flags & Token::FLAG_SYMBOL_BACKTICK
                    )
                )
            ) {
                // Handles ```... FROM 'user'@'%' ...```.
                $lastToken->token .= $token->token;
                $lastToken->type = Token::TYPE_SYMBOL;
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
                $lastToken->value .= '@' . $token->value;
                continue;
            } elseif ($lastToken !== null
                && $token->type === Token::TYPE_KEYWORD
                && $lastToken->type === Token::TYPE_OPERATOR
                && $lastToken->value === '.'
            ) {
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
                // a reserved word.
                $token->type = Token::TYPE_NONE;
                $token->flags = 0;
                $token->value = $token->token;
            }

            $token->position = $lastIdx;

            $list->tokens[$list->count++] = $token;
298
            // Handling delimiters.
299 347
            if ($token->type === Token::TYPE_NONE && $token->value === 'DELIMITER') {
300 6 View Code Duplication
                if ($this->last + 1 >= $this->len) {
301 1
                    $this->error(
302 1
                        __('Expected whitespace(s) before delimiter.'),
303 1
                        '',
304 1
                        $this->last + 1
305 1
                    );
306 1
                    continue;
307
                }
308
309
                // Skipping last R (from `delimiteR`) and whitespaces between
310
                // the keyword `DELIMITER` and the actual delimiter.
311 5
                $pos = ++$this->last;
312 5
                if (($token = $this->parseWhitespace()) !== null) {
313 4
                    $token->position = $pos;
314 4
                    $list->tokens[$list->count++] = $token;
315 4
                }
316
317
                // Preparing the token that holds the new delimiter.
318 5 View Code Duplication
                if ($this->last + 1 >= $this->len) {
319 1
                    $this->error(
320 1
                        __('Expected delimiter.'),
321 1
                        '',
322 1
                        $this->last + 1
323 1
                    );
324 1
                    continue;
325
                }
326 4
                $pos = $this->last + 1;
327
328
                // Parsing the delimiter.
329 4
                $this->delimiter = null;
330 4
                while (++$this->last < $this->len && !Context::isWhitespace($this->str[$this->last])) {
331 3
                    $this->delimiter .= $this->str[$this->last];
332 3
                }
333
334 4
                if (empty($this->delimiter)) {
335 1
                    $this->error(
336 1
                        __('Expected delimiter.'),
337 1
                        '',
338 1
                        $this->last
339 1
                    );
340 1
                    $this->delimiter = ';';
341 1
                }
342
343 4
                --$this->last;
344
345
                // Saving the delimiter and its token.
346 4
                $this->delimiterLen = strlen($this->delimiter);
347 4
                $token = new Token($this->delimiter, Token::TYPE_DELIMITER);
348 4
                $token->position = $pos;
349 4
                $list->tokens[$list->count++] = $token;
350 4
            }
351
352 345
            $lastToken = $token;
353 345
        }
354
355
        // Adding a final delimiter to mark the ending.
356 352
        $list->tokens[$list->count++] = new Token(null, Token::TYPE_DELIMITER);
357
358
        // Saving the tokens list.
359 352
        $this->list = $list;
360 352
    }

    /**
     * Creates a new error log.
     *
     * @param string $msg  the error message
     * @param string $str  the character that produced the error
     * @param int    $pos  the position of the character
     * @param int    $code the code of the error
     *
     * @throws LexerException throws the exception, if strict mode is enabled
     */
    public function error($msg = '', $str = '', $pos = 0, $code = 0)
    {
        $error = new LexerException($msg, $str, $pos, $code);
        if ($this->strict) {
            throw $error;
        }
        $this->errors[] = $error;
    }

    /**
     * Parses a keyword.
     *
     * @return Token
Documentation: Should the return type not be Token|null?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.
     */
    public function parseKeyword()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                } else {
                    $lastSpace = true;
                }
            } else {
                $lastSpace = false;
            }

            $token .= $this->str[$this->last];
            if (($this->last + 1 === $this->len || Context::isSeparator($this->str[$this->last + 1]))
                && $flags = Context::isKeyword($token)
            ) {
                $ret = new Token($token, Token::TYPE_KEYWORD, $flags);
                $iEnd = $this->last;

                // We don't break, so that we find the longest keyword.
                // For example, `OR` and `ORDER` have a common prefix `OR`.
                // If we stopped at `OR`, the parsing would be invalid.
            }
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a label.
     *
     * @return Token
Documentation: Should the return type not be Token|null?
     */
    public function parseLabel()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                } else {
                    $lastSpace = true;
                }
            } elseif ($this->str[$this->last] === ':') {
                $token .= $this->str[$this->last];
                $ret = new Token($token, Token::TYPE_LABEL);
                $iEnd = $this->last;
                break;
            } else {
                $lastSpace = false;
            }
            $token .= $this->str[$this->last];
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses an operator.
     *
     * @return Token
Documentation: Should the return type not be Token|null?
     */
    public function parseOperator()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            $token .= $this->str[$this->last];
            if ($flags = Context::isOperator($token)) {
                $ret = new Token($token, Token::TYPE_OPERATOR, $flags);
                $iEnd = $this->last;
            }
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a whitespace.
     *
     * @return Token
Documentation: Should the return type not be Token|null?
     */
    public function parseWhitespace()
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.
    {
        $token = $this->str[$this->last];

        if (!Context::isWhitespace($token)) {
            return null;
        }

        while (++$this->last < $this->len && Context::isWhitespace($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }

        --$this->last;

        return new Token($token, Token::TYPE_WHITESPACE);
    }

    /**
     * Parses a comment.
     *
     * @return Token
Documentation: Should the return type not be Token|null?
     */
    public function parseComment()
    {
        $iBak = $this->last;
        $token = $this->str[$this->last];

        // Bash style comments. (#comment\n)
        if (Context::isComment($token)) {
            while (
                ++$this->last < $this->len
                && $this->str[$this->last] !== "\n"
            ) {
                $token .= $this->str[$this->last];
            }

            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_BASH);
        }

        // C style comments. (/*comment*\/)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
                $flags = Token::FLAG_COMMENT_C;

                // This comment already ended. It may be a part of a
                // previous MySQL specific command.
                if ($token === '*/') {
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Checking if this is a MySQL-specific command.
                if ($this->last + 1 < $this->len
                    && $this->str[$this->last + 1] === '!'
                ) {
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
                    $token .= $this->str[++$this->last];

                    while (
                        ++$this->last < $this->len
                        && '0' <= $this->str[$this->last]
                        && $this->str[$this->last] <= '9'
                    ) {
                        $token .= $this->str[$this->last];
                    }
                    --$this->last;

                    // We split this comment and parse only its beginning
                    // here.
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Parsing the comment.
                while (
                    ++$this->last < $this->len
                    && (
                        $this->str[$this->last - 1] !== '*'
                        || $this->str[$this->last] !== '/'
                    )
                ) {
                    $token .= $this->str[$this->last];
                }

                // Adding the ending.
                if ($this->last < $this->len) {
                    $token .= $this->str[$this->last];
                }

                return new Token($token, Token::TYPE_COMMENT, $flags);
            }
        }

        // SQL style comments. (-- comment\n)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
                // Checking if this comment did not end already (```--\n```).
                if ($this->str[$this->last] !== "\n") {
                    while (
                        ++$this->last < $this->len
                        && $this->str[$this->last] !== "\n"
                    ) {
                        $token .= $this->str[$this->last];
                    }
                }

                return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_SQL);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a boolean.
     *
     * @return Token
Documentation: Should the return type not be Token|null?
     */
    public function parseBool()
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, Token::TYPE_BOOL);
        } elseif (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, Token::TYPE_BOOL, 1);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a number.
     *
     * @return Token
Documentation: Should the return type not be Token|null?
     */
    public function parseNumber()
    {
        // A rudimentary state machine is being used to parse numbers due to
        // the various forms of their notation.
        //
        // Below are the states of the machine and the conditions to change
        // the state.
        //
        //      1 --------------------[ + or - ]-------------------> 1
        //      1 -------------------[ 0x or 0X ]------------------> 2
        //      1 --------------------[ 0 to 9 ]-------------------> 3
        //      1 -----------------------[ . ]---------------------> 4
        //      1 -----------------------[ b ]---------------------> 7
        //
        //      2 --------------------[ 0 to F ]-------------------> 2
        //
        //      3 --------------------[ 0 to 9 ]-------------------> 3
        //      3 -----------------------[ . ]---------------------> 4
        //      3 --------------------[ e or E ]-------------------> 5
        //
        //      4 --------------------[ 0 to 9 ]-------------------> 4
        //      4 --------------------[ e or E ]-------------------> 5
        //
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
        //
        //      6 --------------------[ 0 to 9 ]-------------------> 6
        //
        //      7 -----------------------[ ' ]---------------------> 8
        //
        //      8 --------------------[ 0 or 1 ]-------------------> 8
        //      8 -----------------------[ ' ]---------------------> 9
        //
        // State 1 may be reached by negative numbers.
        // State 2 is reached only by hex numbers.
        // State 4 is reached only by float numbers.
        // State 5 is reached only by numbers in approximate form.
        // State 7 is reached only by numbers in bit representation.
        //
        // Valid final states are: 2, 3, 4 and 6. Any parsing that finished in a
        // state other than these is invalid.
        $iBak = $this->last;
        $token = '';
        $flags = 0;
        $state = 1;
733 347
        for (; $this->last < $this->len; ++$this->last) {
734 347
            if ($state === 1) {
735 347
                if ($this->str[$this->last] === '-') {
736 6
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
737 347
                } elseif ($this->last + 1 < $this->len
738 347
                    && $this->str[$this->last] === '0'
739 347
                    && (
740 19
                        $this->str[$this->last + 1] === 'x'
741 19
                        || $this->str[$this->last + 1] === 'X'
742 19
                    )
743 347
                ) {
744 1
                    $token .= $this->str[$this->last++];
745 1
                    $state = 2;
746 347
                } elseif ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9') {
747 164
                    $state = 3;
748 347
                } elseif ($this->str[$this->last] === '.') {
749 53
                    $state = 4;
750 347
                } elseif ($this->str[$this->last] === 'b') {
751 32
                    $state = 7;
752 347
                } elseif ($this->str[$this->last] !== '+') {
753
                    // `+` is a valid character in a number.
754 347
                    break;
755
                }
756 208
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (
                    !(
                        ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                        || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'F')
                        || ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'f')
                    )
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits, `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if ($this->str[$this->last] === '+' || $this->str[$this->last] === '-'
                    || ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                ) {
                    $state = 6;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Only digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] === '\'') {
                    $state = 8;
                } else {
                    break;
                }
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif ($this->str[$this->last] !== '0'
                    && $this->str[$this->last] !== '1'
                ) {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }
            $token .= $this->str[$this->last];
        }
        if ($state === 2 || $state === 3
            || ($token !== '.' && $state === 4)
            || $state === 6 || $state === 9
        ) {
            --$this->last;

            return new Token($token, Token::TYPE_NUMBER, $flags);
        }
        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a string.
     *
     * @param string $quote additional starting symbol
     *
     * @return Token|null
     */
    public function parseString($quote = '')
    {
        $token = $this->str[$this->last];
        if (!($flags = Context::isString($token)) && $token !== $quote) {
            return null;
        }
        $quote = $token;

        while (++$this->last < $this->len) {
            if ($this->last + 1 < $this->len
                && (
                    ($this->str[$this->last] === $quote && $this->str[$this->last + 1] === $quote)
                    || ($this->str[$this->last] === '\\' && $quote !== '`')
                )
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }
                $token .= $this->str[$this->last];
            }
        }

        if ($this->last >= $this->len || $this->str[$this->last] !== $quote) {
            $this->error(
                sprintf(
                    __('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }

        return new Token($token, Token::TYPE_STRING, $flags);
    }

    /**
     * Parses a symbol.
     *
     * @return Token|null
     */
    public function parseSymbol()
    {
        $token = $this->str[$this->last];
        if (!($flags = Context::isSymbol($token))) {
            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            if ($this->last + 1 < $this->len && $this->str[++$this->last] === '@') {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            if (($str = $this->parseString('`')) === null) {
                // `parseUnknown()` is an instance method, so call it on `$this`.
                if (($str = $this->parseUnknown()) === null) {
                    $this->error(
                        __('Variable name was expected.'),
                        $this->str[$this->last],
                        $this->last
                    );
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, Token::TYPE_SYMBOL, $flags);
    }

    /**
     * Parses unknown parts of the query.
     *
     * @return Token|null
     */
    public function parseUnknown()
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
            return null;
        }

        while (++$this->last < $this->len && !Context::isSeparator($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }
        --$this->last;

        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     *
     * @return Token|null
     */
    public function parseDelimiter()
    {
        $idx = 0;

        while ($idx < $this->delimiterLen && $this->last + $idx < $this->len) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }
            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;

        return new Token($this->delimiter, Token::TYPE_DELIMITER);
    }
}
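
The number state machine earlier in this class is easier to follow in isolation. Below is a minimal standalone sketch of the same transitions. `sketchScanNumber()` is a hypothetical helper, not part of the library: it omits the token flags and the `$this->last` bookkeeping, and returns the matched text, or `null` when scanning stops in a non-final state.

```php
<?php
// Hypothetical helper for illustration only; it is not part of the library.
// It mirrors the state transitions of the number state machine without flags.
function sketchScanNumber(string $s, int $pos = 0): ?string
{
    $isDigit = static function (string $c): bool {
        return $c >= '0' && $c <= '9';
    };
    $token = '';
    $state = 1;
    for ($len = strlen($s); $pos < $len; ++$pos) {
        $c = $s[$pos];
        if ($state === 1) {
            if ($c === '0' && $pos + 1 < $len && ($s[$pos + 1] === 'x' || $s[$pos + 1] === 'X')) {
                $token .= $s[$pos++]; // consume `0`; the `x` is appended at the end of the loop body
                $state = 2;
            } elseif ($isDigit($c)) {
                $state = 3;
            } elseif ($c === '.') {
                $state = 4;
            } elseif ($c === 'b') {
                $state = 7;
            } elseif ($c !== '+' && $c !== '-') {
                break; // signs keep the machine in state 1; anything else stops it
            }
        } elseif ($state === 2) {
            if (strpos('0123456789abcdefABCDEF', $c) === false) {
                break;
            }
        } elseif ($state === 3 || $state === 4) {
            if ($state === 3 && $c === '.') {
                $state = 4;
            } elseif ($c === 'e' || $c === 'E') {
                $state = 5;
            } elseif (!$isDigit($c)) {
                break;
            }
        } elseif ($state === 5) {
            if ($c !== '+' && $c !== '-' && !$isDigit($c)) {
                break;
            }
            $state = 6;
        } elseif ($state === 6) {
            if (!$isDigit($c)) {
                break;
            }
        } elseif ($state === 7) {
            if ($c !== "'") {
                break;
            }
            $state = 8;
        } elseif ($state === 8) {
            if ($c === "'") {
                $state = 9;
            } elseif ($c !== '0' && $c !== '1') {
                break;
            }
        } else {
            break; // state 9: the closing quote was already consumed
        }
        $token .= $s[$pos];
    }
    // Final states: 2, 3, 6, 9, and 4 unless the token is a lone dot.
    $final = in_array($state, [2, 3, 6, 9], true) || ($state === 4 && $token !== '.');

    return $final ? $token : null;
}
```

Under this sketch, inputs such as `0x1F`, `1.5e-3` and `b'101'` are matched in full, while a lone `.` or a dangling exponent such as `1e` yields `null`, which corresponds to the method restoring `$iBak` and returning `null`.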