Completed
Pull Request — master (#94)
by Deven
78:53
created

Lexer::parseLabel()   B

Complexity

Conditions 6
Paths 5

Size

Total Lines 49
Code Lines 22

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 21
CRAP Score 6

Importance

Changes 0
Metric Value
cc 6
eloc 22
nc 5
nop 0
dl 0
loc 49
ccs 21
cts 21
cp 1
crap 6
rs 8.5906
c 0
b 0
f 0
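The metrics above refer to `Lexer::parseLabel()`, which appears further down in the listing. As a rough illustration of what that method does, the sketch below re-implements its scanning loop as a standalone function. The function name `scanLabel`, the `$maxLength` default (standing in for `Context::LABEL_MAX_LENGTH`), and the use of `ctype_space` in place of `Context::isWhitespace` are assumptions made for this example and are not part of the library.

```php
<?php

/**
 * Standalone sketch of the label scan: read characters from $start until a
 * ':' is found and return the text up to and including it, or null when no
 * colon shows up within the length limit.
 *
 * Hypothetical helper; $maxLength stands in for Context::LABEL_MAX_LENGTH.
 */
function scanLabel(string $str, int $start = 0, int $maxLength = 30): ?string
{
    $token = '';
    $lastSpace = false;
    $len = strlen($str);

    for ($j = 1, $i = $start; $j < $maxLength && $i < $len; ++$j, ++$i) {
        $ch = $str[$i];
        if (ctype_space($ch)) {
            if ($lastSpace) {
                --$j; // Repeated whitespace does not count towards the length.
                continue;
            }
            $lastSpace = true;
        } elseif ($ch === ':') {
            return $token . $ch; // The label ends at the first colon.
        } else {
            $lastSpace = false;
        }
        $token .= $ch;
    }

    return null; // No colon found in range: this is not a label.
}
```

Calling `scanLabel('begin_label: LOOP')` yields the label text including its colon, while an input without a colon yields `null`, mirroring how `parseLabel()` returns either a `Token` or `null`.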
<?php

/**
 * Defines the lexer of the library.
 *
 * This is one of the most important components, along with the parser.
 *
 * Depends on context to extract lexemes.
 *
 * @package SqlParser
 */
namespace SqlParser;

require_once 'common.php';

use SqlParser\Exceptions\LexerException;

if (!defined('USE_UTF_STRINGS')) {
    // NOTE: In previous versions of PHP (5.5 and older) the default
    // internal encoding was "ISO-8859-1".
    // All `mb_` functions must specify the correct encoding, which is
    // 'UTF-8' in order to work properly.

    /**
     * Forces usage of `UtfString` if the string is multibyte.
     * `UtfString` may be slower, but it gives better results.
     *
     * @var bool
     */
    define('USE_UTF_STRINGS', true);
}

/**
 * Performs lexical analysis over a SQL statement and splits it into multiple
 * tokens.
 *
 * The output of the lexer is affected by the context of the SQL statement.
 *
 * @category Lexer
 * @package  SqlParser
 * @license  https://www.gnu.org/licenses/gpl-2.0.txt GPL-2.0+
 * @see      Context
 */
class Lexer
{

    /**
     * A list of methods that are used in lexing the SQL query.
     *
     * @var array
     */
    public static $PARSER_METHODS = array(

        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over the delimiter. The delimiter may be a keyword, a
        // number or almost any character, which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 2. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 3. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 4. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 5. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        'parseDelimiter', 'parseWhitespace', 'parseNumber',
        'parseComment', 'parseOperator', 'parseBool', 'parseString',
        'parseSymbol', 'parseKeyword', 'parseLabel', 'parseUnknown'
    );

    /**
     * Whether errors should throw exceptions or just be stored.
     *
     * @var bool
     *
     * @see static::$errors
     */
    public $strict = false;

    /**
     * The string to be parsed.
     *
     * @var string|UtfString
     */
    public $str = '';

    /**
     * The length of `$str`.
     *
     * By storing its length, a lot of time is saved, because parsing methods
     * would call `strlen` every time.
     *
     * @var int
     */
    public $len = 0;

    /**
     * The index of the last parsed character.
     *
     * @var int
     */
    public $last = 0;

    /**
     * Tokens extracted from given strings.
     *
     * @var TokensList
     */
    public $list;

    /**
     * The default delimiter. This is used, by default, in all new instances.
     *
     * @var string
     */
    public static $DEFAULT_DELIMITER = ';';

    /**
     * Statements delimiter.
     * This may change during lexing.
     *
     * @var string
     */
    public $delimiter;

    /**
     * The length of the delimiter.
     *
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
     * calls to `strlen`, which might affect performance when the delimiter is
     * big.
     *
     * @var int
     */
    public $delimiterLen;

    /**
     * List of errors that occurred during lexing.
     *
     * Usually, the lexing does not stop once an error has occurred, because
     * that error might be a false positive or a partial result (even a bad
     * one) might be needed.
     *
     * @var LexerException[]
     *
     * @see Lexer::error()
     */
    public $errors = array();

    /**
     * Gets the tokens list parsed by a new instance of a lexer.
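     *
     * Usage sketch (the query is illustrative; only members defined in this
     * file are used):
     *
     *     $list = Lexer::getTokens('SELECT 1');
     *     foreach ($list->tokens as $token) {
     *         echo $token->token, "\n";
     *     }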
     *
     * @param string|UtfString $str       The query to be lexed.
     * @param bool             $strict    Whether strict mode should be
     *                                    enabled or not.
     * @param string           $delimiter The delimiter to be used.
Documentation introduced by
Should the type for parameter $delimiter not be string|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.

     *
     * @return TokensList
     */
    public static function getTokens($str, $strict = false, $delimiter = null)
    {
        $lexer = new Lexer($str, $strict, $delimiter);
        return $lexer->list;
    }

    /**
     * Constructor.
     *
     * @param string|UtfString $str       The query to be lexed.
     * @param bool             $strict    Whether strict mode should be
     *                                    enabled or not.
     * @param string           $delimiter The delimiter to be used.
Documentation introduced by
Should the type for parameter $delimiter not be string|null?

     */
    public function __construct($str, $strict = false, $delimiter = null)
    {
        // `strlen` is used instead of `mb_strlen` because the lexer needs to
        // parse each byte of the input.
        $len = ($str instanceof UtfString) ? $str->length() : strlen($str);

        // For multi-byte strings, a new instance of `UtfString` is
        // initialized (only if `UtfString` usage is forced).
        if (!($str instanceof UtfString)) {
            if ((USE_UTF_STRINGS) && ($len !== mb_strlen($str, 'UTF-8'))) {
                $str = new UtfString($str);
            }
        }

        $this->str = $str;
        $this->len = ($str instanceof UtfString) ? $str->length() : $len;

        $this->strict = $strict;

        // Setting the delimiter.
        $this->setDelimiter(
            !empty($delimiter) ? $delimiter : static::$DEFAULT_DELIMITER
        );

        $this->lex();
    }

    /**
     * Sets the delimiter.
     *
     * @param string $delimiter The new delimiter.
     */
    public function setDelimiter($delimiter)
    {
        $this->delimiter = $delimiter;
        $this->delimiterLen = strlen($delimiter);
    }

    /**
     * Parses the string and extracts lexemes.
     *
     * @return void
     */
    public function lex()
    {
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
Coding Style Best Practice introduced by
Comments for TODO tasks are often forgotten in the code; it might be better to use a dedicated issue tracker.
        // is* functions. For better performance, some rules can be deduced
        // from context.
        // For example, in `parseBool` there is no need to compare the token
        // every time with `true` and `false`. The first step would be to
        // compare with 'true' only and just after that add another letter from
        // context and compare again with `false`.
        // Another example is `parseComment`.

        $list = new TokensList();

        /**
         * Last processed token.
         *
         * @var Token $lastToken
         */
        $lastToken = null;

        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
            /**
             * The new token.
             *
             * @var Token $token
             */
            $token = null;

            foreach (static::$PARSER_METHODS as $method) {
                if (($token = $this->$method())) {
                    break;
                }
            }

            if ($token === null) {
                // @assert($this->last === $lastIdx);
                $token = new Token($this->str[$this->last]);
                $this->error(
                    __('Unexpected character.'),
                    $this->str[$this->last],
                    $this->last
                );
            } elseif (($lastToken !== null)
                && ($token->type === Token::TYPE_SYMBOL)
                && ($token->flags & Token::FLAG_SYMBOL_VARIABLE)
                && (($lastToken->type === Token::TYPE_STRING)
                || (($lastToken->type === Token::TYPE_SYMBOL)
                && ($lastToken->flags & Token::FLAG_SYMBOL_BACKTICK)))
            ) {
                // Handles ```... FROM 'user'@'%' ...```.
                $lastToken->token .= $token->token;
                $lastToken->type = Token::TYPE_SYMBOL;
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
                $lastToken->value .= '@' . $token->value;
                continue;
            } elseif (($lastToken !== null)
                && ($token->type === Token::TYPE_KEYWORD)
                && ($lastToken->type === Token::TYPE_OPERATOR)
                && ($lastToken->value === '.')
            ) {
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
                // a reserved word.
                $token->type = Token::TYPE_NONE;
                $token->flags = 0;
                $token->value = $token->token;
            }

            $token->position = $lastIdx;

            $list->tokens[$list->count++] = $token;

            // Handling delimiters.
            if (($token->type === Token::TYPE_NONE) && ($token->value === 'DELIMITER')) {
                if ($this->last + 1 >= $this->len) {
                    $this->error(
                        __('Expected whitespace(s) before delimiter.'),
                        '',
                        $this->last + 1
                    );
                    continue;
                }

                // Skipping last R (from `delimiteR`) and whitespaces between
                // the keyword `DELIMITER` and the actual delimiter.
                $pos = ++$this->last;
                if (($token = $this->parseWhitespace()) !== null) {
                    $token->position = $pos;
                    $list->tokens[$list->count++] = $token;
                }

                // Preparing the token that holds the new delimiter.
                if ($this->last + 1 >= $this->len) {
                    $this->error(
                        __('Expected delimiter.'),
                        '',
                        $this->last + 1
                    );
                    continue;
                }
                $pos = $this->last + 1;

                // Parsing the delimiter.
                $this->delimiter = null;
                while ((++$this->last < $this->len) && (!Context::isWhitespace($this->str[$this->last]))) {
                    $this->delimiter .= $this->str[$this->last];
                }

                if (empty($this->delimiter)) {
                    $this->error(
                        __('Expected delimiter.'),
                        '',
                        $this->last
                    );
                    $this->delimiter = ';';
                }

                --$this->last;

                // Saving the delimiter and its token.
                $this->delimiterLen = strlen($this->delimiter);
                $token = new Token($this->delimiter, Token::TYPE_DELIMITER);
                $token->position = $pos;
                $list->tokens[$list->count++] = $token;
            }

            $lastToken = $token;
        }

        // Adding a final delimiter to mark the ending.
        $list->tokens[$list->count++] = new Token(null, Token::TYPE_DELIMITER);

        // Saving the tokens list.
        $this->list = $list;
    }

    /**
     * Creates a new error log.
     *
     * @param string $msg  The error message.
     * @param string $str  The character that produced the error.
     * @param int    $pos  The position of the character.
     * @param int    $code The code of the error.
     *
     * @throws LexerException Throws the exception, if strict mode is enabled.
     *
     * @return void
     */
    public function error($msg = '', $str = '', $pos = 0, $code = 0)
    {
        $error = new LexerException($msg, $str, $pos, $code);
        if ($this->strict) {
            throw $error;
        }
        $this->errors[] = $error;
    }

    /**
     * Parses a keyword.
     *
     * @return Token
     */
    public function parseKeyword()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token $ret
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int $iEnd
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool $lastSpace
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                } else {
                    $lastSpace = true;
                }
            } else {
                $lastSpace = false;
            }
            $token .= $this->str[$this->last];
            if (($this->last + 1 === $this->len) || (Context::isSeparator($this->str[$this->last + 1]))) {
                if (($flags = Context::isKeyword($token))) {
                    $ret = new Token($token, Token::TYPE_KEYWORD, $flags);
                    $iEnd = $this->last;

                    // We don't break, so that we find the longest keyword.
                    // For example, `OR` and `ORDER` have a common prefix `OR`.
                    // If we stopped at `OR`, the parsing would be invalid.
                }
            }
        }

        $this->last = $iEnd;
        return $ret;
    }

    /**
     * Parses a label.
     *
     * @return Token
     */
    public function parseLabel()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token $ret
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int $iEnd
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool $lastSpace
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                } else {
                    $lastSpace = true;
                }
            } elseif ($this->str[$this->last] === ':') {
                $token .= $this->str[$this->last];
                $ret = new Token($token, Token::TYPE_LABEL);
                $iEnd = $this->last;
                break;
            } else {
                $lastSpace = false;
            }
            $token .= $this->str[$this->last];
        }

        $this->last = $iEnd;
        return $ret;
    }

    /**
     * Parses an operator.
     *
     * @return Token
     */
    public function parseOperator()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token $ret
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int $iEnd
         */
        $iEnd = $this->last;

        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            $token .= $this->str[$this->last];
            if ($flags = Context::isOperator($token)) {
                $ret = new Token($token, Token::TYPE_OPERATOR, $flags);
                $iEnd = $this->last;
            }
        }

        $this->last = $iEnd;
        return $ret;
    }

    /**
     * Parses a whitespace.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.

     */
    public function parseWhitespace()
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
    {
        $token = $this->str[$this->last];

        if (!Context::isWhitespace($token)) {
            return null;
        }

        while ((++$this->last < $this->len) && (Context::isWhitespace($this->str[$this->last]))) {
            $token .= $this->str[$this->last];
        }

        --$this->last;
        return new Token($token, Token::TYPE_WHITESPACE);
    }

    /**
     * Parses a comment.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseComment()
    {
        $iBak = $this->last;
        $token = $this->str[$this->last];

        // Bash style comments. (#comment\n)
        if (Context::isComment($token)) {
            while ((++$this->last < $this->len) && ($this->str[$this->last] !== "\n")) {
                $token .= $this->str[$this->last];
            }
            $token .= "\n"; // Adding the line ending.
            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_BASH);
        }

        // C style comments. (/*comment*\/)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
                $flags = Token::FLAG_COMMENT_C;

                // This comment already ended. It may be a part of a
                // previous MySQL specific command.
                if ($token === '*/') {
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Checking if this is a MySQL-specific command.
                if (($this->last + 1 < $this->len) && ($this->str[$this->last + 1] === '!')) {
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
                    $token .= $this->str[++$this->last];

                    while ((++$this->last < $this->len)
                        && ('0' <= $this->str[$this->last])
                        && ($this->str[$this->last] <= '9')
                    ) {
                        $token .= $this->str[$this->last];
                    }
                    --$this->last;

                    // We split this comment and parse only its beginning
                    // here.
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Parsing the comment.
                while ((++$this->last < $this->len)
                    && (($this->str[$this->last - 1] !== '*') || ($this->str[$this->last] !== '/'))
                ) {
                    $token .= $this->str[$this->last];
                }

                // Adding the ending.
                if ($this->last < $this->len) {
                    $token .= $this->str[$this->last];
                }
                return new Token($token, Token::TYPE_COMMENT, $flags);
            }
        }

        // SQL style comments. (-- comment\n)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
                // Checking if this comment did not end already (```--\n```).
                if ($this->str[$this->last] !== "\n") {
                    while ((++$this->last < $this->len) && ($this->str[$this->last] !== "\n")) {
                        $token .= $this->str[$this->last];
                    }
                    $token .= "\n"; // Adding the line ending.
                }
                return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_SQL);
            }
        }

        $this->last = $iBak;
        return null;
    }

    /**
     * Parses a boolean.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseBool()
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, Token::TYPE_BOOL);
        } elseif (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, Token::TYPE_BOOL, 1);
            }
        }

        $this->last = $iBak;
        return null;
    }

    /**
     * Parses a number.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseNumber()
    {
        // A rudimentary state machine is being used to parse numbers due to
        // the various forms of their notation.
        //
        // Below are the states of the machine and the conditions to change
        // the state.
        //
        //      1 --------------------[ + or - ]-------------------> 1
        //      1 -------------------[ 0x or 0X ]------------------> 2
        //      1 --------------------[ 0 to 9 ]-------------------> 3
        //      1 -----------------------[ . ]---------------------> 4
        //      1 -----------------------[ b ]---------------------> 7
        //
        //      2 --------------------[ 0 to F ]-------------------> 2
        //
        //      3 --------------------[ 0 to 9 ]-------------------> 3
        //      3 -----------------------[ . ]---------------------> 4
        //      3 --------------------[ e or E ]-------------------> 5
        //
        //      4 --------------------[ 0 to 9 ]-------------------> 4
        //      4 --------------------[ e or E ]-------------------> 5
        //
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
        //
        //      6 --------------------[ 0 to 9 ]-------------------> 6
        //
        //      7 -----------------------[ ' ]---------------------> 8
        //
        //      8 --------------------[ 0 or 1 ]-------------------> 8
        //      8 -----------------------[ ' ]---------------------> 9
        //
        // State 1 may be reached by negative numbers.
        // State 2 is reached only by hex numbers.
        // State 4 is reached only by float numbers.
        // State 5 is reached only by numbers in approximate form.
        // State 7 is reached only by numbers in bit representation.
        //
        // Valid final states are: 2, 3, 4, 6 and 9 (state 4 only if the token
        // is not a lone `.`). Any parsing that finished in a state other than
        // these is invalid.
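        //
        // Worked example (a trace derived from the transitions above, not
        // taken from the test suite): for the input `-1.5e3` the machine moves
        // through 1 -[-]-> 1 -[1]-> 3 -[.]-> 4 -[5]-> 4 -[e]-> 5 -[3]-> 6 and
        // stops in state 6, so the whole input is accepted as one number.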
711 2
        $iBak = $this->last;
712 1
        $token = '';
713 1
        $flags = 0;
714 2
        $state = 1;
715
        for (; $this->last < $this->len; ++$this->last) {
716 21
            if ($state === 1) {
717 1
                if ($this->str[$this->last] === '-') {
718
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
719 1
                } elseif (($this->last + 1 < $this->len)
720
                    && ($this->str[$this->last] === '0')
721 21
                    && (($this->str[$this->last + 1] === 'x')
722 21
                    || ($this->str[$this->last + 1] === 'X'))
723 21
                ) {
724 1
                    $token .= $this->str[$this->last++];
725 1
                    $state = 2;
726 20
                } elseif (($this->str[$this->last] >= '0') && ($this->str[$this->last] <= '9')) {
727
                    $state = 3;
728 1
                } elseif ($this->str[$this->last] === '.') {
729 1
                    $state = 4;
730 1
                } elseif ($this->str[$this->last] === 'b') {
731 1
                    $state = 7;
732 1
                } elseif ($this->str[$this->last] !== '+') {
                    // `+` is a valid character in a number.
                    break;
                }
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (!((($this->str[$this->last] >= '0') && ($this->str[$this->last] <= '9'))
                    || (($this->str[$this->last] >= 'A') && ($this->str[$this->last] <= 'F'))
                    || (($this->str[$this->last] >= 'a') && ($this->str[$this->last] <= 'f')))
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif (($this->str[$this->last] === 'e') || ($this->str[$this->last] === 'E')) {
                    $state = 5;
                } elseif (($this->str[$this->last] < '0') || ($this->str[$this->last] > '9')) {
                    // Just digits and `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if (($this->str[$this->last] === 'e') || ($this->str[$this->last] === 'E')) {
                    $state = 5;
                } elseif (($this->str[$this->last] < '0') || ($this->str[$this->last] > '9')) {
                    // Just digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if (($this->str[$this->last] === '+') || ($this->str[$this->last] === '-')
                    || ((($this->str[$this->last] >= '0') && ($this->str[$this->last] <= '9')))
                ) {
                    $state = 6;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if (($this->str[$this->last] < '0') || ($this->str[$this->last] > '9')) {
                    // Just digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] === '\'') {
                    $state = 8;
                } else {
                    break;
                }
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif (($this->str[$this->last] !== '0')
                    && ($this->str[$this->last] !== '1')
                ) {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }
            $token .= $this->str[$this->last];
        }
        if (($state === 2) || ($state === 3)
            || (($token !== '.') && ($state === 4))
            || ($state === 6) || ($state === 9)
        ) {
            --$this->last;
            return new Token($token, Token::TYPE_NUMBER, $flags);
        }
        $this->last = $iBak;
        return null;
    }

    /**
     * Parses a string.
     *
     * @param string $quote Additional starting symbol.
     *
     * @return Token|null
     */
    public function parseString($quote = '')
    {
        $token = $this->str[$this->last];
        if ((!($flags = Context::isString($token))) && ($token !== $quote)) {
            return null;
        }
        $quote = $token;

        while (++$this->last < $this->len) {
            if (($this->last + 1 < $this->len)
                && ((($this->str[$this->last] === $quote) && ($this->str[$this->last + 1] === $quote))
                || (($this->str[$this->last] === '\\') && ($quote !== '`')))
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }
                $token .= $this->str[$this->last];
            }
        }

        if (($this->last >= $this->len) || ($this->str[$this->last] !== $quote)) {
            $this->error(
                sprintf(
                    __('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }
        return new Token($token, Token::TYPE_STRING, $flags);
    }

    /**
     * Parses a symbol.
     *
     * @return Token|null
     */
    public function parseSymbol()
    {
        $token = $this->str[$this->last];
        if (!($flags = Context::isSymbol($token))) {
            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            // Check the bounds first so that a trailing `@` does not read
            // past the end of the input.
            if ((++$this->last < $this->len) && ($this->str[$this->last] === '@')) {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            if (($str = $this->parseString('`')) === null) {
                if (($str = $this->parseUnknown()) === null) {
                    $this->error(
                        __('Variable name was expected.'),
                        $this->str[$this->last],
                        $this->last
                    );
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, Token::TYPE_SYMBOL, $flags);
    }

    /**
     * Parses unknown parts of the query.
     *
     * @return Token|null
     */
    public function parseUnknown()
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
            return null;
        }
        while ((++$this->last < $this->len) && (!Context::isSeparator($this->str[$this->last]))) {
            $token .= $this->str[$this->last];
        }
        --$this->last;
        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     *
     * @return Token|null
     */
    public function parseDelimiter()
    {
        $idx = 0;

        while (($idx < $this->delimiterLen) && ($this->last + $idx < $this->len)) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }
            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;
        return new Token($this->delimiter, Token::TYPE_DELIMITER);
    }
}
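
The quote handling in `parseString()` can be tried in isolation. The following is a minimal standalone sketch, not part of the library: `scanQuoted()` is a hypothetical helper name, it hard-codes the three quote characters instead of asking `Context::isString()`, and it returns `null` for an unterminated string where the lexer records an error and still returns a token. The escape rules mirror the loop above: a doubled quote or a backslash (outside backticks) consumes the following character as well.

```php
<?php

/**
 * Hypothetical helper (assumption, not SqlParser API): scans one quoted
 * lexeme that begins at offset $start of $str.
 *
 * Returns the raw lexeme including both quotes, or null when there is no
 * opening quote at $start or the closing quote is missing.
 */
function scanQuoted(string $str, int $start = 0): ?string
{
    $len = strlen($str);
    if ($start >= $len) {
        return null;
    }

    $quote = $str[$start];
    if (($quote !== '\'') && ($quote !== '"') && ($quote !== '`')) {
        return null;
    }

    $token = $quote;
    $i = $start;
    while (++$i < $len) {
        $hasNext = ($i + 1 < $len);
        // A doubled quote (e.g. '') stands for one literal quote character.
        $doubled = $hasNext && ($str[$i] === $quote) && ($str[$i + 1] === $quote);
        // A backslash escapes the next character, except inside backticks.
        $escaped = $hasNext && ($str[$i] === '\\') && ($quote !== '`');

        if ($doubled || $escaped) {
            // Consume both characters of the pair at once.
            $token .= $str[$i] . $str[++$i];
        } elseif ($str[$i] === $quote) {
            // Closing quote found.
            return $token . $quote;
        } else {
            $token .= $str[$i];
        }
    }

    // End of input reached without a closing quote.
    return null;
}

echo scanQuoted("'it''s' AND 1"), "\n"; // prints 'it''s'
```

Consuming escape pairs two characters at a time is what keeps an escaped quote from being mistaken for the closing one; the bounds check on the next character guards the lookahead at the end of the input.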