Completed
Push — master ( 3420cb...0c5d87 )
by Michal
129:46 queued 64:55
created

Lexer::parseNumber()   D

Complexity

Conditions 52
Paths 56

Size

Total Lines 139
Code Lines 76

Duplication

Lines 13
Ratio 9.35 %

Code Coverage

Tests 97
CRAP Score 52

Importance

Changes 0
Metric Value
cc 52
eloc 76
nc 56
nop 0
dl 13
loc 139
ccs 97
cts 97
cp 1
crap 52
rs 4.1818
c 0
b 0
f 0
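
The CRAP score of 52 reported above follows from the usual Change Risk Anti-Patterns formula, complexity² × (1 − coverage)³ + complexity. The sketch below assumes that `cc` is the cyclomatic complexity and that `cp` is coverage as a fraction (97 of 97 statements covered); `crapScore` is a hypothetical helper written for illustration, not something the report or the library provides.

```php
<?php
// Illustrative only: crapScore() is a hypothetical helper name.
// CRAP = complexity^2 * (1 - coverage)^3 + complexity
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values read from the metric table: cc = 52, cp = 1.
echo crapScore(52, 1.0), PHP_EOL; // 52

// The same method with no test coverage at all, for comparison.
echo crapScore(52, 0.0), PHP_EOL; // 2756
```

With full coverage the risk term vanishes, so the score equals the complexity itself. Lowering it further requires reducing the number of conditions in the method.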

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
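
As a minimal sketch of that advice applied to `parseNumber()`: the method repeats the same character-range comparisons in several states, each explained by a comment such as "Just digits are valid characters." Those checks could be extracted into small, well-named predicates. The function names below are hypothetical and are not part of SqlParser; they only mirror the comparisons already present in the listing.

```php
<?php
// Hypothetical helpers for illustration; none of these exist in SqlParser.

/** Whether $c is a single decimal digit (0 to 9). */
function isDigit(string $c): bool
{
    return $c >= '0' && $c <= '9';
}

/** Whether $c is a single hexadecimal digit (0 to 9, A to F, a to f). */
function isHexDigit(string $c): bool
{
    return isDigit($c)
        || ($c >= 'A' && $c <= 'F')
        || ($c >= 'a' && $c <= 'f');
}

/** Whether $c introduces the exponent of a number in approximate form. */
function isExponentMarker(string $c): bool
{
    return $c === 'e' || $c === 'E';
}

// The state 2 check from the listing could then read:
//     if (!isHexDigit($this->str[$this->last])) { break; }
var_dump(isHexDigit('f')); // bool(true)
var_dump(isHexDigit('g')); // bool(false)
```

Each extracted predicate removes several conditions from the long method, which is one way to bring down the condition count the report flags.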

Commonly applied refactorings include:

<?php

/**
 * Defines the lexer of the library.
 *
 * This is one of the most important components, along with the parser.
 *
 * Depends on context to extract lexemes.
 */

namespace SqlParser;

require_once 'common.php';

use SqlParser\Exceptions\LexerException;

if (!defined('USE_UTF_STRINGS')) {
    // NOTE: In previous versions of PHP (5.5 and older) the default
    // internal encoding is "ISO-8859-1".
    // All `mb_` functions must specify the correct encoding, which is
    // 'UTF-8' in order to work properly.

    /*
     * Forces usage of `UtfString` if the string is multibyte.
     * `UtfString` may be slower, but it gives better results.
     *
     * @var bool
     */
    define('USE_UTF_STRINGS', true);
}

/**
 * Performs lexical analysis over a SQL statement and splits it into multiple
 * tokens.
 *
 * The output of the lexer is affected by the context of the SQL statement.
 *
 * @category Lexer
 *
 * @license  https://www.gnu.org/licenses/gpl-2.0.txt GPL-2.0+
 *
 * @see      Context
 */
class Lexer
{
    /**
     * A list of methods that are used in lexing the SQL query.
     *
     * @var array
     */
    public static $PARSER_METHODS = array(
        // It is best to put the parsers in order of their complexity
        // (ascending) and their occurrence rate (descending).
        //
        // Conflicts:
        //
        // 1. `parseDelimiter`, `parseUnknown`, `parseKeyword`, `parseNumber`
        // They fight over delimiter. The delimiter may be a keyword, a
        // number or almost any character which makes the delimiter one of
        // the first tokens that must be parsed.
        //
        // 1. `parseNumber` and `parseOperator`
        // They fight over `+` and `-`.
        //
        // 2. `parseComment` and `parseOperator`
        // They fight over `/` (as in ```/*comment*/``` or ```a / b```)
        //
        // 3. `parseBool` and `parseKeyword`
        // They fight over `TRUE` and `FALSE`.
        //
        // 4. `parseKeyword` and `parseUnknown`
        // They fight over words. `parseUnknown` does not know about
        // keywords.

        'parseDelimiter', 'parseWhitespace', 'parseNumber',
        'parseComment', 'parseOperator', 'parseBool', 'parseString',
        'parseSymbol', 'parseKeyword', 'parseLabel', 'parseUnknown',
    );

    /**
     * Whether errors should throw exceptions or just be stored.
     *
     * @var bool
     *
     * @see static::$errors
     */
    public $strict = false;

    /**
     * The string to be parsed.
     *
     * @var string|UtfString
     */
    public $str = '';

    /**
     * The length of `$str`.
     *
     * By storing its length, a lot of time is saved, because parsing methods
     * would call `strlen` every time.
     *
     * @var int
     */
    public $len = 0;

    /**
     * The index of the last parsed character.
     *
     * @var int
     */
    public $last = 0;

    /**
     * Tokens extracted from given strings.
     *
     * @var TokensList
     */
    public $list;

    /**
     * The default delimiter. This is used, by default, in all new instances.
     *
     * @var string
     */
    public static $DEFAULT_DELIMITER = ';';

    /**
     * Statements delimiter.
     * This may change during lexing.
     *
     * @var string
     */
    public $delimiter;

    /**
     * The length of the delimiter.
     *
     * Because `parseDelimiter` can be called a lot, it would perform a lot of
     * calls to `strlen`, which might affect performance when the delimiter is
     * big.
     *
     * @var int
     */
    public $delimiterLen;

    /**
     * List of errors that occurred during lexing.
     *
     * Usually, the lexing does not stop once an error occurred because that
     * error might be false positive or a partial result (even a bad one)
     * might be needed.
     *
     * @var LexerException[]
     *
     * @see Lexer::error()
     */
    public $errors = array();

    /**
     * Gets the tokens list parsed by a new instance of a lexer.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
Documentation introduced by
Should the type for parameter $delimiter not be string|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.

     *
     * @return TokensList
     */
    public static function getTokens($str, $strict = false, $delimiter = null)
    {
        $lexer = new self($str, $strict, $delimiter);

        return $lexer->list;
    }

    /**
     * Constructor.
     *
     * @param string|UtfString $str       the query to be lexed
     * @param bool             $strict    whether strict mode should be
     *                                    enabled or not
     * @param string           $delimiter the delimiter to be used
Documentation introduced by
Should the type for parameter $delimiter not be string|null?

     */
    public function __construct($str, $strict = false, $delimiter = null)
    {
        // `strlen` is used instead of `mb_strlen` because the lexer needs to
        // parse each byte of the input.
        $len = $str instanceof UtfString ? $str->length() : strlen($str);

        // For multi-byte strings, a new instance of `UtfString` is
        // initialized (only if `UtfString` usage is forced).
        if (!$str instanceof UtfString && USE_UTF_STRINGS && $len !== mb_strlen($str, 'UTF-8')) {
            $str = new UtfString($str);
        }

        $this->str = $str;
        $this->len = $str instanceof UtfString ? $str->length() : $len;

        $this->strict = $strict;

        // Setting the delimiter.
        $this->setDelimiter(
            !empty($delimiter) ? $delimiter : static::$DEFAULT_DELIMITER
        );

        $this->lex();
    }

    /**
     * Sets the delimiter.
     *
     * @param string $delimiter the new delimiter
     */
    public function setDelimiter($delimiter)
    {
        $this->delimiter = $delimiter;
        $this->delimiterLen = strlen($delimiter);
    }

    /**
     * Parses the string and extracts lexemes.
     */
    public function lex()
    {
        // TODO: Sometimes, static::parse* functions make unnecessary calls to
Coding Style Best Practice introduced by
Comments for TODO tasks are often forgotten in the code; it might be better to use a dedicated issue tracker.
        // is* functions. For a better performance, some rules can be deduced
        // from context.
        // For example, in `parseBool` there is no need to compare the token
        // every time with `true` and `false`. The first step would be to
        // compare with 'true' only and just after that add another letter from
        // context and compare again with `false`.
        // Another example is `parseComment`.

        $list = new TokensList();

        /**
         * Last processed token.
         *
         * @var Token
         */
        $lastToken = null;

        for ($this->last = 0, $lastIdx = 0; $this->last < $this->len; $lastIdx = ++$this->last) {
            /**
             * The new token.
             *
             * @var Token
             */
            $token = null;

            foreach (static::$PARSER_METHODS as $method) {
                if ($token = $this->$method()) {
                    break;
                }
            }

            if ($token === null) {
                // @assert($this->last === $lastIdx);
                $token = new Token($this->str[$this->last]);
                $this->error(
                    __('Unexpected character.'),
                    $this->str[$this->last],
                    $this->last
                );
            } elseif ($lastToken !== null
                && $token->type === Token::TYPE_SYMBOL
                && $token->flags & Token::FLAG_SYMBOL_VARIABLE
                && (
                    $lastToken->type === Token::TYPE_STRING
                    || (
                        $lastToken->type === Token::TYPE_SYMBOL
                        && $lastToken->flags & Token::FLAG_SYMBOL_BACKTICK
                    )
                )
            ) {
                // Handles ```... FROM 'user'@'%' ...```.
                $lastToken->token .= $token->token;
                $lastToken->type = Token::TYPE_SYMBOL;
                $lastToken->flags = Token::FLAG_SYMBOL_USER;
                $lastToken->value .= '@' . $token->value;
                continue;
            } elseif ($lastToken !== null
                && $token->type === Token::TYPE_KEYWORD
                && $lastToken->type === Token::TYPE_OPERATOR
                && $lastToken->value === '.'
            ) {
                // Handles ```... tbl.FROM ...```. In this case, FROM is not
                // a reserved word.
                $token->type = Token::TYPE_NONE;
                $token->flags = 0;
                $token->value = $token->token;
            }

            $token->position = $lastIdx;

            $list->tokens[$list->count++] = $token;

            // Handling delimiters.
            if ($token->type === Token::TYPE_NONE && $token->value === 'DELIMITER') {
                if ($this->last + 1 >= $this->len) {
                    $this->error(
                        __('Expected whitespace(s) before delimiter.'),
                        '',
                        $this->last + 1
                    );
                    continue;
                }

                // Skipping last R (from `delimiteR`) and whitespaces between
                // the keyword `DELIMITER` and the actual delimiter.
                $pos = ++$this->last;
                if (($token = $this->parseWhitespace()) !== null) {
                    $token->position = $pos;
                    $list->tokens[$list->count++] = $token;
                }

                // Preparing the token that holds the new delimiter.
                if ($this->last + 1 >= $this->len) {
                    $this->error(
                        __('Expected delimiter.'),
                        '',
                        $this->last + 1
                    );
                    continue;
                }
                $pos = $this->last + 1;

                // Parsing the delimiter.
                $this->delimiter = null;
                while (++$this->last < $this->len && !Context::isWhitespace($this->str[$this->last])) {
                    $this->delimiter .= $this->str[$this->last];
                }

                if (empty($this->delimiter)) {
                    $this->error(
                        __('Expected delimiter.'),
                        '',
                        $this->last
                    );
                    $this->delimiter = ';';
                }

                --$this->last;

                // Saving the delimiter and its token.
                $this->delimiterLen = strlen($this->delimiter);
                $token = new Token($this->delimiter, Token::TYPE_DELIMITER);
                $token->position = $pos;
                $list->tokens[$list->count++] = $token;
            }

            $lastToken = $token;
        }

        // Adding a final delimiter to mark the ending.
        $list->tokens[$list->count++] = new Token(null, Token::TYPE_DELIMITER);

        // Saving the tokens list.
        $this->list = $list;
    }

    /**
     * Creates a new error log.
     *
     * @param string $msg  the error message
     * @param string $str  the character that produced the error
     * @param int    $pos  the position of the character
     * @param int    $code the code of the error
     *
     * @throws LexerException throws the exception, if strict mode is enabled
     */
    public function error($msg = '', $str = '', $pos = 0, $code = 0)
    {
        $error = new LexerException($msg, $str, $pos, $code);
        if ($this->strict) {
            throw $error;
        }
        $this->errors[] = $error;
    }

    /**
     * Parses a keyword.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.

     */
    public function parseKeyword()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::KEYWORD_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                } else {
                    $lastSpace = true;
                }
            } else {
                $lastSpace = false;
            }

            $token .= $this->str[$this->last];
            if (($this->last + 1 === $this->len || Context::isSeparator($this->str[$this->last + 1]))
                && $flags = Context::isKeyword($token)
            ) {
                $ret = new Token($token, Token::TYPE_KEYWORD, $flags);
                $iEnd = $this->last;

                // We don't break so we find longest keyword.
                // For example, `OR` and `ORDER` have a common prefix `OR`.
                // If we stopped at `OR`, the parsing would be invalid.
            }
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a label.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseLabel()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        /**
         * Whether last parsed character is a whitespace.
         *
         * @var bool
         */
        $lastSpace = false;

        for ($j = 1; $j < Context::LABEL_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            // Composed keywords shouldn't have more than one whitespace between
            // keywords.
            if (Context::isWhitespace($this->str[$this->last])) {
                if ($lastSpace) {
                    --$j; // The size of the keyword didn't increase.
                    continue;
                } else {
                    $lastSpace = true;
                }
            } elseif ($this->str[$this->last] === ':') {
                $token .= $this->str[$this->last];
                $ret = new Token($token, Token::TYPE_LABEL);
                $iEnd = $this->last;
                break;
            } else {
                $lastSpace = false;
            }
            $token .= $this->str[$this->last];
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses an operator.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseOperator()
    {
        $token = '';

        /**
         * Value to be returned.
         *
         * @var Token
         */
        $ret = null;

        /**
         * The value of `$this->last` where `$token` ends in `$this->str`.
         *
         * @var int
         */
        $iEnd = $this->last;

        for ($j = 1; $j < Context::OPERATOR_MAX_LENGTH && $this->last < $this->len; ++$j, ++$this->last) {
            $token .= $this->str[$this->last];
            if ($flags = Context::isOperator($token)) {
                $ret = new Token($token, Token::TYPE_OPERATOR, $flags);
                $iEnd = $this->last;
            }
        }

        $this->last = $iEnd;

        return $ret;
    }

    /**
     * Parses a whitespace.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseWhitespace()
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

    {
        $token = $this->str[$this->last];

        if (!Context::isWhitespace($token)) {
            return null;
        }

        while (++$this->last < $this->len && Context::isWhitespace($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }

        --$this->last;

        return new Token($token, Token::TYPE_WHITESPACE);
    }

    /**
     * Parses a comment.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseComment()
    {
        $iBak = $this->last;
        $token = $this->str[$this->last];

        // Bash style comments. (#comment\n)
        if (Context::isComment($token)) {
            while (
                ++$this->last < $this->len
                && $this->str[$this->last] !== "\n"
            ) {
                $token .= $this->str[$this->last];
            }
            $token .= "\n"; // Adding the line ending.
            return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_BASH);
        }

        // C style comments. (/*comment*\/)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
                $flags = Token::FLAG_COMMENT_C;

                // This comment already ended. It may be a part of a
                // previous MySQL specific command.
                if ($token === '*/') {
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Checking if this is a MySQL-specific command.
                if ($this->last + 1 < $this->len
                    && $this->str[$this->last + 1] === '!'
                ) {
                    $flags |= Token::FLAG_COMMENT_MYSQL_CMD;
                    $token .= $this->str[++$this->last];

                    while (
                        ++$this->last < $this->len
                        && '0' <= $this->str[$this->last]
                        && $this->str[$this->last] <= '9'
                    ) {
                        $token .= $this->str[$this->last];
                    }
                    --$this->last;

                    // We split this comment and parse only its beginning
                    // here.
                    return new Token($token, Token::TYPE_COMMENT, $flags);
                }

                // Parsing the comment.
                while (
                    ++$this->last < $this->len
                    && (
                        $this->str[$this->last - 1] !== '*'
                        || $this->str[$this->last] !== '/'
                    )
                ) {
                    $token .= $this->str[$this->last];
                }

                // Adding the ending.
                if ($this->last < $this->len) {
                    $token .= $this->str[$this->last];
                }

                return new Token($token, Token::TYPE_COMMENT, $flags);
            }
        }

        // SQL style comments. (-- comment\n)
        if (++$this->last < $this->len) {
            $token .= $this->str[$this->last];
            if (Context::isComment($token)) {
                // Checking if this comment did not end already (```--\n```).
                if ($this->str[$this->last] !== "\n") {
                    while (
                        ++$this->last < $this->len
                        && $this->str[$this->last] !== "\n"
                    ) {
                        $token .= $this->str[$this->last];
                    }
                    $token .= "\n"; // Adding the line ending.
                }

                return new Token($token, Token::TYPE_COMMENT, Token::FLAG_COMMENT_SQL);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a boolean.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseBool()
    {
        if ($this->last + 3 >= $this->len) {
            // At least `min(strlen('TRUE'), strlen('FALSE'))` characters are
            // required.
            return null;
        }

        $iBak = $this->last;
        $token = $this->str[$this->last] . $this->str[++$this->last]
        . $this->str[++$this->last] . $this->str[++$this->last]; // _TRUE_ or _FALS_e

        if (Context::isBool($token)) {
            return new Token($token, Token::TYPE_BOOL);
        } elseif (++$this->last < $this->len) {
            $token .= $this->str[$this->last]; // fals_E_
            if (Context::isBool($token)) {
                return new Token($token, Token::TYPE_BOOL, 1);
            }
        }

        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a number.
     *
     * @return Token
Documentation introduced by
Should the return type not be Token|null?

     */
    public function parseNumber()
    {
        // A rudimentary state machine is being used to parse numbers due to
        // the various forms of their notation.
        //
        // Below are the states of the machine and the conditions to change
        // the state.
        //
        //      1 --------------------[ + or - ]-------------------> 1
        //      1 -------------------[ 0x or 0X ]------------------> 2
        //      1 --------------------[ 0 to 9 ]-------------------> 3
        //      1 -----------------------[ . ]---------------------> 4
        //      1 -----------------------[ b ]---------------------> 7
        //
        //      2 --------------------[ 0 to F ]-------------------> 2
        //
        //      3 --------------------[ 0 to 9 ]-------------------> 3
        //      3 -----------------------[ . ]---------------------> 4
        //      3 --------------------[ e or E ]-------------------> 5
        //
        //      4 --------------------[ 0 to 9 ]-------------------> 4
        //      4 --------------------[ e or E ]-------------------> 5
        //
        //      5 ---------------[ + or - or 0 to 9 ]--------------> 6
        //
        //      7 -----------------------[ ' ]---------------------> 8
        //
        //      8 --------------------[ 0 or 1 ]-------------------> 8
        //      8 -----------------------[ ' ]---------------------> 9
        //
        // State 1 may be reached by negative numbers.
        // State 2 is reached only by hex numbers.
        // State 4 is reached only by float numbers.
        // State 5 is reached only by numbers in approximate form.
        // State 7 is reached only by numbers in bit representation.
        //
        // Valid final states are: 2, 3, 4 and 6. Any parsing that finished in a
        // state other than these is invalid.
        $iBak = $this->last;
        $token = '';
        $flags = 0;
        $state = 1;
734 1
        for (; $this->last < $this->len; ++$this->last) {
735
            if ($state === 1) {
736 1
                if ($this->str[$this->last] === '-') {
737 1
                    $flags |= Token::FLAG_NUMBER_NEGATIVE;
738
                } elseif ($this->last + 1 < $this->len
739 120
                    && $this->str[$this->last] === '0'
740 120
                    && (
741 218
                        $this->str[$this->last + 1] === 'x'
742 218
                        || $this->str[$this->last + 1] === 'X'
743 218
                    )
744 218
                ) {
745 93
                    $token .= $this->str[$this->last++];
746 93
                    $state = 2;
747
                } elseif ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9') {
748 218
                    $state = 3;
749 218
                } elseif ($this->str[$this->last] === '.') {
750
                    $state = 4;
751
                } elseif ($this->str[$this->last] === 'b') {
752
                    $state = 7;
753
                } elseif ($this->str[$this->last] !== '+') {
754
                    // `+` is a valid character in a number.
755
                    break;
756
                }
757
            } elseif ($state === 2) {
                $flags |= Token::FLAG_NUMBER_HEX;
                if (
                    !(
                        ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                        || ($this->str[$this->last] >= 'A' && $this->str[$this->last] <= 'F')
                        || ($this->str[$this->last] >= 'a' && $this->str[$this->last] <= 'f')
                    )
                ) {
                    break;
                }
            } elseif ($state === 3) {
                if ($this->str[$this->last] === '.') {
                    $state = 4;
                } elseif ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits and `.`, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 4) {
                $flags |= Token::FLAG_NUMBER_FLOAT;
                if ($this->str[$this->last] === 'e' || $this->str[$this->last] === 'E') {
                    $state = 5;
                } elseif ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits, `e` and `E` are valid characters.
                    break;
                }
            } elseif ($state === 5) {
                $flags |= Token::FLAG_NUMBER_APPROXIMATE;
                if ($this->str[$this->last] === '+' || $this->str[$this->last] === '-'
                    || ($this->str[$this->last] >= '0' && $this->str[$this->last] <= '9')
                ) {
                    $state = 6;
                } else {
                    break;
                }
            } elseif ($state === 6) {
                if ($this->str[$this->last] < '0' || $this->str[$this->last] > '9') {
                    // Just digits are valid characters.
                    break;
                }
            } elseif ($state === 7) {
                $flags |= Token::FLAG_NUMBER_BINARY;
                if ($this->str[$this->last] === '\'') {
                    $state = 8;
                } else {
                    break;
                }
            } elseif ($state === 8) {
                if ($this->str[$this->last] === '\'') {
                    $state = 9;
                } elseif ($this->str[$this->last] !== '0'
                    && $this->str[$this->last] !== '1'
                ) {
                    break;
                }
            } elseif ($state === 9) {
                break;
            }
            $token .= $this->str[$this->last];
        }
        if ($state === 2 || $state === 3
            || ($token !== '.' && $state === 4)
            || $state === 6 || $state === 9
        ) {
            --$this->last;

            return new Token($token, Token::TYPE_NUMBER, $flags);
        }
        $this->last = $iBak;

        return null;
    }

    /**
     * Parses a string.
     *
     * @param string $quote additional starting symbol
     *
     * @return Token|null
     */
    public function parseString($quote = '')
    {
        $token = $this->str[$this->last];
        if (!($flags = Context::isString($token)) && $token !== $quote) {
            return null;
        }
        $quote = $token;

        while (++$this->last < $this->len) {
            if ($this->last + 1 < $this->len
                && (
                    ($this->str[$this->last] === $quote && $this->str[$this->last + 1] === $quote)
                    || ($this->str[$this->last] === '\\' && $quote !== '`')
                )
            ) {
                $token .= $this->str[$this->last] . $this->str[++$this->last];
            } else {
                if ($this->str[$this->last] === $quote) {
                    break;
                }
                $token .= $this->str[$this->last];
            }
        }

        if ($this->last >= $this->len || $this->str[$this->last] !== $quote) {
            $this->error(
                sprintf(
                    __('Ending quote %1$s was expected.'),
                    $quote
                ),
                '',
                $this->last
            );
        } else {
            $token .= $this->str[$this->last];
        }

        return new Token($token, Token::TYPE_STRING, $flags);
    }

    /**
     * Parses a symbol.
     *
     * @return Token|null
     */
    public function parseSymbol()
    {
        $token = $this->str[$this->last];
        if (!($flags = Context::isSymbol($token))) {
            return null;
        }

        if ($flags & Token::FLAG_SYMBOL_VARIABLE) {
            if ($this->last + 1 < $this->len && $this->str[++$this->last] === '@') {
                // This is a system variable (e.g. `@@hostname`).
                $token .= $this->str[$this->last++];
                $flags |= Token::FLAG_SYMBOL_SYSTEM;
            }
        } else {
            $token = '';
        }

        $str = null;

        if ($this->last < $this->len) {
            if (($str = $this->parseString('`')) === null) {
                if (($str = $this->parseUnknown()) === null) {
                    $this->error(
                        __('Variable name was expected.'),
                        $this->str[$this->last],
                        $this->last
                    );
                }
            }
        }

        if ($str !== null) {
            $token .= $str->token;
        }

        return new Token($token, Token::TYPE_SYMBOL, $flags);
    }

    /**
     * Parses unknown parts of the query.
     *
     * Note (code analysis): this method appears to be duplicated elsewhere in
     * the project.
     *
     * @return Token|null
     */
    public function parseUnknown()
    {
        $token = $this->str[$this->last];
        if (Context::isSeparator($token)) {
            return null;
        }

        while (++$this->last < $this->len && !Context::isSeparator($this->str[$this->last])) {
            $token .= $this->str[$this->last];
        }
        --$this->last;

        return new Token($token);
    }

    /**
     * Parses the delimiter of the query.
     *
     * @return Token|null
     */
    public function parseDelimiter()
    {
        $idx = 0;

        while ($idx < $this->delimiterLen && $this->last + $idx < $this->len) {
            if ($this->delimiter[$idx] !== $this->str[$this->last + $idx]) {
                return null;
            }
            ++$idx;
        }

        $this->last += $this->delimiterLen - 1;

        return new Token($this->delimiter, Token::TYPE_DELIMITER);
    }
}
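
The number-parsing state machine above repeats the same character range checks in several states (digits in states 1, 3, 4, 5 and 6; hex digits in state 2; binary digits in state 8). One possible way to reduce its length and complexity is to extract those checks into small named helpers. The sketch below is illustrative only: `isDecimalDigit()`, `isHexDigit()`, `isBinaryDigit()` and `countHexDigits()` are hypothetical names that do not exist in the library, and they are written as free functions so the example runs on its own.

```php
<?php

// Hypothetical helpers: none of these names exist in the library.

/** True if $c is a single ASCII decimal digit (the check used by states 1, 3, 4, 5 and 6). */
function isDecimalDigit(string $c): bool
{
    return strlen($c) === 1 && $c >= '0' && $c <= '9';
}

/** True if $c is a single ASCII hexadecimal digit (the check used by state 2). */
function isHexDigit(string $c): bool
{
    return strlen($c) === 1
        && (
            isDecimalDigit($c)
            || ($c >= 'A' && $c <= 'F')
            || ($c >= 'a' && $c <= 'f')
        );
}

/** True if $c is `0` or `1` (the check used by state 8). */
function isBinaryDigit(string $c): bool
{
    return $c === '0' || $c === '1';
}

/**
 * Counts the consecutive hex digits starting at $pos, applying the same
 * per-character test that state 2 applies inside the loop.
 */
function countHexDigits(string $str, int $pos): int
{
    $n = 0;
    $len = strlen($str);
    while ($pos + $n < $len && isHexDigit($str[$pos + $n])) {
        ++$n;
    }

    return $n;
}
```

With helpers like these, the state 2 branch could shrink to a single `if (!isHexDigit($this->str[$this->last])) { break; }`, and the digit tests in the other states would read the same way. Whether they belong on the lexer itself or on a separate utility class is a design choice this sketch does not settle.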