Completed
Pull Request — master (#204)
by Ryan
11:34
created

Tokenizer::scanWord()   C

Complexity

Conditions 10
Paths 2

Size

Total Lines 21
Code Lines 11

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 14
CRAP Score 10

Importance

Changes 0
Metric  Value
dl      0
loc     21
ccs     14
cts     14
cp      1
rs      6.6746
c       0
b       0
f       0
cc      10
eloc    11
nc      2
nop     0
crap    10
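
The CRAP score in the metrics above can be checked by hand. A commonly cited definition of the Change Risk Anti-Patterns score combines cyclomatic complexity (cc) with the covered fraction of the method (cp). This is an assumption: the report does not state which formula it uses.

```latex
\mathrm{CRAP}(m) = \mathrm{cc}(m)^{2} \cdot \bigl(1 - \mathrm{cp}(m)\bigr)^{3} + \mathrm{cc}(m)

% With the values reported for Tokenizer::scanWord(): cc = 10, cp = 1
\mathrm{CRAP} = 10^{2} \cdot (1 - 1)^{3} + 10 = 10
```

Under that definition a fully covered method scores exactly its complexity, which matches the reported value of 10.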

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
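
As a rough illustration of this advice applied to scanWord(), the sketch below pulls the long character test out of the loop into a named helper. The function names isWordCharacter() and readWord() are hypothetical and are not part of the Tokenizer class. The sketch also uses byte-based strlen()/substr(), which line up with the byte offsets that $source[$pos] reads, whereas the reviewed code mixes mb_* functions with byte indexing.

```php
<?php

/**
 * Hypothetical helper: true if $ch may appear inside a word after its first
 * character (same character set as the inline check in scanWord()).
 */
function isWordCharacter(string $ch): bool
{
    return '_' === $ch
        || '$' === $ch
        || ($ch >= 'a' && $ch <= 'z')
        || ($ch >= 'A' && $ch <= 'Z')
        || ($ch >= '0' && $ch <= '9');
}

/**
 * Hypothetical stand-alone version of the scanning loop: returns the word
 * that starts at byte offset $start in $source.
 */
function readWord(string $source, int $start): string
{
    $pos    = $start + 1; // the caller has already validated the first character
    $length = \strlen($source);

    while ($pos < $length && isWordCharacter($source[$pos])) {
        $pos++;
    }

    return \substr($source, $start, $pos - $start);
}

echo readWord('query { user }', 0), "\n"; // query
```

With the condition named, the loop body shrinks to a single statement and the 135-character line flagged in the listing disappears.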

Commonly applied refactorings include:

<?php
/**
 * Copyright (c) 2015–2018 Alexandr Viniychuk <http://youshido.com>.
 * Copyright (c) 2015–2018 Portey Vasil <https://github.com/portey>.
 * Copyright (c) 2018 Ryan Parman <https://github.com/skyzyx>.
 * Copyright (c) 2018 Ashley Hutson <https://github.com/asheliahut>.
 * Copyright (c) 2015–2018 Contributors.
 *
 * http://opensource.org/licenses/MIT
 */

declare(strict_types=1);
/**
 * Date: 23.11.15.
 */

namespace Youshido\GraphQL\Parser;

use Youshido\GraphQL\Exception\Parser\SyntaxErrorException;

class Tokenizer
{
    protected $source;

    protected $pos = 0;

    protected $line = 1;

    protected $lineStart = 0;

    /** @var Token */
    protected $lookAhead;

    protected function initTokenizer($source): void
    {
        $this->source    = $source;
        $this->lookAhead = $this->next();
    }

    /**
     * @return Token
     */
    protected function next()
    {
        $this->skipWhitespace();

        return $this->scan();
    }

    protected function skipWhitespace(): void
    {
        while ($this->pos < \mb_strlen($this->source)) {
            $ch = $this->source[$this->pos];

            if (' ' === $ch || "\t" === $ch || ',' === $ch) {
                $this->pos++;

                continue;
            }

            if ('#' === $ch) {
                $this->pos++;

                while ($this->pos < \mb_strlen($this->source) && ($code = \ord($this->source[$this->pos])) && 10 !== $code && 13 !== $code && 0x2028 !== $code && 0x2029 !== $code

Coding Style: This line exceeds maximum limit of 120 characters; contains 178 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

                ) {
                    $this->pos++;
                }

                continue;
            }

            if ("\r" === $ch) {
                $this->pos++;

                if ("\n" === $this->source[$this->pos]) {
                    $this->pos++;
                }
                $this->line++;
                $this->lineStart = $this->pos;

                continue;
            }

            if ("\n" === $ch) {
                $this->pos++;
                $this->line++;
                $this->lineStart = $this->pos;

                continue;
            }

            break;
        }
    }

    /**
     * @throws SyntaxErrorException
     *
     * @return Token
     */
    protected function scan()
    {
        if ($this->pos >= \mb_strlen($this->source)) {
            return new Token(Token::TYPE_END, $this->getLine(), $this->getColumn());
        }

        $ch = $this->source[$this->pos];
        switch ($ch) {
            case Token::TYPE_LPAREN:
                ++$this->pos;

                return new Token(Token::TYPE_LPAREN, $this->getLine(), $this->getColumn());
            case Token::TYPE_RPAREN:
                ++$this->pos;

                return new Token(Token::TYPE_RPAREN, $this->getLine(), $this->getColumn());
            case Token::TYPE_LBRACE:
                ++$this->pos;

                return new Token(Token::TYPE_LBRACE, $this->getLine(), $this->getColumn());
            case Token::TYPE_RBRACE:
                ++$this->pos;

                return new Token(Token::TYPE_RBRACE, $this->getLine(), $this->getColumn());
            case Token::TYPE_COMMA:
                ++$this->pos;

                return new Token(Token::TYPE_COMMA, $this->getLine(), $this->getColumn());
            case Token::TYPE_LSQUARE_BRACE:
                ++$this->pos;

                return new Token(Token::TYPE_LSQUARE_BRACE, $this->getLine(), $this->getColumn());
            case Token::TYPE_RSQUARE_BRACE:
                ++$this->pos;

                return new Token(Token::TYPE_RSQUARE_BRACE, $this->getLine(), $this->getColumn());
            case Token::TYPE_REQUIRED:
                ++$this->pos;

                return new Token(Token::TYPE_REQUIRED, $this->getLine(), $this->getColumn());
            case Token::TYPE_AT:
                ++$this->pos;

                return new Token(Token::TYPE_AT, $this->getLine(), $this->getColumn());
            case Token::TYPE_COLON:
                ++$this->pos;

                return new Token(Token::TYPE_COLON, $this->getLine(), $this->getColumn());

            case Token::TYPE_EQUAL:
                ++$this->pos;

                return new Token(Token::TYPE_EQUAL, $this->getLine(), $this->getColumn());

            case Token::TYPE_POINT:
                if ($this->checkFragment()) {
                    return new Token(Token::TYPE_FRAGMENT_REFERENCE, $this->getLine(), $this->getColumn());
                }

                return new Token(Token::TYPE_POINT, $this->getLine(), $this->getColumn());

            case Token::TYPE_VARIABLE:
                ++$this->pos;

                return new Token(Token::TYPE_VARIABLE, $this->getLine(), $this->getColumn());
        }

        if ('_' === $ch || ($ch >= 'a' && $ch <= 'z') || ($ch >= 'A' && $ch <= 'Z')) {
            return $this->scanWord();
        }

        if ('-' === $ch || ($ch >= '0' && $ch <= '9')) {
            return $this->scanNumber();
        }

        if ('"' === $ch) {
            return $this->scanString();
        }

        throw $this->createException('Can\t recognize token type');
    }

    protected function checkFragment()

Coding Style: function checkFragment() does not seem to conform to the naming convention (^(?:is|has|should|may|supports)).

This check examines a number of code elements and verifies that they conform to the given naming conventions.

You can set conventions for local variables, abstract classes, utility classes, constants, properties, methods, parameters, interfaces, classes, exceptions and special methods.

    {
        $this->pos++;
        $ch = $this->source[$this->pos];

        $this->pos++;
        $nextCh = $this->source[$this->pos];

        if (Token::TYPE_POINT === $ch && Token::TYPE_POINT === $nextCh) {
            $this->pos++;

            return true;
        }

        return false;
    }

    protected function scanWord()
    {
        $start = $this->pos;
        $this->pos++;

        while ($this->pos < \mb_strlen($this->source)) {
            $ch = $this->source[$this->pos];

            if ('_' === $ch || '$' === $ch || ($ch >= 'a' && $ch <= 'z') || ($ch >= 'A' && $ch <= 'Z') || ($ch >= '0' && $ch <= '9')) {

Coding Style: This line exceeds maximum limit of 120 characters; contains 135 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

                $this->pos++;

                continue;
            }

            break;
        }

        $value = \mb_substr($this->source, $start, $this->pos - $start);

        return new Token($this->getKeyword($value), $this->getLine(), $this->getColumn(), $value);
    }

    protected function getKeyword($name)

Documentation: The return type could not be reliably inferred; please add a @return annotation.

Our type inference engine is quite powerful, but sometimes the code does not provide enough clues to go by. In these cases we request you to add a @return annotation as described here.

    {
        switch ($name) {
            case 'null':
                return Token::TYPE_NULL;

            case 'true':
                return Token::TYPE_TRUE;

            case 'false':
                return Token::TYPE_FALSE;

            case 'query':
                return Token::TYPE_QUERY;

            case 'fragment':
                return Token::TYPE_FRAGMENT;

            case 'mutation':
                return Token::TYPE_MUTATION;

            case 'on':
                return Token::TYPE_ON;
        }

        return Token::TYPE_IDENTIFIER;
    }

    protected function expect($type)
    {
        if ($this->match($type)) {
            return $this->lex();
        }

        throw $this->createUnexpectedException($this->peek());
    }

    protected function match($type)

Coding Style: function match() does not seem to conform to the naming convention (^(?:is|has|should|may|supports)).

This check examines a number of code elements and verifies that they conform to the given naming conventions.

You can set conventions for local variables, abstract classes, utility classes, constants, properties, methods, parameters, interfaces, classes, exceptions and special methods.

    {
        return $this->peek()->getType() === $type;
    }

    protected function scanNumber()
    {
        $start = $this->pos;

        if ('-' === $this->source[$this->pos]) {
            $this->pos++;
        }

        $this->skipInteger();

        if (isset($this->source[$this->pos]) && '.' === $this->source[$this->pos]) {
            $this->pos++;
            $this->skipInteger();
        }

        $value = \mb_substr($this->source, $start, $this->pos - $start);

        if (false === \mb_strpos($value, '.')) {
            $value = (int) $value;
        } else {
            $value = (float) $value;
        }

        return new Token(Token::TYPE_NUMBER, $this->getLine(), $this->getColumn(), $value);
    }

    protected function skipInteger(): void
    {
        while ($this->pos < \mb_strlen($this->source)) {
            $ch = $this->source[$this->pos];

            if ($ch >= '0' && $ch <= '9') {
                $this->pos++;

                continue;
            }

            break;
        }
    }

    protected function createException($message)
    {
        return new SyntaxErrorException(\sprintf('%s', $message), $this->getLocation());
    }

    protected function getLocation()
    {
        return new Location($this->getLine(), $this->getColumn());
    }

    protected function getColumn()
    {
        return $this->pos - $this->lineStart;
    }

    protected function getLine()
    {
        return $this->line;
    }

    /*
        http://facebook.github.io/graphql/October2016/#sec-String-Value
     */
    protected function scanString()
    {
        $len = \mb_strlen($this->source);
        $this->pos++;

        $value = '';

        while ($this->pos < $len) {
            $ch = $this->source[$this->pos];

            if ('"' === $ch) {
                $token = new Token(Token::TYPE_STRING, $this->getLine(), $this->getColumn(), $value);
                $this->pos++;

                return $token;
            }

            if ('\\' === $ch && ($this->pos < ($len - 1))) {
                $this->pos++;
                $ch = $this->source[$this->pos];
                switch ($ch) {
                    case '"':
                    case '\\':
                    case '/':
                        break;
                    case 'b':
                        $ch = \sprintf('%c', 8);

                        break;
                    case 'f':
                        $ch = "\f";

                        break;
                    case 'n':
                        $ch = "\n";

                        break;
                    case 'r':
                        $ch = "\r";

                        break;
                    case 'u':
                        $codepoint = \mb_substr($this->source, $this->pos + 1, 4);

                        if (!\preg_match('/[0-9A-Fa-f]{4}/', $codepoint)) {
                            throw $this->createException(\sprintf('Invalid string unicode escape sequece "%s"', $codepoint));

Coding Style: This line exceeds maximum limit of 120 characters; contains 125 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

                        }
                        $ch = \html_entity_decode("&#x{$codepoint};", ENT_QUOTES, 'UTF-8');

Coding Style: Equals sign not aligned with surrounding assignments; expected 9 spaces but found 1 space

This check looks for multiple assignments in successive lines of code. It will report an issue if the operators are not in a straight line.

To visualize:

$a = "a";
$ab = "ab";
$abc = "abc";

will produce issues in the first and second line, while this second example

$a   = "a";
$ab  = "ab";
$abc = "abc";

will produce no issues.

Coding Style Best Practice: As per coding-style, please use concatenation or sprintf for the variable $codepoint instead of interpolation.

It is generally a best practice as it is often more readable to use concatenation instead of interpolation for variables inside strings.

// Instead of
$x = "foo $bar $baz";

// Better use either
$x = "foo " . $bar . " " . $baz;
$x = sprintf("foo %s %s", $bar, $baz);

                        $this->pos += 4;

                        break;
                    default:
                        throw $this->createException(\sprintf('Unexpected string escaped character "%s"', $ch));

                        break;

Unused Code: break; does not seem to be reachable.

This check looks for unreachable code. It uses sophisticated control flow analysis techniques to find statements which will never be executed.

Unreachable code is most often the result of return, die or exit statements that have been added for debug purposes.

function fx() {
    try {
        doSomething();
        return true;
    }
    catch (\Exception $e) {
        return false;
    }

    return false;
}

In the above example, the last return false will never be executed, because a return statement has already been met in every possible execution path.

                }
            }

            $value .= $ch;
            $this->pos++;
        }

        throw $this->createUnexpectedTokenTypeException(Token::TYPE_END);
    }

    protected function end()

Coding Style: function end() does not seem to conform to the naming convention (^(?:is|has|should|may|supports)).

This check examines a number of code elements and verifies that they conform to the given naming conventions.

You can set conventions for local variables, abstract classes, utility classes, constants, properties, methods, parameters, interfaces, classes, exceptions and special methods.

    {
        return Token::TYPE_END === $this->lookAhead->getType();
    }

    protected function peek()
    {
        return $this->lookAhead;
    }

    protected function lex()
    {
        $prev            = $this->lookAhead;
        $this->lookAhead = $this->next();

        return $prev;
    }

    protected function createUnexpectedException(Token $token)
    {
        return $this->createUnexpectedTokenTypeException($token->getType());
    }

    protected function createUnexpectedTokenTypeException($tokenType)
    {
        return $this->createException(\sprintf('Unexpected token "%s"', Token::tokenName($tokenType)));
    }
}
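
The line-length notes in the listing mostly point at long inline conditions. For the comment-skipping loop in skipWhitespace(), one possible shape is sketched below, with the terminator test moved into a helper. The names isLineTerminator() and skipComment() are hypothetical and are not part of the Tokenizer class. The sketch is also not behaviour-identical to the reviewed loop in two respects. It drops the 0x2028 and 0x2029 comparisons, because \ord() returns a value between 0 and 255 and so can never equal them. It also does not stop at a NUL byte, which the reviewed loop does because a zero \ord() result makes its condition false.

```php
<?php

/**
 * Hypothetical helper: true if the byte $ch ends a line ("\n" or "\r").
 */
function isLineTerminator(string $ch): bool
{
    return "\n" === $ch || "\r" === $ch;
}

/**
 * Hypothetical stand-alone version of the comment loop: given the offset of a
 * '#' in $source, returns the offset of the first byte after the comment body
 * (the line terminator itself, or the end of the input).
 */
function skipComment(string $source, int $pos): int
{
    $length = \strlen($source);
    $pos++; // step over '#'

    while ($pos < $length && !isLineTerminator($source[$pos])) {
        $pos++;
    }

    return $pos;
}

echo skipComment("# note\nquery", 0), "\n"; // 6
```

The returned offset points at the line terminator, so the existing "\r" and "\n" branches would still handle the line and column bookkeeping.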