Completed
Push — master ( bd17ed...fa6e51 )
by Colin
02:18
created

Json5Decoder   D

Complexity

Total Complexity 114

Size/Duplication

Total Lines 619
Duplicated Lines 0 %

Coupling/Cohesion

Components 1
Dependencies 1

Test Coverage

Coverage 90.91%

Importance

Changes 7
Bugs 0 Features 0
Metric  Value
c       7
b       0
f       0
dl      0
loc     619
ccs     300
cts     330
cp      0.9091
rs      4.7957
wmc     114
lcom    1
cbo     1

21 Methods

Rating  Name                Duplication  Size  Complexity
A       __construct()       0            11    1
A       decode()            0            15    3
A       charAt()            0            8     3
B       next()              0            26    6
A       peek()              0            4     1
A       getLineRemainder()  0            11    2
A       match()             0            22    2
F       number()            0            79    21
A       identifier()        0            18    2
C       string()            0            45    11
A       inlineComment()     0            11    4
A       blockComment()      0            16    4
A       comment()           0            17    4
A       white()             0            12    4
B       word()              0            41    6
C       arr()               0            39    7
C       obj()               0            48    10
B       value()             0            19    9
A       throwSyntaxError()  0            4     1
A       renderChar()        0            4     2
B       getEscapee()        0            18    11

How to fix   Complexity   

Complex Class

Complex classes like Json5Decoder often do many different things. To break such a class down, identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefixes or suffixes. You can also check the cohesion graph to spot unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use Json5Decoder, and based on these observations, apply Extract Interface, too.
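As an illustration of Extract Class, the position-tracking fields and methods of Json5Decoder (the input text, the offset, the line and column counters, charAt(), next() and peek()) form one candidate component. The sketch below is only an assumption about how such an extraction might look: the class name `CharCursor` and its public methods are hypothetical and are not part of the colinodell/json5 package. It assumes the mbstring extension is loaded.

```php
<?php

// Hypothetical extraction of the cursor logic from Json5Decoder.
// The class name and its public API are illustrative assumptions.
final class CharCursor
{
    private $text;
    private $length;
    private $at = 0;
    private $line = 1;
    private $column = 1;

    public function __construct($text)
    {
        $this->text = $text;
        $this->length = mb_strlen($text, 'utf-8');
    }

    /** Character at the current position, or null at end of input. */
    public function current()
    {
        return $this->charAt($this->at);
    }

    /** Character after the current one, without consuming anything. */
    public function peek()
    {
        return $this->charAt($this->at + 1);
    }

    /** Advance one character, update line/column, return the new current character. */
    public function advance()
    {
        $ch = $this->current();
        // "\n" ends a line, and so does "\r" unless it is followed by "\n".
        if ($ch === "\n" || ($ch === "\r" && $this->peek() !== "\n")) {
            $this->line++;
            $this->column = 1;
        } else {
            $this->column++;
        }
        $this->at++;

        return $this->current();
    }

    public function line() { return $this->line; }

    public function column() { return $this->column; }

    private function charAt($at)
    {
        if ($at < 0 || $at >= $this->length) {
            return null;
        }

        return mb_substr($this->text, $at, 1, 'utf-8');
    }
}

$cursor = new CharCursor("a\nb");
$cursor->advance(); // moves from "a" onto the newline
$cursor->advance(); // moves onto "b", which starts line 2
printf("line %d, column %d\n", $cursor->line(), $cursor->column());
```

With the cursor in its own class, the decoder would keep only the grammar methods (value(), obj(), arr(), and so on) and delegate character access to the cursor, which would lower the complexity counted against the decoder itself.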

<?php

/*
 * This file is part of the colinodell/json5 package.
 *
 * (c) Colin O'Dell <[email protected]>
 *
 * Based on the official JSON5 implementation for JavaScript (https://github.com/json5/json5)
 *  - (c) 2012-2016 Aseem Kishore and others (https://github.com/json5/json5/contributors)
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace ColinODell\Json5;

final class Json5Decoder
{
    const REGEX_WHITESPACE = '/[ \t\r\n\v\f\xA0\x{FEFF}]/u';

    private $json;

    private $at = 0;

    private $lineNumber = 1;

    private $columnNumber = 1;

    private $ch;

    private $associative = false;

    private $maxDepth = 512;

    private $castBigIntToString = false;

    private $depth = 1;

    private $length;

    private $lineCache;

    /**
     * Private constructor.
     *
     * @param string $json
     * @param bool   $associative
     * @param int    $depth
     * @param bool   $castBigIntToString
     */
    private function __construct($json, $associative = false, $depth = 512, $castBigIntToString = false)
    {
        $this->json = $json;
        $this->associative = $associative;
        $this->maxDepth = $depth;
        $this->castBigIntToString = $castBigIntToString;

        $this->length = mb_strlen($json, 'utf-8');

        $this->ch = $this->charAt(0);
    }

    /**
     * Takes a JSON encoded string and converts it into a PHP variable.
     *
     * The parameters exactly match PHP's json_decode() function - see
     * http://php.net/manual/en/function.json-decode.php for more information.
     *
     * @param string $source      The JSON string being decoded.
     * @param bool   $associative When TRUE, returned objects will be converted into associative arrays.
     * @param int    $depth       User specified recursion depth.
     * @param int    $options     Bitmask of JSON decode options.
     *
     * @return mixed
     */
    public static function decode($source, $associative = false, $depth = 512, $options = 0)
    {
        $associative = $associative || ($options & JSON_OBJECT_AS_ARRAY);
        $castBigIntToString = $options & JSON_BIGINT_AS_STRING;

        $decoder = new self((string)$source, $associative, $depth, $castBigIntToString);
Documentation: $castBigIntToString is of type integer, but the function expects a boolean.

It seems like the type of the argument is not accepted by the function/method which you are calling.

In some cases, particularly when PHP’s automatic type juggling kicks in, this might be fine. In other cases, however, this might be a bug.

We suggest adding an explicit type cast, as in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend to use
acceptsInteger((integer) $x);
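
Applied to the call flagged above, the same advice could look like the following minimal sketch. `JSON_BIGINT_AS_STRING` and `JSON_OBJECT_AS_ARRAY` are PHP's built-in constants; the helper name `wantsBigIntAsString` is hypothetical and exists only for illustration.

```php
<?php

// Hypothetical helper: turn the result of a bitmask test into a real boolean.
function wantsBigIntAsString($options)
{
    // ($options & FLAG) yields an int (0 or the flag's value);
    // the (bool) cast converts it to false or true.
    return (bool) ($options & JSON_BIGINT_AS_STRING);
}

var_dump(wantsBigIntAsString(0));
var_dump(wantsBigIntAsString(JSON_BIGINT_AS_STRING | JSON_OBJECT_AS_ARRAY));
```

Casting at the point where the flag is read keeps the constructor's documented `bool` parameter honest without changing which inputs enable the option.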
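
        $result = $decoder->value();
        $decoder->white();
        // Compare against null explicitly: a trailing '0' character is falsy in PHP
        // and would otherwise slip through without a syntax error.
        if ($decoder->ch !== null) {
            $decoder->throwSyntaxError('Syntax error');
        }

        return $result;
    }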

    /**
     * @param int $at
     *
     * @return string|null
     */
    private function charAt($at)
    {
        if ($at < 0 || $at >= $this->length) {
            return null;
        }

        return mb_substr($this->json, $at, 1, 'utf-8');
    }

    /**
     * Parse the next character.
     *
     * If $c is given, the next char will only be parsed if the current
     * one matches $c.
     *
     * @param string|null $c
     *
     * @return null|string
     */
    private function next($c = null)
    {
        // If a c parameter is provided, verify that it matches the current character.
        if ($c !== null && $c !== $this->ch) {
            $this->throwSyntaxError(sprintf(
                'Expected %s instead of %s',
                self::renderChar($c),
                self::renderChar($this->ch)
            ));
        }

        // Get the next character. When there are no more characters,
        // charAt() returns null.
        if ($this->ch === "\n" || ($this->ch === "\r" && $this->peek() !== "\n")) {
            $this->at++;
            $this->lineNumber++;
            $this->columnNumber = 1;
        } else {
            $this->at++;
            $this->columnNumber++;
        }

        $this->ch = $this->charAt($this->at);

        return $this->ch;
    }

    /**
     * Get the next character without consuming it or
     * assigning it to the ch variable.
     *
     * @return string|null
     */
    private function peek()
    {
        return $this->charAt($this->at + 1);
    }

    /**
     * @return string
     */
    private function getLineRemainder()
    {
        // Lines are separated by "\n", "\r\n", or a lone "\r"
        if ($this->lineCache === null) {
            $this->lineCache = preg_split('/\n|\r\n?/u', $this->json);
        }

        $line = $this->lineCache[$this->lineNumber - 1];

        return mb_substr($line, $this->columnNumber - 1);
    }

    /**
     * Attempt to match a regular expression at the current position on the current line.
     *
     * This function will not match across multiple lines.
     *
     * @param string $regex
     *
     * @return string|null
     */
    private function match($regex)
    {
        $subject = $this->getLineRemainder();

        $matches = [];
        if (!preg_match($regex, $subject, $matches, PREG_OFFSET_CAPTURE)) {
            return null;
        }

        // PREG_OFFSET_CAPTURE always returns the byte offset, not the char offset, which is annoying
        $offset = mb_strlen(mb_strcut($subject, 0, $matches[0][1], 'utf-8'), 'utf-8');

        // [0][0] contains the matched text
Unused Code, Comprehensibility: 40% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

        // [0][1] contains the index of that match
        $advanceBy = $offset + mb_strlen($matches[0][0], 'utf-8');

        $this->at += $advanceBy;
        $this->columnNumber += $advanceBy;
        $this->ch = $this->charAt($this->at);

        return $matches[0][0];
    }

    /**
     * Parse an identifier.
     *
     * Normally, reserved words are disallowed here, but we
     * only use this for unquoted object keys, where reserved words are allowed,
     * so we don't check for those here. References:
     * - http://es5.github.com/#x7.6
     * - https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Core_Language_Features#Variables
     * - http://docstore.mik.ua/orelly/webprog/jscript/ch02_07.htm
     */
    private function identifier()
    {
        // @codingStandardsIgnoreStart
        // The second character class ends with ZWNJ (U+200C) and ZWJ (U+200D),
        // written as \x{...} escapes so that they stay visible when editing.
        $match = $this->match('/^(?:[\$_\p{L}\p{Nl}]|\\\\u[0-9A-Fa-f]{4})(?:[\$_\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\x{200C}\x{200D}]|\\\\u[0-9A-Fa-f]{4})*/u');
        // @codingStandardsIgnoreEnd

        if ($match === null) {
            $this->throwSyntaxError('Bad identifier as unquoted key');
        }

        // Un-escape escaped Unicode chars
        $unescaped = preg_replace_callback('/(?:\\\\u[0-9A-Fa-f]{4})+/', function ($m) {
            return json_decode('"'.$m[0].'"');
        }, $match);

        return $unescaped;
    }

    /**
     * Parse a number: an optional sign followed by Infinity, NaN,
     * a hexadecimal literal, or a decimal literal.
     */
    private function number()
    {
        $number = null;
        $sign = '';
        $string = '';
        $base = 10;

        if ($this->ch === '-' || $this->ch === '+') {
            $sign = $this->ch;
            $this->next($this->ch);
        }

        // support for Infinity
        if ($this->ch === 'I') {
            $number = $this->word();
            if ($number === null) {
                $this->throwSyntaxError('Unexpected word for number');
            }

            return ($sign === '-') ? -INF : INF;
        }

        // support for NaN
        if ($this->ch === 'N') {
            $number = $this->word();
            // NAN never compares equal to itself, so test with is_nan()
            if (!is_nan($number)) {
                $this->throwSyntaxError('expected word to be NaN');
            }

            // ignore sign as -NaN also is NaN
            return $number;
        }

        if ($this->ch === '0') {
            $string .= $this->ch;
            $this->next();
            if ($this->ch === 'x' || $this->ch === 'X') {
                $string .= $this->ch;
                $this->next();
                $base = 16;
            } elseif (is_numeric($this->ch)) {
                $this->throwSyntaxError('Octal literal');
            }
        }

        switch ($base) {
            case 10:
                if (($match = $this->match('/^\d*\.?\d*/')) !== null) {
                    $string .= $match;
                }
                if (($match = $this->match('/^[Ee][-+]?\d*/')) !== null) {
                    $string .= $match;
                }
                $number = $string;
                break;
            case 16:
                if (($match = $this->match('/^[A-Fa-f0-9]+/')) !== null) {
                    $string .= $match;
                    $number = hexdec($string);
                    break;
                }
                $this->throwSyntaxError('Bad hex number');
        }

        if ($sign === '-') {
            $number = -$number;
        }

        if (!is_numeric($number) || !is_finite($number)) {
            $this->throwSyntaxError('Bad number');
        }

        if ($this->castBigIntToString) {
            return $number;
        }

        // Adding 0 will automatically cast this to an int or float
        return $number + 0;
    }

    private function string()
    {
        if (!($this->ch === '"' || $this->ch === "'")) {
            $this->throwSyntaxError('Bad string');
        }

        $string = '';

        $delim = $this->ch;
        $this->next();
        while ($this->ch !== null) {
            if ($this->ch === $delim) {
                $this->next();

                return $string;
            } elseif ($this->ch === '\\') {
                if ($unicodeEscaped = $this->match('/^(?:\\\\u[A-Fa-f0-9]{4})+/')) {
                    $string .= json_decode('"'.$unicodeEscaped.'"');
                    continue;
                }

                $this->next();
                if ($this->ch === "\r") {
                    if ($this->peek() === "\n") {
                        $this->next();
                    }
                } elseif (($escapee = self::getEscapee($this->ch)) !== null) {
                    $string .= $escapee;
                } else {
                    break;
                }
            } elseif ($this->ch === "\n") {
                // unescaped newlines are invalid; see:
                // https://github.com/json5/json5/issues/24
                // @todo this feels special-cased; are there other invalid unescaped chars?
                break;
            } else {
                $string .= $this->ch;
            }

            $this->next();
        }

        $this->throwSyntaxError('Bad string');
    }

    /**
     * Skip an inline comment, assuming this is one.
     *
     * The current character should be the second / character in the // pair that begins this inline comment.
     * To finish the inline comment, we look for a newline or the end of the text.
     */
    private function inlineComment()
    {
        do {
            $this->next();
            if ($this->ch === "\n" || $this->ch === "\r") {
                $this->next();

                return;
            }
        } while ($this->ch !== null);
    }

    /**
     * Skip a block comment, assuming this is one.
     *
     * The current character should be the * character in the /* pair that begins this block comment.
     * To finish the block comment, we look for an ending "*" "/" pair of characters,
     * but we also watch for the end of text before the comment is terminated.
     */
    private function blockComment()
    {
        do {
            $this->next();
            while ($this->ch === '*') {
                $this->next('*');
                if ($this->ch === '/') {
                    $this->next('/');

                    return;
                }
            }
        } while ($this->ch !== null);

        $this->throwSyntaxError('Unterminated block comment');
    }

    /**
     * Skip a comment, whether inline or block-level, assuming this is one.
     */
    private function comment()
    {
        // Comments always begin with a / character.
        if ($this->ch !== '/') {
            $this->throwSyntaxError('Not a comment');
        }

        $this->next('/');

        if ($this->ch === '/') {
            $this->inlineComment();
        } elseif ($this->ch === '*') {
            $this->blockComment();
        } else {
            $this->throwSyntaxError('Unrecognized comment');
        }
    }

    /**
     * Skip whitespace and comments.
     *
     * Note that we're detecting comments by only a single / character.
     * This works since regular expressions are not valid JSON(5), but this will
     * break if there are other valid values that begin with a / character!
     */
    private function white()
    {
        while ($this->ch !== null) {
            if ($this->ch === '/') {
                $this->comment();
            } elseif (preg_match(self::REGEX_WHITESPACE, $this->ch) === 1) {
                $this->next();
            } else {
                return;
            }
        }
    }

    /**
     * Matches the literal words true, false, null, Infinity, and NaN.
     */
    private function word()
    {
        switch ($this->ch) {
            case 't':
                $this->next('t');
                $this->next('r');
                $this->next('u');
                $this->next('e');
                return true;
            case 'f':
                $this->next('f');
                $this->next('a');
                $this->next('l');
                $this->next('s');
                $this->next('e');
                return false;
            case 'n':
                $this->next('n');
                $this->next('u');
                $this->next('l');
                $this->next('l');
                return null;
            case 'I':
                $this->next('I');
                $this->next('n');
                $this->next('f');
                $this->next('i');
                $this->next('n');
                $this->next('i');
                $this->next('t');
                $this->next('y');
                return INF;
            case 'N':
                $this->next('N');
                $this->next('a');
                $this->next('N');
                return NAN;
        }

        $this->throwSyntaxError('Unexpected ' . self::renderChar($this->ch));
    }

    private function arr()
    {
        $arr = [];

        if ($this->ch === '[') {
            if (++$this->depth > $this->maxDepth) {
                $this->throwSyntaxError('Maximum stack depth exceeded');
            }

            $this->next('[');
            $this->white();
            while ($this->ch !== null) {
                if ($this->ch === ']') {
                    $this->next(']');
                    $this->depth--;
                    return $arr; // Potentially empty array
                }
                // ES5 allows omitting elements in arrays, e.g. [,] and
                // [,null]. We don't allow this in JSON5.
                if ($this->ch === ',') {
                    $this->throwSyntaxError('Missing array element');
                } else {
                    $arr[] = $this->value();
                }
                $this->white();
                // If there's no comma after this value, this needs to
                // be the end of the array.
                if ($this->ch !== ',') {
                    $this->next(']');
                    $this->depth--;
                    return $arr;
                }
                $this->next(',');
                $this->white();
            }
        }

        $this->throwSyntaxError('Bad array');
    }

    /**
     * Parse an object value
     */
    private function obj()
    {
        $object = $this->associative ? [] : new \stdClass;

        if ($this->ch === '{') {
            if (++$this->depth > $this->maxDepth) {
                $this->throwSyntaxError('Maximum stack depth exceeded');
            }

            $this->next('{');
            $this->white();
            while ($this->ch) {
                if ($this->ch === '}') {
                    $this->next('}');
                    $this->depth--;
                    return $object; // Potentially empty object
                }

                // Keys can be unquoted. If they are, they need to be
                // valid JS identifiers.
                if ($this->ch === '"' || $this->ch === "'") {
                    $key = $this->string();
                } else {
                    $key = $this->identifier();
                }

                $this->white();
                $this->next(':');
                if ($this->associative) {
                    $object[$key] = $this->value();
                } else {
                    $object->{$key} = $this->value();
                }
                $this->white();
                // If there's no comma after this pair, this needs to be
                // the end of the object.
                if ($this->ch !== ',') {
                    $this->next('}');
                    $this->depth--;
                    return $object;
                }
                $this->next(',');
                $this->white();
            }
        }

        $this->throwSyntaxError('Bad object');
    }

    /**
     * Parse a JSON value.
     *
     * It could be an object, an array, a string, a number,
     * or a word.
     */
    private function value()
    {
        $this->white();
        switch ($this->ch) {
            case '{':
                return $this->obj();
            case '[':
                return $this->arr();
            case '"':
            case "'":
                return $this->string();
            case '-':
            case '+':
            case '.':
                return $this->number();
            default:
                return is_numeric($this->ch) ? $this->number() : $this->word();
        }
    }

    private function throwSyntaxError($message)
    {
        throw new SyntaxError($message, $this->lineNumber, $this->columnNumber);
    }

    private static function renderChar($chr)
    {
        return $chr === null ? 'EOF' : "'" . $chr . "'";
    }

    /**
     * @param string $ch
     *
     * @return string|null
     */
    private static function getEscapee($ch)
    {
        switch ($ch) {
            // @codingStandardsIgnoreStart
            case "'":  return "'";
            case '"':  return '"';
            case '\\': return '\\';
            case '/':  return '/';
            case "\n": return '';
            case 'b':  return chr(8);
            case 'f':  return "\f";
            case 'n':  return "\n";
            case 'r':  return "\r";
            case 't':  return "\t";
            default:   return null;
            // @codingStandardsIgnoreEnd
        }
    }
}
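
The byte-to-character offset conversion that match() performs after `preg_match()` can be tried in isolation. The following is a minimal sketch, assuming the mbstring extension is loaded and the source file is saved as UTF-8; the function name `charOffsetOfMatch` is hypothetical and is not part of the class above.

```php
<?php

// Hypothetical helper illustrating the offset conversion done inside match().
function charOffsetOfMatch($regex, $subject)
{
    $matches = [];
    if (!preg_match($regex, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        return null;
    }

    // $matches[0][1] is a byte offset. Cut that many bytes off the front of
    // the subject, then count the characters in the resulting prefix.
    $prefix = mb_strcut($subject, 0, $matches[0][1], 'utf-8');

    return mb_strlen($prefix, 'utf-8');
}

$subject = 'héllo wörld';
var_dump(strpos($subject, 'wörld'));               // offset counted in bytes
var_dump(charOffsetOfMatch('/wörld/u', $subject)); // offset counted in characters
```

The two values differ whenever a multibyte character precedes the match, which is why the decoder converts the offset before advancing its character-based position and column counters.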