Completed
Push — master ( 9eb60a...7a0a03 )
by Colin

Cursor   F

Complexity

Total Complexity 62

Size/Duplication

Total Lines 479
Duplicated Lines 0 %

Coupling/Cohesion

Components 1
Dependencies 0

Test Coverage

Coverage 94.12%

Importance

Changes 0
Metric Value
wmc 62
lcom 1
cbo 0
dl 0
loc 479
ccs 160
cts 170
cp 0.9412
rs 3.44
c 0
b 0
f 0

24 Methods

Rating  Name                              Duplication  Size  Complexity
A       __construct()                     0            7     2
B       getNextNonSpacePosition()         0            26    6
A       getNextNonSpaceCharacter()        0            4     1
A       getIndent()                       0            8     2
A       isIndented()                      0            4     1
B       getCharacter()                    0            21    6
A       peek()                            0            4     1
A       isBlank()                         0            4     2
A       advance()                         0            4     1
A       advanceBySpaceOrTab()             0            12    3
A       advanceToNextNonSpaceOrTab()      0            8     1
A       advanceToNextNonSpaceOrNewline()  0            22    4
A       getLine()                         0            4     1
A       isAtEnd()                         0            4     1
A       match()                           0            26    3
A       saveState()                       0            11    1
A       restoreState()                    0            11    1
A       getPosition()                     0            4     1
A       getPreviousText()                 0            4     1
A       getSubstring()                    0            10    3
A       getColumn()                       0            4     1
C       advanceBy()                       0            62    14
A       advanceToEnd()                    0            9     1
A       getRemainder()                    0            20    4

How to fix   Complexity   

Complex Class

Complex classes like Cursor often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix (in Cursor, for example, the advance*() methods). You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
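As a rough illustration of Extract Class, the sketch below pulls out the tab-stop arithmetic that Cursor repeats inline as `4 - ($column % 4)` in getNextNonSpacePosition(), advanceBy() and getRemainder(). The class name `TabStops` and both of its methods are hypothetical; they are not part of league/commonmark, and this is only one possible place to start, not a recommended final design.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper extracted from Cursor; not part of league/commonmark.
 * It owns the tab-stop arithmetic that Cursor currently repeats inline.
 */
final class TabStops
{
    /** Tab stops are assumed to sit every 4 columns, as in Cursor. */
    public const WIDTH = 4;

    /**
     * Columns needed to move from $column to the next tab stop (1 to WIDTH).
     */
    public static function columnsToNext(int $column): int
    {
        return self::WIDTH - ($column % self::WIDTH);
    }

    /**
     * Column reached after consuming one tab character at $column.
     */
    public static function columnAfterTab(int $column): int
    {
        return $column + self::columnsToNext($column);
    }
}
```

With a helper like this, a line such as `$charsToTab = 4 - ($this->column % 4);` could become `$charsToTab = TabStops::columnsToNext($this->column);`, which removes the repeated literal without changing behaviour.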

While breaking up the class, it is a good idea to analyze how other classes use Cursor, and based on these observations, apply Extract Interface, too.
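A minimal sketch of what Extract Interface might look like here, assuming some callers only inspect the cursor and never move it. The interface name `ReadableCursor`, the choice of methods and the `startsWithHash()` function are all assumptions made for illustration; only the method signatures are copied from Cursor.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical read-only view of a cursor. The name and the selection of
 * methods are assumptions; the signatures match those declared on Cursor.
 */
interface ReadableCursor
{
    public function getCharacter(?int $index = null): ?string;

    public function peek(int $offset = 1): ?string;

    public function getPosition(): int;

    public function isAtEnd(): bool;
}

/**
 * Hypothetical caller that only inspects the cursor, so it can depend on
 * the narrow interface instead of the full Cursor class.
 */
function startsWithHash(ReadableCursor $cursor): bool
{
    return $cursor->getCharacter() === '#';
}
```

Because the signatures are unchanged, Cursor could declare `implements ReadableCursor` without modifying any method bodies, and callers that need to advance the cursor would keep depending on the concrete class.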

<?php

declare(strict_types=1);

/*
 * This file is part of the league/commonmark package.
 *
 * (c) Colin O'Dell <[email protected]>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

namespace League\CommonMark\Parser;

class Cursor
{
    public const INDENT_LEVEL = 4;

    /**
     * @var string
     */
    private $line;

    /**
     * @var int
     */
    private $length;

    /**
     * @var int
     *
     * It's possible for this to be 1 char past the end, meaning we've parsed all chars and have
     * reached the end.  In this state, any character-returning method MUST return null.
     */
    private $currentPosition = 0;

    /**
     * @var int
     */
    private $column = 0;

    /**
     * @var int
     */
    private $indent = 0;

    /**
     * @var int
     */
    private $previousPosition = 0;

    /**
     * @var int|null
     */
    private $nextNonSpaceCache;

    /**
     * @var bool
     */
    private $partiallyConsumedTab = false;

    /**
     * @var bool
     */
    private $lineContainsTabs;

    /**
     * @var bool
     */
    private $isMultibyte;

    /**
     * @var array<int, string>
     */
    private $charCache = [];

    /**
     * @param string $line The line being parsed (ASCII or UTF-8)
     */
    public function __construct(string $line)
    {
        $this->line = $line;
        $this->length = \mb_strlen($line, 'UTF-8') ?: 0;
        $this->isMultibyte = $this->length !== \strlen($line);
        $this->lineContainsTabs = false !== \strpos($line, "\t");
    }

    /**
     * Returns the position of the next character which is not a space (or tab)
     *
     * @return int
     */
    public function getNextNonSpacePosition(): int
    {
        if ($this->nextNonSpaceCache !== null) {
            return $this->nextNonSpaceCache;
        }

        $i = $this->currentPosition;
        $cols = $this->column;

        while (($c = $this->getCharacter($i)) !== null) {
            if ($c === ' ') {
                $i++;
                $cols++;
            } elseif ($c === "\t") {
                $i++;
                $cols += (4 - ($cols % 4));
            } else {
                break;
            }
        }

        $nextNonSpace = ($c === null) ? $this->length : $i;
        $this->indent = $cols - $this->column;

        return $this->nextNonSpaceCache = $nextNonSpace;
    }

    /**
     * Returns the next character which isn't a space (or tab)
     *
     * @return string|null
     */
    public function getNextNonSpaceCharacter(): ?string
    {
        return $this->getCharacter($this->getNextNonSpacePosition());
    }

    /**
     * Calculates the current indent (number of spaces after current position)
     *
     * @return int
     */
    public function getIndent(): int
    {
        if ($this->nextNonSpaceCache === null) {
            $this->getNextNonSpacePosition();
        }

        return $this->indent;
    }

    /**
     * Whether the cursor is indented to INDENT_LEVEL
     *
     * @return bool
     */
    public function isIndented(): bool
    {
        return $this->getIndent() >= self::INDENT_LEVEL;
    }

    /**
     * @param int|null $index
     *
     * @return string|null
     */
    public function getCharacter(?int $index = null): ?string
    {
        if ($index === null) {
            $index = $this->currentPosition;
        }

        // Index out-of-bounds, or we're at the end
        if ($index < 0 || $index >= $this->length) {
            return null;
        }

        if ($this->isMultibyte) {
            if (isset($this->charCache[$index])) {
                return $this->charCache[$index];
            }

            return $this->charCache[$index] = \mb_substr($this->line, $index, 1, 'UTF-8');
        }

        return $this->line[$index];
    }

    /**
     * Returns the next character (or null, if none) without advancing forwards
     *
     * @param int $offset
     *
     * @return string|null
     */
    public function peek(int $offset = 1): ?string
    {
        return $this->getCharacter($this->currentPosition + $offset);
    }

    /**
     * Whether the remainder is blank
     *
     * @return bool
     */
    public function isBlank(): bool
    {
        return $this->nextNonSpaceCache === $this->length || $this->getNextNonSpacePosition() === $this->length;
    }

    /**
     * Move the cursor forwards
     */
    public function advance(): void
    {
        $this->advanceBy(1);
    }

    /**
     * Move the cursor forwards
     *
     * @param int  $characters       Number of characters to advance by
     * @param bool $advanceByColumns Whether to advance by columns instead of spaces
     *
     * @return void
     */
    public function advanceBy(int $characters, bool $advanceByColumns = false): void
    {
        if ($characters === 0) {
            $this->previousPosition = $this->currentPosition;

            return;
        }

        $this->previousPosition = $this->currentPosition;
        $this->nextNonSpaceCache = null;

        // Optimization to avoid tab handling logic if we have no tabs
        if (!$this->lineContainsTabs || false === \strpos(
            $nextFewChars = $this->isMultibyte ?
                \mb_substr($this->line, $this->currentPosition, $characters, 'UTF-8') :
                \substr($this->line, $this->currentPosition, $characters),
            "\t"
        )) {
            $length = \min($characters, $this->length - $this->currentPosition);
            $this->partiallyConsumedTab = false;
            $this->currentPosition += $length;
            $this->column += $length;

            return;
        }

        if ($characters === 1 && !empty($nextFewChars)) {
            $asArray = [$nextFewChars];
        } elseif ($this->isMultibyte) {
            /** @var string[] $asArray */
            $asArray = \preg_split('//u', $nextFewChars, -1, \PREG_SPLIT_NO_EMPTY);
        } else {
            $asArray = \str_split($nextFewChars);
        }

        foreach ($asArray as $relPos => $c) {
            if ($c === "\t") {
                $charsToTab = 4 - ($this->column % 4);
                if ($advanceByColumns) {
                    $this->partiallyConsumedTab = $charsToTab > $characters;
                    $charsToAdvance = $charsToTab > $characters ? $characters : $charsToTab;
                    $this->column += $charsToAdvance;
                    $this->currentPosition += $this->partiallyConsumedTab ? 0 : 1;
                    $characters -= $charsToAdvance;
                } else {
                    $this->partiallyConsumedTab = false;
                    $this->column += $charsToTab;
                    $this->currentPosition++;
                    $characters--;
                }
            } else {
                $this->partiallyConsumedTab = false;
                $this->currentPosition++;
                $this->column++;
                $characters--;
            }

            if ($characters <= 0) {
                break;
            }
        }
    }

    /**
     * Advances the cursor by a single space or tab, if present
     *
     * @return bool
     */
    public function advanceBySpaceOrTab(): bool
    {
        $character = $this->getCharacter();

        if ($character === ' ' || $character === "\t") {
            $this->advanceBy(1, true);

            return true;
        }

        return false;
    }

    /**
     * Parse zero or more space/tab characters
     *
     * @return int Number of positions moved
     */
    public function advanceToNextNonSpaceOrTab(): int
    {
        $newPosition = $this->getNextNonSpacePosition();
        $this->advanceBy($newPosition - $this->currentPosition);
        $this->partiallyConsumedTab = false;

        return $this->currentPosition - $this->previousPosition;
    }

    /**
     * Parse zero or more space characters, including at most one newline.
     *
     * Tab characters are not parsed with this function.
     *
     * @return int Number of positions moved
     */
    public function advanceToNextNonSpaceOrNewline(): int
    {
        $remainder = $this->getRemainder();

        // Optimization: Avoid the regex if we know there are no spaces or newlines
        if (empty($remainder) || ($remainder[0] !== ' ' && $remainder[0] !== "\n")) {
            $this->previousPosition = $this->currentPosition;

            return 0;
        }

        $matches = [];
        \preg_match('/^ *(?:\n *)?/', $remainder, $matches, \PREG_OFFSET_CAPTURE);

        // [0][0] contains the matched text
        // [0][1] contains the index of that match
        $increment = $matches[0][1] + \strlen($matches[0][0]);

        $this->advanceBy($increment);

        return $this->currentPosition - $this->previousPosition;
    }

    /**
     * Move the position to the very end of the line
     *
     * @return int The number of characters moved
     */
    public function advanceToEnd(): int
    {
        $this->previousPosition = $this->currentPosition;
        $this->nextNonSpaceCache = null;

        $this->currentPosition = $this->length;

        return $this->currentPosition - $this->previousPosition;
    }

    public function getRemainder(): string
    {
        if ($this->currentPosition >= $this->length) {
            return '';
        }

        $prefix = '';
        $position = $this->currentPosition;
        if ($this->partiallyConsumedTab) {
            $position++;
            $charsToTab = 4 - ($this->column % 4);
            $prefix = \str_repeat(' ', $charsToTab);
        }

        $subString = $this->isMultibyte ?
            \mb_substr($this->line, $position, null, 'UTF-8') :
            \substr($this->line, $position);

        return $prefix . $subString;
    }

    public function getLine(): string
    {
        return $this->line;
    }

    public function isAtEnd(): bool
    {
        return $this->currentPosition >= $this->length;
    }

    /**
     * Try to match a regular expression
     *
     * Returns the matching text and advances to the end of that match
     *
     * @param string $regex
     *
     * @return string|null
     */
    public function match(string $regex): ?string
    {
        $subject = $this->getRemainder();

        if (!\preg_match($regex, $subject, $matches, \PREG_OFFSET_CAPTURE)) {
            return null;
        }

        // $matches[0][0] contains the matched text
        // $matches[0][1] contains the index of that match

        if ($this->isMultibyte) {
            // PREG_OFFSET_CAPTURE always returns the byte offset, not the char offset, which is annoying
            $offset = \mb_strlen(\substr($subject, 0, $matches[0][1]), 'UTF-8');
            $matchLength = \mb_strlen($matches[0][0], 'UTF-8');
        } else {
            $offset = $matches[0][1];
            $matchLength = \strlen($matches[0][0]);
        }

        $this->advanceBy($offset + $matchLength);

        return $matches[0][0];
    }

    /**
     * Encapsulates the current state of this cursor in case you need to rollback later.
     *
     * WARNING: Do not parse or use the return value for ANYTHING except for
     * passing it back into restoreState(), as the number of values and their
     * contents may change in any future release without warning.
     *
     * @return array<mixed>
     */
    public function saveState()
    {
        return [
            $this->currentPosition,
            $this->previousPosition,
            $this->nextNonSpaceCache,
            $this->indent,
            $this->column,
            $this->partiallyConsumedTab,
        ];
    }

    /**
     * Restore the cursor to a previous state.
     *
     * Pass in the value previously obtained by calling saveState().
     *
     * @param array<mixed> $state
     *
     * @return void
     */
    public function restoreState($state): void
    {
        list(
            $this->currentPosition,
            $this->previousPosition,
            $this->nextNonSpaceCache,
            $this->indent,
            $this->column,
            $this->partiallyConsumedTab,
        ) = $state;
    }

    public function getPosition(): int
    {
        return $this->currentPosition;
    }

    public function getPreviousText(): string
    {
        return \mb_substr($this->line, $this->previousPosition, $this->currentPosition - $this->previousPosition, 'UTF-8');
    }

    public function getSubstring(int $start, ?int $length = null): string
    {
        if ($this->isMultibyte) {
            return \mb_substr($this->line, $start, $length, 'UTF-8');
        } elseif ($length !== null) {
            return \substr($this->line, $start, $length);
        }

        return \substr($this->line, $start);
    }

    public function getColumn(): int
    {
        return $this->column;
    }
}