Completed
Pull Request — master (#1184)
by Alexey
12:56
created

Lexer::lexMe()   B

Complexity

Conditions 8
Paths 32

Size

Total Lines 71
Code Lines 43

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 8
eloc 43
c 1
b 0
f 0
nc 32
nop 2
dl 0
loc 71
rs 7.9875

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
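The extraction described above can be sketched in PHP. This is a minimal illustration, not code from this pull request: it assumes the token-normalization loop at the start of lexMe() is the part worth extracting, and the function name normalizeTokens() and its return shape are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper extracted from the first loop of lexMe().
 *
 * Splits "lexeme:namespace" keys into [regex, namespace] buckets and reports
 * whether any target namespace uses the "__shift__" stack syntax.
 *
 * @param array<string, array<string, string>> $tokens Token table keyed by namespace.
 *
 * @return array{tokens: array, usesStack: bool}
 */
function normalizeTokens(array $tokens): array
{
    $normalized = [];
    $usesStack  = false;

    foreach ($tokens as $namespace => $lexemes) {
        $normalized[$namespace] = [];

        foreach ($lexemes as $fullLexeme => $regex) {
            $fullLexeme = (string) $fullLexeme;

            // No ":" means the lexeme does not switch namespaces.
            if (false === strpos($fullLexeme, ':')) {
                $normalized[$namespace][$fullLexeme] = [$regex, null];

                continue;
            }

            list($lexeme, $target) = explode(':', $fullLexeme, 2);

            // A "__shift__" target requires a namespace stack later on.
            $usesStack = $usesStack || '__shift__' === substr($target, 0, 9);

            $normalized[$namespace][$lexeme] = [$regex, $target];
        }
    }

    return ['tokens' => $normalized, 'usesStack' => $usesStack];
}

// Usage with a small, made-up token table.
$result = normalizeTokens([
    'default' => ['skip' => '\s+', 'quote_:string' => '"'],
    'string'  => ['_quote:__shift__' => '"'],
]);
```

With a helper like this, lexMe() would only need to store `$result['tokens']` and create the `\SplStack` when `$result['usesStack']` is true, which leaves the method body focused on the tokenizing loop.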

Commonly applied refactorings include:

<?php
/**
 * Hoa
 *
 *
 * @license
 *
 * BSD 3-Clause License
 *
 * Copyright © 2007-2017, Hoa community. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * 1. Redistributions of source code must retain the above copyright notice, this
 *    list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 *    this list of conditions and the following disclaimer in the documentation
 *    and/or other materials provided with the distribution.
 *
 * 3. Neither the name of the copyright holder nor the names of its
 *    contributors may be used to endorse or promote products derived from
 *    this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 */

namespace JMS\Serializer\Type\Compiler\Llk;

use JMS\Serializer\Type\Compiler;

/**
 * Class \JMS\Serializer\Type\Compiler\Llk\Lexer.
 *
 * Lexical analyser, i.e. split a string into a set of lexeme, i.e. tokens.
 *
 * @copyright  Copyright © 2007-2017 Hoa community
 * @license    New BSD License
 */
final class Lexer
{
    /**
     * Lexer state.
     *
     * @var array
     */
    protected $_lexerState  = null;

    /**
     * Text.
     *
     * @var string
     */
    protected $_text        = null;

    /**
     * Tokens.
     *
     * @var array
     */
    protected $_tokens      = [];

    /**
     * Namespace stacks.
     *
     * @var \SplStack
     */
    protected $_nsStack     = null;

    /**
     * PCRE options.
     *
     * @var string
     */
    protected $_pcreOptions = null;



    /**
     * Constructor.
     *
     * @param   array  $pragmas    Pragmas.
     */
    public function __construct(array $pragmas = [])
    {
        if (!isset($pragmas['lexer.unicode']) || true === $pragmas['lexer.unicode']) {
            $this->_pcreOptions .= 'u';
        }

        return;
    }

    /**
     * Text tokenizer: splits the text in parameter in an ordered array of
     * tokens.
     *
     * @param string  $text      Text to tokenize.
     * @param array[]   $tokens    Tokens to be returned.
     *
     * @return \Generator|array[]
     *
     * @throws \JMS\Serializer\Type\Compiler\Exception\UnrecognizedToken
     *
     * @psalm-return \Generator<int, array{token: string, value: string, length: int|false, namespace: array|string, keep: true, offset: int}>
     */
    public function lexMe($text, array $tokens): \Generator
    {
        $this->validateInputInUnicodeMode($text);

        $this->_text       = $text;
        $this->_tokens     = $tokens;
        $this->_nsStack    = null;
        $offset            = 0;
        $maxOffset         = strlen($this->_text);
        $this->_lexerState = 'default';
Documentation Bug
It seems like 'default' of type string is incompatible with the declared type array of property $_lexerState.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.
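
One possible resolution, sketched below, is to correct the documented type rather than the assignment. It assumes the property only ever holds a namespace name (as the assignments in lexMe() and nextToken() suggest) or null before lexing starts; the class LexerStateSketch and its methods are illustrative stand-ins, not part of the reviewed code.

```php
<?php
declare(strict_types=1);

// Illustrative stand-in for the Lexer class, reduced to the flagged property.
final class LexerStateSketch
{
    /**
     * Lexer state: name of the namespace currently being lexed.
     *
     * @var string|null
     */
    private $lexerState = null;

    public function start(): void
    {
        // The assigned string now agrees with the documented type.
        $this->lexerState = 'default';
    }

    public function state(): ?string
    {
        return $this->lexerState;
    }
}

$sketch = new LexerStateSketch();
$before = $sketch->state();
$sketch->start();
$after  = $sketch->state();
```

If the declared type array were in fact intended, the assignment itself would be the part to change; the annotation leaves both options open.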

        $stack             = false;

        foreach ($this->_tokens as &$tokens) {
            $_tokens = [];

            foreach ($tokens as $fullLexeme => $regex) {
                if (false === strpos($fullLexeme, ':')) {
                    $_tokens[$fullLexeme] = [$regex, null];

                    continue;
                }

                list($lexeme, $namespace) = explode(':', $fullLexeme, 2);

                $stack |= ('__shift__' === substr($namespace, 0, 9));

                unset($tokens[$fullLexeme]);
                $_tokens[$lexeme] = [$regex, $namespace];
            }

            $tokens = $_tokens;
        }

        if (true == $stack) {
            $this->_nsStack = new \SplStack();
        }

        while ($offset < $maxOffset) {
            $nextToken = $this->nextToken($offset);

            if (null === $nextToken) {
                throw new Compiler\Exception\UnrecognizedToken(
                    'Unrecognized token "%s" at line 1 and column %d:' .
                    "\n" . '%s' . "\n" .
                    str_repeat(' ', mb_strlen(substr($text, 0, $offset))) . '↑',
                    0,
                    [
                        mb_substr(substr($text, $offset), 0, 1),
                        $offset + 1,
                        $text
                    ],
                    1,
                    $offset
                );
            }

            if (true === $nextToken['keep']) {
                $nextToken['offset'] = $offset;
                yield $nextToken;
            }

            $offset += strlen($nextToken['value']);
        }

        yield [
            'token'     => 'EOF',
            'value'     => 'EOF',
            'length'    => 0,
            'namespace' => 'default',
            'keep'      => true,
            'offset'    => $offset
        ];
    }

    /**
     * Compute the next token recognized at the beginning of the string.
     *
     * @param int  $offset    Offset.
     *
     * @return (array|bool|int|string)[]|array[]|null
     *
     * @throws \JMS\Serializer\Type\Compiler\Exception\Lexer
     *
     * @psalm-return array{token: string, value: string, length: int|false, namespace: array, keep: bool}|null
     */
    protected function nextToken($offset)
    {
        $tokenArray = &$this->_tokens[$this->_lexerState];

        $previousNamespace = null;
        foreach ($tokenArray as $lexeme => $bucket) {
            list($regex, $nextState) = $bucket;

            if (null === $nextState) {
                $nextState = $this->_lexerState;
            }

            $out = $this->matchLexeme($lexeme, $regex, $offset);

            if (null !== $out) {
                $out['namespace'] = $this->_lexerState;
                $out['keep']      = 'skip' !== $lexeme;

                if ($nextState !== $this->_lexerState) {
                    $shift = false;

                    if (null !== $this->_nsStack &&
                        0 !== preg_match('#^__shift__(?:\s*\*\s*(\d+))?$#', $nextState, $matches)) {
Bug
$nextState of type array is incompatible with the type string expected by parameter $subject of preg_match(). (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                        0 !== preg_match('#^__shift__(?:\s*\*\s*(\d+))?$#', /** @scrutinizer ignore-type */ $nextState, $matches)) {

                        $i = isset($matches[1]) ? intval($matches[1]) : 1;

                        if ($i > ($c = count($this->_nsStack))) {
                            throw new Compiler\Exception\Lexer(
                                'Cannot shift namespace %d-times, from token ' .
                                '%s in namespace %s, because the stack ' .
                                'contains only %d namespaces.',
                                1,
                                [
                                    $i,
                                    $lexeme,
                                    $this->_lexerState,
                                    $c
                                ]
                            );
                        }

                        while (1 <= $i--) {
                            $previousNamespace = $this->_nsStack->pop();
                        }

                        $nextState = $previousNamespace;
                        $shift     = true;
                    }

                    if (!isset($this->_tokens[$nextState])) {
                        throw new Compiler\Exception\Lexer(
                            'Namespace %s does not exist, called by token %s ' .
                            'in namespace %s.',
                            2,
                            [
                                $nextState,
                                $lexeme,
                                $this->_lexerState
                            ]
                        );
                    }

                    if (null !== $this->_nsStack && false === $shift) {
                        $this->_nsStack[] = $this->_lexerState;
                    }

                    $this->_lexerState = $nextState;
                }

                return $out;
            }
        }

        return null;
    }

    /**
     * Check if a given lexeme is matched at the beginning of the text.
     *
     * @param string  $lexeme    Name of the lexeme.
     * @param string  $regex     Regular expression describing the lexeme.
     * @param int     $offset    Offset.
     *
     * @return (int|false|string)[]|null
     *
     * @throws \JMS\Serializer\Type\Compiler\Exception\Lexer
     *
     * @psalm-return array{token: string, value: string, length: int|false}|null
     */
    protected function matchLexeme($lexeme, $regex, $offset)
    {
        $_regex = str_replace('#', '\#', $regex);
        $preg   = @preg_match(
            '#\G(?|' . $_regex . ')#' . $this->_pcreOptions,
            $this->_text,
            $matches,
            0,
            $offset
        );

        if (0 === $preg) {
            return null;
        }

        if (false === $preg) {
            throw new Compiler\Exception\InternalError(
                'Lexer encountered a PCRE error on a lexeme "%s", full regex: "%s". Please report this issue to the maintainers.',
                preg_last_error(),
                [$lexeme, $_regex]
            );
        }

        if ('' === $matches[0]) {
            throw new Compiler\Exception\Lexer(
                'A lexeme must not match an empty value, which is the ' .
                'case of "%s" (%s).',
                3,
                [$lexeme, $regex]
            );
        }

        return [
            'token'  => $lexeme,
            'value'  => $matches[0],
            'length' => mb_strlen($matches[0])
        ];
    }

    /**
     * @param string $text
     * @return bool
     */
    private function validateInputInUnicodeMode($text)
    {
        if (strpos($this->_pcreOptions, 'u') !== false && preg_match('##u', $text) === false) {
            throw new Compiler\Exception\Lexer(
                'Text is not valid utf-8 string, you probably need to switch "lexer.unicode" setting off.'
            );
        }
    }
}