Completed
Pull Request — master (#19)
by Roman
05:49
created

Tokenizer::tokenize()   D

Complexity

Conditions 23
Paths 1

Size

Total Lines 119
Code Lines 69

Duplication

Lines 16
Ratio 13.45 %

Importance

Changes 1
Bugs 1 Features 0
Metric Value
cc 23
eloc 69
nc 1
nop 1
dl 16
loc 119
rs 4.6303
c 1
b 1
f 0

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
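As a minimal sketch of that advice, the commented steps of one long function can each become a small, named function. The function names below are hypothetical and are not taken from the reviewed project.

```php
<?php

// Before: one function whose body is divided by comments.
function reportBefore(array $lines): string
{
    // Drop blank lines
    $kept = [];
    foreach ($lines as $line) {
        if (trim($line) !== '') {
            $kept[] = $line;
        }
    }

    // Number the remaining lines
    $out = [];
    foreach ($kept as $i => $line) {
        $out[] = ($i + 1) . ': ' . $line;
    }

    return implode("\n", $out);
}

// After: each comment became the name of an extracted function.
function dropBlankLines(array $lines): array
{
    // array_values() reindexes so numbering starts at 0 again.
    return array_values(array_filter($lines, function ($line) {
        return trim($line) !== '';
    }));
}

function numberLines(array $lines): array
{
    $out = [];
    foreach ($lines as $i => $line) {
        $out[] = ($i + 1) . ': ' . $line;
    }
    return $out;
}

function reportAfter(array $lines): string
{
    return implode("\n", numberLines(dropBlankLines($lines)));
}
```

Both versions return the same string for the same input; the second one reads as a sentence and lets each step be tested on its own.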

Commonly applied refactorings include:

<?php

namespace PeacefulBit\Slate\Parser;

use function Nerd\Common\Arrays\append;
use function Nerd\Common\Arrays\toHeadTail;
use function Nerd\Common\Functional\tail;
use function Nerd\Common\Strings\toArray;

use PeacefulBit\Slate\Exceptions\TokenizerException;
use PeacefulBit\Slate\Parser\Tokens;

class Tokenizer
{
    const TOKEN_OPEN_BRACKET    = '(';
    const TOKEN_CLOSE_BRACKET   = ')';
    const TOKEN_DOUBLE_QUOTE    = '"';
    const TOKEN_BACK_SLASH      = '\\';
    const TOKEN_SEMICOLON       = ';';

    const TOKEN_TAB             = "\t";
    const TOKEN_SPACE           = " ";
    const TOKEN_NEW_LINE        = "\n";
    const TOKEN_CARRIAGE_RETURN = "\r";

    const CHAR_DIGITS           = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    const CHAR_DOT              = '.';

    /**
     * @param $char
     * @return bool
     */
    private function isStructural($char)
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
    {
        return in_array($char, [
            self::TOKEN_OPEN_BRACKET,
            self::TOKEN_CLOSE_BRACKET,
            self::TOKEN_DOUBLE_QUOTE,
            self::TOKEN_SEMICOLON
        ]);
    }

    /**
     * @param $char
     * @return bool
     */
    private function isDelimiter($char)
Duplication: This method seems to be duplicated in your project.
    {
        return in_array($char, [
            self::TOKEN_TAB,
            self::TOKEN_CARRIAGE_RETURN,
            self::TOKEN_NEW_LINE,
            self::TOKEN_SPACE
        ]);
    }

    /**
     * @param $char
     * @return bool
     */
    private function isSymbol($char)
    {
        return !$this->isDelimiter($char) && !$this->isStructural($char);
    }

    /**
     * @param $char
     * @return bool
     */
    private function isNumber($char)
Unused Code: This method is not used, and could be removed.
    {
        return in_array($char, self::CHAR_DIGITS);
    }

    /**
     * Convert source code to array of tokens.
     *
     * @param string $code
     * @return Token[]
     */
    public function tokenize($code)
    {
        // Initial state of parser
        $baseIter = tail(function ($rest, $acc) use (&$baseIter, &$symbolIter, &$stringIter, &$commentIter) {
            if (sizeof($rest) == 0) {
                return $acc;
            }
            list ($head, $tail) = toHeadTail($rest);
            switch ($head) {
                // We got '(', so we just add it to the accumulator.
                case self::TOKEN_OPEN_BRACKET:
                    return $baseIter($tail, append($acc, new Tokens\OpenBracketToken));
                // We got ')', so we do the same as in the previous case.
                case self::TOKEN_CLOSE_BRACKET:
                    return $baseIter($tail, append($acc, new Tokens\CloseBracketToken));
                // We got '"'. That means we are at the beginning of a string,
                // so we switch our state to stringIter.
                case self::TOKEN_DOUBLE_QUOTE:
                    return $stringIter($tail, '', $acc);
                // We got ';'. That means a comment starts here, so we
                // change our state to commentIter.
                case self::TOKEN_SEMICOLON:
                    return $commentIter($tail, '', $acc);
                default:
                    // If the current char is a delimiter, we ignore it.
                    if ($this->isDelimiter($head)) {
                        return $baseIter($tail, $acc);
                    }
                    // In all other cases we interpret the current char as the first
                    // char of a symbol and change our state to symbolIter.
                    return $symbolIter($tail, $head, $acc);
            }
        });

        // State when parser parses any symbol
        $symbolIter = tail(function ($rest, $buffer, $acc) use (&$symbolIter, &$baseIter, &$preDotSymbolIter) {
Bug: Consider using a different name than the imported variable $symbolIter, or did you forget to import by reference?

It seems like you are assigning to a variable which was imported through a use statement which was not imported by reference.

For clarity, we suggest using a different name or importing by reference, depending on whether you would like the change to be visible in the outer scope.

Change not visible in outer-scope

$x = 1;
$callable = function() use ($x) {
    $x = 2; // Not visible in outer scope. If you would like this, how
            // about using a different variable name than $x?
};

$callable();
var_dump($x); // int(1)

Change visible in outer-scope

$x = 1;
$callable = function() use (&$x) {
    $x = 2;
};

$callable();
var_dump($x); // int(2)

Unused Code: $symbolIter is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten on every possible execution path.
            if (sizeof($rest) > 0) {
                list ($head, $tail) = toHeadTail($rest);
                if ($head == self::CHAR_DOT) {
                    return $preDotSymbolIter($tail, $buffer . $head, $acc);
                }
                if ($this->isSymbol($head)) {
                    return $symbolIter($tail, $buffer . $head, $acc);
                }
            }
            if (is_numeric($buffer)) {
                $symbolToken = new Tokens\NumericToken($buffer);
            } else {
                $symbolToken = new Tokens\IdentifierToken($buffer);
            }
            return $baseIter($rest, append($acc, $symbolToken));
        });

        // State right after a dot: at least one more symbol character must follow
        $preDotSymbolIter = tail(function ($rest, $buffer, $acc) use (&$baseIter, &$dotSymbolIter) {
Bug: Consider using a different name than the imported variable $preDotSymbolIter, or did you forget to import by reference?
Unused Code: $preDotSymbolIter is not used, you could remove the assignment.
Duplication: This code seems to be duplicated across your project.
            if (sizeof($rest) > 0) {
                list ($head, $tail) = toHeadTail($rest);
                if ($this->isSymbol($head)) {
                    return $dotSymbolIter($tail, $buffer . $head, $acc);
                }
            }
            throw new TokenizerException("Unexpected dot at the end of identifier");
        });

        // State when parser parses the part of a symbol that follows a dot
        $dotSymbolIter = tail(function ($rest, $buffer, $acc) use (&$baseIter, &$dotSymbolIter) {
Bug: Consider using a different name than the imported variable $dotSymbolIter, or did you forget to import by reference?
Unused Code: $dotSymbolIter is not used, you could remove the assignment.
            if (sizeof($rest) > 0) {
                list ($head, $tail) = toHeadTail($rest);
                if ($head == self::CHAR_DOT) {
                    throw new TokenizerException("Only one dot allowed");
                }
                if ($this->isSymbol($head)) {
                    return $dotSymbolIter($tail, $buffer . $head, $acc);
                }
            }
            if (is_numeric($buffer)) {
                $symbolToken = new Tokens\NumericToken($buffer);
            } else {
                $symbolToken = new Tokens\DotIdentifierToken($buffer);
            }
            return $baseIter($rest, append($acc, $symbolToken));
        });

        // State when parser parses string
        $stringIter = tail(function ($rest, $buffer, $acc) use (&$stringIter, &$baseIter, &$escapeIter) {
Bug: Consider using a different name than the imported variable $stringIter, or did you forget to import by reference?
            if (sizeof($rest) == 0) {
                throw new TokenizerException("Unexpected end of string");
            }
            list ($head, $tail) = toHeadTail($rest);
            if ($head == self::TOKEN_DOUBLE_QUOTE) {
                return $baseIter($tail, append($acc, new Tokens\StringToken($buffer)));
            }
            if ($head == self::TOKEN_BACK_SLASH) {
                return $escapeIter($tail, $buffer, $acc);
            }
            return $stringIter($tail, $buffer . $head, $acc);
        });

        // State when parser parses escaped symbol
        $escapeIter = tail(function ($rest, $buffer, $acc) use (&$stringIter) {
Bug: Consider using a different name than the imported variable $escapeIter, or did you forget to import by reference?
Unused Code: $escapeIter is not used, you could remove the assignment.
Duplication: This code seems to be duplicated across your project.
            if (sizeof($rest) == 0) {
                throw new TokenizerException("Unused escape character");
            }
            list ($head, $tail) = toHeadTail($rest);
            return $stringIter($tail, $buffer . $head, $acc);
        });

        // State when parser ignores comments
        $commentIter = function ($rest, $buffer, $acc) use (&$commentIter, &$baseIter) {
Bug: Consider using a different name than the imported variable $commentIter, or did you forget to import by reference?
Unused Code: $commentIter is not used, you could remove the assignment.
            if (sizeof($rest) > 0) {
                list ($head, $tail) = toHeadTail($rest);
                if ($head != self::TOKEN_NEW_LINE) {
                    return $commentIter($tail, $buffer . $head, $acc);
                }
            }
            return $baseIter($rest, $acc);
        };

        return $baseIter(toArray($code), []);
    }
}
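
The duplication notes on isStructural() and isDelimiter() point at two membership tests of the same shape. One possible way to follow the suggestion is to route both checks through a single helper. The sketch below is standalone: the class name CharClasses and the helper isOneOf() are hypothetical, and the strict comparison in in_array() is an assumption rather than the original behaviour.

```php
<?php

// Hypothetical standalone sketch; not the project's actual class.
final class CharClasses
{
    const STRUCTURAL = ['(', ')', '"', ';'];
    const DELIMITERS = ["\t", "\r", "\n", ' '];

    // Single membership helper shared by every character-class check.
    private static function isOneOf(string $char, array $set): bool
    {
        // Third argument true: compare with ===, no type juggling.
        return in_array($char, $set, true);
    }

    public static function isStructural(string $char): bool
    {
        return self::isOneOf($char, self::STRUCTURAL);
    }

    public static function isDelimiter(string $char): bool
    {
        return self::isOneOf($char, self::DELIMITERS);
    }

    public static function isSymbol(string $char): bool
    {
        return !self::isDelimiter($char) && !self::isStructural($char);
    }
}
```

With this shape, adding another character class means adding one constant and one one-line method instead of another copy of the in_array() call.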
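
The "Bug" notes in the listing concern how closures import variables. Each state closure there is imported with `&`, which appears to be what lets a closure call states that are assigned only later in the method. A minimal standalone sketch of that mechanism follows; the names $isEven and $isOdd are illustrative, and the project's tail() wrapper is left out.

```php
<?php

// $isOdd is not assigned yet; importing it by reference creates the
// variable and lets the closure see the value assigned further down.
$isEven = function (int $n) use (&$isEven, &$isOdd): bool {
    return $n === 0 ? true : $isOdd($n - 1);
};

$isOdd = function (int $n) use (&$isEven): bool {
    return $n === 0 ? false : $isEven($n - 1);
};

var_dump($isEven(10)); // bool(true)
var_dump($isOdd(7));   // bool(true)
```

With a by-value import the closure would capture whatever the variable held when the closure was created, so the later assignment of $isOdd would not be visible inside $isEven.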