| Conditions | 17 |
| Paths | 17 |
| Total Lines | 69 |
| Code Lines | 34 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 1 |
| Bugs | 0 |
| Features | 0 |
Small methods make your code easier to understand, particularly when combined with a good name. Conversely, a small method is usually much easier to name well.
For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
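As a minimal sketch of that advice, the snippet below shows a commented step before and after extraction. All function names here are hypothetical and chosen only for illustration; they do not come from the analyzed code.

```php
<?php

// Hypothetical example: every name below is illustrative only.

// Before: a comment explains one step inside the method body.
function describeByteBefore(int $byte): string
{
    // check whether the byte is plain ASCII
    if ($byte >= 0 && $byte <= 0x7F) {
        return 'ascii';
    }

    return 'non-ascii';
}

// After: the commented step becomes its own method, named after the comment.
function isPlainAscii(int $byte): bool
{
    return $byte >= 0 && $byte <= 0x7F;
}

function describeByteAfter(int $byte): string
{
    return isPlainAscii($byte) ? 'ascii' : 'non-ascii';
}
```

Both versions behave the same; the second one simply lets the method name carry the information that the comment used to carry.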
Commonly applied refactorings include:
If many parameters/temporary variables are present:
    <?php

    // ...

    protected function findUtf8CharAt($offset)
    {
        // Returns [start offset of the char, length in bytes, valid UTF-8?, byte at the requested offset].
        // Note: string offsets use [] because the {} syntax was removed in PHP 8.0.
        $byte = \ord($this->raw[$offset]);

        if ($byte <= 0b01111111) {
            // ASCII passthru, 1 byte long
            return [$offset, 1, true, $byte];
        }

        if ($byte <= 0b10111111) {
            // either part of a UTF8 char, or an invalid UTF8 codepoint.
            // try to find start of UTF8 char
            $original = $offset;
            while ($offset > 0 && $original - $offset < 4) {
                $prev = \ord($this->raw[--$offset]);

                if ($prev <= 0b01111111) {
                    // prev is plain ASCII so current char can't be valid
                    return [$original, 1, false, $byte];
                }

                if ($prev <= 0b10111111) {
                    // prev is also part of a UTF8 char, so keep looking
                    continue;
                }

                if ($prev == 0xC0 || $prev == 0xC1) {
                    // prev is an invalid UTF8 starter for overlong ASCII
                    return [$offset, 2, false, $byte];
                }

                if ($prev <= 0b11110100) {
                    // prev is valid start byte, validate length to check this char
                    $length = self::charLength($prev);

                    if ($original < $offset + $length) {
                        return [$offset, $length, true, $byte];
                    }
                }
                return [$original, 1, false, $byte];
            }
            return [$original, 1, false, $byte];
        }

        if ($byte <= 0b11110100) {
            // valid UTF8 start byte, find the rest, determine if length is valid
            $actual = $length = self::charLength($byte);

            for ($i = 1; $i < $length; $i++) {
                if ($offset + $i >= $this->length()) {
                    // ran past the end of the raw string
                    $actual = $i - 1;
                    break;
                }
                $last = \ord($this->raw[$offset + $i]);
                if ($last < 0b10000000 || $last > 0b10111111) {
                    // not a continuation byte, so the sequence is cut short here
                    $actual = $i;
                    break;
                }
            }

            if ($actual !== $length) {
                return [$offset, $actual, false, $byte];
            }
            return [$offset, $length, true, $byte];
        }

        // if 245 to 255, Windows-1252 passthru
        return [$offset, 1, false, $byte];
    }
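One possible way to shrink a method like the one listed above is to extract its byte-range checks into small, named predicates, so that each comment in the listing becomes a method name. The sketch below is only an illustration under that assumption: the `Utf8Byte` class and all of its method names are hypothetical and are not part of the analyzed code.

```php
<?php

// Hypothetical helper class; none of these names exist in the analyzed code.
final class Utf8Byte
{
    // 0x00-0x7F: a single-byte ASCII character.
    public static function isAscii(int $byte): bool
    {
        return $byte <= 0b01111111;
    }

    // 0x80-0xBF: a continuation byte inside a multi-byte sequence.
    public static function isContinuation(int $byte): bool
    {
        return $byte >= 0b10000000 && $byte <= 0b10111111;
    }

    // 0xC0 and 0xC1 could only start an overlong encoding of ASCII.
    public static function isOverlongStart(int $byte): bool
    {
        return $byte === 0xC0 || $byte === 0xC1;
    }

    // 0xC2-0xF4: a lead byte of a 2- to 4-byte sequence.
    public static function isValidStart(int $byte): bool
    {
        return $byte >= 0xC2 && $byte <= 0b11110100;
    }
}
```

With predicates like these, a condition such as `$prev <= 0b10111111` could read as `Utf8Byte::isContinuation($prev)`, which makes several of the inline comments unnecessary.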
Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.
You can also find more detailed suggestions in the “Code” section of your repository.
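As a minimal sketch of extracting duplicated code into a single operation, the example below assumes three hypothetical functions that all need the same key normalization. Instead of repeating `strtolower(trim(...))` in each one, the shared step lives in one place. All names are illustrative and not taken from the analyzed repository.

```php
<?php

// Hypothetical example: every name below is illustrative only.

// The shared step, written once instead of being copied into each caller.
function normalizeKey(string $key): string
{
    return strtolower(trim($key));
}

// Each caller reuses normalizeKey() rather than duplicating its body.
function findUser(array $users, string $name)
{
    return $users[normalizeKey($name)] ?? null;
}

function hasUser(array $users, string $name): bool
{
    return isset($users[normalizeKey($name)]);
}

function removeUser(array $users, string $name): array
{
    unset($users[normalizeKey($name)]);

    return $users;
}
```

If the normalization rule ever changes, only `normalizeKey()` needs to be updated, and the three callers stay consistent with each other.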