WithoutAccents::setupUtf8Map()   B

Complexity

Conditions 2
Paths 2

Size

Total Lines 341
Code Lines 319

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 6

Importance

Changes 0
Metric Value
eloc 319
dl 0
loc 341
c 0
b 0
f 0
ccs 0
cts 0
cp 0
rs 8
cc 2
nc 2
nop 0
crap 6

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
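
Applied to a map-building method like the one in the listing further down, each `// ...` comment that introduces a group of entries can become a small method named after that comment. The following is a minimal sketch under stated assumptions: the class name `AccentMapSketch` and the helpers `latin1Supplement()` and `germanUmlauts()` are hypothetical, and only a handful of entries are shown.

```php
<?php
// Hypothetical sketch of "Extract Method"; none of these names exist in the original class.
final class AccentMapSketch
{
    /** @var array<string, string>|null */
    private static $map;

    public static function get(): array
    {
        if (self::$map === null) {
            // For string keys, array_merge() lets later arrays override earlier ones.
            self::$map = array_merge(self::latin1Supplement(), self::germanUmlauts());
        }

        return self::$map;
    }

    // Was: "// Decompositions for Latin-1 Supplement"
    private static function latin1Supplement(): array
    {
        return ["\xC3\x80" => 'A', "\xC3\xA4" => 'a', "\xC3\xA9" => 'e'];
    }

    // Was: "// german umlauts"
    private static function germanUmlauts(): array
    {
        return ["\xC3\xA4" => 'ae', "\xC3\x9F" => 'ss'];
    }
}

echo strtr("caf\xC3\xA9 M\xC3\xA4dchen", AccentMapSketch::get()), "\n"; // cafe Maedchen
```

Each helper stays short enough to read at a glance, and the order of the `array_merge()` arguments makes explicit which group wins when two groups define the same key.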

Commonly applied refactorings include Extract Method, as described above.

<?php
/**
 * Created by gerk on 30.11.17 05:56
 */

namespace PeekAndPoke\Component\Psi\Psi\Str;

use PeekAndPoke\Component\Psi\UnaryFunction;

/**
 * Replaces all special characters with their "normal" form
 *
 * Special thanks to WORDPRESS and their authors for this... things were taken from there.
 *
 * @author Karsten J. Gerber <[email protected]>
 */
class WithoutAccents implements UnaryFunction
{
    /** @var string[]|null */
    private static $utf8Map;
    /** @var string[]|null */
    private static $isoIn;
    /** @var string[]|null */
    private static $isoOut;

    public function __invoke($input)
    {
        if ($input === null || ! is_scalar($input)) {
            return null;
        }

        // nothing to replace?
        if (! preg_match('/[\x80-\xff]/', $input)) {
            return $input;
        }

        if ($this->seemsUtf8($input)) {
            return $this->removeAccentsUtf8($input);
        }

        // Assume ISO-8859-1 if not UTF-8
        return $this->removeAccentsIso($input);
    }

    /**
     * @return string[]
     */
    public static function getUtf8Map()
    {
        self::setupUtf8Map();

        return self::$utf8Map;
    }

    private function seemsUtf8($string)
    {
        return mb_detect_encoding($string, 'UTF-8', true) === 'UTF-8';
    }

    private function removeAccentsUtf8($input)
    {
        self::setupUtf8Map();

        // Scrutinizer issue (Bug): self::$utf8Map can also be of type null, but parameter
        // $replace_pairs of strtr() only accepts an array; an additional type check may help.
        // If this is a false positive, it can be ignored via the ignore-type annotation:
        //     return strtr($input, /** @scrutinizer ignore-type */ self::$utf8Map);
        return strtr($input, self::$utf8Map);
    }

    private function removeAccentsIso($string)
    {
        self::setupIsoMap();

        // Scrutinizer issue (Bug): self::$isoOut can also be of type string[], but parameter $to
        // of strtr() only accepts a string; an additional type check may help.
        // Scrutinizer issue (Bug): self::$isoIn can also be of type null, but parameter
        // $replace_pairs of strtr() only accepts an array; an additional type check may help.
        // If either is a false positive, it can be ignored via the ignore-type annotation, e.g.:
        //     strtr($string, /** @scrutinizer ignore-type */ self::$isoIn, self::$isoOut);
        $string    = strtr($string, self::$isoIn, self::$isoOut);
        $doubleIn  = [chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254)];
        $doubleOut = ['OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th'];

        return str_replace($doubleIn, $doubleOut, $string);
    }

    /**
     * @codeCoverageIgnore
     */
    public static function setupIsoMap()
    {
        if (self::$isoIn !== null) {
            return;
        }

        // Scrutinizer issue (Documentation Bug): the value assigned here is of type string, which
        // is incompatible with the declared type null|string[] of property $isoIn. Either this
        // assignment is in error or the assigned type should be added to the property's
        // documentation/type hint.
        self::$isoIn =
            chr(128) . chr(131) . chr(138) . chr(142) . chr(154) . chr(158)
            . chr(159) . chr(162) . chr(165) . chr(181) . chr(192) . chr(193) . chr(194)
            . chr(195) . chr(196) . chr(197) . chr(199) . chr(200) . chr(201) . chr(202)
            . chr(203) . chr(204) . chr(205) . chr(206) . chr(207) . chr(209) . chr(210)
            . chr(211) . chr(212) . chr(213) . chr(214) . chr(216) . chr(217) . chr(218)
            . chr(219) . chr(220) . chr(221) . chr(224) . chr(225) . chr(226) . chr(227)
            . chr(228) . chr(229) . chr(231) . chr(232) . chr(233) . chr(234) . chr(235)
            . chr(236) . chr(237) . chr(238) . chr(239) . chr(241) . chr(242) . chr(243)
            . chr(244) . chr(245) . chr(246) . chr(248) . chr(249) . chr(250) . chr(251)
            . chr(252) . chr(253) . chr(255);

        // Scrutinizer issue (Documentation Bug): the string assigned here is likewise incompatible
        // with the declared type null|string[] of property $isoOut.
        self::$isoOut = 'EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy';
    }

    /**
     * @codeCoverageIgnore
     */
    public static function setupUtf8Map()
    {
        if (self::$utf8Map !== null) {
            return;
        }

        self::$utf8Map = [
            // Decompositions for Latin-1 Supplement
            chr(194) . chr(170)            => 'a',
            chr(194) . chr(186)            => 'o',
            chr(195) . chr(128)            => 'A',
            chr(195) . chr(129)            => 'A',
            chr(195) . chr(130)            => 'A',
            chr(195) . chr(131)            => 'A',
            chr(195) . chr(132)            => 'A',
            chr(195) . chr(133)            => 'A',
            chr(195) . chr(134)            => 'AE',
            chr(195) . chr(135)            => 'C',
            chr(195) . chr(136)            => 'E',
            chr(195) . chr(137)            => 'E',
            chr(195) . chr(138)            => 'E',
            chr(195) . chr(139)            => 'E',
            chr(195) . chr(140)            => 'I',
            chr(195) . chr(141)            => 'I',
            chr(195) . chr(142)            => 'I',
            chr(195) . chr(143)            => 'I',
            chr(195) . chr(144)            => 'D',
            chr(195) . chr(145)            => 'N',
            chr(195) . chr(146)            => 'O',
            chr(195) . chr(147)            => 'O',
            chr(195) . chr(148)            => 'O',
            chr(195) . chr(149)            => 'O',
            chr(195) . chr(150)            => 'O',
            chr(195) . chr(153)            => 'U',
            chr(195) . chr(154)            => 'U',
            chr(195) . chr(155)            => 'U',
            chr(195) . chr(156)            => 'U',
            chr(195) . chr(157)            => 'Y',
            chr(195) . chr(158)            => 'TH',
            chr(195) . chr(159)            => 's',
            chr(195) . chr(160)            => 'a',
            chr(195) . chr(161)            => 'a',
            chr(195) . chr(162)            => 'a',
            chr(195) . chr(163)            => 'a',
            chr(195) . chr(164)            => 'a',
            chr(195) . chr(165)            => 'a',
            chr(195) . chr(166)            => 'ae',
            chr(195) . chr(167)            => 'c',
            chr(195) . chr(168)            => 'e',
            chr(195) . chr(169)            => 'e',
            chr(195) . chr(170)            => 'e',
            chr(195) . chr(171)            => 'e',
            chr(195) . chr(172)            => 'i',
            chr(195) . chr(173)            => 'i',
            chr(195) . chr(174)            => 'i',
            chr(195) . chr(175)            => 'i',
            chr(195) . chr(176)            => 'd',
            chr(195) . chr(177)            => 'n',
            chr(195) . chr(178)            => 'o',
            chr(195) . chr(179)            => 'o',
            chr(195) . chr(180)            => 'o',
            chr(195) . chr(181)            => 'o',
            chr(195) . chr(182)            => 'o',
            chr(195) . chr(184)            => 'o',
            chr(195) . chr(185)            => 'u',
            chr(195) . chr(186)            => 'u',
            chr(195) . chr(187)            => 'u',
            chr(195) . chr(188)            => 'u',
            chr(195) . chr(189)            => 'y',
            chr(195) . chr(190)            => 'th',
            chr(195) . chr(191)            => 'y',
            chr(195) . chr(152)            => 'O',
            // Decompositions for Latin Extended-A
            chr(196) . chr(128)            => 'A',
            chr(196) . chr(129)            => 'a',
            chr(196) . chr(130)            => 'A',
            chr(196) . chr(131)            => 'a',
            chr(196) . chr(132)            => 'A',
            chr(196) . chr(133)            => 'a',
            chr(196) . chr(134)            => 'C',
            chr(196) . chr(135)            => 'c',
            chr(196) . chr(136)            => 'C',
            chr(196) . chr(137)            => 'c',
            chr(196) . chr(138)            => 'C',
            chr(196) . chr(139)            => 'c',
            chr(196) . chr(140)            => 'C',
            chr(196) . chr(141)            => 'c',
            chr(196) . chr(142)            => 'D',
            chr(196) . chr(143)            => 'd',
            chr(196) . chr(144)            => 'D',
            chr(196) . chr(145)            => 'd',
            chr(196) . chr(146)            => 'E',
            chr(196) . chr(147)            => 'e',
            chr(196) . chr(148)            => 'E',
            chr(196) . chr(149)            => 'e',
            chr(196) . chr(150)            => 'E',
            chr(196) . chr(151)            => 'e',
            chr(196) . chr(152)            => 'E',
            chr(196) . chr(153)            => 'e',
            chr(196) . chr(154)            => 'E',
            chr(196) . chr(155)            => 'e',
            chr(196) . chr(156)            => 'G',
            chr(196) . chr(157)            => 'g',
            chr(196) . chr(158)            => 'G',
            chr(196) . chr(159)            => 'g',
            chr(196) . chr(160)            => 'G',
            chr(196) . chr(161)            => 'g',
            chr(196) . chr(162)            => 'G',
            chr(196) . chr(163)            => 'g',
            chr(196) . chr(164)            => 'H',
            chr(196) . chr(165)            => 'h',
            chr(196) . chr(166)            => 'H',
            chr(196) . chr(167)            => 'h',
            chr(196) . chr(168)            => 'I',
            chr(196) . chr(169)            => 'i',
            chr(196) . chr(170)            => 'I',
            chr(196) . chr(171)            => 'i',
            chr(196) . chr(172)            => 'I',
            chr(196) . chr(173)            => 'i',
            chr(196) . chr(174)            => 'I',
            chr(196) . chr(175)            => 'i',
            chr(196) . chr(176)            => 'I',
            chr(196) . chr(177)            => 'i',
            chr(196) . chr(178)            => 'IJ',
            chr(196) . chr(179)            => 'ij',
            chr(196) . chr(180)            => 'J',
            chr(196) . chr(181)            => 'j',
            chr(196) . chr(182)            => 'K',
            chr(196) . chr(183)            => 'k',
            chr(196) . chr(184)            => 'k',
            chr(196) . chr(185)            => 'L',
            chr(196) . chr(186)            => 'l',
            chr(196) . chr(187)            => 'L',
            chr(196) . chr(188)            => 'l',
            chr(196) . chr(189)            => 'L',
            chr(196) . chr(190)            => 'l',
            chr(196) . chr(191)            => 'L',
            chr(197) . chr(128)            => 'l',
            chr(197) . chr(129)            => 'L',
            chr(197) . chr(130)            => 'l',
            chr(197) . chr(131)            => 'N',
            chr(197) . chr(132)            => 'n',
            chr(197) . chr(133)            => 'N',
            chr(197) . chr(134)            => 'n',
            chr(197) . chr(135)            => 'N',
            chr(197) . chr(136)            => 'n',
            chr(197) . chr(137)            => 'N',
            chr(197) . chr(138)            => 'n',
            chr(197) . chr(139)            => 'N',
            chr(197) . chr(140)            => 'O',
            chr(197) . chr(141)            => 'o',
            chr(197) . chr(142)            => 'O',
            chr(197) . chr(143)            => 'o',
            chr(197) . chr(144)            => 'O',
            chr(197) . chr(145)            => 'o',
            chr(197) . chr(146)            => 'OE',
            chr(197) . chr(147)            => 'oe',
            chr(197) . chr(148)            => 'R',
            chr(197) . chr(149)            => 'r',
            chr(197) . chr(150)            => 'R',
            chr(197) . chr(151)            => 'r',
            chr(197) . chr(152)            => 'R',
            chr(197) . chr(153)            => 'r',
            chr(197) . chr(154)            => 'S',
            chr(197) . chr(155)            => 's',
            chr(197) . chr(156)            => 'S',
            chr(197) . chr(157)            => 's',
            chr(197) . chr(158)            => 'S',
            chr(197) . chr(159)            => 's',
            chr(197) . chr(160)            => 'S',
            chr(197) . chr(161)            => 's',
            chr(197) . chr(162)            => 'T',
            chr(197) . chr(163)            => 't',
            chr(197) . chr(164)            => 'T',
            chr(197) . chr(165)            => 't',
            chr(197) . chr(166)            => 'T',
            chr(197) . chr(167)            => 't',
            chr(197) . chr(168)            => 'U',
            chr(197) . chr(169)            => 'u',
            chr(197) . chr(170)            => 'U',
            chr(197) . chr(171)            => 'u',
            chr(197) . chr(172)            => 'U',
            chr(197) . chr(173)            => 'u',
            chr(197) . chr(174)            => 'U',
            chr(197) . chr(175)            => 'u',
            chr(197) . chr(176)            => 'U',
            chr(197) . chr(177)            => 'u',
            chr(197) . chr(178)            => 'U',
            chr(197) . chr(179)            => 'u',
            chr(197) . chr(180)            => 'W',
            chr(197) . chr(181)            => 'w',
            chr(197) . chr(182)            => 'Y',
            chr(197) . chr(183)            => 'y',
            chr(197) . chr(184)            => 'Y',
            chr(197) . chr(185)            => 'Z',
            chr(197) . chr(186)            => 'z',
            chr(197) . chr(187)            => 'Z',
            chr(197) . chr(188)            => 'z',
            chr(197) . chr(189)            => 'Z',
            chr(197) . chr(190)            => 'z',
            chr(197) . chr(191)            => 's',
            // Decompositions for Latin Extended-B
            chr(200) . chr(152)            => 'S',
            chr(200) . chr(153)            => 's',
            chr(200) . chr(154)            => 'T',
            chr(200) . chr(155)            => 't',
            // Euro Sign
            chr(226) . chr(130) . chr(172) => 'E',
            // GBP (Pound) Sign
            chr(194) . chr(163)            => '',
            // Vowels with diacritic (Vietnamese)
            // unmarked
            chr(198) . chr(160)            => 'O',
            chr(198) . chr(161)            => 'o',
            chr(198) . chr(175)            => 'U',
            chr(198) . chr(176)            => 'u',
            // grave accent
            chr(225) . chr(186) . chr(166) => 'A',
            chr(225) . chr(186) . chr(167) => 'a',
            chr(225) . chr(186) . chr(176) => 'A',
            chr(225) . chr(186) . chr(177) => 'a',
            chr(225) . chr(187) . chr(128) => 'E',
            chr(225) . chr(187) . chr(129) => 'e',
            chr(225) . chr(187) . chr(146) => 'O',
            chr(225) . chr(187) . chr(147) => 'o',
            chr(225) . chr(187) . chr(156) => 'O',
            chr(225) . chr(187) . chr(157) => 'o',
            chr(225) . chr(187) . chr(170) => 'U',
            chr(225) . chr(187) . chr(171) => 'u',
            chr(225) . chr(187) . chr(178) => 'Y',
            chr(225) . chr(187) . chr(179) => 'y',
            // hook
            chr(225) . chr(186) . chr(162) => 'A',
            chr(225) . chr(186) . chr(163) => 'a',
            chr(225) . chr(186) . chr(168) => 'A',
            chr(225) . chr(186) . chr(169) => 'a',
            chr(225) . chr(186) . chr(178) => 'A',
            chr(225) . chr(186) . chr(179) => 'a',
            chr(225) . chr(186) . chr(186) => 'E',
            chr(225) . chr(186) . chr(187) => 'e',
            chr(225) . chr(187) . chr(130) => 'E',
            chr(225) . chr(187) . chr(131) => 'e',
            chr(225) . chr(187) . chr(136) => 'I',
            chr(225) . chr(187) . chr(137) => 'i',
            chr(225) . chr(187) . chr(142) => 'O',
            chr(225) . chr(187) . chr(143) => 'o',
            chr(225) . chr(187) . chr(148) => 'O',
            chr(225) . chr(187) . chr(149) => 'o',
            chr(225) . chr(187) . chr(158) => 'O',
            chr(225) . chr(187) . chr(159) => 'o',
            chr(225) . chr(187) . chr(166) => 'U',
            chr(225) . chr(187) . chr(167) => 'u',
            chr(225) . chr(187) . chr(172) => 'U',
            chr(225) . chr(187) . chr(173) => 'u',
            chr(225) . chr(187) . chr(182) => 'Y',
            chr(225) . chr(187) . chr(183) => 'y',
            // tilde
            chr(225) . chr(186) . chr(170) => 'A',
            chr(225) . chr(186) . chr(171) => 'a',
            chr(225) . chr(186) . chr(180) => 'A',
            chr(225) . chr(186) . chr(181) => 'a',
            chr(225) . chr(186) . chr(188) => 'E',
            chr(225) . chr(186) . chr(189) => 'e',
            chr(225) . chr(187) . chr(132) => 'E',
            chr(225) . chr(187) . chr(133) => 'e',
            chr(225) . chr(187) . chr(150) => 'O',
            chr(225) . chr(187) . chr(151) => 'o',
            chr(225) . chr(187) . chr(160) => 'O',
            chr(225) . chr(187) . chr(161) => 'o',
            chr(225) . chr(187) . chr(174) => 'U',
            chr(225) . chr(187) . chr(175) => 'u',
            chr(225) . chr(187) . chr(184) => 'Y',
            chr(225) . chr(187) . chr(185) => 'y',
            // acute accent
            chr(225) . chr(186) . chr(164) => 'A',
            chr(225) . chr(186) . chr(165) => 'a',
            chr(225) . chr(186) . chr(174) => 'A',
            chr(225) . chr(186) . chr(175) => 'a',
            chr(225) . chr(186) . chr(190) => 'E',
            chr(225) . chr(186) . chr(191) => 'e',
            chr(225) . chr(187) . chr(144) => 'O',
            chr(225) . chr(187) . chr(145) => 'o',
            chr(225) . chr(187) . chr(154) => 'O',
            chr(225) . chr(187) . chr(155) => 'o',
            chr(225) . chr(187) . chr(168) => 'U',
            chr(225) . chr(187) . chr(169) => 'u',
            // dot below
            chr(225) . chr(186) . chr(160) => 'A',
            chr(225) . chr(186) . chr(161) => 'a',
            chr(225) . chr(186) . chr(172) => 'A',
            chr(225) . chr(186) . chr(173) => 'a',
            chr(225) . chr(186) . chr(182) => 'A',
            chr(225) . chr(186) . chr(183) => 'a',
            chr(225) . chr(186) . chr(184) => 'E',
            chr(225) . chr(186) . chr(185) => 'e',
            chr(225) . chr(187) . chr(134) => 'E',
            chr(225) . chr(187) . chr(135) => 'e',
            chr(225) . chr(187) . chr(138) => 'I',
            chr(225) . chr(187) . chr(139) => 'i',
            chr(225) . chr(187) . chr(140) => 'O',
            chr(225) . chr(187) . chr(141) => 'o',
            chr(225) . chr(187) . chr(152) => 'O',
            chr(225) . chr(187) . chr(153) => 'o',
            chr(225) . chr(187) . chr(162) => 'O',
            chr(225) . chr(187) . chr(163) => 'o',
            chr(225) . chr(187) . chr(164) => 'U',
            chr(225) . chr(187) . chr(165) => 'u',
            chr(225) . chr(187) . chr(176) => 'U',
            chr(225) . chr(187) . chr(177) => 'u',
            chr(225) . chr(187) . chr(180) => 'Y',
            chr(225) . chr(187) . chr(181) => 'y',
            // Vowels with diacritic (Chinese, Hanyu Pinyin)
            chr(201) . chr(145)            => 'a',
            // macron
            chr(199) . chr(149)            => 'U',
            chr(199) . chr(150)            => 'u',
            // acute accent
            chr(199) . chr(151)            => 'U',
            chr(199) . chr(152)            => 'u',
            // caron
            chr(199) . chr(141)            => 'A',
            chr(199) . chr(142)            => 'a',
            chr(199) . chr(143)            => 'I',
            chr(199) . chr(144)            => 'i',
            chr(199) . chr(145)            => 'O',
            chr(199) . chr(146)            => 'o',
            chr(199) . chr(147)            => 'U',
            chr(199) . chr(148)            => 'u',
            chr(199) . chr(153)            => 'U',
            chr(199) . chr(154)            => 'u',
            // grave accent
            chr(199) . chr(155)            => 'U',
            chr(199) . chr(156)            => 'u',
            // german umlauts
            chr(195) . chr(132)            => 'Ae',
            chr(195) . chr(164)            => 'ae',
            chr(195) . chr(150)            => 'Oe',
            chr(195) . chr(182)            => 'oe',
            chr(195) . chr(156)            => 'Ue',
            chr(195) . chr(188)            => 'ue',
            chr(195) . chr(159)            => 'ss',
        ];
    }
}
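
One detail of the listing is easy to miss: the array literal in setupUtf8Map() repeats several keys, for example chr(195) . chr(164), which appears first under "Decompositions for Latin-1 Supplement" and again under "german umlauts". In a PHP array literal the last value given for a repeated key wins, so the umlaut entries are the ones that take effect. The sketch below reproduces this with a three-entry map; the variable names are illustrative and not taken from the class.

```php
<?php
// Illustrative map with one repeated key, mirroring the pattern in setupUtf8Map().
$map = [
    chr(195) . chr(164) => 'a',   // "ä", as listed under Latin-1 Supplement
    chr(195) . chr(169) => 'e',   // "é"
    chr(195) . chr(164) => 'ae',  // "ä" again, as listed under german umlauts
];

// The repeated key collapses into a single entry holding the later value.
$result = strtr('M' . chr(195) . chr(164) . 'dchen caf' . chr(195) . chr(169), $map);

echo count($map), "\n"; // 2
echo $result, "\n";     // Maedchen cafe
```

If the single-letter forms were intended for some callers, the two groups would need to live in separate maps that are chosen explicitly, since a single literal cannot hold both.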