Completed
Push — master ( e96f63...7cd0ea )
by Rob
02:01
created

Utf8::chr()   D

Complexity

Conditions 18
Paths 92

Size

Total Lines 94

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 13
CRAP Score 143.6275

Importance

Changes 0
Metric Value
dl 0
loc 94
ccs 13
cts 48
cp 0.2708
rs 4.0242
c 0
b 0
f 0
cc 18
nc 92
nop 2
crap 143.6275
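
The CRAP score in the table appears consistent with the commonly cited formula `cc^2 * (1 - coverage)^3 + cc`, applied to the rounded coverage value `cp`. The sketch below is a minimal check under that assumption; the function name `crapScore` is hypothetical and not part of the inspected code.

```php
<?php

/**
 * CRAP score as commonly defined: cc^2 * (1 - coverage)^3 + cc.
 * $coverage is a fraction between 0 and 1 (assumption: the report's "cp").
 */
function crapScore(int $complexity, float $coverage): float
{
    return ($complexity ** 2) * ((1.0 - $coverage) ** 3) + $complexity;
}

// Values taken from the metric table: cc = 18, cp = 0.2708.
printf("%.2f\n", crapScore(18, 0.2708));
```

With full coverage the second factor vanishes and the score collapses to the complexity itself, which is why raising coverage and lowering complexity both reduce the value.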

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
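
As a minimal sketch of that advice (the function names here are hypothetical and unrelated to the inspected class), a commented step inside a method can be pulled out into its own method whose name is derived from the comment:

```php
<?php

// Before: a comment marks a distinct step inside a longer function.
function describeBefore(string $input): string
{
    // collapse runs of whitespace and strip the ends
    $clean = trim((string) preg_replace('/\s+/', ' ', $input));

    return 'Value: ' . $clean;
}

// After: the commented step becomes a function named after the comment.
function normalizeWhitespace(string $input): string
{
    return trim((string) preg_replace('/\s+/', ' ', $input));
}

function describeAfter(string $input): string
{
    return 'Value: ' . normalizeWhitespace($input);
}
```

Both versions behave the same; the second simply gives the step a name that can be tested and reused on its own.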

Commonly applied refactorings include:

<?php

namespace devtoolboxuk\soteria\voku\Resources;

class Utf8 extends Resources
Coding Style: The property $BROKEN_UTF8_FIX is not named in camelCase.

This check marks property names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name database connection string becomes databaseConnectionString.

Coding Style: The property $WIN1252_TO_UTF8 is not named in camelCase.
Coding Style: The property $BIDI_UNI_CODE_CONTROLS_TABLE is not named in camelCase.
Coding Style: The property $WHITESPACE_TABLE is not named in camelCase.
Complexity: This class has 2058 lines of code which exceeds the configured maximum of 1000.

Really long classes often contain too much logic and violate the single responsibility principle.

We suggest taking a look at the “Code” section for options on how to refactor this code.

Complexity: This class has a complexity of 386 which exceeds the configured maximum of 50.

The class complexity is the sum of the complexity of all methods. A very high value usually indicates that your class does not follow the single responsibility principle and does more than one job.

You can also find more detailed suggestions for refactoring in the “Code” section of your repository.
{

    private $system;
    private $ENCODINGS;
    private $SUPPORT = [];
    private $BROKEN_UTF8_FIX;
    private $ORD;
    private $CHR;
    private $WIN1252_TO_UTF8;
    private $BOM = [
        "\xef\xbb\xbf" => 3, // UTF-8 BOM
        'ï»¿' => 6, // UTF-8 BOM as "WINDOWS-1252" (one char has [maybe] more than one byte ...)
        "\x00\x00\xfe\xff" => 4, // UTF-32 (BE) BOM
        '  þÿ' => 6, // UTF-32 (BE) BOM as "WINDOWS-1252"
        "\xff\xfe\x00\x00" => 4, // UTF-32 (LE) BOM
        'ÿþ  ' => 6, // UTF-32 (LE) BOM as "WINDOWS-1252"
        "\xfe\xff" => 2, // UTF-16 (BE) BOM
        'þÿ' => 4, // UTF-16 (BE) BOM as "WINDOWS-1252"
        "\xff\xfe" => 2, // UTF-16 (LE) BOM
        'ÿþ' => 4, // UTF-16 (LE) BOM as "WINDOWS-1252"
    ];

    private $BIDI_UNI_CODE_CONTROLS_TABLE = [
        // LEFT-TO-RIGHT EMBEDDING (use -> dir = "ltr")
        8234 => "\xE2\x80\xAA",
        // RIGHT-TO-LEFT EMBEDDING (use -> dir = "rtl")
        8235 => "\xE2\x80\xAB",
        // POP DIRECTIONAL FORMATTING // (use -> </bdo>)
        8236 => "\xE2\x80\xAC",
        // LEFT-TO-RIGHT OVERRIDE // (use -> <bdo dir = "ltr">)
        8237 => "\xE2\x80\xAD",
        // RIGHT-TO-LEFT OVERRIDE // (use -> <bdo dir = "rtl">)
        8238 => "\xE2\x80\xAE",
        // LEFT-TO-RIGHT ISOLATE // (use -> dir = "ltr")
        8294 => "\xE2\x81\xA6",
        // RIGHT-TO-LEFT ISOLATE // (use -> dir = "rtl")
        8295 => "\xE2\x81\xA7",
        // FIRST STRONG ISOLATE // (use -> dir = "auto")
        8296 => "\xE2\x81\xA8",
        // POP DIRECTIONAL ISOLATE
        8297 => "\xE2\x81\xA9",
    ];
    private $WHITESPACE = [
Unused Code: The property $WHITESPACE is not used and could be removed.

This check marks private properties in classes that are never used. Those properties can be removed.

        // NUL Byte
        0 => "\x0",
        // Tab
        9 => "\x9",
        // New Line
        10 => "\xa",
        // Vertical Tab
        11 => "\xb",
        // Carriage Return
        13 => "\xd",
        // Ordinary Space
        32 => "\x20",
        // NO-BREAK SPACE
        160 => "\xc2\xa0",
        // OGHAM SPACE MARK
        5760 => "\xe1\x9a\x80",
        // MONGOLIAN VOWEL SEPARATOR
        6158 => "\xe1\xa0\x8e",
        // EN QUAD
        8192 => "\xe2\x80\x80",
        // EM QUAD
        8193 => "\xe2\x80\x81",
        // EN SPACE
        8194 => "\xe2\x80\x82",
        // EM SPACE
        8195 => "\xe2\x80\x83",
        // THREE-PER-EM SPACE
        8196 => "\xe2\x80\x84",
        // FOUR-PER-EM SPACE
        8197 => "\xe2\x80\x85",
        // SIX-PER-EM SPACE
        8198 => "\xe2\x80\x86",
        // FIGURE SPACE
        8199 => "\xe2\x80\x87",
        // PUNCTUATION SPACE
        8200 => "\xe2\x80\x88",
        // THIN SPACE
        8201 => "\xe2\x80\x89",
        // HAIR SPACE
        8202 => "\xe2\x80\x8a",
        // LINE SEPARATOR
        8232 => "\xe2\x80\xa8",
        // PARAGRAPH SEPARATOR
        8233 => "\xe2\x80\xa9",
        // NARROW NO-BREAK SPACE
        8239 => "\xe2\x80\xaf",
        // MEDIUM MATHEMATICAL SPACE
        8287 => "\xe2\x81\x9f",
        // IDEOGRAPHIC SPACE
        12288 => "\xe3\x80\x80",
    ];
    /**
     * @var array
     */
    private $WHITESPACE_TABLE = [
        'SPACE' => "\x20",
        'NO-BREAK SPACE' => "\xc2\xa0",
        'OGHAM SPACE MARK' => "\xe1\x9a\x80",
        'EN QUAD' => "\xe2\x80\x80",
        'EM QUAD' => "\xe2\x80\x81",
        'EN SPACE' => "\xe2\x80\x82",
        'EM SPACE' => "\xe2\x80\x83",
        'THREE-PER-EM SPACE' => "\xe2\x80\x84",
        'FOUR-PER-EM SPACE' => "\xe2\x80\x85",
        'SIX-PER-EM SPACE' => "\xe2\x80\x86",
        'FIGURE SPACE' => "\xe2\x80\x87",
        'PUNCTUATION SPACE' => "\xe2\x80\x88",
        'THIN SPACE' => "\xe2\x80\x89",
        'HAIR SPACE' => "\xe2\x80\x8a",
        'LINE SEPARATOR' => "\xe2\x80\xa8",
        'PARAGRAPH SEPARATOR' => "\xe2\x80\xa9",
        'ZERO WIDTH SPACE' => "\xe2\x80\x8b",
        'NARROW NO-BREAK SPACE' => "\xe2\x80\xaf",
        'MEDIUM MATHEMATICAL SPACE' => "\xe2\x81\x9f",
        'IDEOGRAPHIC SPACE' => "\xe3\x80\x80",
    ];

    function __construct()
Best Practice: It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice: It is recommended to declare an explicit visibility for __construct.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also to yourself, how this method should be consumed.

If you are not sure which visibility to choose, it is a good idea to start with the most restrictive visibility and then raise it as needed, i.e. start with private, and only raise it to protected if a sub-class needs access, or to public if an external class needs access.

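
A minimal sketch of what an explicit declaration could look like. The class `SupportProbe` and its members are hypothetical stand-ins, not the inspected class; the constructor is marked public on the assumption that the object is created from outside.

```php
<?php

final class SupportProbe
{
    /** @var array<string, bool> */
    private $support = [];

    // Explicitly public: the class is meant to be instantiated by callers.
    public function __construct()
    {
        $this->support['json'] = \function_exists('json_encode');
    }

    // Public read access; the backing array itself stays private.
    public function has(string $feature): bool
    {
        return $this->support[$feature] ?? false;
    }
}
```

Omitting the keyword also yields a public method in PHP, so adding it changes no behavior; it only makes the intent visible.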
    {
        $this->system = new System();
        $this->checkForSupport();
    }

    private function checkForSupport()
Complexity: This operation has 13 execution paths which exceeds the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.

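
One way such a split might look is sketched below. It is not the project's code: the class `SupportCheck` and its method names are hypothetical, and `extension_loaded()` stands in for the `System` helper used in the listing. Each conditional moves into its own small method, so no single method carries more than a couple of paths.

```php
<?php

final class SupportCheck
{
    /** @var array<string, mixed> */
    private $support = [];

    public function run(): array
    {
        if (isset($this->support['checked'])) {
            return $this->support;
        }
        $this->support['checked'] = true;

        $this->detectExtensions();
        $this->configureMbstring();
        $this->loadTransliteratorIds();

        return $this->support;
    }

    // One loop replaces one statement per extension.
    private function detectExtensions(): void
    {
        foreach (['mbstring', 'iconv', 'intl', 'ctype', 'json'] as $name) {
            $this->support[$name] = \extension_loaded($name);
        }
    }

    // Guard clause: nothing to configure without mbstring.
    private function configureMbstring(): void
    {
        if ($this->support['mbstring'] !== true) {
            return;
        }
        \mb_internal_encoding('UTF-8');
        $this->support['mbstring_internal_encoding'] = 'UTF-8';
    }

    private function loadTransliteratorIds(): void
    {
        $this->support['intl__transliterator_list_ids'] = [];
        if ($this->support['intl'] === true && \function_exists('transliterator_list_ids')) {
            $this->support['intl__transliterator_list_ids'] = \transliterator_list_ids();
        }
    }
}
```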
    {
        if (!isset($this->SUPPORT['already_checked_via_portable_utf8'])) {
            $this->SUPPORT['already_checked_via_portable_utf8'] = true;

            // http://php.net/manual/en/book.mbstring.php
            $this->SUPPORT['mbstring'] = $this->system->mbstring_loaded();
            $this->SUPPORT['mbstring_func_overload'] = $this->system->mbstring_overloaded();
            if ($this->SUPPORT['mbstring'] === true) {
                \mb_internal_encoding('UTF-8');
                /** @noinspection UnusedFunctionResultInspection */
                /** @noinspection PhpComposerExtensionStubsInspection */
                \mb_regex_encoding('UTF-8');
                $this->SUPPORT['mbstring_internal_encoding'] = 'UTF-8';
            }

            // http://php.net/manual/en/book.iconv.php
            $this->SUPPORT['iconv'] = $this->system->iconv_loaded();

            // http://php.net/manual/en/book.intl.php
            $this->SUPPORT['intl'] = $this->system->intl_loaded();
            $this->SUPPORT['intl__transliterator_list_ids'] = [];

            if (
                $this->SUPPORT['intl'] === true
                &&
                \function_exists('transliterator_list_ids') === true
            ) {
                /** @noinspection PhpComposerExtensionStubsInspection */
                $this->SUPPORT['intl__transliterator_list_ids'] = \transliterator_list_ids();
            }

            // http://php.net/manual/en/class.intlchar.php
            $this->SUPPORT['intlChar'] = $this->system->intlChar_loaded();

            // http://php.net/manual/en/book.ctype.php
            $this->SUPPORT['ctype'] = $this->system->ctype_loaded();

            // http://php.net/manual/en/class.finfo.php
            $this->SUPPORT['finfo'] = $this->system->finfo_loaded();

            // http://php.net/manual/en/book.json.php
            $this->SUPPORT['json'] = $this->system->json_loaded();

            // http://php.net/manual/en/book.pcre.php
            $this->SUPPORT['pcre_utf8'] = $this->system->pcre_utf8_support();

            $this->SUPPORT['symfony_polyfill_used'] = $this->system->symfony_polyfill_used();
            if ($this->SUPPORT['symfony_polyfill_used'] === true) {
                \mb_internal_encoding('UTF-8');
                $this->SUPPORT['mbstring_internal_encoding'] = 'UTF-8';
            }
        }
    }

    public function rawurldecode($str, $multi_decode = true)
The method rawurldecode has a boolean flag argument $multi_decode, which is a certain sign of a Single Responsibility Principle violation.
Coding Style Naming: The parameter $multi_decode is not named in camelCase.
Coding Style Naming: The variable $multi_decode is not named in camelCase.
Coding Style Naming: The variable $str_compare is not named in camelCase.
Complexity: This operation has 60 execution paths which exceeds the configured maximum of 10.

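
A common way to remove a boolean flag is to expose one method per behavior. The sketch below is a simplified illustration only: the class `UrlDecoder` and its method names are hypothetical, and it covers plain percent-decoding without the entity and UTF-8 repair steps of the inspected method.

```php
<?php

final class UrlDecoder
{
    // Single pass: decode percent-escapes once.
    public function decodeOnce(string $str): string
    {
        return \rawurldecode($str);
    }

    // Repeated passes: decode until the string stops changing.
    public function decodeFully(string $str): string
    {
        do {
            $previous = $str;
            $str = $this->decodeOnce($str);
        } while ($previous !== $str);

        return $str;
    }
}
```

Callers then pick `decodeOnce()` or `decodeFully()` by name instead of passing `true` or `false`, and each method can be tested separately.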
    {
        if ($str === '') {
            return '';
        }

        if (strpos($str, '&') === false && strpos($str, '%') === false && strpos($str, '+') === false && strpos($str, '\u') === false) {
            return $this->fix_simple_utf8($str);
        }

        $pattern = '/%u([0-9a-fA-F]{3,4})/';
        if (preg_match($pattern, $str)) {
            $str = (string)preg_replace($pattern, '&#x\\1;', rawurldecode($str));
        }

        $flags = \ENT_QUOTES | \ENT_HTML5;

        if ($multi_decode === true) {
            do {
                $str_compare = $str;

                /**
                 * @psalm-suppress PossiblyInvalidArgument
                 */
                $str = $this->fix_simple_utf8(rawurldecode($this->html_entity_decode($this->to_utf8($str), $flags)));
            } while ($str_compare !== $str);
        }

        return $str;
    }

    private function fix_simple_utf8($str)
Coding Style Naming: The method fix_simple_utf8 is not named in camelCase.
Coding Style Naming: The variable $BROKEN_UTF8_TO_UTF8_KEYS_CACHE is not named in camelCase.
Coding Style Naming: The variable $BROKEN_UTF8_TO_UTF8_VALUES_CACHE is not named in camelCase.
Coding Style: Method name "Utf8::fix_simple_utf8" is not in camel caps format
    {
        if ($str === '') {
            return '';
        }

        static $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = null;
Comprehensibility Naming: The variable name $BROKEN_UTF8_TO_UTF8_KEYS_CACHE exceeds the maximum configured length of 20.

Very long variable names usually make code harder to read. It is therefore recommended not to make variable names too verbose.

        static $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = null;
Comprehensibility Naming: The variable name $BROKEN_UTF8_TO_UTF8_VALUES_CACHE exceeds the maximum configured length of 20.

        if ($BROKEN_UTF8_TO_UTF8_KEYS_CACHE === null) {
            if ($this->BROKEN_UTF8_FIX === null) {
                $this->BROKEN_UTF8_FIX = $this->getData('utf8_fix');
            }

            $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = array_keys($this->BROKEN_UTF8_FIX);
            $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = array_values($this->BROKEN_UTF8_FIX);
        }

        return \str_replace($BROKEN_UTF8_TO_UTF8_KEYS_CACHE, $BROKEN_UTF8_TO_UTF8_VALUES_CACHE, $str);
    }

    private function getData($file)
    {

        return include __DIR__ . '/../Data/' . $file . '.php';
    }

    private function html_entity_decode($str, $flags = null, $encoding = 'UTF-8')
Complexity: This operation has 1440 execution paths which exceeds the configured maximum of 200.
Coding Style Naming: The method html_entity_decode is not named in camelCase.
Coding Style Naming: The variable $str_compare is not named in camelCase.
Coding Style: Method name "Utf8::html_entity_decode" is not in camel caps format
    {
        if (
            !isset($str[3]) // examples: &; || &x;
            ||
            strpos($str, '&') === false // no "&"
        ) {
            return $str;
        }

        if ($encoding !== 'UTF-8' && $encoding !== 'CP850') {
            $encoding = $this->normalize_encoding($encoding, 'UTF-8');
        }

        if ($flags === null) {
            $flags = \ENT_QUOTES | \ENT_HTML5;
        }

        if ($encoding !== 'UTF-8' && $encoding !== 'ISO-8859-1' && $encoding !== 'WINDOWS-1252' && $this->SUPPORT['mbstring'] === false) {
            trigger_error('UTF8::html_entity_decode() without mbstring cannot handle "' . $encoding . '" encoding', \E_USER_WARNING);
        }

        do {
            $str_compare = $str;

            // INFO: http://stackoverflow.com/questions/35854535/better-explanation-of-convmap-in-mb-encode-numericentity
            if ($this->SUPPORT['mbstring'] === true) {
                if ($encoding === 'UTF-8') {
                    $str = mb_decode_numericentity($str, [0x80, 0xfffff, 0, 0xfffff, 0]);
                } else {
Coding Style: The method html_entity_decode uses an else expression. Else is never necessary and you can simplify the code to work without else.
                    $str = mb_decode_numericentity($str, [0x80, 0xfffff, 0, 0xfffff, 0], $encoding);
                }
            } else {
Coding Style: The method html_entity_decode uses an else expression. Else is never necessary and you can simplify the code to work without else.
                $str = (string)preg_replace_callback(
                    "/&#\d{2,6};/",
                    /**
                     * @param string[] $matches
                     *
                     * @return string
                     */
                    static function ($matches) use ($encoding) {
                        $returnTmp = \mb_convert_encoding($matches[0], $encoding, 'HTML-ENTITIES');
                        if ($returnTmp !== '"' && $returnTmp !== "'") {
                            return $returnTmp;
                        }

                        return $matches[0];
                    },
                    $str
                );
            }

            if (strpos($str, '&') !== false) {
                if (strpos($str, '&#') !== false) {
                    // decode also numeric & UTF16 two byte entities
                    $str = (string)preg_replace('/(&#(?:x0*[0-9a-fA-F]{2,6}(?![0-9a-fA-F;])|(?:0*\d{2,6}(?![0-9;]))))/S', '$1;', $str);
                }

                $str = html_entity_decode($str, $flags, $encoding);
            }
        } while ($str_compare !== $str);

        return $str;
    }

    private function normalize_encoding($encoding, $fallback = '')
Complexity: This operation has 2592 execution paths which exceeds the configured maximum of 200.
Coding Style Naming: The method normalize_encoding is not named in camelCase.
Coding Style Naming: The variable $STATIC_NORMALIZE_ENCODING_CACHE is not named in camelCase.
Coding Style: Method name "Utf8::normalize_encoding" is not in camel caps format
    {
        static $STATIC_NORMALIZE_ENCODING_CACHE = [];
Comprehensibility Naming: The variable name $STATIC_NORMALIZE_ENCODING_CACHE exceeds the maximum configured length of 20.

        // init
        $encoding = (string)$encoding;

        if (!$encoding) {
            return $fallback;
        }

        if ($encoding === 'UTF-8' || $encoding === 'UTF8') {
            return 'UTF-8';
        }

        if ($encoding === '8BIT' || $encoding === 'BINARY') {
            return 'CP850';
        }

        if ($encoding === 'HTML' || $encoding === 'HTML-ENTITIES') {
            return 'HTML-ENTITIES';
        }

        if (
            $encoding === '1' // only a fallback, for non "strict_types" usage ...
            ||
            $encoding === '0' // only a fallback, for non "strict_types" usage ...
        ) {
            return $fallback;
        }

        if (isset($STATIC_NORMALIZE_ENCODING_CACHE[$encoding])) {
            return $STATIC_NORMALIZE_ENCODING_CACHE[$encoding];
        }

        if ($this->ENCODINGS === null) {
            $this->ENCODINGS = $this->getData('encodings');
        }

        if (in_array($encoding, $this->ENCODINGS, true)) {
            $STATIC_NORMALIZE_ENCODING_CACHE[$encoding] = $encoding;

            return $encoding;
        }

        $encodingOrig = $encoding;
        $encoding = strtoupper($encoding);
        $encodingUpperHelper = (string)preg_replace('/[^a-zA-Z0-9\s]/u', '', $encoding);

        $equivalences = [
            'ISO8859' => 'ISO-8859-1',
            'ISO88591' => 'ISO-8859-1',
            'ISO' => 'ISO-8859-1',
            'LATIN' => 'ISO-8859-1',
            'LATIN1' => 'ISO-8859-1', // Western European
            'ISO88592' => 'ISO-8859-2',
            'LATIN2' => 'ISO-8859-2', // Central European
            'ISO88593' => 'ISO-8859-3',
            'LATIN3' => 'ISO-8859-3', // Southern European
            'ISO88594' => 'ISO-8859-4',
            'LATIN4' => 'ISO-8859-4', // Northern European
            'ISO88595' => 'ISO-8859-5', // Cyrillic
            'ISO88596' => 'ISO-8859-6', // Arabic
            'ISO88597' => 'ISO-8859-7', // Greek
            'ISO88598' => 'ISO-8859-8', // Hebrew
            'ISO88599' => 'ISO-8859-9',
            'LATIN5' => 'ISO-8859-9', // Turkish
            'ISO885911' => 'ISO-8859-11',
            'TIS620' => 'ISO-8859-11', // Thai
            'ISO885910' => 'ISO-8859-10',
            'LATIN6' => 'ISO-8859-10', // Nordic
            'ISO885913' => 'ISO-8859-13',
            'LATIN7' => 'ISO-8859-13', // Baltic
            'ISO885914' => 'ISO-8859-14',
            'LATIN8' => 'ISO-8859-14', // Celtic
            'ISO885915' => 'ISO-8859-15',
            'LATIN9' => 'ISO-8859-15', // Western European (with some extra chars e.g. €)
            'ISO885916' => 'ISO-8859-16',
            'LATIN10' => 'ISO-8859-16', // Southeast European
            'CP1250' => 'WINDOWS-1250',
            'WIN1250' => 'WINDOWS-1250',
            'WINDOWS1250' => 'WINDOWS-1250',
            'CP1251' => 'WINDOWS-1251',
            'WIN1251' => 'WINDOWS-1251',
            'WINDOWS1251' => 'WINDOWS-1251',
            'CP1252' => 'WINDOWS-1252',
            'WIN1252' => 'WINDOWS-1252',
            'WINDOWS1252' => 'WINDOWS-1252',
            'CP1253' => 'WINDOWS-1253',
            'WIN1253' => 'WINDOWS-1253',
            'WINDOWS1253' => 'WINDOWS-1253',
            'CP1254' => 'WINDOWS-1254',
            'WIN1254' => 'WINDOWS-1254',
            'WINDOWS1254' => 'WINDOWS-1254',
            'CP1255' => 'WINDOWS-1255',
            'WIN1255' => 'WINDOWS-1255',
            'WINDOWS1255' => 'WINDOWS-1255',
            'CP1256' => 'WINDOWS-1256',
            'WIN1256' => 'WINDOWS-1256',
            'WINDOWS1256' => 'WINDOWS-1256',
            'CP1257' => 'WINDOWS-1257',
            'WIN1257' => 'WINDOWS-1257',
            'WINDOWS1257' => 'WINDOWS-1257',
            'CP1258' => 'WINDOWS-1258',
            'WIN1258' => 'WINDOWS-1258',
            'WINDOWS1258' => 'WINDOWS-1258',
            'UTF16' => 'UTF-16',
            'UTF32' => 'UTF-32',
            'UTF8' => 'UTF-8',
            'UTF' => 'UTF-8',
            'UTF7' => 'UTF-7',
            '8BIT' => 'CP850',
            'BINARY' => 'CP850',
        ];

        if (!empty($equivalences[$encodingUpperHelper])) {
            $encoding = $equivalences[$encodingUpperHelper];
        }

        $STATIC_NORMALIZE_ENCODING_CACHE[$encodingOrig] = $encoding;

        return $encoding;
    }

    private function to_utf8($str, $decodeHtmlEntityToUtf8 = false)
The method to_utf8 has a boolean flag argument $decodeHtmlEntityToUtf8, which is a certain sign of a Single Responsibility Principle violation.
Complexity: This operation has 196032 execution paths which exceeds the configured maximum of 200.
Coding Style Naming: The method to_utf8 is not named in camelCase.
Comprehensibility Naming: The variable name $decodeHtmlEntityToUtf8 exceeds the maximum configured length of 20.
Coding Style: Method name "Utf8::to_utf8" is not in camel caps format
    {

        if (is_array($str) === true) {
            foreach ($str as $k => $v) {
                $str[$k] = $this->to_utf8($v, $decodeHtmlEntityToUtf8);
            }
            return $str;
        }


        $str = (string)$str;
        if ($str === '') {
            return $str;
        }

        $max = \strlen($str);
        $buf = '';

        for ($i = 0; $i < $max; ++$i) {
            $c1 = $str[$i];
0 ignored issues
show
Comprehensibility introduced by
Avoid variables with short names like $c1. Configured minimum length is 3.

Short variable names may make your code harder to understand. Variable names should be self-descriptive. This check looks for variable names who are shorter than a configured minimum.

Loading history...
456
457 6
            if ($c1 >= "\xC0") { // should be converted to UTF8, if it's not UTF8 already
458
459
                if ($c1 <= "\xDF") { // looks like 2 bytes UTF8
460
461
                    $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
0 ignored issues
show
Comprehensibility introduced by
Avoid variables with short names like $c2. Configured minimum length is 3.

Short variable names may make your code harder to understand. Variable names should be self-descriptive. This check looks for variable names who are shorter than a configured minimum.

Loading history...
462
463
                    if ($c2 >= "\x80" && $c2 <= "\xBF") { // yeah, almost sure it's UTF8 already
464
                        $buf .= $c1 . $c2;
465
                        ++$i;
466
                    } else { // not valid UTF8 - convert it
0 ignored issues
show
Coding Style introduced by
The method to_utf8 uses an else expression. Else is never necessary and you can simplify the code to work without else.
Loading history...
467
                        $buf .= $this->to_utf8_convert_helper($c1);
468
                    }
469
                } elseif ($c1 >= "\xE0" && $c1 <= "\xEF") { // looks like 3 bytes UTF8
470
471
                    $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
472
                    $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];
Comprehensibility
Avoid variables with short names like $c3. Configured minimum length is 3.
473
474
                    if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF") { // yeah, almost sure it's UTF8 already
475
                        $buf .= $c1 . $c2 . $c3;
476
                        $i += 2;
477
                    } else { // not valid UTF8 - convert it
Coding Style
The method to_utf8 uses an else expression. Else is never necessary and you can simplify the code to work without else.
478
                        $buf .= $this->to_utf8_convert_helper($c1);
479
                    }
480
                } elseif ($c1 >= "\xF0" && $c1 <= "\xF7") { // looks like 4 bytes UTF8
481
482
                    $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
483
                    $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];
484
                    $c4 = $i + 3 >= $max ? "\x00" : $str[$i + 3];
Comprehensibility
Avoid variables with short names like $c4. Configured minimum length is 3.
485
486
                    if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF" && $c4 >= "\x80" && $c4 <= "\xBF") { // yeah, almost sure it's UTF8 already
487
                        $buf .= $c1 . $c2 . $c3 . $c4;
488
                        $i += 3;
489
                    } else { // not valid UTF8 - convert it
Coding Style
The method to_utf8 uses an else expression. Else is never necessary and you can simplify the code to work without else.
490
                        $buf .= $this->to_utf8_convert_helper($c1);
491
                    }
492
                } else { // doesn't look like UTF8, but should be converted
Coding Style
The method to_utf8 uses an else expression. Else is never necessary and you can simplify the code to work without else.
493
494
                    $buf .= $this->to_utf8_convert_helper($c1);
495
                }
496 6
            } elseif (($c1 & "\xC0") === "\x80") { // needs conversion
497
498
                $buf .= $this->to_utf8_convert_helper($c1);
499
            } else { // it doesn't need conversion
Coding Style
The method to_utf8 uses an else expression. Else is never necessary and you can simplify the code to work without else.
500
501 6
                $buf .= $c1;
502
            }
503 6
        }
504
505
        // decode unicode escape sequences + unicode surrogate pairs
506 6
        $buf = preg_replace_callback(
507 6
            '/\\\\u([dD][89abAB][0-9a-fA-F]{2})\\\\u([dD][cdefCDEF][\da-fA-F]{2})|\\\\u([0-9a-fA-F]{4})/',
508
            /**
509
             * @param array $matches
510
             *
511
             * @return string
512
             */
513
            function (array $matches) {
514 1
                if (isset($matches[3])) {
515 1
                    $cp = (int)hexdec($matches[3]);
Comprehensibility
Avoid variables with short names like $cp. Configured minimum length is 3.
516 1
                } else {
Coding Style
The method to_utf8 uses an else expression. Else is never necessary and you can simplify the code to work without else.
517
                    // http://unicode.org/faq/utf_bom.html#utf16-4
518
                    $cp = ((int)hexdec($matches[1]) << 10)
519
                        + (int)hexdec($matches[2])
520
                        + 0x10000
521
                        - (0xD800 << 10)
522
                        - 0xDC00;
523
                }
524
525
                // https://github.com/php/php-src/blob/php-7.3.2/ext/standard/html.c#L471
526
                //
527
                // php_utf32_utf8(unsigned char *buf, unsigned k)
528
529 1
                if ($cp < 0x80) {
530 1
                    return (string)$this->chr($cp);
531
                }
532
533
                if ($cp < 0xA0) {
534
                    /** @noinspection UnnecessaryCastingInspection */
535
                    return (string)$this->chr(0xC0 | $cp >> 6) . (string)$this->chr(0x80 | $cp & 0x3F);
536
                }
537
538
                return $this->decimal_to_chr($cp);
539 6
            },
540
            $buf
541 6
        );
542
543 6
        if ($buf === null) {
544
            return '';
545
        }
546
547
        // decode UTF-8 codepoints
548 6
        if ($decodeHtmlEntityToUtf8 === true) {
549
            $buf = $this->html_entity_decode($buf);
550
        }
551
552 6
        return $buf;
553
    }
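The surrogate-pair branch of the callback above folds two UTF-16 escape values into one code point. The following is a minimal sketch of that arithmetic in isolation; `surrogatePairToCodePoint` is a hypothetical helper name and not part of the reviewed class, and the range check is an added assumption about what counts as valid input.

```php
<?php

/**
 * Combine a UTF-16 surrogate pair into a single Unicode code point.
 * Hypothetical helper for illustration only.
 */
function surrogatePairToCodePoint(int $high, int $low): int
{
    // Assumption: reject anything outside the surrogate ranges.
    if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
        throw new InvalidArgumentException('Not a valid surrogate pair.');
    }

    // Equivalent to ($high << 10) + $low + 0x10000 - (0xD800 << 10) - 0xDC00:
    // strip both surrogate offsets, then add the supplementary-plane base.
    return (($high - 0xD800) << 10) + ($low - 0xDC00) + 0x10000;
}

// "\uD83D\uDE00" is the escaped form of U+1F600.
echo strtoupper(dechex(surrogatePairToCodePoint(0xD83D, 0xDE00))), "\n"; // 1F600
```

Subtracting the offsets before shifting makes the ten payload bits of each half explicit, which may be easier to verify than the expanded form.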
554
555
    private function to_utf8_convert_helper($input)
Coding Style Naming
The method to_utf8_convert_helper is not named in camelCase.

This check marks method names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Complexity
This operation has 16 execution paths, which exceeds the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

Coding Style
Method name "Utf8::to_utf8_convert_helper" is not in camel caps format
556
    {
557
        // init
558
        $buf = '';
559
560
        if ($this->ORD === null) {
561
            $this->ORD = $this->getData('ord');
562
        }
563
564
        if ($this->CHR === null) {
565
            $this->CHR = $this->getData('chr');
566
        }
567
568
        if ($this->WIN1252_TO_UTF8 === null) {
569
            $this->WIN1252_TO_UTF8 = $this->getData('win1252_to_utf8');
570
        }
571
572
        $ordC1 = $this->ORD[$input];
573
        if (isset($this->WIN1252_TO_UTF8[$ordC1])) { // found in Windows-1252 special cases
574
            $buf .= $this->WIN1252_TO_UTF8[$ordC1];
575
        } else {
Coding Style
The method to_utf8_convert_helper uses an else expression. Else is never necessary and you can simplify the code to work without else.
576
            $cc1 = $this->CHR[$ordC1 >> 6] | "\xC0"; // integer shift avoids a float array key
577
            $cc2 = ((string)$input & "\x3F") | "\x80";
578
            $buf .= $cc1 . $cc2;
579
        }
580
581
        return $buf;
582
    }
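The helper above turns a single non-UTF-8 byte into a two-byte UTF-8 sequence, and the review flags its else branch. Here is a minimal sketch of the same idea with an early return instead of else. `latin1ByteToUtf8` and the `$win1252Map` parameter are hypothetical stand-ins for the method and its WIN1252_TO_UTF8 table, and the sketch uses the built-in `ord()`/`chr()` rather than the class's lookup tables.

```php
<?php

/**
 * Re-encode one ISO-8859-1 byte (0x80-0xFF) as a two-byte UTF-8 sequence.
 * $win1252Map is a hypothetical table of Windows-1252 special cases.
 */
function latin1ByteToUtf8(string $byte, array $win1252Map = []): string
{
    $ord = ord($byte);

    // Early return takes the place of the else branch.
    if (isset($win1252Map[$ord])) {
        return $win1252Map[$ord];
    }

    // Layout 110xxxxx 10xxxxxx: the top two bits go into the lead byte,
    // the low six bits into the continuation byte.
    return chr(0xC0 | ($ord >> 6)) . chr(0x80 | ($ord & 0x3F));
}

echo bin2hex(latin1ByteToUtf8("\xE9")), "\n";                            // c3a9
echo bin2hex(latin1ByteToUtf8("\x80", [0x80 => "\xE2\x82\xAC"])), "\n";  // e282ac
```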
583
584 1
    private function chr($code_point, $encoding = 'UTF-8')
Complexity
This operation has 7200 execution paths, which exceeds the configured maximum of 200.

Coding Style Naming
The parameter $code_point is not named in camelCase.

Coding Style Naming
The variable $code_point is not named in camelCase.

Coding Style Naming
The variable $CHAR_CACHE is not named in camelCase.
585
    {
586
        // init
587 1
        static $CHAR_CACHE = [];
588
589 1
        if ($encoding !== 'UTF-8' && $encoding !== 'CP850') {
590
            $encoding = $this->normalize_encoding($encoding, 'UTF-8');
591
        }
592
593 1
        if ($encoding !== 'UTF-8' && $encoding !== 'ISO-8859-1' && $encoding !== 'WINDOWS-1252' && $this->SUPPORT['mbstring'] === false) {
594
            trigger_error('UTF8::chr() without mbstring cannot handle "' . $encoding . '" encoding', \E_USER_WARNING);
595
        }
596
597 1
        $cacheKey = $code_point . $encoding;
598 1
        if (isset($CHAR_CACHE[$cacheKey]) === true) {
599
            return $CHAR_CACHE[$cacheKey];
600
        }
601
602 1
        if ($code_point <= 127) { // use "simple"-char only until "\x80"
603
604 1
            if ($this->CHR === null) {
605 1
                $this->CHR = (array)$this->getData('chr');
606 1
            }
607
608
            /**
609
             * @psalm-suppress PossiblyNullArrayAccess
610
             */
611 1
            $chr = $this->CHR[$code_point];
612
613 1
            if ($encoding !== 'UTF-8') {
614
                $chr = $this->encode($encoding, $chr);
615
            }
616
617 1
            return $CHAR_CACHE[$cacheKey] = $chr;
618
        }
619
620
        //
621
        // fallback via "IntlChar"
622
        //
623
624
        if ($this->SUPPORT['intlChar'] === true) {
625
            /** @noinspection PhpComposerExtensionStubsInspection */
626
            $chr = \IntlChar::chr($code_point); // leading backslash: resolve the global class from inside the namespace
627
628
            if ($encoding !== 'UTF-8') {
629
                $chr = $this->encode($encoding, $chr);
630
            }
631
632
            return $CHAR_CACHE[$cacheKey] = $chr;
633
        }
634
635
        //
636
        // fallback via vanilla php
637
        //
638
639
        if ($this->CHR === null) {
640
            $this->CHR = (array)$this->getData('chr');
641
        }
642
643
        $code_point = (int)$code_point;
644
        if ($code_point <= 0x7F) {
645
            /**
646
             * @psalm-suppress PossiblyNullArrayAccess
647
             */
648
            $chr = $this->CHR[$code_point];
649
        } elseif ($code_point <= 0x7FF) {
650
            /**
651
             * @psalm-suppress PossiblyNullArrayAccess
652
             */
653
            $chr = $this->CHR[($code_point >> 6) + 0xC0] .
654
                $this->CHR[($code_point & 0x3F) + 0x80];
655
        } elseif ($code_point <= 0xFFFF) {
656
            /**
657
             * @psalm-suppress PossiblyNullArrayAccess
658
             */
659
            $chr = $this->CHR[($code_point >> 12) + 0xE0] .
660
                $this->CHR[(($code_point >> 6) & 0x3F) + 0x80] .
661
                $this->CHR[($code_point & 0x3F) + 0x80];
662
        } else {
Coding Style
The method chr uses an else expression. Else is never necessary and you can simplify the code to work without else.
663
            /**
664
             * @psalm-suppress PossiblyNullArrayAccess
665
             */
666
            $chr = $this->CHR[($code_point >> 18) + 0xF0] .
667
                $this->CHR[(($code_point >> 12) & 0x3F) + 0x80] .
668
                $this->CHR[(($code_point >> 6) & 0x3F) + 0x80] .
669
                $this->CHR[($code_point & 0x3F) + 0x80];
670
        }
671
672
        if ($encoding !== 'UTF-8') {
673
            $chr = $this->encode($encoding, $chr);
674
        }
675
676
        return $CHAR_CACHE[$cacheKey] = $chr;
677
    }
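The "vanilla php" fallback in chr() builds the UTF-8 bytes by hand. The following is a minimal standalone sketch of that encoding step using guard clauses instead of an if/elseif/else chain. `codePointToUtf8` is a hypothetical name, and rejecting surrogates and out-of-range values is an added assumption that the reviewed method does not make.

```php
<?php

/**
 * Encode a Unicode code point as a UTF-8 byte string.
 * Hypothetical helper for illustration only.
 */
function codePointToUtf8(int $cp): string
{
    // Assumption: only Unicode scalar values are accepted.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('Not a Unicode scalar value.');
    }

    if ($cp <= 0x7F) {
        return chr($cp); // 0xxxxxxx
    }

    if ($cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }

    if ($cp <= 0xFFFF) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }

    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codePointToUtf8(0x20AC)), "\n";  // e282ac
echo bin2hex(codePointToUtf8(0x1F600)), "\n"; // f09f9880
```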
678
679
    private function encode($toEncoding, $str, $autodetectFromEncoding = true, $fromEncoding = '')
The method encode has a boolean flag argument $autodetectFromEncoding, which is a certain sign of a Single Responsibility Principle violation.

Complexity
This operation has 179159040 execution paths, which exceeds the configured maximum of 200.

Comprehensibility Naming
The variable name $autodetectFromEncoding exceeds the maximum configured length of 20.

Very long variable names usually make code harder to read. It is therefore recommended not to make variable names too verbose.
680
    {
681
        if ($str === '' || $toEncoding === '') {
682
            return $str;
683
        }
684
685
        if ($toEncoding !== 'UTF-8' && $toEncoding !== 'CP850') {
686
            $toEncoding = $this->normalize_encoding($toEncoding, 'UTF-8');
687
        }
688
689
        if ($fromEncoding && $fromEncoding !== 'UTF-8' && $fromEncoding !== 'CP850') {
690
            $fromEncoding = $this->normalize_encoding($fromEncoding, null);
691
        }
692
693
        if ($toEncoding && $fromEncoding && $fromEncoding === $toEncoding) {
694
            return $str;
695
        }
696
697
        if ($toEncoding === 'JSON') {
698
            $return = $this->json_encode($str);
699
            if ($return === false) {
700
                throw new \InvalidArgumentException('The input string [' . $str . '] can not be used for json_encode().');
701
            }
702
703
            return $return;
704
        }
705
        if ($fromEncoding === 'JSON') {
706
            $str = $this->json_decode($str);
Bug
The method json_decode() does not exist on devtoolboxuk\soteria\voku\Resources\Utf8. Did you maybe mean json_encode()?

This check marks calls to methods that do not seem to exist on an object.

This is most likely the result of a method being renamed without all references to it being renamed likewise.
707
            $fromEncoding = '';
708
        }
709
710
        if ($toEncoding === 'BASE64') {
711
            return base64_encode($str);
712
        }
713
        if ($fromEncoding === 'BASE64') {
714
            $str = base64_decode($str, true);
715
            $fromEncoding = '';
716
        }
717
718
        if ($toEncoding === 'HTML-ENTITIES') {
719
            return $this->html_encode($str, true, 'UTF-8');
720
        }
721
        if ($fromEncoding === 'HTML-ENTITIES') {
722
            $str = $this->html_decode($str, \ENT_COMPAT, 'UTF-8');
Bug
The method html_decode() does not exist on devtoolboxuk\soteria\voku\Resources\Utf8. Did you maybe mean html_encode()?
723
            $fromEncoding = '';
724
        }
725
726
        $fromEncodingDetected = false;
727
        if ($autodetectFromEncoding === true || !$fromEncoding) {
728
            $fromEncodingDetected = $this->str_detect_encoding($str);
729
        }
730
731
        // DEBUG
732
        //var_dump($toEncoding, $fromEncoding, $fromEncodingDetected, $str, "\n\n");
733
734
        if ($fromEncodingDetected !== false) {
735
            $fromEncoding = $fromEncodingDetected;
736
        } elseif ($autodetectFromEncoding === true) {
737
            // fallback for the "autodetect"-mode
738
            return $this->to_utf8($str);
739
        }
740
741
        if (!$fromEncoding || $fromEncoding === $toEncoding) {
742
            return $str;
743
        }
744
745
        if ($toEncoding === 'UTF-8' && ($fromEncoding === 'WINDOWS-1252' || $fromEncoding === 'ISO-8859-1')) {
746
            return $this->to_utf8($str);
747
        }
748
749
        if ($toEncoding === 'ISO-8859-1' && ($fromEncoding === 'WINDOWS-1252' || $fromEncoding === 'UTF-8')) {
750
            return $this->to_iso8859($str);
751
        }
752
753
        if ($toEncoding !== 'UTF-8' && $toEncoding !== 'ISO-8859-1' && $toEncoding !== 'WINDOWS-1252' && $this->SUPPORT['mbstring'] === false) {
754
            trigger_error('UTF8::encode() without mbstring cannot handle "' . $toEncoding . '" encoding', E_USER_WARNING);
755
        }
756
757
        if ($this->SUPPORT['mbstring'] === true) {
758
            // warning: do not use the symfony polyfill here
759
            $strEncoded = mb_convert_encoding(
760
                $str,
761
                $toEncoding,
762
                $fromEncoding
763
            );
764
765
            if ($strEncoded) {
766
                return $strEncoded;
767
            }
768
        }
769
770
        $return = \iconv($fromEncoding, $toEncoding, $str);
771
        if ($return !== false) {
772
            return $return;
773
        }
774
775
        return $str;
776
    }
777
778
    private function json_encode($value, $options = 0, $depth = 512)
Coding Style Naming
The method json_encode is not named in camelCase.

Coding Style
Method name "Utf8::json_encode" is not in camel caps format
779
    {
780
        $value = $this->filter($value);
781
782
        if ($this->SUPPORT['json'] === false) {
783
            throw new \RuntimeException('ext-json: is not installed');
784
        }
785
786
        /** @noinspection PhpComposerExtensionStubsInspection */
787
        return \json_encode($value, $options, $depth);
788
    }
789
790
    private function filter($var, $normalization_form = \Normalizer::NFC, $leading_combining = '◌')
Coding Style Naming
The parameter $normalization_form is not named in camelCase.

Coding Style Naming
The parameter $leading_combining is not named in camelCase.

Coding Style Naming
The variable $normalization_form is not named in camelCase.

Coding Style Naming
The variable $leading_combining is not named in camelCase.

Complexity
This operation has 30 execution paths, which exceeds the configured maximum of 10.
791
    {
792
        switch (\gettype($var)) {
793
            case 'array':
794
                foreach ($var as $k => $v) {
795
                    $var[$k] = $this->filter($v, $normalization_form, $leading_combining);
796
                }
797
                unset($v);
798
799
                break;
800
            case 'object':
801
                foreach ($var as $k => $v) {
802
                    $str[$k] = $this->filter($v, $normalization_form, $leading_combining);
Coding Style Comprehensibility
$str was never initialized. Although not strictly required by PHP, it is generally good practice to add $str = array(); beforehand regardless.

Adding an explicit array definition is generally preferable to an implicit one, as it guarantees a stable state of the code.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is initialized the first time the foreach loop is entered. You can also see that the value of the bar key is only written conditionally; thus, its value might result from a previous iteration.

This might or might not be intended. To make your intention clear, keep your code readable, and avoid accidental bugs, we recommend adding an explicit initialization $myArray = array() either outside or inside the foreach loop.
803
                }
804
                unset($v);
805
806
                break;
807
            case 'string':
Coding Style
The case body in a switch statement must start on the line following the statement.

According to PSR-2, the body of a case statement must start on the line immediately following the case statement.

switch ($expr) {
case "A":
    doSomething(); //right
    break;
case "B":

    doSomethingElse(); //wrong
    break;

}

To learn more about the PSR-2 coding standard, please refer to the PHP-FIG.
808
809
                if (strpos($var, "\r") !== false) {
810
                    // Workaround https://bugs.php.net/65732
811
                    $var = $this->normalize_line_ending($var);
812
                }
813
814
                if ($this->is_ascii($var) === false) {
815
                    if (\Normalizer::isNormalized($var, $normalization_form)) {
816
                        $n = '-';
Comprehensibility
Avoid variables with short names like $n. Configured minimum length is 3.
817
                    } else {
Coding Style
The method filter uses an else expression. Else is never necessary and you can simplify the code to work without else.
818
                        $n = \Normalizer::normalize($var, $normalization_form);
819
820
                        if (isset($n[0])) {
821
                            $var = $n;
822
                        } else {
Coding Style
The method filter uses an else expression. Else is never necessary and you can simplify the code to work without else.
823
                            $var = $this->encode('UTF-8', $var, true);
824
                        }
825
                    }
826
827
                    if (
828
                        $var[0] >= "\x80"
829
                        &&
830
                        isset($n[0], $leading_combining[0])
831
                        &&
832
                        preg_match('/^\p{Mn}/u', $var)
833
                    ) {
834
                        // Prevent leading combining chars
835
                        // for NFC-safe concatenations.
836
                        $var = $leading_combining . $var;
837
                    }
838
                }
839
840
                break;
841
        }
842
843
        return $var;
844
    }
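In filter(), the object branch assigns into $str, which the review flags as never initialized, so the filtered values do not end up back in $var. Assuming the intent was to update the object's own public properties, a recursive walk could be sketched as follows. `mapStrings` and its `$fn` callback are hypothetical, and the callback merely stands in for the normalization that filter() performs.

```php
<?php

/**
 * Apply $fn to every string found in nested arrays and objects.
 * Hypothetical helper for illustration only.
 */
function mapStrings($var, callable $fn)
{
    if (is_string($var)) {
        return $fn($var);
    }

    if (is_array($var)) {
        foreach ($var as $key => $value) {
            $var[$key] = mapStrings($value, $fn);
        }

        return $var;
    }

    if (is_object($var)) {
        // Write results back to the object itself, not to an unrelated array.
        foreach (get_object_vars($var) as $key => $value) {
            $var->{$key} = mapStrings($value, $fn);
        }

        return $var;
    }

    // Integers, floats, booleans and null pass through unchanged.
    return $var;
}

$data = new stdClass();
$data->name = "a\r\nb";
$data->tags = ["x\ry", 42];

$clean = mapStrings($data, function (string $s): string {
    return str_replace(["\r\n", "\r"], "\n", $s);
});

var_dump($clean->name === "a\nb"); // bool(true)
```

Guard clauses per type also avoid the switch on gettype(), though whether that fits the class's style is a judgment call.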
845
846
    private function normalize_line_ending($str)
Coding Style Naming
The method normalize_line_ending is not named in camelCase.

Coding Style
Method name "Utf8::normalize_line_ending" is not in camel caps format
847
    {
848
        return \str_replace(["\r\n", "\r"], "\n", $str);
849
    }
850
851
    private function is_ascii($str)
Coding Style Naming
The method is_ascii is not named in camelCase.

Coding Style
Method name "Utf8::is_ascii" is not in camel caps format
852
    {
853
        if ($str === '') {
854
            return true;
855
        }
856
857
        return !preg_match('/[^\x09\x10\x13\x0A\x0D\x20-\x7E]/', $str);
858
    }
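The character class in is_ascii() lists \x10 and \x13, which in hexadecimal are the DLE and DC3 control characters rather than line feed (decimal 10) and carriage return (decimal 13). Whether that is intentional is not clear from the source. The sketch below assumes only TAB, LF, CR and the printable range were meant to pass; `isPrintableAscii` is a hypothetical name.

```php
<?php

/**
 * Return true if $str contains only TAB, LF, CR and printable ASCII.
 * Hypothetical helper; the allowed set is an assumption about the intent.
 */
function isPrintableAscii(string $str): bool
{
    if ($str === '') {
        return true;
    }

    // preg_match() returns 0 when no disallowed byte is found.
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $str) === 0;
}

var_dump(isPrintableAscii("Hello\tWorld\n")); // bool(true)
var_dump(isPrintableAscii("caf\xC3\xA9"));    // bool(false)
var_dump(isPrintableAscii("\x10"));           // bool(false)
```

Comparing against 0 rather than negating the result means a PCRE error (a return value of false) is reported as "not ASCII" instead of "ASCII".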
859
860
    private function html_encode($str, $keepAsciiChars = false, $encoding = 'UTF-8')
The method html_encode has a boolean flag argument $keepAsciiChars, which is a certain sign of a Single Responsibility Principle violation.

Coding Style Naming
The method html_encode is not named in camelCase.

Complexity
This operation has 30 execution paths, which exceeds the configured maximum of 10.

Coding Style
Method name "Utf8::html_encode" is not in camel caps format
861
    {
862
        if ($str === '') {
863
            return '';
864
        }
865
866
        if ($encoding !== 'UTF-8' && $encoding !== 'CP850') {
867
            $encoding = $this->normalize_encoding($encoding, 'UTF-8');
868
        }
869
870
        // INFO: http://stackoverflow.com/questions/35854535/better-explanation-of-convmap-in-mb-encode-numericentity
871
        if ($this->SUPPORT['mbstring'] === true) {
872
            $startCode = 0x00;
873
            if ($keepAsciiChars === true) {
874
                $startCode = 0x80;
875
            }
876
877
            if ($encoding === 'UTF-8') {
878
                return \mb_encode_numericentity(
879
                    $str,
880
                    [$startCode, 0xfffff, 0, 0xfffff] // convmap entries come in groups of four
881
                );
882
            }
883
884
            return \mb_encode_numericentity(
885
                $str,
886
                [$startCode, 0xfffff, 0, 0xfffff], // convmap entries come in groups of four
887
                $encoding
888
            );
889
        }
890
891
        return \implode(
892
            '',
893
            \array_map(
894
                function (string $chr) use ($keepAsciiChars, $encoding) {
895
                    return $this->single_chr_html_encode($chr, $keepAsciiChars, $encoding);
896
                },
897
                $this->str_split($str)
898
            )
899
        );
900
    }
901
902
    private function single_chr_html_encode($char, $keepAsciiChars = false, $encoding = 'UTF-8')
The method single_chr_html_encode has a boolean flag argument $keepAsciiChars, which is a certain sign of a Single Responsibility Principle violation.

Coding Style Naming
The method single_chr_html_encode is not named in camelCase.

Coding Style
Method name "Utf8::single_chr_html_encode" is not in camel caps format
903
    {
904
        if ($char === '') {
905
            return '';
906
        }
907
908
        if (
909
            $keepAsciiChars === true
910
            &&
911
            $this->is_ascii($char) === true
912
        ) {
913
            return $char;
914
        }
915
916
        return '&#' . $this->ord($char, $encoding) . ';';
917
    }
918
919
    private function ord($chr, $encoding = 'UTF-8')
Complexity
This operation has 19440 execution paths, which exceeds the configured maximum of 200.

Coding Style Naming
The variable $CHAR_CACHE is not named in camelCase.
920
    {
        static $CHAR_CACHE = [];

        // init
        $chr = (string)$chr;

        if ($encoding !== 'UTF-8' && $encoding !== 'CP850') {
            $encoding = $this->normalize_encoding($encoding, 'UTF-8');
        }

        $cacheKey = $chr . $encoding;
        if (isset($CHAR_CACHE[$cacheKey]) === true) {
            return $CHAR_CACHE[$cacheKey];
        }

        // check again, if it's still not UTF-8
        if ($encoding !== 'UTF-8') {
            $chr = $this->encode($encoding, $chr);
        }

        if ($this->ORD === null) {
            $this->ORD = $this->getData('ord');
        }

        if (isset($this->ORD[$chr])) {
            return $CHAR_CACHE[$cacheKey] = $this->ORD[$chr];
        }

        //
        // fallback via "IntlChar"
        //

        if ($this->SUPPORT['intlChar'] === true) {
            /** @noinspection PhpComposerExtensionStubsInspection */
            $code = \IntlChar::ord($chr);
            if ($code) {
                return $CHAR_CACHE[$cacheKey] = $code;
            }
        }

        //
        // fallback via vanilla php
        //

        /** @noinspection CallableParameterUseCaseInTypeContextInspection */
        $chr = \unpack('C*', (string)\substr($chr, 0, 4));
        $code = $chr ? $chr[1] : 0;

        if ($code >= 0xF0 && isset($chr[4])) {
            /** @noinspection UnnecessaryCastingInspection */
            return $CHAR_CACHE[$cacheKey] = (int)((($code - 0xF0) << 18) + (($chr[2] - 0x80) << 12) + (($chr[3] - 0x80) << 6) + $chr[4] - 0x80);
        }

        if ($code >= 0xE0 && isset($chr[3])) {
            /** @noinspection UnnecessaryCastingInspection */
            return $CHAR_CACHE[$cacheKey] = (int)((($code - 0xE0) << 12) + (($chr[2] - 0x80) << 6) + $chr[3] - 0x80);
        }

        if ($code >= 0xC0 && isset($chr[2])) {
            /** @noinspection UnnecessaryCastingInspection */
            return $CHAR_CACHE[$cacheKey] = (int)((($code - 0xC0) << 6) + $chr[2] - 0x80);
        }

        return $CHAR_CACHE[$cacheKey] = $code;
    }
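
The byte-level fallback at the end of ord() is easier to follow in isolation. The following is a minimal sketch, assuming a standalone helper named utf8_code_point (a hypothetical name, not part of this class). It applies the same lead-byte arithmetic without the static cache, the lookup table, or IntlChar.

```php
<?php

/**
 * Decode the first UTF-8 character of $chr into its code point.
 * Hypothetical helper for illustration; it mirrors the arithmetic of the
 * byte-level fallback and, like it, does not validate continuation bytes.
 */
function utf8_code_point(string $chr): int
{
    // unpack('C*', ...) yields a 1-indexed array of unsigned byte values.
    $bytes = \unpack('C*', \substr($chr, 0, 4));
    if (!$bytes) {
        return 0; // empty input
    }

    $lead = $bytes[1];

    // 11110xxx: four-byte sequence, 3 + 6 + 6 + 6 payload bits
    if ($lead >= 0xF0 && isset($bytes[4])) {
        return (($lead - 0xF0) << 18)
            + (($bytes[2] - 0x80) << 12)
            + (($bytes[3] - 0x80) << 6)
            + ($bytes[4] - 0x80);
    }

    // 1110xxxx: three-byte sequence, 4 + 6 + 6 payload bits
    if ($lead >= 0xE0 && isset($bytes[3])) {
        return (($lead - 0xE0) << 12)
            + (($bytes[2] - 0x80) << 6)
            + ($bytes[3] - 0x80);
    }

    // 110xxxxx: two-byte sequence, 5 + 6 payload bits
    if ($lead >= 0xC0 && isset($bytes[2])) {
        return (($lead - 0xC0) << 6) + ($bytes[2] - 0x80);
    }

    // single byte (ASCII), or a lead byte without enough trailing bytes
    return $lead;
}
```

Extracting the arithmetic this way would also be one route to the smaller methods that the complexity finding asks for, since the caching and encoding normalisation could then live in the caller.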

    private function str_split($str, $length = 1, $cleanUtf8 = false, $tryToUseMbFunction = true)
The method str_split has a boolean flag argument $cleanUtf8, which is a certain sign of a Single Responsibility Principle violation.
The method str_split has a boolean flag argument $tryToUseMbFunction, which is a certain sign of a Single Responsibility Principle violation.
Complexity
This operation has 4032 execution paths which exceeds the configured maximum of 200.
Coding Style Naming
The method str_split is not named in camelCase.
Coding Style
Method name "Utf8::str_split" is not in camel caps format
    {
        if ($length <= 0) {
            return [];
        }

        if (is_array($str) === true) {
            foreach ($str as $k => $v) {
                $str[$k] = $this->str_split($v, $length, $cleanUtf8, $tryToUseMbFunction);
            }

            return $str;
        }

        // init
        $str = (string)$str;

        if ($str === '') {
            return [];
        }

        if ($cleanUtf8 === true) {
            $str = $this->clean($str);
        }

        if ($tryToUseMbFunction === true && $this->SUPPORT['mbstring'] === true) {
            $iMax = \mb_strlen($str);
            if ($iMax <= 127) {
                $ret = [];
                for ($i = 0; $i < $iMax; ++$i) {
                    $ret[] = \mb_substr($str, $i, 1);
                }
            } else {
Coding Style
The method str_split uses an else expression. Else is never necessary and you can simplify the code to work without else.
                $retArray = [];
                preg_match_all('/./us', $str, $retArray);
                $ret = isset($retArray[0]) ? $retArray[0] : [];
            }
        } elseif ($this->SUPPORT['pcre_utf8'] === true) {
            $retArray = [];
            preg_match_all('/./us', $str, $retArray);
            $ret = isset($retArray[0]) ? $retArray[0] : [];
        } else {
Coding Style
The method str_split uses an else expression. Else is never necessary and you can simplify the code to work without else.
Blank line found at start of control structure

            // fallback

            $ret = [];
            $len = \strlen($str);

            /** @noinspection ForeachInvariantsInspection */
            for ($i = 0; $i < $len; ++$i) {
                if (($str[$i] & "\x80") === "\x00") {
                    $ret[] = $str[$i];
                } elseif (
                    isset($str[$i + 1])
                    &&
                    ($str[$i] & "\xE0") === "\xC0"
                ) {
                    if (($str[$i + 1] & "\xC0") === "\x80") {
                        $ret[] = $str[$i] . $str[$i + 1];

                        ++$i;
                    }
                } elseif (
                    isset($str[$i + 2])
                    &&
                    ($str[$i] & "\xF0") === "\xE0"
                ) {
                    if (
                        ($str[$i + 1] & "\xC0") === "\x80"
                        &&
                        ($str[$i + 2] & "\xC0") === "\x80"
                    ) {
                        $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2];

                        $i += 2;
                    }
                } elseif (
                    isset($str[$i + 3])
                    &&
                    ($str[$i] & "\xF8") === "\xF0"
                ) {
                    if (
                        ($str[$i + 1] & "\xC0") === "\x80"
                        &&
                        ($str[$i + 2] & "\xC0") === "\x80"
                        &&
                        ($str[$i + 3] & "\xC0") === "\x80"
                    ) {
                        $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2] . $str[$i + 3];

                        $i += 3;
                    }
                }
            }
        }

        if ($length > 1) {
            $ret = \array_chunk($ret, $length);

            return \array_map(
                // by-value parameter: the callback only reads $item, and
                // array_map() passes values, not references
                static function ($item) {
                    return \implode('', $item);
                },
                $ret
            );
        }

        if (isset($ret[0]) && $ret[0] === '') {
            return [];
        }

        return $ret;
    }
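
Most of the path count reported for str_split comes from the byte-level fallback branch. The following sketch shows one way to split it into smaller pieces, assuming two standalone helpers named utf8_sequence_length and utf8_split_bytes (both hypothetical names, not part of this class). It is meant to behave like the fallback loop: valid sequences are kept, stray or truncated bytes are skipped.

```php
<?php

/**
 * Expected byte length of a UTF-8 sequence, judged by its lead byte.
 * Returns 0 for a continuation byte or an invalid lead byte.
 * Hypothetical helper for illustration.
 */
function utf8_sequence_length(string $leadByte): int
{
    $byte = \ord($leadByte); // PHP's built-in ord(): value of the first byte

    if (($byte & 0x80) === 0x00) {
        return 1; // 0xxxxxxx
    }
    if (($byte & 0xE0) === 0xC0) {
        return 2; // 110xxxxx
    }
    if (($byte & 0xF0) === 0xE0) {
        return 3; // 1110xxxx
    }
    if (($byte & 0xF8) === 0xF0) {
        return 4; // 11110xxx
    }

    return 0;
}

/**
 * Split a byte string into UTF-8 characters without mbstring or PCRE.
 * Hypothetical helper for illustration.
 */
function utf8_split_bytes(string $str): array
{
    $ret = [];
    $len = \strlen($str);

    for ($i = 0; $i < $len; ++$i) {
        $size = utf8_sequence_length($str[$i]);
        if ($size === 0 || $i + $size > $len) {
            continue; // stray or truncated sequence: skip this byte
        }

        // every trailing byte must look like 10xxxxxx
        $valid = true;
        for ($j = 1; $j < $size; ++$j) {
            if ((\ord($str[$i + $j]) & 0xC0) !== 0x80) {
                $valid = false;
                break;
            }
        }

        if ($valid) {
            $ret[] = \substr($str, $i, $size);
            $i += $size - 1; // the loop's ++$i moves past the last byte
        }
    }

    return $ret;
}
```

With the length lookup in its own function, the loop body has a single validation path instead of one nested condition per sequence length.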

    private function clean($str, $remove_bom = false, $normalize_whitespace = false, $normalize_msword = false, $keep_non_breaking_space = false, $replace_diamond_question_mark = false, $remove_invisible_characters = true)
The method clean has a boolean flag argument $remove_bom, which is a certain sign of a Single Responsibility Principle violation.
The method clean has a boolean flag argument $normalize_whitespace, which is a certain sign of a Single Responsibility Principle violation.
The method clean has a boolean flag argument $normalize_msword, which is a certain sign of a Single Responsibility Principle violation.
The method clean has a boolean flag argument $keep_non_breaking_space, which is a certain sign of a Single Responsibility Principle violation.
The method clean has a boolean flag argument $replace_diamond_question_mark, which is a certain sign of a Single Responsibility Principle violation.
The method clean has a boolean flag argument $remove_invisible_characters, which is a certain sign of a Single Responsibility Principle violation.
Coding Style Naming
The parameter $remove_bom is not named in camelCase.
The parameter $normalize_whitespace is not named in camelCase.
The parameter $normalize_msword is not named in camelCase.
The parameter $keep_non_breaking_space is not named in camelCase.
The parameter $replace_diamond_question_mark is not named in camelCase.
The parameter $remove_invisible_characters is not named in camelCase.
The variable $replace_diamond_question_mark is not named in camelCase.
The variable $remove_invisible_characters is not named in camelCase.
The variable $normalize_whitespace is not named in camelCase.
The variable $keep_non_breaking_space is not named in camelCase.
The variable $normalize_msword is not named in camelCase.
The variable $remove_bom is not named in camelCase.
Comprehensibility Naming
The variable name $keep_non_breaking_space exceeds the maximum configured length of 20.
The variable name $replace_diamond_question_mark exceeds the maximum configured length of 20.
The variable name $remove_invisible_characters exceeds the maximum configured length of 20.

Very long variable names usually make code harder to read. It is therefore recommended not to make variable names too verbose.

Complexity
This operation has 32 execution paths which exceeds the configured maximum of 10.
    {
        // http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
        // caused connection reset problem on larger strings

        $regx = '/
          (
            (?: [\x00-\x7F]               # single-byte sequences   0xxxxxxx
            |   [\xC0-\xDF][\x80-\xBF]    # double-byte sequences   110xxxxx 10xxxxxx
            |   [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences   1110xxxx 10xxxxxx * 2
            |   [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3
            ){1,100}                      # ...one or more times
          )
        | ( [\x80-\xBF] )                 # invalid byte in range 10000000 - 10111111
        | ( [\xC0-\xFF] )                 # invalid byte in range 11000000 - 11111111
        /x';
        $str = (string)preg_replace($regx, '$1', $str);

        if ($replace_diamond_question_mark === true) {
            $str = $this->replace_diamond_question_mark($str, '');
        }

        if ($remove_invisible_characters === true) {
            $str = $this->remove_invisible_characters($str);
        }

        if ($normalize_whitespace === true) {
            $str = $this->normalize_whitespace($str, $keep_non_breaking_space);
        }

        if ($normalize_msword === true) {
            $str = $this->normalize_msword($str);
        }

        if ($remove_bom === true) {
            $str = $this->remove_bom($str);
        }

        return $str;
    }

    public function replace_diamond_question_mark($str, $replacementChar = '', $processInvalidUtf8 = true)
The method replace_diamond_question_mark has a boolean flag argument $processInvalidUtf8, which is a certain sign of a Single Responsibility Principle violation.
Coding Style Naming
The method replace_diamond_question_mark is not named in camelCase.
Complexity
This operation has 10 execution paths which exceeds the configured maximum of 10.
Coding Style
Method name "Utf8::replace_diamond_question_mark" is not in camel caps format
    {
        if ($str === '') {
            return '';
        }

        if ($processInvalidUtf8 === true) {
            $replacementCharHelper = $replacementChar;
Comprehensibility Naming
The variable name $replacementCharHelper exceeds the maximum configured length of 20.
            if ($replacementChar === '') {
                $replacementCharHelper = 'none';
            }

            if ($this->SUPPORT['mbstring'] === false) {
                // if there is no native support for "mbstring",
                // then we need to clean the string before ...
                $str = $this->clean($str);
            }

            $save = \mb_substitute_character();
            \mb_substitute_character($replacementCharHelper);
            // the polyfill may return false, so cast to string
            $str = (string)\mb_convert_encoding($str, 'UTF-8', 'UTF-8');
            \mb_substitute_character($save);
        }

        return \str_replace(
            [
                "\xEF\xBF\xBD",
                '�',
            ],
            [
                $replacementChar,
                $replacementChar,
            ],
            $str
        );
    }
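
The save / set / convert / restore sequence in replace_diamond_question_mark is the core of the method. The following is a minimal sketch, assuming the mbstring extension is loaded and a standalone helper named strip_invalid_utf8 (a hypothetical name, not part of this class). It drops invalid byte sequences by converting UTF-8 to UTF-8 with the substitute character set to 'none'.

```php
<?php

/**
 * Remove invalid UTF-8 byte sequences from $str.
 * Hypothetical helper for illustration; requires ext-mbstring.
 */
function strip_invalid_utf8(string $str): string
{
    // Remember the current setting so the global state can be restored.
    $previous = \mb_substitute_character();

    // 'none' tells mbstring to drop invalid sequences instead of
    // replacing them with a substitute character.
    \mb_substitute_character('none');

    try {
        return (string) \mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    } finally {
        // Restore the setting even if the conversion throws.
        \mb_substitute_character($previous);
    }
}
```

The try/finally block is a design choice of this sketch rather than something the method above does. It keeps the process-wide substitute character from staying changed if the conversion fails.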

    public function remove_invisible_characters($str, $url_encoded = true, $replacement = '')
The method remove_invisible_characters has a boolean flag argument $url_encoded, which is a certain sign of a Single Responsibility Principle violation.
Coding Style Naming
The method remove_invisible_characters is not named in camelCase.
The parameter $url_encoded is not named in camelCase.
The variable $non_displayables is not named in camelCase.
The variable $url_encoded is not named in camelCase.
Coding Style
Method name "Utf8::remove_invisible_characters" is not in camel caps format
    {
        // init
        $non_displayables = [];

        // every control character except newline (dec 10),
        // carriage return (dec 13) and horizontal tab (dec 09)
        if ($url_encoded) {
            $non_displayables[] = '/%0[0-8bcefBCEF]/'; // url encoded 00-08, 11, 12, 14, 15
            $non_displayables[] = '/%1[0-9a-fA-F]/'; // url encoded 16-31
        }

        $non_displayables[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S'; // 00-08, 11, 12, 14-31, 127

        do {
            $str = (string)preg_replace($non_displayables, $replacement, $str, -1, $count);
        } while ($count !== 0);

        return $str;
    }
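
The do/while loop in remove_invisible_characters matters because removing one match can bring a new one into existence. The following is a minimal sketch, assuming a standalone helper named remove_encoded_controls (a hypothetical name) that uses the same three patterns with an empty replacement.

```php
<?php

/**
 * Remove raw and URL-encoded control characters, repeating until the
 * string no longer changes. Hypothetical helper for illustration.
 */
function remove_encoded_controls(string $str): string
{
    $patterns = [
        '/%0[0-8bcefBCEF]/',                    // url encoded 00-08, 11, 12, 14, 15
        '/%1[0-9a-fA-F]/',                      // url encoded 16-31
        '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S', // raw 00-08, 11, 12, 14-31, 127
    ];

    do {
        // $count receives the number of replacements made in this pass.
        $str = (string) \preg_replace($patterns, '', $str, -1, $count);
    } while ($count !== 0);

    return $str;
}
```

For the input `%0%000`, a single pass removes the inner `%00` and leaves `%0` followed by `0`, which is again `%00`. The second pass removes that as well, so the loop returns an empty string where a single call to preg_replace would not.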

    public function normalize_whitespace($str, $keepNonBreakingSpace = false, $keepBidiUnicodeControls = false)
The method normalize_whitespace has a boolean flag argument $keepNonBreakingSpace, which is a certain sign of a Single Responsibility Principle violation.
The method normalize_whitespace has a boolean flag argument $keepBidiUnicodeControls, which is a certain sign of a Single Responsibility Principle violation.
Coding Style Naming
The method normalize_whitespace is not named in camelCase.
The variable $WHITESPACE_CACHE is not named in camelCase.
The variable $BIDI_UNICODE_CONTROLS_CACHE is not named in camelCase.
Comprehensibility Naming
The variable name $keepBidiUnicodeControls exceeds the maximum configured length of 20.
Complexity
This operation has 18 execution paths which exceeds the configured maximum of 10.
Coding Style
Method name "Utf8::normalize_whitespace" is not in camel caps format
    {
        if ($str === '') {
            return '';
        }

        static $WHITESPACE_CACHE = [];
        $cacheKey = (int)$keepNonBreakingSpace;

        if (!isset($WHITESPACE_CACHE[$cacheKey])) {
            $WHITESPACE_CACHE[$cacheKey] = $this->WHITESPACE_TABLE;

            if ($keepNonBreakingSpace === true) {
                unset($WHITESPACE_CACHE[$cacheKey]['NO-BREAK SPACE']);
            }

            $WHITESPACE_CACHE[$cacheKey] = array_values($WHITESPACE_CACHE[$cacheKey]);
        }

        if ($keepBidiUnicodeControls === false) {
            static $BIDI_UNICODE_CONTROLS_CACHE = null;
Comprehensibility Naming
The variable name $BIDI_UNICODE_CONTROLS_CACHE exceeds the maximum configured length of 20.

            if ($BIDI_UNICODE_CONTROLS_CACHE === null) {
                $BIDI_UNICODE_CONTROLS_CACHE = array_values($this->BIDI_UNI_CODE_CONTROLS_TABLE);
            }

            $str = \str_replace($BIDI_UNICODE_CONTROLS_CACHE, '', $str);
        }

        return \str_replace($WHITESPACE_CACHE[$cacheKey], ' ', $str);
    }

    private function normalize_msword($str)
Coding Style Naming
The method normalize_msword is not named in camelCase.
Coding Style
Method name "Utf8::normalize_msword" is not in camel caps format
    {
        if ($str === '') {
            return '';
        }

        $keys = [
            "\xc2\xab", // « (U+00AB) in UTF-8
            "\xc2\xbb", // » (U+00BB) in UTF-8
            "\xe2\x80\x98", // ‘ (U+2018) in UTF-8
            "\xe2\x80\x99", // ’ (U+2019) in UTF-8
            "\xe2\x80\x9a", // ‚ (U+201A) in UTF-8
            "\xe2\x80\x9b", // ‛ (U+201B) in UTF-8
            "\xe2\x80\x9c", // “ (U+201C) in UTF-8
            "\xe2\x80\x9d", // ” (U+201D) in UTF-8
            "\xe2\x80\x9e", // „ (U+201E) in UTF-8
            "\xe2\x80\x9f", // ‟ (U+201F) in UTF-8
            "\xe2\x80\xb9", // ‹ (U+2039) in UTF-8
            "\xe2\x80\xba", // › (U+203A) in UTF-8
            "\xe2\x80\x93", // – (U+2013) in UTF-8
            "\xe2\x80\x94", // — (U+2014) in UTF-8
            "\xe2\x80\xa6", // … (U+2026) in UTF-8
        ];

        $values = [
            '"', // « (U+00AB) in UTF-8
            '"', // » (U+00BB) in UTF-8
            "'", // ‘ (U+2018) in UTF-8
            "'", // ’ (U+2019) in UTF-8
            "'", // ‚ (U+201A) in UTF-8
            "'", // ‛ (U+201B) in UTF-8
            '"', // “ (U+201C) in UTF-8
            '"', // ” (U+201D) in UTF-8
            '"', // „ (U+201E) in UTF-8
            '"', // ‟ (U+201F) in UTF-8
            "'", // ‹ (U+2039) in UTF-8
            "'", // › (U+203A) in UTF-8
            '-', // – (U+2013) in UTF-8
            '-', // — (U+2014) in UTF-8
            '...', // … (U+2026) in UTF-8
        ];

        return \str_replace($keys, $values, $str);
    }

    public function remove_bom($str)
Coding Style Naming
The method remove_bom is not named in camelCase.
Coding Style
Method name "Utf8::remove_bom" is not in camel caps format
    {
        if ($str === '') {
            return '';
        }

        $strLength = \strlen($str);
        foreach ($this->BOM as $bomString => $bomByteLength) {
            if (strpos($str, $bomString, 0) === 0) {
                $strTmp = \substr($str, $bomByteLength, $strLength);
                if ($strTmp === false) {
                    return '';
                }

                $strLength -= (int)$bomByteLength;
                $str = (string)$strTmp;
            }
        }

        return $str;
    }
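
remove_bom relies on the class's BOM table, whose contents are not shown in this listing. The following is a minimal sketch of the prefix check, assuming a standalone helper named strip_leading_bom and an illustrative table of common byte order marks (both are assumptions, not taken from this class). Unlike the method above, the sketch stops after the first match.

```php
<?php

/**
 * Strip one leading byte order mark, if present.
 * Hypothetical helper for illustration; the BOM list below is an
 * assumption and may differ from the table the class uses.
 */
function strip_leading_bom(string $str): string
{
    // Longer marks first, so UTF-32LE (FF FE 00 00) is not mistaken
    // for UTF-16LE (FF FE).
    $boms = [
        "\x00\x00\xFE\xFF", // UTF-32 (BE)
        "\xFF\xFE\x00\x00", // UTF-32 (LE)
        "\xEF\xBB\xBF",     // UTF-8
        "\xFE\xFF",         // UTF-16 (BE)
        "\xFF\xFE",         // UTF-16 (LE)
    ];

    foreach ($boms as $bom) {
        $bomLength = \strlen($bom);

        // strncmp() is binary safe and only compares the prefix.
        if (\strncmp($str, $bom, $bomLength) === 0) {
            return (string) \substr($str, $bomLength);
        }
    }

    return $str;
}
```

Ordering the table by length is what keeps the two marks that share the prefix FF FE apart. A table keyed by BOM string, as in the method above, needs the same care with its iteration order.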

    private function str_detect_encoding($str)
Complexity
This operation has 1224 execution paths which exceeds the configured maximum of 200.
Coding Style Naming
The method str_detect_encoding is not named in camelCase.
Coding Style
Method name "Utf8::str_detect_encoding" is not in camel caps format
    {
        // init
        $str = (string)$str;

        //
        // 1.) check binary strings (010001001...) like UTF-16 / UTF-32 / PDF / Images / ...
        //

        if ($this->is_binary($str, true) === true) {
            $isUtf16 = $this->is_utf16($str, false);
            if ($isUtf16 === 1) {
                return 'UTF-16LE';
            }
            if ($isUtf16 === 2) {
                return 'UTF-16BE';
            }

            $isUtf32 = $this->is_utf32($str, false);
            if ($isUtf32 === 1) {
                return 'UTF-32LE';
            }
            if ($isUtf32 === 2) {
                return 'UTF-32BE';
            }

            // is binary but not "UTF-16" or "UTF-32"
            return false;
        }

        //
        // 2.) simple check for ASCII chars
        //

        if ($this->is_ascii($str) === true) {
            return 'ASCII';
        }

        //
        // 3.) simple check for UTF-8 chars
        //

        if ($this->is_utf8($str) === true) {
            return 'UTF-8';
        }

        //
        // 4.) check via "mb_detect_encoding()"
        //
        // INFO: UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always with "mb_detect_encoding()"

        $detectOrder = [
            'ISO-8859-1',
            'ISO-8859-2',
            'ISO-8859-3',
            'ISO-8859-4',
            'ISO-8859-5',
            'ISO-8859-6',
            'ISO-8859-7',
            'ISO-8859-8',
            'ISO-8859-9',
            'ISO-8859-10',
            'ISO-8859-13',
            'ISO-8859-14',
            'ISO-8859-15',
            'ISO-8859-16',
            'WINDOWS-1251',
            'WINDOWS-1252',
            'WINDOWS-1254',
            'CP932',
            'CP936',
            'CP950',
            'CP866',
            'CP850',
            'CP51932',
            'CP50220',
            'CP50221',
            'CP50222',
            'ISO-2022-JP',
            'ISO-2022-KR',
            'JIS',
            'JIS-ms',
            'EUC-CN',
            'EUC-JP',
        ];

        if ($this->SUPPORT['mbstring'] === true) {
            // info: do not use the symfony polyfill here
            $encoding = \mb_detect_encoding($str, $detectOrder, true);
            if ($encoding) {
                return $encoding;
            }
        }

        //
        // 5.) check via "iconv()"
        //

        if ($this->ENCODINGS === null) {
            $this->ENCODINGS = $this->getData('encodings');
        }

        foreach ($this->ENCODINGS as $encodingTmp) {
            // INFO: //IGNORE but still throw notice
            /** @noinspection PhpUsageOfSilenceOperatorInspection */
            if ((string)@\iconv($encodingTmp, $encodingTmp . '//IGNORE', $str) === $str) {
                return $encodingTmp;
            }
        }

        return false;
    }
1411
1412
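The detection order in the method above (UTF-16/UTF-32 first, then ASCII, then UTF-8, then `mb_detect_encoding()`) can be illustrated with a minimal standalone sketch. The function name `sketchDetectEncoding`, the BOM-based shortcut in step 1, and the two-entry candidate list are assumptions made for illustration; they are not part of the class, which uses `is_binary()`, `is_utf16()` and `is_utf32()` instead. The sketch requires the mbstring extension.

```php
<?php

/**
 * Hypothetical helper: a simplified detection cascade, not the class's own method.
 *
 * @return false|string The detected encoding name, or false if nothing matched.
 */
function sketchDetectEncoding(string $str)
{
    // 1.) Byte-order marks identify UTF-32 / UTF-16 (longer BOMs are checked first,
    //     because the UTF-32LE BOM starts with the UTF-16LE BOM).
    $boms = [
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE"         => 'UTF-16LE',
        "\xFE\xFF"         => 'UTF-16BE',
    ];
    foreach ($boms as $bom => $name) {
        if (\strncmp($str, $bom, \strlen($bom)) === 0) {
            return $name;
        }
    }

    // 2.) Input that only contains 7-bit bytes is reported as ASCII.
    if (\preg_match('/^[\x00-\x7F]*$/', $str) === 1) {
        return 'ASCII';
    }

    // 3.) Well-formed UTF-8 byte sequences.
    if (\mb_check_encoding($str, 'UTF-8')) {
        return 'UTF-8';
    }

    // 4.) Fall back to a short candidate list in strict mode.
    return \mb_detect_encoding($str, ['ISO-8859-1', 'WINDOWS-1252'], true);
}
```

Cheap, unambiguous checks run first, so the heuristic fallback is only consulted for input that is neither ASCII nor valid UTF-8.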
    private function is_binary($input, $strict = false)
    // Issue: The method is_binary has a boolean flag argument $strict, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue (Coding Style, Naming): The method is_binary is not named in camelCase.
    // Issue (Coding Style, Naming): The variable $finfo_encoding is not named in camelCase.
    // Issue (Complexity): This operation has 112 execution paths, which exceeds the configured maximum of 10.
    //   A high number of execution paths generally suggests many nested conditional statements and makes
    //   the code less readable. This can usually be fixed by splitting the method into several smaller methods.
    // Issue (Coding Style): Method name "Utf8::is_binary" is not in camel caps format.
    {
        $input = (string)$input;
        if ($input === '') {
            return false;
        }

        if (preg_match('~^[01]+$~', $input)) {
            return true;
        }

        $ext = $this->get_file_type($input);
        if ($ext['type'] === 'binary') {
            return true;
        }

        $testLength = \strlen($input);
        $testNull = \substr_count($input, "\x0", 0, $testLength);
        if (($testNull / $testLength) > 0.25) {
            return true;
        }

        if ($strict === true) {
            if ($this->SUPPORT['finfo'] === false) {
                throw new \RuntimeException('ext-fileinfo: is not installed');
            }

            /** @noinspection PhpComposerExtensionStubsInspection */
            $finfo_encoding = (new \finfo(\FILEINFO_MIME_ENCODING))->buffer($input);
            if ($finfo_encoding && $finfo_encoding === 'binary') {
                return true;
            }
        }

        return false;
    }

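One of the checks in `is_binary()` treats input as binary when more than 25% of its bytes are NUL. A minimal sketch of just that ratio test follows; the function name `sketchLooksBinary` and the `$threshold` parameter are hypothetical and only serve to isolate the heuristic.

```php
<?php

/**
 * Hypothetical helper mirroring only the NUL-byte ratio check from is_binary().
 */
function sketchLooksBinary(string $input, float $threshold = 0.25): bool
{
    $length = \strlen($input);
    if ($length === 0) {
        // Avoid a division by zero; an empty string is not considered binary.
        return false;
    }

    // Count NUL bytes; UTF-16/UTF-32 text and many binary formats contain a lot of them.
    $nullCount = \substr_count($input, "\x00");

    return ($nullCount / $length) > $threshold;
}
```

The comparison is strict, so an input with exactly one NUL byte in four is still classified as text by this check alone.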
    private function get_file_type(
        // Issue (Coding Style, Naming): The method get_file_type is not named in camelCase.
        // Issue (Coding Style, Naming): The variable $str_info is not named in camelCase.
        // Issue (Coding Style, Naming): The variable $type_code is not named in camelCase.
        // Issue (Complexity): This operation has 120 execution paths, which exceeds the configured maximum of 10.
        // Issue (Coding Style): Method name "Utf8::get_file_type" is not in camel caps format.
        $str,
        $fallback = [
            'ext' => null,
            'mime' => 'application/octet-stream',
            'type' => null,
        ]
    ) {
        if ($str === '') {
            return $fallback;
        }

        $str_info = \substr($str, 0, 2);
        if ($str_info === false || \strlen($str_info) !== 2) {
            return $fallback;
        }

        $str_info = \unpack('C2chars', $str_info);
        if ($str_info === false) {
            return $fallback;
        }
        // The type code is the decimal values of the first two bytes, concatenated.
        $type_code = (int)($str_info['chars1'] . $str_info['chars2']);

        // DEBUG
        //var_dump($type_code);

        switch ($type_code) {
            case 3780: // "%P" (0x25 0x50)
                $ext = 'pdf';
                $mime = 'application/pdf';
                $type = 'binary';

                break;
            case 7790: // "MZ" (0x4D 0x5A)
                $ext = 'exe';
                $mime = 'application/octet-stream';
                $type = 'binary';

                break;
            case 7784: // "MT" (0x4D 0x54)
                $ext = 'midi';
                $mime = 'audio/x-midi';
                $type = 'binary';

                break;
            case 8075: // "PK" (0x50 0x4B)
                $ext = 'zip';
                $mime = 'application/zip';
                $type = 'binary';

                break;
            case 8297: // "Ra" (0x52 0x61)
                $ext = 'rar';
                $mime = 'application/rar';
                $type = 'binary';

                break;
            case 255216: // 0xFF 0xD8
                $ext = 'jpg';
                $mime = 'image/jpeg';
                $type = 'binary';

                break;
            case 7173: // "GI" (0x47 0x49)
                $ext = 'gif';
                $mime = 'image/gif';
                $type = 'binary';

                break;
            case 6677: // "BM" (0x42 0x4D)
                $ext = 'bmp';
                $mime = 'image/bmp';
                $type = 'binary';

                break;
            case 13780: // 0x89 "P" (0x50)
                $ext = 'png';
                $mime = 'image/png';
                $type = 'binary';

                break;
            default:
                return $fallback;
        }

        return [
            'ext' => $ext,
            'mime' => $mime,
            'type' => $type,
        ];
    }

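The `switch` in `get_file_type()` compares against a "type code" built from the first two bytes of the input. The following sketch shows how such a code is derived; `sketchTypeCode` is a hypothetical name and the function is only an illustration of the `unpack('C2chars', ...)` step, not a replacement for the method.

```php
<?php

/**
 * Hypothetical helper: builds the same decimal "type code" that the switch
 * statement in get_file_type() compares against.
 */
function sketchTypeCode(string $buffer): ?int
{
    if (\strlen($buffer) < 2) {
        return null;
    }

    // 'C2chars' unpacks two unsigned bytes into the keys "chars1" and "chars2".
    $bytes = \unpack('C2chars', $buffer);
    if ($bytes === false) {
        return null;
    }

    // Concatenate the decimal byte values, e.g. "%P" (37, 80) becomes 3780.
    return (int) ($bytes['chars1'] . $bytes['chars2']);
}
```

Because the two decimal values are concatenated rather than combined arithmetically, different byte pairs could in principle produce the same code (for example 1 and 37 versus 13 and 7), so the scheme is best read as a coarse signature check.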
    private function is_utf16($str, $checkIfStringIsBinary = true)
    // Issue: The method is_utf16 has a boolean flag argument $checkIfStringIsBinary, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue (Complexity): This operation has 1152 execution paths, which exceeds the configured maximum of 200.
    // Issue (Coding Style, Naming): The method is_utf16 is not named in camelCase.
    // Issue (Comprehensibility, Naming): The variable name $checkIfStringIsBinary exceeds the maximum configured
    //   length of 20. Very long variable names usually make code harder to read.
    // Issue (Coding Style): Method name "Utf8::is_utf16" is not in camel caps format.
    {
        // init
        $str = (string)$str;
        $strChars = [];

        if (
            $checkIfStringIsBinary === true
            &&
            $this->is_binary($str, true) === false
        ) {
            return false;
        }

        if ($this->SUPPORT['mbstring'] === false) {
            \trigger_error('UTF8::is_utf16() without mbstring may not work correctly', \E_USER_WARNING);
        }

        $str = $this->remove_bom($str);

        $maybeUTF16LE = 0;
        $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16LE');
        if ($test) {
            $test2 = \mb_convert_encoding($test, 'UTF-16LE', 'UTF-8');
            $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16LE');
            if ($test3 === $test) {
                if (\count($strChars) === 0) {
                    $strChars = $this->count_chars($str, true, false);
                }
                $countChars = $this->count_chars($test3);
                foreach ($countChars as $test3char => $test3charEmpty) {
                    if (\in_array($test3char, $strChars, true) === true) {
                        ++$maybeUTF16LE;
                    }
                    unset($countChars[$test3char]);
                }

                // Issue (Coding Style): Blank line found at end of control structure.
            }
        }

        $maybeUTF16BE = 0;
        $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16BE');
        if ($test) {
            $test2 = \mb_convert_encoding($test, 'UTF-16BE', 'UTF-8');
            $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16BE');
            if ($test3 === $test) {
                if (\count($strChars) === 0) {
                    $strChars = $this->count_chars($str, true, false);
                }
                $countChars = $this->count_chars($test3);
                foreach ($countChars as $test3char => $test3charEmpty) {
                    if (\in_array($test3char, $strChars, true) === true) {
                        ++$maybeUTF16BE;
                    }
                    unset($countChars[$test3char]);
                }

                // Issue (Coding Style): Blank line found at end of control structure.
            }
        }

        if ($maybeUTF16BE !== $maybeUTF16LE) {
            if ($maybeUTF16LE > $maybeUTF16BE) {
                return 1;
            }

            return 2;
        }

        return false;
    }

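`is_utf16()` scores both byte orders by converting the input and comparing characters, then returns `1` (little endian), `2` (big endian) or `false`. For comparison, here is a sketch of a different and much simpler heuristic that uses the same return convention. It is not the technique used by the method: it only looks at where NUL bytes fall, which works for mostly-Latin text and little else. The name `sketchGuessUtf16Endianness` is hypothetical.

```php
<?php

/**
 * Hypothetical helper; NOT the round-trip technique used in is_utf16().
 *
 * @return false|int 1 for UTF-16LE, 2 for UTF-16BE, false if undecided
 */
function sketchGuessUtf16Endianness(string $str)
{
    $evenNulls = 0; // NUL bytes at offsets 0, 2, 4, ... (high byte first: big endian)
    $oddNulls = 0;  // NUL bytes at offsets 1, 3, 5, ... (low byte first: little endian)

    $len = \strlen($str);
    for ($i = 0; $i < $len; ++$i) {
        if ($str[$i] === "\x00") {
            if ($i % 2 === 0) {
                ++$evenNulls;
            } else {
                ++$oddNulls;
            }
        }
    }

    if ($evenNulls === $oddNulls) {
        // No evidence either way (this includes input without any NUL bytes).
        return false;
    }

    return $oddNulls > $evenNulls ? 1 : 2;
}
```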
    private function count_chars($str, $cleanUtf8 = false, $tryToUseMbFunction = true)
    // Issue: The method count_chars has a boolean flag argument $cleanUtf8, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue: The method count_chars has a boolean flag argument $tryToUseMbFunction, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue (Coding Style, Naming): The method count_chars is not named in camelCase.
    // Issue (Coding Style): Method name "Utf8::count_chars" is not in camel caps format.
    {
        return \array_count_values($this->str_split($str, 1, $cleanUtf8, $tryToUseMbFunction));
    }

    /**
     * Check if the string is UTF-32.
     *
     * @param mixed $str <p>The input string.</p>
     * @param bool $checkIfStringIsBinary
     *
     * @return false|int
     *                   <strong>false</strong> if it's not UTF-32,<br>
     *                   <strong>1</strong> for UTF-32LE,<br>
     *                   <strong>2</strong> for UTF-32BE
     */
    private function is_utf32($str, $checkIfStringIsBinary = true)
    // Issue: The method is_utf32 has a boolean flag argument $checkIfStringIsBinary, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue (Complexity): This operation has 1152 execution paths, which exceeds the configured maximum of 200.
    // Issue (Coding Style, Naming): The method is_utf32 is not named in camelCase.
    // Issue (Comprehensibility, Naming): The variable name $checkIfStringIsBinary exceeds the maximum configured
    //   length of 20.
    // Issue (Coding Style): Method name "Utf8::is_utf32" is not in camel caps format.
    {
        // init
        $str = (string)$str;
        $strChars = [];

        if ($checkIfStringIsBinary === true && $this->is_binary($str, true) === false) {
            return false;
        }

        if ($this->SUPPORT['mbstring'] === false) {
            \trigger_error('UTF8::is_utf32() without mbstring may not work correctly', \E_USER_WARNING);
        }

        $str = $this->remove_bom($str);

        $maybeUTF32LE = 0;
        $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32LE');
        if ($test) {
            $test2 = \mb_convert_encoding($test, 'UTF-32LE', 'UTF-8');
            $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32LE');
            if ($test3 === $test) {
                if (\count($strChars) === 0) {
                    $strChars = $this->count_chars($str, true, false);
                }
                $countChars = $this->count_chars($test3);
                foreach ($countChars as $test3char => $test3charEmpty) {
                    if (\in_array($test3char, $strChars, true) === true) {
                        ++$maybeUTF32LE;
                    }
                    unset($countChars[$test3char]);
                }
            }
        }

        $maybeUTF32BE = 0;
        $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32BE');
        if ($test) {
            $test2 = \mb_convert_encoding($test, 'UTF-32BE', 'UTF-8');
            $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32BE');
            if ($test3 === $test) {
                if (\count($strChars) === 0) {
                    $strChars = $this->count_chars($str, true, false);
                }
                $countChars = $this->count_chars($test3);
                foreach ($countChars as $test3char => $test3charEmpty) {
                    if (\in_array($test3char, $strChars, true) === true) {
                        ++$maybeUTF32BE;
                    }
                    unset($countChars[$test3char]);
                }
            }
        }

        if ($maybeUTF32BE !== $maybeUTF32LE) {
            if ($maybeUTF32LE > $maybeUTF32BE) {
                return 1;
            }

            return 2;
        }

        return false;
    }

    /**
     * Checks whether the passed string contains only byte sequences that appear to be valid UTF-8 characters.
     *
     * @see    http://hsivonen.iki.fi/php-utf8/
     *
     * @param string|string[] $str <p>The string to be checked.</p>
     * @param bool $strict <p>Check also if the string is not UTF-16 or UTF-32.</p>
     *
     * @return bool
     */
    private function is_utf8($str, $strict = false)
    // Issue: The method is_utf8 has a boolean flag argument $strict, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue (Complexity): This operation has 6400 execution paths, which exceeds the configured maximum of 200.
    // Issue (Coding Style, Naming): The method is_utf8 is not named in camelCase.
    // Issue (Coding Style): Method name "Utf8::is_utf8" is not in camel caps format.
    {
        if (\is_array($str) === true) {
            foreach ($str as &$v) {
                if ($this->is_utf8($v, $strict) === false) {
                    return false;
                }
            }

            return true;
        }

        if ($str === '') {
            return true;
        }

        if ($strict === true) {
            $isBinary = $this->is_binary($str, true);

            if ($isBinary && $this->is_utf16($str, false) !== false) {
                return false;
            }

            if ($isBinary && $this->is_utf32($str, false) !== false) {
                return false;
            }
        }

        if ($this->system->pcre_utf8_support() !== true) {
            // Issue (Coding Style): Blank line found at start of control structure.

            // If even just the first character can be matched, when the /u
            // modifier is used, then it's valid UTF-8. If the UTF-8 is somehow
            // invalid, nothing at all will match, even if the string contains
            // some valid sequences
            return \preg_match('/^.{1}/us', $str, $ar) === 1;
            // Issue (Comprehensibility): Avoid variables with short names like $ar. Configured minimum length is 3.
            //   Short variable names may make your code harder to understand. Variable names should be self-descriptive.
        }

        $mState = 0; // cached expected number of octets after the current octet
        // until the beginning of the next UTF8 character sequence
        $mUcs4 = 0; // cached Unicode character
        $mBytes = 1; // cached expected number of octets in the current sequence

        if ($this->ORD === null) {
            $this->ORD = $this->getData('ord');
        }

        $len = \strlen((string)$str);
        /** @noinspection ForeachInvariantsInspection */
        for ($i = 0; $i < $len; ++$i) {
            $in = $this->ORD[$str[$i]];
            // Issue (Comprehensibility): Avoid variables with short names like $in. Configured minimum length is 3.
            if ($mState === 0) {
                // When mState is zero we expect either a US-ASCII character or a
                // multi-octet sequence.
                if ((0x80 & $in) === 0) {
                    // US-ASCII, pass straight through.
                    $mBytes = 1;
                } elseif ((0xE0 & $in) === 0xC0) {
                    // First octet of 2 octet sequence.
                    $mUcs4 = $in;
                    $mUcs4 = ($mUcs4 & 0x1F) << 6;
                    $mState = 1;
                    $mBytes = 2;
                } elseif ((0xF0 & $in) === 0xE0) {
                    // First octet of 3 octet sequence.
                    $mUcs4 = $in;
                    $mUcs4 = ($mUcs4 & 0x0F) << 12;
                    $mState = 2;
                    $mBytes = 3;
                } elseif ((0xF8 & $in) === 0xF0) {
                    // First octet of 4 octet sequence.
                    $mUcs4 = $in;
                    $mUcs4 = ($mUcs4 & 0x07) << 18;
                    $mState = 3;
                    $mBytes = 4;
                } elseif ((0xFC & $in) === 0xF8) {
                    /* First octet of 5 octet sequence.
                     *
                     * This is illegal because the encoded codepoint must be either
                     * (a) not the shortest form or
                     * (b) outside the Unicode range of 0-0x10FFFF.
                     * Rather than trying to resynchronize, we will carry on until the end
                     * of the sequence and let the later error handling code catch it.
                     */
                    $mUcs4 = $in;
                    $mUcs4 = ($mUcs4 & 0x03) << 24;
                    $mState = 4;
                    $mBytes = 5;
                } elseif ((0xFE & $in) === 0xFC) {
                    // First octet of 6 octet sequence, see comments for 5 octet sequence.
                    $mUcs4 = $in;
                    $mUcs4 = ($mUcs4 & 1) << 30;
                    $mState = 5;
                    $mBytes = 6;
                } else {
                    // Issue (Coding Style): The method is_utf8 uses an else expression. Else is never necessary
                    //   and you can simplify the code to work without else.
                    // Current octet is neither in the US-ASCII range nor a legal first
                    // octet of a multi-octet sequence.
                    return false;
                }
            } elseif ((0xC0 & $in) === 0x80) {
                // Issue (Coding Style): Blank line found at start of control structure.

                // When mState is non-zero, we expect a continuation of the multi-octet
                // sequence

                // Legal continuation.
                $shift = ($mState - 1) * 6;
                $tmp = $in;
                $tmp = ($tmp & 0x0000003F) << $shift;
                $mUcs4 |= $tmp;
                // Prefix: End of the multi-octet sequence. mUcs4 now contains the final
                // Unicode code point to be output.
                if (--$mState === 0) {
                    // Check for illegal sequences and code points.
                    //
                    // From Unicode 3.1, non-shortest form is illegal
                    if (
                        ($mBytes === 2 && $mUcs4 < 0x0080)
                        ||
                        ($mBytes === 3 && $mUcs4 < 0x0800)
                        ||
                        ($mBytes === 4 && $mUcs4 < 0x10000)
                        ||
                        ($mBytes > 4)
                        ||
                        // From Unicode 3.2, surrogate characters are illegal.
                        (($mUcs4 & 0xFFFFF800) === 0xD800)
                        ||
                        // Code points outside the Unicode range are illegal.
                        ($mUcs4 > 0x10FFFF)
                    ) {
                        return false;
                    }
                    // initialize UTF8 cache
                    $mState = 0;
                    $mUcs4 = 0;
                    $mBytes = 1;
                }
            } else {
                // Issue (Coding Style): The method is_utf8 uses an else expression. Else is never necessary
                //   and you can simplify the code to work without else.
                // ((0xC0 & (*in) != 0x80) && (mState != 0))
                // Incomplete multi-octet sequence.
                return false;
            }
        }

        return true;
    }

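The state machine in `is_utf8()` rejects overlong forms, surrogates, code points above U+10FFFF and truncated sequences. A compact way to cross-check those rules is PCRE's `/u` modifier, which validates the whole subject before matching. The sketch below swaps in that technique; `sketchIsUtf8` is a hypothetical name, and the function is an illustration rather than the class's implementation.

```php
<?php

/**
 * Hypothetical helper: validates UTF-8 with PCRE's /u modifier instead of the
 * hand-written state machine in is_utf8().
 */
function sketchIsUtf8(string $str): bool
{
    // With /u, preg_match() returns false (not 0) when the subject is not valid UTF-8.
    // The empty pattern matches any valid subject, so a valid string yields 1.
    return \preg_match('//u', $str) === 1;
}
```

The inputs used to exercise this sketch correspond to the rejection branches of the state machine: an overlong two-byte form, an encoded surrogate, a code point beyond U+10FFFF and a lead byte without its continuation.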
    private function to_iso8859($str)
    // Issue (Coding Style, Naming): The method to_iso8859 is not named in camelCase.
    // Issue (Coding Style): Method name "Utf8::to_iso8859" is not in camel caps format.
    {
        if (is_array($str) === true) {
            // Issue (Coding Style): Blank line found at start of control structure.

            foreach ($str as $k => $v) {
                $str[$k] = $this->to_iso8859($v);
            }

            return $str;
        }

        $str = (string)$str;
        if ($str === '') {
            return '';
        }

        return $this->utf8_decode($str);
    }

    /**
     * Decodes a UTF-8 string to ISO-8859-1.
     *
     * @param string $str <p>The input string.</p>
     * @param bool $keepUtf8Chars
     *
     * @return string
     */
    private function utf8_decode($str, $keepUtf8Chars = false)
    // Issue: The method utf8_decode has a boolean flag argument $keepUtf8Chars, which is a certain sign of a
    //   Single Responsibility Principle violation.
    // Issue (Complexity): This operation has 480 execution paths, which exceeds the configured maximum of 200.
    // Issue (Coding Style, Naming): The method utf8_decode is not named in camelCase.
    // Issue (Coding Style, Naming): The variable $str_backup is not named in camelCase.
    // Issue (Coding Style): Method name "Utf8::utf8_decode" is not in camel caps format.
    {
        if ($str === '') {
            return '';
        }

        // save for later comparison
        $str_backup = $str;
        $len = \strlen($str);

        if ($this->ORD === null) {
            $this->ORD = $this->getData('ord');
        }

        if ($this->CHR === null) {
            $this->CHR = $this->getData('chr');
        }

        $noCharFound = '?';
        /** @noinspection ForeachInvariantsInspection */
        for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
            switch ($str[$i] & "\xF0") {
                case "\xC0":
                case "\xD0":
                    $c = ($this->ORD[$str[$i] & "\x1F"] << 6) | $this->ORD[$str[++$i] & "\x3F"];
                    // Issue (Comprehensibility): Avoid variables with short names like $c. Configured minimum length is 3.
                    $str[$j] = $c < 256 ? $this->CHR[$c] : $noCharFound;

                    break;

                /** @noinspection PhpMissingBreakStatementInspection */
                case "\xF0":
                    ++$i;

                // no break

                case "\xE0":
                    $str[$j] = $noCharFound;
                    $i += 2;

                    break;

                default:
                    $str[$j] = $str[$i];
            }
        }

        $return = substr($str, 0, $j);
        if ($return === false) {
            $return = '';
        }

        if (
            $keepUtf8Chars === true
            &&
            $this->strlen($return) >= (int)$this->strlen($str_backup)
        ) {
            return $str_backup;
        }

        return $return;
    }

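`utf8_decode()` maps two-byte sequences below code point 256 to a single ISO-8859-1 byte and writes `?` for everything that does not fit. The same outcome can be sketched with mbstring, assuming that extension is available. This is a swapped-in technique for illustration, and `sketchUtf8ToLatin1` is a hypothetical name, not part of the class.

```php
<?php

/**
 * Hypothetical helper: UTF-8 to ISO-8859-1 via mbstring, with "?" for
 * characters that have no ISO-8859-1 equivalent.
 */
function sketchUtf8ToLatin1(string $str): string
{
    // Remember the current substitute character so the global setting can be restored.
    $previous = \mb_substitute_character();
    \mb_substitute_character(0x3F); // 0x3F is "?"

    try {
        return \mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
    } finally {
        \mb_substitute_character($previous);
    }
}
```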
    /**
1942
     * Get the string length, not the byte-length!
1943
     *
1944
     * @see     http://php.net/manual/en/function.mb-strlen.php
1945
     *
1946
     * @param string $str <p>The string being checked for length.</p>
1947
     * @param string $encoding [optional] <p>Set the charset for e.g. "mb_" function</p>
1948
     * @param bool $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
1949
     *
1950
     * @return false|int
1951
     *                   The number <strong>(int)</strong> of characters in the string $str having character encoding
1952
     *                   $encoding.
1953
     *                   (One multi-byte character counted as +1).
1954
     *                   <br>
1955
     *                   Can return <strong>false</strong>, if e.g. mbstring is not installed and we process invalid
1956
     *                   chars.
1957
     */
1958
    private function strlen($str, $encoding = 'UTF-8', $cleanUtf8 = false)
The method strlen has a boolean flag argument $cleanUtf8, which is a certain sign of a Single Responsibility Principle violation.

Complexity
This operation has 20736 execution paths, which exceeds the configured maximum of 200.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.
    {
        if ($str === '') {
            return 0;
        }

        if ($encoding !== 'UTF-8' && $encoding !== 'CP850') {
            $encoding = $this->normalize_encoding($encoding, 'UTF-8');
        }

        if ($cleanUtf8 === true) {
            // "mb_strlen" and "\iconv_strlen" return the wrong length
            // if invalid characters are found in $str
            $str = $this->clean($str);
        }

        //
        // fallback via mbstring
        //

        if ($this->SUPPORT['mbstring'] === true) {
            if ($encoding === 'UTF-8') {
                return \mb_strlen($str);
            }

            return \mb_strlen($str, $encoding);
        }

        //
        // fallback for binary || ascii only
        //

        if (
            $encoding === 'CP850'
            ||
            $encoding === 'ASCII'
        ) {
            return \strlen($str);
        }

        if (
            $encoding !== 'UTF-8'
            &&
            $this->SUPPORT['mbstring'] === false
            &&
            $this->SUPPORT['iconv'] === false
        ) {
            \trigger_error('UTF8::strlen() without mbstring / iconv cannot handle "' . $encoding . '" encoding', \E_USER_WARNING);
        }

        //
        // fallback via iconv
        //

        if ($this->SUPPORT['iconv'] === true) {
            $returnTmp = \iconv_strlen($str, $encoding);
            if ($returnTmp !== false) {
                return $returnTmp;
            }
        }

        //
        // fallback via intl
        //

        if (
            $encoding === 'UTF-8' // INFO: "grapheme_strlen()" can't handle other encodings
            &&
            $this->SUPPORT['intl'] === true
        ) {
            $returnTmp = \grapheme_strlen($str);
            if ($returnTmp !== null) {
                return $returnTmp;
            }
        }

        //
        // fallback for ascii only
        //

        if ($this->is_ascii($str)) {
            return \strlen($str);
        }

        //
        // fallback via vanilla php
        //

        \preg_match_all('/./us', $str, $parts);

        $returnTmp = \count($parts[0]);
        if ($returnTmp === 0) {
            return false;
        }

        return $returnTmp;
    }

    private function decimal_to_chr($int)
Coding Style Naming
The method decimal_to_chr is not named in camelCase.

This check marks method names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name database connection string becomes databaseConnectionString.

Coding Style
Method name "Utf8::decimal_to_chr" is not in camel caps format
    {
        return $this->html_entity_decode('&#' . $int . ';', \ENT_QUOTES | \ENT_HTML5);
    }

}