Completed
Push — master ( 302520...eee065 )
by Lars
03:05
created

UTF8::substr_count()   C

Complexity

Conditions 12
Paths 66

Size

Total Lines 56
Code Lines 30

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 22
CRAP Score 12

Importance

Changes 0
Metric Value
dl 0
loc 56
ccs 22
cts 22
cp 1
rs 6.7092
c 0
b 0
f 0
cc 12
eloc 30
nc 66
nop 6
crap 12
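
The CRAP score of 12 in the table above is consistent with the commonly cited definition, CRAP = cc^2 * (1 - coverage)^3 + cc: with full coverage (cp 1) the first term vanishes and the score equals the cyclomatic complexity (cc 12). The following is a minimal sketch, assuming this is the formula the report applies; the function name `crapScore` is hypothetical and not part of the analysed library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: computes a CRAP score from the cyclomatic
 * complexity and the coverage ratio (0.0 = untested, 1.0 = fully covered).
 */
function crapScore(int $complexity, float $coverage): float
{
    // An uncovered method is penalised by the square of its complexity;
    // the penalty shrinks with the cube of the uncovered fraction.
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values taken from the metric table: cc = 12, cp = 1.
echo crapScore(12, 1.0), PHP_EOL;
```

Under the same assumption, the identical method without any tests would score 12^2 + 12 = 156, which shows how strongly coverage influences this metric.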

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
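
As a rough illustration of the advice above, the sketch below takes a function whose body is split up by comments and extracts each commented step into its own small function. All function names here are hypothetical and chosen only for the example; none of them belong to the analysed `UTF8` class.

```php
<?php

declare(strict_types=1);

// Before: one function, with comments marking the separate steps.
function countOccurrencesBefore(string $haystack, string $needle): int
{
    // normalize the input
    $haystack = trim($haystack);
    $needle = trim($needle);

    // guard against empty input
    if ($haystack === '' || $needle === '') {
        return 0;
    }

    return substr_count($haystack, $needle);
}

// After: each comment became the name of a small function.
function normalizeInput(string $value): string
{
    return trim($value);
}

function hasEmptyInput(string ...$values): bool
{
    return in_array('', $values, true);
}

function countOccurrences(string $haystack, string $needle): int
{
    $haystack = normalizeInput($haystack);
    $needle = normalizeInput($needle);

    if (hasEmptyInput($haystack, $needle)) {
        return 0;
    }

    return substr_count($haystack, $needle);
}
```

Both versions behave the same; the second one reads as a sequence of named steps, and each step can be tested on its own.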

Commonly applied refactorings include:

<?php

declare(strict_types=1);

namespace voku\helper;

use Symfony\Polyfill\Intl\Grapheme\Grapheme;

/**
 * UTF8-Helper-Class
 *
 * @package voku\helper
 */
final class UTF8
{
  /**
   * @var array
   */
  private static $WIN1252_TO_UTF8 = array(
      128 => "\xe2\x82\xac", // EURO SIGN
      130 => "\xe2\x80\x9a", // SINGLE LOW-9 QUOTATION MARK
      131 => "\xc6\x92", // LATIN SMALL LETTER F WITH HOOK
      132 => "\xe2\x80\x9e", // DOUBLE LOW-9 QUOTATION MARK
      133 => "\xe2\x80\xa6", // HORIZONTAL ELLIPSIS
      134 => "\xe2\x80\xa0", // DAGGER
      135 => "\xe2\x80\xa1", // DOUBLE DAGGER
      136 => "\xcb\x86", // MODIFIER LETTER CIRCUMFLEX ACCENT
      137 => "\xe2\x80\xb0", // PER MILLE SIGN
      138 => "\xc5\xa0", // LATIN CAPITAL LETTER S WITH CARON
      139 => "\xe2\x80\xb9", // SINGLE LEFT-POINTING ANGLE QUOTE
      140 => "\xc5\x92", // LATIN CAPITAL LIGATURE OE
      142 => "\xc5\xbd", // LATIN CAPITAL LETTER Z WITH CARON
      145 => "\xe2\x80\x98", // LEFT SINGLE QUOTATION MARK
      146 => "\xe2\x80\x99", // RIGHT SINGLE QUOTATION MARK
      147 => "\xe2\x80\x9c", // LEFT DOUBLE QUOTATION MARK
      148 => "\xe2\x80\x9d", // RIGHT DOUBLE QUOTATION MARK
      149 => "\xe2\x80\xa2", // BULLET
      150 => "\xe2\x80\x93", // EN DASH
      151 => "\xe2\x80\x94", // EM DASH
      152 => "\xcb\x9c", // SMALL TILDE
      153 => "\xe2\x84\xa2", // TRADE MARK SIGN
      154 => "\xc5\xa1", // LATIN SMALL LETTER S WITH CARON
      155 => "\xe2\x80\xba", // SINGLE RIGHT-POINTING ANGLE QUOTE
      156 => "\xc5\x93", // LATIN SMALL LIGATURE OE
      158 => "\xc5\xbe", // LATIN SMALL LETTER Z WITH CARON
      159 => "\xc5\xb8", // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );

  /**
   * @var array
   */
  private static $CP1252_TO_UTF8 = array(
      '€' => '€',
      '‚' => '‚',
      'ƒ' => 'ƒ',
      '„' => '„',
      '…' => '…',
      '†' => '†',
      '‡' => '‡',
      'ˆ' => 'ˆ',
      '‰' => '‰',
      'Š' => 'Š',
      '‹' => '‹',
      'Œ' => 'Œ',
      'Ž' => 'Ž',
      '‘' => '‘',
      '’' => '’',
      '“' => '“',
      '”' => '”',
      '•' => '•',
      '–' => '–',
      '—' => '—',
      '˜' => '˜',
      '™' => '™',
      'š' => 'š',
      '›' => '›',
      'œ' => 'œ',
      'ž' => 'ž',
      'Ÿ' => 'Ÿ',
  );

  /**
   * Bom => Byte-Length
   *
   * INFO: https://en.wikipedia.org/wiki/Byte_order_mark
   *
   * @var array
   */
  private static $BOM = array(
      "\xef\xbb\xbf"     => 3, // UTF-8 BOM
      'ï»¿'              => 6, // UTF-8 BOM as "WINDOWS-1252" (one char has [maybe] more than one byte ...)
      "\x00\x00\xfe\xff" => 4, // UTF-32 (BE) BOM
      '  þÿ'             => 6, // UTF-32 (BE) BOM as "WINDOWS-1252"
Unused Code Comprehensibility introduced by
36% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

      "\xff\xfe\x00\x00" => 4, // UTF-32 (LE) BOM
      'ÿþ  '             => 6, // UTF-32 (LE) BOM as "WINDOWS-1252"
      "\xfe\xff"         => 2, // UTF-16 (BE) BOM
      'þÿ'               => 4, // UTF-16 (BE) BOM as "WINDOWS-1252"
      "\xff\xfe"         => 2, // UTF-16 (LE) BOM
      'ÿþ'               => 4, // UTF-16 (LE) BOM as "WINDOWS-1252"
  );

  /**
   * Numeric code point => UTF-8 Character
   *
   * url: http://www.w3schools.com/charsets/ref_utf_punctuation.asp
   *
   * @var array
   */
  private static $WHITESPACE = array(
    // NUL Byte
    0     => "\x0",
    // Tab
    9     => "\x9",
    // New Line
    10    => "\xa",
    // Vertical Tab
    11    => "\xb",
    // Carriage Return
    13    => "\xd",
    // Ordinary Space
    32    => "\x20",
    // NO-BREAK SPACE
    160   => "\xc2\xa0",
    // OGHAM SPACE MARK
    5760  => "\xe1\x9a\x80",
    // MONGOLIAN VOWEL SEPARATOR
    6158  => "\xe1\xa0\x8e",
    // EN QUAD
    8192  => "\xe2\x80\x80",
    // EM QUAD
    8193  => "\xe2\x80\x81",
    // EN SPACE
    8194  => "\xe2\x80\x82",
    // EM SPACE
    8195  => "\xe2\x80\x83",
    // THREE-PER-EM SPACE
    8196  => "\xe2\x80\x84",
    // FOUR-PER-EM SPACE
    8197  => "\xe2\x80\x85",
    // SIX-PER-EM SPACE
    8198  => "\xe2\x80\x86",
    // FIGURE SPACE
    8199  => "\xe2\x80\x87",
    // PUNCTUATION SPACE
    8200  => "\xe2\x80\x88",
    // THIN SPACE
    8201  => "\xe2\x80\x89",
    // HAIR SPACE
    8202  => "\xe2\x80\x8a",
    // LINE SEPARATOR
    8232  => "\xe2\x80\xa8",
    // PARAGRAPH SEPARATOR
    8233  => "\xe2\x80\xa9",
    // NARROW NO-BREAK SPACE
    8239  => "\xe2\x80\xaf",
    // MEDIUM MATHEMATICAL SPACE
    8287  => "\xe2\x81\x9f",
    // IDEOGRAPHIC SPACE
    12288 => "\xe3\x80\x80",
  );

  /**
   * @var array
   */
  private static $WHITESPACE_TABLE = array(
      'SPACE'                     => "\x20",
      'NO-BREAK SPACE'            => "\xc2\xa0",
      'OGHAM SPACE MARK'          => "\xe1\x9a\x80",
      'EN QUAD'                   => "\xe2\x80\x80",
      'EM QUAD'                   => "\xe2\x80\x81",
      'EN SPACE'                  => "\xe2\x80\x82",
      'EM SPACE'                  => "\xe2\x80\x83",
      'THREE-PER-EM SPACE'        => "\xe2\x80\x84",
      'FOUR-PER-EM SPACE'         => "\xe2\x80\x85",
      'SIX-PER-EM SPACE'          => "\xe2\x80\x86",
      'FIGURE SPACE'              => "\xe2\x80\x87",
      'PUNCTUATION SPACE'         => "\xe2\x80\x88",
      'THIN SPACE'                => "\xe2\x80\x89",
      'HAIR SPACE'                => "\xe2\x80\x8a",
      'LINE SEPARATOR'            => "\xe2\x80\xa8",
      'PARAGRAPH SEPARATOR'       => "\xe2\x80\xa9",
      'ZERO WIDTH SPACE'          => "\xe2\x80\x8b",
      'NARROW NO-BREAK SPACE'     => "\xe2\x80\xaf",
      'MEDIUM MATHEMATICAL SPACE' => "\xe2\x81\x9f",
      'IDEOGRAPHIC SPACE'         => "\xe3\x80\x80",
  );

  /**
   * bidirectional text chars
   *
   * url: https://www.w3.org/International/questions/qa-bidi-unicode-controls
   *
   * @var array
   */
  private static $BIDI_UNI_CODE_CONTROLS_TABLE = array(
    // LEFT-TO-RIGHT EMBEDDING (use -> dir = "ltr")
    8234 => "\xE2\x80\xAA",
    // RIGHT-TO-LEFT EMBEDDING (use -> dir = "rtl")
    8235 => "\xE2\x80\xAB",
    // POP DIRECTIONAL FORMATTING // (use -> </bdo>)
    8236 => "\xE2\x80\xAC",
    // LEFT-TO-RIGHT OVERRIDE // (use -> <bdo dir = "ltr">)
    8237 => "\xE2\x80\xAD",
    // RIGHT-TO-LEFT OVERRIDE // (use -> <bdo dir = "rtl">)
    8238 => "\xE2\x80\xAE",
    // LEFT-TO-RIGHT ISOLATE // (use -> dir = "ltr")
    8294 => "\xE2\x81\xA6",
    // RIGHT-TO-LEFT ISOLATE // (use -> dir = "rtl")
    8295 => "\xE2\x81\xA7",
    // FIRST STRONG ISOLATE // (use -> dir = "auto")
    8296 => "\xE2\x81\xA8",
    // POP DIRECTIONAL ISOLATE
    8297 => "\xE2\x81\xA9",
  );

  /**
   * @var array
   */
  private static $COMMON_CASE_FOLD = array(
      'ſ'            => 's',
      "\xCD\x85"     => 'ι',
      'ς'            => 'σ',
      "\xCF\x90"     => 'β',
      "\xCF\x91"     => 'θ',
      "\xCF\x95"     => 'φ',
      "\xCF\x96"     => 'π',
      "\xCF\xB0"     => 'κ',
      "\xCF\xB1"     => 'ρ',
      "\xCF\xB5"     => 'ε',
      "\xE1\xBA\x9B" => "\xE1\xB9\xA1",
      "\xE1\xBE\xBE" => 'ι',
  );

  /**
   * @var array
   */
  private static $BROKEN_UTF8_FIX = array(
      "\xc2\x80" => "\xe2\x82\xac", // EURO SIGN
      "\xc2\x82" => "\xe2\x80\x9a", // SINGLE LOW-9 QUOTATION MARK
      "\xc2\x83" => "\xc6\x92", // LATIN SMALL LETTER F WITH HOOK
      "\xc2\x84" => "\xe2\x80\x9e", // DOUBLE LOW-9 QUOTATION MARK
      "\xc2\x85" => "\xe2\x80\xa6", // HORIZONTAL ELLIPSIS
      "\xc2\x86" => "\xe2\x80\xa0", // DAGGER
      "\xc2\x87" => "\xe2\x80\xa1", // DOUBLE DAGGER
      "\xc2\x88" => "\xcb\x86", // MODIFIER LETTER CIRCUMFLEX ACCENT
      "\xc2\x89" => "\xe2\x80\xb0", // PER MILLE SIGN
      "\xc2\x8a" => "\xc5\xa0", // LATIN CAPITAL LETTER S WITH CARON
      "\xc2\x8b" => "\xe2\x80\xb9", // SINGLE LEFT-POINTING ANGLE QUOTE
      "\xc2\x8c" => "\xc5\x92", // LATIN CAPITAL LIGATURE OE
      "\xc2\x8e" => "\xc5\xbd", // LATIN CAPITAL LETTER Z WITH CARON
      "\xc2\x91" => "\xe2\x80\x98", // LEFT SINGLE QUOTATION MARK
      "\xc2\x92" => "\xe2\x80\x99", // RIGHT SINGLE QUOTATION MARK
      "\xc2\x93" => "\xe2\x80\x9c", // LEFT DOUBLE QUOTATION MARK
      "\xc2\x94" => "\xe2\x80\x9d", // RIGHT DOUBLE QUOTATION MARK
      "\xc2\x95" => "\xe2\x80\xa2", // BULLET
      "\xc2\x96" => "\xe2\x80\x93", // EN DASH
      "\xc2\x97" => "\xe2\x80\x94", // EM DASH
      "\xc2\x98" => "\xcb\x9c", // SMALL TILDE
      "\xc2\x99" => "\xe2\x84\xa2", // TRADE MARK SIGN
      "\xc2\x9a" => "\xc5\xa1", // LATIN SMALL LETTER S WITH CARON
      "\xc2\x9b" => "\xe2\x80\xba", // SINGLE RIGHT-POINTING ANGLE QUOTE
      "\xc2\x9c" => "\xc5\x93", // LATIN SMALL LIGATURE OE
      "\xc2\x9e" => "\xc5\xbe", // LATIN SMALL LETTER Z WITH CARON
      "\xc2\x9f" => "\xc5\xb8", // LATIN CAPITAL LETTER Y WITH DIAERESIS
      'ü'       => 'ü',
      'ä'       => 'ä',
      'ö'       => 'ö',
      'Ö'       => 'Ö',
      'ß'       => 'ß',
      'Ã '       => 'à',
      'á'       => 'á',
      'â'       => 'â',
      'ã'       => 'ã',
      'ù'       => 'ù',
      'ú'       => 'ú',
      'û'       => 'û',
      'Ù'       => 'Ù',
      'Ú'       => 'Ú',
      'Û'       => 'Û',
      'Ü'       => 'Ü',
      'ò'       => 'ò',
      'ó'       => 'ó',
      'ô'       => 'ô',
      'è'       => 'è',
      'é'       => 'é',
      'ê'       => 'ê',
      'ë'       => 'ë',
      'À'       => 'À',
      'Á'       => 'Á',
      'Â'       => 'Â',
      'Ã'       => 'Ã',
      'Ä'       => 'Ä',
      'Ã…'       => 'Å',
      'Ç'       => 'Ç',
      'È'       => 'È',
      'É'       => 'É',
      'Ê'       => 'Ê',
      'Ë'       => 'Ë',
      'ÃŒ'       => 'Ì',
      'Í'       => 'Í',
      'ÃŽ'       => 'Î',
      'Ï'       => 'Ï',
      'Ñ'       => 'Ñ',
      'Ã’'       => 'Ò',
      'Ó'       => 'Ó',
      'Ô'       => 'Ô',
      'Õ'       => 'Õ',
      'Ø'       => 'Ø',
      'Ã¥'       => 'å',
      'æ'       => 'æ',
      'ç'       => 'ç',
      'ì'       => 'ì',
      'í'       => 'í',
      'î'       => 'î',
      'ï'       => 'ï',
      'ð'       => 'ð',
      'ñ'       => 'ñ',
      'õ'       => 'õ',
      'ø'       => 'ø',
      'ý'       => 'ý',
      'ÿ'       => 'ÿ',
      '€'      => '€',
      '’'      => '’',
  );

  /**
   * @var array
   */
  private static $UTF8_TO_WIN1252 = array(
      "\xe2\x82\xac" => "\x80", // EURO SIGN
      "\xe2\x80\x9a" => "\x82", // SINGLE LOW-9 QUOTATION MARK
      "\xc6\x92"     => "\x83", // LATIN SMALL LETTER F WITH HOOK
      "\xe2\x80\x9e" => "\x84", // DOUBLE LOW-9 QUOTATION MARK
      "\xe2\x80\xa6" => "\x85", // HORIZONTAL ELLIPSIS
      "\xe2\x80\xa0" => "\x86", // DAGGER
      "\xe2\x80\xa1" => "\x87", // DOUBLE DAGGER
      "\xcb\x86"     => "\x88", // MODIFIER LETTER CIRCUMFLEX ACCENT
      "\xe2\x80\xb0" => "\x89", // PER MILLE SIGN
      "\xc5\xa0"     => "\x8a", // LATIN CAPITAL LETTER S WITH CARON
      "\xe2\x80\xb9" => "\x8b", // SINGLE LEFT-POINTING ANGLE QUOTE
      "\xc5\x92"     => "\x8c", // LATIN CAPITAL LIGATURE OE
      "\xc5\xbd"     => "\x8e", // LATIN CAPITAL LETTER Z WITH CARON
      "\xe2\x80\x98" => "\x91", // LEFT SINGLE QUOTATION MARK
      "\xe2\x80\x99" => "\x92", // RIGHT SINGLE QUOTATION MARK
      "\xe2\x80\x9c" => "\x93", // LEFT DOUBLE QUOTATION MARK
      "\xe2\x80\x9d" => "\x94", // RIGHT DOUBLE QUOTATION MARK
      "\xe2\x80\xa2" => "\x95", // BULLET
      "\xe2\x80\x93" => "\x96", // EN DASH
      "\xe2\x80\x94" => "\x97", // EM DASH
      "\xcb\x9c"     => "\x98", // SMALL TILDE
      "\xe2\x84\xa2" => "\x99", // TRADE MARK SIGN
      "\xc5\xa1"     => "\x9a", // LATIN SMALL LETTER S WITH CARON
      "\xe2\x80\xba" => "\x9b", // SINGLE RIGHT-POINTING ANGLE QUOTE
      "\xc5\x93"     => "\x9c", // LATIN SMALL LIGATURE OE
      "\xc5\xbe"     => "\x9e", // LATIN SMALL LETTER Z WITH CARON
      "\xc5\xb8"     => "\x9f", // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );

  /**
   * @var array
   */
  private static $UTF8_MSWORD = array(
      "\xc2\xab"     => '"', // « (U+00AB) in UTF-8
      "\xc2\xbb"     => '"', // » (U+00BB) in UTF-8
      "\xe2\x80\x98" => "'", // ‘ (U+2018) in UTF-8
      "\xe2\x80\x99" => "'", // ’ (U+2019) in UTF-8
      "\xe2\x80\x9a" => "'", // ‚ (U+201A) in UTF-8
      "\xe2\x80\x9b" => "'", // ‛ (U+201B) in UTF-8
      "\xe2\x80\x9c" => '"', // “ (U+201C) in UTF-8
      "\xe2\x80\x9d" => '"', // ” (U+201D) in UTF-8
      "\xe2\x80\x9e" => '"', // „ (U+201E) in UTF-8
      "\xe2\x80\x9f" => '"', // ‟ (U+201F) in UTF-8
      "\xe2\x80\xb9" => "'", // ‹ (U+2039) in UTF-8
      "\xe2\x80\xba" => "'", // › (U+203A) in UTF-8
      "\xe2\x80\x93" => '-', // – (U+2013) in UTF-8
      "\xe2\x80\x94" => '-', // — (U+2014) in UTF-8
      "\xe2\x80\xa6" => '...' // … (U+2026) in UTF-8
  );

  /**
   * @var array
   */
  private static $ICONV_ENCODING = array(
      'ANSI_X3.4-1968',
      'ANSI_X3.4-1986',
      'ASCII',
      'CP367',
      'IBM367',
      'ISO-IR-6',
      'ISO646-US',
      'ISO_646.IRV:1991',
      'US',
      'US-ASCII',
      'CSASCII',
      'UTF-8',
      'ISO-10646-UCS-2',
      'UCS-2',
      'CSUNICODE',
      'UCS-2BE',
      'UNICODE-1-1',
      'UNICODEBIG',
      'CSUNICODE11',
      'UCS-2LE',
      'UNICODELITTLE',
      'ISO-10646-UCS-4',
      'UCS-4',
      'CSUCS4',
      'UCS-4BE',
      'UCS-4LE',
      'UTF-16',
      'UTF-16BE',
      'UTF-16LE',
      'UTF-32',
      'UTF-32BE',
      'UTF-32LE',
      'UNICODE-1-1-UTF-7',
      'UTF-7',
      'CSUNICODE11UTF7',
      'UCS-2-INTERNAL',
      'UCS-2-SWAPPED',
      'UCS-4-INTERNAL',
      'UCS-4-SWAPPED',
      'C99',
      'JAVA',
      'CP819',
      'IBM819',
      'ISO-8859-1',
      'ISO-IR-100',
      'ISO8859-1',
      'ISO_8859-1',
      'ISO_8859-1:1987',
      'L1',
      'LATIN1',
      'CSISOLATIN1',
      'ISO-8859-2',
      'ISO-IR-101',
      'ISO8859-2',
      'ISO_8859-2',
      'ISO_8859-2:1987',
      'L2',
      'LATIN2',
      'CSISOLATIN2',
      'ISO-8859-3',
      'ISO-IR-109',
      'ISO8859-3',
      'ISO_8859-3',
      'ISO_8859-3:1988',
      'L3',
      'LATIN3',
      'CSISOLATIN3',
      'ISO-8859-4',
      'ISO-IR-110',
      'ISO8859-4',
      'ISO_8859-4',
      'ISO_8859-4:1988',
      'L4',
      'LATIN4',
      'CSISOLATIN4',
      'CYRILLIC',
      'ISO-8859-5',
      'ISO-IR-144',
      'ISO8859-5',
      'ISO_8859-5',
      'ISO_8859-5:1988',
      'CSISOLATINCYRILLIC',
      'ARABIC',
      'ASMO-708',
      'ECMA-114',
      'ISO-8859-6',
      'ISO-IR-127',
      'ISO8859-6',
      'ISO_8859-6',
      'ISO_8859-6:1987',
      'CSISOLATINARABIC',
      'ECMA-118',
      'ELOT_928',
      'GREEK',
      'GREEK8',
      'ISO-8859-7',
      'ISO-IR-126',
      'ISO8859-7',
      'ISO_8859-7',
      'ISO_8859-7:1987',
      'ISO_8859-7:2003',
      'CSISOLATINGREEK',
      'HEBREW',
      'ISO-8859-8',
      'ISO-IR-138',
      'ISO8859-8',
      'ISO_8859-8',
      'ISO_8859-8:1988',
      'CSISOLATINHEBREW',
      'ISO-8859-9',
      'ISO-IR-148',
      'ISO8859-9',
      'ISO_8859-9',
      'ISO_8859-9:1989',
      'L5',
      'LATIN5',
      'CSISOLATIN5',
      'ISO-8859-10',
      'ISO-IR-157',
      'ISO8859-10',
      'ISO_8859-10',
      'ISO_8859-10:1992',
      'L6',
      'LATIN6',
      'CSISOLATIN6',
      'ISO-8859-11',
      'ISO8859-11',
      'ISO_8859-11',
      'ISO-8859-13',
      'ISO-IR-179',
      'ISO8859-13',
      'ISO_8859-13',
      'L7',
      'LATIN7',
      'ISO-8859-14',
      'ISO-CELTIC',
      'ISO-IR-199',
      'ISO8859-14',
      'ISO_8859-14',
      'ISO_8859-14:1998',
      'L8',
      'LATIN8',
      'ISO-8859-15',
      'ISO-IR-203',
      'ISO8859-15',
      'ISO_8859-15',
      'ISO_8859-15:1998',
      'LATIN-9',
      'ISO-8859-16',
      'ISO-IR-226',
      'ISO8859-16',
      'ISO_8859-16',
      'ISO_8859-16:2001',
      'L10',
      'LATIN10',
      'KOI8-R',
      'CSKOI8R',
      'KOI8-U',
      'KOI8-RU',
      'CP1250',
      'MS-EE',
      'WINDOWS-1250',
      'CP1251',
      'MS-CYRL',
      'WINDOWS-1251',
      'CP1252',
      'MS-ANSI',
      'WINDOWS-1252',
      'CP1253',
      'MS-GREEK',
      'WINDOWS-1253',
      'CP1254',
      'MS-TURK',
      'WINDOWS-1254',
      'CP1255',
      'MS-HEBR',
      'WINDOWS-1255',
      'CP1256',
      'MS-ARAB',
      'WINDOWS-1256',
      'CP1257',
      'WINBALTRIM',
      'WINDOWS-1257',
      'CP1258',
      'WINDOWS-1258',
      '850',
      'CP850',
      'IBM850',
      'CSPC850MULTILINGUAL',
      '862',
      'CP862',
      'IBM862',
      'CSPC862LATINHEBREW',
      '866',
      'CP866',
      'IBM866',
      'CSIBM866',
      'MAC',
      'MACINTOSH',
      'MACROMAN',
      'CSMACINTOSH',
      'MACCENTRALEUROPE',
      'MACICELAND',
      'MACCROATIAN',
      'MACROMANIA',
      'MACCYRILLIC',
      'MACUKRAINE',
      'MACGREEK',
      'MACTURKISH',
      'MACHEBREW',
      'MACARABIC',
      'MACTHAI',
      'HP-ROMAN8',
      'R8',
      'ROMAN8',
      'CSHPROMAN8',
      'NEXTSTEP',
      'ARMSCII-8',
      'GEORGIAN-ACADEMY',
      'GEORGIAN-PS',
      'KOI8-T',
      'CP154',
      'CYRILLIC-ASIAN',
      'PT154',
      'PTCP154',
      'CSPTCP154',
      'KZ-1048',
      'RK1048',
      'STRK1048-2002',
      'CSKZ1048',
      'MULELAO-1',
      'CP1133',
      'IBM-CP1133',
      'ISO-IR-166',
      'TIS-620',
      'TIS620',
      'TIS620-0',
      'TIS620.2529-1',
      'TIS620.2533-0',
      'TIS620.2533-1',
      'CP874',
      'WINDOWS-874',
      'VISCII',
      'VISCII1.1-1',
      'CSVISCII',
      'TCVN',
      'TCVN-5712',
      'TCVN5712-1',
      'TCVN5712-1:1993',
      'ISO-IR-14',
      'ISO646-JP',
      'JIS_C6220-1969-RO',
      'JP',
      'CSISO14JISC6220RO',
      'JISX0201-1976',
      'JIS_X0201',
      'X0201',
      'CSHALFWIDTHKATAKANA',
      'ISO-IR-87',
      'JIS0208',
      'JIS_C6226-1983',
      'JIS_X0208',
      'JIS_X0208-1983',
      'JIS_X0208-1990',
      'X0208',
      'CSISO87JISX0208',
      'ISO-IR-159',
      'JIS_X0212',
      'JIS_X0212-1990',
      'JIS_X0212.1990-0',
      'X0212',
      'CSISO159JISX02121990',
      'CN',
      'GB_1988-80',
      'ISO-IR-57',
      'ISO646-CN',
      'CSISO57GB1988',
      'CHINESE',
      'GB_2312-80',
      'ISO-IR-58',
      'CSISO58GB231280',
      'CN-GB-ISOIR165',
      'ISO-IR-165',
      'ISO-IR-149',
      'KOREAN',
      'KSC_5601',
      'KS_C_5601-1987',
      'KS_C_5601-1989',
      'CSKSC56011987',
      'EUC-JP',
      'EUCJP',
      'EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE',
      'CSEUCPKDFMTJAPANESE',
      'MS_KANJI',
      'SHIFT-JIS',
      'SHIFT_JIS',
      'SJIS',
      'CSSHIFTJIS',
      'CP932',
      'ISO-2022-JP',
      'CSISO2022JP',
      'ISO-2022-JP-1',
      'ISO-2022-JP-2',
      'CSISO2022JP2',
      'CN-GB',
      'EUC-CN',
      'EUCCN',
      'GB2312',
      'CSGB2312',
      'GBK',
      'CP936',
      'MS936',
      'WINDOWS-936',
      'GB18030',
      'ISO-2022-CN',
      'CSISO2022CN',
      'ISO-2022-CN-EXT',
      'HZ',
      'HZ-GB-2312',
      'EUC-TW',
      'EUCTW',
      'CSEUCTW',
      'BIG-5',
      'BIG-FIVE',
      'BIG5',
      'BIGFIVE',
      'CN-BIG5',
      'CSBIG5',
      'CP950',
      'BIG5-HKSCS:1999',
      'BIG5-HKSCS:2001',
      'BIG5-HKSCS',
      'BIG5-HKSCS:2004',
      'BIG5HKSCS',
      'EUC-KR',
      'EUCKR',
      'CSEUCKR',
      'CP949',
      'UHC',
      'CP1361',
      'JOHAB',
      'ISO-2022-KR',
      'CSISO2022KR',
      'CP856',
      'CP922',
      'CP943',
      'CP1046',
      'CP1124',
      'CP1129',
      'CP1161',
      'IBM-1161',
      'IBM1161',
      'CSIBM1161',
      'CP1162',
      'IBM-1162',
      'IBM1162',
      'CSIBM1162',
      'CP1163',
      'IBM-1163',
      'IBM1163',
      'CSIBM1163',
      'DEC-KANJI',
      'DEC-HANYU',
      '437',
      'CP437',
      'IBM437',
      'CSPC8CODEPAGE437',
      'CP737',
      'CP775',
      'IBM775',
      'CSPC775BALTIC',
      '852',
      'CP852',
      'IBM852',
      'CSPCP852',
      'CP853',
      '855',
      'CP855',
      'IBM855',
      'CSIBM855',
      '857',
      'CP857',
      'IBM857',
      'CSIBM857',
      'CP858',
      '860',
      'CP860',
      'IBM860',
      'CSIBM860',
      '861',
      'CP-IS',
      'CP861',
      'IBM861',
      'CSIBM861',
      '863',
      'CP863',
      'IBM863',
      'CSIBM863',
      'CP864',
      'IBM864',
      'CSIBM864',
      '865',
      'CP865',
      'IBM865',
      'CSIBM865',
      '869',
      'CP-GR',
      'CP869',
      'IBM869',
      'CSIBM869',
      'CP1125',
      'EUC-JISX0213',
      'SHIFT_JISX0213',
      'ISO-2022-JP-3',
      'BIG5-2003',
      'ISO-IR-230',
      'TDS565',
      'ATARI',
      'ATARIST',
      'RISCOS-LATIN1',
  );

  /**
   * @var array
   */
  private static $SUPPORT = array();

  /**
   * __construct()
   */
  public function __construct()
  {
    self::checkForSupport();
  }

  /**
   * Return the character at the specified position: $str[1] like functionality.
   *
   * @param string $str <p>A UTF-8 string.</p>
   * @param int    $pos <p>The position of character to return.</p>
   *
   * @return string <p>Single Multi-Byte character.</p>
   */
  public static function access($str, $pos)
  {
    $str = (string)$str;
    $pos = (int)$pos;

    if (!isset($str[0])) {
      return '';
    }

    if ($pos < 0) {
      return '';
    }

    return self::substr($str, $pos, 1);
  }

  /**
   * Prepends UTF-8 BOM character to the string and returns the whole string.
   *
   * INFO: If BOM already existed there, the Input string is returned.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string <p>The output string that contains BOM.</p>
   */
  public static function add_bom_to_string($str)
  {
    if (self::string_has_bom($str) === false) {
      $str = self::bom() . $str;
    }

    return $str;
  }

  /**
   * Convert binary into a string.
   *
   * @param mixed $bin 1|0
   *
   * @return string
   */
  public static function binary_to_str($bin)
  {
    if (!isset($bin[0])) {
      return '';
    }

    return pack('H*', base_convert($bin, 2, 16));
  }

  /**
   * Returns the UTF-8 Byte Order Mark Character.
   *
   * INFO: take a look at UTF8::$BOM for e.g. UTF-16 and UTF-32 BOM values
   *
   * @return string UTF-8 Byte Order Mark
   */
  public static function bom()
  {
    return "\xef\xbb\xbf";
  }

  /**
   * @alias of UTF8::chr_map()
   *
   * @see   UTF8::chr_map()
   *
   * @param string|array $callback
   * @param string       $str
   *
   * @return array
   */
  public static function callback($callback, $str)
  {
    return self::chr_map($callback, $str);
  }

  /**
   * This method will auto-detect your server environment for UTF-8 support.
   *
   * INFO: You don't need to run it manually, it will be triggered if it's needed.
   */
  public static function checkForSupport()
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {

      self::$SUPPORT['already_checked_via_portable_utf8'] = true;

      // http://php.net/manual/en/book.mbstring.php
      self::$SUPPORT['mbstring'] = self::mbstring_loaded();

      if (
          defined('MB_OVERLOAD_STRING')
          &&
          ini_get('mbstring.func_overload') & MB_OVERLOAD_STRING
      ) {
        self::$SUPPORT['mbstring_func_overload'] = true;
      } else {
        self::$SUPPORT['mbstring_func_overload'] = false;
      }

      // http://php.net/manual/en/book.iconv.php
      self::$SUPPORT['iconv'] = self::iconv_loaded();

      // http://php.net/manual/en/book.intl.php
      self::$SUPPORT['intl'] = self::intl_loaded();

      // http://php.net/manual/en/class.intlchar.php
      self::$SUPPORT['intlChar'] = self::intlChar_loaded();

      // http://php.net/manual/en/book.pcre.php
      self::$SUPPORT['pcre_utf8'] = self::pcre_utf8_support();
    }
  }

  /**
   * Check for php-support.
   *
   * @param string|null $key
   *
   * @return bool[]|bool|null return the full support-array, if $key === null<br />
   *                          return bool-value, if $key is used and available<br />
   *                          otherwise return null
   */
  public static function getSupportInfo($key = null)
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if ($key === null) {
      return self::$SUPPORT;
Bug Best Practice introduced by
The return type of return self::$SUPPORT; (array) is incompatible with the return type documented by voku\helper\UTF8::getSupportInfo of type boolean[]|boolean|null.

If you return a value from a function or method, it should be a sub-type of the type that is given by the parent type, for example an interface or abstract method. This is more formally defined by the Liskov substitution principle, and guarantees that classes that depend on the parent type can use any instance of a child type interchangeably. This principle also belongs to the SOLID principles for object-oriented design.

Let’s take a look at an example:

class Author {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

abstract class Post {
    public function getAuthor() {
        return 'Johannes';
    }
}

class BlogPost extends Post {
    public function getAuthor() {
        return new Author('Johannes');
    }
}

class ForumPost extends Post { /* ... */ }

function my_function(Post $post) {
    echo strtoupper($post->getAuthor());
}

Our function my_function expects a Post object, and outputs the author of the post. The base class Post returns a simple string and outputting a simple string will work just fine. However, the child class BlogPost which is a sub-type of Post instead decided to return an object, and is therefore violating the SOLID principles. If a BlogPost were passed to my_function, PHP would not complain, but ultimately fail when executing the strtoupper call in its body.
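
One possible repair of the example above, as a minimal sketch: every subclass keeps returning a string from getAuthor(), and the Author object is exposed through a separate accessor. The method name getAuthorObject() and the added type declarations are assumptions made for illustration; they are not part of the original example.

```php
<?php

declare(strict_types=1);

class Author
{
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

abstract class Post
{
    // The declared return type forces every subclass to return a string.
    public function getAuthor(): string
    {
        return 'Johannes';
    }
}

class BlogPost extends Post
{
    // Hypothetical accessor for callers that need the full object.
    public function getAuthorObject(): Author
    {
        return new Author('Johannes');
    }

    // Still a string, so BlogPost stays substitutable for Post.
    public function getAuthor(): string
    {
        return $this->getAuthorObject()->getName();
    }
}

function my_function(Post $post): string
{
    return strtoupper($post->getAuthor());
}

echo my_function(new BlogPost()), "\n"; // prints "JOHANNES"
```

With the return type declared on the parent, PHP rejects a subclass that tries to return an object from getAuthor(), so the mismatch surfaces at the method boundary instead of inside strtoupper().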

    }

    if (!isset(self::$SUPPORT[$key])) {
      return null;
    }

    return self::$SUPPORT[$key];
  }

  /**
   * Generates a UTF-8 encoded character from the given code point.
   *
   * INFO: opposite to UTF8::ord()
   *
   * @param int    $code_point <p>The code point for which to generate a character.</p>
   * @param string $encoding   [optional] <p>Default is UTF-8</p>
   *
   * @return string|null <p>Multi-Byte character, returns null on failure or empty input.</p>
   */
  public static function chr($code_point, $encoding = 'UTF-8')
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    } elseif (self::$SUPPORT['intlChar'] === true) {
      return \IntlChar::chr($code_point);
    }

    // check type of code_point, only if there is no support for "\IntlChar"
    $i = (int)$code_point;
    if ($i !== $code_point) {
      return null;
    }

    // use static cache, only if there is no support for "\IntlChar"
    static $CHAR_CACHE = array();
    $cacheKey = $code_point . $encoding;
    if (isset($CHAR_CACHE[$cacheKey]) === true) {
      return $CHAR_CACHE[$cacheKey];
    }

    if (0x80 > $code_point %= 0x200000) {
      $str = self::chr_and_parse_int($code_point);
    } elseif (0x800 > $code_point) {
      $str = self::chr_and_parse_int(0xC0 | $code_point >> 6) .
             self::chr_and_parse_int(0x80 | $code_point & 0x3F);
    } elseif (0x10000 > $code_point) {
      $str = self::chr_and_parse_int(0xE0 | $code_point >> 12) .
             self::chr_and_parse_int(0x80 | $code_point >> 6 & 0x3F) .
             self::chr_and_parse_int(0x80 | $code_point & 0x3F);
    } else {
      $str = self::chr_and_parse_int(0xF0 | $code_point >> 18) .
             self::chr_and_parse_int(0x80 | $code_point >> 12 & 0x3F) .
             self::chr_and_parse_int(0x80 | $code_point >> 6 & 0x3F) .
             self::chr_and_parse_int(0x80 | $code_point & 0x3F);
    }

    if ($encoding !== 'UTF-8') {
      $str = \mb_convert_encoding($str, $encoding, 'UTF-8');
    }

    // add into static cache
    $CHAR_CACHE[$cacheKey] = $str;

    return $str;
  }

  /**
   * @param int $int
   *
   * @return string
   */
  private static function chr_and_parse_int($int)
  {
    return chr((int)$int);
  }

  /**
   * Applies callback to all characters of a string.
   *
   * @param string|array $callback <p>The callback function.</p>
   * @param string       $str      <p>UTF-8 string to run callback on.</p>
   *
   * @return array <p>The outcome of callback.</p>
   */
  public static function chr_map($callback, $str)
  {
    $chars = self::split($str);

    return array_map($callback, $chars);
  }

  /**
   * Generates an array of byte length of each character of a Unicode string.
   *
   * 1 byte => U+0000  - U+007F
   * 2 byte => U+0080  - U+07FF
   * 3 byte => U+0800  - U+FFFF
   * 4 byte => U+10000 - U+10FFFF
   *
   * @param string $str <p>The original Unicode string.</p>
   *
   * @return array <p>An array of byte lengths of each character.</p>
   */
  public static function chr_size_list($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return array();
    }

    return array_map(
        function ($data) {
          return UTF8::strlen($data, '8BIT');
        },
        self::split($str)
    );
  }

  /**
   * Get a decimal code representation of a specific character.
   *
   * @param string $char <p>The input character.</p>
   *
   * @return int
   */
  public static function chr_to_decimal($char)
  {
    $char = (string)$char;
    $code = self::ord($char[0]);
    $bytes = 1;

    if (!($code & 0x80)) {
      // 0xxxxxxx
      return $code;
    }

    if (($code & 0xe0) === 0xc0) {
      // 110xxxxx
      $bytes = 2;
      $code &= ~0xc0;
    } elseif (($code & 0xf0) === 0xe0) {
      // 1110xxxx
      $bytes = 3;
      $code &= ~0xe0;
    } elseif (($code & 0xf8) === 0xf0) {
      // 11110xxx
      $bytes = 4;
      $code &= ~0xf0;
    }

    for ($i = 2; $i <= $bytes; $i++) {
      // 10xxxxxx
      $code = ($code << 6) + (self::ord($char[$i - 1]) & ~0x80);
    }

    return $code;
  }

  /**
   * Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character.
   *
   * @param string $char <p>The input character</p>
   * @param string $pfix [optional]
   *
   * @return string <p>The code point encoded as U+xxxx</p>
   */
  public static function chr_to_hex($char, $pfix = 'U+')
  {
    $char = (string)$char;

    if (!isset($char[0])) {
      return '';
    }

    if ($char === '&#0;') {
      $char = '';
    }

    return self::int_to_hex(self::ord($char), $pfix);
  }

  /**
   * alias for "UTF8::chr_to_decimal()"
   *
   * @see UTF8::chr_to_decimal()
   *
   * @param string $chr
   *
   * @return int
   */
  public static function chr_to_int($chr)
  {
    return self::chr_to_decimal($chr);
  }

  /**
   * Splits a string into smaller chunks and multiple lines, using the specified line ending character.
   *
   * @param string $body     <p>The original string to be split.</p>
   * @param int    $chunklen [optional] <p>The maximum character length of a chunk.</p>
   * @param string $end      [optional] <p>The character(s) to be inserted at the end of each chunk.</p>
   *
   * @return string <p>The chunked string</p>
   */
  public static function chunk_split($body, $chunklen = 76, $end = "\r\n")
  {
    return implode($end, self::split($body, $chunklen));
  }

  /**
   * Accepts a string and removes all non-UTF-8 characters from it + extras if needed.
   *
   * @param string $str                     <p>The string to be sanitized.</p>
   * @param bool   $remove_bom              [optional] <p>Set to true, if you need to remove UTF-BOM.</p>
   * @param bool   $normalize_whitespace    [optional] <p>Set to true, if you need to normalize the whitespace.</p>
   * @param bool   $normalize_msword        [optional] <p>Set to true, if you need to normalize MS Word chars e.g.: "…"
   *                                        => "..."</p>
   * @param bool   $keep_non_breaking_space [optional] <p>Set to true, to keep non-breaking-spaces, in combination with
   *                                        $normalize_whitespace</p>
   *
   * @return string <p>Clean UTF-8 encoded string.</p>
   */
  public static function clean($str, $remove_bom = false, $normalize_whitespace = false, $normalize_msword = false, $keep_non_breaking_space = false)
  {
    // http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
    // caused connection reset problem on larger strings

    $regx = '/
      (
        (?: [\x00-\x7F]               # single-byte sequences   0xxxxxxx
        |   [\xC0-\xDF][\x80-\xBF]    # double-byte sequences   110xxxxx 10xxxxxx
        |   [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences   1110xxxx 10xxxxxx * 2
        |   [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3
        ){1,100}                      # ...1 to 100 times
      )
    | ( [\x80-\xBF] )                 # invalid byte in range 10000000 - 10111111
    | ( [\xC0-\xFF] )                 # invalid byte in range 11000000 - 11111111
    /x';
    $str = preg_replace($regx, '$1', $str);

    $str = self::replace_diamond_question_mark($str, '');
    $str = self::remove_invisible_characters($str);

    if ($normalize_whitespace === true) {
      $str = self::normalize_whitespace($str, $keep_non_breaking_space);
    }

    if ($normalize_msword === true) {
      $str = self::normalize_msword($str);
    }

    if ($remove_bom === true) {
      $str = self::remove_bom($str);
    }

    return $str;
  }

  /**
   * Clean up a string and show only printable UTF-8 chars at the end + fix UTF-8 encoding.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function cleanup($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // fix ISO <-> UTF-8 errors
    $str = self::fix_simple_utf8($str);

    // remove all non-UTF-8 symbols
    // && remove diamond question mark (�)
    // && remove invisible characters (e.g. "\0")
    // && remove BOM
    // && normalize whitespace chars (but keep non-breaking-spaces)
    $str = self::clean($str, true, true, false, true);

    return (string)$str;
  }

  /**
   * Accepts a string or an array of strings and returns an array of Unicode code points.
   *
   * INFO: opposite to UTF8::string()
   *
   * @param string|string[] $arg        <p>A UTF-8 encoded string or an array of such strings.</p>
   * @param bool            $u_style    <p>If true, code points are returned in U+xxxx format;
   *                                    by default, code points are returned as integers.</p>
   *
   * @return array <p>The array of code points.</p>
   */
  public static function codepoints($arg, $u_style = false)
  {
    if (is_string($arg) === true) {
      $arg = self::split($arg);
    }

    $arg = array_map(
        array(
            '\\voku\\helper\\UTF8',
            'ord',
        ),
        $arg
    );

    if ($u_style) {
      $arg = array_map(
          array(
              '\\voku\\helper\\UTF8',
              'int_to_hex',
          ),
          $arg
      );
    }

    return $arg;
  }

  /**
   * Returns count of characters used in a string.
   *
   * @param string $str       <p>The input string.</p>
   * @param bool   $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return array <p>An associative array of Character as keys and
   *               their count as values.</p>
   */
  public static function count_chars($str, $cleanUtf8 = false)
  {
    return array_count_values(self::split($str, 1, $cleanUtf8));
  }

  /**
   * Converts an int value into a UTF-8 character.
   *
   * @param mixed $int
   *
   * @return string
   */
  public static function decimal_to_chr($int)
  {
    if (Bootup::is_php('5.4') === true) {
      $flags = ENT_QUOTES | ENT_HTML5;
    } else {
      $flags = ENT_QUOTES;
    }

    return self::html_entity_decode('&#' . $int . ';', $flags);
  }

  /**
   * Encode a string with a new charset-encoding.
   *
   * INFO:  Unlike "UTF8::utf8_encode()", this function also tries to fix broken / double encoding,
   *        so you can safely call it on a string that is already UTF-8.
   *
   * @param string $encoding <p>e.g. 'UTF-8', 'ISO-8859-1', etc.</p>
   * @param string $str      <p>The input string</p>
   * @param bool   $force    [optional] <p>Force the new encoding (we try to fix broken / double encoding for UTF-8)<br />
   *                         otherwise we auto-detect the current string-encoding</p>
   *
   * @return string
   */
  public static function encode($encoding, $str, $force = true)
  {
    $str = (string)$str;
    $encoding = (string)$encoding;

    if (!isset($str[0], $encoding[0])) {
      return $str;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    $encodingDetected = self::str_detect_encoding($str);

    if (
        $encodingDetected
Bug / Best Practice
The expression $encodingDetected of type false|string is loosely compared to true; this is ambiguous if the string can be empty. You might want to explicitly use !== false instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For string values, the empty string '' is a special case, in particular the following results might be unexpected:

''   == false // true
''   == null  // true
'ab' == false // false
'ab' == null  // false

// It is often better to use strict comparison
'' === false // false
'' === null  // false
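
Applied to a call that returns false|string, the strict form might look like the sketch below. The functions detect_encoding_or_false() and describe_encoding() are hypothetical stand-ins written for this illustration; they are not the library's str_detect_encoding(), and the detection logic is deliberately simplistic.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical detector that returns an encoding name, or false
 * when nothing can be detected (here: for an empty string).
 *
 * @return string|false
 */
function detect_encoding_or_false(string $str)
{
    if ($str === '') {
        return false;
    }

    // preg_match() with the /u modifier only succeeds on valid UTF-8 input.
    return preg_match('//u', $str) === 1 ? 'UTF-8' : 'ISO-8859-1';
}

function describe_encoding(string $str): string
{
    $detected = detect_encoding_or_false($str);

    // Strict check: only the boolean false means "nothing detected".
    // A loose check would also treat '' or '0' as a failure.
    if ($detected !== false) {
        return $detected;
    }

    return 'unknown';
}

echo describe_encoding('abc'), "\n";
echo describe_encoding(''), "\n";
```

The strict comparison keeps the two failure modes apart: a genuine false from the detector, and a string that merely happens to be falsy.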
        &&
        (
            $force === true
            ||
            $encodingDetected !== $encoding
        )
    ) {

      if (
          $encoding === 'UTF-8'
          &&
          (
              $force === true
              || $encodingDetected === 'UTF-8'
              || $encodingDetected === 'WINDOWS-1252'
              || $encodingDetected === 'ISO-8859-1'
          )
      ) {
        return self::to_utf8($str);
      }

      if (
          $encoding === 'ISO-8859-1'
          &&
          (
              $force === true
              || $encodingDetected === 'ISO-8859-1'
              || $encodingDetected === 'UTF-8'
          )
      ) {
        return self::to_iso8859($str);
      }

      if (
          $encoding !== 'UTF-8'
          &&
          $encoding !== 'WINDOWS-1252'
          &&
          self::$SUPPORT['mbstring'] === false
      ) {
        trigger_error('UTF8::encode() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
      }

      $strEncoded = \mb_convert_encoding(
          $str,
          $encoding,
          $encodingDetected
      );

      if ($strEncoded) {
        return $strEncoded;
      }
    }

    return $str;
  }

  /**
   * Reads entire file into a string.
   *
   * WARNING: do not use UTF-8 Option ($convertToUtf8) for binary-files (e.g.: images) !!!
   *
   * @link http://php.net/manual/en/function.file-get-contents.php
   *
   * @param string        $filename      <p>
   *                                     Name of the file to read.
   *                                     </p>
   * @param int|false     $flags         [optional] <p>
   *                                     Prior to PHP 6, this parameter is called
   *                                     use_include_path and is a bool.
   *                                     As of PHP 5 the FILE_USE_INCLUDE_PATH can be used
   *                                     to trigger include path
   *                                     search.
   *                                     </p>
   *                                     <p>
   *                                     The value of flags can be any combination of
   *                                     the following flags (with some restrictions), joined with the
   *                                     binary OR (|)
   *                                     operator.
   *                                     </p>
   *                                     <p>
   *                                     <table>
   *                                     Available flags
   *                                     <tr valign="top">
   *                                     <td>Flag</td>
   *                                     <td>Description</td>
   *                                     </tr>
   *                                     <tr valign="top">
   *                                     <td>
   *                                     FILE_USE_INCLUDE_PATH
   *                                     </td>
   *                                     <td>
   *                                     Search for filename in the include directory.
   *                                     See include_path for more
   *                                     information.
   *                                     </td>
   *                                     </tr>
   *                                     <tr valign="top">
   *                                     <td>
   *                                     FILE_TEXT
   *                                     </td>
   *                                     <td>
   *                                     As of PHP 6, the default encoding of the read
   *                                     data is UTF-8. You can specify a different encoding by creating a
   *                                     custom context or by changing the default using
   *                                     stream_default_encoding. This flag cannot be
   *                                     used with FILE_BINARY.
   *                                     </td>
   *                                     </tr>
   *                                     <tr valign="top">
   *                                     <td>
   *                                     FILE_BINARY
   *                                     </td>
   *                                     <td>
   *                                     With this flag, the file is read in binary mode. This is the default
   *                                     setting and cannot be used with FILE_TEXT.
   *                                     </td>
   *                                     </tr>
   *                                     </table>
   *                                     </p>
   * @param resource|null $context       [optional] <p>
   *                                     A valid context resource created with
   *                                     stream_context_create. If you don't need to use a
   *                                     custom context, you can skip this parameter by passing null.
   *                                     </p>
   * @param int|null      $offset        [optional] <p>
   *                                     The offset where the reading starts.
   *                                     </p>
   * @param int|null      $maxlen        [optional] <p>
   *                                     Maximum length of data read. The default is to read until end
   *                                     of file is reached.
   *                                     </p>
   * @param int           $timeout       <p>The time in seconds for the timeout.</p>
   *
   * @param boolean       $convertToUtf8 <strong>WARNING!!!</strong> <p>Do not use this option for binary files such as
   *                                     images or PDFs, because they contain bytes that are not valid UTF-8.</p>
   *
   * @return string|false <p>The read data, or false on failure.</p>
   */
  public static function file_get_contents($filename, $flags = null, $context = null, $offset = null, $maxlen = null, $timeout = 10, $convertToUtf8 = true)
  {
    // init
    $timeout = (int)$timeout;
    $filename = filter_var($filename, FILTER_SANITIZE_STRING);

    if ($timeout && $context === null) {
      $context = stream_context_create(
          array(
              'http' =>
                  array(
                      'timeout' => $timeout,
                  ),
          )
      );
    }

    if (!$flags) {
      $flags = false;
    }

    if ($offset === null) {
      $offset = 0;
    }

    if (is_int($maxlen) === true) {
      $data = file_get_contents($filename, $flags, $context, $offset, $maxlen);
    } else {
      $data = file_get_contents($filename, $flags, $context, $offset);
    }

    // return false on error
    if ($data === false) {
      return false;
    }

    if ($convertToUtf8 === true) {
      $data = self::encode('UTF-8', $data, false);
      $data = self::cleanup($data);
    }

    return $data;
  }

  /**
   * Checks if a file starts with BOM (Byte Order Mark) character.
   *
   * @param string $file_path <p>Path to a valid file.</p>
   *
   * @return bool <p><strong>true</strong> if the file has BOM at the start, <strong>false</strong> otherwise.</p>
   */
  public static function file_has_bom($file_path)
  {
    return self::string_has_bom(file_get_contents($file_path));
  }

  /**
   * Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * @param mixed  $var
   * @param int    $normalization_form
   * @param string $leading_combining
   *
   * @return mixed
   */
  public static function filter($var, $normalization_form = 4 /* n::NFC */, $leading_combining = '◌')
  {
    switch (gettype($var)) {
      case 'array':
Duplication
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
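
As a rough sketch of the extraction suggested above, two near-identical loops over arrays and objects can share one helper. The name map_members() and its signature are hypothetical and not part of the library; the helper is also not recursive, so a caller would pass a callback that recurses if nested values need handling.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: apply $fn to every element of an array or
 * every public property of an object; other types pass through.
 *
 * @param mixed    $var
 * @param callable $fn
 *
 * @return mixed
 */
function map_members($var, callable $fn)
{
    if (is_array($var)) {
        foreach ($var as $k => $v) {
            $var[$k] = $fn($v);
        }
    } elseif (is_object($var)) {
        foreach ($var as $k => $v) {
            $var->{$k} = $fn($v);
        }
    }

    return $var;
}

// Usage: both container types go through the same code path.
$upper = function ($v) {
    return is_string($v) ? strtoupper($v) : $v;
};

$obj = new stdClass();
$obj->name = 'b';

print_r(map_members(array('k' => 'a'), $upper));
print_r(map_members($obj, $upper));
```

Note that the array is returned as a modified copy, while the object is modified in place, which matches how PHP passes the two types.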

        foreach ($var as $k => $v) {
          /** @noinspection AlterInForeachInspection */
          $var[$k] = self::filter($v, $normalization_form, $leading_combining);
        }
        break;
      case 'object':
Duplication
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
        foreach ($var as $k => $v) {
          $var->{$k} = self::filter($v, $normalization_form, $leading_combining);
        }
        break;
      case 'string':
Coding Style
The case body in a switch statement must start on the line following the statement.

According to the PSR-2, the body of a case statement must start on the line immediately following the case statement.

switch ($expr) {
case "A":
    doSomething(); //right
    break;
case "B":

    doSomethingElse(); //wrong
    break;

}

To learn more about the PSR-2 coding standard, please refer to the PHP-Fig.


        if (false !== strpos($var, "\r")) {
          // Workaround https://bugs.php.net/65732
          $var = str_replace(array("\r\n", "\r"), "\n", $var);
        }

        if (self::is_ascii($var) === false) {
          /** @noinspection PhpUndefinedClassInspection */
          if (\Normalizer::isNormalized($var, $normalization_form)) {
            $n = '-';
          } else {
            /** @noinspection PhpUndefinedClassInspection */
            $n = \Normalizer::normalize($var, $normalization_form);

            if (isset($n[0])) {
              $var = $n;
            } else {
              $var = self::encode('UTF-8', $var);
            }
          }

          if (
              $var[0] >= "\x80"
              &&
              isset($n[0], $leading_combining[0])
              &&
              preg_match('/^\p{Mn}/u', $var)
          ) {
            // Prevent leading combining chars
            // for NFC-safe concatenations.
            $var = $leading_combining . $var;
          }
        }

        break;
    }

    return $var;
  }

  /**
   * "filter_input()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets a specific external variable by name and optionally filters it
   *
   * @link  http://php.net/manual/en/function.filter-input.php
   *
   * @param int    $type          <p>
   *                              One of <b>INPUT_GET</b>, <b>INPUT_POST</b>,
   *                              <b>INPUT_COOKIE</b>, <b>INPUT_SERVER</b>, or
   *                              <b>INPUT_ENV</b>.
   *                              </p>
   * @param string $variable_name <p>
   *                              Name of a variable to get.
   *                              </p>
   * @param int    $filter        [optional] <p>
   *                              The ID of the filter to apply. The
   *                              manual page lists the available filters.
   *                              </p>
   * @param mixed  $options       [optional] <p>
   *                              Associative array of options or bitwise disjunction of flags. If filter
   *                              accepts options, flags can be provided in "flags" field of array.
   *                              </p>
   *
   * @return mixed Value of the requested variable on success, <b>FALSE</b> if the filter fails,
   * or <b>NULL</b> if the <i>variable_name</i> variable is not set.
   * If the flag <b>FILTER_NULL_ON_FAILURE</b> is used, it
   * returns <b>FALSE</b> if the variable is not set and <b>NULL</b> if the filter fails.
   * @since 5.2.0
   */
  public static function filter_input($type, $variable_name, $filter = FILTER_DEFAULT, $options = null)
Duplication
This method seems to be duplicated in your project.

  {
    if (4 > func_num_args()) {
      $var = filter_input($type, $variable_name, $filter);
    } else {
      $var = filter_input($type, $variable_name, $filter, $options);
    }

    return self::filter($var);
  }

  /**
   * "filter_input_array()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets external variables and optionally filters them
   *
   * @link  http://php.net/manual/en/function.filter-input-array.php
   *
   * @param int   $type       <p>
   *                          One of <b>INPUT_GET</b>, <b>INPUT_POST</b>,
   *                          <b>INPUT_COOKIE</b>, <b>INPUT_SERVER</b>, or
   *                          <b>INPUT_ENV</b>.
   *                          </p>
   * @param mixed $definition [optional] <p>
   *                          An array defining the arguments. A valid key is a string
   *                          containing a variable name and a valid value is either a filter type, or an array
   *                          optionally specifying the filter, flags and options. If the value is an
   *                          array, valid keys are filter which specifies the
   *                          filter type,
   *                          flags which specifies any flags that apply to the
   *                          filter, and options which specifies any options that
   *                          apply to the filter. See the example below for a better understanding.
   *                          </p>
   *                          <p>
   *                          This parameter can be also an integer holding a filter constant. Then all values in the
   *                          input array are filtered by this filter.
   *                          </p>
   * @param bool  $add_empty  [optional] <p>
   *                          Add missing keys as <b>NULL</b> to the return value.
   *                          </p>
   *
   * @return mixed An array containing the values of the requested variables on success, or <b>FALSE</b>
   * on failure. An array value will be <b>FALSE</b> if the filter fails, or <b>NULL</b> if
   * the variable is not set. Or if the flag <b>FILTER_NULL_ON_FAILURE</b>
   * is used, it returns <b>FALSE</b> if the variable is not set and <b>NULL</b> if the filter
   * fails.
   * @since 5.2.0
   */
  public static function filter_input_array($type, $definition = null, $add_empty = true)
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.
1692 1
  {
1693 1
    if (2 > func_num_args()) {
1694
      $a = filter_input_array($type);
1695
    } else {
1696
      $a = filter_input_array($type, $definition, $add_empty);
1697
    }
1698
1699
    return self::filter($a);
1700
  }
1701
1702
  /**
1703
   * "filter_var()"-wrapper which normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
1704
   *
1705
   * Filters a variable with a specified filter
1706
   *
1707
   * @link  http://php.net/manual/en/function.filter-var.php
1708
   *
1709
   * @param mixed $variable <p>
1710
   *                        Value to filter.
1711
   *                        </p>
1712
   * @param int   $filter   [optional] <p>
1713
   *                        The ID of the filter to apply. The
1714
   *                        manual page lists the available filters.
1715
   *                        </p>
1716
   * @param mixed $options  [optional] <p>
1717
   *                        Associative array of options or bitwise disjunction of flags. If filter
1718
   *                        accepts options, flags can be provided in "flags" field of array. For
1719
   *                        the "callback" filter, callable type should be passed. The
1720
   *                        callback must accept one argument, the value to be filtered, and return
1721
   *                        the value after filtering/sanitizing it.
1722
   *                        </p>
1723
   *                        <p>
1724
   *                        <code>
1725
   *                        // for filters that accept options, use this format
1726
   *                        $options = array(
1727
   *                        'options' => array(
1728
   *                        'default' => 3, // value to return if the filter fails
1729
   *                        // other options here
1730
   *                        'min_range' => 0
1731
   *                        ),
1732
   *                        'flags' => FILTER_FLAG_ALLOW_OCTAL,
1733
   *                        );
1734
   *                        $var = filter_var('0755', FILTER_VALIDATE_INT, $options);
1735
   *                        // for filters that only accept flags, you can pass them directly
1736
   *                        $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
1737
   *                        // for filters that only accept flags, you can also pass them as an array
1738
   *                        $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN,
1739
   *                        array('flags' => FILTER_NULL_ON_FAILURE));
1740
   *                        // callback validate filter
1741
   *                        function foo($value)
1742
   *                        {
1743
   *                        // Expected format: Surname, GivenNames
1744
   *                        if (strpos($value, ", ") === false) return false;
1745
   *                        list($surname, $givennames) = explode(", ", $value, 2);
1746
   *                        $empty = (empty($surname) || empty($givennames));
1747
   *                        $notstrings = (!is_string($surname) || !is_string($givennames));
1748
   *                        if ($empty || $notstrings) {
1749
   *                        return false;
1750
   *                        } else {
1751
   *                        return $value;
1752 1
   *                        }
1753
   *                        }
1754 1
   *                        $var = filter_var('Doe, Jane Sue', FILTER_CALLBACK, array('options' => 'foo'));
1755 1
   *                        </code>
1756
   *                        </p>
1757 1
   *
1758
   * @return mixed the filtered data, or <b>FALSE</b> if the filter fails.
1759
   * @since 5.2.0
1760
   */
1761
  public static function filter_var($variable, $filter = FILTER_DEFAULT, $options = null)
Duplication: This method seems to be duplicated in your project.
1762
  {
1763
    if (3 > func_num_args()) {
1764
      $variable = filter_var($variable, $filter);
1765
    } else {
1766
      $variable = filter_var($variable, $filter, $options);
1767
    }
1768
1769
    return self::filter($variable);
1770
  }
1771
1772 1
  /**
1773
   * "filter_var_array()"-wrapper which normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
1774 1
   *
1775
   * Gets multiple variables and optionally filters them
1776
   *
1777
   * @link  http://php.net/manual/en/function.filter-var-array.php
1778
   *
1779
   * @param array $data       <p>
1780
   *                          An array with string keys containing the data to filter.
1781
   *                          </p>
1782
   * @param mixed $definition [optional] <p>
1783
   *                          An array defining the arguments. A valid key is a string
1784
   *                          containing a variable name and a valid value is either a
1785
   *                          filter type, or an
1786 1
   *                          array optionally specifying the filter, flags and options.
1787
   *                          If the value is an array, valid keys are filter
1788 1
   *                          which specifies the filter type,
1789 1
   *                          flags which specifies any flags that apply to the
1790
   *                          filter, and options which specifies any options that
1791
   *                          apply to the filter. See the example below for a better understanding.
1792 1
   *                          </p>
1793 1
   *                          <p>
1794
   *                          This parameter can also be an integer holding a filter constant. Then all values in the
1795
   *                          input array are filtered by this filter.
1796 1
   *                          </p>
1797
   * @param bool  $add_empty  [optional] <p>
1798
   *                          Add missing keys as <b>NULL</b> to the return value.
1799
   *                          </p>
1800
   *
1801
   * @return mixed An array containing the values of the requested variables on success, or <b>FALSE</b>
1802
   * on failure. An array value will be <b>FALSE</b> if the filter fails, or <b>NULL</b> if
1803
   * the variable is not set.
1804
   * @since 5.2.0
1805
   */
1806
  public static function filter_var_array($data, $definition = null, $add_empty = true)
Duplication: This method seems to be duplicated in your project.
1807
  {
1808
    if (2 > func_num_args()) {
1809
      $a = filter_var_array($data);
1810 1
    } else {
1811
      $a = filter_var_array($data, $definition, $add_empty);
1812 1
    }
1813
1814
    return self::filter($a);
1815
  }
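The three filter wrappers above (filter_input_array(), filter_var() and filter_var_array()) share one shape: call the native function with only the arguments that were actually passed, then normalize the result. A minimal sketch of how that shared step might be extracted is shown below. The names filtered_call(), normalize_stub() and wrapped_filter_var() are hypothetical and not part of the library; normalize_stub() only stands in for UTF8::filter().

```php
<?php
// Hypothetical stand-in for UTF8::filter(); the real method normalizes
// its input to UTF-8 NFC. Here the value is returned unchanged.
function normalize_stub($value)
{
    return $value;
}

// Hypothetical shared helper: run a native filter function with the
// given argument list and pass the result through the normalizer.
function filtered_call($nativeFilter, array $args)
{
    return normalize_stub(call_user_func_array($nativeFilter, $args));
}

// One wrapper rewritten on top of the helper. func_get_args() contains
// only the arguments the caller supplied, so defaults are never forwarded,
// which mirrors the func_num_args() branches in the original methods.
function wrapped_filter_var($variable, $filter = FILTER_DEFAULT, $options = null)
{
    return filtered_call('filter_var', func_get_args());
}
```

With this shape, each wrapper shrinks to a single line, and the argument-count handling lives in one place.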
1816
1817
  /**
1818
   * Check whether the number of Unicode characters is not more than the specified integer.
1819
   *
1820
   * @param string $str      The original string to be checked.
1821
   * @param int    $box_size The size in number of chars to be checked against string.
1822
   *
1823
   * @return bool true if the string length is less than or equal to $box_size, false otherwise.
1824
   */
1825
  public static function fits_inside($str, $box_size)
1826 2
  {
1827
    return (self::strlen($str) <= $box_size);
1828
  }
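As a rough illustration of why this check counts characters rather than bytes, the sketch below uses mb_strlen() from the mbstring extension in place of UTF8::strlen(). That substitution and the name fits_inside_sketch() are assumptions made for this example only.

```php
<?php
// Assumes the mbstring extension is available.
function fits_inside_sketch($str, $boxSize)
{
    // mb_strlen() counts code points, so a two-byte character such as
    // "\xC3\xA4" (a with diaeresis) occupies a single slot in the box.
    return mb_strlen((string)$str, 'UTF-8') <= $boxSize;
}
```

A byte-based strlen() would report 4 for the string "\xC3\xA4\xC3\xB6", while the character count used here is 2.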
1829 2
1830
  /**
1831 2
   * Try to fix simple broken UTF-8 strings.
1832 2
   *
1833 1
   * INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.
1834 1
   *
1835
   * If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO-8859-1
1836 2
   * (ignoring Windows-1252 chars from 80 to 9F) use this function to fix it.
1837 1
   * See: http://en.wikipedia.org/wiki/Windows-1252
1838 1
   *
1839
   * @param string $str <p>The input string</p>
1840 2
   *
1841 2
   * @return string
1842 2
   */
1843
  public static function fix_simple_utf8($str)
Duplication: This method seems to be duplicated in your project.
1844 2
  {
1845
    // init
1846
    $str = (string)$str;
1847
1848
    if (!isset($str[0])) {
1849
      return '';
1850
    }
1851
1852
    static $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = null;
1853
    static $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = null;
1854
1855
    if ($BROKEN_UTF8_TO_UTF8_KEYS_CACHE === null) {
1856
      $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = array_keys(self::$BROKEN_UTF8_FIX);
1857
      $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = array_values(self::$BROKEN_UTF8_FIX);
1858
    }
1859
1860
    return str_replace($BROKEN_UTF8_TO_UTF8_KEYS_CACHE, $BROKEN_UTF8_TO_UTF8_VALUES_CACHE, $str);
1861
  }
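The method above is a table-driven replacement: the keys and values of a lookup table are split once, cached in static variables, and handed to str_replace(). The sketch below shows the same pattern with a two-entry table. The table contents and the name fix_simple_sketch() are hypothetical; the library's own UTF8::$BROKEN_UTF8_FIX table is much larger and is not reproduced here.

```php
<?php
function fix_simple_sketch($str)
{
    static $keys = null;
    static $values = null;

    if ($keys === null) {
        // Hypothetical two-entry table: mojibake bytes => intended UTF-8 bytes.
        $map = array(
            "\xc3\x83\xc2\xa9"             => "\xc3\xa9",     // "é" read as Windows-1252 and re-encoded
            "\xc3\xa2\xe2\x80\x9a\xc2\xac" => "\xe2\x82\xac", // "€" read as Windows-1252 and re-encoded
        );
        // Split once; later calls reuse the cached arrays.
        $keys = array_keys($map);
        $values = array_values($map);
    }

    // str_replace() works on raw bytes, which is what is needed here.
    return str_replace($keys, $values, (string)$str);
}
```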
1862
1863
  /**
1864
   * Fix a double (or multiple) encoded UTF8 string.
1865
   *
1866
   * @param string|string[] $str <p>You can use a string or an array of strings.</p>
1867
   *
1868
   * @return mixed
1869
   */
1870
  public static function fix_utf8($str)
1871
  {
1872
    if (is_array($str) === true) {
1873
1874
      /** @noinspection ForeachSourceInspection */
1875
      foreach ($str as $k => $v) {
1876
        /** @noinspection AlterInForeachInspection */
1877
        /** @noinspection OffsetOperationsInspection */
1878
        $str[$k] = self::fix_utf8($v);
1879
      }
1880
1881
      return $str;
1882
    }
1883
1884
    $last = '';
1885
    while ($last !== $str) {
1886
      $last = $str;
1887
      $str = self::to_utf8(
1888
          self::utf8_decode($str)
Bug:
It seems like $str defined by self::to_utf8(self::utf8_decode($str)) on line 1887 can also be of type array; however, voku\helper\UTF8::utf8_decode() does only seem to accept string, maybe add an additional type check?

If a method or function can return values of several different types, and you cannot be sure which one you will receive in this context, we recommend adding an additional type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

If this is a common case that PHP Analyzer should handle natively, please let us know by opening an issue.

Security Bug:
It seems like self::utf8_decode($str) targeting voku\helper\UTF8::utf8_decode() can also be of type false; however, voku\helper\UTF8::to_utf8() does only seem to accept string|array<integer,string>, did you maybe forget to handle an error condition?
1889
      );
1890
    }
1891
1892
    return $str;
1893
  }
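The loop above repeats a decode and re-encode step until the string stops changing. A minimal sketch of that fixed-point idea, including an explicit check for a failed conversion, is given below. It is not the library's implementation: it uses mb_convert_encoding() and mb_check_encoding() from the mbstring extension with plain ISO-8859-1 instead of UTF8::utf8_decode() and UTF8::to_utf8(), and the name fix_double_encoded() is hypothetical.

```php
<?php
// Assumes the mbstring extension is available.
function fix_double_encoded($str)
{
    $str = (string)$str;

    while (true) {
        // Undo one encoding layer: treat each code point as a Latin-1 byte.
        $decoded = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');

        // Stop on failure, or when nothing changed.
        if ($decoded === false || $decoded === $str) {
            break;
        }

        // Only accept the step if the result is valid UTF-8 and the step
        // was lossless (characters outside Latin-1 would become "?").
        if (!mb_check_encoding($decoded, 'UTF-8')
            || mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1') !== $str
        ) {
            break;
        }

        $str = $decoded;
    }

    return $str;
}
```

The lossless round-trip check is what keeps a correctly encoded string such as a single euro sign from being replaced by a question mark.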
1894
1895
  /**
1896
   * Get the text direction of a specific character.
1897
   *
1898
   * @param string $char
1899
   *
1900
   * @return string <p>'RTL' or 'LTR'</p>
1901
   */
1902
  public static function getCharDirection($char)
1903
  {
1904
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
1905
      self::checkForSupport();
1906
    }
1907
1908
    if (self::$SUPPORT['intlChar'] === true) {
1909
      $tmpReturn = \IntlChar::charDirection($char);
1910
1911
      // from "IntlChar"-Class
1912
      $charDirection = array(
1913
          'RTL' => array(1, 13, 14, 15, 21),
1914
          'LTR' => array(0, 11, 12, 20),
1915
      );
1916
1917
      if (in_array($tmpReturn, $charDirection['LTR'], true)) {
1918
        return 'LTR';
1919
      } elseif (in_array($tmpReturn, $charDirection['RTL'], true)) {
1920
        return 'RTL';
1921
      }
1922
    }
1923
1924
    $c = static::chr_to_decimal($char);
1925
1926 9
    if (!(0x5be <= $c && 0x10b7f >= $c)) {
1927
      return 'LTR';
1928 9
    }
1929
1930 9
    if (0x85e >= $c) {
1931 6
1932
      if (0x5be === $c ||
1933
          0x5c0 === $c ||
1934 9
          0x5c3 === $c ||
1935 7
          0x5c6 === $c ||
1936
          (0x5d0 <= $c && 0x5ea >= $c) ||
1937
          (0x5f0 <= $c && 0x5f4 >= $c) ||
1938
          0x608 === $c ||
1939 9
          0x60b === $c ||
1940 9
          0x60d === $c ||
1941
          0x61b === $c ||
1942 9
          (0x61e <= $c && 0x64a >= $c) ||
1943 9
          (0x66d <= $c && 0x66f >= $c) ||
1944 9
          (0x671 <= $c && 0x6d5 >= $c) ||
1945 9
          (0x6e5 <= $c && 0x6e6 >= $c) ||
1946 9
          (0x6ee <= $c && 0x6ef >= $c) ||
1947 6
          (0x6fa <= $c && 0x70d >= $c) ||
1948
          0x710 === $c ||
1949
          (0x712 <= $c && 0x72f >= $c) ||
1950 9
          (0x74d <= $c && 0x7a5 >= $c) ||
1951 2
          0x7b1 === $c ||
1952 2
          (0x7c0 <= $c && 0x7ea >= $c) ||
1953
          (0x7f4 <= $c && 0x7f5 >= $c) ||
1954 9
          0x7fa === $c ||
1955 4
          (0x800 <= $c && 0x815 >= $c) ||
1956 4
          0x81a === $c ||
1957 4
          0x824 === $c ||
1958
          0x828 === $c ||
1959
          (0x830 <= $c && 0x83e >= $c) ||
1960 4
          (0x840 <= $c && 0x858 >= $c) ||
1961
          0x85e === $c
1962
      ) {
1963 9
        return 'RTL';
1964
      }
1965 9
1966 9
    } elseif (0x200f === $c) {
1967
1968 7
      return 'RTL';
1969
1970 7
    } elseif (0xfb1d <= $c) {
1971 6
1972
      if (0xfb1d === $c ||
1973 4
          (0xfb1f <= $c && 0xfb28 >= $c) ||
1974
          (0xfb2a <= $c && 0xfb36 >= $c) ||
1975 9
          (0xfb38 <= $c && 0xfb3c >= $c) ||
1976
          0xfb3e === $c ||
1977 9
          (0xfb40 <= $c && 0xfb41 >= $c) ||
1978
          (0xfb43 <= $c && 0xfb44 >= $c) ||
1979
          (0xfb46 <= $c && 0xfbc1 >= $c) ||
1980 9
          (0xfbd3 <= $c && 0xfd3d >= $c) ||
1981 9
          (0xfd50 <= $c && 0xfd8f >= $c) ||
1982 9
          (0xfd92 <= $c && 0xfdc7 >= $c) ||
1983
          (0xfdf0 <= $c && 0xfdfc >= $c) ||
1984 9
          (0xfe70 <= $c && 0xfe74 >= $c) ||
1985
          (0xfe76 <= $c && 0xfefc >= $c) ||
1986 9
          (0x10800 <= $c && 0x10805 >= $c) ||
1987
          0x10808 === $c ||
1988 9
          (0x1080a <= $c && 0x10835 >= $c) ||
1989
          (0x10837 <= $c && 0x10838 >= $c) ||
1990
          0x1083c === $c ||
1991
          (0x1083f <= $c && 0x10855 >= $c) ||
1992
          (0x10857 <= $c && 0x1085f >= $c) ||
1993
          (0x10900 <= $c && 0x1091b >= $c) ||
1994
          (0x10920 <= $c && 0x10939 >= $c) ||
1995
          0x1093f === $c ||
1996
          0x10a00 === $c ||
1997
          (0x10a10 <= $c && 0x10a13 >= $c) ||
1998
          (0x10a15 <= $c && 0x10a17 >= $c) ||
1999
          (0x10a19 <= $c && 0x10a33 >= $c) ||
2000
          (0x10a40 <= $c && 0x10a47 >= $c) ||
2001
          (0x10a50 <= $c && 0x10a58 >= $c) ||
2002
          (0x10a60 <= $c && 0x10a7f >= $c) ||
2003
          (0x10b00 <= $c && 0x10b35 >= $c) ||
2004
          (0x10b40 <= $c && 0x10b55 >= $c) ||
2005
          (0x10b58 <= $c && 0x10b72 >= $c) ||
2006
          (0x10b78 <= $c && 0x10b7f >= $c)
2007
      ) {
2008
        return 'RTL';
2009
      }
2010
    }
2011
2012
    return 'LTR';
2013
  }
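The fallback branch above encodes its code point ranges as one long boolean expression. One alternative worth considering is a sorted range table with a binary search, sketched below. The table holds only a small subset of the ranges listed in the method, and the name is_rtl_code_point() is hypothetical.

```php
<?php
function is_rtl_code_point($cp)
{
    // Subset of the ranges from the method above, sorted by start value
    // and non-overlapping. A full table would list every range.
    static $ranges = array(
        array(0x5d0, 0x5ea),
        array(0x5f0, 0x5f4),
        array(0x61e, 0x64a),
        array(0x671, 0x6d5),
        array(0x200f, 0x200f),
        array(0xfb1f, 0xfb28),
    );

    $lo = 0;
    $hi = count($ranges) - 1;

    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);

        if ($cp < $ranges[$mid][0]) {
            $hi = $mid - 1;        // code point lies before this range
        } elseif ($cp > $ranges[$mid][1]) {
            $lo = $mid + 1;        // code point lies after this range
        } else {
            return true;           // inside [start, end]
        }
    }

    return false;
}
```

Keeping the ranges as data makes them easier to audit against the Unicode tables and lowers the method's cyclomatic complexity.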
2014
2015
  /**
2016
   * Get data from "/data/*.php".
2017
   *
2018
   * @param string $file
2019
   *
2020
   * @return bool|string|array|int <p>Will return false on error.</p>
2021
   */
2022
  private static function getData($file)
2023
  {
2024
    $file = __DIR__ . '/data/' . $file . '.php';
2025
    if (file_exists($file)) {
2026
      /** @noinspection PhpIncludeInspection */
2027
      return require $file;
2028
    } else {
2029
      return false;
2030
    }
2031
  }
2032
2033
  /**
2034
   * alias for "UTF8::string_has_bom()"
2035
   *
2036
   * @see UTF8::string_has_bom()
2037
   *
2038
   * @param string $str
2039
   *
2040
   * @return bool
2041
   *
2042
   * @deprecated
2043
   */
2044
  public static function hasBom($str)
2045
  {
2046
    return self::string_has_bom($str);
2047
  }
2048
2049
  /**
2050
   * Converts a hexadecimal value into a UTF-8 character.
2051
   *
2052
   * @param string $hexdec <p>The hexadecimal value.</p>
2053
   *
2054
   * @return string|false <p>One single UTF-8 character.</p>
2055
   */
2056
  public static function hex_to_chr($hexdec)
2057
  {
2058
    return self::decimal_to_chr(hexdec($hexdec));
2059
  }
2060
2061
  /**
2062
   * Converts hexadecimal U+xxxx code point representation to integer.
2063
   *
2064
   * INFO: opposite to UTF8::int_to_hex()
2065
   *
2066
   * @param string $hexdec <p>The hexadecimal code point representation.</p>
2067
   *
2068
   * @return int|false <p>The code point, or false on failure.</p>
2069
   */
2070
  public static function hex_to_int($hexdec)
2071
  {
2072
    $hexdec = (string)$hexdec;
2073
2074
    if (!isset($hexdec[0])) {
2075
      return false;
2076
    }
2077
2078
    if (preg_match('/^(?:\\\u|U\+|)([a-f0-9]{4,6})$/i', $hexdec, $match)) {
2079
      return intval($match[1], 16);
2080
    }
2081
2082
    return false;
2083
  }
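To make the accepted input formats concrete, the sketch below reimplements the same check as a standalone function. The name hex_to_int_sketch() is hypothetical, and the sketch limits the digits to the hexadecimal range a-f and 0-9.

```php
<?php
function hex_to_int_sketch($hexdec)
{
    $hexdec = (string)$hexdec;

    // Optional "\u" or "U+" prefix, followed by 4 to 6 hexadecimal digits.
    if (preg_match('/^(?:\\\u|U\+|)([a-f0-9]{4,6})$/i', $hexdec, $match)) {
        return intval($match[1], 16);
    }

    return false;
}
```

The forms 'U+00e4', '\u00e4' and '00e4' all yield the same code point, while input that is too short or contains other characters yields false.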
2084
2085
  /**
2086
   * alias for "UTF8::html_entity_decode()"
2087
   *
2088
   * @see UTF8::html_entity_decode()
2089
   *
2090
   * @param string $str
2091
   * @param int    $flags
2092
   * @param string $encoding
2093
   *
2094 2
   * @return string
2095
   */
2096 2
  public static function html_decode($str, $flags = null, $encoding = 'UTF-8')
2097 1
  {
2098 1
    return self::html_entity_decode($str, $flags, $encoding);
2099
  }
2100 2
2101
  /**
2102 2
   * Converts a UTF-8 string to a series of HTML numbered entities.
2103 1
   *
2104
   * INFO: opposite to UTF8::html_decode()
2105
   *
2106 2
   * @param string $str            <p>The Unicode string to be encoded as numbered entities.</p>
2107 2
   * @param bool   $keepAsciiChars [optional] <p>Keep ASCII chars.</p>
2108 2
   * @param string $encoding       [optional] <p>Default is UTF-8</p>
2109 2
   *
2110 2
   * @return string <p>HTML numbered entities.</p>
2111 1
   */
2112
  public static function html_encode($str, $keepAsciiChars = false, $encoding = 'UTF-8')
2113 1
  {
2114 1
    // init
2115 1
    $str = (string)$str;
2116 1
2117 1
    if (!isset($str[0])) {
2118 2
      return '';
2119
    }
2120 2
2121
    if ($encoding !== 'UTF-8') {
2122
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
2123
    }
2124
2125
    # INFO: http://stackoverflow.com/questions/35854535/better-explanation-of-convmap-in-mb-encode-numericentity
2126
    if (function_exists('mb_encode_numericentity')) {
2127
2128
      $startCode = 0x00;
2129
      if ($keepAsciiChars === true) {
2130
        $startCode = 0x80;
2131
      }
2132
2133
      return mb_encode_numericentity(
2134
          $str,
2135
          array($startCode, 0xfffff, 0, 0xfffff),
2136
          $encoding
2137
      );
2138
    }
2139
2140
    return implode(
2141
        '',
2142
        array_map(
2143
            function ($data) use ($keepAsciiChars, $encoding) {
2144
              return UTF8::single_chr_html_encode($data, $keepAsciiChars, $encoding);
2145
            },
2146
            self::split($str)
2147
        )
2148
    );
2149
  }
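The conversion map passed to mb_encode_numericentity() is a flat list of four-integer groups: start code point, end code point, offset, and bit mask. The sketch below encodes everything outside ASCII, which corresponds to the keep-ASCII case above. The function name encode_non_ascii() and the chosen upper bound of 0x10ffff are assumptions for this example and differ from the bound used in the method.

```php
<?php
// Assumes the mbstring extension is available.
function encode_non_ascii($str)
{
    // One group: code points 0x80..0x10ffff, offset 0, mask wide enough
    // to keep every bit of a 21-bit code point.
    $convmap = array(0x80, 0x10ffff, 0, 0x1fffff);

    return mb_encode_numericentity((string)$str, $convmap, 'UTF-8');
}
```

Starting the range at 0x00 instead of 0x80 would encode ASCII characters as well.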
2150
2151
  /**
2152
   * UTF-8 version of html_entity_decode()
2153
   *
2154
   * The reason we are not using html_entity_decode() by itself is because
2155
   * while it is not technically correct to leave out the semicolon
2156
   * at the end of an entity most browsers will still interpret the entity
2157
   * correctly. html_entity_decode() does not convert entities without
2158
   * semicolons, so we are left with our own little solution here. Bummer.
2159
   *
2160
   * Convert all HTML entities to their applicable characters
2161
   *
2162
   * INFO: opposite to UTF8::html_encode()
2163
   *
2164
   * @link http://php.net/manual/en/function.html-entity-decode.php
2165
   *
2166
   * @param string $str      <p>
2167
   *                         The input string.
2168
   *                         </p>
2169
   * @param int    $flags    [optional] <p>
2170
   *                         A bitmask of one or more of the following flags, which specify how to handle quotes and
2171
   *                         which document type to use. The default is ENT_COMPAT | ENT_HTML401.
2172
   *                         <table>
2173
   *                         Available <i>flags</i> constants
2174
   *                         <tr valign="top">
2175
   *                         <td>Constant Name</td>
2176
   *                         <td>Description</td>
2177
   *                         </tr>
2178
   *                         <tr valign="top">
2179
   *                         <td><b>ENT_COMPAT</b></td>
2180
   *                         <td>Will convert double-quotes and leave single-quotes alone.</td>
2181
   *                         </tr>
2182
   *                         <tr valign="top">
2183
   *                         <td><b>ENT_QUOTES</b></td>
2184
   *                         <td>Will convert both double and single quotes.</td>
2185
   *                         </tr>
2186
   *                         <tr valign="top">
2187
   *                         <td><b>ENT_NOQUOTES</b></td>
2188
   *                         <td>Will leave both double and single quotes unconverted.</td>
2189
   *                         </tr>
2190
   *                         <tr valign="top">
2191
   *                         <td><b>ENT_HTML401</b></td>
2192
   *                         <td>
2193
   *                         Handle code as HTML 4.01.
2194
   *                         </td>
2195
   *                         </tr>
2196
   *                         <tr valign="top">
2197
   *                         <td><b>ENT_XML1</b></td>
2198
   *                         <td>
2199
   *                         Handle code as XML 1.
2200
   *                         </td>
2201
   *                         </tr>
2202
   *                         <tr valign="top">
2203
   *                         <td><b>ENT_XHTML</b></td>
2204
   *                         <td>
2205
   *                         Handle code as XHTML.
2206
   *                         </td>
2207
   *                         </tr>
2208
   *                         <tr valign="top">
2209
   *                         <td><b>ENT_HTML5</b></td>
2210
   *                         <td>
2211
   *                         Handle code as HTML 5.
2212
   *                         </td>
2213
   *                         </tr>
2214
   *                         </table>
2215
   *                         </p>
2216
   * @param string $encoding [optional] <p>Encoding to use.</p>
2217
   *
2218
   * @return string <p>The decoded string.</p>
2219
   */
2220
  public static function html_entity_decode($str, $flags = null, $encoding = 'UTF-8')
2221
  {
2222
    // init
2223
    $str = (string)$str;
2224
2225
    if (!isset($str[0])) {
2226
      return '';
2227
    }
2228
2229
    if (!isset($str[3])) { // examples: &; || &x;
Unused Code / Comprehensibility:
46% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.
2230
      return $str;
2231
    }
2232 1
2233
    if (
2234 1
        strpos($str, '&') === false
2235
        ||
2236
        (
2237
            strpos($str, '&#') === false
2238 1
            &&
2239
            strpos($str, ';') === false
2240
        )
2241
    ) {
2242
      return $str;
2243
    }
2244
2245
    if ($encoding !== 'UTF-8') {
2246 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
2247
    }
2248 1
2249
    if ($flags === null) {
2250
      if (Bootup::is_php('5.4') === true) {
2251
        $flags = ENT_QUOTES | ENT_HTML5;
2252
      } else {
2253
        $flags = ENT_QUOTES;
2254
      }
2255
    }
2256
2257
    do {
2258
      $str_compare = $str;
2259
2260
      $str = preg_replace_callback(
2261 3
          "/&#\d{2,6};/",
2262
          function ($matches) use ($encoding) {
2263 3
            $returnTmp = \mb_convert_encoding($matches[0], $encoding, 'HTML-ENTITIES');
2264 3
2265
            if ($returnTmp !== '"' && $returnTmp !== "'") {
2266 3
              return $returnTmp;
2267
            } else {
2268 3
              return $matches[0];
2269
            }
2270
          },
2271
          $str
2272
      );
2273
2274
      // decode numeric & UTF16 two byte entities
2275
      $str = html_entity_decode(
2276
          preg_replace('/(&#(?:x0*[0-9a-f]{2,6}(?![0-9a-f;])|(?:0*\d{2,6}(?![0-9;]))))/iS', '$1;', $str),
2277
          $flags,
2278
          $encoding
2279 1
      );
2280
2281 1
    } while ($str_compare !== $str);
2282
2283
    return $str;
2284
  }
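The key step in the method above is repairing numeric entities that lack their trailing semicolon before handing the string to the native html_entity_decode(). The sketch below isolates that step, reusing the regular expression from the method. The name decode_lenient() is hypothetical, and the loop and the quote handling of the original are left out.

```php
<?php
function decode_lenient($str)
{
    // Append the missing ";" to numeric entities such as "&#228" or "&#xe4".
    // The negative lookaheads skip entities that are already terminated.
    $str = preg_replace(
        '/(&#(?:x0*[0-9a-f]{2,6}(?![0-9a-f;])|(?:0*\d{2,6}(?![0-9;]))))/iS',
        '$1;',
        (string)$str
    );

    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```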
2285
2286
  /**
2287
   * Convert all applicable characters to HTML entities: UTF-8 version of htmlentities()
2288
   *
2289 2
   * @link http://php.net/manual/en/function.htmlentities.php
2290
   *
2291 2
   * @param string $str           <p>
2292
   *                              The input string.
2293
   *                              </p>
2294
   * @param int    $flags         [optional] <p>
2295
   *                              A bitmask of one or more of the following flags, which specify how to handle quotes,
2296
   *                              invalid code unit sequences and the used document type. The default is
2297
   *                              ENT_COMPAT | ENT_HTML401.
2298
   *                              <table>
2299
   *                              Available <i>flags</i> constants
2300
   *                              <tr valign="top">
2301
   *                              <td>Constant Name</td>
2302
   *                              <td>Description</td>
2303 2
   *                              </tr>
2304
   *                              <tr valign="top">
2305 2
   *                              <td><b>ENT_COMPAT</b></td>
2306
   *                              <td>Will convert double-quotes and leave single-quotes alone.</td>
2307
   *                              </tr>
2308
   *                              <tr valign="top">
2309
   *                              <td><b>ENT_QUOTES</b></td>
2310
   *                              <td>Will convert both double and single quotes.</td>
2311
   *                              </tr>
2312
   *                              <tr valign="top">
2313
   *                              <td><b>ENT_NOQUOTES</b></td>
2314
   *                              <td>Will leave both double and single quotes unconverted.</td>
2315
   *                              </tr>
2316
   *                              <tr valign="top">
2317 1
   *                              <td><b>ENT_IGNORE</b></td>
2318
   *                              <td>
2319 1
   *                              Silently discard invalid code unit sequences instead of returning
2320
   *                              an empty string. Using this flag is discouraged as it
2321
   *                              may have security implications.
2322
   *                              </td>
2323
   *                              </tr>
2324
   *                              <tr valign="top">
2325
   *                              <td><b>ENT_SUBSTITUTE</b></td>
2326
   *                              <td>
2327
   *                              Replace invalid code unit sequences with a Unicode Replacement Character
2328
   *                              U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
2329
   *                              </td>
2330
   *                              </tr>
2331
   *                              <tr valign="top">
2332
   *                              <td><b>ENT_DISALLOWED</b></td>
2333
   *                              <td>
2334
   *                              Replace invalid code points for the given document type with a
2335
   *                              Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD;
2336
   *                              (otherwise) instead of leaving them as is. This may be useful, for
2337
   *                              instance, to ensure the well-formedness of XML documents with
2338
   *                              embedded external content.
2339
   *                              </td>
2340
   *                              </tr>
2341
   *                              <tr valign="top">
2342
   *                              <td><b>ENT_HTML401</b></td>
2343
   *                              <td>
2344
   *                              Handle code as HTML 4.01.
2345
   *                              </td>
2346
   *                              </tr>
2347
   *                              <tr valign="top">
2348
   *                              <td><b>ENT_XML1</b></td>
2349
   *                              <td>
2350
   *                              Handle code as XML 1.
2351
   *                              </td>
2352
   *                              </tr>
2353
   *                              <tr valign="top">
2354
   *                              <td><b>ENT_XHTML</b></td>
2355
   *                              <td>
2356
   *                              Handle code as XHTML.
2357
   *                              </td>
2358
   *                              </tr>
2359 1
   *                              <tr valign="top">
2360
   *                              <td><b>ENT_HTML5</b></td>
2361 1
   *                              <td>
2362
   *                              Handle code as HTML 5.
2363
   *                              </td>
2364
   *                              </tr>
2365
   *                              </table>
2366
   *                              </p>
2367
   * @param string $encoding      [optional] <p>
2368
   *                              Like <b>htmlspecialchars</b>,
2369
   *                              <b>htmlentities</b> takes an optional third argument
2370
   *                              <i>encoding</i> which defines encoding used in
2371
   *                              conversion.
2372
   *                              Although this argument is technically optional, you are highly
2373
   *                              encouraged to specify the correct value for your code.
2374
   *                              </p>
2375
   * @param bool   $double_encode [optional] <p>
2376
   *                              When <i>double_encode</i> is turned off PHP will not
2377
   *                              encode existing html entities. The default is to convert everything.
2378
   *                              </p>
2379
   *
2380
   *
2381
   * @return string <p>The encoded string.
2382
   * </p>
2383
   * <p>
2384
   * If the input <i>string</i> contains an invalid code unit
2385
   * sequence within the given <i>encoding</i> an empty string
2386
   * will be returned, unless either the <b>ENT_IGNORE</b> or
2387 1
   * <b>ENT_SUBSTITUTE</b> flags are set.</p>
2388
   */
2389 1
  public static function htmlentities($str, $flags = ENT_COMPAT, $encoding = 'UTF-8', $double_encode = true)
2390
  {
2391
    if ($encoding !== 'UTF-8') {
2392
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
2393
    }
2394
2395
    $str = htmlentities($str, $flags, $encoding, $double_encode);
2396
2397
    if ($encoding !== 'UTF-8') {
2398
      return $str;
2399
    }
2400
2401 1
    $byteLengths = self::chr_size_list($str);
2402
    $search = array();
2403 1
    $replacements = array();
2404
    foreach ($byteLengths as $counter => $byteLength) {
2405
      if ($byteLength >= 3) {
2406
        $char = self::access($str, $counter);
2407
2408
        if (!isset($replacements[$char])) {
2409
          $search[$char] = $char;
2410
          $replacements[$char] = self::html_encode($char);
Security Bug introduced by
It seems like $char defined by self::access($str, $counter) on line 2406 can also be of type false; however, voku\helper\UTF8::html_encode() does only seem to accept string, did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example

<?php

function getDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.

2411
        }
2412
      }
2413
    }
2414
2415
    return str_replace($search, $replacements, $str);
2416 16
  }
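The loop above collects each character that occupies three or more bytes and swaps it for an encoded form. A minimal standalone sketch of that idea follows. It assumes the replacement is a decimal numeric entity; the real `UTF8::html_encode()` is not shown in this listing and may behave differently, and `encode_wide_chars` is a hypothetical name. It needs the mbstring extension for `mb_ord()`.

```php
<?php
// Hypothetical helper, not part of the library: replaces every UTF-8
// character of three or more bytes with a decimal numeric entity.
function encode_wide_chars(string $str): string
{
    // Split into UTF-8 characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $str;
    }

    $map = [];
    foreach ($chars as $char) {
        // strlen() counts bytes, so this selects 3- and 4-byte characters.
        if (strlen($char) >= 3 && !isset($map[$char])) {
            $map[$char] = '&#' . mb_ord($char, 'UTF-8') . ';';
        }
    }

    // strtr() applies all replacements in a single pass.
    return strtr($str, $map);
}

echo encode_wide_chars("a\u{20AC}b"), "\n"; // the euro sign is three bytes long
```

Building the map once per distinct character mirrors the `isset($replacements[$char])` guard in the method above.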
2417
2418 16
  /**
2419
   * Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()
2420
   *
2421
   * INFO: Take a look at "UTF8::htmlentities()"
2422
   *
2423
   * @link http://php.net/manual/en/function.htmlspecialchars.php
2424
   *
2425
   * @param string $str           <p>
2426
   *                              The string being converted.
2427
   *                              </p>
2428
   * @param int    $flags         [optional] <p>
2429
   *                              A bitmask of one or more of the following flags, which specify how to handle quotes,
2430
   *                              invalid code unit sequences and the used document type. The default is
2431 28
   *                              ENT_COMPAT | ENT_HTML401.
2432
   *                              <table>
2433 28
   *                              Available <i>flags</i> constants
2434
   *                              <tr valign="top">
2435 28
   *                              <td>Constant Name</td>
2436 5
   *                              <td>Description</td>
2437
   *                              </tr>
2438
   *                              <tr valign="top">
2439 28
   *                              <td><b>ENT_COMPAT</b></td>
2440
   *                              <td>Will convert double-quotes and leave single-quotes alone.</td>
2441
   *                              </tr>
2442
   *                              <tr valign="top">
2443
   *                              <td><b>ENT_QUOTES</b></td>
2444
   *                              <td>Will convert both double and single quotes.</td>
2445
   *                              </tr>
2446
   *                              <tr valign="top">
2447
   *                              <td><b>ENT_NOQUOTES</b></td>
2448
   *                              <td>Will leave both double and single quotes unconverted.</td>
2449 1
   *                              </tr>
2450
   *                              <tr valign="top">
2451 1
   *                              <td><b>ENT_IGNORE</b></td>
2452
   *                              <td>
2453 1
   *                              Silently discard invalid code unit sequences instead of returning
2454 1
   *                              an empty string. Using this flag is discouraged as it
2455
   *                              may have security implications.
2456
   *                              </td>
2457 1
   *                              </tr>
2458 1
   *                              <tr valign="top">
2459
   *                              <td><b>ENT_SUBSTITUTE</b></td>
2460 1
   *                              <td>
2461
   *                              Replace invalid code unit sequences with a Unicode Replacement Character
2462
   *                              U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
2463
   *                              </td>
2464
   *                              </tr>
2465
   *                              <tr valign="top">
2466
   *                              <td><b>ENT_DISALLOWED</b></td>
2467
   *                              <td>
2468
   *                              Replace invalid code points for the given document type with a
2469
   *                              Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD;
2470
   *                              (otherwise) instead of leaving them as is. This may be useful, for
2471 16
   *                              instance, to ensure the well-formedness of XML documents with
2472
   *                              embedded external content.
2473
   *                              </td>
2474 16
   *                              </tr>
2475
   *                              <tr valign="top">
2476
   *                              <td><b>ENT_HTML401</b></td>
2477 16
   *                              <td>
2478
   *                              Handle code as HTML 4.01.
2479 16
   *                              </td>
2480 16
   *                              </tr>
2481 15
   *                              <tr valign="top">
2482 16
   *                              <td><b>ENT_XML1</b></td>
2483 6
   *                              <td>
2484
   *                              Handle code as XML 1.
2485 15
   *                              </td>
2486
   *                              </tr>
2487
   *                              <tr valign="top">
2488
   *                              <td><b>ENT_XHTML</b></td>
2489
   *                              <td>
2490
   *                              Handle code as XHTML.
2491
   *                              </td>
2492
   *                              </tr>
2493
   *                              <tr valign="top">
2494
   *                              <td><b>ENT_HTML5</b></td>
2495
   *                              <td>
2496
   *                              Handle code as HTML 5.
2497
   *                              </td>
2498
   *                              </tr>
2499
   *                              </table>
2500
   *                              </p>
2501
   * @param string $encoding      [optional] <p>
2502
   *                              Defines encoding used in conversion.
2503
   *                              </p>
2504
   *                              <p>
2505
   *                              For the purposes of this function, the encodings
2506
   *                              ISO-8859-1, ISO-8859-15,
2507
   *                              UTF-8, cp866,
2508
   *                              cp1251, cp1252, and
2509
   *                              KOI8-R are effectively equivalent, provided the
2510
   *                              <i>string</i> itself is valid for the encoding, as
2511
   *                              the characters affected by <b>htmlspecialchars</b> occupy
2512
   *                              the same positions in all of these encodings.
2513
   *                              </p>
2514
   * @param bool   $double_encode [optional] <p>
2515
   *                              When <i>double_encode</i> is turned off PHP will not
2516
   *                              encode existing HTML entities. The default is to convert everything.
2517
   *                              </p>
2518
   *
2519
   * @return string <p>The converted string.
2520
   * </p>
2521
   * <p>
2522
   * If the input <i>string</i> contains an invalid code unit
2523
   * sequence within the given <i>encoding</i> an empty string
2524
   * will be returned, unless either the <b>ENT_IGNORE</b> or
2525
   * <b>ENT_SUBSTITUTE</b> flags are set.</p>
2526
   */
2527
  public static function htmlspecialchars($str, $flags = ENT_COMPAT, $encoding = 'UTF-8', $double_encode = true)
2528
  {
2529
    if ($encoding !== 'UTF-8') {
2530
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
2531
    }
2532
2533
    return htmlspecialchars($str, $flags, $encoding, $double_encode);
2534
  }
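The effect of the `$flags` and `$double_encode` parameters documented above can be seen with PHP's built-in `htmlspecialchars()`, which this wrapper delegates to. The snippet below is only an illustration; the sample strings are made up.

```php
<?php
$input = '<b>Tom &amp; "Jerry"</b>';

// Default behaviour: existing entities are encoded again (&amp; becomes &amp;amp;).
$twice = htmlspecialchars($input, ENT_COMPAT, 'UTF-8', true);

// double_encode = false: the existing &amp; is left alone.
$once = htmlspecialchars($input, ENT_COMPAT, 'UTF-8', false);

// ENT_QUOTES additionally converts single quotes.
$quotes = htmlspecialchars("it's", ENT_QUOTES, 'UTF-8');

echo $twice, "\n", $once, "\n", $quotes, "\n";
```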
2535
2536 1
  /**
2537
   * Checks whether iconv is available on the server.
2538 1
   *
2539
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
2540 1
   */
2541
  public static function iconv_loaded()
2542
  {
2543
    $return = extension_loaded('iconv');
2544
2545 1
    // INFO: "default_charset" is already set by the "Bootup"-class
2546
2547 1
    if (Bootup::is_php('5.6') === false) {
2548
      // INFO: "iconv_set_encoding" is deprecated since PHP >= 5.6
2549 1
      iconv_set_encoding('input_encoding', 'UTF-8');
2550 1
      iconv_set_encoding('output_encoding', 'UTF-8');
2551
      iconv_set_encoding('internal_encoding', 'UTF-8');
2552 1
    }
2553
2554
    return $return;
2555
  }
2556
2557
  /**
2558
   * alias for "UTF8::decimal_to_chr()"
2559
   *
2560
   * @see UTF8::decimal_to_chr()
2561
   *
2562
   * @param mixed $int
2563 1
   *
2564
   * @return string
2565 1
   */
2566
  public static function int_to_chr($int)
2567 1
  {
2568
    return self::decimal_to_chr($int);
2569
  }
2570
2571
  /**
2572 1
   * Converts an integer to its hexadecimal U+xxxx code point representation.
2573 1
   *
2574 1
   * INFO: opposite to UTF8::hex_to_int()
2575 1
   *
2576 1
   * @param int    $int  <p>The integer to be converted to hexadecimal code point.</p>
2577
   * @param string $pfix [optional]
2578 1
   *
2579
   * @return string <p>The code point, or empty string on failure.</p>
2580
   */
2581
  public static function int_to_hex($int, $pfix = 'U+')
2582
  {
2583
    if ((int)$int === $int) {
2584
      $hex = dechex($int);
2585
2586
      $hex = (strlen($hex) < 4 ? substr('0000' . $hex, -4) : $hex);
2587
2588
      return $pfix . $hex;
2589
    }
2590
2591
    return '';
2592
  }
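The padding step above (`dechex()` plus a manual left-pad to four digits) can also be written with a format string. This is a sketch under the assumption that lowercase hex digits are wanted, as `dechex()` produces; `code_point_label` is a hypothetical name.

```php
<?php
// Hypothetical equivalent of the padding logic in int_to_hex().
function code_point_label(int $codePoint, string $prefix = 'U+'): string
{
    // %04x pads to at least four lowercase hex digits; longer values are kept whole.
    return $prefix . sprintf('%04x', $codePoint);
}

echo code_point_label(65), "\n";      // Latin capital letter A
echo code_point_label(0x1F600), "\n"; // a code point outside the BMP
```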
2593 4
2594
  /**
2595 4
   * Checks whether intl-char is available on the server.
2596
   *
2597 4
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
2598
   */
2599 4
  public static function intlChar_loaded()
2600 4
  {
2601 4
    return (
2602 4
        Bootup::is_php('7.0') === true
2603 4
        &&
2604 4
        class_exists('IntlChar') === true
2605 4
    );
2606 4
  }
2607 4
2608 2
  /**
2609 2
   * Checks whether intl is available on the server.
2610 4
   *
2611 4
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
2612 4
   */
2613
  public static function intl_loaded()
2614 4
  {
2615 4
    return extension_loaded('intl');
2616 4
  }
2617 4
2618 4
  /**
2619 4
   * alias for "UTF8::is_ascii()"
2620 4
   *
2621 4
   * @see UTF8::is_ascii()
2622 4
   *
2623 3
   * @param string $str
2624 3
   *
2625 4
   * @return boolean
2626 4
   *
2627 4
   * @deprecated
2628
   */
2629 4
  public static function isAscii($str)
2630 3
  {
2631 2
    return self::is_ascii($str);
2632
  }
2633 3
2634
  /**
2635
   * alias for "UTF8::is_base64()"
2636
   *
2637 3
   * @see UTF8::is_base64()
2638
   *
2639 3
   * @param string $str
2640
   *
2641
   * @return bool
2642
   *
2643
   * @deprecated
2644
   */
2645
  public static function isBase64($str)
2646
  {
2647
    return self::is_base64($str);
2648
  }
2649
2650
  /**
2651
   * alias for "UTF8::is_binary()"
2652
   *
2653 3
   * @see UTF8::is_binary()
2654
   *
2655 3
   * @param string $str
2656
   *
2657 3
   * @return bool
2658
   *
2659 3
   * @deprecated
2660 3
   */
2661 3
  public static function isBinary($str)
2662 3
  {
2663 3
    return self::is_binary($str);
2664 3
  }
2665 3
2666 3
  /**
2667 3
   * alias for "UTF8::is_bom()"
2668 1
   *
2669 1
   * @see UTF8::is_bom()
2670 3
   *
2671 3
   * @param string $utf8_chr
2672 3
   *
2673
   * @return boolean
2674 3
   *
2675 3
   * @deprecated
2676 3
   */
2677 3
  public static function isBom($utf8_chr)
2678 3
  {
2679 3
    return self::is_bom($utf8_chr);
2680 3
  }
2681 3
2682 3
  /**
2683 1
   * alias for "UTF8::is_html()"
2684 1
   *
2685 3
   * @see UTF8::is_html()
2686 3
   *
2687 3
   * @param string $str
2688
   *
2689 3
   * @return boolean
2690 1
   *
2691 1
   * @deprecated
2692
   */
2693 1
  public static function isHtml($str)
2694
  {
2695
    return self::is_html($str);
2696
  }
2697 3
2698
  /**
2699 3
   * alias for "UTF8::is_json()"
2700
   *
2701
   * @see UTF8::is_json()
2702
   *
2703
   * @param string $str
2704
   *
2705
   * @return bool
2706
   *
2707
   * @deprecated
2708
   */
2709
  public static function isJson($str)
2710
  {
2711
    return self::is_json($str);
2712 43
  }
2713
2714 43
  /**
2715
   * alias for "UTF8::is_utf16()"
2716 43
   *
2717 3
   * @see UTF8::is_utf16()
2718
   *
2719
   * @param string $str
2720 41
   *
2721 1
   * @return int|false false if it's not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE.
2722 1
   *
2723
   * @deprecated
2724
   */
2725
  public static function isUtf16($str)
2726
  {
2727
    return self::is_utf16($str);
2728
  }
2729
2730 41
  /**
2731
   * alias for "UTF8::is_utf32()"
2732
   *
2733
   * @see UTF8::is_utf32()
2734
   *
2735
   * @param string $str
2736
   *
2737
   * @return int|false false if it's not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE.
2738
   *
2739
   * @deprecated
2740 41
   */
2741
  public static function isUtf32($str)
2742 41
  {
2743 41
    return self::is_utf32($str);
2744 41
  }
2745
2746
  /**
2747 41
   * alias for "UTF8::is_utf8()"
2748 41
   *
2749 41
   * @see UTF8::is_utf8()
2750
   *
2751
   * @param string $str
2752 41
   * @param bool   $strict
2753
   *
2754 36
   * @return bool
2755 41
   *
2756
   * @deprecated
2757 34
   */
2758 34
  public static function isUtf8($str, $strict = false)
2759 34
  {
2760 34
    return self::is_utf8($str, $strict);
2761 39
  }
2762
2763 21
  /**
2764 21
   * Checks if a string is 7 bit ASCII.
2765 21
   *
2766 21
   * @param string $str <p>The string to check.</p>
2767 33
   *
2768
   * @return bool <p>
2769 9
   *              <strong>true</strong> if it is ASCII<br />
2770 9
   *              <strong>false</strong> otherwise
2771 9
   *              </p>
2772 9
   */
2773 16
  public static function is_ascii($str)
2774
  {
2775
    $str = (string)$str;
2776
2777
    if (!isset($str[0])) {
2778
      return true;
2779
    }
2780
2781
    return (bool)!preg_match('/[\x80-\xFF]/', $str);
2782 3
  }
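The check above relies on PCRE matching raw bytes when the `/u` modifier is absent. A standalone sketch of the same test, with the hypothetical name `is_seven_bit_ascii`:

```php
<?php
// Hypothetical standalone version of the 7-bit ASCII check.
function is_seven_bit_ascii(string $str): bool
{
    // An empty string contains no non-ASCII bytes.
    if ($str === '') {
        return true;
    }

    // Without the /u modifier PCRE works on bytes, so this finds any byte >= 0x80.
    return preg_match('/[\x80-\xFF]/', $str) === 0;
}

var_dump(is_seven_bit_ascii('abc'));
var_dump(is_seven_bit_ascii("caf\xC3\xA9")); // "café" encoded as UTF-8
```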
2783 3
2784 3
  /**
2785 3
   * Returns true if the string is base64 encoded, false otherwise.
2786 9
   *
2787
   * @param string $str <p>The input string.</p>
2788 3
   *
2789 3
   * @return bool <p>Whether or not $str is base64 encoded.</p>
2790 3
   */
2791 3
  public static function is_base64($str)
2792 3
  {
2793
    $str = (string)$str;
2794
2795
    if (!isset($str[0])) {
2796 5
      return false;
2797
    }
2798 41
2799
    $base64String = (string)base64_decode($str, true);
2800
    if ($base64String && base64_encode($base64String) === $str) {
2801 36
      return true;
2802
    } else {
2803 33
      return false;
2804 33
    }
2805 33
  }
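The method above decodes in strict mode and then re-encodes to see whether the input comes back unchanged. A minimal sketch of that round trip follows; `looks_like_base64` is a hypothetical name. Note that the string "0" is falsy in PHP, so this sketch compares the decoded value against `false` explicitly instead of relying on truthiness.

```php
<?php
// Hypothetical round-trip check: strict decode, then re-encode and compare.
function looks_like_base64(string $str): bool
{
    if ($str === '') {
        return false;
    }

    // Strict mode returns false for characters outside the base64 alphabet.
    $decoded = base64_decode($str, true);
    if ($decoded === false) {
        return false;
    }

    // Canonical base64 re-encodes to exactly the same text.
    return base64_encode($decoded) === $str;
}

var_dump(looks_like_base64('YWJj')); // encodes "abc"
var_dump(looks_like_base64('MA==')); // encodes "0"
```

Like any round-trip test, this accepts plain words that happen to be valid base64 (for example `abcd`), so it should be read as a heuristic.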
2806 33
2807
  /**
2808
   * Check if the input is binary (note: this heuristic looks like a hack).
2809
   *
2810
   * @param mixed $input
2811 33
   *
2812
   * @return bool
2813
   */
2814
  public static function is_binary($input)
2815
  {
2816
    $input = (string)$input;
2817 33
2818 33
    if (!isset($input[0])) {
2819 33
      return false;
2820 33
    }
2821
2822 33
    if (preg_match('~^[01]+$~', $input)) {
2823
      return true;
2824 33
    }
2825 33
2826 5
    $testLength = strlen($input);
2827
    if ($testLength && substr_count($input, "\x00") / $testLength > 0.3) {
2828
      return true;
2829 33
    }
2830 33
2831 33
    if (substr_count($input, "\x00") > 0) {
2832 33
      return true;
2833 33
    }
2834
2835
    return false;
2836
  }
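Because the final check above returns true as soon as a single NUL byte is present, the 30 % ratio test never changes the result. The heuristic therefore reduces to the sketch below (hypothetical name `looks_binary`), assuming the same two signals: a bit string of only `0`/`1` characters, or any NUL byte.

```php
<?php
// Hypothetical condensed form of the heuristic used by is_binary().
function looks_binary(string $input): bool
{
    if ($input === '') {
        return false;
    }

    // A string made only of the characters 0 and 1 is treated as a bit string.
    if (preg_match('~^[01]+$~', $input) === 1) {
        return true;
    }

    // Any NUL byte is taken as a sign of binary data.
    return strpos($input, "\x00") !== false;
}

var_dump(looks_binary('0101'));
var_dump(looks_binary("a\x00b"));
var_dump(looks_binary('hello'));
```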
2837
2838 18
  /**
2839
   * Check if the file is binary.
2840
   *
2841 41
   * @param string $file
2842
   *
2843 20
   * @return boolean
2844
   */
2845
  public static function is_binary_file($file)
2846
  {
2847
    try {
2848
      $fp = fopen($file, 'rb');
2849
      $block = fread($fp, 512);
2850
      fclose($fp);
2851
    } catch (\Exception $e) {
2852
      $block = '';
2853
    }
2854
2855
    return self::is_binary($block);
2856
  }
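`fopen()` reports failure by returning `false` and raising a warning; it does not throw, so the `catch (\Exception $e)` branch above is unlikely to be reached for a missing file. A hedged sketch of reading the first block with explicit checks follows; `read_file_head` is a hypothetical helper and the 512-byte default simply copies the value used above.

```php
<?php
// Hypothetical helper: returns up to $length leading bytes, or '' on any failure.
function read_file_head(string $file, int $length = 512): string
{
    // Suppress the warning and test the return value instead.
    $fp = @fopen($file, 'rb');
    if ($fp === false) {
        return '';
    }

    $block = fread($fp, $length);
    fclose($fp);

    return $block === false ? '' : $block;
}

// Usage with a temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'hd');
file_put_contents($tmp, "abc\x00def");
var_dump(read_file_head($tmp));
unlink($tmp);
```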
2857
2858
  /**
2859
   * Checks if the given string is equal to any "Byte Order Mark".
2860
   *
2861
   * WARNING: Use "UTF8::string_has_bom()" if you want to check for a BOM in a string.
2862
   *
2863
   * @param string $str <p>The input string.</p>
2864
   *
2865
   * @return bool <p><strong>true</strong> if $str is a Byte Order Mark, <strong>false</strong> otherwise.</p>
2866
   */
2867
  public static function is_bom($str)
2868
  {
2869
    foreach (self::$BOM as $bomString => $bomByteLength) {
2870
      if ($str === $bomString) {
2871
        return true;
2872
      }
2873
    }
2874
2875
    return false;
2876
  }
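The difference between "is a BOM" and "starts with a BOM" is easy to miss. The sketch below illustrates both with an assumed BOM table; the library's own `self::$BOM` is not shown in this listing and may contain other entries, and both function names are hypothetical.

```php
<?php
// Assumed BOM table for illustration only (byte sequence => length in bytes).
const BOM_TABLE = [
    "\xEF\xBB\xBF"     => 3, // UTF-8
    "\xFF\xFE\x00\x00" => 4, // UTF-32LE
    "\x00\x00\xFE\xFF" => 4, // UTF-32BE
    "\xFF\xFE"         => 2, // UTF-16LE
    "\xFE\xFF"         => 2, // UTF-16BE
];

// True only when the whole string is exactly one BOM.
function equals_bom(string $str): bool
{
    return array_key_exists($str, BOM_TABLE);
}

// True when the string begins with any known BOM.
function starts_with_bom(string $str): bool
{
    foreach (BOM_TABLE as $bom => $length) {
        if (strncmp($str, $bom, $length) === 0) {
            return true;
        }
    }

    return false;
}

var_dump(equals_bom("\xEF\xBB\xBFabc"));      // has a BOM, but is not only a BOM
var_dump(starts_with_bom("\xEF\xBB\xBFabc"));
```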
2877
2878
  /**
2879
   * Check if the string contains any html-tags <lall>.
2880
   *
2881
   * @param string $str <p>The input string.</p>
2882
   *
2883 2
   * @return boolean
2884
   */
2885 2
  public static function is_html($str)
2886
  {
2887 2
    $str = (string)$str;
2888 2
2889 2
    if (!isset($str[0])) {
2890
      return false;
2891
    }
2892
2893 2
    // init
2894
    $matches = array();
2895
2896
    preg_match("/<\/?\w+(?:(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)*+\s*|\s*)\/?>/", $str, $matches);
2897
2898
    if (count($matches) === 0) {
2899
      return false;
2900
    } else {
2901
      return true;
2902
    }
2903
  }
2904
2905
  /**
2906
   * Try to check if "$str" is a JSON string.
2907
   *
2908
   * @param string $str <p>The input string.</p>
2909
   *
2910
   * @return bool
2911
   */
2912
  public static function is_json($str)
2913
  {
2914
    $str = (string)$str;
2915
2916
    if (!isset($str[0])) {
2917
      return false;
2918
    }
2919
2920
    $json = self::json_decode($str);
2921
2922
    if (
2923
        (
2924
            is_object($json) === true
2925
            ||
2926
            is_array($json) === true
2927
        )
2928
        &&
2929
        json_last_error() === JSON_ERROR_NONE
2930
    ) {
2931
      return true;
2932 2
    } else {
2933
      return false;
2934 2
    }
2935
  }
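The method above accepts only input that decodes to an object or an array, so bare scalars such as `123` are rejected even though they are valid JSON documents. A standalone sketch using the built-in `json_decode()` (the library's own `self::json_decode()` wrapper is not shown here); `looks_like_json` is a hypothetical name.

```php
<?php
// Hypothetical standalone version: true only for JSON objects and arrays.
function looks_like_json(string $str): bool
{
    if ($str === '') {
        return false;
    }

    $decoded = json_decode($str);

    // json_last_error() reflects the most recent json_decode() call.
    return json_last_error() === JSON_ERROR_NONE
        && (is_object($decoded) || is_array($decoded));
}

var_dump(looks_like_json('{"a":1}'));
var_dump(looks_like_json('123'));
```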
2936 2
2937
  /**
2938
   * Check if the string is UTF-16.
2939 2
   *
2940
   * @param string $str <p>The input string.</p>
2941
   *
2942 2
   * @return int|false <p>
2943
   *                   <strong>false</strong> if it's not UTF-16,<br />
2944
   *                   <strong>1</strong> for UTF-16LE,<br />
2945
   *                   <strong>2</strong> for UTF-16BE.
2946
   *                   </p>
2947
   */
2948
  public static function is_utf16($str)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

2949
  {
2950
    $str = self::remove_bom($str);
2951
2952 6
    if (self::is_binary($str) === true) {
2953
2954 6
      $maybeUTF16LE = 0;
2955
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16LE');
2956
      if ($test) {
2957
        $test2 = \mb_convert_encoding($test, 'UTF-16LE', 'UTF-8');
2958
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16LE');
2959
        if ($test3 === $test) {
2960
          $strChars = self::count_chars($str, true);
Security Bug introduced by
It seems like $str defined by self::remove_bom($str) on line 2950 can also be of type false; however, voku\helper\UTF8::count_chars() does only seem to accept string, did you maybe forget to handle an error condition?

2961
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
2962
            if (in_array($test3char, $strChars, true) === true) {
2963
              $maybeUTF16LE++;
2964
            }
2965 24
          }
2966
        }
2967 24
      }
2968
2969 24
      $maybeUTF16BE = 0;
2970 2
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16BE');
2971
      if ($test) {
2972
        $test2 = \mb_convert_encoding($test, 'UTF-16BE', 'UTF-8');
2973
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16BE');
2974 23
        if ($test3 === $test) {
2975 2
          $strChars = self::count_chars($str, true);
Security Bug introduced by
It seems like $str defined by self::remove_bom($str) on line 2950 can also be of type false; however, voku\helper\UTF8::count_chars() does only seem to accept string, did you maybe forget to handle an error condition?

2976
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
2977
            if (in_array($test3char, $strChars, true) === true) {
2978 23
              $maybeUTF16BE++;
2979
            }
2980 23
          }
2981
        }
2982
      }
2983
2984
      if ($maybeUTF16BE !== $maybeUTF16LE) {
2985
        if ($maybeUTF16LE > $maybeUTF16BE) {
2986
          return 1;
2987
        } else {
2988
          return 2;
2989
        }
2990 1
      }
2991
2992 1
    }
2993
2994
    return false;
2995
  }
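The detection above converts the input to UTF-8 and back for each candidate byte order. Since the UTF-16 and UTF-32 variants differ only in the encoding name, the conversion step could be parameterized, which is one way to address the duplication the reviewer flags. The sketch below is an assumption-laden simplification: `survives_round_trip` is a hypothetical helper, it compares the result with the original bytes (the method above compares two UTF-8 intermediates), and it omits the character-overlap scoring. It needs the mbstring extension.

```php
<?php
// Hypothetical helper: does $bytes survive $encoding -> UTF-8 -> $encoding unchanged?
function survives_round_trip(string $bytes, string $encoding): bool
{
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    if ($utf8 === false || $utf8 === '') {
        return false;
    }

    return mb_convert_encoding($utf8, $encoding, 'UTF-8') === $bytes;
}

// "hi" encoded as UTF-16LE: each ASCII byte is followed by a zero byte.
$sample = "h\x00i\x00";
var_dump(mb_convert_encoding($sample, 'UTF-8', 'UTF-16LE'));
var_dump(survives_round_trip($sample, 'UTF-16LE'));
```

A round trip alone does not tell the byte orders apart, because many byte pairs are valid in both, which is presumably why the method above goes on to count overlapping characters.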
2996 1
2997
  /**
2998
   * Check if the string is UTF-32.
2999
   *
3000
   * @param string $str
3001
   *
3002
   * @return int|false <p>
3003
   *                   <strong>false</strong> if it's not UTF-32,<br />
3004
   *                   <strong>1</strong> for UTF-32LE,<br />
3005
   *                   <strong>2</strong> for UTF-32BE.
3006
   *                   </p>
3007 1
   */
3008
  public static function is_utf32($str)
Duplication introduced by
This method seems to be duplicated in your project.

3009 1
  {
3010 1
    $str = self::remove_bom($str);
3011 1
3012
    if (self::is_binary($str) === true) {
3013 1
3014
      $maybeUTF32LE = 0;
3015
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32LE');
3016
      if ($test) {
3017
        $test2 = \mb_convert_encoding($test, 'UTF-32LE', 'UTF-8');
3018
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32LE');
3019
        if ($test3 === $test) {
3020
          $strChars = self::count_chars($str, true);
Security Bug introduced by
It seems like $str defined by self::remove_bom($str) on line 3010 can also be of type false; however, voku\helper\UTF8::count_chars() does only seem to accept string, did you maybe forget to handle an error condition?

3021
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
3022 2
            if (in_array($test3char, $strChars, true) === true) {
3023
              $maybeUTF32LE++;
3024 2
            }
3025
          }
3026 2
        }
3027 2
      }
3028 2
3029
      $maybeUTF32BE = 0;
3030 2
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32BE');
3031
      if ($test) {
3032
        $test2 = \mb_convert_encoding($test, 'UTF-32BE', 'UTF-8');
3033
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32BE');
3034
        if ($test3 === $test) {
3035
          $strChars = self::count_chars($str, true);
Security Bug introduced by
It seems like $str defined by self::remove_bom($str) on line 3010 can also be of type false; however, voku\helper\UTF8::count_chars() does only seem to accept string, did you maybe forget to handle an error condition?

3036
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
3037
            if (in_array($test3char, $strChars, true) === true) {
3038
              $maybeUTF32BE++;
3039
            }
3040 1
          }
3041
        }
3042 1
      }
3043
3044
      if ($maybeUTF32BE !== $maybeUTF32LE) {
3045
        if ($maybeUTF32LE > $maybeUTF32BE) {
3046 1
          return 1;
3047
        } else {
3048
          return 2;
3049
        }
3050
      }
3051
3052
    }
3053
3054
    return false;
3055
  }
3056
3057
  /**
3058 1
   * Checks whether the passed string contains only byte sequences that appear to be valid UTF-8 characters.
3059
   *
3060 1
   * @see    http://hsivonen.iki.fi/php-utf8/
3061
   *
3062
   * @param string $str    <p>The string to be checked.</p>
3063
   * @param bool   $strict <p>Check also if the string is not UTF-16 or UTF-32.</p>
3064
   *
3065
   * @return bool
3066
   */
3067
  public static function is_utf8($str, $strict = false)
3068
  {
3069
    $str = (string)$str;
3070 16
3071
    if (!isset($str[0])) {
3072 16
      return true;
3073
    }
3074 16
3075 2
    if ($strict === true) {
3076
      if (self::is_utf16($str) !== false) {
3077
        return false;
3078 16
      }
3079 1
3080
      if (self::is_utf32($str) !== false) {
3081
        return false;
3082 16
      }
3083 4
    }
3084
3085
    if (self::pcre_utf8_support() !== true) {
3086 15
3087 14
      // If even just the first character can be matched, when the /u
3088
      // modifier is used, then it's valid UTF-8. If the UTF-8 is somehow
3089
      // invalid, nothing at all will match, even if the string contains
3090 4
      // some valid sequences
3091 4
      return (preg_match('/^.{1}/us', $str, $ar) === 1);
3092 4
3093
    } else {
3094
3095 4
      $mState = 0; // cached expected number of octets after the current octet
3096 4
      // until the beginning of the next UTF8 character sequence
3097 4
      $mUcs4 = 0; // cached Unicode character
3098 4
      $mBytes = 1; // cached expected number of octets in the current sequence
3099 4
3100 4
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
3101 4
        self::checkForSupport();
3102 4
      }
3103 4
3104 4
      if (self::$SUPPORT['mbstring_func_overload'] === true) {
Duplication introduced by
This code seems to be duplicated across your project.

3105 4
        $len = \mb_strlen($str, '8BIT');
3106 4
      } else {
3107 4
        $len = strlen($str);
3108 4
      }
3109 4
3110
      /** @noinspection ForeachInvariantsInspection */
3111 4
      for ($i = 0; $i < $len; $i++) {
3112 4
        $in = ord($str[$i]);
3113 4
        if ($mState === 0) {
3114
          // When mState is zero we expect either a US-ASCII character or a
3115 4
          // multi-octet sequence.
3116
          if (0 === (0x80 & $in)) {
3117 4
            // US-ASCII, pass straight through.
3118
            $mBytes = 1;
3119
          } elseif (0xC0 === (0xE0 & $in)) {
Duplication introduced by
This code seems to be duplicated across your project.

3120
            // First octet of 2 octet sequence.
3121
            $mUcs4 = $in;
3122
            $mUcs4 = ($mUcs4 & 0x1F) << 6;
3123
            $mState = 1;
3124
            $mBytes = 2;
3125
          } elseif (0xE0 === (0xF0 & $in)) {
3126
            // First octet of 3 octet sequence.
3127 13
            $mUcs4 = $in;
3128
            $mUcs4 = ($mUcs4 & 0x0F) << 12;
3129 13
            $mState = 2;
3130 13
            $mBytes = 3;
3131
          } elseif (0xF0 === (0xF8 & $in)) {
Duplication introduced by
This code seems to be duplicated across your project.

3132 13
            // First octet of 4 octet sequence.
3133 1
            $mUcs4 = $in;
3134 1
            $mUcs4 = ($mUcs4 & 0x07) << 18;
3135 1
            $mState = 3;
3136
            $mBytes = 4;
3137 13
          } elseif (0xF8 === (0xFC & $in)) {
3138
            /* First octet of 5 octet sequence.
3139
            *
3140
            * This is illegal because the encoded codepoint must be either
3141
            * (a) not the shortest form or
3142
            * (b) outside the Unicode range of 0-0x10FFFF.
3143
            * Rather than trying to resynchronize, we will carry on until the end
3144
            * of the sequence and let the later error handling code catch it.
3145
            */
3146
            $mUcs4 = $in;
3147
            $mUcs4 = ($mUcs4 & 0x03) << 24;
3148
            $mState = 4;
3149
            $mBytes = 5;
3150 18 View Code Duplication
          } elseif (0xFC === (0xFE & $in)) {
Duplication: This code seems to be duplicated across your project.
3151
            // First octet of 6 octet sequence, see comments for 5 octet sequence.
3152 18
            $mUcs4 = $in;
3153 18
            $mUcs4 = ($mUcs4 & 1) << 30;
3154
            $mState = 5;
3155 18
            $mBytes = 6;
3156
          } else {
3157 18
            /* Current octet is neither in the US-ASCII range nor a legal first
3158
             * octet of a multi-octet sequence.
3159 2
             */
3160
            return false;
3161 2
          }
3162
        } else {
3163 1
          // When mState is non-zero, we expect a continuation of the multi-octet
3164 1
          // sequence
3165
          if (0x80 === (0xC0 & $in)) {
3166 2
            // Legal continuation.
3167 2
            $shift = ($mState - 1) * 6;
3168
            $tmp = $in;
3169 18
            $tmp = ($tmp & 0x0000003F) << $shift;
3170 18
            $mUcs4 |= $tmp;
3171 1
            /**
3172 1
             * End of the multi-octet sequence. mUcs4 now contains the final
3173
             * Unicode code point to be output
3174 18
             */
3175 18
            if (0 === --$mState) {
3176
              /*
3177 18
              * Check for illegal sequences and code points.
3178
              */
3179
              // From Unicode 3.1, non-shortest form is illegal
3180
              if (
3181
                  (2 === $mBytes && $mUcs4 < 0x0080) ||
3182
                  (3 === $mBytes && $mUcs4 < 0x0800) ||
3183
                  (4 === $mBytes && $mUcs4 < 0x10000) ||
3184
                  (4 < $mBytes) ||
3185
                  // From Unicode 3.2, surrogate characters are illegal.
3186
                  (($mUcs4 & 0xFFFFF800) === 0xD800) ||
3187
                  // Code points outside the Unicode range are illegal.
3188
                  ($mUcs4 > 0x10FFFF)
3189
              ) {
3190
                return false;
3191
              }
3192
              // initialize UTF8 cache
3193
              $mState = 0;
3194
              $mUcs4 = 0;
3195
              $mBytes = 1;
3196
            }
3197
          } else {
3198
            /**
3199
             *((0xC0 & (*in) != 0x80) && (mState != 0))
3200
             * Incomplete multi-octet sequence.
3201
             */
3202
            return false;
3203
          }
3204
        }
3205
      }
3206
3207
      return true;
3208
    }
3209
  }
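The loop above rejects overlong forms, surrogates and code points beyond 0x10FFFF. As a rough cross-check, the same inputs can be run through PCRE's own UTF-8 validation. This is a minimal sketch, assuming PCRE was built with UTF-8 support; `is_utf8_sketch()` is a hypothetical name and not part of the class.

```php
<?php

// Hypothetical helper (not part of the UTF8 class): lets PCRE validate the bytes.
function is_utf8_sketch(string $str): bool
{
    // With the /u modifier, preg_match() returns false (an error) when the
    // subject is not valid UTF-8, and 1 when the empty pattern matches.
    return preg_match('//u', $str) === 1;
}

var_dump(is_utf8_sketch("D\xC3\xBCsseldorf")); // well-formed 2-octet sequence
var_dump(is_utf8_sketch("\xC0\x80"));          // overlong encoding of U+0000
var_dump(is_utf8_sketch("\xED\xA0\x80"));      // encoded surrogate U+D800
```

The hand-written state machine remains useful where PCRE is unavailable or where the exact failure position matters; the sketch only compares the final verdict.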
3210
3211
  /**
3212
   * (PHP 5 &gt;= 5.2.0, PECL json &gt;= 1.2.0)<br/>
3213
   * Decodes a JSON string
3214
   *
3215
   * @link http://php.net/manual/en/function.json-decode.php
3216
   *
3217
   * @param string $json    <p>
3218
   *                        The <i>json</i> string being decoded.
3219
   *                        </p>
3220
   *                        <p>
3221
   *                        This function only works with UTF-8 encoded strings.
3222
   *                        </p>
3223
   *                        <p>PHP implements a superset of
3224
   *                        JSON - it will also encode and decode scalar types and <b>NULL</b>. The JSON standard
3225
   *                        only supports these values when they are nested inside an array or an object.
3226
   *                        </p>
3227
   * @param bool   $assoc   [optional] <p>
3228
   *                        When <b>TRUE</b>, returned objects will be converted into
3229
   *                        associative arrays.
3230 17
   *                        </p>
3231
   * @param int    $depth   [optional] <p>
3232 17
   *                        User specified recursion depth.
3233 3
   *                        </p>
3234
   * @param int    $options [optional] <p>
3235
   *                        Bitmask of JSON decode options. Currently only
3236 16
   *                        <b>JSON_BIGINT_AS_STRING</b>
3237
   *                        is supported (default is to cast large integers as floats)
3238
   *                        </p>
3239
   *
3240 16
   * @return mixed the value encoded in <i>json</i> in appropriate
3241
   * PHP type. Values true, false and
3242
   * null (case-insensitive) are returned as <b>TRUE</b>, <b>FALSE</b>
3243
   * and <b>NULL</b> respectively. <b>NULL</b> is returned if the
3244
   * <i>json</i> cannot be decoded or if the encoded
3245
   * data is deeper than the recursion limit.
3246
   */
3247 View Code Duplication
  public static function json_decode($json, $assoc = false, $depth = 512, $options = 0)
Duplication: This method seems to be duplicated in your project.
3248 16
  {
3249 16
    $json = (string)self::filter($json);
3250 15
3251
    if (Bootup::is_php('5.4') === true) {
3252
      $json = json_decode($json, $assoc, $depth, $options);
3253 9
    } else {
3254 9
      $json = json_decode($json, $assoc, $depth);
3255 9
    }
3256
3257 9
    return $json;
3258 1
  }
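The `$options` bitmask documented above matters mostly for large integers. The following sketch uses PHP's native `json_decode()` directly (not the wrapper) and assumes a 64-bit build; the payload is made up for illustration.

```php
<?php

// An integer larger than PHP_INT_MAX on 64-bit builds (illustrative value).
$json = '{"id": 12345678901234567890}';

// Default behaviour: the big integer is cast to a float and loses precision.
$asFloat = json_decode($json, true);

// With JSON_BIGINT_AS_STRING the digits are preserved as a string.
$asString = json_decode($json, true, 512, JSON_BIGINT_AS_STRING);

var_dump(is_float($asFloat['id']));
var_dump($asString['id']);
```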
3259
3260
  /**
3261 9
   * (PHP 5 &gt;= 5.2.0, PECL json &gt;= 1.2.0)<br/>
3262 4
   * Returns the JSON representation of a value.
3263
   *
3264
   * @link http://php.net/manual/en/function.json-encode.php
3265 9
   *
3266 5
   * @param mixed $value   <p>
3267
   *                       The <i>value</i> being encoded. Can be any type except
3268
   *                       a resource.
3269 9
   *                       </p>
3270
   *                       <p>
3271
   *                       All string data must be UTF-8 encoded.
3272
   *                       </p>
3273
   *                       <p>PHP implements a superset of
3274
   *                       JSON - it will also encode and decode scalar types and <b>NULL</b>. The JSON standard
3275
   *                       only supports these values when they are nested inside an array or an object.
3276
   *                       </p>
3277
   * @param int   $options [optional] <p>
3278
   *                       Bitmask consisting of <b>JSON_HEX_QUOT</b>,
3279
   *                       <b>JSON_HEX_TAG</b>,
3280
   *                       <b>JSON_HEX_AMP</b>,
3281
   *                       <b>JSON_HEX_APOS</b>,
3282
   *                       <b>JSON_NUMERIC_CHECK</b>,
3283
   *                       <b>JSON_PRETTY_PRINT</b>,
3284
   *                       <b>JSON_UNESCAPED_SLASHES</b>,
3285 1
   *                       <b>JSON_FORCE_OBJECT</b>,
3286
   *                       <b>JSON_UNESCAPED_UNICODE</b>. The behaviour of these
3287
   *                       constants is described on
3288 1
   *                       the JSON constants page.
3289
   *                       </p>
3290 1
   * @param int   $depth   [optional] <p>
3291 1
   *                       Set the maximum depth. Must be greater than zero.
3292 1
   *                       </p>
3293
   *
3294
   * @return string a JSON encoded string on success or <b>FALSE</b> on failure.
3295 1
   */
3296 View Code Duplication
  public static function json_encode($value, $options = 0, $depth = 512)
Duplication: This method seems to be duplicated in your project.
3297
  {
3298
    $value = self::filter($value);
3299
3300
    if (Bootup::is_php('5.5') === true) {
3301
      $json = json_encode($value, $options, $depth);
3302
    } else {
3303 41
      $json = json_encode($value, $options);
3304
    }
3305
3306 41
    return $json;
3307
  }
3308
3309
  /**
3310
   * Makes string's first char lowercase.
3311
   *
3312
   * @param string $str <p>The input string</p>
3313
   *
3314
   * @return string <p>The resulting string</p>
3315
   */
3316
  public static function lcfirst($str)
3317 1
  {
3318
    return self::strtolower(self::substr($str, 0, 1)) . self::substr($str, 1);
Security Bug
It seems like self::substr($str, 0, 1) targeting voku\helper\UTF8::substr() can also be of type false; however, voku\helper\UTF8::strtolower() only seems to accept string. Did you forget to handle an error condition?
3319 1
  }
3320 1
3321
  /**
3322
   * Strip whitespace or other characters from beginning of a UTF-8 string.
3323 1
   *
3324 1
   * @param string $str   <p>The string to be trimmed</p>
3325 1
   * @param string $chars <p>Optional characters to be stripped</p>
3326
   *
3327
   * @return string <p>The string with unwanted characters stripped from the left.</p>
3328 1
   */
3329 View Code Duplication
  public static function ltrim($str = '', $chars = INF)
Duplication: This method seems to be duplicated in your project.
3330
  {
3331 1
    $str = (string)$str;
3332
3333
    if (!isset($str[0])) {
3334
      return '';
3335 1
    }
3336 1
3337 1
    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
3338
    if ($chars === INF || !$chars) {
3339
      return preg_replace('/^[\pZ\pC]+/u', '', $str);
3340 1
    }
3341
3342
    return preg_replace('/^' . self::rxClass($chars) . '+/u', '', $str);
3343 1
  }
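The default branch of `ltrim()` strips every leading separator (`\pZ`) and control/other character (`\pC`), which is broader than PHP's native `ltrim()`. A small sketch of the difference, using only the regular expression shown above; the input string is an illustrative assumption.

```php
<?php

// NO-BREAK SPACE (U+00A0), IDEOGRAPHIC SPACE (U+3000), a space and a tab,
// followed by the text we want to keep.
$input = "\xC2\xA0\xE3\x80\x80 \tHello ";

// Same pattern as the default branch above: Unicode separators and controls.
$unicodeTrimmed = preg_replace('/^[\pZ\pC]+/u', '', $input);

// Native ltrim() only knows ASCII whitespace, so it stops at the first byte.
$nativeTrimmed = ltrim($input);

var_dump($unicodeTrimmed);
var_dump($nativeTrimmed === $input);
```

Only the left side is touched, so the trailing space in the input survives.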
3344
3345
  /**
3346
   * Returns the UTF-8 character with the maximum code point in the given data.
3347 1
   *
3348
   * @param mixed $arg <p>A UTF-8 encoded string or an array of such strings.</p>
3349 1
   *
3350 1
   * @return string <p>The character with the highest code point in the given data.</p>
3351 1
   */
3352 1 View Code Duplication
  public static function max($arg)
Duplication: This method seems to be duplicated in your project.
3353 1
  {
3354
    if (is_array($arg) === true) {
3355
      $arg = implode('', $arg);
3356
    }
3357
3358
    return self::chr(max(self::codepoints($arg)));
3359
  }
3360
3361
  /**
3362
   * Calculates and returns the maximum number of bytes taken by any
3363
   * UTF-8 encoded character in the given string.
3364
   *
3365 5
   * @param string $str <p>The original Unicode string.</p>
3366
   *
3367 5
   * @return int <p>Max byte lengths of the given chars.</p>
3368
   */
3369
  public static function max_chr_width($str)
3370
  {
3371
    $bytes = self::chr_size_list($str);
3372
    if (count($bytes) > 0) {
3373
      return (int)max($bytes);
3374
    } else {
3375
      return 0;
3376
    }
3377 10
  }
3378
3379 10
  /**
3380 10
   * Checks whether mbstring is available on the server.
3381 5
   *
3382 5
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
3383 10
   */
3384
  public static function mbstring_loaded()
3385 10
  {
3386
    $return = extension_loaded('mbstring') ? true : false;
3387
3388
    if ($return === true) {
3389
      \mb_internal_encoding('UTF-8');
3390
    }
3391
3392
    return $return;
3393
  }
3394
3395
  /**
3396 1
   * Returns the UTF-8 character with the minimum code point in the given data.
3397
   *
3398 1
   * @param mixed $arg <strong>A UTF-8 encoded string or an array of such strings.</strong>
3399 1
   *
3400 1
   * @return string <p>The character with the lowest code point in the given data.</p>
3401
   */
3402 1 View Code Duplication
  public static function min($arg)
Duplication: This method seems to be duplicated in your project.
3403 1
  {
3404 1
    if (is_array($arg) === true) {
3405 1
      $arg = implode('', $arg);
3406 1
    }
3407
3408 1
    return self::chr(min(self::codepoints($arg)));
3409
  }
3410
3411
  /**
3412
   * alias for "UTF8::normalize_encoding()"
3413
   *
3414
   * @see UTF8::normalize_encoding()
3415
   *
3416
   * @param string $encoding
3417
   * @param mixed  $fallback
3418
   *
3419
   * @return string
3420
   *
3421
   * @deprecated
3422
   */
3423
  public static function normalizeEncoding($encoding, $fallback = false)
3424 45
  {
3425
    return self::normalize_encoding($encoding, $fallback);
3426
  }
3427 45
3428
  /**
3429
   * Normalize the encoding-"name" input.
3430
   *
3431 45
   * @param string $encoding <p>e.g.: ISO, UTF8, WINDOWS-1251 etc.</p>
3432 45
   * @param mixed  $fallback <p>e.g.: UTF-8</p>
3433 45
   *
3434 45
   * @return string <p>e.g.: ISO-8859-1, UTF-8, WINDOWS-1251 etc.</p>
3435
   */
3436 45
  public static function normalize_encoding($encoding, $fallback = false)
3437
  {
3438
    static $STATIC_NORMALIZE_ENCODING_CACHE = array();
3439 45
3440 45
    if (!$encoding) {
3441
      return $fallback;
3442 45
    }
3443
3444
    if ('UTF-8' === $encoding) {
3445
      return $encoding;
3446
    }
3447
3448
    if (in_array($encoding, self::$ICONV_ENCODING, true)) {
3449
      return $encoding;
3450
    }
3451
3452
    if (isset($STATIC_NORMALIZE_ENCODING_CACHE[$encoding])) {
3453 45
      return $STATIC_NORMALIZE_ENCODING_CACHE[$encoding];
3454
    }
3455 45
3456
    $encodingOrig = $encoding;
3457 45
    $encoding = strtoupper($encoding);
3458 45
    $encodingUpperHelper = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);
3459 45
3460
    $equivalences = array(
3461 45
        'ISO88591'    => 'ISO-8859-1',
3462 45
        'ISO8859'     => 'ISO-8859-1',
3463 45
        'ISO'         => 'ISO-8859-1',
3464
        'LATIN1'      => 'ISO-8859-1',
3465 45
        'LATIN'       => 'ISO-8859-1',
3466
        'WIN1252'     => 'ISO-8859-1',
3467
        'WINDOWS1252' => 'ISO-8859-1',
3468
        'UTF16'       => 'UTF-16',
3469
        'UTF32'       => 'UTF-32',
3470
        'UTF8'        => 'UTF-8',
3471
        'UTF'         => 'UTF-8',
3472
        'UTF7'        => 'UTF-7',
3473
        '8BIT'        => 'CP850',
3474
        'BINARY'      => 'CP850',
3475
    );
3476 23
3477
    if (!empty($equivalences[$encodingUpperHelper])) {
3478 23
      $encoding = $equivalences[$encodingUpperHelper];
3479
    }
3480 23
3481 5
    $STATIC_NORMALIZE_ENCODING_CACHE[$encodingOrig] = $encoding;
3482
3483
    return $encoding;
3484
  }
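The lookup key used against `$equivalences` is built in two steps: upper-case the name, then drop everything that is not a letter, digit or whitespace. The sketch below isolates that step; `encoding_key_sketch()` is a hypothetical stand-alone function, not a method of the class.

```php
<?php

// Hypothetical reduction of the key-building step in normalize_encoding().
function encoding_key_sketch(string $encoding): string
{
    // Upper-case first, then strip punctuation such as "-" and "_".
    return (string)preg_replace('/[^a-zA-Z0-9\s]/', '', strtoupper($encoding));
}

echo encoding_key_sketch('iso-8859-1'), "\n";   // ISO88591
echo encoding_key_sketch('utf_8'), "\n";        // UTF8
echo encoding_key_sketch('Windows-1252'), "\n"; // WINDOWS1252
```

Each of these keys appears in the `$equivalences` table above, which is why differently spelled inputs end up with the same canonical name.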
3485 19
3486 3
  /**
3487
   * Normalize some MS Word special characters.
3488
   *
3489 18
   * @param string $str <p>The string to be normalized.</p>
3490
   *
3491 18
   * @return string
3492
   */
3493 View Code Duplication
  public static function normalize_msword($str)
Duplication: This method seems to be duplicated in your project.
3494
  {
3495
    // init
3496
    $str = (string)$str;
3497
3498
    if (!isset($str[0])) {
3499
      return '';
3500
    }
3501
3502 52
    static $UTF8_MSWORD_KEYS_CACHE = null;
3503
    static $UTF8_MSWORD_VALUES_CACHE = null;
3504 52
3505
    if ($UTF8_MSWORD_KEYS_CACHE === null) {
3506 52
      $UTF8_MSWORD_KEYS_CACHE = array_keys(self::$UTF8_MSWORD);
3507
      $UTF8_MSWORD_VALUES_CACHE = array_values(self::$UTF8_MSWORD);
3508 52
    }
3509 40
3510
    return str_replace($UTF8_MSWORD_KEYS_CACHE, $UTF8_MSWORD_VALUES_CACHE, $str);
3511
  }
3512 18
3513
  /**
3514
   * Normalize the whitespace.
3515 18
   *
3516 17
   * @param string $str                     <p>The string to be normalized.</p>
3517
   * @param bool   $keepNonBreakingSpace    [optional] <p>Set to true, to keep non-breaking-spaces.</p>
3518 17
   * @param bool   $keepBidiUnicodeControls [optional] <p>Set to true, to keep non-printable (for the web)
3519 17
   *                                        bidirectional text chars.</p>
3520 17
   *
3521 2
   * @return string
3522 2
   */
3523
  public static function normalize_whitespace($str, $keepNonBreakingSpace = false, $keepBidiUnicodeControls = false)
3524
  {
3525 18
    // init
3526
    $str = (string)$str;
3527 18
3528 18
    if (!isset($str[0])) {
3529 18
      return '';
3530
    }
3531 18
3532 18
    static $WHITESPACE_CACHE = array();
3533 18
    $cacheKey = (int)$keepNonBreakingSpace;
3534
3535
    if (!isset($WHITESPACE_CACHE[$cacheKey])) {
3536
3537 18
      $WHITESPACE_CACHE[$cacheKey] = self::$WHITESPACE_TABLE;
3538
3539 18
      if ($keepNonBreakingSpace === true) {
3540
        /** @noinspection OffsetOperationsInspection */
3541
        unset($WHITESPACE_CACHE[$cacheKey]['NO-BREAK SPACE']);
3542
      }
3543
3544
      $WHITESPACE_CACHE[$cacheKey] = array_values($WHITESPACE_CACHE[$cacheKey]);
3545
    }
3546
3547
    if ($keepBidiUnicodeControls === false) {
3548
      static $BIDI_UNICODE_CONTROLS_CACHE = null;
3549
3550
      if ($BIDI_UNICODE_CONTROLS_CACHE === null) {
3551
        $BIDI_UNICODE_CONTROLS_CACHE = array_values(self::$BIDI_UNI_CODE_CONTROLS_TABLE);
3552
      }
3553
3554
      $str = str_replace($BIDI_UNICODE_CONTROLS_CACHE, '', $str);
3555
    }
3556
3557
    return str_replace($WHITESPACE_CACHE[$cacheKey], ' ', $str);
3558
  }
3559
3560 1
  /**
3561
   * Format a number with grouped thousands.
3562 1
   *
3563 1
   * @param float  $number
3564
   * @param int    $decimals
3565
   * @param string $dec_point
3566
   * @param string $thousands_sep
3567
   *
3568 1
   * @return string
3569 1
   *
3570 1
   * @deprecated Because this has nothing to do with UTF8. :/
3571 1
   */
3572
  public static function number_format($number, $decimals = 0, $dec_point = '.', $thousands_sep = ',')
3573
  {
3574 1
    $thousands_sep = (string)$thousands_sep;
3575
    $dec_point = (string)$dec_point;
3576
    $number = (float)$number;
3577
3578
    if (
3579
        isset($thousands_sep[1], $dec_point[1])
3580
        &&
3581
        Bootup::is_php('5.4') === true
3582
    ) {
3583
      return str_replace(
3584
          array(
3585
              '.',
3586 36
              ',',
3587
          ),
3588 36
          array(
3589
              $dec_point,
3590 36
              $thousands_sep,
3591 2
          ),
3592
          number_format($number, $decimals, '.', ',')
3593
      );
3594
    }
3595 36
3596 36
    return number_format($number, $decimals, $dec_point, $thousands_sep);
3597
  }
3598 36
3599
  /**
3600
   * Calculates Unicode code point of the given UTF-8 encoded character.
3601
   *
3602 36
   * INFO: opposite to UTF8::chr()
3603
   *
3604 36
   * @param string      $chr      <p>The character of which to calculate the code point.</p>
3605 6
   * @param string|null $encoding [optional] <p>Default is UTF-8</p>
3606 6
   *
3607
   * @return int <p>
3608 36
   *             Unicode code point of the given character,<br />
3609 36
   *             0 on invalid UTF-8 byte sequence.
3610 36
   *             </p>
3611 36
   */
3612 36
  public static function ord($chr, $encoding = 'UTF-8')
3613
  {
3614 36
3615
    if ($encoding !== 'UTF-8') {
3616
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
3617
3618
      // check again, if it's still not UTF-8
3619
      /** @noinspection NotOptimalIfConditionsInspection */
3620
      if ($encoding !== 'UTF-8') {
3621
        $chr = (string)\mb_convert_encoding($chr, 'UTF-8', $encoding);
3622
      }
3623
    }
3624
3625
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
3626
      self::checkForSupport();
3627
    }
3628
3629
    if (self::$SUPPORT['intlChar'] === true) {
3630
      $tmpReturn = \IntlChar::ord($chr);
3631
      if ($tmpReturn) {
3632
        return $tmpReturn;
3633
      }
3634
    }
3635
3636
    // use static cache, if there is no support for "\IntlChar"
3637
    static $CHAR_CACHE = array();
3638
    if (isset($CHAR_CACHE[$chr]) === true) {
3639
      return $CHAR_CACHE[$chr];
3640
    }
3641
3642
    $chr_orig = $chr;
3643
    /** @noinspection CallableParameterUseCaseInTypeContextInspection */
3644
    $chr = unpack('C*', self::substr($chr, 0, 4, '8BIT'));
3645
    $code = $chr ? $chr[1] : 0;
3646 36
3647 5
    if (0xF0 <= $code && isset($chr[4])) {
3648
      return $CHAR_CACHE[$chr_orig] = (($code - 0xF0) << 18) + (($chr[2] - 0x80) << 12) + (($chr[3] - 0x80) << 6) + $chr[4] - 0x80;
3649 5
    }
3650 5
3651
    if (0xE0 <= $code && isset($chr[3])) {
3652
      return $CHAR_CACHE[$chr_orig] = (($code - 0xE0) << 12) + (($chr[2] - 0x80) << 6) + $chr[3] - 0x80;
3653 36
    }
3654
3655
    if (0xC0 <= $code && isset($chr[2])) {
3656
      return $CHAR_CACHE[$chr_orig] = (($code - 0xC0) << 6) + $chr[2] - 0x80;
3657 36
    }
3658
3659
    return $CHAR_CACHE[$chr_orig] = $code;
3660
  }
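The fallback arithmetic at the end of `ord()` can be traced by hand for a three-byte character. This worked example uses the Euro sign (bytes `E2 82 AC`) as an assumed input and repeats the three-byte branch outside the class.

```php
<?php

// unpack('C*', ...) yields a 1-indexed array of byte values:
// [1 => 0xE2, 2 => 0x82, 3 => 0xAC]
$bytes = unpack('C*', "\xE2\x82\xAC");

// Same formula as the 0xE0 branch above:
//   (0xE2 - 0xE0) << 12 = 0x2000
//   (0x82 - 0x80) << 6  = 0x0080
//    0xAC - 0x80        = 0x002C
$codePoint = (($bytes[1] - 0xE0) << 12) + (($bytes[2] - 0x80) << 6) + $bytes[3] - 0x80;

printf("U+%04X\n", $codePoint);
```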
3661
3662
  /**
3663
   * Parses the string into an array (into the second parameter).
3664
   *
3665
   * WARNING: Unlike "parse_str()", this method does not set variables in the current scope
3666
   *          if the second parameter is not set!
3667
   *
3668
   * @link http://php.net/manual/en/function.parse-str.php
3669
   *
3670 12
   * @param string  $str       <p>The input string.</p>
3671
   * @param array   $result    <p>The result will be returned into this reference parameter.</p>
3672
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
3673
   *
3674
   * @return bool <p>Returns <strong>false</strong> if PHP can't parse the string and $result is empty.</p>
3675
   */
3676 12
  public static function parse_str($str, &$result, $cleanUtf8 = false)
3677 2
  {
3678 1
    if ($cleanUtf8 === true) {
3679 2
      $str = self::clean($str);
3680 1
    }
3681 2
3682
    $return = \mb_parse_str($str, $result);
3683 2
    if ($return === false || empty($result)) {
3684
      return false;
3685
    }
3686 2
3687
    return true;
3688
  }
3689
3690
  /**
3691
   * Checks if the /u modifier, which enables Unicode support in PCRE, is available.
3692 12
   *
3693 3
   * @return bool <p><strong>true</strong> if support is available, <strong>false</strong> otherwise.</p>
3694
   */
3695
  public static function pcre_utf8_support()
3696
  {
3697
    /** @noinspection PhpUsageOfSilenceOperatorInspection */
3698
    return (bool)@preg_match('//u', '');
3699
  }
3700 12
3701 9
  /**
3702
   * Create an array containing a range of UTF-8 characters.
3703
   *
3704
   * @param mixed $var1 <p>Numeric or hexadecimal code points, or a UTF-8 character to start from.</p>
3705
   * @param mixed $var2 <p>Numeric or hexadecimal code points, or a UTF-8 character to end at.</p>
3706
   *
3707
   * @return array
3708
   */
3709
  public static function range($var1, $var2)
3710 6
  {
3711 6
    if (!$var1 || !$var2) {
3712 6
      return array();
3713 6
    }
3714 6
3715 6 View Code Duplication
    if (ctype_digit((string)$var1)) {
Duplication: This code seems to be duplicated across your project.
3716 6
      $start = (int)$var1;
3717 6
    } elseif (ctype_xdigit($var1)) {
3718 6
      $start = (int)self::hex_to_int($var1);
3719 6
    } else {
3720 6
      $start = self::ord($var1);
3721 6
    }
3722 6
3723 6
    if (!$start) {
3724 6
      return array();
3725 6
    }
3726 6
3727 6 View Code Duplication
    if (ctype_digit((string)$var2)) {
Duplication: This code seems to be duplicated across your project.
3728 6
      $end = (int)$var2;
3729 6
    } elseif (ctype_xdigit($var2)) {
3730 6
      $end = (int)self::hex_to_int($var2);
3731
    } else {
3732 6
      $end = self::ord($var2);
3733 6
    }
3734 6
3735
    if (!$end) {
3736
      return array();
3737
    }
3738
3739
    return array_map(
3740
        array(
3741
            '\\voku\\helper\\UTF8',
3742
            'chr',
3743
        ),
3744
        range($start, $end)
3745
    );
3746
  }
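The final `array_map()` call turns a numeric range of code points into characters. The sketch below shows the same idea with `mb_chr()` standing in for `UTF8::chr()`; this is an assumption for illustration and needs PHP 7.2 or later with mbstring.

```php
<?php

// Greek small letters alpha (U+03B1) through epsilon (U+03B5).
$start = 0x3B1;
$end = 0x3B5;

// mb_chr() is used here in place of UTF8::chr(), which is not available
// outside the library.
$chars = array_map(
    function ($codePoint) {
        return mb_chr($codePoint, 'UTF-8');
    },
    range($start, $end)
);

echo implode('', $chars), "\n";
```

Note that the method above returns an empty array when either bound is falsy, so a range starting at code point 0 cannot be expressed with it.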
3747
3748
  /**
3749
   * Multi decode html entity & fix urlencoded-win1252-chars.
3750
   *
3751
   * e.g:
3752
   * 'test+test'                     => 'test+test'
3753
   * 'D&#252;sseldorf'               => 'Düsseldorf'
3754
   * 'D%FCsseldorf'                  => 'Düsseldorf'
3755
   * 'D&#xFC;sseldorf'               => 'Düsseldorf'
3756
   * 'D%26%23xFC%3Bsseldorf'         => 'Düsseldorf'
3757
   * 'DÃ¼sseldorf'                   => 'Düsseldorf'
3758
   * 'D%C3%BCsseldorf'               => 'Düsseldorf'
3759
   * 'D%C3%83%C2%BCsseldorf'         => 'Düsseldorf'
3760
   * 'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'
3761
   *
3762
   * @param string $str          <p>The input string.</p>
3763
   * @param bool   $multi_decode <p>Decode as often as possible.</p>
3764
   *
3765
   * @return string
3766
   */
3767 View Code Duplication
  public static function rawurldecode($str, $multi_decode = true)
Duplication: This method seems to be duplicated in your project.
3768
  {
3769
    $str = (string)$str;
3770
3771
    if (!isset($str[0])) {
3772
      return '';
3773
    }
3774
3775
    $pattern = '/%u([0-9a-f]{3,4})/i';
3776
    if (preg_match($pattern, $str)) {
3777
      $str = preg_replace($pattern, '&#x\\1;', rawurldecode($str));
3778 14
    }
3779
3780 14
    $flags = Bootup::is_php('5.4') === true ? ENT_QUOTES | ENT_HTML5 : ENT_QUOTES;
3781
3782
    do {
3783 14
      $str_compare = $str;
3784 14
3785 1
      $str = self::fix_simple_utf8(
3786 1
          rawurldecode(
3787 13
              self::html_entity_decode(
3788
                  self::to_utf8($str),
Bug
It seems like self::to_utf8($str) targeting voku\helper\UTF8::to_utf8() can also be of type array; however, voku\helper\UTF8::html_entity_decode() only seems to accept string. Maybe add an additional type check?

This check looks at variables that are passed out again to other methods. If the outgoing method call has stricter type requirements than the method itself, an issue is raised. An additional type check may prevent trouble.
3789 14
                  $flags
3790
              )
3791 14
          )
3792 14
      );
3793
3794 14
    } while ($multi_decode === true && $str_compare !== $str);
3795
3796
    return (string)$str;
3797
  }
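The `do { ... } while` loop above keeps decoding until the string stops changing, which is what handles double-encoded input. A reduced sketch of that fixed-point idea, using only PHP's native `rawurldecode()` and leaving out the entity decoding and UTF-8 repair steps:

```php
<?php

// "ü" percent-encoded twice: %C3%BC became %25C3%25BC (illustrative input).
$str = 'D%25C3%25BCsseldorf';
$passes = 0;

do {
    $previous = $str;
    $str = rawurldecode($str);
    $passes++;
} while ($previous !== $str); // stop once a pass changes nothing

var_dump($str);
var_dump($passes);
```

The last pass is always a no-op, so the loop costs one more iteration than the nesting depth of the encoding.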
3798
3799
  /**
3800
   * alias for "UTF8::remove_bom()"
3801
   *
3802
   * @see UTF8::remove_bom()
3803
   *
3804
   * @param string $str
3805
   *
3806 1
   * @return string
3807
   *
3808 1
   * @deprecated
3809
   */
3810 1
  public static function removeBOM($str)
3811
  {
3812
    return self::remove_bom($str);
3813
  }
3814 1
3815
  /**
3816 1
   * Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.
3817
   *
3818
   * @param string $str <p>The input string.</p>
3819
   *
3820 1
   * @return string <p>String without UTF-BOM</p>
3821 1
   */
3822
  public static function remove_bom($str)
3823
  {
3824 1
    $str = (string)$str;
3825 1
3826 1
    if (!isset($str[0])) {
3827 1
      return '';
3828
    }
3829 1
3830
    foreach (self::$BOM as $bomString => $bomByteLength) {
3831
      if (0 === self::strpos($str, $bomString, 0, '8BIT')) {
Security Bug
It seems like $str defined by self::substr($str, $bomByteLength, null, '8BIT') on line 3832 can also be of type false; however, voku\helper\UTF8::strpos() only seems to accept string. Did you forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example:

<?php

function getDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.
3832 1
        $str = self::substr($str, $bomByteLength, null, '8BIT');
Security Bug
It seems like $str defined by self::substr($str, $bomByteLength, null, '8BIT') on line 3832 can also be of type false; however, voku\helper\UTF8::substr() does only seem to accept string, did you maybe forget to handle an error condition?
3833
      }
    }

    return $str;
  }

  /**
   * Removes duplicate occurrences of a string in another string.
   *
   * @param string          $str  <p>The base string.</p>
   * @param string|string[] $what <p>String to search for in the base string.</p>
   *
   * @return string <p>The result string with removed duplicates.</p>
   */
  public static function remove_duplicates($str, $what = ' ')
  {
    if (is_string($what) === true) {
      $what = array($what);
    }

    if (is_array($what) === true) {
      /** @noinspection ForeachSourceInspection */
      foreach ($what as $item) {
        $str = preg_replace('/(' . preg_quote($item, '/') . ')+/', $item, $str);
      }
    }

    return $str;
  }

  /**
   * Remove invisible characters from a string.
   *
   * e.g.: This prevents sandwiching null characters between ascii characters, like Java\0script.
   *
   * copy&paste from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php
   *
   * @param string $str
   * @param bool   $url_encoded
   * @param string $replacement
   *
   * @return string
   */
  public static function remove_invisible_characters($str, $url_encoded = true, $replacement = '')
  {
    // init
    $non_displayables = array();

    // every control character except newline (dec 10),
    // carriage return (dec 13) and horizontal tab (dec 09)
Unused Code Comprehensibility introduced by
37% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.
    if ($url_encoded) {
      $non_displayables[] = '/%0[0-8bcef]/'; // url encoded 00-08, 11, 12, 14, 15
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?
      $non_displayables[] = '/%1[0-9a-f]/'; // url encoded 16-31
    }

    $non_displayables[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S'; // 00-08, 11, 12, 14-31, 127
Unused Code Comprehensibility introduced by
62% of this comment could be valid code. Did you maybe forget this after debugging?

    do {
      $str = preg_replace($non_displayables, $replacement, $str, -1, $count);
    } while ($count !== 0);

    return $str;
  }

  /**
   * Replace the diamond question mark (�) and invalid-UTF8 chars with the replacement.
   *
   * @param string $str                <p>The input string</p>
   * @param string $replacementChar    <p>The replacement character.</p>
   * @param bool   $processInvalidUtf8 <p>Convert invalid UTF-8 chars </p>
   *
   * @return string
   */
  public static function replace_diamond_question_mark($str, $replacementChar = '', $processInvalidUtf8 = true)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($processInvalidUtf8 === true) {
      $replacementCharHelper = $replacementChar;
      if ($replacementChar === '') {
        $replacementCharHelper = 'none';
      }

      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (self::$SUPPORT['mbstring'] === false) {
        trigger_error('UTF8::replace_diamond_question_mark() without mbstring cannot handle all chars correctly', E_USER_WARNING);
      }

      $save = \mb_substitute_character();
      \mb_substitute_character($replacementCharHelper);
      /** @noinspection CallableParameterUseCaseInTypeContextInspection */
      $str = \mb_convert_encoding($str, 'UTF-8', 'UTF-8');
      \mb_substitute_character($save);
    }

    return str_replace(
        array(
            "\xEF\xBF\xBD",
            '�',
        ),
        array(
            $replacementChar,
            $replacementChar,
        ),
        $str
    );
  }

  /**
   * Strip whitespace or other characters from end of a UTF-8 string.
   *
   * @param string $str   <p>The string to be trimmed.</p>
   * @param string $chars <p>Optional characters to be stripped.</p>
   *
   * @return string <p>The string with unwanted characters stripped from the right.</p>
   */
  public static function rtrim($str = '', $chars = INF)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
    if ($chars === INF || !$chars) {
      return preg_replace('/[\pZ\pC]+$/u', '', $str);
    }

    return preg_replace('/' . self::rxClass($chars) . '+$/u', '', $str);
  }

  /**
   * rxClass
   *
   * @param string $s
   * @param string $class
   *
   * @return string
   */
  private static function rxClass($s, $class = '')
  {
    static $RX_CLASSS_CACHE = array();

    $cacheKey = $s . $class;

    if (isset($RX_CLASSS_CACHE[$cacheKey])) {
      return $RX_CLASSS_CACHE[$cacheKey];
    }

    /** @noinspection CallableParameterUseCaseInTypeContextInspection */
    $class = array($class);

    /** @noinspection SuspiciousLoopInspection */
    foreach (self::str_split($s) as $s) {
      if ('-' === $s) {
        $class[0] = '-' . $class[0];
      } elseif (!isset($s[2])) {
        $class[0] .= preg_quote($s, '/');
      } elseif (1 === self::strlen($s)) {
        $class[0] .= $s;
      } else {
        $class[] = $s;
      }
    }

    if ($class[0]) {
      $class[0] = '[' . $class[0] . ']';
    }

    if (1 === count($class)) {
      $return = $class[0];
    } else {
      $return = '(?:' . implode('|', $class) . ')';
    }

    $RX_CLASSS_CACHE[$cacheKey] = $return;

    return $return;
  }

  /**
   * WARNING: Echo native UTF8-Support libs, e.g. for debugging.
   */
  public static function showSupport()
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    foreach (self::$SUPPORT as $utf8Support) {
      echo $utf8Support . "\n<br>";
    }
  }

  /**
   * Converts a UTF-8 character to HTML Numbered Entity like "&#123;".
   *
   * @param string $char           <p>The Unicode character to be encoded as numbered entity.</p>
   * @param bool   $keepAsciiChars <p>Set to <strong>true</strong> to keep ASCII chars.</p>
   * @param string $encoding       [optional] <p>Default is UTF-8</p>
   *
   * @return string <p>The HTML numbered entity.</p>
   */
  public static function single_chr_html_encode($char, $keepAsciiChars = false, $encoding = 'UTF-8')
  {
    // init
    $char = (string)$char;

    if (!isset($char[0])) {
      return '';
    }

    if (
        $keepAsciiChars === true
        &&
        self::is_ascii($char) === true
    ) {
      return $char;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    return '&#' . self::ord($char, $encoding) . ';';
  }

  /**
   * Convert a string to an array of Unicode characters.
   *
   * @param string  $str       <p>The string to split into array.</p>
   * @param int     $length    [optional] <p>Max character length of each array element.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string[] <p>An array containing chunks of the string.</p>
   */
  public static function split($str, $length = 1, $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return array();
    }

    // init
    $ret = array();

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['pcre_utf8'] === true) {

      if ($cleanUtf8 === true) {
        $str = self::clean($str);
      }

      preg_match_all('/./us', $str, $retArray);
      if (isset($retArray[0])) {
        $ret = $retArray[0];
      }
      unset($retArray);

    } else {

      // fallback

      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (self::$SUPPORT['mbstring_func_overload'] === true) {
Duplication introduced by
This code seems to be duplicated across your project.
        $len = \mb_strlen($str, '8BIT');
      } else {
        $len = strlen($str);
      }

      /** @noinspection ForeachInvariantsInspection */
      for ($i = 0; $i < $len; $i++) {

        if (($str[$i] & "\x80") === "\x00") {

          $ret[] = $str[$i];

        } elseif (
            isset($str[$i + 1])
            &&
            ($str[$i] & "\xE0") === "\xC0"
        ) {

          if (($str[$i + 1] & "\xC0") === "\x80") {
            $ret[] = $str[$i] . $str[$i + 1];

            $i++;
          }

        } elseif (
Duplication introduced by
This code seems to be duplicated across your project.
            isset($str[$i + 2])
            &&
            ($str[$i] & "\xF0") === "\xE0"
        ) {

          if (
              ($str[$i + 1] & "\xC0") === "\x80"
              &&
              ($str[$i + 2] & "\xC0") === "\x80"
          ) {
            $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2];

            $i += 2;
          }

        } elseif (
            isset($str[$i + 3])
            &&
            ($str[$i] & "\xF8") === "\xF0"
        ) {

          if (
Duplication introduced by
This code seems to be duplicated across your project.
              ($str[$i + 1] & "\xC0") === "\x80"
              &&
              ($str[$i + 2] & "\xC0") === "\x80"
              &&
              ($str[$i + 3] & "\xC0") === "\x80"
          ) {
            $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2] . $str[$i + 3];

            $i += 3;
          }

        }
      }
    }

    if ($length > 1) {
      $ret = array_chunk($ret, $length);

      return array_map(
          function ($item) {
            return implode('', $item);
          }, $ret
      );
    }

    /** @noinspection OffsetOperationsInspection */
    if (isset($ret[0]) && $ret[0] === '') {
      return array();
    }

    return $ret;
  }

  /**
   * Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return false|string <p>
   *                      The detected string-encoding e.g. UTF-8 or UTF-16BE,<br />
   *                      otherwise it will return false.
   *                      </p>
   */
  public static function str_detect_encoding($str)
  {
    //
    // 1.) check binary strings (010001001...) like UTF-16 / UTF-32
    //

    if (self::is_binary($str) === true) {
      if (self::is_utf16($str) === 1) {
        return 'UTF-16LE';
      } elseif (self::is_utf16($str) === 2) {
        return 'UTF-16BE';
      } elseif (self::is_utf32($str) === 1) {
        return 'UTF-32LE';
      } elseif (self::is_utf32($str) === 2) {
        return 'UTF-32BE';
      }
    }

    //
    // 2.) simple check for ASCII chars
    //

    if (self::is_ascii($str) === true) {
      return 'ASCII';
    }

    //
    // 3.) simple check for UTF-8 chars
    //

    if (self::is_utf8($str) === true) {
      return 'UTF-8';
    }

    //
    // 4.) check via "\mb_detect_encoding()"
    //
    // INFO: UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always with "\mb_detect_encoding()"

    $detectOrder = array(
        'ISO-8859-1',
        'ISO-8859-2',
        'ISO-8859-3',
        'ISO-8859-4',
        'ISO-8859-5',
        'ISO-8859-6',
        'ISO-8859-7',
        'ISO-8859-8',
        'ISO-8859-9',
        'ISO-8859-10',
        'ISO-8859-13',
        'ISO-8859-14',
        'ISO-8859-15',
        'ISO-8859-16',
        'WINDOWS-1251',
        'WINDOWS-1252',
        'WINDOWS-1254',
        'ISO-2022-JP',
        'JIS',
        'EUC-JP',
    );

    $encoding = \mb_detect_encoding($str, $detectOrder, true);
    if ($encoding) {
      return $encoding;
    }

    //
    // 5.) check via "iconv()"
    //

    $md5 = md5($str);
    foreach (self::$ICONV_ENCODING as $encodingTmp) {
      # INFO: //IGNORE and //TRANSLIT still throw notice
      /** @noinspection PhpUsageOfSilenceOperatorInspection */
      if (md5(@\iconv($encodingTmp, $encodingTmp . '//IGNORE', $str)) === $md5) {
        return $encodingTmp;
      }
    }

    return false;
  }

  /**
   * Check if the string ends with the given substring.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_ends_with($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($needle === self::substr($haystack, -self::strlen($needle))) {
      return true;
    }

    return false;
  }

  /**
   * Check if the string ends with the given substring, case insensitive.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_iends_with($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (self::strcasecmp(self::substr($haystack, -self::strlen($needle)), $needle) === 0) {
Security Bug introduced by
It seems like self::substr($haystack, -self::strlen($needle)) targeting voku\helper\UTF8::substr() can also be of type false; however, voku\helper\UTF8::strcasecmp() does only seem to accept string, did you maybe forget to handle an error condition?
      return true;
    }

    return false;
  }

  /**
   * Case-insensitive and UTF-8 safe version of <function>str_replace</function>.
   *
   * @link  http://php.net/manual/en/function.str-ireplace.php
   *
   * @param mixed $search  <p>
   *                       Every replacement with search array is
   *                       performed on the result of previous replacement.
   *                       </p>
   * @param mixed $replace <p>
   *                       </p>
   * @param mixed $subject <p>
   *                       If subject is an array, then the search and
   *                       replace is performed with every entry of
   *                       subject, and the return value is an array as
   *                       well.
   *                       </p>
   * @param int   $count   [optional] <p>
   *                       The number of matched and replaced needles will
   *                       be returned in count which is passed by
   *                       reference.
   *                       </p>
   *
   * @return mixed <p>A string or an array of replacements.</p>
   */
  public static function str_ireplace($search, $replace, $subject, &$count = null)
  {
    $search = (array)$search;

    /** @noinspection AlterInForeachInspection */
    foreach ($search as &$s) {
      if ('' === $s .= '') {
        $s = '/^(?<=.)$/';
      } else {
        $s = '/' . preg_quote($s, '/') . '/ui';
      }
    }

    $subject = preg_replace($search, $replace, $subject, -1, $replace);
    $count = $replace; // used as reference parameter

    return $subject;
  }

  /**
   * Check if the string starts with the given substring, case insensitive.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_istarts_with($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (self::stripos($haystack, $needle) === 0) {
      return true;
    }

    return false;
  }

  /**
   * Limit the number of characters in a string, but also after the next word.
   *
   * @param string $str
   * @param int    $length
   * @param string $strAddOn
   *
   * @return string
   */
  public static function str_limit_after_word($str, $length = 100, $strAddOn = '...')
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $length = (int)$length;

    if (self::strlen($str) <= $length) {
      return $str;
    }

    if (self::substr($str, $length - 1, 1) === ' ') {
      return self::substr($str, 0, $length - 1) . $strAddOn;
    }

    $str = self::substr($str, 0, $length);
    $array = explode(' ', $str);
    array_pop($array);
    $new_str = implode(' ', $array);

    if ($new_str === '') {
      $str = self::substr($str, 0, $length - 1) . $strAddOn;
Security Bug introduced by
It seems like $str can also be of type false; however, voku\helper\UTF8::substr() does only seem to accept string, did you maybe forget to handle an error condition?
    } else {
      $str = $new_str . $strAddOn;
    }

    return $str;
  }

  /**
   * Pad a UTF-8 string to given length with another string.
   *
   * @param string $str        <p>The input string.</p>
   * @param int    $pad_length <p>The length of return string.</p>
   * @param string $pad_string [optional] <p>String to use for padding the input string.</p>
   * @param int    $pad_type   [optional] <p>
   *                           Can be <strong>STR_PAD_RIGHT</strong> (default),
   *                           <strong>STR_PAD_LEFT</strong> or <strong>STR_PAD_BOTH</strong>
   *                           </p>
   *
   * @return string <strong>Returns the padded string</strong>
   */
  public static function str_pad($str, $pad_length, $pad_string = ' ', $pad_type = STR_PAD_RIGHT)
  {
    $str_length = self::strlen($str);

    if (
        is_int($pad_length) === true
        &&
        $pad_length > 0
        &&
        $pad_length >= $str_length
    ) {
      $ps_length = self::strlen($pad_string);

      $diff = $pad_length - $str_length;

      switch ($pad_type) {
        case STR_PAD_LEFT:
Duplication introduced by
This code seems to be duplicated across your project.
          $pre = str_repeat($pad_string, (int)ceil($diff / $ps_length));
          $pre = self::substr($pre, 0, $diff);
          $post = '';
          break;

        case STR_PAD_BOTH:
          $pre = str_repeat($pad_string, (int)ceil($diff / $ps_length / 2));
          // cast after dividing: "(int)$diff / 2" yields a float for odd $diff
          $pre = self::substr($pre, 0, (int)($diff / 2));
          $post = str_repeat($pad_string, (int)ceil($diff / $ps_length / 2));
          $post = self::substr($post, 0, (int)ceil($diff / 2));
          break;

        case STR_PAD_RIGHT:
        default:
Duplication introduced by
This code seems to be duplicated across your project.
          $post = str_repeat($pad_string, (int)ceil($diff / $ps_length));
          $post = self::substr($post, 0, $diff);
          $pre = '';
      }

      return $pre . $str . $post;
    }

    return $str;
  }

  /**
   * Repeat a string.
   *
   * @param string $str        <p>
   *                           The string to be repeated.
   *                           </p>
   * @param int    $multiplier <p>
   *                           Number of times the input string should be
   *                           repeated.
   *                           </p>
   *                           <p>
   *                           multiplier has to be greater than or equal to 0.
   *                           If the multiplier is set to 0, the function
   *                           will return an empty string.
   *                           </p>
   *
   * @return string <p>The repeated string.</p>
   */
  public static function str_repeat($str, $multiplier)
  {
    $str = self::filter($str);

    return str_repeat($str, $multiplier);
  }

  /**
   * INFO: This is only a wrapper for "str_replace()"  -> the original function is already UTF-8 safe.
   *
   * Replace all occurrences of the search string with the replacement string
   *
   * @link http://php.net/manual/en/function.str-replace.php
   *
   * @param mixed $search  <p>
   *                       The value being searched for, otherwise known as the needle.
   *                       An array may be used to designate multiple needles.
   *                       </p>
   * @param mixed $replace <p>
   *                       The replacement value that replaces found search
   *                       values. An array may be used to designate multiple replacements.
   *                       </p>
   * @param mixed $subject <p>
   *                       The string or array being searched and replaced on,
   *                       otherwise known as the haystack.
   *                       </p>
   *                       <p>
   *                       If subject is an array, then the search and
   *                       replace is performed with every entry of
   *                       subject, and the return value is an array as
   *                       well.
   *                       </p>
   * @param int   $count   [optional] If passed, this will hold the number of matched and replaced needles.
   *
   * @return mixed <p>This function returns a string or an array with the replaced values.</p>
   */
  public static function str_replace($search, $replace, $subject, &$count = null)
  {
    return str_replace($search, $replace, $subject, $count);
  }

  /**
   * Replace the first "$search"-term with the "$replace"-term.
   *
   * @param string $search
   * @param string $replace
   * @param string $subject
   *
   * @return string
   */
  public static function str_replace_first($search, $replace, $subject)
  {
    $pos = self::strpos($subject, $search);

    if ($pos !== false) {
      return self::substr_replace($subject, $replace, $pos, self::strlen($search));
    }

    return $subject;
  }

  /**
   * Shuffles all the characters in the string.
   *
   * @param string $str <p>The input string</p>
   *
   * @return string <p>The shuffled string.</p>
   */
  public static function str_shuffle($str)
  {
    $array = self::split($str);

    shuffle($array);

    return implode('', $array);
  }
4593
  /**
4594
   * Sort all characters according to code points.
4595
   *
4596
   * @param string $str    <p>A UTF-8 string.</p>
4597
   * @param bool   $unique <p>Sort unique. If <strong>true</strong>, repeated characters are ignored.</p>
4598
   * @param bool   $desc   <p>If <strong>true</strong>, will sort characters in reverse code point order.</p>
4599
   *
4600
   * @return string <p>String of sorted characters.</p>
4601
   */
4602
  public static function str_sort($str, $unique = false, $desc = false)
4603
  {
4604
    $array = self::codepoints($str);
4605
4606
    if ($unique) {
4607
      $array = array_flip(array_flip($array));
4608
    }
4609
4610
    if ($desc) {
4611
      arsort($array);
4612
    } else {
4613
      asort($array);
4614
    }
4615
4616
    return self::string($array);
4617
  }
4618
4619
  /**
4620 1
   * Split a string into an array.
4621
   *
4622 1
   * @param string $str
4623 1
   * @param int    $len
4624 1
   *
4625
   * @return array
4626 1
   */
4627
  public static function str_split($str, $len = 1)
4628
  {
4629
    // init
4630
    $len = (int)$len;
4631
    $str = (string)$str;
4632
4633 1
    if (!isset($str[0])) {
4634
      return array();
4635
    }
4636
4637
    if ($len < 1) {
4638
      return str_split($str, $len);
4639
    }
4640
4641
    /** @noinspection PhpInternalEntityUsedInspection */
4642
    preg_match_all('/' . Grapheme::GRAPHEME_CLUSTER_RX . '/u', $str, $a);
4643 4
    $a = $a[0];
4644
4645 4
    if ($len === 1) {
4646
      return $a;
4647 4
    }
4648 2
4649
    $arrayOutput = array();
4650
    $p = -1;
4651 3
4652
    /** @noinspection PhpForeachArrayIsUsedAsValueInspection */
4653
    foreach ($a as $l => $a) {
4654
      if ($l % $len) {
4655
        $arrayOutput[$p] .= $a;
4656
      } else {
4657
        $arrayOutput[++$p] = $a;
4658
      }
4659
    }
4660
4661
    return $arrayOutput;
4662
  }
4663
4664
  /**
4665
   * Check if the string starts with the given substring.
4666
   *
4667
   * @param string $haystack <p>The string to search in.</p>
4668
   * @param string $needle   <p>The substring to search for.</p>
4669
   *
4670
   * @return bool
4671
   */
4672
  public static function str_starts_with($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
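
One candidate for such an extraction in this class is the encoding check that recurs in stripos(), strlen() and strpos(). The following is only a sketch under stated assumptions: `resolve_encoding()` and the `$normalizer` callable are hypothetical names that are not part of the reviewed class, and the callable merely stands in for `UTF8::normalize_encoding()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the reviewed class): collapses the
 * repeated "UTF-8 or legacy bool" encoding check into a single place.
 *
 * @param mixed    $encoding   Encoding name, or a legacy bool value.
 * @param callable $normalizer Stand-in for UTF8::normalize_encoding().
 */
function resolve_encoding($encoding, callable $normalizer): string
{
    // Older callers passed a bool here; treat that like the default.
    if ($encoding === 'UTF-8' || $encoding === true || $encoding === false) {
        return 'UTF-8';
    }

    return (string) $normalizer($encoding, 'UTF-8');
}

// Demo-only normalizer (an assumption): trims and upper-cases the name.
$normalizer = static function ($encoding, $fallback) {
    $encoding = strtoupper(trim((string) $encoding));

    return $encoding === '' ? $fallback : $encoding;
};

$resolvedLegacy = resolve_encoding(true, $normalizer);
$resolvedLatin1 = resolve_encoding(' iso-8859-1 ', $normalizer);
```

Each method could then start with one call to the helper instead of repeating the if/else block, which would also remove the duplication warnings for those spans.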

4673
  {
4674
    $haystack = (string)$haystack;
4675
    $needle = (string)$needle;
4676
4677 1
    if (!isset($haystack[0], $needle[0])) {
4678
      return false;
4679 1
    }
4680 1
4681 1
    if (self::strpos($haystack, $needle) === 0) {
4682
      return true;
4683 1
    }
4684
4685
    return false;
4686
  }
4687
4688
  /**
4689
   * Get a binary representation of a specific string.
4690 1
   *
4691
   * @param string $str <p>The input string.</p>
4692
   *
4693
   * @return string
4694
   */
4695
  public static function str_to_binary($str)
4696
  {
4697
    $str = (string)$str;
4698
4699
    $value = unpack('H*', $str);
4700
4701
    return base_convert($value[1], 16, 2);
4702
  }
4703
4704
  /**
4705
   * Convert a string into an array of words.
4706
   *
4707 1
   * @param string $str
4708
   * @param string $charlist
4709 1
   *
4710
   * @return array
4711
   */
4712
  public static function str_to_words($str, $charlist = '')
4713
  {
4714
    $str = (string)$str;
4715
4716
    if (!isset($str[0])) {
4717
      return array('');
4718
    }
4719
4720
    $charlist = self::rxClass($charlist, '\pL');
4721
4722
    return \preg_split("/({$charlist}+(?:[\p{Pd}’']{$charlist}+)*)/u", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
4723
  }
4724
4725
  /**
4726
   * alias for "UTF8::to_ascii()"
4727
   *
4728
   * @see UTF8::to_ascii()
4729 11
   *
4730
   * @param string $str
4731 11
   * @param string $unknown
4732
   * @param bool   $strict
4733 11
   *
4734 2
   * @return string
4735 2
   */
4736
  public static function str_transliterate($str, $unknown = '?', $strict = false)
4737 11
  {
4738
    return self::to_ascii($str, $unknown, $strict);
4739 11
  }
4740 2
4741
  /**
4742
   * Counts number of words in the UTF-8 string.
4743
   *
4744 10
   * @param string $str      <p>The input string.</p>
4745 10
   * @param int    $format   [optional] <p>
4746
   *                         <strong>0</strong> => return a number of words (default)<br />
4747
   *                         <strong>1</strong> => return an array of words<br />
4748
   *                         <strong>2</strong> => return an array of words with word-offset as key
4749 10
   *                         </p>
4750
   * @param string $charlist [optional] <p>Additional chars that belong to words and do not start a new word.</p>
4751 10
   *
4752
   * @return array|int <p>The number of words in the string</p>
4753
   */
4754 3
  public static function str_word_count($str, $format = 0, $charlist = '')
4755 3
  {
4756 3
    $strParts = self::str_to_words($str, $charlist);
4757
4758 10
    $len = count($strParts);
4759
4760
    if ($format === 1) {
4761
4762
      $numberOfWords = array();
4763
      for ($i = 1; $i < $len; $i += 2) {
4764 10
        $numberOfWords[] = $strParts[$i];
4765 1
      }
4766 10
4767 10
    } elseif ($format === 2) {
4768 10
4769 1
      $numberOfWords = array();
4770
      $offset = self::strlen($strParts[0]);
4771
      for ($i = 1; $i < $len; $i += 2) {
4772
        $numberOfWords[$offset] = $strParts[$i];
4773
        $offset += self::strlen($strParts[$i]) + self::strlen($strParts[$i + 1]);
4774 10
      }
4775 10
4776 10
    } else {
4777 10
4778
      $numberOfWords = ($len - 1) / 2;
4779
4780
    }
4781
4782
    return $numberOfWords;
4783
  }
4784
4785
  /**
4786
   * Case-insensitive string comparison.
4787
   *
4788
   * INFO: Case-insensitive version of UTF8::strcmp()
4789
   *
4790
   * @param string $str1
4791
   * @param string $str2
4792
   *
4793
   * @return int <p>
4794
   *             <strong>&lt; 0</strong> if str1 is less than str2;<br />
4795
   *             <strong>&gt; 0</strong> if str1 is greater than str2,<br />
4796
   *             <strong>0</strong> if they are equal.
4797
   *             </p>
4798
   */
4799
  public static function strcasecmp($str1, $str2)
4800
  {
4801
    return self::strcmp(self::strtocasefold($str1), self::strtocasefold($str2));
4802
  }
4803
4804
  /**
4805
   * alias for "UTF8::strstr()"
4806
   *
4807
   * @see UTF8::strstr()
4808
   *
4809
   * @param string  $haystack
4810
   * @param string  $needle
4811
   * @param bool    $before_needle
4812
   * @param string  $encoding
4813 10
   * @param boolean $cleanUtf8
4814
   *
4815
   * @return string|false
4816 10
   */
4817 10
  public static function strchr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
4818
  {
4819 10
    return self::strstr($haystack, $needle, $before_needle, $encoding, $cleanUtf8);
4820 2
  }
4821 2
4822
  /**
4823 10
   * Case-sensitive string comparison.
4824 10
   *
4825 2
   * @param string $str1
4826
   * @param string $str2
4827
   *
4828 8
   * @return int  <p>
4829
   *              <strong>&lt; 0</strong> if str1 is less than str2<br />
4830
   *              <strong>&gt; 0</strong> if str1 is greater than str2<br />
4831
   *              <strong>0</strong> if they are equal.
4832
   *              </p>
4833
   */
4834
  public static function strcmp($str1, $str2)
4835
  {
4836
    /** @noinspection PhpUndefinedClassInspection */
4837
    return $str1 . '' === $str2 . '' ? 0 : strcmp(
4838
        \Normalizer::normalize($str1, \Normalizer::NFD),
4839
        \Normalizer::normalize($str2, \Normalizer::NFD)
4840
    );
4841
  }
4842
4843
  /**
4844
   * Find length of initial segment not matching mask.
4845 2
   *
4846
   * @param string $str
4847 2
   * @param string $charList
4848
   * @param int    $offset
4849
   * @param int    $length
4850
   *
4851
   * @return int|null
4852
   */
4853
  public static function strcspn($str, $charList, $offset = 0, $length = 2147483647)
4854 2
  {
4855 1
    if ('' === $charList .= '') {
4856 1
      return null;
4857
    }
4858
4859
    if ($offset || 2147483647 !== $length) {
4860 2
      $str = (string)self::substr($str, $offset, $length);
4861 2
    }
4862 2
4863 2
    $str = (string)$str;
4864
    if (!isset($str[0])) {
4865
      return null;
4866
    }
4867
4868
    if (preg_match('/^(.*?)' . self::rxClass($charList) . '/us', $str, $length)) {
4869
      /** @noinspection OffsetOperationsInspection */
4870
      return self::strlen($length[1]);
4871
    }
4872
4873
    return self::strlen($str);
4874
  }
4875
4876
  /**
4877
   * alias for "UTF8::stristr()"
4878
   *
4879
   * @see UTF8::stristr()
4880
   *
4881
   * @param string  $haystack
4882 11
   * @param string  $needle
4883
   * @param bool    $before_needle
4884 11
   * @param string  $encoding
4885 11
   * @param boolean $cleanUtf8
4886 11
   *
4887
   * @return string|false
4888 11
   */
4889 1
  public static function strichr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
4890 1
  {
4891 1
    return self::stristr($haystack, $needle, $before_needle, $encoding, $cleanUtf8);
4892
  }
4893 11
4894
  /**
4895 11
   * Create a UTF-8 string from code points.
4896
   *
4897 11
   * INFO: opposite to UTF8::codepoints()
4898 1
   *
4899 1
   * @param array $array <p>Integer or Hexadecimal codepoints.</p>
4900
   *
4901
   * @return string <p>UTF-8 encoded string.</p>
4902 11
   */
4903 11
  public static function string(array $array)
4904
  {
4905 11
    return implode(
4906
        '',
4907 11
        array_map(
4908
            array(
4909
                '\\voku\\helper\\UTF8',
4910
                'chr',
4911
            ),
4912
            $array
4913
        )
4914
    );
4915
  }
4916
4917
  /**
4918
   * Checks if string starts with "BOM" (Byte Order Mark Character) character.
4919
   *
4920
   * @param string $str <p>The input string.</p>
4921 21
   *
4922
   * @return bool <p><strong>true</strong> if the string has BOM at the start, <strong>false</strong> otherwise.</p>
4923
   */
4924 21
  public static function string_has_bom($str)
4925
  {
4926 21
    foreach (self::$BOM as $bomString => $bomByteLength) {
4927 6
      if (0 === strpos($str, $bomString)) {
4928
        return true;
4929
      }
4930 19
    }
4931
4932
    return false;
4933
  }
4934
4935
  /**
4936 19
   * Strip HTML and PHP tags from a string + clean invalid UTF-8.
4937 2
   *
4938 2
   * @link http://php.net/manual/en/function.strip-tags.php
4939
   *
4940 19
   * @param string  $str            <p>
4941
   *                                The input string.
4942
   *                                </p>
4943
   * @param string  $allowable_tags [optional] <p>
4944
   *                                You can use the optional second parameter to specify tags which should
4945
   *                                not be stripped.
4946
   *                                </p>
4947
   *                                <p>
4948
   *                                HTML comments and PHP tags are also stripped. This is hardcoded and
4949
   *                                can not be changed with allowable_tags.
4950 3
   *                                </p>
4951
   * @param boolean $cleanUtf8      [optional] <p>Clean non UTF-8 chars from the string.</p>
4952 3
   *
4953
   * @return string <p>The stripped string.</p>
4954
   */
4955
  public static function strip_tags($str, $allowable_tags = null, $cleanUtf8 = false)
4956
  {
4957
    $str = (string)$str;
4958
4959
    if (!isset($str[0])) {
4960
      return '';
4961
    }
4962
4963
    if ($cleanUtf8) {
4964
      $str = self::clean($str);
4965
    }
4966 16
4967
    return strip_tags($str, $allowable_tags);
4968 16
  }
4969
4970 16
  /**
4971 2
   * Finds position of first occurrence of a string within another, case insensitive.
4972
   *
4973
   * @link http://php.net/manual/en/function.mb-stripos.php
4974 15
   *
4975
   * @param string  $haystack  <p>
4976
   *                           The string from which to get the position of the first occurrence
4977
   *                           of needle
4978
   *                           </p>
4979
   * @param string  $needle    <p>
4980 15
   *                           The string to find in haystack
4981 2
   *                           </p>
4982 2
   * @param int     $offset    [optional] <p>
4983
   *                           The position in haystack
4984 15
   *                           to start searching
4985
   *                           </p>
4986
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
4987
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
4988
   *
4989
   * @return int|false <p>
4990
   *                   Return the numeric position of the first occurrence of needle in the haystack string,<br />
4991
   *                   or false if needle is not found.
4992
   *                   </p>
4993
   */
4994
  public static function stripos($haystack, $needle, $offset = null, $encoding = 'UTF-8', $cleanUtf8 = false)
4995
  {
4996
    $haystack = (string)$haystack;
4997
    $needle = (string)$needle;
4998
    $offset = (int)$offset;
4999
5000
    if (!isset($haystack[0], $needle[0])) {
5001 1
      return false;
5002
    }
5003 1
5004 1
    if ($cleanUtf8 === true) {
5005 1
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5006 1
      // if invalid characters are found in $haystack before $needle
5007 1
      $haystack = self::clean($haystack);
5008
      $needle = self::clean($needle);
5009 1
    }
5010 1
5011 1
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5012 1
        $encoding === 'UTF-8'
5013 1
        ||
5014
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
5015 1
    ) {
5016 1
      $encoding = 'UTF-8';
5017
    } else {
5018 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5019
    }
5020
5021
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5022
      self::checkForSupport();
5023
    }
5024
5025
    if (
5026
        $encoding === 'UTF-8' // INFO: "grapheme_stripos()" can't handle other encodings
5027
        &&
5028
        self::$SUPPORT['intl'] === true
5029
        &&
5030 1
        Bootup::is_php('5.4') === true
5031
    ) {
5032 1
      return \grapheme_stripos($haystack, $needle, $offset);
5033 1
    }
5034 1
5035
    // fallback to "mb_"-function via polyfill
5036 1
    return \mb_stripos($haystack, $needle, $offset, $encoding);
5037
  }
5038
5039
  /**
5040 1
   * Returns all of haystack starting from and including the first occurrence of needle to the end.
5041 1
   *
5042
   * @param string  $haystack      <p>The input string. Must be valid UTF-8.</p>
5043 1
   * @param string  $needle        <p>The string to look for. Must be valid UTF-8.</p>
5044
   * @param bool    $before_needle [optional] <p>
5045
   *                               If <b>TRUE</b>, stristr() returns the part of the
5046
   *                               haystack before the first occurrence of the needle (excluding the needle).
5047
   *                               </p>
5048
   * @param string  $encoding      [optional] <p>Set the charset for e.g. "\mb_" function</p>
5049
   * @param boolean $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
5050
   *
5051
   * @return false|string A sub-string,<br />or <strong>false</strong> if needle is not found.
5052
   */
5053
  public static function stristr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
Duplication introduced by
This method seems to be duplicated in your project.
5054
  {
5055
    $haystack = (string)$haystack;
5056
    $needle = (string)$needle;
5057
    $before_needle = (bool)$before_needle;
5058
5059 47
    if (!isset($haystack[0], $needle[0])) {
5060
      return false;
5061
    }
5062 47
5063
    if ($encoding !== 'UTF-8') {
5064 47
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5065 9
    }
5066
5067
    if ($cleanUtf8 === true) {
5068 45
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5069
      // if invalid characters are found in $haystack before $needle
5070
      $needle = self::clean($needle);
5071
      $haystack = self::clean($haystack);
5072 1
    }
5073 1
5074
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5075 45
      self::checkForSupport();
5076 45
    }
5077 37
5078 37
    if (
5079
        $encoding !== 'UTF-8'
5080 45
        &&
5081 2
        self::$SUPPORT['mbstring'] === false
5082
    ) {
5083
      trigger_error('UTF8::stristr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
5084 43
    }
5085 20
5086 20
    if (self::$SUPPORT['mbstring'] === true) {
5087 41
      return \mb_stristr($haystack, $needle, $before_needle, $encoding);
5088
    }
5089
5090 43
    if (
5091
        $encoding === 'UTF-8' // INFO: "grapheme_stristr()" can't handle other encodings
5092
        &&
5093
        self::$SUPPORT['intl'] === true
5094
        &&
5095
        Bootup::is_php('5.4') === true
5096 43
    ) {
5097 2
      return \grapheme_stristr($haystack, $needle, $before_needle);
5098 43
    }
5099 43
5100 43
    preg_match('/^(.*?)' . preg_quote($needle, '/') . '/usi', $haystack, $match);
5101 1
5102
    if (!isset($match[1])) {
5103
      return false;
5104 43
    }
5105 43
5106
    if ($before_needle) {
5107
      return $match[1];
5108
    }
5109
5110
    return self::substr($haystack, self::strlen($match[1]));
Security Bug introduced by
It seems like $haystack defined by self::clean($haystack) on line 5071 can also be of type false; however, voku\helper\UTF8::substr() only seems to accept a string. Did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example

<?php

function createDate($date) // renamed: getDate() would collide with PHP's built-in getdate()
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.
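
Applied to string helpers, the check this annotation asks for might look like the sketch below. The names `utf8_prefix()` and `compare_prefix()` are hypothetical and not part of the reviewed class; `utf8_prefix()` only stands in for a substr()-style method that can return false, and throwing an exception is just one possible way to handle the error case.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a substr()-style helper that may fail.
 *
 * @return string|false The first $len characters, or false on failure.
 */
function utf8_prefix(string $str, int $len)
{
    if ($len < 0) {
        return false;
    }

    // With the "u" modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('/^.{0,' . $len . '}/us', $str, $match) !== 1) {
        return false;
    }

    return $match[0];
}

/**
 * Compare the first $len characters and handle the false case explicitly
 * instead of passing it on to strcmp().
 */
function compare_prefix(string $a, string $b, int $len): int
{
    $a = utf8_prefix($a, $len);
    $b = utf8_prefix($b, $len);

    if ($a === false || $b === false) {
        throw new InvalidArgumentException('Input could not be processed.');
    }

    return strcmp($a, $b);
}
```

Whether an exception, an early `return false` or a fallback to an empty string fits best depends on what the callers expect, so the choice above is an assumption rather than a recommendation for this library.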

5111
  }
5112
5113
  /**
5114
   * Get the string length, not the byte-length!
5115
   *
5116
   * @link     http://php.net/manual/en/function.mb-strlen.php
5117
   *
5118
   * @param string  $str       <p>The string being checked for length.</p>
5119
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
5120
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5121
   *
5122
   * @return int <p>The number of characters in the string $str having character encoding $encoding. (One multi-byte
5123
   *             character counted as +1)</p>
5124
   */
5125
  public static function strlen($str, $encoding = 'UTF-8', $cleanUtf8 = false)
5126
  {
5127
    $str = (string)$str;
5128
5129
    if (!isset($str[0])) {
5130
      return 0;
5131
    }
5132
5133
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5134
        $encoding === 'UTF-8'
5135 1
        ||
5136
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
5137 1
    ) {
5138 1
      $encoding = 'UTF-8';
5139
    } else {
5140 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5141
    }
5142
5143
    switch ($encoding) {
5144
      case 'ASCII':
5145
      case 'CP850':
5146
        if (
5147
            $encoding === 'CP850'
5148
            &&
5149
            self::$SUPPORT['mbstring_func_overload'] === false
5150
        ) {
5151
          return strlen($str);
5152
        } else {
5153
          return \mb_strlen($str, '8BIT');
5154
        }
5155
    }
5156
5157
    if ($cleanUtf8 === true) {
5158
      // "\mb_strlen" and "\iconv_strlen" return a wrong length,
5159
      // if invalid characters are found in $str
5160
      $str = self::clean($str);
5161 1
    }
5162
5163 1
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5164 1
      self::checkForSupport();
5165
    }
5166 1
5167 1
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5168
        $encoding !== 'UTF-8'
5169
        &&
5170 1
        self::$SUPPORT['mbstring'] === false
5171 1
        &&
5172 1
        self::$SUPPORT['iconv'] === false
5173
    ) {
5174 1
      trigger_error('UTF8::strlen() without mbstring / iconv cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
5175 1
    }
5176
5177
    if (
5178 1
        $encoding !== 'UTF-8'
5179 1
        &&
5180
        self::$SUPPORT['iconv'] === true
5181 1
        &&
5182 1
        self::$SUPPORT['mbstring'] === false
5183 1
    ) {
5184
      return \iconv_strlen($str, $encoding);
5185 1
    }
5186
5187
    if (self::$SUPPORT['mbstring'] === true) {
5188
      return \mb_strlen($str, $encoding);
5189
    }
5190
5191
    if (self::$SUPPORT['iconv'] === true) {
5192 1
      return \iconv_strlen($str, $encoding);
5193
    }
5194
5195
    if (
5196
        $encoding === 'UTF-8' // INFO: "grapheme_strlen()" can't handle other encodings
5197
        &&
5198
        self::$SUPPORT['intl'] === true
5199
        &&
5200
        Bootup::is_php('5.4') === true
5201
    ) {
5202
      return \grapheme_strlen($str);
5203
    }
5204
5205
    // fallback via vanilla php
5206
    preg_match_all('/./us', $str, $parts);
5207 6
    $returnTmp = count($parts[0]);
5208
    if ($returnTmp !== 0) {
5209 6
      return $returnTmp;
5210 1
    }
5211
5212
    // fallback to "mb_"-function via polyfill
5213 1
    return \mb_strlen($str, $encoding);
5214 1
  }
5215 1
5216 1
  /**
5217
   * Case insensitive string comparisons using a "natural order" algorithm.
5218
   *
5219
   * INFO: natural order version of UTF8::strcasecmp()
5220 1
   *
5221 1
   * @param string $str1 <p>The first string.</p>
5222 1
   * @param string $str2 <p>The second string.</p>
5223 1
   *
5224 1
   * @return int <strong>&lt; 0</strong> if str1 is less than str2<br />
5225 1
   *             <strong>&gt; 0</strong> if str1 is greater than str2<br />
5226 1
   *             <strong>0</strong> if they are equal
5227 1
   */
5228
  public static function strnatcasecmp($str1, $str2)
5229
  {
5230
    return self::strnatcmp(self::strtocasefold($str1), self::strtocasefold($str2));
5231 1
  }
5232 1
5233 1
  /**
5234 1
   * String comparisons using a "natural order" algorithm
5235 1
   *
5236 1
   * INFO: natural order version of UTF8::strcmp()
5237 1
   *
5238 1
   * @link  http://php.net/manual/en/function.strnatcmp.php
5239
   *
5240
   * @param string $str1 <p>The first string.</p>
5241 1
   * @param string $str2 <p>The second string.</p>
5242 1
   *
5243 1
   * @return int <strong>&lt; 0</strong> if str1 is less than str2;<br />
5244 1
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br />
5245
   *             <strong>0</strong> if they are equal
5246
   */
5247
  public static function strnatcmp($str1, $str2)
5248 1
  {
5249
    return $str1 . '' === $str2 . '' ? 0 : strnatcmp(self::strtonatfold($str1), self::strtonatfold($str2));
5250 6
  }
5251 1
5252 1
  /**
5253 1
   * Case-insensitive string comparison of the first n characters.
5254 1
   *
5255
   * @link  http://php.net/manual/en/function.strncasecmp.php
5256 1
   *
5257
   * @param string $str1 <p>The first string.</p>
5258
   * @param string $str2 <p>The second string.</p>
5259 6
   * @param int    $len  <p>The length of strings to be used in the comparison.</p>
5260 6
   *
5261
   * @return int <strong>&lt; 0</strong> if <i>str1</i> is less than <i>str2</i>;<br />
5262 6
   *             <strong>&gt; 0</strong> if <i>str1</i> is greater than <i>str2</i>;<br />
5263 4
   *             <strong>0</strong> if they are equal
5264 4
   */
5265
  public static function strncasecmp($str1, $str2, $len)
5266 6
  {
5267
    return self::strncmp(self::strtocasefold($str1), self::strtocasefold($str2), $len);
5268 6
  }
5269
5270
  /**
5271
   * String comparison of the first n characters.
5272
   *
5273
   * @link  http://php.net/manual/en/function.strncmp.php
5274
   *
5275
   * @param string $str1 <p>The first string.</p>
5276
   * @param string $str2 <p>The second string.</p>
5277
   * @param int    $len  <p>Number of characters to use in the comparison.</p>
5278
   *
5279
   * @return int <strong>&lt; 0</strong> if <i>str1</i> is less than <i>str2</i>;<br />
5280 1
   *             <strong>&gt; 0</strong> if <i>str1</i> is greater than <i>str2</i>;<br />
5281
   *             <strong>0</strong> if they are equal
5282 1
   */
5283
  public static function strncmp($str1, $str2, $len)
5284 1
  {
5285 1
    $str1 = self::substr($str1, 0, $len);
5286
    $str2 = self::substr($str2, 0, $len);
5287
5288 1
    return self::strcmp($str1, $str2);
Security Bug introduced by
It seems like $str1 defined by self::substr($str1, 0, $len) on line 5285 can also be of type false; however, voku\helper\UTF8::strcmp() only seems to accept a string. Did you maybe forget to handle an error condition?

Security Bug introduced by
It seems like $str2 defined by self::substr($str2, 0, $len) on line 5286 can also be of type false; however, voku\helper\UTF8::strcmp() only seems to accept a string. Did you maybe forget to handle an error condition?
5289 1
  }
5290 1
5291
  /**
5292 1
   * Search a string for any of a set of characters.
5293
   *
5294
   * @link  http://php.net/manual/en/function.strpbrk.php
5295 1
   *
5296 1
   * @param string $haystack  <p>The string where char_list is looked for.</p>
5297
   * @param string $char_list <p>This parameter is case sensitive.</p>
5298 1
   *
5299 1
   * @return string|false String starting from the character found, or false if it is not found.
5300
   */
5301 1
  public static function strpbrk($haystack, $char_list)
5302
  {
5303 1
    $haystack = (string)$haystack;
5304 1
    $char_list = (string)$char_list;
5305
5306 1
    if (!isset($haystack[0], $char_list[0])) {
5307
      return false;
5308 1
    }
5309
5310 1
    if (preg_match('/' . self::rxClass($char_list) . '/us', $haystack, $m)) {
5311
      return substr($haystack, strpos($haystack, $m[0]));
5312 1
    } else {
5313
      return false;
5314
    }
5315
  }
5316
5317
  /**
5318
   * Find position of first occurrence of string in a string.
5319
   *
5320
   * @link http://php.net/manual/en/function.mb-strpos.php
5321
   *
5322
   * @param string  $haystack  <p>The string being checked.</p>
5323
   * @param string  $needle    <p>The string to find in haystack.</p>
5324
   * @param int     $offset    [optional] <p>The search offset. If it is not specified, 0 is used.</p>
5325
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
5326 7
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5327
   *
5328 7
   * @return int|false <p>
5329
   *                   The numeric position of the first occurrence of needle in the haystack string.<br />
5330
   *                   If needle is not found it returns false.
5331
   *                   </p>
5332
   */
5333
  public static function strpos($haystack, $needle, $offset = 0, $encoding = 'UTF-8', $cleanUtf8 = false)
5334
  {
5335
    $haystack = (string)$haystack;
5336
    $needle = (string)$needle;
5337
5338
    if (!isset($haystack[0], $needle[0])) {
5339
      return false;
5340 1
    }
5341
5342 1
    // init
5343
    $offset = (int)$offset;
5344
5345
    // iconv and mbstring do not support integer $needle
5346
5347
    if ((int)$needle === $needle && $needle >= 0) {
Unused Code Bug introduced by
The strict comparison === seems to always evaluate to false as the types of (int) $needle (integer) and $needle (string) can never be identical. Maybe you want to use a loose comparison == instead?
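
The effect can be reproduced in isolation. The sketch below assumes the intent was to treat an integer needle as a code point, and it performs the type check before any string cast. `needle_to_string()` is a hypothetical name, and `html_entity_decode()` is used only as a stand-in for `UTF8::chr()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: turn an integer needle into its UTF-8 character,
 * checking the type before any string cast happens.
 *
 * @param int|string $needle
 */
function needle_to_string($needle): string
{
    if (is_int($needle) && $needle > 0) {
        // Stand-in for UTF8::chr(): decode a numeric character reference.
        return html_entity_decode('&#' . $needle . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }

    return (string) $needle;
}

// After a (string) cast, the strict check can never succeed,
// because an int is never identical to a string:
$casted = (string) 228;
$alwaysFalse = ((int) $casted === $casted);
```

A loose `==` comparison, as the annotation suggests, would make the branch reachable but would also convert numeric strings such as "228" into a character, so checking `is_int()` before the cast is probably the less surprising option.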
5348
      $needle = (string)self::chr($needle);
5349
    }
5350
5351
    if ($cleanUtf8 === true) {
5352
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5353
      // if invalid characters are found in $haystack before $needle
5354 1
      $needle = self::clean($needle);
5355
      $haystack = self::clean($haystack);
5356 1
    }
5357
5358
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5359
        $encoding === 'UTF-8'
5360
        ||
5361
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
5362
    ) {
5363
      $encoding = 'UTF-8';
5364
    } else {
5365
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5366
    }
5367
5368 1
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5369
      self::checkForSupport();
5370 1
    }
5371
5372
    if (
5373
        $encoding === 'CP850'
5374
        &&
5375
        self::$SUPPORT['mbstring_func_overload'] === false
5376
    ) {
5377
      return strpos($haystack, $needle, $offset);
5378
    }
5379
5380
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5381
        $encoding !== 'UTF-8'
Comprehensibility introduced by
Consider adding parentheses for clarity. Current Interpretation: ($encoding !== 'UTF-8') ...PPORT['iconv'] === true, Probably Intended Meaning: $encoding !== ('UTF-8' &...PORT['iconv'] === true)

When comparing the result of a bit operation, we suggest adding explicit parentheses rather than relying on PHP’s built-in operator precedence, both to ensure the code behaves as intended and to make it more readable.

Let’s take a look at these examples:

// Returns always int(0).
return 0 === $foo & 4;
return (0 === $foo) & 4;

// More likely intended return: true/false
return 0 === ($foo & 4);
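In PHP, comparison operators bind tighter than `&`, and `&` binds tighter than `&&`. The flagged condition therefore groups as `(($encoding !== 'UTF-8') & (... === true)) && (...)`. The truthiness matches a plain `&&` chain, but the bitwise form yields an integer and does not short-circuit. A small sketch of that difference, with a hypothetical `$probe` closure:

```php
<?php
$calls = 0;

// Hypothetical probe that records how often it is evaluated.
$probe = function () use (&$calls): bool {
    $calls++;

    return true;
};

$bitwise = (false & $probe());  // both operands are evaluated; the result is an int
$logical = (false && $probe()); // the right operand is skipped; the result is a bool
```

After both lines have run, `$probe` has been called exactly once, by the bitwise expression.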
5382
        &
5383
        self::$SUPPORT['iconv'] === true
5384
        &&
5385 13
        self::$SUPPORT['mbstring'] === false
5386
    ) {
5387 13
      trigger_error('UTF8::strpos() without mbstring / iconv cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
5388
    }
5389
5390 13
    if (
5391
        $offset >= 0 // iconv_strpos() can't handle negative offset
5392 13
        &&
5393 3
        $encoding !== 'UTF-8'
5394
        &&
5395
        self::$SUPPORT['mbstring'] === false
5396 11
        &&
5397
        self::$SUPPORT['iconv'] === true
5398
    ) {
5399 11
      // ignore invalid negative offset to keep compatibility
5400 7
      // with php < 5.5.35, < 5.6.21, < 7.0.6
Unused Code Comprehensibility introduced by
39% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.
5401
      return \iconv_strpos($haystack, $needle, $offset > 0 ? $offset : 0, $encoding);
5402
    }
5403 5
5404 1
    if (self::$SUPPORT['mbstring'] === true) {
5405
      return \mb_strpos($haystack, $needle, $offset, $encoding);
5406
    }
5407
5408 1
    if (
5409 1
        $encoding === 'UTF-8' // INFO: "grapheme_strpos()" can't handle other encodings
5410
        &&
5411
        self::$SUPPORT['intl'] === true
5412 1
        &&
5413 1
        Bootup::is_php('5.4') === true
5414
    ) {
5415
      return \grapheme_strpos($haystack, $needle, $offset);
5416 1
    }
5417
5418
    if (
5419 1
        $offset >= 0 // iconv_strpos() can't handle negative offset
5420
        &&
5421 5
        self::$SUPPORT['iconv'] === true
5422 5
    ) {
5423 5
      // ignore invalid negative offset to keep compatibility
5424
      // with php < 5.5.35, < 5.6.21, < 7.0.6
Unused Code Comprehensibility introduced by
39% of this comment could be valid code. Did you maybe forget this after debugging?
5425 5
      return \iconv_strpos($haystack, $needle, $offset > 0 ? $offset : 0, $encoding);
5426
    }
5427 5
5428 5
    // fallback via vanilla php
5429
5430
    $haystack = self::substr($haystack, $offset);
Security Bug introduced by
It seems like $haystack defined by self::substr($haystack, $offset) on line 5430 can also be of type false; however, voku\helper\UTF8::substr() does only seem to accept string, did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example:

<?php

function getDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.
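On the calling side, the check the analyzer asks for could look like the sketch below. `findOrNull` is a hypothetical wrapper around PHP's built-in `strpos()`, used here only because it has the same `int|false` return shape.

```php
<?php
// Hypothetical wrapper, for illustration only.
function findOrNull(string $haystack, string $needle): ?int
{
    $pos = \strpos($haystack, $needle);

    // Compare with === so that a match at position 0 is not mistaken for "not found".
    return $pos === false ? null : $pos;
}
```

The strict comparison matters because position `0` is a valid result and is falsy under a loose check.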

5431 5
5432
    if ($offset < 0) {
5433
      $offset = 0;
5434 5
    }
5435 5
5436 5
    $pos = strpos($haystack, $needle);
5437
    if ($pos === false) {
5438 5
      return false;
5439 2
    }
5440
5441 2
    $returnTmp = $offset + self::strlen(substr($haystack, 0, $pos));
5442 2
    if ($returnTmp !== false) {
5443 2
      return $returnTmp;
5444
    }
5445 2
5446 1
    // fallback to "mb_"-function via polyfill
5447
    return \mb_strpos($haystack, $needle, $offset, $encoding);
5448 1
  }
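The vanilla fallback in `strpos()` finds a byte offset with the built-in `strpos()` and turns it into a character offset by measuring the prefix. A minimal sketch of that idea follows. The function names are hypothetical, valid UTF-8 input is assumed, and the offset handling is left out.

```php
<?php
// Hypothetical helpers; they assume valid UTF-8 input.
function utf8Length(string $str): int
{
    // Each match of "." under the /us modifiers is one code point.
    return (int) \preg_match_all('/./us', $str);
}

function utf8PosSketch(string $haystack, string $needle)
{
    if ($haystack === '' || $needle === '') {
        return false;
    }

    $bytePos = \strpos($haystack, $needle);
    if ($bytePos === false) {
        return false;
    }

    // The number of characters before the match is the character position.
    return utf8Length(\substr($haystack, 0, $bytePos));
}
```

For a haystack that starts with two two-byte characters followed by `x`, the byte offset of `x` is 4 while the character position is 2.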
5449 1
5450 1
  /**
5451
   * Finds the last occurrence of a character in a string within another.
5452 1
   *
5453
   * @link http://php.net/manual/en/function.mb-strrchr.php
5454
   *
5455
   * @param string $haystack      <p>The string from which to get the last occurrence of needle.</p>
5456
   * @param string $needle        <p>The string to find in haystack</p>
5457
   * @param bool   $before_needle [optional] <p>
5458
   *                              Determines which portion of haystack
5459
   *                              this function returns.
5460
   *                              If set to true, it returns all of haystack
5461
   *                              from the beginning to the last occurrence of needle.
5462
   *                              If set to false, it returns all of haystack
5463
   *                              from the last occurrence of needle to the end,
5464
   *                              </p>
5465
   * @param string $encoding      [optional] <p>
5466
   *                              Character encoding name to use.
5467 1
   *                              If it is omitted, internal character encoding is used.
5468 2
   *                              </p>
5469
   * @param bool   $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
5470 5
   *
5471
   * @return string|false The portion of haystack or false if needle is not found.
5472
   */
5473
  public static function strrchr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
Duplication introduced by
This method seems to be duplicated in your project.
5474
  {
5475 5
    if ($encoding !== 'UTF-8') {
5476
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5477
    }
5478
5479
    if ($cleanUtf8 === true) {
5480 5
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5481 5
      // if invalid characters are found in $haystack before $needle
5482 1
      $needle = self::clean($needle);
5483 1
      $haystack = self::clean($haystack);
5484
    }
5485 1
5486 1
    // fallback to "mb_"-function via polyfill
5487 1
    return \mb_strrchr($haystack, $needle, $before_needle, $encoding);
5488
  }
5489 1
5490
  /**
5491 5
   * Reverses characters order in the string.
5492 5
   *
5493 5
   * @param string $str The input string
5494 5
   *
5495 1
   * @return string The string with characters in the reverse sequence
5496
   */
5497 5
  public static function strrev($str)
5498
  {
5499 5
    $str = (string)$str;
5500
5501
    if (!isset($str[0])) {
5502
      return '';
5503
    }
5504
5505
    return implode('', array_reverse(self::split($str)));
5506
  }
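`strrev()` splits the string into characters, reverses the array, and joins it again. The sketch below shows the same idea with `preg_split()` in place of the class's own `split()`; the function name is hypothetical.

```php
<?php
// Hypothetical stand-alone version; preg_split() replaces the class's split().
function utf8ReverseSketch(string $str): string
{
    // The empty pattern with /u splits between code points.
    $chars = \preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }

    return \implode('', \array_reverse($chars));
}
```

This reverses code points, not grapheme clusters, so a combining mark would end up detached from its base character.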
5507
5508
  /**
5509 2
   * Finds the last occurrence of a character in a string within another, case insensitive.
5510
   *
5511 2
   * @link http://php.net/manual/en/function.mb-strrichr.php
5512
   *
5513 1
   * @param string  $haystack      <p>The string from which to get the last occurrence of needle.</p>
5514
   * @param string  $needle        <p>The string to find in haystack.</p>
5515
   * @param bool    $before_needle [optional] <p>
5516 1
   *                               Determines which portion of haystack
5517 1
   *                               this function returns.
5518
   *                               If set to true, it returns all of haystack
5519 1
   *                               from the beginning to the last occurrence of needle.
5520
   *                               If set to false, it returns all of haystack
5521
   *                               from the last occurrence of needle to the end,
5522 2
   *                               </p>
5523
   * @param string  $encoding      [optional] <p>
5524 2
   *                               Character encoding name to use.
5525 1
   *                               If it is omitted, internal character encoding is used.
5526
   *                               </p>
5527
   * @param boolean $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
5528 2
   *
5529
   * @return string|false <p>The portion of haystack or<br />false if needle is not found.</p>
5530
   */
5531
  public static function strrichr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
Duplication introduced by
This method seems to be duplicated in your project.
5532
  {
5533
    if ($encoding !== 'UTF-8') {
5534
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5535
    }
5536
5537
    if ($cleanUtf8 === true) {
5538
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5539
      // if invalid characters are found in $haystack before $needle
5540 1
      $needle = self::clean($needle);
5541
      $haystack = self::clean($haystack);
5542 1
    }
5543
5544
    return \mb_strrichr($haystack, $needle, $before_needle, $encoding);
5545
  }
5546
5547
  /**
5548
   * Find position of last occurrence of a case-insensitive string.
5549
   *
5550
   * @param string  $haystack  <p>The string to look in.</p>
5551
   * @param string  $needle    <p>The string to look for.</p>
5552
   * @param int     $offset    [optional] <p>Number of characters to ignore in the beginning or end.</p>
5553
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
5554
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5555
   *
5556
   * @return int|false <p>
5557
   *                   The numeric position of the last occurrence of needle in the haystack string.<br />If needle is
5558
   *                   not found, it returns false.
5559
   *                   </p>
5560
   */
5561
  public static function strripos($haystack, $needle, $offset = 0, $encoding = 'UTF-8', $cleanUtf8 = false)
5562
  {
5563
    if ((int)$needle === $needle && $needle >= 0) {
Unused Code Bug introduced by
The strict comparison === seems to always evaluate to false as the types of (int) $needle (integer) and $needle (string) can never be identical. Maybe you want to use a loose comparison == instead?
5564
      $needle = (string)self::chr($needle);
5565
    }
5566
5567
    // init
5568 20
    $haystack = (string)$haystack;
5569
    $needle = (string)$needle;
5570 20
    $offset = (int)$offset;
5571 2
5572
    if (!isset($haystack[0], $needle[0])) {
5573
      return false;
5574 2
    }
5575 2
5576
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5577 2
        $cleanUtf8 === true
5578
        ||
5579
        $encoding === true // INFO: the "bool"-check is only a fallback for old versions
5580 20
    ) {
5581
      // \mb_strripos && iconv_strripos is not tolerant to invalid characters
5582 20
5583 4
      $needle = self::clean($needle);
5584
      $haystack = self::clean($haystack);
5585
    }
5586 19
5587 19
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5588
        $encoding === 'UTF-8'
5589
        ||
5590 19
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
5591 19
    ) {
5592
      $encoding = 'UTF-8';
5593 19
    } else {
5594 19
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5595 19
    }
5596 19
5597
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5598 19
      self::checkForSupport();
5599
    }
5600 16
5601 16
    if (
5602 16
        $encoding !== 'UTF-8'
5603 16
        &&
5604 5
        self::$SUPPORT['mbstring'] === false
5605 5
    ) {
5606 5
      trigger_error('UTF8::strripos() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
5607
    }
5608
5609 19
    if (self::$SUPPORT['mbstring'] === true) {
5610
      return \mb_strripos($haystack, $needle, $offset, $encoding);
5611 17
    }
5612 13
5613 13
    if (
5614 13
        $encoding === 'UTF-8' // INFO: "grapheme_strripos()" can't handle other encodings
5615 8
        &&
5616 8
        self::$SUPPORT['intl'] === true
5617 8
        &&
5618
        Bootup::is_php('5.4') === true
5619
    ) {
5620 19
      return \grapheme_strripos($haystack, $needle, $offset);
5621
    }
5622 9
5623 4
    // fallback via vanilla php
5624 4
5625 4
    return self::strrpos(self::strtoupper($haystack), self::strtoupper($needle), $offset, $encoding, $cleanUtf8);
Security Bug introduced by
It seems like $haystack defined by self::clean($haystack) on line 5584 can also be of type false; however, voku\helper\UTF8::strtoupper() does only seem to accept string, did you maybe forget to handle an error condition?
Security Bug introduced by
It seems like $needle defined by self::clean($needle) on line 5583 can also be of type false; however, voku\helper\UTF8::strtoupper() does only seem to accept string, did you maybe forget to handle an error condition?
5626 6
  }
5627 6
5628 6
  /**
5629
   * Find position of last occurrence of a string in a string.
5630
   *
5631 9
   * @link http://php.net/manual/en/function.mb-strrpos.php
5632 6
   *
5633 6
   * @param string     $haystack  <p>The string being checked, for the last occurrence of needle</p>
5634 6
   * @param string|int $needle    <p>The string to find in haystack.<br />Or a code point as int.</p>
5635
   * @param int        $offset    [optional] <p>May be specified to begin searching an arbitrary number of characters
5636
   *                              into the string. Negative values will stop searching at an arbitrary point prior to
5637 19
   *                              the end of the string.
5638
   *                              </p>
5639 4
   * @param string     $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
5640 4
   * @param boolean    $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5641 2
   *
5642 2
   * @return int|false <p>The numeric position of the last occurrence of needle in the haystack string.<br />If needle
5643 3
   *                   is not found, it returns false.</p>
5644 3
   */
5645 3
  public static function strrpos($haystack, $needle, $offset = null, $encoding = 'UTF-8', $cleanUtf8 = false)
5646
  {
5647
    if ((int)$needle === $needle && $needle >= 0) {
5648 4
      $needle = (string)self::chr($needle);
5649 16
    }
5650
5651 19
    // init
5652
    $haystack = (string)$haystack;
5653
    $needle = (string)$needle;
5654 19
    $offset = (int)$offset;
5655 19
5656
    if (!isset($haystack[0], $needle[0])) {
5657 3
      return false;
5658 19
    }
5659
5660 19
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5661
        $cleanUtf8 === true
5662
        ||
5663 19
        $encoding === true // INFO: the "bool"-check is only a fallback for old versions
5664 19
    ) {
5665 19
      // \mb_strrpos && iconv_strrpos is not tolerant to invalid characters
5666 2
      $needle = self::clean($needle);
5667 19
      $haystack = self::clean($haystack);
5668
    }
5669 19
5670
    if (
Duplication introduced by
This code seems to be duplicated across your project.
5671 19
        $encoding === 'UTF-8'
5672
        ||
5673
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
5674
    ) {
5675
      $encoding = 'UTF-8';
5676
    } else {
5677
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5678
    }
5679
5680
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5681
      self::checkForSupport();
5682
    }
5683
5684
    if (
5685
        $encoding !== 'UTF-8'
5686
        &&
5687 26
        self::$SUPPORT['mbstring'] === false
5688
    ) {
5689 26
      trigger_error('UTF8::strrpos() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
5690
    }
5691 26
5692 5
    if (self::$SUPPORT['mbstring'] === true) {
5693
      return \mb_strrpos($haystack, $needle, $offset, $encoding);
5694
    }
5695
5696 22
    if (
5697 6
        $encoding === 'UTF-8' // INFO: "grapheme_strrpos()" can't handle other encodings
5698
        &&
5699
        self::$SUPPORT['intl'] === true
5700 16
        &&
5701
        Bootup::is_php('5.4') === true
5702
    ) {
5703
      return \grapheme_strrpos($haystack, $needle, $offset);
5704
    }
5705
5706
    // fallback via vanilla php
5707
5708
    if ($offset > 0) {
5709
      $haystack = self::substr($haystack, $offset);
Security Bug introduced by
It seems like $haystack defined by self::substr($haystack, $offset) on line 5709 can also be of type false; however, voku\helper\UTF8::substr() does only seem to accept string, did you maybe forget to handle an error condition?
5710
    } elseif ($offset < 0) {
5711
      $haystack = self::substr($haystack, 0, $offset);
Security Bug introduced by
It seems like $haystack defined by self::substr($haystack, 0, $offset) on line 5711 can also be of type false; however, voku\helper\UTF8::substr() does only seem to accept string, did you maybe forget to handle an error condition?
5712 14
      $offset = 0;
5713
    }
5714 14
5715
    $pos = strrpos($haystack, $needle);
5716
    if ($pos === false) {
5717
      return false;
5718
    }
5719
5720
    return $offset + self::strlen(substr($haystack, 0, $pos));
5721
  }
5722
5723
  /**
5724
   * Finds the length of the initial segment of a string consisting entirely of characters contained within a given
5725
   * mask.
5726
   *
5727
   * @param string $str    <p>The input string.</p>
5728 1
   * @param string $mask   <p>The mask of chars</p>
5729
   * @param int    $offset [optional]
5730 1
   * @param int    $length [optional]
5731
   *
5732
   * @return int
5733
   */
5734
  public static function strspn($str, $mask, $offset = 0, $length = 2147483647)
5735
  {
5736
    // init
5737
    $length = (int)$length;
5738
    $offset = (int)$offset;
5739
5740
    if ($offset || 2147483647 !== $length) {
5741
      $str = self::substr($str, $offset, $length);
5742
    }
5743
5744 8
    $str = (string)$str;
5745
    if (!isset($str[0], $mask[0])) {
5746 8
      return 0;
5747 2
    }
5748
5749
    return preg_match('/^' . self::rxClass($mask) . '+/u', $str, $str) ? self::strlen($str[0]) : 0;
5750 7
  }
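`strspn()` anchors a character class built from the mask at the start of the string and measures the match. The sketch below illustrates the technique with `preg_quote()` standing in for the private `rxClass()` helper, whose exact behaviour is not shown here; the function name is hypothetical and the offset and length parameters are omitted.

```php
<?php
// Hypothetical stand-alone version; preg_quote() stands in for rxClass().
function utf8SpnSketch(string $str, string $mask): int
{
    if ($str === '' || $mask === '') {
        return 0;
    }

    // preg_quote() escapes the characters that are special inside a class.
    $pattern = '/^[' . \preg_quote($mask, '/') . ']+/u';
    if (\preg_match($pattern, $str, $match) !== 1) {
        return 0;
    }

    // Count code points, not bytes, in the matched prefix.
    return (int) \preg_match_all('/./us', $match[0]);
}
```

With the `/u` modifier the class matches whole code points, so multi-byte mask characters count as one character each.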
5751 7
5752 7
  /**
5753
   * Returns part of haystack string from the first occurrence of needle to the end of haystack.
5754 7
   *
5755 1
   * @param string  $haystack      <p>The input string. Must be valid UTF-8.</p>
5756 1
   * @param string  $needle        <p>The string to look for. Must be valid UTF-8.</p>
5757 7
   * @param bool    $before_needle [optional] <p>
5758
   *                               If <b>TRUE</b>, strstr() returns the part of the
5759
   *                               haystack before the first occurrence of the needle (excluding the needle).
5760 7
   *                               </p>
5761
   * @param string  $encoding      [optional] <p>Set the charset for e.g. "\mb_" function.</p>
5762 7
   * @param boolean $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
5763 7
   *
5764
   * @return string|false A sub-string,<br />or <strong>false</strong> if needle is not found.
5765
   */
5766
  public static function strstr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
Duplication introduced by
This method seems to be duplicated in your project.
5767 7
  {
5768
    $haystack = (string)$haystack;
5769
    $needle = (string)$needle;
5770
5771 1
    if (!isset($haystack[0], $needle[0])) {
5772 1
      return false;
5773 1
    }
5774 7
5775 7
    if ($cleanUtf8 === true) {
5776 7
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5777
      // if invalid characters are found in $haystack before $needle
5778 7
      $needle = self::clean($needle);
5779 7
      $haystack = self::clean($haystack);
5780
    }
5781 7
5782
    if ($encoding !== 'UTF-8') {
5783
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5784
    }
5785
5786
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
5787
      self::checkForSupport();
5788
    }
5789
5790
    if (
5791
        $encoding !== 'UTF-8'
5792
        &&
5793
        self::$SUPPORT['mbstring'] === false
5794
    ) {
5795
      trigger_error('UTF8::strstr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
5796
    }
5797
5798
    if (self::$SUPPORT['mbstring'] === true) {
5799
      return \mb_strstr($haystack, $needle, $before_needle, $encoding);
5800
    }
5801 1
5802
    if (
5803 1
        $encoding === 'UTF-8' // INFO: "grapheme_strstr()" can't handle other encodings
5804
        &&
5805 1
        self::$SUPPORT['intl'] === true
5806 1
        &&
5807
        Bootup::is_php('5.4') === true
5808
    ) {
5809 1
      return \grapheme_strstr($haystack, $needle, $before_needle);
5810
    }
5811 1
5812
    preg_match('/^(.*?)' . preg_quote($needle, '/') . '/us', $haystack, $match);
5813 1
5814 1
    if (!isset($match[1])) {
5815 1
      return false;
5816 1
    }
5817
5818 1
    if ($before_needle) {
5819 1
      return $match[1];
5820 1
    }
5821
5822 1
    return self::substr($haystack, self::strlen($match[1]));
Security Bug introduced by
It seems like $haystack defined by self::clean($haystack) on line 5779 can also be of type false; however, voku\helper\UTF8::substr() does only seem to accept string, did you maybe forget to handle an error condition?
5823
  }
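When neither mbstring nor intl is available, `strstr()` falls back to a lazy regular expression that captures everything before the first occurrence of the needle. A minimal sketch of that fallback follows. The function name is hypothetical, and the final cut uses byte-wise `substr()` where the class uses its own character-wise `substr()`.

```php
<?php
// Hypothetical stand-alone version of the regex fallback; assumes valid UTF-8.
function utf8StrstrSketch(string $haystack, string $needle, bool $beforeNeedle = false)
{
    if ($haystack === '' || $needle === '') {
        return false;
    }

    // (.*?) lazily captures the shortest prefix that is followed by the needle.
    $pattern = '/^(.*?)' . \preg_quote($needle, '/') . '/us';
    if (\preg_match($pattern, $haystack, $match) !== 1) {
        return false;
    }

    if ($beforeNeedle) {
        return $match[1];
    }

    // $match[1] ends on a character boundary, so a byte-wise cut is safe here.
    return \substr($haystack, \strlen($match[1]));
}
```

The `s` modifier lets `.` match newlines, so a needle on a later line is still found.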
5824
5825
  /**
5826
   * Unicode transformation for case-less matching.
5827
   *
5828
   * @link http://unicode.org/reports/tr21/tr21-5.html
5829
   *
5830 1
   * @param string  $str       <p>The input string.</p>
5831
   * @param bool    $full      [optional] <p>
5832
   *                           <b>true</b>, replace full case folding chars (default)<br />
5833
   *                           <b>false</b>, use only limited static array [UTF8::$commonCaseFold]
5834
   *                           </p>
5835
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5836
   *
5837
   * @return string
5838
   */
5839
  public static function strtocasefold($str, $full = true, $cleanUtf8 = false)
5840
  {
5841
    // init
5842
    $str = (string)$str;
5843
5844
    if (!isset($str[0])) {
5845
      return '';
5846
    }
5847
5848
    static $COMMON_CASE_FOLD_KEYS_CACHE = null;
5849
    static $COMMON_CASE_FOLD_VALUES_CACHE = null;
5850
5851
    if ($COMMON_CASE_FOLD_KEYS_CACHE === null) {
5852
      $COMMON_CASE_FOLD_KEYS_CACHE = array_keys(self::$COMMON_CASE_FOLD);
5853
      $COMMON_CASE_FOLD_VALUES_CACHE = array_values(self::$COMMON_CASE_FOLD);
5854
    }
5855
5856
    $str = str_replace($COMMON_CASE_FOLD_KEYS_CACHE, $COMMON_CASE_FOLD_VALUES_CACHE, $str);
5857
5858
    if ($full) {
5859
5860
      static $FULL_CASE_FOLD = null;
5861
5862
      if ($FULL_CASE_FOLD === null) {
5863
        $FULL_CASE_FOLD = self::getData('caseFolding_full');
5864
      }
5865
5866
      /** @noinspection OffsetOperationsInspection */
5867
      $str = str_replace($FULL_CASE_FOLD[0], $FULL_CASE_FOLD[1], $str);
5868
    }
5869
5870
    if ($cleanUtf8 === true) {
5871
      $str = self::clean($str);
5872
    }
5873
5874
    return self::strtolower($str);
Security Bug introduced by
It seems like $str defined by self::clean($str) on line 5871 can also be of type false; however, voku\helper\UTF8::strtolower() does only seem to accept string, did you maybe forget to handle an error condition?
5875
  }
5876
5877
  /**
5878
   * Make a string lowercase.
5879
   *
5880
   * @link http://php.net/manual/en/function.mb-strtolower.php
5881
   *
5882
   * @param string  $str       <p>The string being lowercased.</p>
5883
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function</p>
5884
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5885
   *
5886
   * @return string str with all alphabetic characters converted to lowercase.
5887
   */
5888
  public static function strtolower($str, $encoding = 'UTF-8', $cleanUtf8 = false)
Duplication
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.
5889
  {
5890
    // init
5891
    $str = (string)$str;
5892
5893
    if (!isset($str[0])) {
5894
      return '';
5895
    }
5896
5897
    if ($cleanUtf8 === true) {
5898
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5899
      // if invalid characters are found in $haystack before $needle
5900
      $str = self::clean($str);
5901
    }
5902
5903
    if ($encoding !== 'UTF-8') {
5904
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5905
    }
5906
5907
    return \mb_strtolower($str, $encoding);
5908
  }
5909
5910
  /**
5911
   * Generic case sensitive transformation for collation matching.
5912
   *
5913
   * @param string $str <p>The input string</p>
5914
   *
5915
   * @return string
5916
   */
5917
  private static function strtonatfold($str)
5918
  {
5919
    /** @noinspection PhpUndefinedClassInspection */
5920
    return preg_replace('/\p{Mn}+/u', '', \Normalizer::normalize($str, \Normalizer::NFD));
5921
  }
5922
5923
  /**
5924
   * Make a string uppercase.
5925
   *
5926
   * @link http://php.net/manual/en/function.mb-strtoupper.php
5927
   *
5928
   * @param string  $str       <p>The string being uppercased.</p>
5929
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
5930
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5931
   *
5932
   * @return string str with all alphabetic characters converted to uppercase.
5933
   */
5934
  public static function strtoupper($str, $encoding = 'UTF-8', $cleanUtf8 = false)
Duplication
This method seems to be duplicated in your project.
5935
  {
5936
    $str = (string)$str;
5937
5938
    if (!isset($str[0])) {
5939
      return '';
5940
    }
5941
5942
    if ($cleanUtf8 === true) {
5943
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
5944
      // if invalid characters are found in $haystack before $needle
5945
      $str = self::clean($str);
5946
    }
5947
5948
    if ($encoding !== 'UTF-8') {
5949
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
5950
    }
5951
5952
    return \mb_strtoupper($str, $encoding);
5953
  }
5954
5955
  /**
5956
   * Translate characters or replace sub-strings.
5957
   *
5958
   * @link  http://php.net/manual/en/function.strtr.php
5959
   *
5960
   * @param string          $str  <p>The string being translated.</p>
5961
   * @param string|string[] $from <p>The string replacing from.</p>
5962
   * @param string|string[] $to   <p>The string being translated to.</p>
5963
   *
5964
   * @return string <p>
5965
   *                This function returns a copy of str, translating all occurrences of each character in from to the
5966
   *                corresponding character in to.
5967
   *                </p>
5968
   */
5969
  public static function strtr($str, $from, $to = INF)
5970
  {
5971
    if (INF !== $to) {
5972
      $from = self::str_split($from);
Bug
It seems like $from defined by self::str_split($from) on line 5972 can also be of type array<integer,string>; however, voku\helper\UTF8::str_split() does only seem to accept string, maybe add an additional type check?

If a method or function can return multiple different values, and unless you are sure that you can only receive a single value in this context, we recommend adding an additional type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

If this is a common case that PHP Analyzer should handle natively, please let us know by opening an issue.
5973
      $to = self::str_split($to);
Bug
It seems like $to defined by self::str_split($to) on line 5973 can also be of type array<integer,string>; however, voku\helper\UTF8::str_split() does only seem to accept string, maybe add an additional type check?
5974
      $countFrom = count($from);
5975
      $countTo = count($to);
5976
5977
      if ($countFrom > $countTo) {
5978
        $from = array_slice($from, 0, $countTo);
5979
      } elseif ($countFrom < $countTo) {
5980
        $to = array_slice($to, 0, $countFrom);
5981
      }
5982
5983
      $from = array_combine($from, $to);
5984
    }
5985
5986
    return strtr($str, $from);
Bug
It seems like $from defined by parameter $from on line 5969 can also be of type string; however, strtr() does only seem to accept array, maybe add an additional type check?

This check looks at variables that have been passed in as parameters and are passed out again to other methods.

If the outgoing method call has stricter type requirements than the method itself, an issue is raised.

An additional type check may prevent trouble.
5987
  }
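
The character-pairing step of strtr() can be looked at in isolation. The following is only a minimal sketch under stated assumptions, not the library's implementation: the function name `utf8_strtr_sketch` is hypothetical, and `preg_split('//u', ...)` stands in for `self::str_split()`.

```php
<?php

// Hypothetical stand-alone sketch; not part of voku\helper\UTF8.
function utf8_strtr_sketch(string $str, string $from, string $to): string
{
    // Split both arguments into UTF-8 characters (stand-in for self::str_split()).
    $fromChars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $toChars = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Truncate the longer list so that every "from" character has a partner.
    $pairs = min(count($fromChars), count($toChars));
    if ($pairs === 0) {
        return $str;
    }

    // Build a character map and let the native strtr() apply it.
    $map = array_combine(
        array_slice($fromChars, 0, $pairs),
        array_slice($toChars, 0, $pairs)
    );

    return strtr($str, $map);
}

echo utf8_strtr_sketch('Hällo Wörld', 'äö', 'ao'), "\n"; // Hallo World
```

Passing a map to the native strtr() works on multi-byte characters because the array form replaces whole byte sequences rather than single bytes.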
5988
5989
  /**
5990
   * Return the width of a string.
5991
   *
5992
   * @param string  $str       <p>The input string.</p>
5993
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
5994
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
5995
   *
5996
   * @return int
5997
   */
5998
  public static function strwidth($str, $encoding = 'UTF-8', $cleanUtf8 = false)
5999
  {
6000
    if ($encoding !== 'UTF-8') {
6001
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
6002
    }
6003
6004
    if ($cleanUtf8 === true) {
6005
      // iconv and mbstring are not tolerant to invalid encoding
6006
      // further, their behaviour is inconsistent with that of PHP's substr
6007
      $str = self::clean($str);
6008
    }
6009
6010
    // fallback to "mb_"-function via polyfill
6011
    return \mb_strwidth($str, $encoding);
6012
  }
6013
6014
  /**
6015
   * Get part of a string.
6016
   *
6017
   * @link http://php.net/manual/en/function.mb-substr.php
6018
   *
6019
   * @param string  $str       <p>The string being checked.</p>
6020
   * @param int     $start     <p>The first position used in str.</p>
6021
   * @param int     $length    [optional] <p>The maximum length of the returned string.</p>
6022
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
6023
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
6024
   *
6025
   * @return string|false <p>Returns a sub-string specified by the start and length parameters, or false if start is greater than the string length.</p>
6026
   */
6027
  public static function substr($str, $start = 0, $length = null, $encoding = 'UTF-8', $cleanUtf8 = false)
6028
  {
6029
    // init
6030
    $str = (string)$str;
6031
6032
    if (!isset($str[0])) {
6033
      return '';
6034
    }
6035
6036
    if ($cleanUtf8 === true) {
6037
      // iconv and mbstring are not tolerant to invalid encoding
6038
      // further, their behaviour is inconsistent with that of PHP's substr
6039
      $str = self::clean($str);
6040
    }
6041
6042
    $str_length = 0;
6043
    if ($start || $length === null) {
6044
      $str_length = (int)self::strlen($str, $encoding);
Security Bug
It seems like $str defined by self::clean($str) on line 6039 can also be of type false; however, voku\helper\UTF8::strlen() does only seem to accept string, did you maybe forget to handle an error condition?
6045
    }
6046
6047
    if ($start && $start > $str_length) {
6048
      return false;
6049
    }
6050
6051
    if ($length === null) {
6052
      $length = $str_length;
6053
    } else {
6054
      $length = (int)$length;
6055
    }
6056
6057
    if (
Duplication
This code seems to be duplicated across your project.
6058
        $encoding === 'UTF-8'
6059 1
        ||
6060
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
6061
    ) {
6062
      $encoding = 'UTF-8';
6063
    } else {
6064
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
6065
    }
6066
6067
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
6068
      self::checkForSupport();
6069 6
    }
6070
6071 6
    if (
6072 6
        $encoding === 'CP850'
6073
        &&
6074 6
        self::$SUPPORT['mbstring_func_overload'] === false
6075
    ) {
6076 6
      return substr($str, $start, $length === null ? $str_length : $length);
6077 3
    }
6078
6079
    if (
6080
        $encoding !== 'UTF-8'
6081 6
        &&
6082
        self::$SUPPORT['mbstring'] === false
6083 6
    ) {
6084 1
      trigger_error('UTF8::substr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
6085 1
    }
6086 1
6087
    if (self::$SUPPORT['mbstring'] === true) {
6088 6
      return \mb_substr($str, $start, $length, $encoding);
6089
    }
6090
6091
    if (
6092
        $encoding === 'UTF-8' // INFO: "grapheme_substr()" can't handle other encodings
6093
        &&
6094
        self::$SUPPORT['intl'] === true
6095
        &&
6096
        Bootup::is_php('5.4') === true
6097
    ) {
6098 6
      return \grapheme_substr($str, $start, $length);
6099
    }
6100 6
6101
    if (
6102 6
        $length >= 0 // "iconv_substr()" can't handle negative length
6103 6
        &&
6104
        self::$SUPPORT['iconv'] === true
6105
    ) {
6106 5
      return \iconv_substr($str, $start, $length);
6107 5
    }
6108
6109 5
    // fallback via vanilla php
6110 1
6111 1
    // split to array, and remove invalid characters
6112 1
    $array = self::split($str);
Security Bug
It seems like $str defined by self::clean($str) on line 6039 can also be of type false; however, voku\helper\UTF8::split() does only seem to accept string, did you maybe forget to handle an error condition?
6113
6114 5
    // extract relevant part, and join to make string again
6115
    return implode('', array_slice($array, $start, $length));
6116
  }
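
The last branch of substr() (the "vanilla PHP" fallback) boils down to splitting the string into characters and slicing the resulting array. As a rough illustration only: the helper name `utf8_substr_fallback` is hypothetical, and `preg_split('//u', ...)` is used here as a stand-in for `self::split()`, which may treat invalid input differently.

```php
<?php

// Hypothetical stand-alone sketch of the array-based fallback;
// not part of voku\helper\UTF8.
function utf8_substr_fallback(string $str, int $start, ?int $length = null): string
{
    // Split into UTF-8 characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }

    // array_slice() accepts negative offsets and a null length,
    // which mirrors how substr() counts from the end of the string.
    return implode('', array_slice($chars, $start, $length));
}

echo utf8_substr_fallback('日本語テキスト', 2, 3), "\n"; // 語テキ
echo utf8_substr_fallback('äöü', -2), "\n";            // öü
```

This path allocates one array element per character, which is presumably why the method prefers mbstring, intl or iconv when they are available.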
6117
6118
  /**
6119
   * Binary safe comparison of two strings from an offset, up to length characters.
6120
   *
6121
   * @param string  $main_str           <p>The main string being compared.</p>
6122
   * @param string  $str                <p>The secondary string being compared.</p>
6123
   * @param int     $offset             <p>The start position for the comparison. If negative, it starts counting from
6124
   *                                    the end of the string.</p>
6125
   * @param int     $length             [optional] <p>The length of the comparison. The default value is the largest of
6126
   *                                    the length of the str compared to the length of main_str less the offset.</p>
6127
   * @param boolean $case_insensitivity [optional] <p>If case_insensitivity is TRUE, comparison is case
6128
   *                                    insensitive.</p>
6129
   *
6130
   * @return int
6131
   */
6132
  public static function substr_compare($main_str, $str, $offset, $length = 2147483647, $case_insensitivity = false)
6133
  {
6134
    $main_str = self::substr($main_str, $offset, $length);
6135
    $str = self::substr($str, 0, self::strlen($main_str));
Security Bug
It seems like $main_str defined by self::substr($main_str, $offset, $length) on line 6134 can also be of type false; however, voku\helper\UTF8::strlen() does only seem to accept string, did you maybe forget to handle an error condition?
6136
6137
    return $case_insensitivity === true ? self::strcasecmp($main_str, $str) : self::strcmp($main_str, $str);
Security Bug
It seems like $main_str defined by self::substr($main_str, $offset, $length) on line 6134 can also be of type false; however, voku\helper\UTF8::strcasecmp() does only seem to accept string, did you maybe forget to handle an error condition?

Security Bug
It seems like $str defined by self::substr($str, 0, self::strlen($main_str)) on line 6135 can also be of type false; however, voku\helper\UTF8::strcasecmp() does only seem to accept string, did you maybe forget to handle an error condition?

Security Bug
It seems like $main_str defined by self::substr($main_str, $offset, $length) on line 6134 can also be of type false; however, voku\helper\UTF8::strcmp() does only seem to accept string, did you maybe forget to handle an error condition?

Security Bug
It seems like $str defined by self::substr($str, 0, self::strlen($main_str)) on line 6135 can also be of type false; however, voku\helper\UTF8::strcmp() does only seem to accept string, did you maybe forget to handle an error condition?
6138
  }
6139
6140
  /**
6141
   * Count the number of substring occurrences.
6142
   *
6143
   * @link  http://php.net/manual/en/function.substr-count.php
6144 1
   *
6145
   * @param string  $haystack  <p>The string to search in.</p>
6146 1
   * @param string  $needle    <p>The substring to search for.</p>
6147
   * @param int     $offset    [optional] <p>The offset where to start counting.</p>
6148
   * @param int     $length    [optional] <p>
6149
   *                           The maximum length after the specified offset to search for the
6150
   *                           substring. It outputs a warning if the offset plus the length is
6151
   *                           greater than the haystack length.
6152
   *                           </p>
6153
   * @param string  $encoding  <p>Set the charset for e.g. "\mb_" function.</p>
6154
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
6155
   *
6156
   * @return int|false <p>This function returns an integer, or false if the haystack or the needle is empty.</p>
6157
   */
6158 1
  public static function substr_count($haystack, $needle, $offset = 0, $length = null, $encoding = 'UTF-8', $cleanUtf8 = false)
6159
  {
6160 1
    // init
6161
    $haystack = (string)$haystack;
6162 1
    $needle = (string)$needle;
6163 1
6164
    if (!isset($haystack[0], $needle[0])) {
6165
      return false;
6166 1
    }
6167
6168 1
    if ($offset || $length) {
Bug Best Practice
The expression $length of type integer|null is loosely compared to true; this is ambiguous if the integer can be zero. You might want to explicitly use !== null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case, in particular the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false
6169 1
      $offset = (int)$offset;
6170
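      // keep null as null, so that self::substr() reads up to the end of the string
      $length = $length === null ? null : (int)$length;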
6171
6172 1
      if (
6173
          $length + $offset <= 0
6174
          &&
6175 1
          Bootup::is_php('7.1') === false
6176 1
      ) {
6177 1
        return false;
6178 1
      }
6179 1
6180
      $haystack = self::substr($haystack, $offset, $length, $encoding);
6181
    }
6182 1
6183
    if ($encoding !== 'UTF-8') {
6184
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
6185
    }
6186
6187
    if ($cleanUtf8 === true) {
6188
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
6189
      // if invalid characters are found in $haystack before $needle
6190
      $needle = self::clean($needle);
6191
      $haystack = self::clean($haystack);
Security Bug
It seems like $haystack defined by self::clean($haystack) on line 6191 can also be of type false; however, voku\helper\UTF8::clean() does only seem to accept string, did you maybe forget to handle an error condition?
6192
    }
6193
6194
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
6195
      self::checkForSupport();
6196
    }
6197
6198
    if (
6199
        $encoding !== 'UTF-8'
6200
        &&
6201 10
        self::$SUPPORT['mbstring'] === false
6202
    ) {
6203 10
      trigger_error('UTF8::substr_count() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
6204 10
    }
6205
6206 10
    if (self::$SUPPORT['mbstring'] === true) {
6207 3
      return \mb_substr_count($haystack, $needle, $encoding);
6208
    }
6209
6210 8
    preg_match_all('/' . preg_quote($needle, '/') . '/us', $haystack, $matches, PREG_SET_ORDER);
6211 8
6212 8
    return count($matches);
6213
  }
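
When mbstring is unavailable, substr_count() falls back to counting regex matches of the quoted needle. A minimal sketch of that idea follows; the function name `utf8_substr_count_fallback` is hypothetical, and the early return for an empty needle is an assumption of this sketch (the method above returns false in that case).

```php
<?php

// Hypothetical stand-alone sketch of the regex-based fallback;
// not part of voku\helper\UTF8.
function utf8_substr_count_fallback(string $haystack, string $needle): int
{
    if ($needle === '') {
        return 0; // assumption: treat an empty needle as "no occurrences"
    }

    // preg_quote() escapes regex metacharacters (and the "/" delimiter),
    // so the needle is matched literally. The "u" modifier enables UTF-8 mode.
    $pattern = '/' . preg_quote($needle, '/') . '/us';

    // preg_match_all() returns the number of non-overlapping matches,
    // or false on failure (e.g. invalid UTF-8), which is cast to 0 here.
    return (int)preg_match_all($pattern, $haystack);
}

echo utf8_substr_count_fallback('äbc äbc Äbc', 'äb'), "\n"; // 2
echo utf8_substr_count_fallback('a.b a.b axb', 'a.b'), "\n"; // 2
```

Like the native substr_count(), this counts non-overlapping occurrences only.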
6214 8
6215
  /**
6216 8
   * Removes a prefix ($needle) from the start of the string ($haystack), case-insensitive.
6217
   *
6218 8
   * @param string $haystack <p>The string to search in.</p>
6219 1
   * @param string $needle   <p>The substring to search for.</p>
6220 1
   *
6221 1
   * @return string <p>Return the sub-string.</p>
6222
   */
6223
  public static function substr_ileft($haystack, $needle)
Duplication
This method seems to be duplicated in your project.
6224 8
  {
6225
    // init
6226 8
    $haystack = (string)$haystack;
6227 8
    $needle = (string)$needle;
6228 8
6229 8
    if (!isset($haystack[0])) {
6230 8
      return '';
6231
    }
6232 8
6233 8
    if (!isset($needle[0])) {
6234 8
      return $haystack;
6235 8
    }
6236
6237 8
    if (self::str_istarts_with($haystack, $needle) === true) {
6238 6
      $haystack = self::substr($haystack, self::strlen($needle));
6239 6
    }
6240 6
6241 6
    return $haystack;
6242
  }
6243 6
6244 3
  /**
6245 3
   * Removes a suffix ($needle) from the end of the string ($haystack), case-insensitive.
6246
   *
6247 6
   * @param string $haystack <p>The string to search in.</p>
6248 6
   * @param string $needle   <p>The substring to search for.</p>
6249
   *
6250 8
   * @return string <p>Return the sub-string.</p>
6251
   */
6252
  public static function substr_iright($haystack, $needle)
Duplication
This method seems to be duplicated in your project.
6253
  {
6254
    // init
6255
    $haystack = (string)$haystack;
6256
    $needle = (string)$needle;
6257
6258 1
    if (!isset($haystack[0])) {
6259
      return '';
6260 1
    }
6261
6262
    if (!isset($needle[0])) {
6263
      return $haystack;
6264
    }
6265
6266
    if (self::str_iends_with($haystack, $needle) === true) {
6267
      $haystack = self::substr($haystack, 0, self::strlen($haystack) - self::strlen($needle));
6268
    }
6269
6270
    return $haystack;
6271
  }
6272
6273
  /**
6274
   * Removes a prefix ($needle) from the start of the string ($haystack).
6275
   *
6276
   * @param string $haystack <p>The string to search in.</p>
6277
   * @param string $needle   <p>The substring to search for.</p>
6278
   *
6279
   * @return string <p>Return the sub-string.</p>
6280
   */
6281
  public static function substr_left($haystack, $needle)
Duplication
This method seems to be duplicated in your project.
6282
  {
6283
    // init
6284
    $haystack = (string)$haystack;
6285
    $needle = (string)$needle;
6286
6287
    if (!isset($haystack[0])) {
6288
      return '';
6289
    }
6290
6291
    if (!isset($needle[0])) {
6292
      return $haystack;
6293
    }
6294
6295
    if (self::str_starts_with($haystack, $needle) === true) {
6296
      $haystack = self::substr($haystack, self::strlen($needle));
6297
    }
6298
6299
    return $haystack;
6300
  }
6301
6302
  /**
6303
   * Replace text within a portion of a string.
6304
   *
6305
   * source: https://gist.github.com/stemar/8287074
6306
   *
6307
   * @param string|string[] $str              <p>The input string or an array of strings.</p>
6308
   * @param string|string[] $replacement      <p>The replacement string or an array of strings.</p>
6309
   * @param int|int[]       $start            <p>
6310
   *                                          If start is positive, the replacing will begin at the start'th offset
6311
   *                                          into string.
6312
   *                                          <br /><br />
6313
   *                                          If start is negative, the replacing will begin at the start'th character
6314
   *                                          from the end of string.
6315
   *                                          </p>
6316
   * @param int|int[]|null  $length           [optional] <p>If given and positive, it represents the length of the
6317
   *                                          portion of string which is to be replaced. If it is negative, it
6318
   *                                          represents the number of characters from the end of string at which to
6319
   *                                          stop replacing. If it is not given, then it will default to strlen(
6320
   *                                          string ); i.e. end the replacing at the end of string. Of course, if
6321
   *                                          length is zero then this function will have the effect of inserting
6322
   *                                          replacement into string at the given start offset.</p>
6323
   *
6324
   * @return string|string[] <p>The result string is returned. If string is an array then array is returned.</p>
6325
   */
6326
  public static function substr_replace($str, $replacement, $start, $length = null)
6327
  {
6328
    if (is_array($str) === true) {
6329
      $num = count($str);
6330
6331
      // $replacement
6332
      if (is_array($replacement) === true) {
6333
        $replacement = array_slice($replacement, 0, $num);
6334
      } else {
6335
        $replacement = array_pad(array($replacement), $num, $replacement);
6336
      }
6337
6338
      // $start
6339
      if (is_array($start) === true) {
Duplication
This code seems to be duplicated across your project.
6340
        $start = array_slice($start, 0, $num);
6341
        foreach ($start as &$valueTmp) {
6342
          $valueTmp = (int)$valueTmp === $valueTmp ? $valueTmp : 0;
6343
        }
6344
        unset($valueTmp);
6345
      } else {
6346
        $start = array_pad(array($start), $num, $start);
6347
      }
6348
6349
      // $length
6350
      if (!isset($length)) {
6351
        $length = array_fill(0, $num, 0);
6352
      } elseif (is_array($length) === true) {
Duplication
This code seems to be duplicated across your project.
6353
        $length = array_slice($length, 0, $num);
6354
        foreach ($length as &$valueTmpV2) {
6355
          if (isset($valueTmpV2)) {
6356
            $valueTmpV2 = (int)$valueTmpV2 === $valueTmpV2 ? $valueTmpV2 : $num;
6357
          } else {
6358
            $valueTmpV2 = 0;
6359
          }
6360
        }
6361
        unset($valueTmpV2);
6362
      } else {
6363
        $length = array_pad(array($length), $num, $length);
6364
      }
6365
6366
      // Recursive call
6367
      return array_map(array('\\voku\\helper\\UTF8', 'substr_replace'), $str, $replacement, $start, $length);
6368
6369
    } else {
6370
6371
      if (is_array($replacement) === true) {
6372
        if (count($replacement) > 0) {
6373
          $replacement = $replacement[0];
6374
        } else {
6375
          $replacement = '';
6376
        }
6377
      }
6378
    }
6379
6380
    // init
6381
    $str = (string)$str;
6382
    $replacement = (string)$replacement;
6383
6384
    if (!isset($str[0])) {
6385
      return $replacement;
6386
    }
6387
6388
    preg_match_all('/./us', $str, $smatches);
6389
    preg_match_all('/./us', $replacement, $rmatches);
6390
6391
    if ($length === null) {
6392
      $length = (int)self::strlen($str);
6393
    }
6394
6395
    array_splice($smatches[0], $start, $length, $rmatches[0]);
6396
6397
    return implode('', $smatches[0]);
6398
  }
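The scalar path of substr_replace() above works on characters rather than bytes: it splits both strings into arrays of UTF-8 characters, splices the replacement in, and joins the result. A minimal standalone sketch of that idea follows; `utf8_splice()` is a hypothetical name, and the array handling and type juggling of the real method are left out.

```php
<?php

/**
 * Hypothetical standalone version of the character-wise splice.
 * $start and $length count UTF-8 characters, not bytes.
 */
function utf8_splice(string $str, string $replacement, int $start, ?int $length = null): string
{
    // "/./us" matches one UTF-8 character at a time, including newlines.
    preg_match_all('/./us', $str, $strChars);
    preg_match_all('/./us', $replacement, $replacementChars);

    if ($length === null) {
        $length = count($strChars[0]);
    }

    array_splice($strChars[0], $start, $length, $replacementChars[0]);

    return implode('', $strChars[0]);
}

// Replaces the single character "ü" (two bytes) with "ue".
echo utf8_splice("D\u{00FC}sseldorf", 'ue', 1, 1), "\n";
```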
6399
6400
  /**
6401
   * Removes a suffix ($needle) from the end of the string ($haystack).
6402
   *
6403
   * @param string $haystack <p>The string to search in.</p>
6404
   * @param string $needle   <p>The substring to search for.</p>
6405
   *
6406
   * @return string <p>Return the sub-string.</p>
6407
   */
6408 View Code Duplication
  public static function substr_right($haystack, $needle)
Duplication: This method seems to be duplicated in your project.
6409
  {
6410
    $haystack = (string)$haystack;
6411
    $needle = (string)$needle;
6412
6413
    if (!isset($haystack[0])) {
6414
      return '';
6415
    }
6416
6417
    if (!isset($needle[0])) {
6418
      return $haystack;
6419
    }
6420
6421
    if (self::str_ends_with($haystack, $needle) === true) {
6422
      $haystack = self::substr($haystack, 0, self::strlen($haystack) - self::strlen($needle));
6423
    }
6424
6425
    return $haystack;
6426
  }
6427
6428
  /**
6429
   * Returns a case swapped version of the string.
6430
   *
6431
   * @param string  $str       <p>The input string.</p>
6432
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
6433
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
6434
   *
6435
   * @return string <p>Each character's case swapped.</p>
6436
   */
6437
  public static function swapCase($str, $encoding = 'UTF-8', $cleanUtf8 = false)
6438
  {
6439
    $str = (string)$str;
6440
6441
    if (!isset($str[0])) {
6442
      return '';
6443
    }
6444
6445
    if ($encoding !== 'UTF-8') {
6446
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
6447
    }
6448
6449
    if ($cleanUtf8 === true) {
6450
      // remove invalid UTF-8 sequences first, so that the callback below
6451
      // only ever sees well-formed characters
6452
      $str = self::clean($str);
6453
    }
6454
6455
    $strSwappedCase = preg_replace_callback(
6456
        '/[\S]/u',
6457
        function ($match) use ($encoding) {
6458
          $matchToUpper = UTF8::strtoupper($match[0], $encoding);
6459
6460
          if ($match[0] === $matchToUpper) {
6461
            return UTF8::strtolower($match[0], $encoding);
6462
          } else {
6463
            return $matchToUpper;
6464
          }
6465
        },
6466
        $str
6467
    );
6468
6469
    return $strSwappedCase;
6470
  }
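The case swap in swapCase() can be tried in isolation. The sketch below assumes the mbstring extension is available and calls `mb_strtoupper()` / `mb_strtolower()` directly in place of the class's own wrappers; `swap_case()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical standalone case swap: every non-whitespace character that is
 * already uppercase is lowered, everything else is raised.
 */
function swap_case(string $str): string
{
    $result = preg_replace_callback(
        '/\S/u',
        function (array $match): string {
            $upper = mb_strtoupper($match[0], 'UTF-8');

            return $match[0] === $upper ? mb_strtolower($match[0], 'UTF-8') : $upper;
        },
        $str
    );

    // preg_replace_callback() returns null on error, e.g. for invalid UTF-8 input.
    return $result === null ? $str : $result;
}

echo swap_case("Hello \u{00C4}rger"), "\n";
```

Characters without a case distinction, such as digits, are unchanged by both conversions and pass through as they are.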
6471
6472
  /**
6473
   * alias for "UTF8::to_ascii()"
6474
   *
6475
   * @see UTF8::to_ascii()
6476
   *
6477
   * @param string $s
6478
   * @param string $subst_chr
6479
   * @param bool   $strict
6480
   *
6481
   * @return string
6482
   *
6483
   * @deprecated
6484
   */
6485
  public static function toAscii($s, $subst_chr = '?', $strict = false)
6486
  {
6487
    return self::to_ascii($s, $subst_chr, $strict);
6488
  }
6489
6490
  /**
6491
   * alias for "UTF8::to_iso8859()"
6492
   *
6493
   * @see UTF8::to_iso8859()
6494
   *
6495
   * @param string $str
6496
   *
6497
   * @return string|string[]
6498
   *
6499
   * @deprecated
6500
   */
6501
  public static function toIso8859($str)
6502
  {
6503
    return self::to_iso8859($str);
6504
  }
6505
6506
  /**
6507
   * alias for "UTF8::to_latin1()"
6508
   *
6509
   * @see UTF8::to_latin1()
6510
   *
6511
   * @param string|string[] $str
6512
   *
6513
   * @return string|string[]
6514
   *
6515
   * @deprecated
6516
   */
6517
  public static function toLatin1($str)
6518
  {
6519
    return self::to_latin1($str);
6520
  }
6521
6522
  /**
6523
   * alias for "UTF8::to_utf8()"
6524
   *
6525
   * @see UTF8::to_utf8()
6526
   *
6527
   * @param string $str
6528
   *
6529
   * @return string
6530
   *
6531
   * @deprecated
6532
   */
6533
  public static function toUTF8($str)
6534
  {
6535
    return self::to_utf8($str);
6536
  }
6537
6538
  /**
6539
   * Convert a string into ASCII.
6540
   *
6541
   * @param string $str     <p>The input string.</p>
6542
   * @param string $unknown [optional] <p>Character to use if a character is unknown (default is ?).</p>
6543
   * @param bool   $strict  [optional] <p>Use "transliterator_transliterate()" from PHP-Intl | WARNING: bad
6544
   *                        performance</p>
6545
   *
6546
   * @return string
6547
   */
6548
  public static function to_ascii($str, $unknown = '?', $strict = false)
6549
  {
6550
    static $UTF8_TO_ASCII;
6551
6552
    // init
6553
    $str = (string)$str;
6554
6555
    if (!isset($str[0])) {
6556
      return '';
6557
    }
6558
6559
    $str = self::clean($str, true, true, true);
Comprehensibility / Best Practice: The expression self::clean($str, true, true, true); of type string|false adds false to the return on line 6563, which is incompatible with the return type string documented by voku\helper\UTF8::to_ascii. It seems like you forgot to handle an error condition.
6560
6561
    // check if we only have ASCII
6562
    if (self::is_ascii($str) === true) {
Security Bug: It seems like $str defined by self::clean($str, true, true, true) on line 6559 can also be of type false; however, voku\helper\UTF8::is_ascii() only seems to accept string. Did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example:

<?php

function getDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.
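A caller can handle the string|false case with an explicit guard before the value is passed on. The sketch below is self-contained and uses stand-ins: `clean_or_false()` is a hypothetical function that only imitates a cleaner returning string|false, and returning an empty string on failure is an assumption, not the library's documented behaviour.

```php
<?php

/**
 * Hypothetical stand-in for a cleaner that returns string|false:
 * it yields false when the input is not well-formed UTF-8.
 *
 * @return string|false
 */
function clean_or_false(string $str)
{
    // With the "u" modifier, preg_match() fails on invalid UTF-8 subjects.
    return preg_match('//u', $str) === 1 ? $str : false;
}

function to_ascii_guarded(string $str): string
{
    $cleaned = clean_or_false($str);

    if ($cleaned === false) {
        // Assumed policy for this sketch: treat unusable input as empty.
        return '';
    }

    return $cleaned;
}

var_dump(to_ascii_guarded('abc'));
var_dump(to_ascii_guarded("\xC3\x28")); // invalid two-byte sequence
```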
6563
      return $str;
6564
    }
6565
6566
    if ($strict === true) {
6567
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
6568
        self::checkForSupport();
6569
      }
6570
6571
      if (
6572
          self::$SUPPORT['intl'] === true
6573
          &&
6574
          Bootup::is_php('5.4') === true
6575
      ) {
6576
6577
        // HACK for issue from "transliterator_transliterate()"
6578
        $str = str_replace(
6579
            'ℌ',
6580
            'H',
6581
            $str
6582
        );
6583
6584
        $str = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; NFC; Any-Latin; Latin-ASCII;', $str);
6585
6586
        // check again, if we only have ASCII, now ...
6587
        if (self::is_ascii($str) === true) {
6588
          return $str;
6589
        }
6590
6591
      }
6592
    }
6593
6594
    preg_match_all('/.{1}|[^\x00]{1,1}$/us', $str, $ar);
6595
    $chars = $ar[0];
6596
    foreach ($chars as &$c) {
6597
6598
      $ordC0 = ord($c[0]);
6599
6600
      if ($ordC0 >= 0 && $ordC0 <= 127) {
6601
        continue;
6602
      }
6603
6604
      $ordC1 = ord($c[1]);
6605
6606
      // 2-byte sequence: 110xxxxx 10xxxxxx
6607
      if ($ordC0 >= 192 && $ordC0 <= 223) {
6608
        $ord = ($ordC0 - 192) * 64 + ($ordC1 - 128);
6609
      }
6610
6611
      if ($ordC0 >= 224) {
6612
        $ordC2 = ord($c[2]);
6613
6614
        if ($ordC0 <= 239) {
6615
          $ord = ($ordC0 - 224) * 4096 + ($ordC1 - 128) * 64 + ($ordC2 - 128);
6616
        }
6617
6618
        if ($ordC0 >= 240) {
6619
          $ordC3 = ord($c[3]);
6620
6621
          if ($ordC0 <= 247) {
6622
            $ord = ($ordC0 - 240) * 262144 + ($ordC1 - 128) * 4096 + ($ordC2 - 128) * 64 + ($ordC3 - 128);
6623
          }
6624
6625
          if ($ordC0 >= 248) {
6626
            $ordC4 = ord($c[4]);
6627
6628 View Code Duplication
            if ($ordC0 <= 251) {
Duplication: This code seems to be duplicated across your project.
6629
              $ord = ($ordC0 - 248) * 16777216 + ($ordC1 - 128) * 262144 + ($ordC2 - 128) * 4096 + ($ordC3 - 128) * 64 + ($ordC4 - 128);
6630
            }
6631
6632
            if ($ordC0 >= 252) {
6633
              $ordC5 = ord($c[5]);
6634
6635 View Code Duplication
              if ($ordC0 <= 253) {
Duplication: This code seems to be duplicated across your project.
6636
                $ord = ($ordC0 - 252) * 1073741824 + ($ordC1 - 128) * 16777216 + ($ordC2 - 128) * 262144 + ($ordC3 - 128) * 4096 + ($ordC4 - 128) * 64 + ($ordC5 - 128);
6637
              }
6638
            }
6639
          }
6640
        }
6641
      }
6642
6643
      if ($ordC0 == 254 || $ordC0 == 255) {
6644
        $c = $unknown;
6645
        continue;
6646
      }
6647
6648
      if (!isset($ord)) {
6649
        $c = $unknown;
6650
        continue;
6651
      }
6652
6653
      $bank = $ord >> 8;
6654
      if (!isset($UTF8_TO_ASCII[$bank])) {
6655
        $UTF8_TO_ASCII[$bank] = self::getData(sprintf('x%02x', $bank));
6656
        if ($UTF8_TO_ASCII[$bank] === false) {
6657
          $UTF8_TO_ASCII[$bank] = array();
6658
        }
6659
      }
6660
6661
      $newchar = $ord & 255;
6662
6663
      if (isset($UTF8_TO_ASCII[$bank], $UTF8_TO_ASCII[$bank][$newchar])) {
6664
6665
        // keep for debugging
6666
        /*
Unused Code / Comprehensibility: 45% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.
6667
        echo "file: " . sprintf('x%02x', $bank) . "\n";
6668
        echo "char: " . $c . "\n";
6669
        echo "ord: " . $ord . "\n";
6670
        echo "newchar: " . $newchar . "\n";
6671
        echo "ascii: " . $UTF8_TO_ASCII[$bank][$newchar] . "\n";
6672
        echo "bank:" . $bank . "\n\n";
6673
        */
6674
6675
        $c = $UTF8_TO_ASCII[$bank][$newchar];
6676
      } else {
6677
6678
        // keep for debugging missing chars
6679
        /*
Unused Code / Comprehensibility: 41% of this comment could be valid code. Did you maybe forget this after debugging?
6680
        echo "file: " . sprintf('x%02x', $bank) . "\n";
6681
        echo "char: " . $c . "\n";
6682
        echo "ord: " . $ord . "\n";
6683
        echo "newchar: " . $newchar . "\n";
6684
        echo "bank:" . $bank . "\n\n";
6685
        */
6686
6687
        $c = $unknown;
6688
      }
6689
    }
6690
6691
    return implode('', $chars);
6692
  }
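The arithmetic in to_ascii() first decodes each UTF-8 character into its code point and then splits that number into a "bank" (the high bits, which select a data file such as x20) and an index inside that bank (the low 8 bits). The sketch below reproduces only this calculation for sequences of up to four bytes; `utf8_char_to_code_point()` is a hypothetical name, and the lookup tables themselves are not included.

```php
<?php

/**
 * Hypothetical helper: decodes one well-formed UTF-8 character (1 to 4 bytes)
 * into its code point, using the same arithmetic as the listing.
 */
function utf8_char_to_code_point(string $char): int
{
    $b0 = ord($char[0]);

    if ($b0 <= 127) {
        return $b0;
    }
    if ($b0 >= 192 && $b0 <= 223) {
        return ($b0 - 192) * 64 + (ord($char[1]) - 128);
    }
    if ($b0 >= 224 && $b0 <= 239) {
        return ($b0 - 224) * 4096 + (ord($char[1]) - 128) * 64 + (ord($char[2]) - 128);
    }
    if ($b0 >= 240 && $b0 <= 247) {
        return ($b0 - 240) * 262144 + (ord($char[1]) - 128) * 4096
            + (ord($char[2]) - 128) * 64 + (ord($char[3]) - 128);
    }

    throw new InvalidArgumentException('Not a valid UTF-8 lead byte.');
}

// EURO SIGN, bytes E2 82 AC: (226-224)*4096 + (130-128)*64 + (172-128) = 8364
$ord = utf8_char_to_code_point("\xE2\x82\xAC");
$bank = $ord >> 8;     // 0x20: selects the table file
$newchar = $ord & 255; // 0xAC: index inside that table

printf("U+%04X -> file %s, index %d\n", $ord, sprintf('x%02x', $bank), $newchar);
```

Returning the code point from a function also avoids carrying a stale `$ord` value from one loop iteration into the next, which the `!isset($ord)` check in the listing cannot detect once `$ord` has been set.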
6693
6694
  /**
6695
   * Convert a string into "ISO-8859"-encoding (Latin-1).
6696
   *
6697
   * @param string|string[] $str
6698
   *
6699
   * @return string|string[]
6700
   */
6701
  public static function to_iso8859($str)
6702
  {
6703
    if (is_array($str) === true) {
6704
6705
      /** @noinspection ForeachSourceInspection */
6706
      foreach ($str as $k => $v) {
6707
        /** @noinspection AlterInForeachInspection */
6708
        /** @noinspection OffsetOperationsInspection */
6709
        $str[$k] = self::to_iso8859($v);
6710
      }
6711
6712
      return $str;
6713
    }
6714
6715
    $str = (string)$str;
6716
6717
    if (!isset($str[0])) {
6718
      return '';
6719
    }
6720
6721
    return self::utf8_decode($str);
6722
  }
6723
6724
  /**
6725
   * alias for "UTF8::to_iso8859()"
6726
   *
6727
   * @see UTF8::to_iso8859()
6728
   *
6729
   * @param string|string[] $str
6730
   *
6731
   * @return string|string[]
6732
   */
6733
  public static function to_latin1($str)
6734
  {
6735
    return self::to_iso8859($str);
6736
  }
6737
6738
  /**
6739
   * This function leaves UTF-8 characters alone, while converting almost all non-UTF8 to UTF8.
6740
   *
6741
   * <ul>
6742
   * <li>It decodes UTF-8 codepoints and unicode escape sequences.</li>
6743
   * <li>It assumes that the encoding of the original string is either WINDOWS-1252 or ISO-8859-1.</li>
6744
   * <li>WARNING: It does not remove invalid UTF-8 characters, so you may need to use "UTF8::clean()" in that
6745
   * case.</li>
6746
   * </ul>
6747
   *
6748
   * @param string|string[] $str                    <p>Any string or array.</p>
6749
   * @param bool            $decodeHtmlEntityToUtf8 <p>Set to true, if you need to decode html-entities.</p>
6750
   *
6751
   * @return string|string[] <p>The UTF-8 encoded string.</p>
6752
   */
6753
  public static function to_utf8($str, $decodeHtmlEntityToUtf8 = false)
6754
  {
6755
    if (is_array($str) === true) {
6756
      /** @noinspection ForeachSourceInspection */
6757
      foreach ($str as $k => $v) {
6758
        /** @noinspection AlterInForeachInspection */
6759
        /** @noinspection OffsetOperationsInspection */
6760
        $str[$k] = self::to_utf8($v, $decodeHtmlEntityToUtf8);
6761
      }
6762
6763
      return $str;
6764
    }
6765
6766
    $str = (string)$str;
6767
6768
    if (!isset($str[0])) {
6769
      return $str;
6770
    }
6771
6772
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
6773
      self::checkForSupport();
6774
    }
6775
6776 View Code Duplication
    if (self::$SUPPORT['mbstring_func_overload'] === true) {
Duplication: This code seems to be duplicated across your project.
6777
      $max = \mb_strlen($str, '8BIT');
6778
    } else {
6779
      $max = strlen($str);
6780
    }
6781
6782
    $buf = '';
6783
6784
    /** @noinspection ForeachInvariantsInspection */
6785
    for ($i = 0; $i < $max; $i++) {
6786
6787
      $c1 = $str[$i];
6788
6789
      if ($c1 >= "\xC0") { // should be converted to UTF8, if it's not UTF8 already
6790
6791
        if ($c1 <= "\xDF") { // looks like 2 bytes UTF8
6792
6793
          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
6794
6795
          if ($c2 >= "\x80" && $c2 <= "\xBF") { // yeah, almost sure it's UTF8 already
6796
            $buf .= $c1 . $c2;
6797
            $i++;
6798 View Code Duplication
          } else { // not valid UTF8 - convert it
Duplication: This code seems to be duplicated across your project.
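The fallback branch that this method repeats for every "not valid UTF-8" case re-encodes a single ISO-8859-1 byte as a two-byte UTF-8 sequence. Extracting it would remove the duplication. The sketch below expresses the same bit operations with integers; `latin1_byte_to_utf8()` is a hypothetical name, and the input is assumed to be exactly one byte with a value of 0x80 or higher.

```php
<?php

/**
 * Hypothetical helper: re-encodes one ISO-8859-1 byte (0x80-0xFF)
 * as the two-byte UTF-8 sequence 110000xx 10xxxxxx.
 */
function latin1_byte_to_utf8(string $byte): string
{
    $ord = ord($byte);

    $lead = 0xC0 | ($ord >> 6);    // top two bits of the byte
    $trail = 0x80 | ($ord & 0x3F); // low six bits of the byte

    return chr($lead) . chr($trail);
}

// 0xFC is "ü" in ISO-8859-1 and becomes C3 BC in UTF-8.
echo bin2hex(latin1_byte_to_utf8("\xFC")), "\n";
```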
6799
            $cc1tmp = ord($c1) / 64;
6800
            $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
6801
            $cc2 = ($c1 & "\x3F") | "\x80";
6802
            $buf .= $cc1 . $cc2;
6803
          }
6804
6805
        } elseif ($c1 >= "\xE0" && $c1 <= "\xEF") { // looks like 3 bytes UTF8
6806
6807
          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
6808
          $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];
6809
6810
          if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF") { // yeah, almost sure it's UTF8 already
6811
            $buf .= $c1 . $c2 . $c3;
6812
            $i += 2;
6813 View Code Duplication
          } else { // not valid UTF8 - convert it
Duplication: This code seems to be duplicated across your project.
6814
            $cc1tmp = ord($c1) / 64;
6815
            $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
6816
            $cc2 = ($c1 & "\x3F") | "\x80";
6817
            $buf .= $cc1 . $cc2;
6818
          }
6819
6820
        } elseif ($c1 >= "\xF0" && $c1 <= "\xF7") { // looks like 4 bytes UTF8
6821
6822
          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
6823
          $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];
6824
          $c4 = $i + 3 >= $max ? "\x00" : $str[$i + 3];
6825
6826
          if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF" && $c4 >= "\x80" && $c4 <= "\xBF") { // yeah, almost sure it's UTF8 already
6827
            $buf .= $c1 . $c2 . $c3 . $c4;
6828
            $i += 3;
6829 View Code Duplication
          } else { // not valid UTF8 - convert it
Duplication: This code seems to be duplicated across your project.
6830
            $cc1tmp = ord($c1) / 64;
6831
            $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
6832
            $cc2 = ($c1 & "\x3F") | "\x80";
6833
            $buf .= $cc1 . $cc2;
6834
          }
6835
6836 View Code Duplication
        } else { // doesn't look like UTF8, but should be converted
Duplication: This code seems to be duplicated across your project.
6837
          $cc1tmp = ord($c1) / 64;
6838
          $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
6839
          $cc2 = ($c1 & "\x3F") | "\x80";
6840
          $buf .= $cc1 . $cc2;
6841
        }
6842
6843
      } elseif (($c1 & "\xC0") === "\x80") { // needs conversion
6844
6845
        $ordC1 = ord($c1);
6846
        if (isset(self::$WIN1252_TO_UTF8[$ordC1])) { // found in Windows-1252 special cases
6847
          $buf .= self::$WIN1252_TO_UTF8[$ordC1];
6848 View Code Duplication
        } else {
Duplication: This code seems to be duplicated across your project.
6849
          $cc1 = self::chr_and_parse_int($ordC1 / 64) | "\xC0";
6850
          $cc2 = ($c1 & "\x3F") | "\x80";
6851
          $buf .= $cc1 . $cc2;
6852
        }
6853
6854
      } else { // it doesn't need conversion
6855
        $buf .= $c1;
6856
      }
6857
    }
6858
6859
    // decode unicode escape sequences
6860
    $buf = preg_replace_callback(
6861
        '/\\\\u([0-9a-f]{4})/i',
6862
        function ($match) {
6863
          return \mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
6864
        },
6865
        $buf
6866
    );
6867
6868
    // optionally decode HTML entities
6869
    if ($decodeHtmlEntityToUtf8 === true) {
6870
      $buf = self::html_entity_decode($buf);
6871
    }
6872
6873
    return $buf;
6874
  }
6875
6876
  /**
6877
   * Strip whitespace or other characters from beginning or end of a UTF-8 string.
6878
   *
6879
   * INFO: This is slower than "trim()"
6880
   *
6881
   * We could only use the native function if the string and $chars were pure 7-bit ASCII,
6882
   * but checking for ASCII (7-bit) costs more time than we would save here.
6883
   *
6884
   * @param string $str   <p>The string to be trimmed</p>
6885
   * @param string $chars [optional] <p>Optional characters to be stripped</p>
6886
   *
6887
   * @return string <p>The trimmed string.</p>
6888
   */
6889
  public static function trim($str = '', $chars = INF)
6890
  {
6891
    $str = (string)$str;
6892
6893
    if (!isset($str[0])) {
6894
      return '';
6895
    }
6896
6897
    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
6898
    if ($chars === INF || !$chars) {
6899
      return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
6900
    }
6901
6902
    return self::rtrim(self::ltrim($str, $chars), $chars);
6903
  }
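The default branch of trim() relies on the Unicode property classes \pZ (separators) and \pC (control and other invisible characters). A short sketch shows what that pattern removes and PHP's native trim() leaves in place; `unicode_trim()` is a hypothetical name, and the fallback for a failed match is an assumption made for this example.

```php
<?php

/**
 * Hypothetical standalone version of the default branch:
 * strips Unicode separators and control characters from both ends.
 */
function unicode_trim(string $str): string
{
    $result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);

    // preg_replace() returns null on error (e.g. invalid UTF-8); keep the input then.
    return $result === null ? $str : $result;
}

// NO-BREAK SPACE and EM SPACE in front, NO-BREAK SPACE and a newline at the end.
$padded = "\u{00A0}\u{2003}caf\u{00E9}\u{00A0}\n";

var_dump(unicode_trim($padded) === "caf\u{00E9}");
// Native trim() only knows ASCII whitespace, so the padding survives.
var_dump(trim($padded) === "caf\u{00E9}");
```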
6904
6905
  /**
6906
   * Makes string's first char uppercase.
6907
   *
6908
   * @param string  $str       <p>The input string.</p>
6909
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
6910
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
6911
   *
6912
   * @return string <p>The resulting string</p>
6913
   */
6914
  public static function ucfirst($str, $encoding = 'UTF-8', $cleanUtf8 = false)
6915
  {
6916
    return self::strtoupper(self::substr($str, 0, 1, $encoding, $cleanUtf8), $encoding, $cleanUtf8) . self::substr($str, 1, null, $encoding, $cleanUtf8);
Security Bug: It seems like self::substr($str, 0, 1, $encoding, $cleanUtf8) targeting voku\helper\UTF8::substr() can also be of type false; however, voku\helper\UTF8::strtoupper() only seems to accept string. Did you maybe forget to handle an error condition?
6917
  }
6918
6919
  /**
6920
   * alias for "UTF8::ucfirst()"
6921
   *
6922
   * @see UTF8::ucfirst()
6923
   *
6924
   * @param string  $word
6925
   * @param string  $encoding
6926
   * @param boolean $cleanUtf8
6927
   *
6928
   * @return string
6929
   */
6930
  public static function ucword($word, $encoding = 'UTF-8', $cleanUtf8 = false)
6931
  {
6932
    return self::ucfirst($word, $encoding, $cleanUtf8);
6933
  }
6934
6935
  /**
6936
   * Uppercase for all words in the string.
6937
   *
6938
   * @param string   $str        <p>The input string.</p>
6939
   * @param string[] $exceptions [optional] <p>Exclusion for some words.</p>
6940
   * @param string   $charlist   [optional] <p>Additional chars that belong to words and do not start a new word.</p>
6941
   * @param string   $encoding   [optional] <p>Set the charset for e.g. "\mb_" function.</p>
6942
   * @param boolean  $cleanUtf8  [optional] <p>Clean non UTF-8 chars from the string.</p>
6943
   *
6944
   * @return string
6945
   */
6946
  public static function ucwords($str, $exceptions = array(), $charlist = '', $encoding = 'UTF-8', $cleanUtf8 = false)
6947
  {
6948
    if (!$str) {
6949
      return '';
6950
    }
6951
6952
    $words = self::str_to_words($str, $charlist);
6953
    $newWords = array();
6954
6955
    if (count($exceptions) > 0) {
6956
      $useExceptions = true;
6957
    } else {
6958
      $useExceptions = false;
6959
    }
6960
6961
    foreach ($words as $word) {
6962
6963
      if (!$word) {
6964
        continue;
6965
      }
6966
6967
      if (
6968
          ($useExceptions === false)
6969
          ||
6970
          (
6971
              $useExceptions === true
6972
              &&
6973
              !in_array($word, $exceptions, true)
6974
          )
6975
      ) {
6976
        $word = self::ucfirst($word, $encoding, $cleanUtf8);
6977
      }
6978
6979
      $newWords[] = $word;
6980
    }
6981
6982
    return implode('', $newWords);
6983
  }
6984
6985
  /**
6986
   * Multi decode html entity & fix urlencoded-win1252-chars.
6987
   *
6988
   * e.g:
6989
   * 'test+test'                     => 'test test'
6990
   * 'D&#252;sseldorf'               => 'Düsseldorf'
6991
   * 'D%FCsseldorf'                  => 'Düsseldorf'
6992
   * 'D&#xFC;sseldorf'               => 'Düsseldorf'
6993
   * 'D%26%23xFC%3Bsseldorf'         => 'Düsseldorf'
6994
   * 'DÃ¼sseldorf'                   => 'Düsseldorf'
6995
   * 'D%C3%BCsseldorf'               => 'Düsseldorf'
6996
   * 'D%C3%83%C2%BCsseldorf'         => 'Düsseldorf'
6997
   * 'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'
6998
   *
6999
   * @param string $str          <p>The input string.</p>
7000
   * @param bool   $multi_decode <p>Decode as often as possible.</p>
7001
   *
7002
   * @return string
7003
   */
7004 View Code Duplication
  public static function urldecode($str, $multi_decode = true)
Duplication: This method seems to be duplicated in your project.
7005
  {
7006
    $str = (string)$str;
7007
7008
    if (!isset($str[0])) {
7009
      return '';
7010
    }
7011
7012
    $pattern = '/%u([0-9a-f]{3,4})/i';
7013
    if (preg_match($pattern, $str)) {
7014
      $str = preg_replace($pattern, '&#x\\1;', urldecode($str));
7015
    }
7016
7017
    $flags = Bootup::is_php('5.4') === true ? ENT_QUOTES | ENT_HTML5 : ENT_QUOTES;
7018
7019
    do {
7020
      $str_compare = $str;
7021
7022
      $str = self::fix_simple_utf8(
7023
          urldecode(
7024
              self::html_entity_decode(
7025
                  self::to_utf8($str),
Bug: It seems like self::to_utf8($str) targeting voku\helper\UTF8::to_utf8() can also be of type array; however, voku\helper\UTF8::html_entity_decode() only seems to accept string. Maybe add an additional type check?

This check looks at variables that are passed out again to other methods.

If the outgoing method call has stricter type requirements than the method itself, an issue is raised.

An additional type check may prevent trouble.
7026
                  $flags
7027
              )
7028
          )
7029
      );
7030
7031
    } while ($multi_decode === true && $str_compare !== $str);
7032
7033
    return (string)$str;
7034
  }
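The do/while loop in urldecode() decodes repeatedly until a pass no longer changes the string. That is how doubly encoded input such as 'D%26%23xFC%3Bsseldorf' ends up fully decoded. The sketch below shows the fixed-point idea with PHP's built-in functions only. It omits the to_utf8() and fix_simple_utf8() steps of the real method, `decode_until_stable()` is a hypothetical name, and the `$maxRounds` cap is an added safety assumption that the listing does not have.

```php
<?php

/**
 * Hypothetical sketch: apply entity decoding and URL decoding until
 * the string stops changing (or $maxRounds is reached).
 */
function decode_until_stable(string $str, int $maxRounds = 10): string
{
    for ($round = 0; $round < $maxRounds; $round++) {
        $decoded = urldecode(html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

        if ($decoded === $str) {
            break; // fixed point reached
        }

        $str = $decoded;
    }

    return $str;
}

// Round 1 yields 'D&#xFC;sseldorf', round 2 yields 'Düsseldorf', round 3 changes nothing.
echo decode_until_stable('D%26%23xFC%3Bsseldorf'), "\n";
```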
7035
7036
  /**
7037
   * Returns an array that maps "urlencoded" win1252 sequences to UTF-8 characters.
7038
   *
7039
   * @deprecated use the "UTF8::urldecode()" function to decode a string
7040
   *
7041
   * @return array
7042
   */
7043
  public static function urldecode_fix_win1252_chars()
7044
  {
7045
    return array(
7046
        '%20' => ' ',
7047
        '%21' => '!',
7048
        '%22' => '"',
7049
        '%23' => '#',
7050
        '%24' => '$',
7051
        '%25' => '%',
7052
        '%26' => '&',
7053
        '%27' => "'",
7054
        '%28' => '(',
7055
        '%29' => ')',
7056
        '%2A' => '*',
7057
        '%2B' => '+',
7058
        '%2C' => ',',
7059
        '%2D' => '-',
7060
        '%2E' => '.',
7061
        '%2F' => '/',
7062
        '%30' => '0',
7063
        '%31' => '1',
7064
        '%32' => '2',
7065
        '%33' => '3',
7066
        '%34' => '4',
7067
        '%35' => '5',
7068
        '%36' => '6',
7069
        '%37' => '7',
7070
        '%38' => '8',
7071
        '%39' => '9',
7072
        '%3A' => ':',
7073
        '%3B' => ';',
7074
        '%3C' => '<',
7075
        '%3D' => '=',
7076
        '%3E' => '>',
7077
        '%3F' => '?',
7078
        '%40' => '@',
7079
        '%41' => 'A',
7080
        '%42' => 'B',
7081
        '%43' => 'C',
7082
        '%44' => 'D',
7083
        '%45' => 'E',
7084
        '%46' => 'F',
7085
        '%47' => 'G',
7086
        '%48' => 'H',
7087
        '%49' => 'I',
7088
        '%4A' => 'J',
7089
        '%4B' => 'K',
7090
        '%4C' => 'L',
7091
        '%4D' => 'M',
7092
        '%4E' => 'N',
7093
        '%4F' => 'O',
7094
        '%50' => 'P',
7095
        '%51' => 'Q',
7096
        '%52' => 'R',
7097
        '%53' => 'S',
7098
        '%54' => 'T',
7099
        '%55' => 'U',
7100
        '%56' => 'V',
7101
        '%57' => 'W',
7102
        '%58' => 'X',
7103
        '%59' => 'Y',
7104
        '%5A' => 'Z',
7105
        '%5B' => '[',
7106
        '%5C' => '\\',
7107
        '%5D' => ']',
7108
        '%5E' => '^',
7109
        '%5F' => '_',
7110
        '%60' => '`',
7111
        '%61' => 'a',
7112
        '%62' => 'b',
7113
        '%63' => 'c',
7114
        '%64' => 'd',
7115
        '%65' => 'e',
7116
        '%66' => 'f',
7117
        '%67' => 'g',
7118
        '%68' => 'h',
7119
        '%69' => 'i',
7120
        '%6A' => 'j',
7121
        '%6B' => 'k',
7122
        '%6C' => 'l',
7123
        '%6D' => 'm',
7124
        '%6E' => 'n',
7125
        '%6F' => 'o',
7126
        '%70' => 'p',
7127
        '%71' => 'q',
7128
        '%72' => 'r',
7129
        '%73' => 's',
7130
        '%74' => 't',
7131
        '%75' => 'u',
7132
        '%76' => 'v',
7133
        '%77' => 'w',
7134
        '%78' => 'x',
7135
        '%79' => 'y',
7136
        '%7A' => 'z',
7137
        '%7B' => '{',
7138
        '%7C' => '|',
7139
        '%7D' => '}',
7140
        '%7E' => '~',
7141
        '%7F' => '',
7142
        '%80' => '€',
7143
        '%81' => '',
7144
        '%82' => '‚',
7145
        '%83' => 'ƒ',
7146
        '%84' => '„',
7147
        '%85' => '…',
7148
        '%86' => '†',
7149
        '%87' => '‡',
7150
        '%88' => 'ˆ',
7151
        '%89' => '‰',
7152
        '%8A' => 'Š',
7153
        '%8B' => '‹',
7154
        '%8C' => 'Œ',
7155
        '%8D' => '',
7156
        '%8E' => 'Ž',
7157
        '%8F' => '',
7158
        '%90' => '',
7159
        '%91' => '‘',
7160
        '%92' => '’',
7161
        '%93' => '“',
7162
        '%94' => '”',
7163
        '%95' => '•',
7164
        '%96' => '–',
7165
        '%97' => '—',
7166
        '%98' => '˜',
7167
        '%99' => '™',
7168
        '%9A' => 'š',
7169
        '%9B' => '›',
7170
        '%9C' => 'œ',
7171
        '%9D' => '',
7172
        '%9E' => 'ž',
7173
        '%9F' => 'Ÿ',
7174
        '%A0' => '',
7175
        '%A1' => '¡',
7176
        '%A2' => '¢',
7177
        '%A3' => '£',
7178
        '%A4' => '¤',
7179
        '%A5' => '¥',
7180
        '%A6' => '¦',
7181
        '%A7' => '§',
7182
        '%A8' => '¨',
7183
        '%A9' => '©',
7184
        '%AA' => 'ª',
7185
        '%AB' => '«',
7186
        '%AC' => '¬',
7187
        '%AD' => '',
7188
        '%AE' => '®',
7189
        '%AF' => '¯',
7190
        '%B0' => '°',
7191
        '%B1' => '±',
7192
        '%B2' => '²',
7193
        '%B3' => '³',
7194
        '%B4' => '´',
7195
        '%B5' => 'µ',
7196
        '%B6' => '¶',
7197
        '%B7' => '·',
7198
        '%B8' => '¸',
7199
        '%B9' => '¹',
7200
        '%BA' => 'º',
7201
        '%BB' => '»',
7202
        '%BC' => '¼',
7203
        '%BD' => '½',
7204
        '%BE' => '¾',
7205
        '%BF' => '¿',
7206
        '%C0' => 'À',
7207
        '%C1' => 'Á',
7208
        '%C2' => 'Â',
7209
        '%C3' => 'Ã',
7210
        '%C4' => 'Ä',
7211
        '%C5' => 'Å',
7212
        '%C6' => 'Æ',
7213
        '%C7' => 'Ç',
7214
        '%C8' => 'È',
7215
        '%C9' => 'É',
7216
        '%CA' => 'Ê',
7217
        '%CB' => 'Ë',
7218
        '%CC' => 'Ì',
7219
        '%CD' => 'Í',
7220
        '%CE' => 'Î',
7221
        '%CF' => 'Ï',
7222
        '%D0' => 'Ð',
7223
        '%D1' => 'Ñ',
7224
        '%D2' => 'Ò',
7225
        '%D3' => 'Ó',
7226
        '%D4' => 'Ô',
7227
        '%D5' => 'Õ',
7228
        '%D6' => 'Ö',
7229
        '%D7' => '×',
7230
        '%D8' => 'Ø',
7231
        '%D9' => 'Ù',
7232
        '%DA' => 'Ú',
7233
        '%DB' => 'Û',
7234
        '%DC' => 'Ü',
7235
        '%DD' => 'Ý',
7236
        '%DE' => 'Þ',
7237
        '%DF' => 'ß',
7238
        '%E0' => 'à',
7239
        '%E1' => 'á',
7240
        '%E2' => 'â',
7241
        '%E3' => 'ã',
7242
        '%E4' => 'ä',
7243
        '%E5' => 'å',
7244
        '%E6' => 'æ',
7245
        '%E7' => 'ç',
7246
        '%E8' => 'è',
7247
        '%E9' => 'é',
7248
        '%EA' => 'ê',
7249
        '%EB' => 'ë',
7250
        '%EC' => 'ì',
7251
        '%ED' => 'í',
7252
        '%EE' => 'î',
7253
        '%EF' => 'ï',
7254
        '%F0' => 'ð',
7255
        '%F1' => 'ñ',
7256
        '%F2' => 'ò',
7257
        '%F3' => 'ó',
7258
        '%F4' => 'ô',
7259
        '%F5' => 'õ',
7260
        '%F6' => 'ö',
7261
        '%F7' => '÷',
7262
        '%F8' => 'ø',
7263
        '%F9' => 'ù',
7264
        '%FA' => 'ú',
7265
        '%FB' => 'û',
7266
        '%FC' => 'ü',
7267
        '%FD' => 'ý',
7268
        '%FE' => 'þ',
7269
        '%FF' => 'ÿ',
7270
    );
7271
  }
7272
7273
  /**
   * Decodes a UTF-8 string to ISO-8859-1.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function utf8_decode($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $str = (string)self::to_utf8($str);

    static $UTF8_TO_WIN1252_KEYS_CACHE = null;
    static $UTF8_TO_WIN1252_VALUES_CACHE = null;

    if ($UTF8_TO_WIN1252_KEYS_CACHE === null) {
      $UTF8_TO_WIN1252_KEYS_CACHE = array_keys(self::$UTF8_TO_WIN1252);
      $UTF8_TO_WIN1252_VALUES_CACHE = array_values(self::$UTF8_TO_WIN1252);
    }

    /** @noinspection PhpInternalEntityUsedInspection */
    $str = str_replace($UTF8_TO_WIN1252_KEYS_CACHE, $UTF8_TO_WIN1252_VALUES_CACHE, $str);

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['mbstring_func_overload'] === true) {
      $len = \mb_strlen($str, '8BIT');
    } else {
      $len = strlen($str);
    }

    // Rewrite the string in place: $i reads UTF-8 bytes, $j writes single-byte output.
    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
      switch ($str[$i] & "\xF0") {
        case "\xC0":
        case "\xD0":
          // 2-byte sequence: combine the payload bits of the lead and continuation byte.
          $c = (self::ord($str[$i] & "\x1F") << 6) | self::ord($str[++$i] & "\x3F");
          $str[$j] = $c < 256 ? chr($c) : '?';
          break;

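        case "\xF0":
          // 4-byte sequence: skip one extra byte, then
          // intentionally fall through to the 3-byte handling.
          ++$i;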
        case "\xE0":
          $str[$j] = '?';
          $i += 2;
          break;

        default:
          $str[$j] = $str[$i];
      }
    }

    return self::substr($str, 0, $j, '8BIT');
  }

  /**
   * Encodes an ISO-8859-1 string to UTF-8.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function utf8_encode($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $str = \utf8_encode($str);

    if (false === strpos($str, "\xC2")) {
      return $str;
    }

    static $CP1252_TO_UTF8_KEYS_CACHE = null;
    static $CP1252_TO_UTF8_VALUES_CACHE = null;

    if ($CP1252_TO_UTF8_KEYS_CACHE === null) {
      $CP1252_TO_UTF8_KEYS_CACHE = array_keys(self::$CP1252_TO_UTF8);
      $CP1252_TO_UTF8_VALUES_CACHE = array_values(self::$CP1252_TO_UTF8);
    }

    return str_replace($CP1252_TO_UTF8_KEYS_CACHE, $CP1252_TO_UTF8_VALUES_CACHE, $str);
  }

  /**
   * Fixes Windows-1252 characters in a UTF-8 string.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   *
   * @deprecated use "UTF8::fix_simple_utf8()"
   */
  public static function utf8_fix_win1252_chars($str)
  {
    return self::fix_simple_utf8($str);
  }

  /**
   * Returns an array with all UTF-8 whitespace characters.
   *
   * @see   : http://www.bogofilter.org/pipermail/bogofilter/2003-March/001889.html
   *
   * @author: Derek E. [email protected]
   *
   * @return array <p>
   *               An array with all known whitespace characters as values and the type of whitespace as keys,
   *               as defined at the URL above.
   *               </p>
   */
  public static function whitespace_table()
  {
    return self::$WHITESPACE_TABLE;
  }

  /**
   * Limit the number of words in a string.
   *
   * @param string $str      <p>The input string.</p>
   * @param int    $words    <p>The maximum number of words.</p>
   * @param string $strAddOn <p>Replacement for the stripped part of the string.</p>
   *
   * @return string
   */
  public static function words_limit($str, $words = 100, $strAddOn = '...')
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $words = (int)$words;

    if ($words < 1) {
      return '';
    }

    // Match leading whitespace plus at most $words runs of non-whitespace.
    preg_match('/^\s*+(?:\S++\s*+){1,' . $words . '}/u', $str, $matches);

    if (
        !isset($matches[0])
        ||
        self::strlen($str) === self::strlen($matches[0])
    ) {
      // Nothing was cut off, so no add-on is needed.
      return $str;
    }

    return self::rtrim($matches[0]) . $strAddOn;
  }

  /**
   * Wraps a string to a given number of characters.
   *
   * @link  http://php.net/manual/en/function.wordwrap.php
   *
   * @param string $str   <p>The input string.</p>
   * @param int    $width [optional] <p>The column width.</p>
   * @param string $break [optional] <p>The line is broken using the optional break parameter.</p>
   * @param bool   $cut   [optional] <p>
   *                      If the cut is set to true, the string is
   *                      always wrapped at or before the specified width. So if you have
   *                      a word that is larger than the given width, it is broken apart.
   *                      </p>
   *
   * @return string <p>The given string wrapped at the specified column.</p>
   */
  public static function wordwrap($str, $width = 75, $break = "\n", $cut = false)
  {
    $str = (string)$str;
    $break = (string)$break;

    if (!isset($str[0], $break[0])) {
      return '';
    }

    // $w is a single-byte mask of the input: '#' for an existing break,
    // ' ' for a space and '?' for any other character.
    $w = '';
    $strSplit = explode($break, $str);
    $count = count($strSplit);

    $chars = array();
    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0; $i < $count; ++$i) {

      if ($i) {
        $chars[] = $break;
        $w .= '#';
      }

      $c = $strSplit[$i];
      unset($strSplit[$i]);

      foreach (self::split($c) as $c) {
        $chars[] = $c;
        $w .= ' ' === $c ? ' ' : '?';
      }
    }

    $strReturn = '';
    $j = 0;
    $b = $i = -1;
    // Let the native wordwrap() decide the break positions on the mask.
    $w = wordwrap($w, $width, '#', $cut);

    // Map every '#' in the wrapped mask back onto the original characters.
    while (false !== $b = self::strpos($w, '#', $b + 1)) {
      for (++$i; $i < $b; ++$i) {
        $strReturn .= $chars[$j];
        unset($chars[$j++]);
      }

      if ($break === $chars[$j] || ' ' === $chars[$j]) {
        unset($chars[$j++]);
      }

      $strReturn .= $break;
    }

    return $strReturn . implode('', $chars);
  }

  /**
   * Returns an array of Unicode White Space characters.
   *
   * @return array <p>An array with numeric code point as key and White Space Character as value.</p>
   */
  public static function ws()
  {
    return self::$WHITESPACE;
  }

}
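
The word-limit logic in `words_limit()` can be tried in isolation. The following is a minimal standalone sketch, not the library's own code: `words_limit_sketch()` is a hypothetical name, and it assumes that PHP's native `strlen()` and `rtrim()` are acceptable stand-ins for `self::strlen()` and `self::rtrim()`. Comparing byte lengths is enough here because the match is always a prefix of the input.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone sketch of the word-limit technique:
 * match at most $words runs of non-whitespace and append $strAddOn
 * only when part of the input was cut off.
 */
function words_limit_sketch(string $str, int $words = 100, string $strAddOn = '...'): string
{
    if ($str === '' || $words < 1) {
        return '';
    }

    // Possessive quantifiers (*+, ++) prevent backtracking;
    // the /u modifier treats pattern and subject as UTF-8.
    preg_match('/^\s*+(?:\S++\s*+){1,' . $words . '}/u', $str, $matches);

    // The match is a prefix of $str, so equal byte lengths mean nothing was cut.
    if (!isset($matches[0]) || strlen($str) === strlen($matches[0])) {
        return $str;
    }

    // Native rtrim() only strips ASCII whitespace (assumption: sufficient for this sketch).
    return rtrim($matches[0]) . $strAddOn;
}

echo words_limit_sketch('fòô bàř fòô', 2, ''), "\n";
echo words_limit_sketch('one two', 5), "\n";
```

The add-on is appended only when the regular expression consumed less than the whole input, which is why a string that already fits within the limit is returned unchanged.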