Completed: push to master (4be45f...851ea9) by Lars, 02:55 created

UTF8 (rating: D)

Complexity

Total Complexity 904

Size/Duplication

Total Lines 7362
Duplicated Lines 10.91 %

Coupling/Cohesion

Components 2
Dependencies 3

Test Coverage

Coverage 85.33%

Importance

Changes 0
Metric Value
wmc 904
lcom 2
cbo 3
dl 803
loc 7362
ccs 1570
cts 1840
cp 0.8533
rs 4.4102
c 0
b 0
f 0
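As a cross-check, the two percentages in the summary can be reproduced from the raw rows of the metric table: 803 / 7362 ≈ 10.91 % and 1570 / 1840 ≈ 85.33 %. The short PHP sketch below does the arithmetic. It assumes that `dl`/`loc` mean duplicated/total lines and `ccs`/`cts` mean covered/total statements; that reading is inferred from the matching numbers, not taken from Scrutinizer's documentation. `share()` is an illustrative helper.

```php
<?php
declare(strict_types=1);

// Raw values copied from the metric table above. Assumed meanings:
// dl = duplicated lines, loc = total lines,
// ccs = covered statements, cts = total statements.
$metrics = ['dl' => 803, 'loc' => 7362, 'ccs' => 1570, 'cts' => 1840];

// Share of $part in $whole, rounded to four decimals like the "cp" row.
function share(int $part, int $whole): float
{
    return round($part / $whole, 4);
}

$duplication = share($metrics['dl'], $metrics['loc']);
$coverage    = share($metrics['ccs'], $metrics['cts']);

printf("Duplicated Lines %.2f %%\n", $duplication * 100); // Duplicated Lines 10.91 %
printf("Coverage %.2f%%\n", $coverage * 100);             // Coverage 85.33%
```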

171 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 4 1
A access() 0 15 3
A add_bom_to_string() 0 8 2
A binary_to_str() 0 8 2
A bom() 0 4 1
A callback() 0 4 1
A checkForSupport() 0 22 2
C chr() 0 50 10
A chr_and_parse_int() 0 4 1
A chr_map() 0 6 1
A chr_size_list() 0 10 2
B chr_to_decimal() 0 32 6
A chr_to_hex() 0 14 3
A chr_to_int() 0 4 1
A chunk_split() 0 4 1
B clean() 0 35 4
A cleanup() 0 20 2
B codepoints() 0 26 3
A count_chars() 0 4 1
A decimal_to_chr() 0 10 2
C encode() 9 77 20
C file_get_contents() 0 43 8
A file_has_bom() 0 4 1
C filter() 11 54 13
A filter_input() 10 10 2
A filter_input_array() 10 10 2
A filter_var() 10 10 2
A filter_var_array() 10 10 2
A fits_inside() 0 4 1
A fix_simple_utf8() 19 19 3
B fix_utf8() 0 24 4
D getCharDirection() 0 112 119
A getData() 0 10 2
A hasBom() 0 4 1
A hex_to_chr() 0 4 1
A hex_to_int() 0 14 3
A html_decode() 0 4 1
B html_encode() 0 38 5
C html_entity_decode() 0 65 12
B htmlentities() 0 28 6
A htmlspecialchars() 0 8 2
A iconv_loaded() 0 15 3
A int_to_chr() 0 4 1
A int_to_hex() 0 12 3
A intlChar_loaded() 0 8 2
A intl_loaded() 0 4 2
A isAscii() 0 4 1
A isBase64() 0 4 1
A isBinary() 0 4 1
A isBom() 0 4 1
A isHtml() 0 4 1
A isJson() 0 4 1
A isUtf16() 0 4 1
A isUtf32() 0 4 1
A isUtf8() 0 4 1
A is_ascii() 0 10 2
A is_base64() 0 15 4
B is_binary() 0 23 6
A is_binary_file() 0 12 2
A is_bom() 0 10 3
A is_html() 0 19 3
B is_json() 0 24 5
C is_utf16() 48 48 12
C is_utf32() 48 48 12
D is_utf8() 21 134 25
A json_decode() 12 12 2
A json_encode() 12 12 2
A lcfirst() 0 4 1
A ltrim() 15 15 4
A max() 8 8 2
A max_chr_width() 0 9 2
A mbstring_loaded() 0 10 3
A min() 8 8 2
A normalizeEncoding() 0 4 1
B normalize_encoding() 0 49 6
A normalize_msword() 19 19 3
B normalize_whitespace() 0 36 6
B number_format() 0 26 3
D ord() 0 49 14
A parse_str() 0 13 4
A pcre_utf8_support() 0 5 1
D range() 14 38 9
B rawurldecode() 31 31 6
A removeBOM() 0 4 1
A remove_bom() 0 16 4
A remove_duplicates() 0 15 4
A remove_invisible_characters() 0 20 3
B replace_diamond_question_mark() 0 41 6
A rtrim() 15 15 4
C rxClass() 0 40 8
A showSupport() 0 10 3
B single_chr_html_encode() 0 23 5
D split() 12 107 23
C str_detect_encoding() 0 82 11
A str_ends_with() 15 15 3
A str_iends_with() 15 15 3
A str_ireplace() 0 18 3
A str_istarts_with() 15 15 3
B str_limit_after_word() 0 31 5
C str_pad() 9 41 7
A str_repeat() 0 6 1
A str_replace() 0 4 1
A str_replace_first() 0 10 2
A str_shuffle() 0 8 1
A str_sort() 0 16 3
B str_split() 0 36 6
A str_starts_with() 15 15 3
A str_to_binary() 0 8 1
A str_to_words() 0 12 2
A str_transliterate() 0 4 1
B str_word_count() 0 30 5
A strcasecmp() 0 4 1
A strchr() 0 4 1
A strcmp() 0 8 2
B strcspn() 0 22 6
A strichr() 0 4 1
A string() 0 13 1
A string_has_bom() 0 10 3
A strip_tags() 0 14 3
D stripos() 9 44 10
C stristr() 7 53 11
F strlen() 18 84 22
A strnatcasecmp() 0 4 1
A strnatcmp() 0 4 2
A strncasecmp() 0 4 1
A strncmp() 0 7 1
A strpbrk() 0 15 3
F strpos() 24 105 25
A strrchr() 16 16 3
A strrev() 0 10 2
A strrichr() 15 15 3
C strripos() 32 63 15
F strrpos() 38 77 19
B strspn() 0 17 5
D strstr() 7 58 13
B strtocasefold() 0 37 6
A strtolower() 21 21 4
A strtonatfold() 0 5 1
A strtoupper() 20 20 4
A strtr() 0 19 4
A strwidth() 0 15 3
F substr() 16 76 18
A substr_compare() 0 7 2
C substr_count() 7 56 12
A substr_ileft() 19 20 4
A substr_iright() 19 20 4
A substr_left() 19 20 4
C substr_replace() 20 73 15
A substr_right() 19 19 4
B swapCase() 0 34 5
A toAscii() 0 4 1
A toIso8859() 0 4 1
A toLatin1() 0 4 1
A toUTF8() 0 4 1
F to_ascii() 6 145 27
B to_iso8859() 0 22 4
A to_latin1() 0 4 1
D to_utf8() 29 113 32
A trim() 0 15 4
A ucfirst() 0 4 1
A ucword() 0 4 1
C ucwords() 0 38 8
B urldecode() 31 31 6
B urldecode_fix_win1252_chars() 0 229 1
A utf8_decode() 0 22 3
B utf8_encode() 0 26 4
A utf8_fix_win1252_chars() 0 4 1
A whitespace_table() 0 4 1
B words_limit() 0 26 5
C wordwrap() 0 51 10
A ws() 0 4 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule is to restructure code once it is duplicated in three or more places.
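The method table shows several near-identical groups: `str_starts_with()`, `str_ends_with()`, `str_istarts_with()` and `str_iends_with()` each carry 15 duplicated lines, and `substr_left()`, `substr_right()`, `substr_ileft()` and `substr_iright()` carry 19 each. A common remedy is Extract Method: move the shared logic into one helper and turn the public functions into thin wrappers. The sketch below is hypothetical and not taken from UTF8.php; `remove_affix()`, `remove_prefix()` and `remove_suffix()` are illustrative names, and only case-sensitive matching is shown so that no mbstring extension is needed.

```php
<?php
declare(strict_types=1);

// Hypothetical shared helper (not from UTF8.php): strips $affix from the
// start or the end of $str. A byte-wise comparison is safe for valid UTF-8,
// because a valid UTF-8 affix can only match on a character boundary.
function remove_affix(string $str, string $affix, bool $atStart): string
{
    $len = strlen($affix);
    if ($len === 0 || $len > strlen($str)) {
        return $str;
    }

    if ($atStart) {
        return strncmp($str, $affix, $len) === 0 ? substr($str, $len) : $str;
    }

    $rest = strlen($str) - $len;
    return substr($str, $rest) === $affix ? substr($str, 0, $rest) : $str;
}

// The formerly duplicated public functions become one-line wrappers.
function remove_prefix(string $str, string $prefix): string
{
    return remove_affix($str, $prefix, true);
}

function remove_suffix(string $str, string $suffix): string
{
    return remove_affix($str, $suffix, false);
}

echo remove_prefix('ÄpfelBäume', 'Äpfel'), "\n"; // Bäume
echo remove_suffix('ÄpfelBäume', 'Bäume'), "\n"; // Äpfel
```

Case-insensitive variants could pass a flag into the same helper instead of copying the body a third and fourth time.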

Common duplication problems and their corresponding solutions are:

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like UTF8 often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
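The prefix heuristic is easy to automate. This sketch groups a sample of method names from the table above by the text before their first underscore. `group_by_prefix()` is a hypothetical helper; a real analysis would feed it the full list of 171 methods, for example via PHP's `get_class_methods()`.

```php
<?php
declare(strict_types=1);

// A sample of method names taken from the method table above.
$methods = [
    'is_ascii', 'is_base64', 'is_binary', 'is_html', 'is_json', 'is_utf8',
    'filter_input', 'filter_var', 'filter_input_array', 'filter_var_array',
    'substr_left', 'substr_right', 'substr_ileft', 'substr_iright',
    'to_ascii', 'to_latin1', 'to_utf8', 'access', 'bom',
];

/**
 * Groups names by their prefix up to the first underscore; names without
 * an underscore land in the '' bucket. Keys are sorted alphabetically.
 *
 * @param string[] $names
 * @return array<string, string[]>
 */
function group_by_prefix(array $names): array
{
    $groups = [];
    foreach ($names as $name) {
        $pos = strpos($name, '_');
        $prefix = $pos === false ? '' : substr($name, 0, $pos);
        $groups[$prefix][] = $name;
    }
    ksort($groups);
    return $groups;
}

foreach (group_by_prefix($methods) as $prefix => $members) {
    printf("%-8s %d: %s\n", $prefix === '' ? '(none)' : $prefix, count($members), implode(', ', $members));
}
```

Buckets such as `is`, `filter` and `substr` are candidate components; names without an underscore still need a manual look.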

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
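As an illustration of Extract Class, the whitespace data could move into its own small class while the original class keeps its public method and simply delegates. `Whitespace` and `Utf8Facade` are hypothetical names, the table is a subset of the `$WHITESPACE_TABLE` entries in the listing, and `normalize()` is a simplified stand-in, not the library's actual `normalize_whitespace()` logic.

```php
<?php
declare(strict_types=1);

// Hypothetical extracted class: whitespace data and the logic that uses it.
final class Whitespace
{
    /** Subset of UTF8::$WHITESPACE_TABLE (name => UTF-8 byte sequence). */
    private const TABLE = [
        'SPACE'             => "\x20",
        'NO-BREAK SPACE'    => "\xc2\xa0",
        'EN SPACE'          => "\xe2\x80\x82",
        'EM SPACE'          => "\xe2\x80\x83",
        'IDEOGRAPHIC SPACE' => "\xe3\x80\x80",
    ];

    /** @return array<string, string> */
    public static function table(): array
    {
        return self::TABLE;
    }

    // Simplified: replaces every listed whitespace character with an ASCII space.
    public static function normalize(string $str): string
    {
        return str_replace(array_values(self::TABLE), ' ', $str);
    }
}

// The remaining class keeps its public method and delegates (hypothetical name).
final class Utf8Facade
{
    public static function normalize_whitespace(string $str): string
    {
        return Whitespace::normalize($str);
    }
}

echo Utf8Facade::normalize_whitespace("a\xc2\xa0b\xe3\x80\x80c"), "\n"; // a b c
```

Keeping the delegating method means existing callers do not change while the data and logic gain a single, testable home.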

While breaking up the class, it is a good idea to analyze how other classes use UTF8 and, based on these observations, apply Extract Interface too.
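For Extract Interface, one option is to collect methods that callers tend to use together, such as the BOM helpers (`bom()`, `string_has_bom()`, `add_bom_to_string()`, `remove_bom()`), into an interface. The sketch below is hypothetical: `BomHandling`, `Utf8Bom` and `with_bom()` are invented names, and only the UTF-8 BOM is handled.

```php
<?php
declare(strict_types=1);

// Hypothetical interface grouping the BOM-related operations.
interface BomHandling
{
    public function hasBom(string $str): bool;

    public function addBom(string $str): string;

    public function removeBom(string $str): string;
}

// Hypothetical implementation for the UTF-8 BOM only.
final class Utf8Bom implements BomHandling
{
    private const BOM = "\xef\xbb\xbf";

    public function hasBom(string $str): bool
    {
        return strncmp($str, self::BOM, 3) === 0;
    }

    public function addBom(string $str): string
    {
        // Like add_bom_to_string(): a string that already has a BOM is returned as-is.
        return $this->hasBom($str) ? $str : self::BOM . $str;
    }

    public function removeBom(string $str): string
    {
        return $this->hasBom($str) ? substr($str, 3) : $str;
    }
}

// Client code depends only on the interface, not on the whole UTF8 class.
function with_bom(BomHandling $handler, string $text): string
{
    return $handler->addBom($text);
}

echo bin2hex(with_bom(new Utf8Bom(), 'a')), "\n"; // efbbbf61
```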

<?php

declare(strict_types=1);

namespace voku\helper;

use Symfony\Polyfill\Intl\Grapheme\Grapheme;
use Symfony\Polyfill\Xml\Xml;

/**
 * UTF8-Helper-Class
 *
 * @package voku\helper
 */
final class UTF8
{
  /**
   * @var array
   */
  private static $WIN1252_TO_UTF8 = array(
      128 => "\xe2\x82\xac", // EURO SIGN
      130 => "\xe2\x80\x9a", // SINGLE LOW-9 QUOTATION MARK
      131 => "\xc6\x92", // LATIN SMALL LETTER F WITH HOOK
      132 => "\xe2\x80\x9e", // DOUBLE LOW-9 QUOTATION MARK
      133 => "\xe2\x80\xa6", // HORIZONTAL ELLIPSIS
      134 => "\xe2\x80\xa0", // DAGGER
      135 => "\xe2\x80\xa1", // DOUBLE DAGGER
      136 => "\xcb\x86", // MODIFIER LETTER CIRCUMFLEX ACCENT
      137 => "\xe2\x80\xb0", // PER MILLE SIGN
      138 => "\xc5\xa0", // LATIN CAPITAL LETTER S WITH CARON
      139 => "\xe2\x80\xb9", // SINGLE LEFT-POINTING ANGLE QUOTE
      140 => "\xc5\x92", // LATIN CAPITAL LIGATURE OE
      142 => "\xc5\xbd", // LATIN CAPITAL LETTER Z WITH CARON
      145 => "\xe2\x80\x98", // LEFT SINGLE QUOTATION MARK
      146 => "\xe2\x80\x99", // RIGHT SINGLE QUOTATION MARK
      147 => "\xe2\x80\x9c", // LEFT DOUBLE QUOTATION MARK
      148 => "\xe2\x80\x9d", // RIGHT DOUBLE QUOTATION MARK
      149 => "\xe2\x80\xa2", // BULLET
      150 => "\xe2\x80\x93", // EN DASH
      151 => "\xe2\x80\x94", // EM DASH
      152 => "\xcb\x9c", // SMALL TILDE
      153 => "\xe2\x84\xa2", // TRADE MARK SIGN
      154 => "\xc5\xa1", // LATIN SMALL LETTER S WITH CARON
      155 => "\xe2\x80\xba", // SINGLE RIGHT-POINTING ANGLE QUOTE
      156 => "\xc5\x93", // LATIN SMALL LIGATURE OE
      158 => "\xc5\xbe", // LATIN SMALL LETTER Z WITH CARON
      159 => "\xc5\xb8", // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );

  /**
   * @var array
   */
  private static $CP1252_TO_UTF8 = array(
      "\x80" => '€',
      "\x82" => '‚',
      "\x83" => 'ƒ',
      "\x84" => '„',
      "\x85" => '…',
      "\x86" => '†',
      "\x87" => '‡',
      "\x88" => 'ˆ',
      "\x89" => '‰',
      "\x8a" => 'Š',
      "\x8b" => '‹',
      "\x8c" => 'Œ',
      "\x8e" => 'Ž',
      "\x91" => '‘',
      "\x92" => '’',
      "\x93" => '“',
      "\x94" => '”',
      "\x95" => '•',
      "\x96" => '–',
      "\x97" => '—',
      "\x98" => '˜',
      "\x99" => '™',
      "\x9a" => 'š',
      "\x9b" => '›',
      "\x9c" => 'œ',
      "\x9e" => 'ž',
      "\x9f" => 'Ÿ',
  );

  /**
   * Bom => Byte-Length
   *
   * INFO: https://en.wikipedia.org/wiki/Byte_order_mark
   *
   * @var array
   */
  private static $BOM = array(
      "\xef\xbb\xbf"     => 3, // UTF-8 BOM
      'ï»¿'              => 6, // UTF-8 BOM as "WINDOWS-1252" (one char has [maybe] more than one byte ...)
      "\x00\x00\xfe\xff" => 4, // UTF-32 (BE) BOM
      '  þÿ'             => 6, // UTF-32 (BE) BOM as "WINDOWS-1252"
Unused Code Comprehensibility introduced by
36% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.
      "\xff\xfe\x00\x00" => 4, // UTF-32 (LE) BOM
      'ÿþ  '             => 6, // UTF-32 (LE) BOM as "WINDOWS-1252"
      "\xfe\xff"         => 2, // UTF-16 (BE) BOM
      'þÿ'               => 4, // UTF-16 (BE) BOM as "WINDOWS-1252"
      "\xff\xfe"         => 2, // UTF-16 (LE) BOM
      'ÿþ'               => 4, // UTF-16 (LE) BOM as "WINDOWS-1252"
  );

  /**
   * Numeric code point => UTF-8 Character
   *
   * url: http://www.w3schools.com/charsets/ref_utf_punctuation.asp
   *
   * @var array
   */
  private static $WHITESPACE = array(
    // NUL Byte
    0     => "\x0",
    // Tab
    9     => "\x9",
    // New Line
    10    => "\xa",
    // Vertical Tab
    11    => "\xb",
    // Carriage Return
    13    => "\xd",
    // Ordinary Space
    32    => "\x20",
    // NO-BREAK SPACE
    160   => "\xc2\xa0",
    // OGHAM SPACE MARK
    5760  => "\xe1\x9a\x80",
    // MONGOLIAN VOWEL SEPARATOR
    6158  => "\xe1\xa0\x8e",
    // EN QUAD
    8192  => "\xe2\x80\x80",
    // EM QUAD
    8193  => "\xe2\x80\x81",
    // EN SPACE
    8194  => "\xe2\x80\x82",
    // EM SPACE
    8195  => "\xe2\x80\x83",
    // THREE-PER-EM SPACE
    8196  => "\xe2\x80\x84",
    // FOUR-PER-EM SPACE
    8197  => "\xe2\x80\x85",
    // SIX-PER-EM SPACE
    8198  => "\xe2\x80\x86",
    // FIGURE SPACE
    8199  => "\xe2\x80\x87",
    // PUNCTUATION SPACE
    8200  => "\xe2\x80\x88",
    // THIN SPACE
    8201  => "\xe2\x80\x89",
    // HAIR SPACE
    8202  => "\xe2\x80\x8a",
    // LINE SEPARATOR
    8232  => "\xe2\x80\xa8",
    // PARAGRAPH SEPARATOR
    8233  => "\xe2\x80\xa9",
    // NARROW NO-BREAK SPACE
    8239  => "\xe2\x80\xaf",
    // MEDIUM MATHEMATICAL SPACE
    8287  => "\xe2\x81\x9f",
    // IDEOGRAPHIC SPACE
    12288 => "\xe3\x80\x80",
  );

  /**
   * @var array
   */
  private static $WHITESPACE_TABLE = array(
      'SPACE'                     => "\x20",
      'NO-BREAK SPACE'            => "\xc2\xa0",
      'OGHAM SPACE MARK'          => "\xe1\x9a\x80",
      'EN QUAD'                   => "\xe2\x80\x80",
      'EM QUAD'                   => "\xe2\x80\x81",
      'EN SPACE'                  => "\xe2\x80\x82",
      'EM SPACE'                  => "\xe2\x80\x83",
      'THREE-PER-EM SPACE'        => "\xe2\x80\x84",
      'FOUR-PER-EM SPACE'         => "\xe2\x80\x85",
      'SIX-PER-EM SPACE'          => "\xe2\x80\x86",
      'FIGURE SPACE'              => "\xe2\x80\x87",
      'PUNCTUATION SPACE'         => "\xe2\x80\x88",
      'THIN SPACE'                => "\xe2\x80\x89",
      'HAIR SPACE'                => "\xe2\x80\x8a",
      'LINE SEPARATOR'            => "\xe2\x80\xa8",
      'PARAGRAPH SEPARATOR'       => "\xe2\x80\xa9",
      'ZERO WIDTH SPACE'          => "\xe2\x80\x8b",
      'NARROW NO-BREAK SPACE'     => "\xe2\x80\xaf",
      'MEDIUM MATHEMATICAL SPACE' => "\xe2\x81\x9f",
      'IDEOGRAPHIC SPACE'         => "\xe3\x80\x80",
  );

  /**
   * bidirectional text chars
   *
   * url: https://www.w3.org/International/questions/qa-bidi-unicode-controls
   *
   * @var array
   */
  private static $BIDI_UNI_CODE_CONTROLS_TABLE = array(
    // LEFT-TO-RIGHT EMBEDDING (use -> dir = "ltr")
    8234 => "\xE2\x80\xAA",
    // RIGHT-TO-LEFT EMBEDDING (use -> dir = "rtl")
    8235 => "\xE2\x80\xAB",
    // POP DIRECTIONAL FORMATTING // (use -> </bdo>)
    8236 => "\xE2\x80\xAC",
    // LEFT-TO-RIGHT OVERRIDE // (use -> <bdo dir = "ltr">)
    8237 => "\xE2\x80\xAD",
    // RIGHT-TO-LEFT OVERRIDE // (use -> <bdo dir = "rtl">)
    8238 => "\xE2\x80\xAE",
    // LEFT-TO-RIGHT ISOLATE // (use -> dir = "ltr")
    8294 => "\xE2\x81\xA6",
    // RIGHT-TO-LEFT ISOLATE // (use -> dir = "rtl")
    8295 => "\xE2\x81\xA7",
    // FIRST STRONG ISOLATE // (use -> dir = "auto")
    8296 => "\xE2\x81\xA8",
    // POP DIRECTIONAL ISOLATE
    8297 => "\xE2\x81\xA9",
  );

  /**
   * @var array
   */
  private static $COMMON_CASE_FOLD = array(
      'ſ'            => 's',
      "\xCD\x85"     => 'ι',
      'ς'            => 'σ',
      "\xCF\x90"     => 'β',
      "\xCF\x91"     => 'θ',
      "\xCF\x95"     => 'φ',
      "\xCF\x96"     => 'π',
      "\xCF\xB0"     => 'κ',
      "\xCF\xB1"     => 'ρ',
      "\xCF\xB5"     => 'ε',
      "\xE1\xBA\x9B" => "\xE1\xB9\xA1",
      "\xE1\xBE\xBE" => 'ι',
  );

  /**
   * @var array
   */
  private static $BROKEN_UTF8_FIX = array(
      "\xc2\x80" => "\xe2\x82\xac", // EURO SIGN
      "\xc2\x82" => "\xe2\x80\x9a", // SINGLE LOW-9 QUOTATION MARK
      "\xc2\x83" => "\xc6\x92", // LATIN SMALL LETTER F WITH HOOK
      "\xc2\x84" => "\xe2\x80\x9e", // DOUBLE LOW-9 QUOTATION MARK
      "\xc2\x85" => "\xe2\x80\xa6", // HORIZONTAL ELLIPSIS
      "\xc2\x86" => "\xe2\x80\xa0", // DAGGER
      "\xc2\x87" => "\xe2\x80\xa1", // DOUBLE DAGGER
      "\xc2\x88" => "\xcb\x86", // MODIFIER LETTER CIRCUMFLEX ACCENT
      "\xc2\x89" => "\xe2\x80\xb0", // PER MILLE SIGN
      "\xc2\x8a" => "\xc5\xa0", // LATIN CAPITAL LETTER S WITH CARON
      "\xc2\x8b" => "\xe2\x80\xb9", // SINGLE LEFT-POINTING ANGLE QUOTE
      "\xc2\x8c" => "\xc5\x92", // LATIN CAPITAL LIGATURE OE
      "\xc2\x8e" => "\xc5\xbd", // LATIN CAPITAL LETTER Z WITH CARON
      "\xc2\x91" => "\xe2\x80\x98", // LEFT SINGLE QUOTATION MARK
      "\xc2\x92" => "\xe2\x80\x99", // RIGHT SINGLE QUOTATION MARK
      "\xc2\x93" => "\xe2\x80\x9c", // LEFT DOUBLE QUOTATION MARK
      "\xc2\x94" => "\xe2\x80\x9d", // RIGHT DOUBLE QUOTATION MARK
      "\xc2\x95" => "\xe2\x80\xa2", // BULLET
      "\xc2\x96" => "\xe2\x80\x93", // EN DASH
      "\xc2\x97" => "\xe2\x80\x94", // EM DASH
      "\xc2\x98" => "\xcb\x9c", // SMALL TILDE
      "\xc2\x99" => "\xe2\x84\xa2", // TRADE MARK SIGN
      "\xc2\x9a" => "\xc5\xa1", // LATIN SMALL LETTER S WITH CARON
      "\xc2\x9b" => "\xe2\x80\xba", // SINGLE RIGHT-POINTING ANGLE QUOTE
      "\xc2\x9c" => "\xc5\x93", // LATIN SMALL LIGATURE OE
      "\xc2\x9e" => "\xc5\xbe", // LATIN SMALL LETTER Z WITH CARON
      "\xc2\x9f" => "\xc5\xb8", // LATIN CAPITAL LETTER Y WITH DIAERESIS
      'Ã¼'      => 'ü',
      'Ã¤'      => 'ä',
      'Ã¶'      => 'ö',
      'Ã–'      => 'Ö',
      'ÃŸ'      => 'ß',
      "\xc3\x83\xc2\xa0" => 'à',
      'Ã¡'      => 'á',
      'Ã¢'      => 'â',
      'Ã£'      => 'ã',
      'Ã¹'      => 'ù',
      'Ãº'      => 'ú',
      'Ã»'      => 'û',
      'Ã™'      => 'Ù',
      'Ãš'      => 'Ú',
      'Ã›'      => 'Û',
      'Ãœ'      => 'Ü',
      'Ã²'      => 'ò',
      'Ã³'      => 'ó',
      'Ã´'      => 'ô',
      'Ã¨'      => 'è',
      'Ã©'      => 'é',
      'Ãª'      => 'ê',
      'Ã«'      => 'ë',
      'Ã€'      => 'À',
      "\xc3\x83\xc2\x81" => 'Á',
      'Ã‚'      => 'Â',
      'Ãƒ'      => 'Ã',
      'Ã„'      => 'Ä',
      'Ã…'      => 'Å',
      'Ã‡'      => 'Ç',
      'Ãˆ'      => 'È',
      'Ã‰'      => 'É',
      'ÃŠ'      => 'Ê',
      'Ã‹'      => 'Ë',
      'ÃŒ'      => 'Ì',
      "\xc3\x83\xc2\x8d" => 'Í',
      'ÃŽ'      => 'Î',
      "\xc3\x83\xc2\x8f" => 'Ï',
      'Ã‘'      => 'Ñ',
      'Ã’'      => 'Ò',
      'Ã“'      => 'Ó',
      'Ã”'      => 'Ô',
      'Ã•'      => 'Õ',
      'Ã˜'      => 'Ø',
      'Ã¥'      => 'å',
      'Ã¦'      => 'æ',
      'Ã§'      => 'ç',
      'Ã¬'      => 'ì',
      "\xc3\x83\xc2\xad" => 'í',
      'Ã®'      => 'î',
      'Ã¯'      => 'ï',
      'Ã°'      => 'ð',
      'Ã±'      => 'ñ',
      'Ãµ'      => 'õ',
      'Ã¸'      => 'ø',
      'Ã½'      => 'ý',
      'Ã¿'      => 'ÿ',
      'â‚¬'     => '€',
      'â€™'     => '’',
  );

  /**
   * @var array
   */
  private static $UTF8_TO_WIN1252 = array(
      "\xe2\x82\xac" => "\x80", // EURO SIGN
      "\xe2\x80\x9a" => "\x82", // SINGLE LOW-9 QUOTATION MARK
      "\xc6\x92"     => "\x83", // LATIN SMALL LETTER F WITH HOOK
      "\xe2\x80\x9e" => "\x84", // DOUBLE LOW-9 QUOTATION MARK
      "\xe2\x80\xa6" => "\x85", // HORIZONTAL ELLIPSIS
      "\xe2\x80\xa0" => "\x86", // DAGGER
      "\xe2\x80\xa1" => "\x87", // DOUBLE DAGGER
      "\xcb\x86"     => "\x88", // MODIFIER LETTER CIRCUMFLEX ACCENT
      "\xe2\x80\xb0" => "\x89", // PER MILLE SIGN
      "\xc5\xa0"     => "\x8a", // LATIN CAPITAL LETTER S WITH CARON
      "\xe2\x80\xb9" => "\x8b", // SINGLE LEFT-POINTING ANGLE QUOTE
      "\xc5\x92"     => "\x8c", // LATIN CAPITAL LIGATURE OE
      "\xc5\xbd"     => "\x8e", // LATIN CAPITAL LETTER Z WITH CARON
      "\xe2\x80\x98" => "\x91", // LEFT SINGLE QUOTATION MARK
      "\xe2\x80\x99" => "\x92", // RIGHT SINGLE QUOTATION MARK
      "\xe2\x80\x9c" => "\x93", // LEFT DOUBLE QUOTATION MARK
      "\xe2\x80\x9d" => "\x94", // RIGHT DOUBLE QUOTATION MARK
      "\xe2\x80\xa2" => "\x95", // BULLET
      "\xe2\x80\x93" => "\x96", // EN DASH
      "\xe2\x80\x94" => "\x97", // EM DASH
      "\xcb\x9c"     => "\x98", // SMALL TILDE
      "\xe2\x84\xa2" => "\x99", // TRADE MARK SIGN
      "\xc5\xa1"     => "\x9a", // LATIN SMALL LETTER S WITH CARON
      "\xe2\x80\xba" => "\x9b", // SINGLE RIGHT-POINTING ANGLE QUOTE
      "\xc5\x93"     => "\x9c", // LATIN SMALL LIGATURE OE
      "\xc5\xbe"     => "\x9e", // LATIN SMALL LETTER Z WITH CARON
      "\xc5\xb8"     => "\x9f", // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );

  /**
   * @var array
   */
  private static $UTF8_MSWORD = array(
      "\xc2\xab"     => '"', // « (U+00AB) in UTF-8
      "\xc2\xbb"     => '"', // » (U+00BB) in UTF-8
      "\xe2\x80\x98" => "'", // ‘ (U+2018) in UTF-8
      "\xe2\x80\x99" => "'", // ’ (U+2019) in UTF-8
      "\xe2\x80\x9a" => "'", // ‚ (U+201A) in UTF-8
      "\xe2\x80\x9b" => "'", // ‛ (U+201B) in UTF-8
      "\xe2\x80\x9c" => '"', // “ (U+201C) in UTF-8
      "\xe2\x80\x9d" => '"', // ” (U+201D) in UTF-8
      "\xe2\x80\x9e" => '"', // „ (U+201E) in UTF-8
      "\xe2\x80\x9f" => '"', // ‟ (U+201F) in UTF-8
      "\xe2\x80\xb9" => "'", // ‹ (U+2039) in UTF-8
      "\xe2\x80\xba" => "'", // › (U+203A) in UTF-8
      "\xe2\x80\x93" => '-', // – (U+2013) in UTF-8
      "\xe2\x80\x94" => '-', // — (U+2014) in UTF-8
      "\xe2\x80\xa6" => '...' // … (U+2026) in UTF-8
  );

  /**
   * @var array
   */
  private static $ICONV_ENCODING = array(
      'ANSI_X3.4-1968',
      'ANSI_X3.4-1986',
      'ASCII',
      'CP367',
      'IBM367',
      'ISO-IR-6',
      'ISO646-US',
      'ISO_646.IRV:1991',
      'US',
      'US-ASCII',
      'CSASCII',
      'UTF-8',
      'ISO-10646-UCS-2',
      'UCS-2',
      'CSUNICODE',
      'UCS-2BE',
      'UNICODE-1-1',
      'UNICODEBIG',
      'CSUNICODE11',
      'UCS-2LE',
      'UNICODELITTLE',
      'ISO-10646-UCS-4',
      'UCS-4',
      'CSUCS4',
      'UCS-4BE',
      'UCS-4LE',
      'UTF-16',
      'UTF-16BE',
      'UTF-16LE',
      'UTF-32',
      'UTF-32BE',
      'UTF-32LE',
      'UNICODE-1-1-UTF-7',
      'UTF-7',
      'CSUNICODE11UTF7',
      'UCS-2-INTERNAL',
      'UCS-2-SWAPPED',
      'UCS-4-INTERNAL',
      'UCS-4-SWAPPED',
      'C99',
      'JAVA',
      'CP819',
      'IBM819',
      'ISO-8859-1',
      'ISO-IR-100',
      'ISO8859-1',
      'ISO_8859-1',
      'ISO_8859-1:1987',
      'L1',
      'LATIN1',
      'CSISOLATIN1',
      'ISO-8859-2',
      'ISO-IR-101',
      'ISO8859-2',
      'ISO_8859-2',
      'ISO_8859-2:1987',
      'L2',
      'LATIN2',
      'CSISOLATIN2',
      'ISO-8859-3',
      'ISO-IR-109',
      'ISO8859-3',
      'ISO_8859-3',
      'ISO_8859-3:1988',
      'L3',
      'LATIN3',
      'CSISOLATIN3',
      'ISO-8859-4',
      'ISO-IR-110',
      'ISO8859-4',
      'ISO_8859-4',
      'ISO_8859-4:1988',
      'L4',
      'LATIN4',
      'CSISOLATIN4',
      'CYRILLIC',
      'ISO-8859-5',
      'ISO-IR-144',
      'ISO8859-5',
      'ISO_8859-5',
      'ISO_8859-5:1988',
      'CSISOLATINCYRILLIC',
      'ARABIC',
      'ASMO-708',
      'ECMA-114',
      'ISO-8859-6',
      'ISO-IR-127',
      'ISO8859-6',
      'ISO_8859-6',
      'ISO_8859-6:1987',
      'CSISOLATINARABIC',
      'ECMA-118',
      'ELOT_928',
      'GREEK',
      'GREEK8',
      'ISO-8859-7',
      'ISO-IR-126',
      'ISO8859-7',
      'ISO_8859-7',
      'ISO_8859-7:1987',
      'ISO_8859-7:2003',
      'CSISOLATINGREEK',
      'HEBREW',
      'ISO-8859-8',
      'ISO-IR-138',
      'ISO8859-8',
      'ISO_8859-8',
      'ISO_8859-8:1988',
      'CSISOLATINHEBREW',
      'ISO-8859-9',
      'ISO-IR-148',
      'ISO8859-9',
      'ISO_8859-9',
      'ISO_8859-9:1989',
      'L5',
      'LATIN5',
      'CSISOLATIN5',
      'ISO-8859-10',
      'ISO-IR-157',
      'ISO8859-10',
      'ISO_8859-10',
      'ISO_8859-10:1992',
      'L6',
      'LATIN6',
      'CSISOLATIN6',
      'ISO-8859-11',
      'ISO8859-11',
      'ISO_8859-11',
      'ISO-8859-13',
      'ISO-IR-179',
      'ISO8859-13',
      'ISO_8859-13',
      'L7',
      'LATIN7',
      'ISO-8859-14',
      'ISO-CELTIC',
      'ISO-IR-199',
      'ISO8859-14',
      'ISO_8859-14',
      'ISO_8859-14:1998',
      'L8',
      'LATIN8',
      'ISO-8859-15',
      'ISO-IR-203',
      'ISO8859-15',
      'ISO_8859-15',
      'ISO_8859-15:1998',
      'LATIN-9',
      'ISO-8859-16',
      'ISO-IR-226',
      'ISO8859-16',
      'ISO_8859-16',
      'ISO_8859-16:2001',
      'L10',
      'LATIN10',
      'KOI8-R',
      'CSKOI8R',
      'KOI8-U',
      'KOI8-RU',
      'CP1250',
      'MS-EE',
      'WINDOWS-1250',
      'CP1251',
      'MS-CYRL',
      'WINDOWS-1251',
      'CP1252',
      'MS-ANSI',
      'WINDOWS-1252',
      'CP1253',
      'MS-GREEK',
      'WINDOWS-1253',
      'CP1254',
      'MS-TURK',
      'WINDOWS-1254',
      'CP1255',
      'MS-HEBR',
      'WINDOWS-1255',
      'CP1256',
      'MS-ARAB',
      'WINDOWS-1256',
      'CP1257',
      'WINBALTRIM',
      'WINDOWS-1257',
      'CP1258',
      'WINDOWS-1258',
      '850',
      'CP850',
      'IBM850',
      'CSPC850MULTILINGUAL',
      '862',
      'CP862',
      'IBM862',
      'CSPC862LATINHEBREW',
      '866',
      'CP866',
      'IBM866',
      'CSIBM866',
      'MAC',
      'MACINTOSH',
      'MACROMAN',
      'CSMACINTOSH',
      'MACCENTRALEUROPE',
      'MACICELAND',
      'MACCROATIAN',
      'MACROMANIA',
      'MACCYRILLIC',
      'MACUKRAINE',
      'MACGREEK',
      'MACTURKISH',
      'MACHEBREW',
      'MACARABIC',
      'MACTHAI',
      'HP-ROMAN8',
      'R8',
      'ROMAN8',
      'CSHPROMAN8',
      'NEXTSTEP',
      'ARMSCII-8',
      'GEORGIAN-ACADEMY',
      'GEORGIAN-PS',
      'KOI8-T',
      'CP154',
      'CYRILLIC-ASIAN',
      'PT154',
      'PTCP154',
      'CSPTCP154',
      'KZ-1048',
      'RK1048',
      'STRK1048-2002',
      'CSKZ1048',
      'MULELAO-1',
      'CP1133',
      'IBM-CP1133',
      'ISO-IR-166',
      'TIS-620',
      'TIS620',
      'TIS620-0',
      'TIS620.2529-1',
      'TIS620.2533-0',
      'TIS620.2533-1',
      'CP874',
      'WINDOWS-874',
      'VISCII',
      'VISCII1.1-1',
      'CSVISCII',
      'TCVN',
      'TCVN-5712',
      'TCVN5712-1',
      'TCVN5712-1:1993',
      'ISO-IR-14',
      'ISO646-JP',
      'JIS_C6220-1969-RO',
      'JP',
      'CSISO14JISC6220RO',
      'JISX0201-1976',
      'JIS_X0201',
      'X0201',
      'CSHALFWIDTHKATAKANA',
      'ISO-IR-87',
      'JIS0208',
      'JIS_C6226-1983',
      'JIS_X0208',
      'JIS_X0208-1983',
      'JIS_X0208-1990',
      'X0208',
      'CSISO87JISX0208',
      'ISO-IR-159',
      'JIS_X0212',
      'JIS_X0212-1990',
      'JIS_X0212.1990-0',
      'X0212',
      'CSISO159JISX02121990',
      'CN',
      'GB_1988-80',
      'ISO-IR-57',
      'ISO646-CN',
      'CSISO57GB1988',
      'CHINESE',
      'GB_2312-80',
      'ISO-IR-58',
      'CSISO58GB231280',
      'CN-GB-ISOIR165',
      'ISO-IR-165',
      'ISO-IR-149',
      'KOREAN',
      'KSC_5601',
      'KS_C_5601-1987',
      'KS_C_5601-1989',
      'CSKSC56011987',
      'EUC-JP',
      'EUCJP',
      'EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE',
      'CSEUCPKDFMTJAPANESE',
      'MS_KANJI',
      'SHIFT-JIS',
      'SHIFT_JIS',
      'SJIS',
      'CSSHIFTJIS',
      'CP932',
      'ISO-2022-JP',
      'CSISO2022JP',
      'ISO-2022-JP-1',
      'ISO-2022-JP-2',
      'CSISO2022JP2',
      'CN-GB',
      'EUC-CN',
      'EUCCN',
      'GB2312',
      'CSGB2312',
      'GBK',
      'CP936',
      'MS936',
      'WINDOWS-936',
      'GB18030',
      'ISO-2022-CN',
      'CSISO2022CN',
      'ISO-2022-CN-EXT',
      'HZ',
      'HZ-GB-2312',
      'EUC-TW',
      'EUCTW',
      'CSEUCTW',
      'BIG-5',
      'BIG-FIVE',
      'BIG5',
      'BIGFIVE',
      'CN-BIG5',
      'CSBIG5',
      'CP950',
      'BIG5-HKSCS:1999',
      'BIG5-HKSCS:2001',
      'BIG5-HKSCS',
      'BIG5-HKSCS:2004',
      'BIG5HKSCS',
      'EUC-KR',
      'EUCKR',
      'CSEUCKR',
      'CP949',
      'UHC',
      'CP1361',
      'JOHAB',
      'ISO-2022-KR',
      'CSISO2022KR',
      'CP856',
      'CP922',
      'CP943',
      'CP1046',
      'CP1124',
      'CP1129',
      'CP1161',
      'IBM-1161',
      'IBM1161',
      'CSIBM1161',
      'CP1162',
      'IBM-1162',
      'IBM1162',
      'CSIBM1162',
      'CP1163',
      'IBM-1163',
      'IBM1163',
      'CSIBM1163',
      'DEC-KANJI',
      'DEC-HANYU',
      '437',
      'CP437',
      'IBM437',
      'CSPC8CODEPAGE437',
      'CP737',
      'CP775',
      'IBM775',
      'CSPC775BALTIC',
      '852',
      'CP852',
      'IBM852',
      'CSPCP852',
      'CP853',
      '855',
      'CP855',
      'IBM855',
      'CSIBM855',
      '857',
      'CP857',
      'IBM857',
      'CSIBM857',
      'CP858',
      '860',
      'CP860',
      'IBM860',
      'CSIBM860',
      '861',
      'CP-IS',
      'CP861',
      'IBM861',
      'CSIBM861',
      '863',
      'CP863',
      'IBM863',
      'CSIBM863',
      'CP864',
      'IBM864',
      'CSIBM864',
      '865',
      'CP865',
      'IBM865',
      'CSIBM865',
      '869',
      'CP-GR',
      'CP869',
      'IBM869',
      'CSIBM869',
      'CP1125',
      'EUC-JISX0213',
      'SHIFT_JISX0213',
      'ISO-2022-JP-3',
      'BIG5-2003',
      'ISO-IR-230',
      'TDS565',
      'ATARI',
      'ATARIST',
      'RISCOS-LATIN1',
  );

  /**
   * @var array
   */
  private static $SUPPORT = array();

  /**
   * __construct()
   */
  public function __construct()
  {
    self::checkForSupport();
  }

  /**
   * Returns the character at the specified position, like $str[1] but multibyte-aware.
   *
   * @param string $str <p>A UTF-8 string.</p>
   * @param int    $pos <p>The position of the character to return.</p>
   *
   * @return string <p>A single (possibly multibyte) character, or an empty string if $str is empty or $pos is
   *                negative.</p>
   */
  public static function access($str, $pos)
  {
    $str = (string)$str;
    $pos = (int)$pos;

    if (!isset($str[0])) {
      return '';
    }

    if ($pos < 0) {
      return '';
    }

    return self::substr($str, $pos, 1);
  }
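
  /*
   * Usage sketch (added annotation, not from the original source; assumes this
   * class is loaded as \voku\helper\UTF8, e.g. via Composer autoloading).
   * Unlike $str[1], which returns a single byte, access() counts characters:
   *
   *   $str = 'fòô';
   *   $str[1];                // "\xC3", only the first byte of 'ò'
   *   UTF8::access($str, 1);  // 'ò'
   *   UTF8::access($str, -1); // '' (negative positions are rejected)
   */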

  /**
   * Prepends the UTF-8 BOM character to the string and returns the whole string.
   *
   * INFO: If the string already starts with a BOM, it is returned unchanged.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string <p>The output string, starting with a BOM.</p>
   */
  public static function add_bom_to_string($str)
  {
    if (self::string_has_bom($str) === false) {
      $str = self::bom() . $str;
    }

    return $str;
  }
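
  /*
   * Usage sketch (added annotation, not from the original source; assumes
   * \voku\helper\UTF8 is autoloaded). Because of the string_has_bom() check,
   * the call is idempotent:
   *
   *   $once  = UTF8::add_bom_to_string('foo'); // "\xEF\xBB\xBFfoo"
   *   $twice = UTF8::add_bom_to_string($once); // unchanged, still a single BOM
   *   var_dump($once === $twice);              // bool(true)
   */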

  /**
   * Converts a string of binary digits into a string.
   *
   * @param mixed $bin <p>A string of "0" and "1" characters, e.g. "01100001".</p>
   *
   * @return string
   */
  public static function binary_to_str($bin)
  {
    if (!isset($bin[0])) {
      return '';
    }

    return pack('H*', base_convert($bin, 2, 16));
  }
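
  /*
   * Worked example and caveat (added annotation, not from the original source):
   *
   *   UTF8::binary_to_str('01100001'); // base_convert() gives '61', pack('H*', '61') gives 'a'
   *
   * base_convert() works on a numeric value, so leading zero bytes are dropped
   * ('0000000001100001' also yields 'a', not "\0a"), and the PHP manual warns
   * that it may lose precision on large numbers. A byte-wise variant avoiding
   * both issues could look like this (hypothetical helper, not part of this
   * class; it assumes the input length is a multiple of 8):
   *
   *   function binary_to_str_bytewise($bin)
   *   {
   *     $out = '';
   *     foreach (str_split($bin, 8) as $byte) {
   *       $out .= chr(bindec($byte)); // one 8-digit group => one byte
   *     }
   *     return $out;
   *   }
   *
   *   binary_to_str_bytewise('0000000001100001'); // "\0a"
   */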

  /**
   * Returns the UTF-8 Byte Order Mark character.
   *
   * INFO: See UTF8::$bom for e.g. the UTF-16 and UTF-32 BOM values.
   *
   * @return string <p>The UTF-8 Byte Order Mark.</p>
   */
  public static function bom()
  {
    return "\xef\xbb\xbf";
  }

  /**
   * @alias of UTF8::chr_map()
   *
   * @see   UTF8::chr_map()
   *
   * @param string|array $callback
   * @param string       $str
   *
   * @return array
   */
  public static function callback($callback, $str)
  {
    return self::chr_map($callback, $str);
  }

  /**
   * This method auto-detects your server environment for UTF-8 support.
   *
   * INFO: You don't need to call it manually; it is triggered automatically when needed.
   */
  public static function checkForSupport()
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::$SUPPORT['already_checked_via_portable_utf8'] = true;

      // http://php.net/manual/en/book.mbstring.php
      self::$SUPPORT['mbstring'] = self::mbstring_loaded();

      // http://php.net/manual/en/book.iconv.php
      self::$SUPPORT['iconv'] = self::iconv_loaded();

      // http://php.net/manual/en/book.intl.php
      self::$SUPPORT['intl'] = self::intl_loaded();

      // http://php.net/manual/en/class.intlchar.php
      self::$SUPPORT['intlChar'] = self::intlChar_loaded();

      // http://php.net/manual/en/book.pcre.php
      self::$SUPPORT['pcre_utf8'] = self::pcre_utf8_support();
    }
  }

  /**
   * Generates a UTF-8 encoded character from the given code point.
   *
   * INFO: opposite of UTF8::ord()
   *
   * @param int    $code_point <p>The code point for which to generate a character.</p>
   * @param string $encoding   [optional] <p>Default is UTF-8</p>
   *
   * @return string|null <p>Multi-Byte character, returns null on failure or empty input.</p>
   */
  public static function chr($code_point, $encoding = 'UTF-8')
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    } elseif (self::$SUPPORT['intlChar'] === true) {
      return \IntlChar::chr($code_point);
    }

    // check the type of $code_point (reached only without "\IntlChar" or for a non-UTF-8 target encoding)
    $i = (int)$code_point;
    if ($i !== $code_point) {
      return null;
    }

    // use a static cache (same condition as above)
    static $CHAR_CACHE = array();
    $cacheKey = $code_point . $encoding;
    if (isset($CHAR_CACHE[$cacheKey]) === true) {
      return $CHAR_CACHE[$cacheKey];
    }

    if (0x80 > $code_point %= 0x200000) {
      $str = self::chr_and_parse_int($code_point);
    } elseif (0x800 > $code_point) {
      $str = self::chr_and_parse_int(0xC0 | $code_point >> 6) .
             self::chr_and_parse_int(0x80 | $code_point & 0x3F);
    } elseif (0x10000 > $code_point) {
      $str = self::chr_and_parse_int(0xE0 | $code_point >> 12) .
             self::chr_and_parse_int(0x80 | $code_point >> 6 & 0x3F) .
             self::chr_and_parse_int(0x80 | $code_point & 0x3F);
    } else {
      $str = self::chr_and_parse_int(0xF0 | $code_point >> 18) .
             self::chr_and_parse_int(0x80 | $code_point >> 12 & 0x3F) .
             self::chr_and_parse_int(0x80 | $code_point >> 6 & 0x3F) .
             self::chr_and_parse_int(0x80 | $code_point & 0x3F);
    }

    if ($encoding !== 'UTF-8') {
      $str = \mb_convert_encoding($str, $encoding, 'UTF-8');
    }

    // add into static cache
    $CHAR_CACHE[$cacheKey] = $str;

    return $str;
  }
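
  /*
   * Worked example (added annotation, not from the original source) of the
   * 3-byte branch above for U+20AC (EURO SIGN), 0x20AC = 0010 0000 1010 1100.
   * In PHP, ">>" binds tighter than "&", which binds tighter than "|":
   *
   *   0xE0 | (0x20AC >> 12)         = 0xE0 | 0x02 = 0xE2   (1110xxxx)
   *   0x80 | ((0x20AC >> 6) & 0x3F) = 0x80 | 0x02 = 0x82   (10xxxxxx)
   *   0x80 | (0x20AC & 0x3F)        = 0x80 | 0x2C = 0xAC   (10xxxxxx)
   *
   *   UTF8::chr(0x20AC) === "\xE2\x82\xAC"; // true, i.e. '€'
   */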

  /**
   * @param int $int
   *
   * @return string
   */
  private static function chr_and_parse_int($int)
  {
    return chr((int)$int);
  }

  /**
   * Applies a callback to all characters of a string.
   *
   * @param string|array $callback <p>The callback function.</p>
   * @param string       $str      <p>UTF-8 string to run the callback on.</p>
   *
   * @return array <p>The outcome of the callback.</p>
   */
  public static function chr_map($callback, $str)
  {
    $chars = self::split($str);

    return array_map($callback, $chars);
  }
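
  /*
   * Usage sketch (added annotation, not from the original source; assumes
   * \voku\helper\UTF8 is autoloaded). The callback receives one UTF-8
   * character at a time, so multibyte characters stay intact:
   *
   *   UTF8::chr_map('bin2hex', 'aé'); // array('61', 'c3a9')
   */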

  /**
   * Generates an array of the byte lengths of each character of a Unicode string.
   *
   * 1 byte => U+0000  - U+007F
   * 2 byte => U+0080  - U+07FF
   * 3 byte => U+0800  - U+FFFF
   * 4 byte => U+10000 - U+10FFFF
   *
   * @param string $str <p>The original Unicode string.</p>
   *
   * @return array <p>An array of byte lengths of each character.</p>
   */
  public static function chr_size_list($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return array();
    }

    return array_map('strlen', self::split($str));
  }
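
  /*
   * Usage sketch (added annotation, not from the original source): one entry
   * per character, matching the byte ranges listed in the docblock above.
   *
   *   UTF8::chr_size_list('aé€😀'); // array(1, 2, 3, 4)
   */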

  /**
   * Gets the decimal code point of a specific character.
   *
   * @param string $char <p>The input character.</p>
   *
   * @return int
   */
  public static function chr_to_decimal($char)
  {
    $char = (string)$char;
    $code = self::ord($char[0]);
    $bytes = 1;

    if (!($code & 0x80)) {
      // 0xxxxxxx
      return $code;
    }

    if (($code & 0xe0) === 0xc0) {
      // 110xxxxx
      $bytes = 2;
      $code &= ~0xc0;
    } elseif (($code & 0xf0) === 0xe0) {
      // 1110xxxx
      $bytes = 3;
      $code &= ~0xe0;
    } elseif (($code & 0xf8) === 0xf0) {
      // 11110xxx
      $bytes = 4;
      $code &= ~0xf0;
    }

    for ($i = 2; $i <= $bytes; $i++) {
      // 10xxxxxx
      $code = ($code << 6) + (self::ord($char[$i - 1]) & ~0x80);
    }

    return $code;
  }
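
  /*
   * Worked example (added annotation, not from the original source) for
   * 'é' = "\xC3\xA9" (U+00E9):
   *
   *   first byte: (0xC3 & 0xE0) === 0xC0  => 2 bytes, 0xC3 & ~0xC0 = 0x03
   *   next byte:  0xA9 & ~0x80 = 0x29
   *   result:     (0x03 << 6) + 0x29 = 0xC0 + 0x29 = 0xE9 = 233
   *
   *   UTF8::chr_to_decimal('é'); // 233
   */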

  /**
   * Gets the hexadecimal code point (U+xxxx) of a UTF-8 encoded character.
   *
   * @param string $char <p>The input character</p>
   * @param string $pfix [optional] <p>The prefix to prepend, default "U+".</p>
   *
   * @return string <p>The code point encoded as U+xxxx</p>
   */
  public static function chr_to_hex($char, $pfix = 'U+')
  {
    $char = (string)$char;

    if (!isset($char[0])) {
      return '';
    }

    if ($char === '&#0;') {
      $char = '';
    }

    return self::int_to_hex(self::ord($char), $pfix);
  }

  /**
   * alias for "UTF8::chr_to_decimal()"
   *
   * @see UTF8::chr_to_decimal()
   *
   * @param string $chr
   *
   * @return int
   */
  public static function chr_to_int($chr)
  {
    return self::chr_to_decimal($chr);
  }

  /**
   * Splits a string into smaller chunks and multiple lines, using the specified line ending character.
   *
   * @param string $body     <p>The original string to be split.</p>
   * @param int    $chunklen [optional] <p>The maximum character length of a chunk.</p>
   * @param string $end      [optional] <p>The character(s) inserted between chunks. Unlike PHP's native
   *                         chunk_split(), no separator is appended after the last chunk.</p>
   *
   * @return string <p>The chunked string</p>
   */
  public static function chunk_split($body, $chunklen = 76, $end = "\r\n")
  {
    return implode($end, self::split($body, $chunklen));
  }
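
  /*
   * Comparison sketch (added annotation, not from the original source): the
   * separator is only placed between chunks, and chunks are counted in
   * characters rather than bytes.
   *
   *   chunk_split('ABCDEFGH', 3, '|');       // 'ABC|DEF|GH|' (native)
   *   UTF8::chunk_split('ABCDEFGH', 3, '|'); // 'ABC|DEF|GH'
   *   UTF8::chunk_split('äöüäöü', 2, '|');   // 'äö|üä|öü'
   */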

  /**
   * Accepts a string and removes all non-UTF-8 characters from it, plus optional extras.
   *
   * @param string $str                     <p>The string to be sanitized.</p>
   * @param bool   $remove_bom              [optional] <p>Set to true if you need to remove the UTF-8 BOM.</p>
   * @param bool   $normalize_whitespace    [optional] <p>Set to true if you need to normalize the whitespace.</p>
   * @param bool   $normalize_msword        [optional] <p>Set to true if you need to normalize MS Word chars, e.g. "…"
   *                                        => "..."</p>
   * @param bool   $keep_non_breaking_space [optional] <p>Set to true to keep non-breaking spaces, in combination with
   *                                        $normalize_whitespace.</p>
   *
   * @return string <p>Clean UTF-8 encoded string.</p>
   */
  public static function clean($str, $remove_bom = false, $normalize_whitespace = false, $normalize_msword = false, $keep_non_breaking_space = false)
  {
    // http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
    // caused connection reset problem on larger strings

    $regx = '/
      (
        (?: [\x00-\x7F]               # single-byte sequences   0xxxxxxx
        |   [\xC0-\xDF][\x80-\xBF]    # double-byte sequences   110xxxxx 10xxxxxx
        |   [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences   1110xxxx 10xxxxxx * 2
        |   [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3
        ){1,100}                      # ...one to 100 times
      )
    | ( [\x80-\xBF] )                 # invalid byte in range 10000000 - 10111111
    | ( [\xC0-\xFF] )                 # invalid byte in range 11000000 - 11111111
    /x';
    $str = preg_replace($regx, '$1', $str);

    $str = self::replace_diamond_question_mark($str, '');
    $str = self::remove_invisible_characters($str);

    if ($normalize_whitespace === true) {
      $str = self::normalize_whitespace($str, $keep_non_breaking_space);
    }

    if ($normalize_msword === true) {
      $str = self::normalize_msword($str);
    }

    if ($remove_bom === true) {
      $str = self::remove_bom($str);
    }

    return $str;
  }
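
  /*
   * Usage sketch (added annotation, not from the original source; assumes
   * \voku\helper\UTF8 is autoloaded). A lone lead byte such as "\xC3" matches
   * one of the "invalid byte" alternatives of the regex and is replaced by the
   * empty first group, while valid sequences are kept:
   *
   *   UTF8::clean("abc\xC3");               // 'abc'
   *   UTF8::clean("\xEF\xBB\xBFfoo", true); // 'foo' ($remove_bom strips the BOM)
   */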

  /**
   * Cleans up a string: fixes simple UTF-8 encoding errors and keeps only printable UTF-8 characters.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function cleanup($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // fix ISO <-> UTF-8 errors
    $str = self::fix_simple_utf8($str);

    // remove all non-UTF-8 symbols
    // && remove diamond question mark (�)
    // && remove invisible characters (e.g. "\0")
    // && remove BOM
    // && normalize whitespace chars (but keep non-breaking-spaces)
    $str = self::clean($str, true, true, false, true);

    return (string)$str;
  }

  /**
   * Accepts a string or an array of strings and returns an array of Unicode code points.
   *
   * INFO: opposite of UTF8::string()
   *
   * @param string|string[] $arg     <p>A UTF-8 encoded string, or an array of single UTF-8 characters
   *                                 (each array element is passed to UTF8::ord() as-is).</p>
   * @param bool            $u_style [optional] <p>If true, returns code points in U+xxxx format;
   *                                 by default, code points are returned as integers.</p>
   *
   * @return array <p>The array of code points.</p>
   */
  public static function codepoints($arg, $u_style = false)
  {
    if (is_string($arg) === true) {
      $arg = self::split($arg);
    }

    $arg = array_map(
        array(
            '\\voku\\helper\\UTF8',
            'ord',
        ),
        $arg
    );

    if ($u_style) {
      $arg = array_map(
          array(
              '\\voku\\helper\\UTF8',
              'int_to_hex',
          ),
          $arg
      );
    }

    return $arg;
  }
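
  /*
   * Usage sketch (added annotation, not from the original source):
   *
   *   UTF8::codepoints('aé');       // array(97, 233)
   *   UTF8::codepoints('aé', true); // the same code points, formatted by UTF8::int_to_hex()
   */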

  /**
   * Returns the count of each character used in a string.
   *
   * @param string $str       <p>The input string.</p>
   * @param bool   $cleanUtf8 [optional] <p>Clean non-UTF-8 chars from the string.</p>
   *
   * @return array <p>An associative array with characters as keys and
   *               their counts as values.</p>
   */
  public static function count_chars($str, $cleanUtf8 = false)
  {
    return array_count_values(self::split($str, 1, $cleanUtf8));
  }
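
  /*
   * Usage sketch (added annotation, not from the original source). Keys are
   * whole UTF-8 characters, not bytes as with PHP's native count_chars():
   *
   *   UTF8::count_chars('aéa'); // array('a' => 2, 'é' => 1)
   */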

  /**
   * Converts an integer code point into a UTF-8 character.
   *
   * @param int $int
   *
   * @return string
   */
  public static function decimal_to_chr($int)
  {
    if (Bootup::is_php('5.4') === true) {
      $flags = ENT_QUOTES | ENT_HTML5;
    } else {
      $flags = ENT_QUOTES;
    }

    return self::html_entity_decode('&#' . $int . ';', $flags);
  }
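
  /*
   * Usage sketch (added annotation, not from the original source): the integer
   * is wrapped into a numeric HTML entity, which is then decoded.
   *
   *   UTF8::decimal_to_chr(233);    // decodes '&#233;' to 'é'
   *   UTF8::decimal_to_chr(0x20AC); // decodes '&#8364;' to '€'
   */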

  /**
   * Encodes a string with a new charset encoding.
   *
   * INFO:  The difference from "UTF8::utf8_encode()" is that this function also tries to fix broken / double
   *        encoding, so you can also call it on a UTF-8 string without messing the string up.
   *
   * @param string $encoding <p>e.g. 'UTF-8', 'ISO-8859-1', etc.</p>
   * @param string $str      <p>The input string</p>
   * @param bool   $force    [optional] <p>Force the new encoding (we try to fix broken / double encoding for UTF-8)<br />
   *                         otherwise we auto-detect the current string encoding</p>
   *
   * @return string
   */
  public static function encode($encoding, $str, $force = true)
  {
    $str = (string)$str;
    $encoding = (string)$encoding;

    if (!isset($str[0], $encoding[0])) {
      return $str;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    $encodingDetected = self::str_detect_encoding($str);

    // truthy check: skips the conversion if detection returned false (or an empty string)
    if (
        $encodingDetected
        &&
        (
            $force === true
            ||
            $encodingDetected !== $encoding
        )
    ) {
      if (
          $encoding === 'UTF-8'
          &&
          (
              $force === true
              || $encodingDetected === 'UTF-8'
              || $encodingDetected === 'WINDOWS-1252'
              || $encodingDetected === 'ISO-8859-1'
          )
      ) {
        return self::to_utf8($str);
      }

      if (
          $encoding === 'ISO-8859-1'
          &&
          (
              $force === true
              || $encodingDetected === 'ISO-8859-1'
              || $encodingDetected === 'UTF-8'
          )
      ) {
        return self::to_iso8859($str);
      }

      if (
          $encoding !== 'UTF-8'
          &&
          $encoding !== 'WINDOWS-1252'
          &&
          self::$SUPPORT['mbstring'] === false
      ) {
        trigger_error('UTF8::encode() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
      }

      $strEncoded = \mb_convert_encoding(
          $str,
          $encoding,
          $encodingDetected
      );

      if ($strEncoded) {
        return $strEncoded;
      }
    }

    return $str;
  }
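
  /*
   * Usage sketch (added annotation, not from the original source; assumes
   * \voku\helper\UTF8 is autoloaded and that str_detect_encoding() reports the
   * input as ISO-8859-1 or WINDOWS-1252). With $force left at true, a UTF-8
   * target goes through UTF8::to_utf8(), which per the INFO note above should
   * not mess up a string that is already UTF-8:
   *
   *   $latin1 = "\xE9t\xE9";                    // 'été' in ISO-8859-1
   *   $utf8   = UTF8::encode('UTF-8', $latin1); // expected: "\xC3\xA9t\xC3\xA9"
   *   UTF8::encode('UTF-8', $utf8);             // expected to stay "\xC3\xA9t\xC3\xA9"
   */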

  /**
   * Reads an entire file into a string.
   *
   * WARNING: Do not use the UTF-8 option ($convertToUtf8) for binary files (e.g. images)!
   *
   * @link http://php.net/manual/en/function.file-get-contents.php
   *
   * @param string        $filename      <p>
   *                                     Name of the file to read.
   *                                     </p>
   * @param int|false     $flags         [optional] <p>
   *                                     Passed to PHP's file_get_contents() as its second argument
   *                                     (use_include_path). Pass FILE_USE_INCLUDE_PATH to search for
   *                                     the file in the include_path. Falsy values are normalized to false.
   *                                     </p>
   * @param resource|null $context       [optional] <p>
   *                                     A valid context resource created with
   *                                     stream_context_create(). If you don't need a
   *                                     custom context, pass null.
   *                                     </p>
   * @param int|null      $offset        [optional] <p>
   *                                     The offset where the reading starts.
   *                                     </p>
   * @param int|null      $maxlen        [optional] <p>
   *                                     Maximum length of data read. The default is to read until the end
   *                                     of the file is reached.
   *                                     </p>
   * @param int           $timeout       <p>The timeout in seconds for HTTP(S) reads; only used to build a
   *                                     context when $context is null.</p>
   *
   * @param boolean       $convertToUtf8 <strong>WARNING!!!</strong> <p>Do not enable this for e.g. images
   *                                     or PDFs: converting their raw bytes to UTF-8 corrupts them.</p>
   *
   * @return string|false <p>The function returns the read data or false on failure.</p>
   */
  public static function file_get_contents($filename, $flags = null, $context = null, $offset = null, $maxlen = null, $timeout = 10, $convertToUtf8 = true)
  {
    // init
    $timeout = (int)$timeout;
    // NOTE: FILTER_SANITIZE_STRING strips tags and encodes quotes, and is deprecated as of PHP 8.1
    $filename = filter_var($filename, FILTER_SANITIZE_STRING);

    if ($timeout && $context === null) {
      $context = stream_context_create(
          array(
              'http' =>
                  array(
                      'timeout' => $timeout,
                  ),
          )
      );
    }

    if (!$flags) {
      $flags = false;
    }

    if ($offset === null) {
      $offset = 0;
    }

    if (is_int($maxlen) === true) {
      $data = file_get_contents($filename, $flags, $context, $offset, $maxlen);
    } else {
      $data = file_get_contents($filename, $flags, $context, $offset);
    }

    // return false on error
    if ($data === false) {
      return false;
    }

    if ($convertToUtf8 === true) {
      $data = self::encode('UTF-8', $data, false);
      $data = self::cleanup($data);
    }

    return $data;
  }
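
  /*
   * Usage sketch (added annotation, not from the original source; the paths
   * and the URL are placeholders). Compare the result with === false, because
   * an empty file legitimately yields '':
   *
   *   $text = UTF8::file_get_contents('data/input.txt'); // read + convert to clean UTF-8
   *   if ($text === false) {
   *     // handle the read error
   *   }
   *
   *   // binary data: keep the raw bytes by disabling $convertToUtf8
   *   $png = UTF8::file_get_contents('data/logo.png', null, null, null, null, 10, false);
   *
   *   // remote file with a 5 second HTTP timeout (used because $context is null)
   *   $html = UTF8::file_get_contents('https://example.com/', null, null, null, null, 5);
   */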

  /**
   * Checks if a file starts with a BOM (Byte Order Mark) character.
   *
   * @param string $file_path <p>Path to a valid file.</p>
   *
   * @return bool <p><strong>true</strong> if the file has a BOM at the start, <strong>false</strong> otherwise.</p>
   */
  public static function file_has_bom($file_path)
  {
    return self::string_has_bom(file_get_contents($file_path));
  }

  /**
   * Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * @param mixed  $var                <p>The value to filter; arrays and objects are filtered recursively.</p>
   * @param int    $normalization_form [optional] <p>A \Normalizer form constant; the default 4 is intended as
   *                                   NFC (see the n::NFC hint in the signature).</p>
   * @param string $leading_combining  [optional] <p>Prepended when the result starts with a combining mark.</p>
   *
   * @return mixed
   */
  public static function filter($var, $normalization_form = 4 /* n::NFC */, $leading_combining = '◌')
  {
    switch (gettype($var)) {
      case 'array':
        foreach ($var as $k => $v) {
          /** @noinspection AlterInForeachInspection */
          $var[$k] = self::filter($v, $normalization_form, $leading_combining);
        }
        break;
      case 'object':
        foreach ($var as $k => $v) {
          $var->{$k} = self::filter($v, $normalization_form, $leading_combining);
        }
        break;
      case 'string':
        if (false !== strpos($var, "\r")) {
          // Workaround https://bugs.php.net/65732
          $var = str_replace(array("\r\n", "\r"), "\n", $var);
        }

        if (self::is_ascii($var) === false) {
          /** @noinspection PhpUndefinedClassInspection */
          if (\Normalizer::isNormalized($var, $normalization_form)) {
            $n = '-';
          } else {
            /** @noinspection PhpUndefinedClassInspection */
            $n = \Normalizer::normalize($var, $normalization_form);

            if (isset($n[0])) {
              $var = $n;
            } else {
              $var = self::encode('UTF-8', $var);
            }
          }

          if (
              $var[0] >= "\x80"
              &&
              isset($n[0], $leading_combining[0])
              &&
              preg_match('/^\p{Mn}/u', $var)
          ) {
            // Prevent leading combining chars
            // for NFC-safe concatenations.
            $var = $leading_combining . $var;
          }
        }

        break;
    }

    return $var;
  }
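
  /*
   * Usage sketch (added annotation, not from the original source; requires the
   * intl extension for \Normalizer). \Normalizer::FORM_C is passed explicitly
   * because the literal default 4 may not equal that constant on every
   * PHP/intl version:
   *
   *   UTF8::filter("e\xCC\x81", \Normalizer::FORM_C);        // "\xC3\xA9": 'e' + COMBINING ACUTE ACCENT => 'é'
   *   UTF8::filter("a\r\nb");                                // "a\nb": line endings are normalized first
   *   UTF8::filter("\xCC\x81");                              // "◌\xCC\x81": leading combining mark gets '◌' prepended
   *   UTF8::filter(array("e\xCC\x81"), \Normalizer::FORM_C); // array("\xC3\xA9"), arrays are filtered recursively
   */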

  /**
   * "filter_input()" wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets a specific external variable by name and optionally filters it.
   *
   * @link  http://php.net/manual/en/function.filter-input.php
   *
   * @param int    $type          <p>
   *                              One of <b>INPUT_GET</b>, <b>INPUT_POST</b>,
   *                              <b>INPUT_COOKIE</b>, <b>INPUT_SERVER</b>, or
   *                              <b>INPUT_ENV</b>.
   *                              </p>
   * @param string $variable_name <p>
   *                              Name of a variable to get.
   *                              </p>
   * @param int    $filter        [optional] <p>
   *                              The ID of the filter to apply. The
   *                              manual page lists the available filters.
   *                              </p>
   * @param mixed  $options       [optional] <p>
   *                              Associative array of options or bitwise disjunction of flags. If the filter
   *                              accepts options, flags can be provided in the "flags" field of the array.
   *                              </p>
   *
   * @return mixed Value of the requested variable on success, <b>FALSE</b> if the filter fails,
   * or <b>NULL</b> if the <i>variable_name</i> variable is not set.
   * If the flag <b>FILTER_NULL_ON_FAILURE</b> is used, it
   * returns <b>FALSE</b> if the variable is not set and <b>NULL</b> if the filter fails.
   * @since 5.2.0
   */
  public static function filter_input($type, $variable_name, $filter = FILTER_DEFAULT, $options = null)
  {
    if (4 > func_num_args()) {
      $var = filter_input($type, $variable_name, $filter);
    } else {
      $var = filter_input($type, $variable_name, $filter, $options);
    }

    return self::filter($var);
  }
1613 1
1614
  /**
1615
   * "filter_input_array()"-wrapper with normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
1616 1
   *
1617 1
   * Gets external variables and optionally filters them
1618 1
   *
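   * Example (a sketch; the POST fields "id" and "name" are hypothetical):
   * <code>
   * $input = UTF8::filter_input_array(
   *     INPUT_POST,
   *     array(
   *         'id'   => FILTER_VALIDATE_INT,
   *         'name' => FILTER_UNSAFE_RAW,
   *     )
   * );
   * // per key: the filtered value, false if the filter failed,
   * // or null if the field was not sent ($add_empty defaults to true)
   * </code>
   *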
   * @link  http://php.net/manual/en/function.filter-input-array.php
   *
   * @param int   $type       <p>
   *                          One of <b>INPUT_GET</b>, <b>INPUT_POST</b>,
   *                          <b>INPUT_COOKIE</b>, <b>INPUT_SERVER</b>, or
   *                          <b>INPUT_ENV</b>.
   *                          </p>
   * @param mixed $definition [optional] <p>
   *                          An array defining the arguments. A valid key is a string
   *                          containing a variable name and a valid value is either a filter type, or an array
   *                          optionally specifying the filter, flags and options. If the value is an
   *                          array, valid keys are "filter", which specifies the filter type,
   *                          "flags", which specifies any flags that apply to the filter, and
   *                          "options", which specifies any options that apply to the filter.
   *                          See the linked manual page for a complete example.
   *                          </p>
   *                          <p>
   *                          This parameter can also be an integer holding a filter constant. Then all values in the
   *                          input array are filtered by this filter.
   *                          </p>
   * @param bool  $add_empty  [optional] <p>
   *                          Add missing keys as <b>NULL</b> to the return value.
   *                          </p>
   *
   * @return mixed An array containing the values of the requested variables on success, or <b>FALSE</b>
   * on failure. An array value will be <b>FALSE</b> if the filter fails, or <b>NULL</b> if
   * the variable is not set. If the flag <b>FILTER_NULL_ON_FAILURE</b>
   * is used, it returns <b>FALSE</b> if the variable is not set and <b>NULL</b> if the filter
   * fails.
   * @since 5.2.0
   */
  public static function filter_input_array($type, $definition = null, $add_empty = true)
  {
    // call without $definition when the caller did not pass one
    if (2 > func_num_args()) {
      $a = filter_input_array($type);
    } else {
      $a = filter_input_array($type, $definition, $add_empty);
    }

    return self::filter($a);
  }

  /**
   * "filter_var()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Filters a variable with a specified filter
   *
   * @link  http://php.net/manual/en/function.filter-var.php
   *
   * @param mixed $variable <p>
   *                        Value to filter.
   *                        </p>
   * @param int   $filter   [optional] <p>
   *                        The ID of the filter to apply. The
   *                        manual page lists the available filters.
   *                        </p>
   * @param mixed $options  [optional] <p>
   *                        Associative array of options or bitwise disjunction of flags. If the filter
   *                        accepts options, flags can be provided in the "flags" field of the array. For
   *                        the "callback" filter, a callable should be passed. The
   *                        callback must accept one argument, the value to be filtered, and return
   *                        the value after filtering/sanitizing it.
   *                        </p>
   *                        <p>
   *                        <code>
   *                        // for filters that accept options, use this format
   *                        $options = array(
   *                            'options' => array(
   *                                'default'   => 3, // value to return if the filter fails
   *                                // other options here
   *                                'min_range' => 0
   *                            ),
   *                            'flags'   => FILTER_FLAG_ALLOW_OCTAL,
   *                        );
   *                        $var = filter_var('0755', FILTER_VALIDATE_INT, $options);
   *                        // for filters that only accept flags, you can pass them directly
   *                        $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
   *                        // for filters that only accept flags, you can also pass them as an array
   *                        $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN,
   *                            array('flags' => FILTER_NULL_ON_FAILURE));
   *                        // callback validate filter
   *                        function foo($value)
   *                        {
   *                            // Expected format: Surname, GivenNames
   *                            if (strpos($value, ", ") === false) return false;
   *                            list($surname, $givennames) = explode(", ", $value, 2);
   *                            $empty = (empty($surname) || empty($givennames));
   *                            $notstrings = (!is_string($surname) || !is_string($givennames));
   *                            if ($empty || $notstrings) {
   *                                return false;
   *                            } else {
   *                                return $value;
   *                            }
   *                        }
   *                        $var = filter_var('Doe, Jane Sue', FILTER_CALLBACK, array('options' => 'foo'));
   *                        </code>
   *                        </p>
   *
   * @return mixed The filtered data, or <b>FALSE</b> if the filter fails.
   * @since 5.2.0
   */
  public static function filter_var($variable, $filter = FILTER_DEFAULT, $options = null)
  {
    // only forward $options when the caller actually passed it
    if (3 > func_num_args()) {
      $variable = filter_var($variable, $filter);
    } else {
      $variable = filter_var($variable, $filter, $options);
    }

    return self::filter($variable);
  }

  /**
   * "filter_var_array()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets multiple variables and optionally filters them
   *
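   * Example (a sketch with made-up data; "age", "city" and "zip" are hypothetical keys):
   * <code>
   * $data = array('age' => '42', 'city' => 'Köln');
   * $clean = UTF8::filter_var_array(
   *     $data,
   *     array(
   *         'age'  => FILTER_VALIDATE_INT,
   *         'city' => FILTER_UNSAFE_RAW,
   *         'zip'  => FILTER_UNSAFE_RAW,
   *     )
   * );
   * // "age" is validated as an integer, "city" keeps its UTF-8 value, and "zip"
   * // is added as null because $add_empty defaults to true
   * </code>
   *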
   * @link  http://php.net/manual/en/function.filter-var-array.php
   *
   * @param array $data       <p>
   *                          An array with string keys containing the data to filter.
   *                          </p>
   * @param mixed $definition [optional] <p>
   *                          An array defining the arguments. A valid key is a string
   *                          containing a variable name and a valid value is either a
   *                          filter type, or an array optionally specifying the filter, flags and options.
   *                          If the value is an array, valid keys are "filter", which specifies the
   *                          filter type, "flags", which specifies any flags that apply to the
   *                          filter, and "options", which specifies any options that apply to the filter.
   *                          See the linked manual page for a complete example.
   *                          </p>
   *                          <p>
   *                          This parameter can also be an integer holding a filter constant. Then all values in the
   *                          input array are filtered by this filter.
   *                          </p>
   * @param bool  $add_empty  [optional] <p>
   *                          Add missing keys as <b>NULL</b> to the return value.
   *                          </p>
   *
   * @return mixed An array containing the values of the requested variables on success, or <b>FALSE</b>
   * on failure. An array value will be <b>FALSE</b> if the filter fails, or <b>NULL</b> if
   * the variable is not set.
   * @since 5.2.0
   */
  public static function filter_var_array($data, $definition = null, $add_empty = true)
  {
    // call without $definition when the caller did not pass one
    if (2 > func_num_args()) {
      $a = filter_var_array($data);
    } else {
      $a = filter_var_array($data, $definition, $add_empty);
    }

    return self::filter($a);
  }

  /**
   * Check whether the number of Unicode characters in a string is not greater than the specified integer.
   *
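   * Example: the length is counted in characters, not in bytes.
   * <code>
   * UTF8::fits_inside('€', 1);   // true ("€" is one character, but three bytes)
   * UTF8::fits_inside('abc', 2); // false
   * </code>
   *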
   * @param string $str      The original string to be checked.
   * @param int    $box_size The maximum number of characters allowed.
   *
   * @return bool true if the character length of $str is less than or equal to $box_size, false otherwise.
   */
  public static function fits_inside($str, $box_size)
  {
    return (self::strlen($str) <= $box_size);
  }

  /**
   * Try to fix simple broken UTF-8 strings.
   *
   * INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.
   *
   * If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO-8859-1
   * (ignoring the Windows-1252 characters from 0x80 to 0x9F), use this function to fix it.
   * See: http://en.wikipedia.org/wiki/Windows-1252
   *
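   * Example (assumes "Ã¼" is one of the keys of self::$BROKEN_UTF8_FIX, which is defined elsewhere):
   * <code>
   * // "ü" (bytes C3 BC) was read as the two ISO-8859-1 characters "Ã" and "¼" and re-encoded
   * UTF8::fix_simple_utf8('DÃ¼sseldorf'); // 'Düsseldorf'
   * </code>
   *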
   * @param string $str <p>The input string</p>
   *
   * @return string <p>The string with the known broken sequences replaced.</p>
   */
  public static function fix_simple_utf8($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // split the "broken => fixed" map into two lists once, then reuse them on later calls
    static $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = null;
    static $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = null;

    if ($BROKEN_UTF8_TO_UTF8_KEYS_CACHE === null) {
      $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = array_keys(self::$BROKEN_UTF8_FIX);
      $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = array_values(self::$BROKEN_UTF8_FIX);
    }

    return str_replace($BROKEN_UTF8_TO_UTF8_KEYS_CACHE, $BROKEN_UTF8_TO_UTF8_VALUES_CACHE, $str);
  }

  /**
   * Fix a double- (or multiple-) encoded UTF-8 string.
   *
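   * Example (illustrative; "FÃ©dÃ©ration" is "Fédération" encoded to UTF-8 twice):
   * <code>
   * // each pass applies utf8_decode() and then to_utf8() until the string stops changing,
   * // so this is expected to return 'Fédération'
   * $fixed = UTF8::fix_utf8('FÃ©dÃ©ration');
   * // arrays are fixed element by element
   * $fixedList = UTF8::fix_utf8(array('FÃ©dÃ©ration', 'DÃ¼sseldorf'));
   * </code>
   *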
   * @param string|string[] $str <p>You can use a string or an array of strings.</p>
   *
   * @return string|string[] <p>The repaired string, or an array of repaired strings.</p>
   */
  public static function fix_utf8($str)
  {
    if (is_array($str) === true) {

      /** @noinspection ForeachSourceInspection */
      foreach ($str as $k => $v) {
        /** @noinspection AlterInForeachInspection */
        /** @noinspection OffsetOperationsInspection */
        $str[$k] = self::fix_utf8($v);
      }

      return $str;
    }

    // decode and re-encode until the string stops changing;
    // each pass can remove one layer of double encoding
    $last = '';
    while ($last !== $str) {
      $last = $str;
      $str = self::to_utf8(
          self::utf8_decode($str)
      );
    }

    return $str;
  }

  /**
   * Get the text direction of a specific character.
   *
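   * Example (the result is the same with or without the "intl" extension):
   * <code>
   * UTF8::getCharDirection('A'); // 'LTR'
   * UTF8::getCharDirection('א'); // 'RTL' (U+05D0 HEBREW LETTER ALEF)
   * UTF8::getCharDirection('ا'); // 'RTL' (U+0627 ARABIC LETTER ALEF)
   * </code>
   *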
   * @param string $char <p>A single character.</p>
   *
   * @return string <p>'RTL' or 'LTR'</p>
   */
  public static function getCharDirection($char)
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['intlChar'] === true) {
      $tmpReturn = \IntlChar::charDirection($char);

      // values of the IntlChar::CHAR_DIRECTION_* constants:
      // RTL: RIGHT_TO_LEFT (1), RIGHT_TO_LEFT_ARABIC (13), RIGHT_TO_LEFT_EMBEDDING (14),
      //      RIGHT_TO_LEFT_OVERRIDE (15), RIGHT_TO_LEFT_ISOLATE (21)
      // LTR: LEFT_TO_RIGHT (0), LEFT_TO_RIGHT_EMBEDDING (11), LEFT_TO_RIGHT_OVERRIDE (12),
      //      LEFT_TO_RIGHT_ISOLATE (20)
      $charDirection = array(
          'RTL' => array(1, 13, 14, 15, 21),
          'LTR' => array(0, 11, 12, 20),
      );

      if (in_array($tmpReturn, $charDirection['LTR'], true)) {
        return 'LTR';
      } elseif (in_array($tmpReturn, $charDirection['RTL'], true)) {
        return 'RTL';
      }
    }

    // fallback (no "intl", or a weak/neutral direction): check hard-coded RTL code point ranges
    $c = static::chr_to_decimal($char);

    if (!(0x5be <= $c && 0x10b7f >= $c)) {
      return 'LTR';
    }

    if (0x85e >= $c) {

      if (0x5be === $c ||
          0x5c0 === $c ||
          0x5c3 === $c ||
          0x5c6 === $c ||
          (0x5d0 <= $c && 0x5ea >= $c) ||
          (0x5f0 <= $c && 0x5f4 >= $c) ||
          0x608 === $c ||
          0x60b === $c ||
          0x60d === $c ||
          0x61b === $c ||
          (0x61e <= $c && 0x64a >= $c) ||
          (0x66d <= $c && 0x66f >= $c) ||
          (0x671 <= $c && 0x6d5 >= $c) ||
          (0x6e5 <= $c && 0x6e6 >= $c) ||
          (0x6ee <= $c && 0x6ef >= $c) ||
          (0x6fa <= $c && 0x70d >= $c) ||
          0x710 === $c ||
          (0x712 <= $c && 0x72f >= $c) ||
          (0x74d <= $c && 0x7a5 >= $c) ||
          0x7b1 === $c ||
          (0x7c0 <= $c && 0x7ea >= $c) ||
          (0x7f4 <= $c && 0x7f5 >= $c) ||
          0x7fa === $c ||
          (0x800 <= $c && 0x815 >= $c) ||
          0x81a === $c ||
          0x824 === $c ||
          0x828 === $c ||
          (0x830 <= $c && 0x83e >= $c) ||
          (0x840 <= $c && 0x858 >= $c) ||
          0x85e === $c
      ) {
        return 'RTL';
      }

    } elseif (0x200f === $c) { // U+200F RIGHT-TO-LEFT MARK

      return 'RTL';

    } elseif (0xfb1d <= $c) {

      if (0xfb1d === $c ||
          (0xfb1f <= $c && 0xfb28 >= $c) ||
          (0xfb2a <= $c && 0xfb36 >= $c) ||
          (0xfb38 <= $c && 0xfb3c >= $c) ||
          0xfb3e === $c ||
          (0xfb40 <= $c && 0xfb41 >= $c) ||
          (0xfb43 <= $c && 0xfb44 >= $c) ||
          (0xfb46 <= $c && 0xfbc1 >= $c) ||
          (0xfbd3 <= $c && 0xfd3d >= $c) ||
          (0xfd50 <= $c && 0xfd8f >= $c) ||
          (0xfd92 <= $c && 0xfdc7 >= $c) ||
          (0xfdf0 <= $c && 0xfdfc >= $c) ||
          (0xfe70 <= $c && 0xfe74 >= $c) ||
          (0xfe76 <= $c && 0xfefc >= $c) ||
          (0x10800 <= $c && 0x10805 >= $c) ||
          0x10808 === $c ||
          (0x1080a <= $c && 0x10835 >= $c) ||
          (0x10837 <= $c && 0x10838 >= $c) ||
          0x1083c === $c ||
          (0x1083f <= $c && 0x10855 >= $c) ||
          (0x10857 <= $c && 0x1085f >= $c) ||
          (0x10900 <= $c && 0x1091b >= $c) ||
          (0x10920 <= $c && 0x10939 >= $c) ||
          0x1093f === $c ||
          0x10a00 === $c ||
          (0x10a10 <= $c && 0x10a13 >= $c) ||
          (0x10a15 <= $c && 0x10a17 >= $c) ||
          (0x10a19 <= $c && 0x10a33 >= $c) ||
          (0x10a40 <= $c && 0x10a47 >= $c) ||
          (0x10a50 <= $c && 0x10a58 >= $c) ||
          (0x10a60 <= $c && 0x10a7f >= $c) ||
          (0x10b00 <= $c && 0x10b35 >= $c) ||
          (0x10b40 <= $c && 0x10b55 >= $c) ||
          (0x10b58 <= $c && 0x10b72 >= $c) ||
          (0x10b78 <= $c && 0x10b7f >= $c)
      ) {
        return 'RTL';
      }
    }

    return 'LTR';
  }

  /**
   * Get data from "/data/*.php".
   *
   * @param string $file <p>The file name, without the ".php" extension.</p>
   *
   * @return bool|string|array|int <p>Will return false if the file does not exist.</p>
   */
  private static function getData($file)
  {
    $file = __DIR__ . '/data/' . $file . '.php';
    if (file_exists($file)) {
      /** @noinspection PhpIncludeInspection */
      return require $file;
    } else {
      return false;
    }
  }

  /**
   * alias for "UTF8::string_has_bom()"
   *
   * @see UTF8::string_has_bom()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated use "UTF8::string_has_bom()" instead
   */
  public static function hasBom($str)
  {
    return self::string_has_bom($str);
  }

  /**
   * Converts a hexadecimal value into a UTF-8 character.
   *
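   * Example (the value is passed through hexdec(), so no "U+" prefix is needed):
   * <code>
   * UTF8::hex_to_chr('e9');   // 'é' (U+00E9)
   * UTF8::hex_to_chr('20ac'); // '€' (U+20AC)
   * </code>
   *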
   * @param string $hexdec <p>The hexadecimal value.</p>
   *
   * @return string|false <p>One single UTF-8 character.</p>
   */
  public static function hex_to_chr($hexdec)
  {
    return self::decimal_to_chr(hexdec($hexdec));
  }

  /**
   * Converts a hexadecimal code point representation ("U+xxxx", "\uxxxx" or a bare "xxxx") to an integer.
   *
   * INFO: opposite to UTF8::int_to_hex()
   *
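   * Example:
   * <code>
   * UTF8::hex_to_int('U+00E9'); // 233
   * UTF8::hex_to_int('\u00e9'); // 233
   * UTF8::hex_to_int('00e9');   // 233
   * UTF8::hex_to_int('e9');     // false (fewer than 4 hex digits)
   * </code>
   *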
2026
   * @param string $hexdec <p>The hexadecimal code point representation.</p>
2027
   *
2028
   * @return int|false <p>The code point, or false on failure.</p>
2029
   */
2030
  public static function hex_to_int($hexdec)
2031
  {
2032
    $hexdec = (string)$hexdec;
2033
2034
    if (!isset($hexdec[0])) {
2035
      return false;
2036
    }
2037
2038
    if (preg_match('/^(?:\\\u|U\+|)([a-z0-9]{4,6})$/i', $hexdec, $match)) {
2039
      return intval($match[1], 16);
2040
    }
2041
2042
    return false;
2043
  }
2044
2045
  /**
2046
   * alias for "UTF8::html_entity_decode()"
2047
   *
2048
   * @see UTF8::html_entity_decode()
2049
   *
2050
   * @param string $str
2051
   * @param int    $flags
2052
   * @param string $encoding
2053
   *
2054
   * @return string
2055
   */
2056
  public static function html_decode($str, $flags = null, $encoding = 'UTF-8')
2057
  {
2058
    return self::html_entity_decode($str, $flags, $encoding);
2059
  }
2060
2061
  /**
2062
   * Converts a UTF-8 string to a series of HTML numbered entities.
2063
   *
2064
   * INFO: opposite to UTF8::html_decode()
2065
   *
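   * Example (assumes the mbstring extension is loaded):
   * <code>
   * UTF8::html_encode('a€');       // '&#97;&#8364;'
   * UTF8::html_encode('a€', true); // 'a&#8364;'
   * </code>
   *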
   * @param string $str            <p>The Unicode string to be encoded as numbered entities.</p>
   * @param bool   $keepAsciiChars [optional] <p>Leave ASCII characters (U+0000 to U+007F) unencoded.</p>
   * @param string $encoding       [optional] <p>The input encoding. Default is UTF-8.</p>
   *
   * @return string <p>HTML numbered entities.</p>
   */
  public static function html_encode($str, $keepAsciiChars = false, $encoding = 'UTF-8')
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    # INFO: http://stackoverflow.com/questions/35854535/better-explanation-of-convmap-in-mb-encode-numericentity
    if (function_exists('mb_encode_numericentity')) {

      $startCode = 0x00;
      if ($keepAsciiChars === true) {
        $startCode = 0x80;
      }

      // convmap entries come in groups of 4: start code, end code, offset, mask
      return mb_encode_numericentity(
          $str,
          array($startCode, 0xfffff, 0, 0xfffff),
          $encoding
      );
    }

    return implode(
        '',
        array_map(
            function ($data) use ($keepAsciiChars, $encoding) {
              return UTF8::single_chr_html_encode($data, $keepAsciiChars, $encoding);
            },
            self::split($str)
        )
    );
  }

  /**
   * UTF-8 version of html_entity_decode()
   *
   * The reason we are not using html_entity_decode() by itself is that,
   * while it is not technically correct to leave out the semicolon
   * at the end of an entity, most browsers will still interpret the entity
   * correctly. html_entity_decode() does not convert entities without
   * semicolons, so this method first appends the missing semicolon to
   * numeric entities (e.g. "&#8364" or "&#x20ac").
   *
   * Convert all HTML entities to their applicable characters
   *
   * INFO: opposite to UTF8::html_encode()
   *
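   * Example:
   * <code>
   * UTF8::html_entity_decode('&lt;b&gt;&#8364;'); // '<b>€'
   * UTF8::html_entity_decode('&#8364');           // '€' (the missing semicolon is tolerated)
   * UTF8::html_entity_decode('&amp;lt;');         // '<' (decoded until the result no longer changes)
   * </code>
   *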
   * @link http://php.net/manual/en/function.html-entity-decode.php
   *
   * @param string $str      <p>
   *                         The input string.
   *                         </p>
   * @param int    $flags    [optional] <p>
   *                         A bitmask of one or more of the following flags, which specify how to handle quotes and
   *                         which document type to use. If omitted (null), ENT_QUOTES | ENT_HTML5 is used on
   *                         PHP 5.4+ and ENT_QUOTES on older versions.
   *                         <table>
   *                         Available <i>flags</i> constants
   *                         <tr valign="top">
   *                         <td>Constant Name</td>
   *                         <td>Description</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_COMPAT</b></td>
   *                         <td>Will convert double-quotes and leave single-quotes alone.</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_QUOTES</b></td>
   *                         <td>Will convert both double and single quotes.</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_NOQUOTES</b></td>
   *                         <td>Will leave both double and single quotes unconverted.</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_HTML401</b></td>
   *                         <td>
   *                         Handle code as HTML 4.01.
   *                         </td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_XML1</b></td>
   *                         <td>
   *                         Handle code as XML 1.
   *                         </td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_XHTML</b></td>
   *                         <td>
   *                         Handle code as XHTML.
   *                         </td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_HTML5</b></td>
   *                         <td>
   *                         Handle code as HTML 5.
   *                         </td>
   *                         </tr>
   *                         </table>
   *                         </p>
   * @param string $encoding [optional] <p>Encoding to use.</p>
   *
   * @return string <p>The decoded string.</p>
   */
  public static function html_entity_decode($str, $flags = null, $encoding = 'UTF-8')
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // too short to contain a decodable entity, e.g. "&;" or "&x;"
    if (!isset($str[3])) {
      return $str;
    }

    if (
        strpos($str, '&') === false
        ||
        (
            strpos($str, '&#') === false
            &&
            strpos($str, ';') === false
        )
    ) {
      return $str;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($flags === null) {
      if (Bootup::is_php('5.4') === true) {
        $flags = ENT_QUOTES | ENT_HTML5;
      } else {
        $flags = ENT_QUOTES;
      }
    }

    // decode repeatedly until the string stops changing, so that multiply
    // encoded entities (e.g. "&amp;lt;") are fully decoded
    do {
      $str_compare = $str;

      // decode decimal numeric entities via mbstring; quote entities are left
      // for html_entity_decode() below, which honours $flags
      $str = preg_replace_callback(
          "/&#\d{2,6};/",
          function ($matches) use ($encoding) {
            $returnTmp = \mb_convert_encoding($matches[0], $encoding, 'HTML-ENTITIES');

            if ($returnTmp !== '"' && $returnTmp !== "'") {
              return $returnTmp;
            } else {
              return $matches[0];
            }
          },
          $str
      );

      // append the missing ";" to numeric entities (e.g. "&#8364" -> "&#8364;"), then decode
      $str = html_entity_decode(
          preg_replace('/(&#(?:x0*[0-9a-f]{2,6}(?![0-9a-f;])|(?:0*\d{2,6}(?![0-9;]))))/iS', '$1;', $str),
          $flags,
          $encoding
      );

    } while ($str_compare !== $str);

    return $str;
  }

  /**
   * Convert all applicable characters to HTML entities: UTF-8 version of htmlentities()
   *
   * @link http://php.net/manual/en/function.htmlentities.php
   *
   * @param string $str           <p>
   *                              The input string.
   *                              </p>
   * @param int    $flags         [optional] <p>
   *                              A bitmask of one or more of the following flags, which specify how to handle quotes,
   *                              invalid code unit sequences and the used document type. The default is
   *                              ENT_COMPAT | ENT_HTML401.
   *                              <table>
   *                              Available <i>flags</i> constants
   *                              <tr valign="top">
   *                              <td>Constant Name</td>
   *                              <td>Description</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_COMPAT</b></td>
   *                              <td>Will convert double-quotes and leave single-quotes alone.</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_QUOTES</b></td>
   *                              <td>Will convert both double and single quotes.</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_NOQUOTES</b></td>
   *                              <td>Will leave both double and single quotes unconverted.</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_IGNORE</b></td>
   *                              <td>
   *                              Silently discard invalid code unit sequences instead of returning
   *                              an empty string. Using this flag is discouraged as it
   *                              may have security implications.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_SUBSTITUTE</b></td>
   *                              <td>
   *                              Replace invalid code unit sequences with a Unicode Replacement Character
   *                              U+FFFD (UTF-8) or &amp;#xFFFD; (otherwise) instead of returning an empty string.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_DISALLOWED</b></td>
   *                              <td>
   *                              Replace invalid code points for the given document type with a
   *                              Unicode Replacement Character U+FFFD (UTF-8) or &amp;#xFFFD;
   *                              (otherwise) instead of leaving them as is. This may be useful, for
   *                              instance, to ensure the well-formedness of XML documents with
   *                              embedded external content.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_HTML401</b></td>
   *                              <td>
   *                              Handle code as HTML 4.01.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_XML1</b></td>
   *                              <td>
   *                              Handle code as XML 1.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_XHTML</b></td>
   *                              <td>
   *                              Handle code as XHTML.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_HTML5</b></td>
   *                              <td>
   *                              Handle code as HTML 5.
   *                              </td>
   *                              </tr>
   *                              </table>
   *                              </p>
   * @param string $encoding      [optional] <p>
   *                              Like <b>htmlspecialchars</b>,
   *                              <b>htmlentities</b> takes an optional third argument
   *                              <i>encoding</i> which defines the encoding used in
   *                              conversion.
   *                              Although this argument is technically optional, you are highly
2333
   *                              encouraged to specify the correct value for your code.
2334
   *                              </p>
2335
   * @param bool   $double_encode [optional] <p>
2336
   *                              When <i>double_encode</i> is turned off PHP will not
2337
   *                              encode existing html entities. The default is to convert everything.
2338
   *                              </p>
2339
   *
2340
   *
2341
   * @return string the encoded string.
2342
   * </p>
2343
   * <p>
2344
   * If the input <i>string</i> contains an invalid code unit
2345
   * sequence within the given <i>encoding</i> an empty string
2346
   * will be returned, unless either the <b>ENT_IGNORE</b> or
2347
   * <b>ENT_SUBSTITUTE</b> flags are set.
2348
   */
2349
  public static function htmlentities($str, $flags = ENT_COMPAT, $encoding = 'UTF-8', $double_encode = true)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    $str = htmlentities($str, $flags, $encoding, $double_encode);

    if ($encoding !== 'UTF-8') {
      return $str;
    }

    // htmlentities() leaves characters without a named entity untouched, so encode
    // the remaining 3+ byte UTF-8 characters numerically via UTF8::html_encode().
    $byteLengths = self::chr_size_list($str);
    $search = array();
    $replacements = array();
    foreach ($byteLengths as $counter => $byteLength) {
      if ($byteLength >= 3) {
        $char = self::access($str, $counter);

        // UTF8::access() can return false; skip it instead of passing false to html_encode().
        if ($char !== false && !isset($replacements[$char])) {
          $search[$char] = $char;
          $replacements[$char] = self::html_encode($char);
        }
      }
    }

    return str_replace($search, $replacements, $str);
  }

  /**
   * Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()
   *
   * INFO: Take a look at "UTF8::htmlentities()"
   *
   * @link http://php.net/manual/en/function.htmlspecialchars.php
   *
   * @param string $str           <p>
   *                              The string being converted.
   *                              </p>
   * @param int    $flags         [optional] <p>
   *                              A bitmask of one or more of the following flags, which specify how to handle quotes,
   *                              invalid code unit sequences and the document type used. The default is
   *                              ENT_COMPAT | ENT_HTML401.
   *                              <table>
   *                              <caption>Available <i>flags</i> constants</caption>
   *                              <tr valign="top">
   *                              <td>Constant Name</td>
   *                              <td>Description</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_COMPAT</b></td>
   *                              <td>Will convert double-quotes and leave single-quotes alone.</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_QUOTES</b></td>
   *                              <td>Will convert both double and single quotes.</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_NOQUOTES</b></td>
   *                              <td>Will leave both double and single quotes unconverted.</td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_IGNORE</b></td>
   *                              <td>
   *                              Silently discard invalid code unit sequences instead of returning
   *                              an empty string. Using this flag is discouraged as it
   *                              may have security implications.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_SUBSTITUTE</b></td>
   *                              <td>
   *                              Replace invalid code unit sequences with a Unicode Replacement Character
   *                              U+FFFD (UTF-8) or &amp;#xFFFD; (otherwise) instead of returning an empty string.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_DISALLOWED</b></td>
   *                              <td>
   *                              Replace invalid code points for the given document type with a
   *                              Unicode Replacement Character U+FFFD (UTF-8) or &amp;#xFFFD;
   *                              (otherwise) instead of leaving them as is. This may be useful, for
   *                              instance, to ensure the well-formedness of XML documents with
   *                              embedded external content.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_HTML401</b></td>
   *                              <td>
   *                              Handle code as HTML 4.01.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_XML1</b></td>
   *                              <td>
   *                              Handle code as XML 1.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_XHTML</b></td>
   *                              <td>
   *                              Handle code as XHTML.
   *                              </td>
   *                              </tr>
   *                              <tr valign="top">
   *                              <td><b>ENT_HTML5</b></td>
   *                              <td>
   *                              Handle code as HTML 5.
   *                              </td>
   *                              </tr>
   *                              </table>
   *                              </p>
   * @param string $encoding      [optional] <p>
   *                              Defines the encoding used in the conversion.
   *                              </p>
   *                              <p>
   *                              For the purposes of this function, the encodings
   *                              ISO-8859-1, ISO-8859-15,
   *                              UTF-8, cp866,
   *                              cp1251, cp1252, and
   *                              KOI8-R are effectively equivalent, provided the
   *                              <i>string</i> itself is valid for the encoding, as
   *                              the characters affected by <b>htmlspecialchars</b> occupy
   *                              the same positions in all of these encodings.
   *                              </p>
   * @param bool   $double_encode [optional] <p>
   *                              When <i>double_encode</i> is turned off, PHP will not
   *                              encode existing HTML entities; the default is to convert everything.
   *                              </p>
   *
   * @return string <p>The converted string.</p>
   *                <p>
   *                If the input <i>string</i> contains an invalid code unit
   *                sequence within the given <i>encoding</i>, an empty string
   *                will be returned, unless either the <b>ENT_IGNORE</b> or
   *                <b>ENT_SUBSTITUTE</b> flags are set.
   *                </p>
   */
  public static function htmlspecialchars($str, $flags = ENT_COMPAT, $encoding = 'UTF-8', $double_encode = true)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    return htmlspecialchars($str, $flags, $encoding, $double_encode);
  }

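  // Usage sketch (illustrative comment only, not executed). The results assume
  // the default $flags = ENT_COMPAT, which encodes double quotes but leaves single
  // quotes alone.
  //
  //   UTF8::htmlspecialchars('<a href="x">');                       // '&lt;a href=&quot;x&quot;&gt;'
  //   UTF8::htmlspecialchars("it's");                               // "it's" (unchanged)
  //   UTF8::htmlspecialchars("it's", ENT_QUOTES);                   // 'it&#039;s'
  //   UTF8::htmlspecialchars('&amp;', ENT_COMPAT, 'UTF-8', false);  // '&amp;' (not double-encoded)
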
  /**
   * Checks whether iconv is available on the server.
   *
   * On PHP < 5.6 (and only if the extension is loaded) this also sets the iconv
   * input, output and internal encodings to UTF-8.
   *
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
   */
  public static function iconv_loaded()
  {
    $return = extension_loaded('iconv');

    // INFO: "default_charset" is already set by the "Bootup"-class

    // Only touch the iconv settings if the extension is actually loaded;
    // otherwise iconv_set_encoding() would be an undefined function.
    if ($return === true && Bootup::is_php('5.6') === false) {
      // INFO: "iconv_set_encoding" is deprecated as of PHP 5.6
      iconv_set_encoding('input_encoding', 'UTF-8');
      iconv_set_encoding('output_encoding', 'UTF-8');
      iconv_set_encoding('internal_encoding', 'UTF-8');
    }

    return $return;
  }

  /**
   * alias for "UTF8::decimal_to_chr()"
   *
   * @see UTF8::decimal_to_chr()
   *
   * @param int $int
   *
   * @return string
   */
  public static function int_to_chr($int)
  {
    return self::decimal_to_chr($int);
  }

  /**
   * Converts an integer to its hexadecimal U+xxxx code point representation.
   *
   * INFO: opposite of UTF8::hex_to_int()
   *
   * @param int    $int  <p>The integer to be converted to a hexadecimal code point.</p>
   * @param string $pfix [optional] <p>The prefix to prepend, "U+" by default.</p>
   *
   * @return string <p>The code point (zero-padded to at least four hex digits), or an empty string if $int is not an integer.</p>
   */
  public static function int_to_hex($int, $pfix = 'U+')
  {
    if ((int)$int === $int) {
      $hex = dechex($int);

      $hex = (strlen($hex) < 4 ? substr('0000' . $hex, -4) : $hex);

      return $pfix . $hex;
    }

    return '';
  }

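  // Usage sketch (illustrative comment only, not executed). Values below 0x1000
  // are left-padded to four hex digits, dechex() yields lowercase digits, and
  // anything that is not a real int (e.g. the numeric string '65') returns ''.
  //
  //   UTF8::int_to_hex(65);       // 'U+0041'
  //   UTF8::int_to_hex(0x1F600);  // 'U+1f600'
  //   UTF8::int_to_hex(65, '');   // '0041'
  //   UTF8::int_to_hex('65');     // ''
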
  /**
   * Checks whether the "IntlChar" class (PHP >= 7.0, intl extension) is available on the server.
   *
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
   */
  public static function intlChar_loaded()
  {
    return (
        Bootup::is_php('7.0') === true
        &&
        class_exists('IntlChar') === true
    );
  }

  /**
   * Checks whether intl is available on the server.
   *
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
   */
  public static function intl_loaded()
  {
    return extension_loaded('intl') ? true : false;
  }

  /**
   * alias for "UTF8::is_ascii()"
   *
   * @see UTF8::is_ascii()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_ascii()" instead
   */
  public static function isAscii($str)
  {
    return self::is_ascii($str);
  }

  /**
   * alias for "UTF8::is_base64()"
   *
   * @see UTF8::is_base64()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_base64()" instead
   */
  public static function isBase64($str)
  {
    return self::is_base64($str);
  }

  /**
   * alias for "UTF8::is_binary()"
   *
   * @see UTF8::is_binary()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_binary()" instead
   */
  public static function isBinary($str)
  {
    return self::is_binary($str);
  }

  /**
   * alias for "UTF8::is_bom()"
   *
   * @see UTF8::is_bom()
   *
   * @param string $utf8_chr
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_bom()" instead
   */
  public static function isBom($utf8_chr)
  {
    return self::is_bom($utf8_chr);
  }

  /**
   * alias for "UTF8::is_html()"
   *
   * @see UTF8::is_html()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_html()" instead
   */
  public static function isHtml($str)
  {
    return self::is_html($str);
  }

  /**
   * alias for "UTF8::is_json()"
   *
   * @see UTF8::is_json()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_json()" instead
   */
  public static function isJson($str)
  {
    return self::is_json($str);
  }

  /**
   * alias for "UTF8::is_utf16()"
   *
   * @see UTF8::is_utf16()
   *
   * @param string $str
   *
   * @return int|false false if it's not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE.
   *
   * @deprecated use "UTF8::is_utf16()" instead
   */
  public static function isUtf16($str)
  {
    return self::is_utf16($str);
  }

  /**
   * alias for "UTF8::is_utf32()"
   *
   * @see UTF8::is_utf32()
   *
   * @param string $str
   *
   * @return int|false false if it's not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE.
   *
   * @deprecated use "UTF8::is_utf32()" instead
   */
  public static function isUtf32($str)
  {
    return self::is_utf32($str);
  }

  /**
   * alias for "UTF8::is_utf8()"
   *
   * @see UTF8::is_utf8()
   *
   * @param string $str
   * @param bool   $strict
   *
   * @return bool
   *
   * @deprecated use "UTF8::is_utf8()" instead
   */
  public static function isUtf8($str, $strict = false)
  {
    return self::is_utf8($str, $strict);
  }

  /**
   * Checks if a string is 7-bit ASCII.
   *
   * An empty string is considered ASCII.
   *
   * @param string $str <p>The string to check.</p>
   *
   * @return bool <p>
   *              <strong>true</strong> if it is ASCII<br />
   *              <strong>false</strong> otherwise
   *              </p>
   */
  public static function is_ascii($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return true;
    }

    return (bool)!preg_match('/[\x80-\xFF]/', $str);
  }

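  // Usage sketch (illustrative comment only, not executed). Any byte >= 0x80
  // makes the check fail, so every multi-byte UTF-8 character is rejected.
  //
  //   UTF8::is_ascii('Hello');        // true
  //   UTF8::is_ascii('');             // true
  //   UTF8::is_ascii("caf\xC3\xA9");  // false ("é" is encoded as two bytes >= 0x80)
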
  /**
   * Returns true if the string is base64 encoded, false otherwise.
   *
   * The string must decode in strict mode and re-encode to exactly the same value,
   * so unpadded or otherwise non-canonical input is rejected.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return bool <p>Whether or not $str is base64 encoded.</p>
   */
  public static function is_base64($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return false;
    }

    $base64String = (string)base64_decode($str, true);

    // Compare with '' instead of a truthy check; otherwise "MA==" (the encoding of "0") would be rejected.
    return $base64String !== '' && base64_encode($base64String) === $str;
  }

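  // Usage sketch (illustrative comment only, not executed). The input has to
  // survive a strict decode/encode round trip unchanged.
  //
  //   UTF8::is_base64('SGVsbG8=');  // true  ("Hello")
  //   UTF8::is_base64('SGVsbG8');   // false (no '=' padding, so not canonical base64)
  //   UTF8::is_base64('Hello!');    // false ('!' is not in the base64 alphabet)
  //   UTF8::is_base64('');          // false
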
  /**
   * Check if the input looks like binary data.
   *
   * This is only a heuristic: it returns true if the input consists solely of
   * "0" and "1" characters or contains at least one NUL byte.
   *
   * @param mixed $input
   *
   * @return bool
   */
  public static function is_binary($input)
  {
    $input = (string)$input;

    if (!isset($input[0])) {
      return false;
    }

    if (preg_match('~^[01]+$~', $input)) {
      return true;
    }

    $testLength = strlen($input);
    if ($testLength && substr_count($input, "\x00") / $testLength > 0.3) {
      return true;
    }

    // any NUL byte at all is treated as binary (this already covers the ratio check above)
    if (substr_count($input, "\x00") > 0) {
      return true;
    }

    return false;
  }

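  // Usage sketch (illustrative comment only, not executed). Note that a plain
  // string of "0"/"1" digits (or an int cast to one) is also reported as binary.
  //
  //   UTF8::is_binary("a\x00b");  // true  (contains a NUL byte)
  //   UTF8::is_binary('0101');    // true  (only "0" and "1" characters)
  //   UTF8::is_binary(101);       // true  (cast to the string "101")
  //   UTF8::is_binary('hello');   // false
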
  /**
   * Check if the file is binary.
   *
   * Only the first 512 bytes are inspected, using "UTF8::is_binary()".
   *
   * @param string $file <p>Path to the file.</p>
   *
   * @return bool <p><strong>false</strong> is also returned if the file cannot be opened.</p>
   */
  public static function is_binary_file($file)
  {
    try {
      // fopen() reports failure by returning false (with a warning), not by throwing
      $fp = fopen($file, 'rb');
      if ($fp === false) {
        return false;
      }

      $block = fread($fp, 512);
      fclose($fp);
    } catch (\Exception $e) {
      $block = '';
    }

    return self::is_binary($block);
  }

  /**
   * Checks if the given string is exactly one of the known "Byte Order Marks".
   *
   * WARNING: Use "UTF8::string_has_bom()" to check whether a longer string contains a BOM.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return bool <p><strong>true</strong> if $str is a Byte Order Mark, <strong>false</strong> otherwise.</p>
   */
  public static function is_bom($str)
  {
    foreach (self::$BOM as $bomString => $bomByteLength) {
      if ($str === $bomString) {
        return true;
      }
    }

    return false;
  }

  /**
   * Check if the string contains at least one HTML tag.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return bool
   */
  public static function is_html($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return false;
    }

    // init
    $matches = array();

    preg_match("/<\/?\w+(?:(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)*+\s*|\s*)\/?>/", $str, $matches);

    if (count($matches) === 0) {
      return false;
    } else {
      return true;
    }
  }

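  // Usage sketch (illustrative comment only, not executed). A "<" only counts
  // when a tag name follows it directly, so comparison operators in plain text
  // are not mistaken for markup.
  //
  //   UTF8::is_html('<p>Hello</p>');  // true
  //   UTF8::is_html('Hello<br />');   // true
  //   UTF8::is_html('a < b > c');     // false (no tag name right after "<")
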
  /**
   * Check if "$str" is a JSON string that decodes to an object or an array.
   *
   * Valid JSON scalars (numbers, strings, true, false, null) return false.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return bool
   */
  public static function is_json($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return false;
    }

    $json = self::json_decode($str);

    if (
        (
            is_object($json) === true
            ||
            is_array($json) === true
        )
        &&
        json_last_error() === JSON_ERROR_NONE
    ) {
      return true;
    } else {
      return false;
    }
  }

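  // Usage sketch (illustrative comment only, not executed; assumes that
  // UTF8::json_decode() decodes these inputs the same way as \json_decode()).
  //
  //   UTF8::is_json('{"a":1}');  // true
  //   UTF8::is_json('[1,2,3]');  // true
  //   UTF8::is_json('123');      // false (valid JSON, but a scalar)
  //   UTF8::is_json('{a:1}');    // false (invalid JSON: unquoted key)
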
  /**
   * Check if the string is UTF-16.
   *
   * This is a heuristic: only strings that "UTF8::is_binary()" considers binary are
   * checked, by round-tripping them through mb_convert_encoding() as UTF-16LE and UTF-16BE.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return int|false <p>
   *                   <strong>false</strong> if it's not UTF-16,<br />
   *                   <strong>1</strong> for UTF-16LE,<br />
   *                   <strong>2</strong> for UTF-16BE.
   *                   </p>
   */
  public static function is_utf16($str)
  {
    $str = self::remove_bom($str);

    if (self::is_binary($str) === true) {

      $maybeUTF16LE = 0;
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16LE');
      if ($test) {
        $test2 = \mb_convert_encoding($test, 'UTF-16LE', 'UTF-8');
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16LE');
        if ($test3 === $test) {
          $strChars = self::count_chars($str, true);
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
            if (in_array($test3char, $strChars, true) === true) {
              $maybeUTF16LE++;
            }
          }
        }
      }

      $maybeUTF16BE = 0;
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16BE');
      if ($test) {
        $test2 = \mb_convert_encoding($test, 'UTF-16BE', 'UTF-8');
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16BE');
        if ($test3 === $test) {
          $strChars = self::count_chars($str, true);
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
            if (in_array($test3char, $strChars, true) === true) {
              $maybeUTF16BE++;
            }
          }
        }
      }

      if ($maybeUTF16BE !== $maybeUTF16LE) {
        if ($maybeUTF16LE > $maybeUTF16BE) {
          return 1;
        } else {
          return 2;
        }
      }

    }

    return false;
  }

  /**
   * Check if the string is UTF-32.
   *
   * This is a heuristic: only strings that "UTF8::is_binary()" considers binary are
   * checked, by round-tripping them through mb_convert_encoding() as UTF-32LE and UTF-32BE.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return int|false <p>
   *                   <strong>false</strong> if it's not UTF-32,<br />
   *                   <strong>1</strong> for UTF-32LE,<br />
   *                   <strong>2</strong> for UTF-32BE.
   *                   </p>
   */
  public static function is_utf32($str)
  {
    $str = self::remove_bom($str);

    if (self::is_binary($str) === true) {

      $maybeUTF32LE = 0;
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32LE');
      if ($test) {
        $test2 = \mb_convert_encoding($test, 'UTF-32LE', 'UTF-8');
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32LE');
        if ($test3 === $test) {
          $strChars = self::count_chars($str, true);
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
            if (in_array($test3char, $strChars, true) === true) {
              $maybeUTF32LE++;
            }
          }
        }
      }

      $maybeUTF32BE = 0;
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32BE');
      if ($test) {
        $test2 = \mb_convert_encoding($test, 'UTF-32BE', 'UTF-8');
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32BE');
        if ($test3 === $test) {
          $strChars = self::count_chars($str, true);
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
            if (in_array($test3char, $strChars, true) === true) {
              $maybeUTF32BE++;
            }
          }
        }
      }

      if ($maybeUTF32BE !== $maybeUTF32LE) {
        if ($maybeUTF32LE > $maybeUTF32BE) {
          return 1;
        } else {
          return 2;
        }
      }

    }

    return false;
  }

  /**
3018
   * Checks whether the passed string contains only byte sequences that appear valid UTF-8 characters.
3019
   *
3020
   * @see    http://hsivonen.iki.fi/php-utf8/
3021
   *
3022 2
   * @param string $str    <p>The string to be checked.</p>
3023
   * @param bool   $strict <p>Check also if the string is not UTF-16 or UTF-32.</p>
3024 2
   *
3025
   * @return bool
3026 2
   */
3027 2
  public static function is_utf8($str, $strict = false)
3028 2
  {
3029
    $str = (string)$str;
3030 2
3031
    if (!isset($str[0])) {
3032
      return true;
3033
    }
3034
3035
    if ($strict === true) {
3036
      if (self::is_utf16($str) !== false) {
3037
        return false;
3038
      }
3039
3040 1
      if (self::is_utf32($str) !== false) {
3041
        return false;
3042 1
      }
3043
    }
3044
3045
    if (self::pcre_utf8_support() !== true) {
3046 1
3047
      // If even just the first character can be matched, when the /u
3048
      // modifier is used, then it's valid UTF-8. If the UTF-8 is somehow
3049
      // invalid, nothing at all will match, even if the string contains
3050
      // some valid sequences
3051
      return (preg_match('/^.{1}/us', $str, $ar) === 1);
3052
3053
    } else {
3054
3055
      $mState = 0; // cached expected number of octets after the current octet
3056
      // until the beginning of the next UTF8 character sequence
3057
      $mUcs4 = 0; // cached Unicode character
3058 1
      $mBytes = 1; // cached expected number of octets in the current sequence
3059
      $len = strlen($str);
3060 1
3061
      /** @noinspection ForeachInvariantsInspection */
3062
      for ($i = 0; $i < $len; $i++) {
3063
        $in = ord($str[$i]);
3064
        if ($mState === 0) {
3065
          // When mState is zero we expect either a US-ASCII character or a
3066
          // multi-octet sequence.
3067
          if (0 === (0x80 & $in)) {
3068
            // US-ASCII, pass straight through.
3069
            $mBytes = 1;
3070 16 View Code Duplication
          } elseif (0xC0 === (0xE0 & $in)) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
3071
            // First octet of 2 octet sequence.
3072 16
            $mUcs4 = $in;
3073
            $mUcs4 = ($mUcs4 & 0x1F) << 6;
3074 16
            $mState = 1;
3075 2
            $mBytes = 2;
3076
          } elseif (0xE0 === (0xF0 & $in)) {
3077
            // First octet of 3 octet sequence.
3078 16
            $mUcs4 = $in;
3079 1
            $mUcs4 = ($mUcs4 & 0x0F) << 12;
3080
            $mState = 2;
3081
            $mBytes = 3;
3082 16 View Code Duplication
          } elseif (0xF0 === (0xF8 & $in)) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
3083 4
            // First octet of 4 octet sequence.
3084
            $mUcs4 = $in;
3085
            $mUcs4 = ($mUcs4 & 0x07) << 18;
3086 15
            $mState = 3;
3087 14
            $mBytes = 4;
3088
          } elseif (0xF8 === (0xFC & $in)) {
3089
            /* First octet of 5 octet sequence.
3090 4
            *
3091 4
            * This is illegal because the encoded codepoint must be either
3092 4
            * (a) not the shortest form or
3093
            * (b) outside the Unicode range of 0-0x10FFFF.
3094
            * Rather than trying to resynchronize, we will carry on until the end
3095 4
            * of the sequence and let the later error handling code catch it.
3096 4
            */
3097 4
            $mUcs4 = $in;
3098 4
            $mUcs4 = ($mUcs4 & 0x03) << 24;
3099 4
            $mState = 4;
3100 4
            $mBytes = 5;
3101 4 View Code Duplication
          } elseif (0xFC === (0xFE & $in)) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
            // First octet of 6 octet sequence, see comments for 5 octet sequence.
            $mUcs4 = $in;
            $mUcs4 = ($mUcs4 & 1) << 30;
            $mState = 5;
            $mBytes = 6;
          } else {
            /* Current octet is neither in the US-ASCII range nor a legal first
             * octet of a multi-octet sequence.
             */
            return false;
          }
        } else {
          // When mState is non-zero, we expect a continuation of the multi-octet
          // sequence
          if (0x80 === (0xC0 & $in)) {
            // Legal continuation.
            $shift = ($mState - 1) * 6;
            $tmp = $in;
            $tmp = ($tmp & 0x0000003F) << $shift;
            $mUcs4 |= $tmp;
            /**
             * End of the multi-octet sequence. mUcs4 now contains the final
             * Unicode code point to be output
             */
            if (0 === --$mState) {
              /*
               * Check for illegal sequences and code points.
               */
              // From Unicode 3.1, non-shortest form is illegal
              if (
                  (2 === $mBytes && $mUcs4 < 0x0080) ||
                  (3 === $mBytes && $mUcs4 < 0x0800) ||
                  (4 === $mBytes && $mUcs4 < 0x10000) ||
                  (4 < $mBytes) ||
                  // From Unicode 3.2, surrogate characters are illegal.
                  (($mUcs4 & 0xFFFFF800) === 0xD800) ||
                  // Code points outside the Unicode range are illegal.
                  ($mUcs4 > 0x10FFFF)
              ) {
                return false;
              }
              // initialize UTF8 cache
              $mState = 0;
              $mUcs4 = 0;
              $mBytes = 1;
            }
          } else {
            /**
             *((0xC0 & (*in) != 0x80) && (mState != 0))
             * Incomplete multi-octet sequence.
             */
            return false;
          }
        }
      }

      return true;
    }
  }
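The state machine above only accepts a multi-octet sequence once it is complete, then rejects non-shortest forms, UTF-16 surrogates and code points beyond U+10FFFF. As a cross-check, here is a minimal standalone sketch of the same rules written as a plain byte loop. The function name is hypothetical and this is not the library's implementation; it assumes PHP 7+ for the scalar type hints.

```php
<?php
// Hypothetical standalone helper (not part of voku\helper\UTF8), PHP 7+.
// Same rules as the state machine above: no 5/6-byte lead bytes,
// no non-shortest forms, no UTF-16 surrogates, nothing above U+10FFFF.
function utf8_is_valid_sketch(string $str): bool
{
  $len = strlen($str);
  for ($i = 0; $i < $len; $i++) {
    $byte = ord($str[$i]);
    if ($byte < 0x80) {
      continue; // US-ASCII
    }
    if (($byte & 0xE0) === 0xC0) {
      list($need, $cp, $min) = array(1, $byte & 0x1F, 0x80);
    } elseif (($byte & 0xF0) === 0xE0) {
      list($need, $cp, $min) = array(2, $byte & 0x0F, 0x800);
    } elseif (($byte & 0xF8) === 0xF0) {
      list($need, $cp, $min) = array(3, $byte & 0x07, 0x10000);
    } else {
      return false; // stray continuation byte or 5/6-byte lead byte
    }
    if ($i + $need >= $len) {
      return false; // truncated multi-octet sequence
    }
    for ($k = 1; $k <= $need; $k++) {
      $next = ord($str[$i + $k]);
      if (($next & 0xC0) !== 0x80) {
        return false; // expected a continuation byte
      }
      $cp = ($cp << 6) | ($next & 0x3F);
    }
    if ($cp < $min || ($cp & 0xFFFFF800) === 0xD800 || $cp > 0x10FFFF) {
      return false; // non-shortest form, surrogate, or out of range
    }
    $i += $need;
  }
  return true;
}

var_dump(utf8_is_valid_sketch("D\xC3\xBCsseldorf")); // bool(true)
var_dump(utf8_is_valid_sketch("\xC0\xAF"));          // bool(false): overlong "/"
```

Decoding the whole sequence before validating it mirrors the "carry on until the end of the sequence" strategy described in the comments above, instead of rejecting at the first suspicious byte.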

  /**
   * (PHP 5 &gt;= 5.2.0, PECL json &gt;= 1.2.0)<br/>
   * Decodes a JSON string
   *
   * @link http://php.net/manual/en/function.json-decode.php
   *
   * @param string $json    <p>
   *                        The <i>json</i> string being decoded.
   *                        </p>
   *                        <p>
   *                        This function only works with UTF-8 encoded strings.
   *                        </p>
   *                        <p>PHP implements a superset of
   *                        JSON - it will also encode and decode scalar types and <b>NULL</b>. The JSON standard
   *                        only supports these values when they are nested inside an array or an object.
   *                        </p>
   * @param bool   $assoc   [optional] <p>
   *                        When <b>TRUE</b>, returned objects will be converted into
   *                        associative arrays.
   *                        </p>
   * @param int    $depth   [optional] <p>
   *                        User specified recursion depth.
   *                        </p>
   * @param int    $options [optional] <p>
   *                        Bitmask of JSON decode options. Currently only
   *                        <b>JSON_BIGINT_AS_STRING</b>
   *                        is supported (default is to cast large integers as floats)
   *                        </p>
   *
   * @return mixed the value encoded in <i>json</i> in appropriate
   * PHP type. Values true, false and
   * null (case-insensitive) are returned as <b>TRUE</b>, <b>FALSE</b>
   * and <b>NULL</b> respectively. <b>NULL</b> is returned if the
   * <i>json</i> cannot be decoded or if the encoded
   * data is deeper than the recursion limit.
   */
  public static function json_decode($json, $assoc = false, $depth = 512, $options = 0)
  {
    $json = (string)self::filter($json);

    if (Bootup::is_php('5.4') === true) {
      $json = json_decode($json, $assoc, $depth, $options);
    } else {
      $json = json_decode($json, $assoc, $depth);
    }

    return $json;
  }
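The `$options` parameter above is only forwarded on PHP 5.4+. A short sketch of what `JSON_BIGINT_AS_STRING` changes, using the built-in `json_decode()` directly (the `UTF8::filter()` pre-pass is left out here):

```php
<?php
// Plain json_decode() calls; the UTF8::filter() pre-pass is not shown.
$json = '{"id": 12345678901234567890}';

// Default: an integer that overflows PHP_INT_MAX is decoded as a float.
$asFloat = json_decode($json, true);

// JSON_BIGINT_AS_STRING keeps the exact digits as a string instead.
$asString = json_decode($json, true, 512, JSON_BIGINT_AS_STRING);

var_dump(is_float($asFloat['id']));   // bool(true)
var_dump($asString['id']);            // string(20) "12345678901234567890"
var_dump(json_decode('{"a":', true)); // NULL: invalid JSON
```

Because `NULL` is returned both for invalid input and for a literal JSON `null`, callers that need to tell them apart have to check `json_last_error()` afterwards.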

  /**
   * (PHP 5 &gt;= 5.2.0, PECL json &gt;= 1.2.0)<br/>
   * Returns the JSON representation of a value.
   *
   * @link http://php.net/manual/en/function.json-encode.php
   *
   * @param mixed $value   <p>
   *                       The <i>value</i> being encoded. Can be any type except
   *                       a resource.
   *                       </p>
   *                       <p>
   *                       All string data must be UTF-8 encoded.
   *                       </p>
   *                       <p>PHP implements a superset of
   *                       JSON - it will also encode and decode scalar types and <b>NULL</b>. The JSON standard
   *                       only supports these values when they are nested inside an array or an object.
   *                       </p>
   * @param int   $options [optional] <p>
   *                       Bitmask consisting of <b>JSON_HEX_QUOT</b>,
   *                       <b>JSON_HEX_TAG</b>,
   *                       <b>JSON_HEX_AMP</b>,
   *                       <b>JSON_HEX_APOS</b>,
   *                       <b>JSON_NUMERIC_CHECK</b>,
   *                       <b>JSON_PRETTY_PRINT</b>,
   *                       <b>JSON_UNESCAPED_SLASHES</b>,
   *                       <b>JSON_FORCE_OBJECT</b>,
   *                       <b>JSON_UNESCAPED_UNICODE</b>. The behaviour of these
   *                       constants is described on
   *                       the JSON constants page.
   *                       </p>
   * @param int   $depth   [optional] <p>
   *                       Set the maximum depth. Must be greater than zero.
   *                       </p>
   *
   * @return string a JSON encoded string on success or <b>FALSE</b> on failure.
   */
  public static function json_encode($value, $options = 0, $depth = 512)
  {
    $value = self::filter($value);

    if (Bootup::is_php('5.5') === true) {
      $json = json_encode($value, $options, $depth);
    } else {
      $json = json_encode($value, $options);
    }

    return $json;
  }
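Two of the listed flags matter most for UTF-8 text. A small sketch with the built-in `json_encode()` (assuming this file is saved as UTF-8) shows the default escaping next to `JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES`:

```php
<?php
$data = array('city' => 'Düsseldorf', 'path' => 'a/b');

// Default: non-ASCII characters become \uXXXX escapes and "/" is escaped.
$default = json_encode($data);

// Keep UTF-8 characters and slashes as they are.
$readable = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

echo $default, "\n";  // {"city":"D\u00fcsseldorf","path":"a\/b"}
echo $readable, "\n"; // {"city":"Düsseldorf","path":"a/b"}

// Invalid UTF-8 makes the whole call fail.
var_dump(json_encode("\xC3")); // bool(false)
```

The last call is why the method above runs `self::filter()` on the value first: a single broken byte sequence would otherwise turn the result into `FALSE`.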

  /**
   * Makes a string's first character lowercase.
   *
   * @param string $str <p>The input string</p>
   *
   * @return string <p>The resulting string</p>
   */
  public static function lcfirst($str)
  {
    return self::strtolower((string)self::substr($str, 0, 1)) . self::substr($str, 1);
  }

  /**
   * Strip whitespace or other characters from the beginning of a UTF-8 string.
   *
   * @param string $str   <p>The string to be trimmed</p>
   * @param string $chars <p>Optional characters to be stripped</p>
   *
   * @return string <p>The string with unwanted characters stripped from the left.</p>
   */
  public static function ltrim($str = '', $chars = INF)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
    if ($chars === INF || !$chars) {
      return preg_replace('/^[\pZ\pC]+/u', '', $str);
    }

    return preg_replace('/^' . self::rxClass($chars) . '+/u', '', $str);
  }
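The default branch of `ltrim()` relies on PCRE Unicode properties: `\pZ` covers separators (spaces of every kind) and `\pC` covers "other" characters such as controls and format characters like the BOM. A minimal standalone sketch of just that branch, with a hypothetical function name and PHP 7+ `\u{...}` escapes:

```php
<?php
// Hypothetical helper mirroring the $chars === INF branch of UTF8::ltrim().
function ltrim_unicode_sketch(string $str): string
{
  // preg_replace() returns NULL on invalid UTF-8; the cast turns that into ''.
  return (string)preg_replace('/^[\pZ\pC]+/u', '', $str);
}

// U+FEFF (BOM, \pC), U+00A0 and U+3000 (\pZ) and a tab (\pC) are all removed.
$input = "\u{FEFF}\u{00A0}\u{3000}\tHello";
var_dump(ltrim_unicode_sketch($input)); // string(5) "Hello"

// The built-in ltrim() only strips ASCII whitespace, so it changes nothing here.
var_dump(ltrim($input) === $input);     // bool(true)
```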

  /**
   * Returns the UTF-8 character with the maximum code point in the given data.
   *
   * @param mixed $arg <p>A UTF-8 encoded string or an array of such strings.</p>
   *
   * @return string <p>The character with the highest code point.</p>
   */
  public static function max($arg)
  {
    if (is_array($arg) === true) {
      $arg = implode('', $arg);
    }

    return self::chr(max(self::codepoints($arg)));
  }

  /**
   * Calculates and returns the maximum number of bytes taken by any
   * UTF-8 encoded character in the given string.
   *
   * @param string $str <p>The original Unicode string.</p>
   *
   * @return int <p>The maximum byte length of any character in the string, or 0 if there are none.</p>
   */
  public static function max_chr_width($str)
  {
    $bytes = self::chr_size_list($str);
    if (count($bytes) > 0) {
      return (int)max($bytes);
    } else {
      return 0;
    }
  }
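`max_chr_width()` returns 1 for pure ASCII, 2 for Latin-1 letters such as "ü", 3 for most BMP characters and 4 for characters outside the BMP. A minimal sketch of the same idea without `chr_size_list()`, splitting with `/./us` and measuring each match in bytes (hypothetical function name):

```php
<?php
// Hypothetical helper: longest UTF-8 character in bytes.
// preg_match_all() returns 0 for '' and false for invalid UTF-8; both give 0 here.
function max_chr_width_sketch(string $str): int
{
  if (!preg_match_all('/./us', $str, $matches)) {
    return 0;
  }
  return max(array_map('strlen', $matches[0]));
}

var_dump(max_chr_width_sketch('abc'));        // int(1)
var_dump(max_chr_width_sketch('Düsseldorf')); // int(2)
var_dump(max_chr_width_sketch("a\u{1F600}")); // int(4)
```

A result of 1 is a cheap way to tell that a string is plain ASCII.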

  /**
   * Checks whether mbstring is available on the server.
   *
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
   */
  public static function mbstring_loaded()
  {
    $return = extension_loaded('mbstring') ? true : false;

    if ($return === true) {
      \mb_internal_encoding('UTF-8');
    }

    return $return;
  }

  /**
   * Returns the UTF-8 character with the minimum code point in the given data.
   *
   * @param mixed $arg <p>A UTF-8 encoded string or an array of such strings.</p>
   *
   * @return string <p>The character with the lowest code point.</p>
   */
  public static function min($arg)
  {
    if (is_array($arg) === true) {
      $arg = implode('', $arg);
    }

    return self::chr(min(self::codepoints($arg)));
  }
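`max()` and `min()` above decode every character to its code point before comparing. For valid UTF-8 there is a shortcut: comparing the raw bytes with `strcmp()` gives the same order as comparing code points (a property of the UTF-8 design described in RFC 3629). A minimal sketch using that property; the function names are hypothetical, and unlike the methods above they return `null` for empty input:

```php
<?php
// Hypothetical helpers mirroring UTF8::max()/UTF8::min().
// For valid UTF-8, byte order (strcmp) equals code point order.
function utf8_chars_sorted(string $str): array
{
  $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
  if ($chars === false) {
    return array(); // invalid UTF-8
  }
  usort($chars, 'strcmp');
  return $chars;
}

function utf8_max_sketch($arg): ?string
{
  $chars = utf8_chars_sorted(is_array($arg) ? implode('', $arg) : (string)$arg);
  return $chars ? end($chars) : null;
}

function utf8_min_sketch($arg): ?string
{
  $chars = utf8_chars_sorted(is_array($arg) ? implode('', $arg) : (string)$arg);
  return $chars ? $chars[0] : null;
}

var_dump(utf8_max_sketch(array('zü', 'a€'))); // "€" (U+20AC beats U+00FC)
var_dump(utf8_min_sketch('zü a'));            // " " (U+0020)
```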

  /**
   * alias for "UTF8::normalize_encoding()"
   *
   * @see UTF8::normalize_encoding()
   *
   * @param string $encoding
   * @param mixed  $fallback
   *
   * @return string
   *
   * @deprecated
   */
  public static function normalizeEncoding($encoding, $fallback = false)
  {
    return self::normalize_encoding($encoding, $fallback);
  }

  /**
   * Normalize the encoding-"name" input.
   *
   * @param string $encoding <p>e.g.: ISO, UTF8, WINDOWS-1251 etc.</p>
   * @param mixed  $fallback <p>e.g.: UTF-8</p>
   *
   * @return string <p>e.g.: ISO-8859-1, UTF-8, WINDOWS-1251 etc.</p>
   */
  public static function normalize_encoding($encoding, $fallback = false)
  {
    static $STATIC_NORMALIZE_ENCODING_CACHE = array();

    if (!$encoding) {
      return $fallback;
    }

    if ('UTF-8' === $encoding) {
      return $encoding;
    }

    if (in_array($encoding, self::$ICONV_ENCODING, true)) {
      return $encoding;
    }

    if (isset($STATIC_NORMALIZE_ENCODING_CACHE[$encoding])) {
      return $STATIC_NORMALIZE_ENCODING_CACHE[$encoding];
    }

    $encodingOrig = $encoding;
    $encoding = strtoupper($encoding);
    $encodingUpperHelper = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);

    $equivalences = array(
        'ISO88591'    => 'ISO-8859-1',
        'ISO8859'     => 'ISO-8859-1',
        'ISO'         => 'ISO-8859-1',
        'LATIN1'      => 'ISO-8859-1',
        'LATIN'       => 'ISO-8859-1',
        'WIN1252'     => 'ISO-8859-1',
        'WINDOWS1252' => 'ISO-8859-1',
        'UTF16'       => 'UTF-16',
        'UTF32'       => 'UTF-32',
        'UTF8'        => 'UTF-8',
        'UTF'         => 'UTF-8',
        'UTF7'        => 'UTF-7',
        '8BIT'        => 'CP850',
        'BINARY'      => 'CP850',
    );

    if (!empty($equivalences[$encodingUpperHelper])) {
      $encoding = $equivalences[$encodingUpperHelper];
    }

    $STATIC_NORMALIZE_ENCODING_CACHE[$encodingOrig] = $encoding;

    return $encoding;
  }
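The core of `normalize_encoding()` is a key-normalization step: upper-case the name, drop every character that is not a letter, digit or whitespace, then look the result up in an alias table. A stripped-down sketch of just that lookup (hypothetical function name, no caching, no `$ICONV_ENCODING` short-circuit, and only a few aliases copied from the table above):

```php
<?php
// Hypothetical helper: alias lookup only, using a subset of the table above.
function normalize_encoding_sketch(string $encoding, string $fallback = 'UTF-8'): string
{
  if ($encoding === '') {
    return $fallback;
  }

  $aliases = array(
      'ISO88591' => 'ISO-8859-1',
      'LATIN1'   => 'ISO-8859-1',
      'UTF8'     => 'UTF-8',
      'UTF16'    => 'UTF-16',
  );

  $upper = strtoupper($encoding);
  $key = preg_replace('/[^A-Z0-9\s]/', '', $upper); // "ISO-8859-1" -> "ISO88591"

  return isset($aliases[$key]) ? $aliases[$key] : $upper;
}

var_dump(normalize_encoding_sketch('utf8'));         // "UTF-8"
var_dump(normalize_encoding_sketch('Latin-1'));      // "ISO-8859-1"
var_dump(normalize_encoding_sketch('windows-1251')); // "WINDOWS-1251" (no alias, upper-cased)
```

Stripping punctuation before the lookup is what lets "utf-8", "UTF8" and "utf_8" all land on the same key.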

  /**
   * Normalize some MS Word special characters.
   *
   * @param string $str <p>The string to be normalized.</p>
   *
   * @return string
   */
  public static function normalize_msword($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    static $UTF8_MSWORD_KEYS_CACHE = null;
    static $UTF8_MSWORD_VALUES_CACHE = null;

    if ($UTF8_MSWORD_KEYS_CACHE === null) {
      $UTF8_MSWORD_KEYS_CACHE = array_keys(self::$UTF8_MSWORD);
      $UTF8_MSWORD_VALUES_CACHE = array_values(self::$UTF8_MSWORD);
    }

    return str_replace($UTF8_MSWORD_KEYS_CACHE, $UTF8_MSWORD_VALUES_CACHE, $str);
  }

  /**
   * Normalize the whitespace.
   *
   * @param string $str                     <p>The string to be normalized.</p>
   * @param bool   $keepNonBreakingSpace    [optional] <p>Set to true to keep non-breaking spaces.</p>
   * @param bool   $keepBidiUnicodeControls [optional] <p>Set to true to keep the bidirectional text
   *                                        control characters (non-printable on the web).</p>
   *
   * @return string
   */
  public static function normalize_whitespace($str, $keepNonBreakingSpace = false, $keepBidiUnicodeControls = false)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    static $WHITESPACE_CACHE = array();
    $cacheKey = (int)$keepNonBreakingSpace;

    if (!isset($WHITESPACE_CACHE[$cacheKey])) {

      $WHITESPACE_CACHE[$cacheKey] = self::$WHITESPACE_TABLE;

      if ($keepNonBreakingSpace === true) {
        /** @noinspection OffsetOperationsInspection */
        unset($WHITESPACE_CACHE[$cacheKey]['NO-BREAK SPACE']);
      }

      $WHITESPACE_CACHE[$cacheKey] = array_values($WHITESPACE_CACHE[$cacheKey]);
    }

    if ($keepBidiUnicodeControls === false) {
      static $BIDI_UNICODE_CONTROLS_CACHE = null;

      if ($BIDI_UNICODE_CONTROLS_CACHE === null) {
        $BIDI_UNICODE_CONTROLS_CACHE = array_values(self::$BIDI_UNI_CODE_CONTROLS_TABLE);
      }

      $str = str_replace($BIDI_UNICODE_CONTROLS_CACHE, '', $str);
    }

    return str_replace($WHITESPACE_CACHE[$cacheKey], ' ', $str);
  }
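`normalize_whitespace()` is a table-driven `str_replace()`: every entry of `self::$WHITESPACE_TABLE` becomes an ASCII space, and the `'NO-BREAK SPACE'` entry is dropped from the table when it should be kept. A minimal sketch of that mechanism; the three-entry table is a hypothetical subset, not the library's real table:

```php
<?php
// Hypothetical helper; the real table is UTF8::$WHITESPACE_TABLE.
function normalize_whitespace_sketch(string $str, bool $keepNonBreakingSpace = false): string
{
  $table = array(
      'NO-BREAK SPACE'    => "\u{00A0}",
      'EN SPACE'          => "\u{2002}",
      'IDEOGRAPHIC SPACE' => "\u{3000}",
  );
  if ($keepNonBreakingSpace) {
    unset($table['NO-BREAK SPACE']); // same key the method above removes
  }
  return str_replace(array_values($table), ' ', $str);
}

var_dump(normalize_whitespace_sketch("a\u{00A0}b\u{3000}c")); // string(5) "a b c"
```

Keying the table by Unicode character name, as the original does, makes it easy to switch individual entries off without touching the replacement call.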

  /**
   * Format a number with grouped thousands.
   *
   * @param float  $number
   * @param int    $decimals
   * @param string $dec_point
   * @param string $thousands_sep
   *
   * @return string
   *
   * @deprecated Because this has nothing to do with UTF8. :/
   */
  public static function number_format($number, $decimals = 0, $dec_point = '.', $thousands_sep = ',')
  {
    $thousands_sep = (string)$thousands_sep;
    $dec_point = (string)$dec_point;
    $number = (float)$number;

    // Before PHP 5.4, number_format() only used the first byte of each separator,
    // so multi-byte separators are substituted afterwards. strtr() swaps both
    // placeholders in one pass and never rewrites an already replaced separator.
    if (
        (isset($thousands_sep[1]) || isset($dec_point[1]))
        &&
        Bootup::is_php('5.4') === false
    ) {
      return strtr(
          number_format($number, $decimals, '.', ','),
          array(
              '.' => $dec_point,
              ',' => $thousands_sep,
          )
      );
    }

    return number_format($number, $decimals, $dec_point, $thousands_sep);
  }
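Swapping separators after formatting is easy to get wrong: `str_replace()` with arrays applies the pairs one after another, so a later pair can rewrite what an earlier pair just produced. A short sketch of the pitfall with German-style separators, compared with `strtr()` and with passing the separators directly:

```php
<?php
$plain = number_format(1234567.891, 2, '.', ','); // "1,234,567.89"

// Sequential replacement: "." -> "," first, then every "," (including the new one) -> ".".
$bad = str_replace(array('.', ','), array(',', '.'), $plain);

// strtr() replaces each original occurrence exactly once.
$good = strtr($plain, array('.' => ',', ',' => '.'));

var_dump($bad);  // string(12) "1.234.567.89"
var_dump($good); // string(12) "1.234.567,89"

// On current PHP the separators can simply be passed in.
var_dump(number_format(1234567.891, 2, ',', '.')); // string(12) "1.234.567,89"
```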

  /**
   * Calculates the Unicode code point of the given UTF-8 encoded character.
   *
   * INFO: opposite to UTF8::chr()
   *
   * @param string      $chr      <p>The character whose code point should be calculated.</p>
   * @param string|null $encoding [optional] <p>Default is UTF-8</p>
   *
   * @return int <p>
   *             Unicode code point of the given character,<br />
   *             0 on invalid UTF-8 byte sequence.
   *             </p>
   */
  public static function ord($chr, $encoding = 'UTF-8')
  {

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');

      // check again, if it's still not UTF-8
      /** @noinspection NotOptimalIfConditionsInspection */
      if ($encoding !== 'UTF-8') {
        $chr = (string)\mb_convert_encoding($chr, 'UTF-8', $encoding);
      }
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['intlChar'] === true) {
      $tmpReturn = \IntlChar::ord($chr);
      if ($tmpReturn) {
        return $tmpReturn;
      }
    }

    // use static cache, if there is no support for "\IntlChar"
    static $CHAR_CACHE = array();
    if (isset($CHAR_CACHE[$chr]) === true) {
      return $CHAR_CACHE[$chr];
    }

    $chr_orig = $chr;
    /** @noinspection CallableParameterUseCaseInTypeContextInspection */
    $chr = unpack('C*', substr($chr, 0, 4));
    $code = $chr ? $chr[1] : 0;

    if (0xF0 <= $code && isset($chr[4])) {
      return $CHAR_CACHE[$chr_orig] = (($code - 0xF0) << 18) + (($chr[2] - 0x80) << 12) + (($chr[3] - 0x80) << 6) + $chr[4] - 0x80;
    }

    if (0xE0 <= $code && isset($chr[3])) {
      return $CHAR_CACHE[$chr_orig] = (($code - 0xE0) << 12) + (($chr[2] - 0x80) << 6) + $chr[3] - 0x80;
    }

    if (0xC0 <= $code && isset($chr[2])) {
      return $CHAR_CACHE[$chr_orig] = (($code - 0xC0) << 6) + $chr[2] - 0x80;
    }

    return $CHAR_CACHE[$chr_orig] = $code;
  }
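The manual fallback in `ord()` subtracts the UTF-8 marker bits from each byte and shifts the payload into place: 18/12/6/0 bits for a 4-byte character, 12/6/0 for 3 bytes, 6/0 for 2 bytes. The same arithmetic as a standalone sketch (hypothetical function name; like the original, it does not validate the continuation bytes):

```php
<?php
// Hypothetical helper reproducing the fallback arithmetic of UTF8::ord().
function utf8_ord_sketch(string $chr): int
{
  $b = unpack('C*', substr($chr, 0, 4)); // 1-based array of byte values
  if (!$b) {
    return 0;
  }
  $code = $b[1];
  if ($code >= 0xF0 && isset($b[4])) {
    return (($code - 0xF0) << 18) + (($b[2] - 0x80) << 12) + (($b[3] - 0x80) << 6) + ($b[4] - 0x80);
  }
  if ($code >= 0xE0 && isset($b[3])) {
    return (($code - 0xE0) << 12) + (($b[2] - 0x80) << 6) + ($b[3] - 0x80);
  }
  if ($code >= 0xC0 && isset($b[2])) {
    return (($code - 0xC0) << 6) + ($b[2] - 0x80);
  }
  return $code; // ASCII (or a lone byte)
}

var_dump(utf8_ord_sketch('ü')); // int(252):  (0x03 << 6) + 0x3C
var_dump(utf8_ord_sketch('€')); // int(8364): (0x02 << 12) + (0x02 << 6) + 0x2C
```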

  /**
   * Parses the string into an array (stored in the second parameter).
   *
   * WARNING: Unlike "parse_str()" called without its second parameter, this method never
   *          creates or overwrites variables in the current scope; the result only goes into $result.
   *
   * @link http://php.net/manual/en/function.parse-str.php
   *
   * @param string  $str       <p>The input string.</p>
   * @param array   $result    <p>The result will be returned into this reference parameter.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return bool <p>Will return <strong>false</strong> if PHP can't parse the string or the $result is empty.</p>
   */
  public static function parse_str($str, &$result, $cleanUtf8 = false)
  {
    if ($cleanUtf8 === true) {
      $str = self::clean($str);
    }

    $return = \mb_parse_str($str, $result);
    if ($return === false || empty($result)) {
      return false;
    }

    return true;
  }

  /**
   * Checks whether the /u modifier, which enables Unicode support in PCRE, is available.
   *
   * @return bool <p><strong>true</strong> if support is available, <strong>false</strong> otherwise.</p>
   */
  public static function pcre_utf8_support()
  {
    /** @noinspection PhpUsageOfSilenceOperatorInspection */
    return (bool)@preg_match('//u', '');
  }

  /**
   * Create an array containing a range of UTF-8 characters.
   *
   * @param mixed $var1 <p>A numeric or hexadecimal code point, or a UTF-8 character to start from.</p>
   * @param mixed $var2 <p>A numeric or hexadecimal code point, or a UTF-8 character to end at.</p>
   *
   * @return array
   */
  public static function range($var1, $var2)
  {
    if (!$var1 || !$var2) {
      return array();
    }

    if (ctype_digit((string)$var1)) {
      $start = (int)$var1;
    } elseif (ctype_xdigit($var1)) {
      $start = (int)self::hex_to_int($var1);
    } else {
      $start = self::ord($var1);
    }

    if (!$start) {
      return array();
    }

    if (ctype_digit((string)$var2)) {
      $end = (int)$var2;
    } elseif (ctype_xdigit($var2)) {
      $end = (int)self::hex_to_int($var2);
    } else {
      $end = self::ord($var2);
    }

    if (!$end) {
      return array();
    }

    return array_map(
        array(
            '\\voku\\helper\\UTF8',
            'chr',
        ),
        range($start, $end)
    );
  }
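Once both endpoints are integers, `range()` is just PHP's `range()` followed by a code-point-to-UTF-8 encoder (`UTF8::chr()` above). A self-contained sketch with a small hypothetical encoder standing in for `UTF8::chr()`; unlike a full implementation it does not reject surrogates or values above U+10FFFF:

```php
<?php
// Hypothetical stand-in for UTF8::chr(): encode one code point as UTF-8.
function utf8_chr_sketch(int $cp): string
{
  if ($cp < 0x80) {
    return chr($cp);
  }
  if ($cp < 0x800) {
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
  }
  if ($cp < 0x10000) {
    return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
  }
  return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
      . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: integer endpoints only.
function utf8_range_sketch(int $start, int $end): array
{
  return array_map('utf8_chr_sketch', range($start, $end));
}

print_r(utf8_range_sketch(0x3B1, 0x3B3)); // α, β, γ
```

Because PHP's `range()` also counts downwards, a start above the end yields the characters in descending order.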

  /**
   * Decode HTML entities and URL-encoded sequences (repeatedly, by default) and fix
   * URL-encoded Windows-1252 characters.
   *
   * e.g.:
   * 'test+test'                     => 'test+test'
   * 'D&#252;sseldorf'               => 'Düsseldorf'
   * 'D%FCsseldorf'                  => 'Düsseldorf'
   * 'D&#xFC;sseldorf'               => 'Düsseldorf'
   * 'D%26%23xFC%3Bsseldorf'         => 'Düsseldorf'
   * 'Düsseldorf'                   => 'Düsseldorf'
   * 'D%C3%BCsseldorf'               => 'Düsseldorf'
   * 'D%C3%83%C2%BCsseldorf'         => 'Düsseldorf'
   * 'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'
   *
   * @param string $str          <p>The input string.</p>
   * @param bool   $multi_decode <p>Keep decoding until the string no longer changes.</p>
   *
   * @return string
   */
  public static function rawurldecode($str, $multi_decode = true)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $pattern = '/%u([0-9a-f]{3,4})/i';
    if (preg_match($pattern, $str)) {
      $str = preg_replace($pattern, '&#x\\1;', rawurldecode($str));
    }

    $flags = Bootup::is_php('5.4') === true ? ENT_QUOTES | ENT_HTML5 : ENT_QUOTES;

    do {
      $str_compare = $str;

      $str = self::fix_simple_utf8(
          rawurldecode(
              self::html_entity_decode(
                  self::to_utf8($str),
                  $flags
              )
          )
      );

    } while ($multi_decode === true && $str_compare !== $str);

    return (string)$str;
  }
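The `do ... while` loop above is a fixed-point iteration: decode, compare with the previous round, and stop once nothing changes. That is how double-encoded input such as `D%26%23xFC%3Bsseldorf` (URL-encoded HTML entity) ends up as plain text. A reduced sketch with only the two built-in decoders; the `to_utf8()` and `fix_simple_utf8()` repair steps are deliberately left out, so mojibake inputs from the list above are not handled:

```php
<?php
// Hypothetical helper: decode until the string stops changing.
function multi_decode_sketch(string $str): string
{
  do {
    $before = $str;
    $str = rawurldecode(html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
  } while ($str !== $before);

  return $str;
}

// Round 1: "%26%23xFC%3B" -> "&#xFC;"; round 2: "&#xFC;" -> "ü"; round 3: no change.
var_dump(multi_decode_sketch('D%26%23xFC%3Bsseldorf')); // "Düsseldorf"
```

`rawurldecode()` (unlike `urldecode()`) leaves `+` alone, which is why `test+test` survives unchanged.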

  /**
   * alias for "UTF8::remove_bom()"
   *
   * @see UTF8::remove_bom()
   *
   * @param string $str
   *
   * @return string
   *
   * @deprecated
   */
  public static function removeBOM($str)
  {
    return self::remove_bom($str);
  }

  /**
   * Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string <p>String without UTF-BOM</p>
   */
  public static function remove_bom($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    foreach (self::$BOM as $bomString => $bomByteLength) {
      if (0 === strpos($str, $bomString)) {
        $str = substr($str, $bomByteLength);
      }
    }

    return $str;
  }
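`remove_bom()` walks the `self::$BOM` table and cuts the matching prefix. The contents and order of that table are not shown here, so the sketch below uses its own explicit table (an assumption) and strips at most one BOM. The UTF-32 entries come first on purpose: the UTF-32 LE BOM `FF FE 00 00` starts with the UTF-16 LE BOM `FF FE`, so checking UTF-16 first would leave two stray NUL bytes behind.

```php
<?php
// Hypothetical helper with an explicit, longest-first BOM table.
function remove_bom_sketch(string $str): string
{
  $boms = array(
      "\x00\x00\xFE\xFF", // UTF-32 BE
      "\xFF\xFE\x00\x00", // UTF-32 LE
      "\xEF\xBB\xBF",     // UTF-8
      "\xFE\xFF",         // UTF-16 BE
      "\xFF\xFE",         // UTF-16 LE
  );
  foreach ($boms as $bom) {
    if (strncmp($str, $bom, strlen($bom)) === 0) {
      // (string) guards against substr() returning false on PHP < 7.
      return (string)substr($str, strlen($bom));
    }
  }
  return $str;
}

var_dump(remove_bom_sketch("\xEF\xBB\xBFabc")); // string(3) "abc"
```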

  /**
   * Collapses consecutive repetitions of a string into a single occurrence.
   *
   * @param string          $str  <p>The base string.</p>
   * @param string|string[] $what <p>The string (or strings) whose consecutive repetitions should be collapsed.</p>
   *
   * @return string <p>The result string with removed duplicates.</p>
   */
  public static function remove_duplicates($str, $what = ' ')
  {
    if (is_string($what) === true) {
      $what = array($what);
    }

    if (is_array($what) === true) {
      /** @noinspection ForeachSourceInspection */
      foreach ($what as $item) {
        $str = preg_replace('/(' . preg_quote($item, '/') . ')+/', $item, $str);
      }
    }

    return $str;
  }
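The regex `(item)+` matches a run of back-to-back copies, so only consecutive repetitions are collapsed. One subtlety: the method above passes `$item` itself as the replacement string, where `$0`-style sequences are read as back-references (for `$item = '$0'` the whole match is written back unchanged). A hedged single-needle variant that returns the needle from a callback, so it is always inserted literally (hypothetical function name):

```php
<?php
// Hypothetical helper: collapse runs of $what, treating $what literally
// both in the pattern (preg_quote) and in the replacement (callback).
function remove_duplicates_sketch(string $str, string $what = ' '): string
{
  if ($what === '') {
    return $str;
  }
  $pattern = '/(?:' . preg_quote($what, '/') . ')+/';
  return (string)preg_replace_callback(
      $pattern,
      function () use ($what) {
        return $what;
      },
      $str
  );
}

var_dump(remove_duplicates_sketch('a    b  c'));     // string(5) "a b c"
var_dump(remove_duplicates_sketch('a$0$0b', '$0')); // string(4) "a$0b"
```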

  /**
   * Remove invisible characters from a string.
   *
   * e.g.: This prevents sandwiching null characters between ASCII characters, like Java\0script.
   *
   * Copied from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php
   *
   * @param string $str
   * @param bool   $url_encoded <p>Also remove URL-encoded control characters (%00-%1f, except %09, %0a and %0d).</p>
   * @param string $replacement <p>The string inserted in place of each removed match.</p>
   *
   * @return string
   */
  public static function remove_invisible_characters($str, $url_encoded = true, $replacement = '')
  {
    // init
    $non_displayables = array();

    // every control character except newline (dec 10),
    // carriage return (dec 13) and horizontal tab (dec 09)
    if ($url_encoded) {
      $non_displayables[] = '/%0[0-8bcef]/'; // url encoded 00-08, 11, 12, 14, 15
      $non_displayables[] = '/%1[0-9a-f]/'; // url encoded 16-31
    }

    $non_displayables[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S'; // 00-08, 11, 12, 14-31, 127
3840
3841
    do {
3842
      $str = preg_replace($non_displayables, $replacement, $str, -1, $count);
3843
    } while ($count !== 0);
3844
3845
    return $str;
3846
  }
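
  /*
  A standalone sketch of the strip-until-stable loop above, so the patterns can be checked in
  isolation. The function name strip_control_chars() is hypothetical and not part of this class.
  It assumes PHP 7+ with the bundled PCRE extension.

```php
<?php
// Hypothetical standalone version of UTF8::remove_invisible_characters().
function strip_control_chars(string $str, bool $urlEncoded = true, string $replacement = ''): string
{
    $patterns = [];
    if ($urlEncoded) {
        $patterns[] = '/%0[0-8bcef]/i'; // %00-%08, %0B, %0C, %0E, %0F
        $patterns[] = '/%1[0-9a-f]/i';  // %10-%1F
    }
    // raw control chars except \t (09), \n (0A) and \r (0D), plus DEL (7F)
    $patterns[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S';

    // repeat until nothing matches: removing "%00" from "%%0000" leaves "%00"
    do {
        $str = preg_replace($patterns, $replacement, $str, -1, $count);
    } while ($count !== 0);

    return $str;
}
```

  The same caveat applies to the sketch and to the method above: a $replacement that itself
  matches one of the patterns would make the loop run forever.
  */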

  /**
   * Replace the diamond question mark (�) and invalid UTF-8 byte sequences with the replacement.
   *
   * @param string $str                <p>The input string.</p>
   * @param string $replacementChar    <p>The replacement character.</p>
   * @param bool   $processInvalidUtf8 <p>Also replace invalid UTF-8 byte sequences.</p>
   *
   * @return string
   */
  public static function replace_diamond_question_mark($str, $replacementChar = '', $processInvalidUtf8 = true)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($processInvalidUtf8 === true) {
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (self::$SUPPORT['mbstring'] === false) {
        trigger_error('UTF8::replace_diamond_question_mark() without mbstring cannot handle all chars correctly', E_USER_WARNING);
      }

      // "\mb_substitute_character()" only accepts "none", "long", "entity" or a single
      // code point, so invalid sequences are turned into U+FFFD first and then
      // replaced with "$replacementChar" by the "str_replace()" below
      $save = \mb_substitute_character();
      \mb_substitute_character(0xFFFD);
      /** @noinspection CallableParameterUseCaseInTypeContextInspection */
      $str = \mb_convert_encoding($str, 'UTF-8', 'UTF-8');
      \mb_substitute_character($save);
    }

    return str_replace(
        array(
            "\xEF\xBF\xBD",
            '�',
        ),
        array(
            $replacementChar,
            $replacementChar,
        ),
        $str
    );
  }
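
  /*
  The method above relies on mbstring. The sketch below shows the same idea without that
  dependency. It swaps in a different technique: the hypothetical replace_invalid_utf8()
  validates byte sequences with a PCRE pattern built from the RFC 3629 ranges. Well-formed
  sequences are kept, every other byte is replaced, and U+FFFD is replaced as well. Unlike
  "\mb_convert_encoding()", it replaces each stray byte on its own, so a truncated multi-byte
  sequence yields one replacement per byte.

```php
<?php
// Hypothetical helper: replace every byte that is not part of a well-formed
// UTF-8 sequence, and every U+FFFD, with $replacement (PCRE only, no mbstring).
function replace_invalid_utf8(string $str, string $replacement = ''): string
{
    // well-formed UTF-8 sequences (no overlongs, no surrogates, max U+10FFFF)
    $valid = implode('|', [
        '[\x00-\x7F]',
        '[\xC2-\xDF][\x80-\xBF]',
        '\xE0[\xA0-\xBF][\x80-\xBF]',
        '[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}',
        '\xED[\x80-\x9F][\x80-\xBF]',
        '\xF0[\x90-\xBF][\x80-\xBF]{2}',
        '[\xF1-\xF3][\x80-\xBF]{3}',
        '\xF4[\x80-\x8F][\x80-\xBF]{2}',
    ]);

    // group 1 = one valid sequence; otherwise "." consumes a single stray byte
    $str = preg_replace_callback(
        '/(' . $valid . ')|./s',
        function (array $m) use ($replacement): string {
            return isset($m[1]) && $m[1] !== '' ? $m[1] : $replacement;
        },
        $str
    );

    // treat an existing U+FFFD like an invalid sequence, as the method above does
    return str_replace("\xEF\xBF\xBD", $replacement, $str);
}
```

  The callback runs once per character, so this is meant for illustration, not for large inputs.
  */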

  /**
   * Strip whitespace or other characters from the end of a UTF-8 string.
   *
   * @param string $str   <p>The string to be trimmed.</p>
   * @param string $chars <p>
   *                      Optional characters to be stripped. If omitted, trailing separators
   *                      and control/invisible characters (\pZ and \pC) are stripped.
   *                      </p>
   *
   * @return string <p>The string with unwanted characters stripped from the right.</p>
   */
  public static function rtrim($str = '', $chars = INF)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
    if ($chars === INF || !$chars) {
      return preg_replace('/[\pZ\pC]+$/u', '', $str);
    }

    // "D": "$" must only match at the very end, not before a final "\n" (like native rtrim())
    return preg_replace('/' . self::rxClass($chars) . '+$/uD', '', $str);
  }
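
  /*
  A minimal, standalone sketch of both rtrim() branches. The function name utf8_rtrim() is
  hypothetical. Instead of rxClass() it simply quotes every code point of $chars into one
  character class, so ranges like "a-z" are not supported. It assumes PHP >= 7.3, where
  preg_quote() also escapes "-".

```php
<?php
// Hypothetical standalone version of UTF8::rtrim().
function utf8_rtrim(string $str, string $chars = ''): string
{
    if ($chars === '') {
        // \pZ: separators (incl. NO-BREAK SPACE), \pC: control, format, unassigned
        return preg_replace('/[\pZ\pC]+$/u', '', $str);
    }

    // quote every code point of $chars into one character class
    $class = '';
    foreach (preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $class .= preg_quote($char, '/');
    }

    // "D" keeps "$" from also matching right before a trailing "\n"
    return preg_replace('/[' . $class . ']+$/uD', '', $str);
}
```
  */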

  /**
   * Build a regex fragment that matches any single character of "$s".
   *
   * Characters made of one code point are collected in a character class; grapheme clusters
   * made of several code points are added as alternatives. "$class" is prepended to the
   * character class as-is.
   *
   * @param string $s
   * @param string $class <p>Raw character-class content to include, e.g. '\pL'.</p>
   *
   * @return string
   */
  private static function rxClass($s, $class = '')
  {
    static $RX_CLASSS_CACHE = array();

    $cacheKey = $s . $class;

    if (isset($RX_CLASSS_CACHE[$cacheKey])) {
      return $RX_CLASSS_CACHE[$cacheKey];
    }

    /** @noinspection CallableParameterUseCaseInTypeContextInspection */
    $class = array($class);

    foreach (self::str_split($s) as $char) {
      if ('-' === $char) {
        // a leading "-" is a literal inside a character class
        $class[0] = '-' . $class[0];
      } elseif (!isset($char[2])) {
        // 1-2 bytes: quote it
        $class[0] .= preg_quote($char, '/');
      } elseif (1 === self::strlen($char)) {
        // single multi-byte code point
        $class[0] .= $char;
      } else {
        // grapheme cluster of several code points
        $class[] = $char;
      }
    }

    if ($class[0]) {
      $class[0] = '[' . $class[0] . ']';
    }

    if (1 === count($class)) {
      $return = $class[0];
    } else {
      $return = '(?:' . implode('|', $class) . ')';
    }

    $RX_CLASSS_CACHE[$cacheKey] = $return;

    return $return;
  }

  /**
   * WARNING: Echoes the native UTF-8 support libs, e.g. for debugging.
   */
  public static function showSupport()
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    foreach (self::$SUPPORT as $utf8Support) {
      echo $utf8Support . "\n<br>";
    }
  }

  /**
   * Converts a UTF-8 character to an HTML numbered entity like "&#123;".
   *
   * @param string $char           <p>The Unicode character to be encoded as numbered entity.</p>
   * @param bool   $keepAsciiChars <p>Set to <strong>true</strong> to keep ASCII chars.</p>
   * @param string $encoding       [optional] <p>Default is UTF-8</p>
   *
   * @return string <p>The HTML numbered entity.</p>
   */
  public static function single_chr_html_encode($char, $keepAsciiChars = false, $encoding = 'UTF-8')
  {
    // init
    $char = (string)$char;

    if (!isset($char[0])) {
      return '';
    }

    if (
        $keepAsciiChars === true
        &&
        self::is_ascii($char) === true
    ) {
      return $char;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    return '&#' . self::ord($char, $encoding) . ';';
  }
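
  /*
  To show what the "&#<code point>;" output contains, here is a hypothetical
  chr_to_numeric_entity(). It decodes the UTF-8 bytes by hand instead of calling UTF8::ord().
  It assumes $char holds exactly one well-formed UTF-8 character and does no validation.

```php
<?php
// Hypothetical: encode one well-formed UTF-8 character as "&#<code point>;".
function chr_to_numeric_entity(string $char, bool $keepAsciiChars = false): string
{
    if ($char === '') {
        return '';
    }

    $lead = ord($char[0]);
    if ($lead < 0x80) {            // 0xxxxxxx: ASCII
        return $keepAsciiChars ? $char : '&#' . $lead . ';';
    }

    if ($lead >= 0xF0) {           // 11110xxx + 3 continuation bytes
        $codePoint = $lead & 0x07;
        $continuation = 3;
    } elseif ($lead >= 0xE0) {     // 1110xxxx + 2 continuation bytes
        $codePoint = $lead & 0x0F;
        $continuation = 2;
    } else {                       // 110xxxxx + 1 continuation byte
        $codePoint = $lead & 0x1F;
        $continuation = 1;
    }

    // every continuation byte (10xxxxxx) contributes its low 6 bits
    for ($i = 1; $i <= $continuation; $i++) {
        $codePoint = ($codePoint << 6) | (ord($char[$i]) & 0x3F);
    }

    return '&#' . $codePoint . ';';
}
```

  For example, "€" (bytes E2 82 AC) becomes "&#8364;", i.e. U+20AC.
  */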

  /**
   * Convert a string to an array of Unicode characters.
   *
   * @param string  $str       <p>The string to split into array.</p>
   * @param int     $length    [optional] <p>Max character length of each array element.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string[] <p>An array containing chunks of the string.</p>
   */
  public static function split($str, $length = 1, $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return array();
    }

    // init
    $ret = array();

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['pcre_utf8'] === true) {

      if ($cleanUtf8 === true) {
        $str = self::clean($str);
      }

      preg_match_all('/./us', $str, $retArray);
      if (isset($retArray[0])) {
        $ret = $retArray[0];
      }
      unset($retArray);

    } else {

      // fallback: decode the UTF-8 byte sequences by hand,
      // invalid or truncated sequences are skipped

      $len = strlen($str);

      /** @noinspection ForeachInvariantsInspection */
      for ($i = 0; $i < $len; $i++) {

        if (($str[$i] & "\x80") === "\x00") {

          // 1 byte: 0xxxxxxx
          $ret[] = $str[$i];

        } elseif (
            isset($str[$i + 1])
            &&
            ($str[$i] & "\xE0") === "\xC0"
        ) {

          // 2 bytes: 110xxxxx 10xxxxxx
          if (($str[$i + 1] & "\xC0") === "\x80") {
            $ret[] = $str[$i] . $str[$i + 1];

            $i++;
          }

        } elseif (
            isset($str[$i + 2])
            &&
            ($str[$i] & "\xF0") === "\xE0"
        ) {

          // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
          if (
              ($str[$i + 1] & "\xC0") === "\x80"
              &&
              ($str[$i + 2] & "\xC0") === "\x80"
          ) {
            $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2];

            $i += 2;
          }

        } elseif (
            isset($str[$i + 3])
            &&
            ($str[$i] & "\xF8") === "\xF0"
        ) {

          // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
          if (
              ($str[$i + 1] & "\xC0") === "\x80"
              &&
              ($str[$i + 2] & "\xC0") === "\x80"
              &&
              ($str[$i + 3] & "\xC0") === "\x80"
          ) {
            $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2] . $str[$i + 3];

            $i += 3;
          }

        }
      }
    }

    if ($length > 1) {
      $ret = array_chunk($ret, $length);

      return array_map(
          function ($item) {
            return implode('', $item);
          }, $ret
      );
    }

    /** @noinspection OffsetOperationsInspection */
    if (isset($ret[0]) && $ret[0] === '') {
      return array();
    }

    return $ret;
  }
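
  /*
  The PCRE branch of split() boils down to two steps: match every code point with '/./us',
  then glue them into chunks with array_chunk(). A minimal standalone sketch follows. The name
  utf8_split_codepoints() is hypothetical. On invalid UTF-8 it returns an empty array, because
  preg_match_all() fails in "u" mode.

```php
<?php
// Hypothetical: split a UTF-8 string into code points, then group them into
// chunks of $length code points.
function utf8_split_codepoints(string $str, int $length = 1): array
{
    if ($str === '' || preg_match_all('/./us', $str, $matches) === false) {
        return []; // empty input, or malformed UTF-8 (PCRE error)
    }

    $chars = $matches[0];
    if ($length <= 1) {
        return $chars;
    }

    // implode() with a single array argument joins without a separator
    return array_map('implode', array_chunk($chars, $length));
}
```
  */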

  /**
   * Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return false|string <p>
   *                      The detected string-encoding e.g. UTF-8 or UTF-16BE,<br />
   *                      otherwise it will return false.
   *                      </p>
   */
  public static function str_detect_encoding($str)
  {
    //
    // 1.) check strings that look binary, like UTF-16 / UTF-32
    //

    if (self::is_binary($str) === true) {
      $utf16 = self::is_utf16($str);
      if ($utf16 === 1) {
        return 'UTF-16LE';
      } elseif ($utf16 === 2) {
        return 'UTF-16BE';
      }

      $utf32 = self::is_utf32($str);
      if ($utf32 === 1) {
        return 'UTF-32LE';
      } elseif ($utf32 === 2) {
        return 'UTF-32BE';
      }
    }

    //
    // 2.) simple check for ASCII chars
    //

    if (self::is_ascii($str) === true) {
      return 'ASCII';
    }

    //
    // 3.) simple check for UTF-8 chars
    //

    if (self::is_utf8($str) === true) {
      return 'UTF-8';
    }

    //
    // 4.) check via "\mb_detect_encoding()"
    //
    // INFO: detecting UTF-16, UTF-32, UCS-2 or UCS-4 always fails with "\mb_detect_encoding()"

    $detectOrder = array(
        'ISO-8859-1',
        'ISO-8859-2',
        'ISO-8859-3',
        'ISO-8859-4',
        'ISO-8859-5',
        'ISO-8859-6',
        'ISO-8859-7',
        'ISO-8859-8',
        'ISO-8859-9',
        'ISO-8859-10',
        'ISO-8859-13',
        'ISO-8859-14',
        'ISO-8859-15',
        'ISO-8859-16',
        'WINDOWS-1251',
        'WINDOWS-1252',
        'WINDOWS-1254',
        'ISO-2022-JP',
        'JIS',
        'EUC-JP',
    );

    $encoding = \mb_detect_encoding($str, $detectOrder, true);
    if ($encoding) {
      return $encoding;
    }

    //
    // 5.) check via "iconv()": an encoding is accepted if converting the string
    //     from and to it (dropping invalid chars) leaves it unchanged
    //

    $md5 = md5($str);
    foreach (self::$ICONV_ENCODING as $encodingTmp) {
      # INFO: //IGNORE and //TRANSLIT still throw notice
      /** @noinspection PhpUsageOfSilenceOperatorInspection */
      if (md5(@\iconv($encodingTmp, $encodingTmp . '//IGNORE', $str)) === $md5) {
        return $encodingTmp;
      }
    }

    return false;
  }
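
  /*
  Steps 2.) and 3.) are the cheap checks that settle most inputs. The sketch below reproduces
  them with plain PCRE instead of UTF8::is_ascii() and UTF8::is_utf8(), so it swaps in a
  different technique. The name detect_ascii_or_utf8() is hypothetical. The sketch skips the
  UTF-16/32 step and the mbstring/iconv fallbacks.

```php
<?php
// Hypothetical, reduced version of steps 2.) and 3.) of str_detect_encoding().
function detect_ascii_or_utf8(string $str)
{
    // 2.) no byte >= 0x80 means plain ASCII
    if (preg_match('/[\x80-\xFF]/', $str) === 0) {
        return 'ASCII';
    }

    // 3.) with the "u" modifier preg_match() returns false on malformed UTF-8,
    //     so a successful (empty) match means the string is valid UTF-8
    if (preg_match('//u', $str) === 1) {
        return 'UTF-8';
    }

    return false; // unknown here: the method above would continue with step 4.)
}
```
  */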

  /**
   * Check if the string ends with the given substring.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_ends_with($haystack, $needle)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($needle === self::substr($haystack, -self::strlen($needle))) {
      return true;
    }

    return false;
  }

  /**
   * Check if the string ends with the given substring, case insensitive.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_iends_with($haystack, $needle)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    // "UTF8::substr()" may return false, but "UTF8::strcasecmp()" expects a string
    if (self::strcasecmp((string)self::substr($haystack, -self::strlen($needle)), $needle) === 0) {
      return true;
    }

    return false;
  }
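
  /*
  An alternative to the case-folding comparison above is a single anchored, case-insensitive
  PCRE match. This is a swapped-in technique, and istarts_with()/iends_with() are hypothetical
  names. PCRE's caseless UTF-8 mode uses simple case folding, so, for example, "ß" does not
  match "SS" here. Empty strings never match, mirroring the methods in this class.

```php
<?php
// Hypothetical regex-based variants of UTF8::str_istarts_with() / str_iends_with().
function istarts_with(string $haystack, string $needle): bool
{
    if ($haystack === '' || $needle === '') {
        return false;
    }
    return preg_match('/^' . preg_quote($needle, '/') . '/ui', $haystack) === 1;
}

function iends_with(string $haystack, string $needle): bool
{
    if ($haystack === '' || $needle === '') {
        return false;
    }
    // \z anchors at the very end; "$" would also match before a final "\n"
    return preg_match('/' . preg_quote($needle, '/') . '\z/ui', $haystack) === 1;
}
```
  */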

  /**
   * Case-insensitive and UTF-8 safe version of <function>str_replace</function>.
   *
   * @link  http://php.net/manual/en/function.str-ireplace.php
   *
   * @param mixed $search  <p>
   *                       The value being searched for. An array may be used to designate
   *                       multiple needles; every replacement with a search array is
   *                       performed on the result of the previous replacement.
   *                       </p>
   * @param mixed $replace <p>
   *                       The replacement value(s). As with <function>str_ireplace</function>,
   *                       "$1" or "\1" are inserted literally, not as back-references.
   *                       </p>
   * @param mixed $subject <p>
   *                       If subject is an array, then the search and
   *                       replace is performed with every entry of
   *                       subject, and the return value is an array as
   *                       well.
   *                       </p>
   * @param int   $count   [optional] <p>
   *                       The number of matched and replaced needles will
   *                       be returned in count which is passed by
   *                       reference.
   *                       </p>
   *
   * @return mixed <p>A string or an array of replacements.</p>
   */
  public static function str_ireplace($search, $replace, $subject, &$count = null)
  {
    $search = (array)$search;

    /** @noinspection AlterInForeachInspection */
    foreach ($search as &$s) {
      if ('' === $s .= '') {
        // an empty needle must never match
        $s = '/^(?<=.)$/';
      } else {
        $s = '/' . preg_quote($s, '/') . '/ui';
      }
    }
    unset($s);

    // escape "\" and "$", otherwise "preg_replace()" would treat them as back-references
    if (is_array($replace)) {
      foreach ($replace as &$r) {
        $r = addcslashes((string)$r, '\\$');
      }
      unset($r);
    } else {
      $replace = addcslashes((string)$replace, '\\$');
    }

    return preg_replace($search, $replace, $subject, -1, $count);
  }
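
  /*
  A self-contained sketch of the approach used by str_ireplace(). Each needle becomes a quoted,
  case-insensitive UTF-8 pattern, and the replacement is escaped with addcslashes() so
  preg_replace() inserts it literally. The name utf8_str_ireplace() is hypothetical, and the
  never-matching pattern "(?!)" stands in for the one used above.

```php
<?php
// Hypothetical standalone version of UTF8::str_ireplace().
function utf8_str_ireplace($search, $replace, $subject, &$count = null)
{
    $patterns = [];
    foreach ((array)$search as $needle) {
        $needle = (string)$needle;
        // an empty needle must not match anything; (?!) always fails
        $patterns[] = $needle === '' ? '/(?!)/' : '/' . preg_quote($needle, '/') . '/ui';
    }

    // escape "\" and "$" so the replacement text is inserted literally
    $escape = function ($value): string {
        return addcslashes((string)$value, '\\$');
    };
    $replace = is_array($replace) ? array_map($escape, $replace) : $escape($replace);

    return preg_replace($patterns, $replace, $subject, -1, $count);
}
```

  As with the native function, array needles are applied one after another, so
  utf8_str_ireplace(['a', 'b'], ['b', 'c'], 'ab') yields 'cc'.
  */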

  /**
   * Check if the string starts with the given substring, case insensitive.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_istarts_with($haystack, $needle)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (self::stripos($haystack, $needle) === 0) {
      return true;
    }

    return false;
  }

  /**
   * Limit the number of characters in a string, cutting preferably at a word boundary.
   *
   * If "$str" is longer than "$length" characters, it is cut at the last space within its
   * first "$length" characters and "$strAddOn" is appended. Without such a space, it is cut
   * after "$length - 1" characters.
   *
   * @param string $str
   * @param int    $length   <p>Max number of characters, not counting "$strAddOn".</p>
   * @param string $strAddOn <p>The string appended to a shortened string.</p>
   *
   * @return string
   */
  public static function str_limit_after_word($str, $length = 100, $strAddOn = '...')
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $length = (int)$length;

    if (self::strlen($str) <= $length) {
      return $str;
    }

    if (self::substr($str, $length - 1, 1) === ' ') {
      return self::substr($str, 0, $length - 1) . $strAddOn;
    }

    $str = (string)self::substr($str, 0, $length);
    $array = explode(' ', $str);
    array_pop($array);
    $new_str = implode(' ', $array);

    if ($new_str === '') {
      $str = self::substr($str, 0, $length - 1) . $strAddOn;
    } else {
      $str = $new_str . $strAddOn;
    }

    return $str;
  }

  /**
   * Pad a UTF-8 string to given length with another string.
   *
   * @param string $str        <p>The input string.</p>
   * @param int    $pad_length <p>The length of the returned string, in characters.</p>
   * @param string $pad_string [optional] <p>String to use for padding the input string.</p>
   * @param int    $pad_type   [optional] <p>
   *                           Can be <strong>STR_PAD_RIGHT</strong> (default),
   *                           <strong>STR_PAD_LEFT</strong> or <strong>STR_PAD_BOTH</strong>
   *                           </p>
   *
   * @return string <p>The padded string.</p>
   */
  public static function str_pad($str, $pad_length, $pad_string = ' ', $pad_type = STR_PAD_RIGHT)
  {
    $str_length = self::strlen($str);

    if (
        is_int($pad_length) === true
        &&
        $pad_length > 0
        &&
        $pad_length >= $str_length
    ) {
      $ps_length = self::strlen($pad_string);

      $diff = $pad_length - $str_length;

      switch ($pad_type) {
        case STR_PAD_LEFT:
          $pre = str_repeat($pad_string, (int)ceil($diff / $ps_length));
          $pre = self::substr($pre, 0, $diff);
          $post = '';
          break;

        case STR_PAD_BOTH:
          // left: floor($diff / 2) chars, right: ceil($diff / 2) chars (like native str_pad())
          $pre = str_repeat($pad_string, (int)ceil($diff / $ps_length / 2));
          $pre = self::substr($pre, 0, (int)floor($diff / 2));
          $post = str_repeat($pad_string, (int)ceil($diff / $ps_length / 2));
          $post = self::substr($post, 0, (int)ceil($diff / 2));
          break;

        case STR_PAD_RIGHT:
        default:
          $post = str_repeat($pad_string, (int)ceil($diff / $ps_length));
          $post = self::substr($post, 0, $diff);
          $pre = '';
      }

      return $pre . $str . $post;
    }

    return $str;
  }
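
  /*
  A minimal sketch of code-point-aware padding that uses only PCRE and core functions. This
  makes the STR_PAD_BOTH split easy to see: the left side gets floor($diff / 2) characters and
  the right side the rest. The names utf8_chars() and utf8_str_pad() are hypothetical, and
  intdiv() requires PHP 7+.

```php
<?php
// Hypothetical helper: split a UTF-8 string into code points.
function utf8_chars(string $s): array
{
    preg_match_all('/./us', $s, $m);
    return $m[0] ?? [];
}

// Hypothetical code-point-aware str_pad().
function utf8_str_pad(string $str, int $length, string $padString = ' ', int $type = STR_PAD_RIGHT): string
{
    $padChars = utf8_chars($padString);
    $diff = $length - count(utf8_chars($str));
    if ($diff <= 0 || $padChars === []) {
        return $str;
    }

    // repeat the pad characters cyclically until exactly $n chars are produced
    $run = function (int $n) use ($padChars): string {
        $out = '';
        for ($i = 0; $i < $n; $i++) {
            $out .= $padChars[$i % count($padChars)];
        }
        return $out;
    };

    switch ($type) {
        case STR_PAD_LEFT:
            return $run($diff) . $str;
        case STR_PAD_BOTH:
            $left = intdiv($diff, 2);
            return $run($left) . $str . $run($diff - $left);
        default: // STR_PAD_RIGHT
            return $str . $run($diff);
    }
}
```
  */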

  /**
   * Repeat a string.
   *
   * @param string $str        <p>
   *                           The string to be repeated.
   *                           </p>
   * @param int    $multiplier <p>
   *                           Number of times the input string should be
   *                           repeated.
   *                           </p>
   *                           <p>
   *                           multiplier has to be greater than or equal to 0.
   *                           If the multiplier is set to 0, the function
   *                           will return an empty string.
   *                           </p>
   *
   * @return string <p>The repeated string.</p>
   */
  public static function str_repeat($str, $multiplier)
  {
    $str = self::filter($str);

    return str_repeat($str, $multiplier);
  }

  /**
   * INFO: This is only a wrapper for "str_replace()" -> the original function is already UTF-8 safe.
   *
   * Replace all occurrences of the search string with the replacement string.
   *
   * @link http://php.net/manual/en/function.str-replace.php
   *
   * @param mixed $search  <p>
   *                       The value being searched for, otherwise known as the needle.
   *                       An array may be used to designate multiple needles.
   *                       </p>
   * @param mixed $replace <p>
   *                       The replacement value that replaces found search
   *                       values. An array may be used to designate multiple replacements.
   *                       </p>
   * @param mixed $subject <p>
   *                       The string or array being searched and replaced on,
   *                       otherwise known as the haystack.
   *                       </p>
   *                       <p>
   *                       If subject is an array, then the search and
   *                       replace is performed with every entry of
   *                       subject, and the return value is an array as
   *                       well.
   *                       </p>
   * @param int   $count   [optional] <p>If passed, this will hold the number of matched and replaced needles.</p>
   *
   * @return mixed <p>This function returns a string or an array with the replaced values.</p>
   */
  public static function str_replace($search, $replace, $subject, &$count = null)
  {
    return str_replace($search, $replace, $subject, $count);
  }

  /**
   * Replace the first "$search"-term with the "$replace"-term.
   *
   * @param string $search
   * @param string $replace
   * @param string $subject
   *
   * @return string
   */
  public static function str_replace_first($search, $replace, $subject)
  {
    $pos = self::strpos($subject, $search);

    if ($pos !== false) {
      return self::substr_replace($subject, $replace, $pos, self::strlen($search));
    }

    return $subject;
  }

  /**
   * Shuffles all the characters in the string.
   *
   * @param string $str <p>The input string</p>
   *
   * @return string <p>The shuffled string.</p>
   */
  public static function str_shuffle($str)
  {
    $array = self::split($str);

    shuffle($array);

    return implode('', $array);
  }

  /**
   * Sort all characters according to code points.
   *
   * @param string $str    <p>A UTF-8 string.</p>
   * @param bool   $unique <p>Sort unique. If <strong>true</strong>, repeated characters are ignored.</p>
   * @param bool   $desc   <p>If <strong>true</strong>, will sort characters in reverse code point order.</p>
   *
   * @return string <p>String of sorted characters.</p>
   */
  public static function str_sort($str, $unique = false, $desc = false)
  {
    $array = self::codepoints($str);

    if ($unique) {
      $array = array_flip(array_flip($array));
    }

    if ($desc) {
      arsort($array);
    } else {
      asort($array);
    }

    return self::string($array);
  }

  /**
   * Split a string into an array of grapheme clusters.
   *
   * @param string $str
   * @param int    $len <p>Max number of grapheme clusters per array element.</p>
   *
   * @return array
   */
  public static function str_split($str, $len = 1)
  {
    // init
    $len = (int)$len;
    $str = (string)$str;

    if (!isset($str[0])) {
      return array();
    }

    if ($len < 1) {
      // delegate to the native function, which rejects a length below 1
      return str_split($str, $len);
    }

    /** @noinspection PhpInternalEntityUsedInspection */
    preg_match_all('/' . Grapheme::GRAPHEME_CLUSTER_RX . '/u', $str, $a);
    $a = $a[0];

    if ($len === 1) {
      return $a;
    }

    $arrayOutput = array();
    $p = -1;

    foreach ($a as $l => $cluster) {
      if ($l % $len) {
        $arrayOutput[$p] .= $cluster;
      } else {
        $arrayOutput[++$p] = $cluster;
      }
    }

    return $arrayOutput;
  }
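
  /*
  Unlike split(), which works on code points, str_split() works on grapheme clusters. For
  example, "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points but one
  user-visible character. The sketch below uses PCRE's "\X" as a stand-in for
  Grapheme::GRAPHEME_CLUSTER_RX, which is a swapped-in technique. The name
  utf8_str_split_graphemes() is hypothetical.

```php
<?php
// Hypothetical: split into chunks of $len extended grapheme clusters via "\X".
function utf8_str_split_graphemes(string $str, int $len = 1): array
{
    if ($str === '' || $len < 1) {
        return [];
    }

    // "\X" matches one extended grapheme cluster; fails on malformed UTF-8
    if (preg_match_all('/\X/u', $str, $matches) === false) {
        return [];
    }

    return array_map('implode', array_chunk($matches[0], $len));
}
```
  */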

  /**
   * Check if the string starts with the given substring.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return bool
   */
  public static function str_starts_with($haystack, $needle)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (self::strpos($haystack, $needle) === 0) {
      return true;
    }

    return false;
  }
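
  /**
   * Get a binary representation of a specific string.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string <p>The bits of all bytes, without leading zeros ("0" for an empty string).</p>
   */
  public static function str_to_binary($str)
  {
    $str = (string)$str;

    // "base_convert()" works with floats internally and loses precision for
    // longer inputs, so every byte is converted on its own instead
    $binary = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
      $binary .= sprintf('%08b', ord($str[$i]));
    }

    // keep the previous output format: no leading zeros, "0" for an empty string
    $binary = ltrim($binary, '0');

    return $binary === '' ? '0' : $binary;
  }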
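
  /*
  If the bit string has to be converted back, a fixed width of 8 bits per byte is easier to
  handle than the output above, which drops leading zeros. A sketch of such a round trip
  follows. The names str_to_bits() and bits_to_str() are hypothetical and not part of this
  class.

```php
<?php
// Hypothetical round-trip helpers: 8 bits per byte, leading zeros kept.
function str_to_bits(string $str): string
{
    $bits = '';
    for ($i = 0, $n = strlen($str); $i < $n; $i++) {
        $bits .= sprintf('%08b', ord($str[$i]));
    }
    return $bits;
}

function bits_to_str(string $bits): string
{
    $str = '';
    // walk by offset: str_split('') returns [''] before PHP 8.2
    for ($i = 0, $n = strlen($bits); $i < $n; $i += 8) {
        $str .= chr(bindec(substr($bits, $i, 8)));
    }
    return $str;
}
```

  ltrim(str_to_bits($s), '0') gives the same bits as the str_to_binary() output for non-empty input.
  */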

  /**
   * Convert a string into an array of words.
   *
   * @param string $str
   * @param string $charlist <p>Additional chars that are treated as part of a word.</p>
   *
   * @return array <p>The non-word and word parts in turn; the words are at the odd indexes.</p>
   */
  public static function str_to_words($str, $charlist = '')
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return array('');
    }

    $charlist = self::rxClass($charlist, '\pL');

    return \preg_split("/({$charlist}+(?:[\p{Pd}’']{$charlist}+)*)/u", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
  }

  /**
   * alias for "UTF8::to_ascii()"
   *
   * @see UTF8::to_ascii()
   *
   * @param string $str
   * @param string $unknown
   * @param bool   $strict
   *
   * @return string
   */
  public static function str_transliterate($str, $unknown = '?', $strict = false)
  {
    return self::to_ascii($str, $unknown, $strict);
  }

  /**
   * Counts the number of words in the UTF-8 string.
   *
   * @param string $str      <p>The input string.</p>
   * @param int    $format   [optional] <p>
   *                         <strong>0</strong> => return the number of words (default)<br />
   *                         <strong>1</strong> => return an array of words<br />
   *                         <strong>2</strong> => return an array of words with the word offset (in characters) as key
   *                         </p>
   * @param string $charlist [optional] <p>Additional chars that are treated as part of a word and do not start a new word.</p>
   *
   * @return array|int <p>The number of words, or an array of words (see $format).</p>
   */
  public static function str_word_count($str, $format = 0, $charlist = '')
  {
    $strParts = self::str_to_words($str, $charlist);

    $len = count($strParts);

    if ($format === 1) {

      $numberOfWords = array();
      for ($i = 1; $i < $len; $i += 2) {
        $numberOfWords[] = $strParts[$i];
      }

    } elseif ($format === 2) {

      $numberOfWords = array();
      $offset = self::strlen($strParts[0]);
      for ($i = 1; $i < $len; $i += 2) {
        $numberOfWords[$offset] = $strParts[$i];
        $offset += self::strlen($strParts[$i]) + self::strlen($strParts[$i + 1]);
      }

    } else {

      $numberOfWords = ($len - 1) / 2;

    }

    return $numberOfWords;
  }
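
  // Usage sketch (illustrative, assuming the default $charlist) of the three $format values:
  //   UTF8::str_word_count('Hello wörld');    // 2
  //   UTF8::str_word_count('Hello wörld', 1); // array('Hello', 'wörld')
  //   UTF8::str_word_count('Hello wörld', 2); // array(0 => 'Hello', 6 => 'wörld')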

  /**
   * Case-insensitive string comparison.
   *
   * INFO: Case-insensitive version of UTF8::strcmp()
   *
   * @param string $str1
   * @param string $str2
   *
   * @return int <p>
   *             <strong>&lt; 0</strong> if str1 is less than str2;<br />
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br />
   *             <strong>0</strong> if they are equal.
   *             </p>
   */
  public static function strcasecmp($str1, $str2)
  {
    return self::strcmp(self::strtocasefold($str1), self::strtocasefold($str2));
  }
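
  // Usage sketch (illustrative): both strings are case-folded before the comparison.
  //   UTF8::strcasecmp('Iñtërnâtiônàl', 'IÑTËRNÂTIÔNÀL'); // 0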

  /**
   * Alias for "UTF8::strstr()".
   *
   * @see UTF8::strstr()
   *
   * @param string  $haystack
   * @param string  $needle
   * @param bool    $before_needle
   * @param string  $encoding
   * @param boolean $cleanUtf8
   *
   * @return string|false
   */
  public static function strchr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    return self::strstr($haystack, $needle, $before_needle, $encoding, $cleanUtf8);
  }

  /**
   * Case-sensitive string comparison.
   *
   * INFO: Both strings are normalized to NFD before the byte-wise comparison,
   *       so canonically equivalent strings compare as equal.
   *
   * @param string $str1
   * @param string $str2
   *
   * @return int <p>
   *             <strong>&lt; 0</strong> if str1 is less than str2;<br />
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br />
   *             <strong>0</strong> if they are equal.
   *             </p>
   */
  public static function strcmp($str1, $str2)
  {
    /** @noinspection PhpUndefinedClassInspection */
    return $str1 . '' === $str2 . '' ? 0 : strcmp(
        \Normalizer::normalize($str1, \Normalizer::NFD),
        \Normalizer::normalize($str2, \Normalizer::NFD)
    );
  }
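
  // Usage sketch (illustrative; needs \Normalizer from ext-intl or a polyfill): a precomposed
  // "é" (U+00E9) and "e" followed by U+0301 COMBINING ACUTE ACCENT compare as equal.
  //   UTF8::strcmp("\xC3\xA9", "e\xCC\x81"); // 0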

  /**
   * Find the length of the initial segment not matching the mask.
   *
   * @param string $str      <p>The input string.</p>
   * @param string $charList <p>The characters that end the segment.</p>
   * @param int    $offset   [optional] <p>The character position in $str to start from.</p>
   * @param int    $length   [optional] <p>The number of characters of $str to examine.</p>
   *
   * @return int|null <p>
   *                  The length (in characters) of the initial segment of $str that contains none of the
   *                  characters in $charList, or null if $charList or the examined string is empty.
   *                  </p>
   */
  public static function strcspn($str, $charList, $offset = 0, $length = 2147483647)
  {
    if ('' === $charList .= '') {
      return null;
    }

    if ($offset || 2147483647 !== $length) {
      $str = (string)self::substr($str, $offset, $length);
    }

    $str = (string)$str;
    if (!isset($str[0])) {
      return null;
    }

    if (preg_match('/^(.*?)' . self::rxClass($charList) . '/us', $str, $matches)) {
      return self::strlen($matches[1]);
    }

    return self::strlen($str);
  }
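
  // Usage sketch (illustrative): the result counts characters, not bytes.
  //   UTF8::strcspn('abcd', 'cd');        // 2
  //   UTF8::strcspn('Iñtërnâtiôn', 'ë'); // 3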

  /**
   * Alias for "UTF8::stristr()".
   *
   * @see UTF8::stristr()
   *
   * @param string  $haystack
   * @param string  $needle
   * @param bool    $before_needle
   * @param string  $encoding
   * @param boolean $cleanUtf8
   *
   * @return string|false
   */
  public static function strichr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    return self::stristr($haystack, $needle, $before_needle, $encoding, $cleanUtf8);
  }

  /**
   * Create a UTF-8 string from code points.
   *
   * INFO: opposite to UTF8::codepoints()
   *
   * @param array $array <p>Integer or Hexadecimal codepoints.</p>
   *
   * @return string <p>UTF-8 encoded string.</p>
   */
  public static function string(array $array)
  {
    return implode(
        '',
        array_map(
            array(
                '\\voku\\helper\\UTF8',
                'chr',
            ),
            $array
        )
    );
  }
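
  // Usage sketch (illustrative, integer code points only): 73 is "I" and 241 (U+00F1) is "ñ".
  //   UTF8::string(array(73, 241)); // 'Iñ'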

  /**
   * Checks if the string starts with a BOM (Byte Order Mark) character.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return bool <p><strong>true</strong> if the string has a BOM at the start, <strong>false</strong> otherwise.</p>
   */
  public static function string_has_bom($str)
  {
    foreach (self::$BOM as $bomString => $bomByteLength) {
      if (0 === strpos($str, $bomString)) {
        return true;
      }
    }

    return false;
  }
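
  // Usage sketch (illustrative): "\xEF\xBB\xBF" is the UTF-8 BOM.
  //   UTF8::string_has_bom("\xEF\xBB\xBF" . 'foo'); // true
  //   UTF8::string_has_bom('foo');                  // false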

  /**
   * Strip HTML and PHP tags from a string and, optionally, clean invalid UTF-8.
   *
   * @link http://php.net/manual/en/function.strip-tags.php
   *
   * @param string  $str            <p>
   *                                The input string.
   *                                </p>
   * @param string  $allowable_tags [optional] <p>
   *                                You can use the optional second parameter to specify tags which should
   *                                not be stripped.
   *                                </p>
   *                                <p>
   *                                HTML comments and PHP tags are also stripped. This is hardcoded and
   *                                cannot be changed with allowable_tags.
   *                                </p>
   * @param boolean $cleanUtf8      [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string <p>The stripped string.</p>
   */
  public static function strip_tags($str, $allowable_tags = null, $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8) {
      $str = self::clean($str);
    }

    return strip_tags($str, $allowable_tags);
  }
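
  // Usage sketch (illustrative): only the tags listed in $allowable_tags survive.
  //   UTF8::strip_tags('<b>bold</b><i>x</i>', '<b>'); // '<b>bold</b>x'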

  /**
   * Finds the position of the first occurrence of a string within another, case-insensitively.
   *
   * @link http://php.net/manual/en/function.mb-stripos.php
   *
   * @param string  $haystack  <p>
   *                           The string from which to get the position of the first occurrence
   *                           of needle.
   *                           </p>
   * @param string  $needle    <p>
   *                           The string to find in haystack.
   *                           </p>
   * @param int     $offset    [optional] <p>
   *                           The position in haystack
   *                           to start searching.
   *                           </p>
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>
   *                   Return the numeric position of the first occurrence of needle in the haystack string,<br />
   *                   or false if needle is not found.
   *                   </p>
   */
  public static function stripos($haystack, $needle, $offset = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $offset = (int)$offset;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
      // if invalid characters are found in $haystack before $needle
      $haystack = self::clean($haystack);
      $needle = self::clean($needle);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_stripos()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_stripos($haystack, $needle, $offset);
    }

    // fallback to "mb_"-function via polyfill
    return \mb_stripos($haystack, $needle, $offset, $encoding);
  }
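
  // Usage sketch (illustrative): the result is a character offset, not a byte offset.
  //   UTF8::stripos('Iñtërnâtiônàlizætiøn', 'LIZ'); // 12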

  /**
   * Returns all of haystack starting from and including the first occurrence of needle to the end,
   * case-insensitively.
   *
   * @param string  $haystack      <p>The input string. Must be valid UTF-8.</p>
   * @param string  $needle        <p>The string to look for. Must be valid UTF-8.</p>
   * @param bool    $before_needle [optional] <p>
   *                               If <b>TRUE</b>, this function returns the part of the
   *                               haystack before the first occurrence of the needle (excluding the needle).
   *                               </p>
   * @param string  $encoding      [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return false|string A sub-string,<br />or <strong>false</strong> if needle is not found.
   */
  public static function stristr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $before_needle = (bool)$before_needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::stristr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_stristr($haystack, $needle, $before_needle, $encoding);
    }

    if (self::$SUPPORT['intl'] === true) {
      return \grapheme_stristr($haystack, $needle, $before_needle);
    }

    preg_match('/^(.*?)' . preg_quote($needle, '/') . '/usi', $haystack, $match);

    if (!isset($match[1])) {
      return false;
    }

    if ($before_needle) {
      return $match[1];
    }

    return self::substr($haystack, self::strlen($match[1]));
  }
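
  // Usage sketch (illustrative, assuming mbstring or intl is available):
  //   UTF8::stristr('Iñtërnâtiônàlizætiøn', 'NÂT');       // 'nâtiônàlizætiøn'
  //   UTF8::stristr('Iñtërnâtiônàlizætiøn', 'NÂT', true); // 'Iñtër'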

  /**
   * Get the string length in characters, not in bytes.
   *
   * @link     http://php.net/manual/en/function.mb-strlen.php
   *
   * @param string  $str       <p>The string being checked for length.</p>
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int <p>The number of characters in the string $str having character encoding $encoding (each
   *             multi-byte character counts as 1).</p>
   */
  public static function strlen($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return 0;
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    switch ($encoding) {
      case 'ASCII':
      case 'CP850':
        return strlen($str);
    }

    if ($cleanUtf8 === true) {
      $str = self::clean($str);
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
        &&
        self::$SUPPORT['iconv'] === false
    ) {
      trigger_error('UTF8::strlen() without mbstring / iconv cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['iconv'] === true
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      $returnTmp = \iconv_strlen($str, $encoding);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strlen($str, $encoding);
    }

    if (self::$SUPPORT['intl'] === true) {
      $str = self::clean($str);
      $returnTmp = \grapheme_strlen($str);
      if ($returnTmp !== null) {
        return $returnTmp;
      }
    }

    if (self::$SUPPORT['iconv'] === true) {
      $returnTmp = \iconv_strlen($str, $encoding);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    // fallback via vanilla php
    preg_match_all('/./us', $str, $parts);
    $returnTmp = count($parts[0]);
    if ($returnTmp !== 0) {
      return $returnTmp;
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strlen($str);
  }
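
  // Usage sketch (illustrative): 7 of the 20 characters below need two bytes in UTF-8.
  //   UTF8::strlen('Iñtërnâtiônàlizætiøn'); // 20
  //   strlen('Iñtërnâtiônàlizætiøn');       // 27 (bytes)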

  /**
   * Case-insensitive string comparison using a "natural order" algorithm.
   *
   * INFO: natural order version of UTF8::strcasecmp()
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   *
   * @return int <strong>&lt; 0</strong> if str1 is less than str2;<br />
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br />
   *             <strong>0</strong> if they are equal
   */
  public static function strnatcasecmp($str1, $str2)
  {
    return self::strnatcmp(self::strtocasefold($str1), self::strtocasefold($str2));
  }

  /**
   * String comparison using a "natural order" algorithm.
   *
   * INFO: natural order version of UTF8::strcmp()
   *
   * @link  http://php.net/manual/en/function.strnatcmp.php
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   *
   * @return int <strong>&lt; 0</strong> if str1 is less than str2;<br />
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br />
   *             <strong>0</strong> if they are equal
   */
  public static function strnatcmp($str1, $str2)
  {
    return $str1 . '' === $str2 . '' ? 0 : strnatcmp(self::strtonatfold($str1), self::strtonatfold($str2));
  }
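
  // Usage sketch (illustrative): numeric runs are compared by value, not digit by digit.
  //   UTF8::strnatcmp('img2.png', 'img10.png');  // < 0
  //   UTF8::strnatcmp('img12.png', 'img10.png'); // > 0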

  /**
   * Case-insensitive string comparison of the first n characters.
   *
   * @link  http://php.net/manual/en/function.strncasecmp.php
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   * @param int    $len  <p>The length of strings to be used in the comparison.</p>
   *
   * @return int <strong>&lt; 0</strong> if <i>str1</i> is less than <i>str2</i>;<br />
   *             <strong>&gt; 0</strong> if <i>str1</i> is greater than <i>str2</i>;<br />
   *             <strong>0</strong> if they are equal
   */
  public static function strncasecmp($str1, $str2, $len)
  {
    return self::strncmp(self::strtocasefold($str1), self::strtocasefold($str2), $len);
  }

  /**
   * String comparison of the first n characters.
   *
   * @link  http://php.net/manual/en/function.strncmp.php
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   * @param int    $len  <p>Number of characters to use in the comparison.</p>
   *
   * @return int <strong>&lt; 0</strong> if <i>str1</i> is less than <i>str2</i>;<br />
   *             <strong>&gt; 0</strong> if <i>str1</i> is greater than <i>str2</i>;<br />
   *             <strong>0</strong> if they are equal
   */
  public static function strncmp($str1, $str2, $len)
  {
    // self::substr() can return false, so cast to keep the input of strcmp() a string
    $str1 = (string)self::substr($str1, 0, $len);
    $str2 = (string)self::substr($str2, 0, $len);

    return self::strcmp($str1, $str2);
  }
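
  // Usage sketch (illustrative): only the first $len characters are compared.
  //   UTF8::strncmp('Iñtërnâtiôn', 'Iñtërnet', 5);   // 0
  //   UTF8::strncasecmp('IÑTËR', 'iñtërnâtiôn', 5); // 0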

  /**
   * Search a string for any of a set of characters.
   *
   * @link  http://php.net/manual/en/function.strpbrk.php
   *
   * @param string $haystack  <p>The string where char_list is looked for.</p>
   * @param string $char_list <p>This parameter is case sensitive.</p>
   *
   * @return string|false <p>The string starting from the character found, or false if it is not found.</p>
   */
  public static function strpbrk($haystack, $char_list)
  {
    $haystack = (string)$haystack;
    $char_list = (string)$char_list;

    if (!isset($haystack[0], $char_list[0])) {
      return false;
    }

    if (preg_match('/' . self::rxClass($char_list) . '/us', $haystack, $m)) {
      return substr($haystack, strpos($haystack, $m[0]));
    }

    return false;
  }
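
  // Usage sketch (illustrative): the search is case-sensitive, so the leading "T" does not match.
  //   UTF8::strpbrk('This is a test', 'st'); // 's is a test'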

  /**
   * Find the position of the first occurrence of a string in another string.
   *
   * @link http://php.net/manual/en/function.mb-strpos.php
   *
   * @param string  $haystack  <p>The string being checked.</p>
   * @param string  $needle    <p>The string to find in haystack.</p>
   * @param int     $offset    [optional] <p>The search offset. If it is not specified, 0 is used.</p>
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>
   *                   The numeric position of the first occurrence of needle in the haystack string.<br />
   *                   If needle is not found it returns false.
   *                   </p>
   */
  public static function strpos($haystack, $needle, $offset = 0, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // iconv and mbstring do not support integer $needle, so a non-negative integer
    // is converted into its character before $needle is cast to string
    if (((int)$needle) === $needle && $needle >= 0) {
      $needle = (string)self::chr($needle);
    }

    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    // init
    $offset = (int)$offset;

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['iconv'] === false
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strpos() without mbstring / iconv cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (
        $offset >= 0 // iconv_strpos() can't handle negative offset
        &&
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
        &&
        self::$SUPPORT['iconv'] === true
    ) {
      // ignore invalid negative offset to keep compatibility
      // with php < 5.5.35, < 5.6.21, < 7.0.6
      return \iconv_strpos($haystack, $needle, $offset > 0 ? $offset : 0, $encoding);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strpos($haystack, $needle, $offset, $encoding);
    }

    if (self::$SUPPORT['intl'] === true) {
      $returnTmp = \grapheme_strpos($haystack, $needle, $offset);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    if (
        $offset >= 0 // iconv_strpos() can't handle negative offset
        &&
        self::$SUPPORT['iconv'] === true
    ) {
      // ignore invalid negative offset to keep compatibility
      // with php < 5.5.35, < 5.6.21, < 7.0.6
      return \iconv_strpos($haystack, $needle, $offset > 0 ? $offset : 0, $encoding);
    }

    // fallback via vanilla php

    $haystack = self::substr($haystack, $offset);

    if ($offset < 0) {
      $offset = 0;
    }

    $pos = strpos($haystack, $needle);
    if ($pos === false) {
      return false;
    }

    $returnTmp = $offset + self::strlen(substr($haystack, 0, $pos));
    if ($returnTmp !== false) {
      return $returnTmp;
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strpos($haystack, $needle, $offset);
  }
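
  // Usage sketch (illustrative): the result counts characters, while PHP's native strpos()
  // counts bytes.
  //   UTF8::strpos('Iñtërnâtiônàlizætiøn', 'ô'); // 9
  //   strpos('Iñtërnâtiônàlizætiøn', 'ô');       // 12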

  /**
   * Finds the last occurrence of a character in a string within another.
   *
   * @link http://php.net/manual/en/function.mb-strrchr.php
   *
   * @param string $haystack      <p>The string from which to get the last occurrence of needle.</p>
   * @param string $needle        <p>The string to find in haystack.</p>
   * @param bool   $before_needle [optional] <p>
   *                              Determines which portion of haystack
   *                              this function returns.
   *                              If set to true, it returns all of haystack
   *                              from the beginning to the last occurrence of needle.
   *                              If set to false, it returns all of haystack
   *                              from the last occurrence of needle to the end.
   *                              </p>
   * @param string $encoding      [optional] <p>
   *                              Character encoding name to use.
   *                              If it is omitted, UTF-8 is used.
   *                              </p>
   * @param bool   $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string|false The portion of haystack or false if needle is not found.
   */
  public static function strrchr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // the "\mb_"-functions are not tolerant to invalid characters,
      // so strip them from both strings before searching
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strrchr($haystack, $needle, $before_needle, $encoding);
  }
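
  // Usage sketch (illustrative; assumes \voku\helper\UTF8 is loaded and
  // \mb_strrchr() is provided by mbstring or a polyfill):
  //
  //   UTF8::strrchr('a/ü/ö', '/');       // '/ö'
  //   UTF8::strrchr('a/ü/ö', '/', true); // 'a/ü'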

  /**
   * Reverses the order of characters in the string.
   *
   * @param string $str The input string
   *
   * @return string The string with its characters in reverse order
   */
  public static function strrev($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    return implode('', array_reverse(self::split($str)));
  }
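
  // Usage sketch (illustrative; assumes self::split() returns one array entry
  // per UTF-8 character, as its use above implies):
  //
  //   UTF8::strrev('äöü'); // 'üöä'
  //
  // PHP's built-in strrev() reverses bytes instead and would corrupt the
  // two-byte sequences of these characters.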

  /**
   * Finds the last occurrence of a character in a string within another, case-insensitively.
   *
   * @link http://php.net/manual/en/function.mb-strrichr.php
   *
   * @param string  $haystack      <p>The string from which to get the last occurrence of needle.</p>
   * @param string  $needle        <p>The string to find in haystack.</p>
   * @param bool    $before_needle [optional] <p>
   *                               Determines which portion of haystack
   *                               this function returns.
   *                               If set to true, it returns all of haystack
   *                               from the beginning to the last occurrence of needle.
   *                               If set to false, it returns all of haystack
   *                               from the last occurrence of needle to the end.
   *                               </p>
   * @param string  $encoding      [optional] <p>
   *                               Character encoding name to use.
   *                               If it is omitted, UTF-8 is used.
   *                               </p>
   * @param boolean $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string|false <p>The portion of haystack or<br />false if needle is not found.</p>
   */
  public static function strrichr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // the "\mb_"-functions are not tolerant to invalid characters,
      // so strip them from both strings before searching
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    return \mb_strrichr($haystack, $needle, $before_needle, $encoding);
  }
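
  // Usage sketch (illustrative; assumes \mb_strrichr() is available via
  // mbstring or a polyfill). The match is case-insensitive, so the last
  // lowercase 'x' is found:
  //
  //   UTF8::strrichr('aXbxc', 'X');       // 'xc'
  //   UTF8::strrichr('aXbxc', 'X', true); // 'aXb'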

  /**
   * Find the position of the last occurrence of a case-insensitive string.
   *
   * @param string|int $haystack  <p>The string to look in.</p>
   * @param string|int $needle    <p>The string to look for.<br />Or a code point as int.</p>
   * @param int        $offset    [optional] <p>Number of characters to ignore in the beginning or end.</p>
   * @param string     $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean    $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>
   *                   The numeric position of the last occurrence of needle in the haystack string.<br />If needle is
   *                   not found, it returns false.
   *                   </p>
   */
  public static function strripos($haystack, $needle, $offset = 0, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // an int needle is treated as a code point and converted to its character
    if ((int)$needle === $needle && $needle >= 0) {
      $needle = (string)self::chr($needle);
    }

    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $offset = (int)$offset;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (
        $cleanUtf8 === true
        ||
        $encoding === true // INFO: the "bool"-check is only a fallback for old versions
    ) {
      // \mb_strripos() is not tolerant to invalid characters
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strripos() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strripos($haystack, $needle, $offset, $encoding);
    }

    if (self::$SUPPORT['intl'] === true) {
      $returnTmp = \grapheme_strripos($haystack, $needle, $offset);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    // fallback via vanilla php

    return self::strrpos(self::strtonatfold($haystack), self::strtonatfold($needle), $offset, $encoding, $cleanUtf8);
  }
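
  // Usage sketch (illustrative; assumes mbstring is loaded, so the
  // \mb_strripos() branch is taken):
  //
  //   UTF8::strripos('ÄbcäBC', 'ä'); // 3
  //   UTF8::strripos('ÄbcäBC', 'x'); // false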

  /**
   * Find the position of the last occurrence of a string in a string.
   *
   * @link http://php.net/manual/en/function.mb-strrpos.php
   *
   * @param string     $haystack  <p>The string to search for the last occurrence of needle.</p>
   * @param string|int $needle    <p>The string to find in haystack.<br />Or a code point as int.</p>
   * @param int        $offset    [optional] <p>May be specified to begin searching an arbitrary number of characters
   *                              into the string. Negative values will stop searching at an arbitrary point prior to
   *                              the end of the string.
   *                              </p>
   * @param string     $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean    $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>The numeric position of the last occurrence of needle in the haystack string.<br />If needle
   *                   is not found, it returns false.</p>
   */
  public static function strrpos($haystack, $needle, $offset = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // an int needle is treated as a code point and converted to its character
    if ((int)$needle === $needle && $needle >= 0) {
      $needle = (string)self::chr($needle);
    }

    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $offset = (int)$offset;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (
        $cleanUtf8 === true
        ||
        $encoding === true // INFO: the "bool"-check is only a fallback for old versions
    ) {
      // \mb_strrpos() and \iconv_strrpos() are not tolerant to invalid characters
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strrpos() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      $returnTmp = \mb_strrpos($haystack, $needle, $offset, $encoding);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    if (self::$SUPPORT['intl'] === true) {
      $returnTmp = \grapheme_strrpos($haystack, $needle, $offset);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    // fallback via vanilla php

    if ($offset > 0) {
      $haystack = self::substr($haystack, $offset);
    } elseif ($offset < 0) {
      $haystack = self::substr($haystack, 0, $offset);
      $offset = 0;
    }

    $pos = strrpos($haystack, $needle);
    if ($pos === false) {
      return false;
    }

    return $offset + self::strlen(substr($haystack, 0, $pos));
  }
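
  // Usage sketch (illustrative; assumes \voku\helper\UTF8 is loaded). An int
  // needle is converted to the character with that code point via self::chr():
  //
  //   UTF8::strrpos('äöü-äöü', 'ü'); // 6
  //   UTF8::strrpos('aüb', 252);     // 1 (252 is U+00FC, 'ü')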

  /**
   * Finds the length of the initial segment of a string consisting entirely of characters contained within a given
   * mask.
   *
   * @param string $str    <p>The input string.</p>
   * @param string $mask   <p>The mask of chars</p>
   * @param int    $offset [optional] <p>The character position in $str at which to start examining.</p>
   * @param int    $length [optional] <p>The maximum number of characters of $str to examine.</p>
   *
   * @return int <p>The length (in characters) of the initial segment of $str that consists only of characters
   *             from $mask.</p>
   */
  public static function strspn($str, $mask, $offset = 0, $length = 2147483647)
  {
    // init
    $length = (int)$length;
    $offset = (int)$offset;

    if ($offset || 2147483647 !== $length) {
      $str = self::substr($str, $offset, $length);
    }

    $str = (string)$str;
    if (!isset($str[0], $mask[0])) {
      return 0;
    }

    // use a separate variable for the matches instead of overwriting $str
    return preg_match('/^' . self::rxClass($mask) . '+/u', $str, $match) ? self::strlen($match[0]) : 0;
  }
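
  // Usage sketch (illustrative; assumes self::rxClass() turns $mask into a
  // regex character class that matches any single character of $mask):
  //
  //   UTF8::strspn('42 is the answer', '1234567890'); // 2
  //   UTF8::strspn('äöüx', 'üäö');                    // 3
  //   UTF8::strspn('xäöü', 'üäö', 1);                 // 3 (examination starts at character 1)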

  /**
   * Returns part of haystack string from the first occurrence of needle to the end of haystack.
   *
   * @param string  $haystack      <p>The input string. Must be valid UTF-8.</p>
   * @param string  $needle        <p>The string to look for. Must be valid UTF-8.</p>
   * @param bool    $before_needle [optional] <p>
   *                               If <b>TRUE</b>, strstr() returns the part of the
   *                               haystack before the first occurrence of the needle (excluding the needle).
   *                               </p>
   * @param string  $encoding      [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean $cleanUtf8     [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string|false A sub-string,<br />or <strong>false</strong> if needle is not found.
   */
  public static function strstr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($cleanUtf8 === true) {
      // the "\mb_"-functions are not tolerant to invalid characters,
      // so strip them from both strings before searching
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strstr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      $returnTmp = \mb_strstr($haystack, $needle, $before_needle, $encoding);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    if (self::$SUPPORT['intl'] === true) {
      $returnTmp = \grapheme_strstr($haystack, $needle, $before_needle);
      if ($returnTmp !== false) {
        return $returnTmp;
      }
    }

    // fallback via vanilla php
    preg_match('/^(.*?)' . preg_quote($needle, '/') . '/us', $haystack, $match);

    if (!isset($match[1])) {
      return false;
    }

    if ($before_needle) {
      return $match[1];
    }

    return self::substr($haystack, self::strlen($match[1]));
  }
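
  // Usage sketch (illustrative; assumes \voku\helper\UTF8 is loaded):
  //
  //   UTF8::strstr('user@exämple.com', '@');       // '@exämple.com'
  //   UTF8::strstr('user@exämple.com', '@', true); // 'user'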

  /**
   * Unicode transformation for case-less matching.
   *
   * @link http://unicode.org/reports/tr21/tr21-5.html
   *
   * @param string  $str       <p>The input string.</p>
   * @param bool    $full      [optional] <p>
   *                           <b>true</b>, replace full case folding chars (default)<br />
   *                           <b>false</b>, use only the limited static array [UTF8::$COMMON_CASE_FOLD]
   *                           </p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string
   */
  public static function strtocasefold($str, $full = true, $cleanUtf8 = false)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    static $COMMON_CASE_FOLD_KEYS_CACHE = null;
    static $COMMON_CASE_FOLD_VALUES_CACHE = null;

    if ($COMMON_CASE_FOLD_KEYS_CACHE === null) {
      $COMMON_CASE_FOLD_KEYS_CACHE = array_keys(self::$COMMON_CASE_FOLD);
      $COMMON_CASE_FOLD_VALUES_CACHE = array_values(self::$COMMON_CASE_FOLD);
    }

    $str = str_replace($COMMON_CASE_FOLD_KEYS_CACHE, $COMMON_CASE_FOLD_VALUES_CACHE, $str);

    if ($full) {

      static $FULL_CASE_FOLD = null;

      if ($FULL_CASE_FOLD === null) {
        $FULL_CASE_FOLD = self::getData('caseFolding_full');
      }

      /** @noinspection OffsetOperationsInspection */
      $str = str_replace($FULL_CASE_FOLD[0], $FULL_CASE_FOLD[1], $str);
    }

    if ($cleanUtf8 === true) {
      $str = self::clean($str);
    }

    return self::strtolower($str);
  }
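
  // Usage sketch (illustrative; assumes the bundled "caseFolding_full" data
  // follows Unicode's CaseFolding.txt, where 'ß' has the full folding "ss"):
  //
  //   UTF8::strtocasefold('Straße');        // 'strasse'
  //   UTF8::strtocasefold('Straße', false); // 'straße'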

  /**
   * Make a string lowercase.
   *
   * @link http://php.net/manual/en/function.mb-strtolower.php
   *
   * @param string  $str       <p>The string being lowercased.</p>
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string str with all alphabetic characters converted to lowercase.
   */
  public static function strtolower($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8 === true) {
      // remove invalid UTF-8 characters before the case conversion
      $str = self::clean($str);
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    return \mb_strtolower($str, $encoding);
  }
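
  // Usage sketch (illustrative; assumes \mb_strtolower() is available via
  // mbstring or a polyfill):
  //
  //   UTF8::strtolower('ÄÖÜ ABC'); // 'äöü abc'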

  /**
   * Generic case-sensitive transformation for collation matching: the string is decomposed (NFD) and all
   * non-spacing combining marks (\p{Mn}) are removed, so that e.g. "é" folds to "e". Letter case is preserved.
   *
   * @param string $str <p>The input string</p>
   *
   * @return string
   */
  private static function strtonatfold($str)
  {
    /** @noinspection PhpUndefinedClassInspection */
    return preg_replace('/\p{Mn}+/u', '', \Normalizer::normalize($str, \Normalizer::NFD));
  }
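
  // Sketch of the effect (illustrative; strtonatfold() is private, so this only
  // works from inside the class, and it needs the intl \Normalizer class):
  //
  //   self::strtonatfold('Crème Brûlée'); // 'Creme Brulee'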

  /**
   * Make a string uppercase.
   *
   * @link http://php.net/manual/en/function.mb-strtoupper.php
   *
   * @param string  $str       <p>The string being uppercased.</p>
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" function.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string str with all alphabetic characters converted to uppercase.
   */
  public static function strtoupper($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8 === true) {
      // remove invalid UTF-8 characters before the case conversion
      $str = self::clean($str);
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    return \mb_strtoupper($str, $encoding);
  }
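
  // Usage sketch (illustrative; assumes \mb_strtoupper() is available via
  // mbstring or a polyfill):
  //
  //   UTF8::strtoupper('äöü abc'); // 'ÄÖÜ ABC'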

  /**
   * Translate characters or replace sub-strings.
   *
   * @link  http://php.net/manual/en/function.strtr.php
   *
   * @param string          $str  <p>The string being translated.</p>
   * @param string|string[] $from <p>The characters to replace, or an array of "from" => "to" pairs if $to is
   *                              omitted.</p>
   * @param string|string[] $to   [optional] <p>The characters replacing those in $from.</p>
   *
   * @return string <p>
   *                This function returns a copy of str, translating all occurrences of each character in from to the
   *                corresponding character in to.
   *                </p>
   */
  public static function strtr($str, $from, $to = INF)
  {
    if (INF !== $to) {
      // split both strings into single characters and pair them up
      $from = self::str_split($from);
      $to = self::str_split($to);
      $countFrom = count($from);
      $countTo = count($to);

      // extra characters in the longer list are ignored
      if ($countFrom > $countTo) {
        $from = array_slice($from, 0, $countTo);
      } elseif ($countFrom < $countTo) {
        $to = array_slice($to, 0, $countFrom);
      }

      $from = array_combine($from, $to);
    }

    return strtr($str, $from);
  }
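
  // Usage sketch (illustrative; assumes self::str_split() splits a string into
  // single UTF-8 characters by default):
  //
  //   UTF8::strtr('Hällö', 'äö', 'ao');              // 'Hallo'
  //   UTF8::strtr('abc', 'ab', 'x');                 // 'xbc' ($from is cut to the length of $to)
  //   UTF8::strtr('Hi all', array('Hi' => 'Hello')); // 'Hello all'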

  /**
   * Return the display width of a string (East Asian wide characters count as 2).
   *
   * @param string  $str       <p>The input string.</p>
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int
   */
  public static function strwidth($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // \mb_strwidth() is not tolerant to invalid encoding
      $str = self::clean($str);
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strwidth($str, $encoding);
  }
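
  // Usage sketch (illustrative; assumes \mb_strwidth() is available via
  // mbstring or a polyfill):
  //
  //   UTF8::strwidth('abc');  // 3
  //   UTF8::strwidth('日本'); // 4 (East Asian wide characters count as 2 columns)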

  /**
   * Get part of a string.
   *
   * @link http://php.net/manual/en/function.mb-substr.php
   *
   * @param string  $str       <p>The input string.</p>
   * @param int     $start     <p>The first position used in str.</p>
   * @param int     $length    [optional] <p>The maximum length of the returned string.</p>
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string|false <p>Returns a sub-string specified by the start and length parameters,<br />or false if
   *                      start lies beyond the end of the string.</p>
   */
  public static function substr($str, $start = 0, $length = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8 === true) {
      // iconv and mbstring are not tolerant to invalid encoding;
      // furthermore, their behaviour differs from that of PHP's substr
      $str = self::clean($str);
    }

    $str_length = 0;
    if ($start || $length === null) {
      $str_length = (int)self::strlen($str);
    }

    if ($start && $start > $str_length) {
      return false;
    }

    if ($length === null) {
      $length = $str_length;
    } else {
      $length = (int)$length;
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::substr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_substr($str, $start, $length, $encoding);
    }

    if (
        $length >= 0 // "iconv_substr()" can't handle negative length
        &&
        self::$SUPPORT['iconv'] === true
    ) {
      return \iconv_substr($str, $start, $length);
    }

    if (self::$SUPPORT['intl'] === true) {
      return \grapheme_substr($str, $start, $length);
    }

    // fallback via vanilla php

    // split to array, and remove invalid characters
    $array = self::split($str);

    // extract relevant part, and join to make a string again
    return implode('', array_slice($array, $start, $length));
  }
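
  // Usage sketch for "UTF8::substr()" (illustrative only; assumes valid UTF-8 input):
  //
  //   UTF8::substr('中文空白', 1, 2); // '文空'  (start and length count characters, not bytes)
  //   UTF8::substr('中文空白', -2);   // '空白'
  //   UTF8::substr('中文空白', 9);    // false (start is greater than the string length)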

  /**
   * Binary safe comparison of two strings from an offset, up to length characters.
   *
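   * EXAMPLE (usage sketch; assumes valid UTF-8 input):
   * <code>
   * UTF8::substr_compare('Hello World', 'World', 6);          // 0
   * UTF8::substr_compare('Hello World', 'world', 6, 5, true); // 0 (case-insensitive)
   * </code>
   *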
   * @param string  $main_str           <p>The main string being compared.</p>
   * @param string  $str                <p>The secondary string being compared.</p>
   * @param int     $offset             <p>The start position for the comparison. If negative, it starts counting from
   *                                    the end of the string.</p>
   * @param int     $length             [optional] <p>The length of the comparison. The default value is the largest of
   *                                    the length of the str compared to the length of main_str less the offset.</p>
   * @param boolean $case_insensitivity [optional] <p>If case_insensitivity is TRUE, comparison is case
   *                                    insensitive.</p>
   *
   * @return int <p>A negative number if main_str from position offset is less than str, a positive number if it is
   *             greater than str, and 0 if they are equal.</p>
   */
  public static function substr_compare($main_str, $str, $offset, $length = 2147483647, $case_insensitivity = false)
  {
    // "UTF8::substr()" returns false if the offset is out of range, so cast the results back to string
    $main_str = (string)self::substr($main_str, $offset, $length);
    $str = (string)self::substr($str, 0, self::strlen($main_str));

    return $case_insensitivity === true ? self::strcasecmp($main_str, $str) : self::strcmp($main_str, $str);
  }

  /**
   * Count the number of substring occurrences.
   *
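   * EXAMPLE (usage sketch; assumes valid UTF-8 input):
   * <code>
   * UTF8::substr_count('中文空白中文', '中文');       // 2
   * UTF8::substr_count('中文空白中文', '中文', 0, 4); // 1 (only '中文空白' is searched)
   * </code>
   *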
   * @link  http://php.net/manual/en/function.substr-count.php
   *
   * @param string  $haystack  <p>The string to search in.</p>
   * @param string  $needle    <p>The substring to search for.</p>
   * @param int     $offset    [optional] <p>The offset where to start counting.</p>
   * @param int     $length    [optional] <p>
   *                           The maximum length after the specified offset to search for the
   *                           substring.
   *                           </p>
   * @param string  $encoding  [optional] <p>Set the charset for e.g. "\mb_" functions. Default is UTF-8.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>The number of occurrences, or false if the haystack or the needle is empty or if the
   *                   offset is out of range.</p>
   */
  public static function substr_count($haystack, $needle, $offset = 0, $length = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($offset || $length) {
      $offset = (int)$offset;
      // keep "null", so that "UTF8::substr()" counts up to the end of the string
      $length = $length === null ? null : (int)$length;

      if (
          (int)$length + $offset <= 0
          &&
          Bootup::is_php('7.1') === false
      ) {
        return false;
      }

      $haystack = self::substr($haystack, $offset, $length, $encoding);
      if ($haystack === false) {
        // the offset is out of range
        return false;
      }
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // remove invalid UTF-8 characters from $haystack and $needle before counting
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::substr_count() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_substr_count($haystack, $needle, $encoding);
    }

    preg_match_all('/' . preg_quote($needle, '/') . '/us', $haystack, $matches, PREG_SET_ORDER);

    return count($matches);
  }

  /**
   * Removes a prefix ($needle) from the start of the string ($haystack), case-insensitive.
   *
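   * EXAMPLE (usage sketch):
   * <code>
   * UTF8::substr_ileft('HelloWorld', 'hello'); // 'World'
   * UTF8::substr_ileft('HelloWorld', 'world'); // 'HelloWorld' (no matching prefix)
   * </code>
   *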
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_ileft($haystack, $needle)
  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_istarts_with($haystack, $needle) === true) {
      $haystack = self::substr($haystack, self::strlen($needle));
    }

    return $haystack;
  }

  /**
   * Removes a suffix ($needle) from the end of the string ($haystack), case-insensitive.
   *
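   * EXAMPLE (usage sketch):
   * <code>UTF8::substr_iright('HelloWorld', 'WORLD'); // 'Hello'</code>
   *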
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_iright($haystack, $needle)
  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_iends_with($haystack, $needle) === true) {
      $haystack = self::substr($haystack, 0, self::strlen($haystack) - self::strlen($needle));
    }

    return $haystack;
  }

  /**
   * Removes a prefix ($needle) from the start of the string ($haystack).
   *
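   * EXAMPLE (usage sketch; the comparison is case-sensitive):
   * <code>
   * UTF8::substr_left('ΚόσμεMiddleEnd', 'Κόσμε'); // 'MiddleEnd'
   * UTF8::substr_left('ΚόσμεMiddleEnd', 'κόσμε'); // 'ΚόσμεMiddleEnd' (no match, different case)
   * </code>
   *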
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_left($haystack, $needle)
  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_starts_with($haystack, $needle) === true) {
      $haystack = self::substr($haystack, self::strlen($needle));
    }

    return $haystack;
  }

  /**
   * Replace text within a portion of a string.
   *
   * source: https://gist.github.com/stemar/8287074
   *
   * @param string|string[] $str              <p>The input string or an array of strings.</p>
   * @param string|string[] $replacement      <p>The replacement string or an array of strings.</p>
   * @param int|int[]       $start            <p>
   *                                          If start is positive, the replacing will begin at the start'th offset
   *                                          into string.
   *                                          <br /><br />
   *                                          If start is negative, the replacing will begin at the start'th character
   *                                          from the end of string.
   *                                          </p>
   * @param int|int[]|null  $length           [optional] <p>If given and is positive, it represents the length of the
   *                                          portion of string which is to be replaced. If it is negative, it
   *                                          represents the number of characters from the end of string at which to
   *                                          stop replacing. If it is not given, then it will default to
   *                                          strlen(string); i.e. end the replacing at the end of string. Of
   *                                          course, if length is zero then this function will have the effect of
   *                                          inserting replacement into string at the given start offset.</p>
   *
   * @return string|string[] <p>The result string is returned. If string is an array then array is returned.</p>
   */
  public static function substr_replace($str, $replacement, $start, $length = null)
  {
    if (is_array($str) === true) {
      $num = count($str);

      // $replacement
      if (is_array($replacement) === true) {
        $replacement = array_slice($replacement, 0, $num);
      } else {
        $replacement = array_pad(array($replacement), $num, $replacement);
      }

      // $start
      if (is_array($start) === true) {
        $start = array_slice($start, 0, $num);
        foreach ($start as &$valueTmp) {
          $valueTmp = (int)$valueTmp === $valueTmp ? $valueTmp : 0;
        }
        unset($valueTmp);
      } else {
        $start = array_pad(array($start), $num, $start);
      }

      // $length
      if (!isset($length)) {
        $length = array_fill(0, $num, 0);
      } elseif (is_array($length) === true) {
        $length = array_slice($length, 0, $num);
        foreach ($length as &$valueTmpV2) {
          if (isset($valueTmpV2)) {
            $valueTmpV2 = (int)$valueTmpV2 === $valueTmpV2 ? $valueTmpV2 : $num;
          } else {
            $valueTmpV2 = 0;
          }
        }
        unset($valueTmpV2);
      } else {
        $length = array_pad(array($length), $num, $length);
      }

      // Recursive call
      return array_map(array(__CLASS__, 'substr_replace'), $str, $replacement, $start, $length);

    } else {

      if (is_array($replacement) === true) {
        if (count($replacement) > 0) {
          $replacement = $replacement[0];
        } else {
          $replacement = '';
        }
      }
    }

    // init
    $str = (string)$str;
    $replacement = (string)$replacement;

    if (!isset($str[0])) {
      return $replacement;
    }

    preg_match_all('/./us', $str, $smatches);
    preg_match_all('/./us', $replacement, $rmatches);

    if ($length === null) {
      $length = (int)self::strlen($str);
    }

    array_splice($smatches[0], $start, $length, $rmatches[0]);

    return implode('', $smatches[0]);
  }

  /**
   * Removes a suffix ($needle) from the end of the string ($haystack).
   *
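   * EXAMPLE (usage sketch; the comparison is case-sensitive):
   * <code>UTF8::substr_right('BeginMiddleΚόσμε', 'Κόσμε'); // 'BeginMiddle'</code>
   *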
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_right($haystack, $needle)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_ends_with($haystack, $needle) === true) {
      $haystack = self::substr($haystack, 0, self::strlen($haystack) - self::strlen($needle));
    }

    return $haystack;
  }

  /**
   * Returns a case swapped version of the string.
   *
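   * EXAMPLE (usage sketch; whitespace is left untouched):
   * <code>UTF8::swapCase('déJà vu'); // 'DÉjÀ VU'</code>
   *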
   * @param string  $str       <p>The input string.</p>
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string <p>Each character's case swapped.</p>
   */
  public static function swapCase($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // remove invalid UTF-8 characters, so that the "/u" regex below can match the string
      $str = self::clean($str);
    }

    $strSwappedCase = preg_replace_callback(
        '/[\S]/u',
        function ($match) use ($encoding) {
          $matchToUpper = UTF8::strtoupper($match[0], $encoding);

          if ($match[0] === $matchToUpper) {
            return UTF8::strtolower($match[0], $encoding);
          } else {
            return $matchToUpper;
          }
        },
        $str
    );

    return $strSwappedCase;
  }

  /**
   * alias for "UTF8::to_ascii()"
   *
   * @see UTF8::to_ascii()
   *
   * @param string $s
   * @param string $subst_chr
   * @param bool   $strict
   *
   * @return string
   *
   * @deprecated
   */
  public static function toAscii($s, $subst_chr = '?', $strict = false)
  {
    return self::to_ascii($s, $subst_chr, $strict);
  }

  /**
   * alias for "UTF8::to_iso8859()"
   *
   * @see UTF8::to_iso8859()
   *
   * @param string|string[] $str
   *
   * @return string|string[]
   *
   * @deprecated
   */
  public static function toIso8859($str)
  {
    return self::to_iso8859($str);
  }

  /**
   * alias for "UTF8::to_latin1()"
   *
   * @see UTF8::to_latin1()
   *
   * @param string|string[] $str
   *
   * @return string|string[]
   *
   * @deprecated
   */
  public static function toLatin1($str)
  {
    return self::to_latin1($str);
  }

  /**
   * alias for "UTF8::to_utf8()"
   *
   * @see UTF8::to_utf8()
   *
   * @param string|string[] $str
   *
   * @return string|string[]
   *
   * @deprecated
   */
  public static function toUTF8($str)
  {
    return self::to_utf8($str);
  }

  /**
   * Convert a string into ASCII.
   *
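   * EXAMPLE (usage sketch; the result depends on the bundled transliteration tables):
   * <code>UTF8::to_ascii('Déjà vu'); // 'Deja vu'</code>
   *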
   * @param string $str     <p>The input string.</p>
   * @param string $unknown [optional] <p>Character to use if a character is unknown (default is "?").</p>
   * @param bool   $strict  [optional] <p>Use "transliterator_transliterate()" from PHP-Intl | WARNING: bad
   *                        performance</p>
   *
   * @return string
   */
  public static function to_ascii($str, $unknown = '?', $strict = false)
  {
    static $UTF8_TO_ASCII;

    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $str = self::clean($str, true, true, true);

    // check if we only have ASCII
    if (self::is_ascii($str) === true) {
      return $str;
    }

    if ($strict === true) {
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (
          self::$SUPPORT['intl'] === true
          &&
          Bootup::is_php('5.4') === true
      ) {

        // HACK for issue from "transliterator_transliterate()"
        $str = str_replace(
            'ℌ',
            'H',
            $str
        );

        $str = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; NFC; Any-Latin; Latin-ASCII;', $str);

        // check again, if we only have ASCII, now ...
        if (self::is_ascii($str) === true) {
          return $str;
        }

      }
    }

    preg_match_all('/.{1}|[^\x00]{1,1}$/us', $str, $ar);
    $chars = $ar[0];
    foreach ($chars as &$c) {

      // reset the code point from the previous iteration
      unset($ord);

      $ordC0 = ord($c[0]);

      // ASCII - next please
      if ($ordC0 >= 0 && $ordC0 <= 127) {
        continue;
      }

      $ordC1 = ord($c[1]);

      // decode the code point of a 2- to 6-byte sequence
      if ($ordC0 >= 192 && $ordC0 <= 223) {
        $ord = ($ordC0 - 192) * 64 + ($ordC1 - 128);
      }

      if ($ordC0 >= 224) {
        $ordC2 = ord($c[2]);

        if ($ordC0 <= 239) {
          $ord = ($ordC0 - 224) * 4096 + ($ordC1 - 128) * 64 + ($ordC2 - 128);
        }

        if ($ordC0 >= 240) {
          $ordC3 = ord($c[3]);

          if ($ordC0 <= 247) {
            $ord = ($ordC0 - 240) * 262144 + ($ordC1 - 128) * 4096 + ($ordC2 - 128) * 64 + ($ordC3 - 128);
          }

          if ($ordC0 >= 248) {
            $ordC4 = ord($c[4]);

            if ($ordC0 <= 251) {
              $ord = ($ordC0 - 248) * 16777216 + ($ordC1 - 128) * 262144 + ($ordC2 - 128) * 4096 + ($ordC3 - 128) * 64 + ($ordC4 - 128);
            }

            if ($ordC0 >= 252) {
              $ordC5 = ord($c[5]);

              if ($ordC0 <= 253) {
                $ord = ($ordC0 - 252) * 1073741824 + ($ordC1 - 128) * 16777216 + ($ordC2 - 128) * 262144 + ($ordC3 - 128) * 4096 + ($ordC4 - 128) * 64 + ($ordC5 - 128);
              }
            }
          }
        }
      }

      if ($ordC0 == 254 || $ordC0 == 255) {
        $c = $unknown;
        continue;
      }

      if (!isset($ord)) {
        $c = $unknown;
        continue;
      }

      $bank = $ord >> 8;
      if (!isset($UTF8_TO_ASCII[$bank])) {
        $UTF8_TO_ASCII[$bank] = self::getData(sprintf('x%02x', $bank));
        if ($UTF8_TO_ASCII[$bank] === false) {
          $UTF8_TO_ASCII[$bank] = array();
        }
      }

      $newchar = $ord & 255;

      if (isset($UTF8_TO_ASCII[$bank], $UTF8_TO_ASCII[$bank][$newchar])) {

        // keep for debugging
        /*
        echo "file: " . sprintf('x%02x', $bank) . "\n";
        echo "char: " . $c . "\n";
        echo "ord: " . $ord . "\n";
        echo "newchar: " . $newchar . "\n";
        echo "ascii: " . $UTF8_TO_ASCII[$bank][$newchar] . "\n";
        echo "bank:" . $bank . "\n\n";
        */

        $c = $UTF8_TO_ASCII[$bank][$newchar];
      } else {

        // keep for debugging missing chars
        /*
        echo "file: " . sprintf('x%02x', $bank) . "\n";
        echo "char: " . $c . "\n";
        echo "ord: " . $ord . "\n";
        echo "newchar: " . $newchar . "\n";
        echo "bank:" . $bank . "\n\n";
        */

        $c = $unknown;
      }
    }

    return implode('', $chars);
  }

  /**
   * Convert a string into "ISO-8859"-encoding (Latin-1).
   *
6600
   * @param string|string[] $str
6601
   *
6602
   * @return string|string[]
6603
   */
6604
  public static function to_iso8859($str)
6605
  {
6606
    if (is_array($str) === true) {
6607
6608
      /** @noinspection ForeachSourceInspection */
6609
      foreach ($str as $k => $v) {
6610
        /** @noinspection AlterInForeachInspection */
6611
        /** @noinspection OffsetOperationsInspection */
6612
        $str[$k] = self::to_iso8859($v);
6613
      }
6614
6615
      return $str;
6616
    }
6617
6618
    $str = (string)$str;
6619
6620
    if (!isset($str[0])) {
6621
      return '';
6622
    }
6623
6624
    return self::utf8_decode($str);
6625
  }
6626
6627
  /**
6628
   * alias for "UTF8::to_iso8859()"
6629
   *
6630
   * @see UTF8::to_iso8859()
6631
   *
6632
   * @param string|string[] $str
6633
   *
6634
   * @return string|string[]
6635
   */
6636
  public static function to_latin1($str)
6637
  {
6638
    return self::to_iso8859($str);
6639
  }
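
  // Usage sketch (illustrative only, not part of the library): to_latin1() and
  // to_iso8859() map every UTF-8 character to a single ISO-8859-1 byte. Assuming
  // self::$UTF8_TO_WIN1252 only remaps the Windows-1252 specials (0x80-0x9F), the
  // UTF-8 sequence C3 BC ("ü") ends up as the single byte FC:
  //
  //   $latin1 = UTF8::to_latin1('Düsseldorf');
  //   // bin2hex($latin1) === '44fc7373656c646f7266'
  //
  // Arrays are converted element by element and keep their keys.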

  /**
   * This function leaves UTF-8 characters alone while converting almost all non-UTF-8 characters to UTF-8.
   *
   * <ul>
   * <li>It decodes unicode escape sequences (e.g. "\u00fc") and, if $decodeHtmlEntityToUtf8 is true, HTML entities
   * such as "&#252;".</li>
   * <li>It assumes that the encoding of the original string is either WINDOWS-1252 or ISO-8859-1.</li>
   * <li>WARNING: It does not remove invalid UTF-8 characters, so you may need to use "UTF8::clean()" for this
   * case.</li>
   * </ul>
   *
   * @param string|string[] $str                    <p>Any string or array.</p>
   * @param bool            $decodeHtmlEntityToUtf8 <p>Set to true if you need to decode HTML entities.</p>
   *
   * @return string|string[] <p>The UTF-8 encoded string.</p>
   */
  public static function to_utf8($str, $decodeHtmlEntityToUtf8 = false)
  {
    if (is_array($str) === true) {
      /** @noinspection ForeachSourceInspection */
      foreach ($str as $k => $v) {
        /** @noinspection AlterInForeachInspection */
        /** @noinspection OffsetOperationsInspection */
        $str[$k] = self::to_utf8($v, $decodeHtmlEntityToUtf8);
      }

      return $str;
    }

    $str = (string)$str;

    if (!isset($str[0])) {
      return $str;
    }

    $max = strlen($str);
    $buf = '';

    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0; $i < $max; $i++) {

      $c1 = $str[$i];

      if ($c1 >= "\xC0") { // should be converted to UTF8, if it's not UTF8 already

        if ($c1 <= "\xDF") { // looks like 2 bytes UTF8

          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];

          if ($c2 >= "\x80" && $c2 <= "\xBF") { // yeah, almost sure it's UTF8 already
            $buf .= $c1 . $c2;
            $i++;
          } else { // not valid UTF8 - convert it
            $cc1tmp = ord($c1) / 64;
            $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
            $cc2 = ($c1 & "\x3F") | "\x80";
            $buf .= $cc1 . $cc2;
          }

        } elseif ($c1 >= "\xE0" && $c1 <= "\xEF") { // looks like 3 bytes UTF8

          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
          $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];

          if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF") { // yeah, almost sure it's UTF8 already
            $buf .= $c1 . $c2 . $c3;
            $i += 2;
          } else { // not valid UTF8 - convert it
            $cc1tmp = ord($c1) / 64;
            $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
            $cc2 = ($c1 & "\x3F") | "\x80";
            $buf .= $cc1 . $cc2;
          }

        } elseif ($c1 >= "\xF0" && $c1 <= "\xF7") { // looks like 4 bytes UTF8

          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
          $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];
          $c4 = $i + 3 >= $max ? "\x00" : $str[$i + 3];

          if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF" && $c4 >= "\x80" && $c4 <= "\xBF") { // yeah, almost sure it's UTF8 already
            $buf .= $c1 . $c2 . $c3 . $c4;
            $i += 3;
          } else { // not valid UTF8 - convert it
            $cc1tmp = ord($c1) / 64;
            $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
            $cc2 = ($c1 & "\x3F") | "\x80";
            $buf .= $cc1 . $cc2;
          }

        } else { // doesn't look like UTF8, but should be converted
          $cc1tmp = ord($c1) / 64;
          $cc1 = self::chr_and_parse_int($cc1tmp) | "\xC0";
          $cc2 = ($c1 & "\x3F") | "\x80";
          $buf .= $cc1 . $cc2;
        }

      } elseif (($c1 & "\xC0") === "\x80") { // needs conversion

        $ordC1 = ord($c1);
        if (isset(self::$WIN1252_TO_UTF8[$ordC1])) { // found in Windows-1252 special cases
          $buf .= self::$WIN1252_TO_UTF8[$ordC1];
        } else {
          $cc1 = self::chr_and_parse_int($ordC1 / 64) | "\xC0";
          $cc2 = ($c1 & "\x3F") | "\x80";
          $buf .= $cc1 . $cc2;
        }

      } else { // it doesn't need conversion
        $buf .= $c1;
      }
    }

    // decode unicode escape sequences
    $buf = preg_replace_callback(
        '/\\\\u([0-9a-f]{4})/i',
        function ($match) {
          return \mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
        },
        $buf
    );

    // decode HTML entities (e.g. numeric codepoints like "&#252;")
    if ($decodeHtmlEntityToUtf8 === true) {
      $buf = self::html_entity_decode($buf);
    }

    return $buf;
  }
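
  // Sketch of the byte arithmetic behind the "not valid UTF8 - convert it"
  // branches of to_utf8() (illustrative only, not library code): a single
  // ISO-8859-1 byte $b >= 0x80 is re-encoded as the two-byte UTF-8 sequence
  // 110000xx 10xxxxxx, i.e. (int)($b / 64) | 0xC0 followed by ($b & 0x3F) | 0x80.
  //
  //   $b    = 0xFC;                                      // "ü" in ISO-8859-1
  //   $utf8 = chr(($b >> 6) | 0xC0) . chr(($b & 0x3F) | 0x80);
  //   // bin2hex($utf8) === 'c3bc', the UTF-8 encoding of U+00FC
  //
  // Stray bytes 0x80-0xBF are looked up in self::$WIN1252_TO_UTF8 first, hence
  // the Windows-1252 assumption. Note the ambiguity: the Latin-1 byte pair C3 BC
  // ("Ã¼") already forms valid UTF-8, so it is left untouched and reads as "ü".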

  /**
   * Strip whitespace or other characters from the beginning or end of a UTF-8 string.
   *
   * INFO: This is slower than "trim()".
   *
   * We could only use the native function if both the string and the chars were 7-bit ASCII,
   * but checking for ASCII costs more time than we would save.
   *
   * @param string $str   <p>The string to be trimmed.</p>
   * @param string $chars [optional] <p>Optional characters to be stripped. By default, all leading and trailing
   *                      Unicode separator (\pZ) and other (\pC) characters are removed.</p>
   *
   * @return string <p>The trimmed string.</p>
   */
  public static function trim($str = '', $chars = INF)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
    if ($chars === INF || !$chars) {
      return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
    }

    return self::rtrim(self::ltrim($str, $chars), $chars);
  }
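
  // Usage sketch (illustrative only): without $chars, trim() strips every leading
  // and trailing Unicode separator (\pZ) or "other" (\pC) character, which the
  // native trim() leaves alone, e.g. a non-breaking space (U+00A0, bytes C2 A0):
  //
  //   UTF8::trim("\xC2\xA0 Düsseldorf \xC2\xA0");   // 'Düsseldorf'
  //   trim("\xC2\xA0 Düsseldorf \xC2\xA0");         // still starts with "\xC2\xA0"
  //
  // With $chars, the work is delegated to UTF8::ltrim() and UTF8::rtrim().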

  /**
   * Makes a string's first character uppercase.
   *
   * @param string  $str       <p>The input string.</p>
   * @param string  $encoding  [optional] <p>Set the charset used by e.g. the "\mb_" functions.</p>
   * @param boolean $cleanUtf8 [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string <p>The resulting string</p>
   */
  public static function ucfirst($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // UTF8::substr() can return false, so cast both parts to string before using them.
    return self::strtoupper((string)self::substr($str, 0, 1, $encoding, $cleanUtf8), $encoding, $cleanUtf8) . (string)self::substr($str, 1, null, $encoding, $cleanUtf8);
  }
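
  // Usage sketch (illustrative only): the first *character* is uppercased with
  // UTF8::strtoupper(), so multibyte letters are handled, whereas the native
  // ucfirst() only looks at the first byte:
  //
  //   UTF8::ucfirst('ñandú');   // 'Ñandú'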

  /**
   * alias for "UTF8::ucfirst()"
   *
   * @see UTF8::ucfirst()
   *
   * @param string  $word
   * @param string  $encoding
   * @param boolean $cleanUtf8
   *
   * @return string
   */
  public static function ucword($word, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    return self::ucfirst($word, $encoding, $cleanUtf8);
  }

  /**
   * Uppercase the first character of every word in the string.
   *
   * @param string   $str        <p>The input string.</p>
   * @param string[] $exceptions [optional] <p>Words that should not be uppercased.</p>
   * @param string   $charlist   [optional] <p>Additional chars that belong to words and do not start a new word.</p>
   * @param string   $encoding   [optional] <p>Set the charset used by e.g. the "\mb_" functions.</p>
   * @param boolean  $cleanUtf8  [optional] <p>Clean non UTF-8 chars from the string.</p>
   *
   * @return string
   */
  public static function ucwords($str, $exceptions = array(), $charlist = '', $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $words = self::str_to_words($str, $charlist);
    $newWords = array();

    if (count($exceptions) > 0) {
      $useExceptions = true;
    } else {
      $useExceptions = false;
    }

    foreach ($words as $word) {

      // skip only empty strings; a plain "!$word" check would also drop a "0" word
      if ($word === '') {
        continue;
      }

      if (
          ($useExceptions === false)
          ||
          (
              $useExceptions === true
              &&
              !in_array($word, $exceptions, true)
          )
      ) {
        $word = self::ucfirst($word, $encoding, $cleanUtf8);
      }

      $newWords[] = $word;
    }

    return implode('', $newWords);
  }
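
  // Usage sketch (illustrative only): judging by the implode('') above,
  // UTF8::str_to_words() is expected to return the separators between words as
  // well, so the string can be glued back together unchanged. Exceptions are
  // matched case-sensitively (strict in_array()):
  //
  //   UTF8::ucwords('übel über alles', array('über'));   // 'Übel über Alles'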

  /**
   * Decode URL-encoded strings and HTML entities (as often as possible, if $multi_decode is true) and fix
   * urlencoded Windows-1252 chars.
   *
   * e.g:
   * 'test+test'                     => 'test test'
   * 'D&#252;sseldorf'               => 'Düsseldorf'
   * 'D%FCsseldorf'                  => 'Düsseldorf'
   * 'D&#xFC;sseldorf'               => 'Düsseldorf'
   * 'D%26%23xFC%3Bsseldorf'         => 'Düsseldorf'
   * 'DÃ¼sseldorf'                   => 'Düsseldorf'
   * 'D%C3%BCsseldorf'               => 'Düsseldorf'
   * 'D%C3%83%C2%BCsseldorf'         => 'Düsseldorf'
   * 'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'
   *
   * @param string $str          <p>The input string.</p>
   * @param bool   $multi_decode <p>Decode as often as possible.</p>
   *
   * @return string
   */
  public static function urldecode($str, $multi_decode = true)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $pattern = '/%u([0-9a-f]{3,4})/i';
    if (preg_match($pattern, $str)) {
      $str = preg_replace($pattern, '&#x\\1;', urldecode($str));
    }

    $flags = Bootup::is_php('5.4') === true ? ENT_QUOTES | ENT_HTML5 : ENT_QUOTES;

    do {
      $str_compare = $str;

      $str = self::fix_simple_utf8(
          urldecode(
              self::html_entity_decode(
                  (string)self::to_utf8($str),
                  $flags
              )
          )
      );

    } while ($multi_decode === true && $str_compare !== $str);

    return (string)$str;
  }

  /**
   * Returns an array that maps "urlencoded" Windows-1252 characters to UTF-8.
   *
   * @deprecated use the "UTF8::urldecode()" function to decode a string
   *
   * @return array
   */
  public static function urldecode_fix_win1252_chars()
  {
    return array(
        '%20' => ' ',
        '%21' => '!',
        '%22' => '"',
        '%23' => '#',
        '%24' => '$',
        '%25' => '%',
        '%26' => '&',
        '%27' => "'",
        '%28' => '(',
        '%29' => ')',
        '%2A' => '*',
        '%2B' => '+',
        '%2C' => ',',
        '%2D' => '-',
        '%2E' => '.',
        '%2F' => '/',
        '%30' => '0',
        '%31' => '1',
        '%32' => '2',
        '%33' => '3',
        '%34' => '4',
        '%35' => '5',
        '%36' => '6',
        '%37' => '7',
        '%38' => '8',
        '%39' => '9',
        '%3A' => ':',
        '%3B' => ';',
        '%3C' => '<',
        '%3D' => '=',
        '%3E' => '>',
        '%3F' => '?',
        '%40' => '@',
        '%41' => 'A',
        '%42' => 'B',
        '%43' => 'C',
        '%44' => 'D',
        '%45' => 'E',
        '%46' => 'F',
        '%47' => 'G',
        '%48' => 'H',
        '%49' => 'I',
        '%4A' => 'J',
        '%4B' => 'K',
        '%4C' => 'L',
        '%4D' => 'M',
        '%4E' => 'N',
        '%4F' => 'O',
        '%50' => 'P',
        '%51' => 'Q',
        '%52' => 'R',
        '%53' => 'S',
        '%54' => 'T',
        '%55' => 'U',
        '%56' => 'V',
        '%57' => 'W',
        '%58' => 'X',
        '%59' => 'Y',
        '%5A' => 'Z',
        '%5B' => '[',
        '%5C' => '\\',
        '%5D' => ']',
        '%5E' => '^',
        '%5F' => '_',
        '%60' => '`',
        '%61' => 'a',
        '%62' => 'b',
        '%63' => 'c',
        '%64' => 'd',
        '%65' => 'e',
        '%66' => 'f',
        '%67' => 'g',
        '%68' => 'h',
        '%69' => 'i',
        '%6A' => 'j',
        '%6B' => 'k',
        '%6C' => 'l',
        '%6D' => 'm',
        '%6E' => 'n',
        '%6F' => 'o',
        '%70' => 'p',
        '%71' => 'q',
        '%72' => 'r',
        '%73' => 's',
        '%74' => 't',
        '%75' => 'u',
        '%76' => 'v',
        '%77' => 'w',
        '%78' => 'x',
        '%79' => 'y',
        '%7A' => 'z',
        '%7B' => '{',
        '%7C' => '|',
        '%7D' => '}',
        '%7E' => '~',
        '%7F' => '',
        '%80' => '€',
        '%81' => '',
        '%82' => '‚',
        '%83' => 'ƒ',
        '%84' => '„',
        '%85' => '…',
        '%86' => '†',
        '%87' => '‡',
        '%88' => 'ˆ',
        '%89' => '‰',
        '%8A' => 'Š',
        '%8B' => '‹',
        '%8C' => 'Œ',
        '%8D' => '',
        '%8E' => 'Ž',
        '%8F' => '',
        '%90' => '',
        '%91' => '‘',
        '%92' => '’',
        '%93' => '“',
        '%94' => '”',
        '%95' => '•',
        '%96' => '–',
        '%97' => '—',
        '%98' => '˜',
        '%99' => '™',
        '%9A' => 'š',
        '%9B' => '›',
        '%9C' => 'œ',
        '%9D' => '',
        '%9E' => 'ž',
        '%9F' => 'Ÿ',
        '%A0' => '',
        '%A1' => '¡',
        '%A2' => '¢',
        '%A3' => '£',
        '%A4' => '¤',
        '%A5' => '¥',
        '%A6' => '¦',
        '%A7' => '§',
        '%A8' => '¨',
        '%A9' => '©',
        '%AA' => 'ª',
        '%AB' => '«',
        '%AC' => '¬',
        '%AD' => '',
        '%AE' => '®',
        '%AF' => '¯',
        '%B0' => '°',
        '%B1' => '±',
        '%B2' => '²',
        '%B3' => '³',
        '%B4' => '´',
        '%B5' => 'µ',
        '%B6' => '¶',
        '%B7' => '·',
        '%B8' => '¸',
        '%B9' => '¹',
        '%BA' => 'º',
        '%BB' => '»',
        '%BC' => '¼',
        '%BD' => '½',
        '%BE' => '¾',
        '%BF' => '¿',
        '%C0' => 'À',
        '%C1' => 'Á',
        '%C2' => 'Â',
        '%C3' => 'Ã',
        '%C4' => 'Ä',
        '%C5' => 'Å',
        '%C6' => 'Æ',
        '%C7' => 'Ç',
        '%C8' => 'È',
        '%C9' => 'É',
        '%CA' => 'Ê',
        '%CB' => 'Ë',
        '%CC' => 'Ì',
        '%CD' => 'Í',
        '%CE' => 'Î',
        '%CF' => 'Ï',
        '%D0' => 'Ð',
        '%D1' => 'Ñ',
        '%D2' => 'Ò',
        '%D3' => 'Ó',
        '%D4' => 'Ô',
        '%D5' => 'Õ',
        '%D6' => 'Ö',
        '%D7' => '×',
        '%D8' => 'Ø',
        '%D9' => 'Ù',
        '%DA' => 'Ú',
        '%DB' => 'Û',
        '%DC' => 'Ü',
        '%DD' => 'Ý',
        '%DE' => 'Þ',
        '%DF' => 'ß',
        '%E0' => 'à',
        '%E1' => 'á',
        '%E2' => 'â',
        '%E3' => 'ã',
        '%E4' => 'ä',
        '%E5' => 'å',
        '%E6' => 'æ',
        '%E7' => 'ç',
        '%E8' => 'è',
        '%E9' => 'é',
        '%EA' => 'ê',
        '%EB' => 'ë',
        '%EC' => 'ì',
        '%ED' => 'í',
        '%EE' => 'î',
        '%EF' => 'ï',
        '%F0' => 'ð',
        '%F1' => 'ñ',
        '%F2' => 'ò',
        '%F3' => 'ó',
        '%F4' => 'ô',
        '%F5' => 'õ',
        '%F6' => 'ö',
        '%F7' => '÷',
        '%F8' => 'ø',
        '%F9' => 'ù',
        '%FA' => 'ú',
        '%FB' => 'û',
        '%FC' => 'ü',
        '%FD' => 'ý',
        '%FE' => 'þ',
        '%FF' => 'ÿ',
    );
  }

  /**
   * Decodes a UTF-8 string to ISO-8859-1.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function utf8_decode($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $str = (string)self::to_utf8($str);

    static $UTF8_TO_WIN1252_KEYS_CACHE = null;
    static $UTF8_TO_WIN1252_VALUES_CACHE = null;

    if ($UTF8_TO_WIN1252_KEYS_CACHE === null) {
      $UTF8_TO_WIN1252_KEYS_CACHE = array_keys(self::$UTF8_TO_WIN1252);
      $UTF8_TO_WIN1252_VALUES_CACHE = array_values(self::$UTF8_TO_WIN1252);
    }

    /** @noinspection PhpInternalEntityUsedInspection */
    return Xml::utf8_decode(str_replace($UTF8_TO_WIN1252_KEYS_CACHE, $UTF8_TO_WIN1252_VALUES_CACHE, $str));
  }

  /**
   * Encodes an ISO-8859-1 string to UTF-8.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function utf8_encode($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $str = \utf8_encode($str);

    if (false === strpos($str, "\xC2")) {
      return $str;
    } else {

      static $CP1252_TO_UTF8_KEYS_CACHE = null;
      static $CP1252_TO_UTF8_VALUES_CACHE = null;

      if ($CP1252_TO_UTF8_KEYS_CACHE === null) {
        $CP1252_TO_UTF8_KEYS_CACHE = array_keys(self::$CP1252_TO_UTF8);
        $CP1252_TO_UTF8_VALUES_CACHE = array_values(self::$CP1252_TO_UTF8);
      }

      return str_replace($CP1252_TO_UTF8_KEYS_CACHE, $CP1252_TO_UTF8_VALUES_CACHE, $str);
    }
  }

  /**
   * Fix UTF-8 strings that contain broken Windows-1252 characters.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   *
   * @deprecated use "UTF8::fix_simple_utf8()"
   */
  public static function utf8_fix_win1252_chars($str)
  {
    return self::fix_simple_utf8($str);
  }

  /**
   * Returns an array with all UTF-8 whitespace characters.
   *
   * @see   : http://www.bogofilter.org/pipermail/bogofilter/2003-March/001889.html
   *
   * @author: Derek E.
   *
   * @return array <p>
   *               An array with all known whitespace characters as values and the type of whitespace as keys
   *               as defined in the above URL.
   *               </p>
   */
  public static function whitespace_table()
  {
    return self::$WHITESPACE_TABLE;
  }

  /**
   * Limit the number of words in a string.
   *
   * @param string $str      <p>The input string.</p>
   * @param int    $words    <p>The maximum number of words to keep.</p>
   * @param string $strAddOn <p>Suffix appended if the string was shortened.</p>
   *
   * @return string
   */
  public static function words_limit($str, $words = 100, $strAddOn = '...')
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $words = (int)$words;

    if ($words < 1) {
      return '';
    }

    preg_match('/^\s*+(?:\S++\s*+){1,' . $words . '}/u', $str, $matches);

    if (
        !isset($matches[0])
        ||
        self::strlen($str) === self::strlen($matches[0])
    ) {
      return $str;
    }

    return self::rtrim($matches[0]) . $strAddOn;
  }
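
  // Usage sketch (illustrative only): the string is cut after the given number
  // of whitespace-separated words and right-trimmed, and $strAddOn is appended
  // only when something was actually cut off:
  //
  //   UTF8::words_limit('Lorem ipsum dolor sit', 2);          // 'Lorem ipsum...'
  //   UTF8::words_limit('Lorem ipsum dolor sit', 2, ' [+]');  // 'Lorem ipsum [+]'
  //   UTF8::words_limit('Lorem ipsum', 5);                    // 'Lorem ipsum'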

  /**
   * Wraps a string to a given number of characters
   *
   * @link  http://php.net/manual/en/function.wordwrap.php
   *
   * @param string $str   <p>The input string.</p>
   * @param int    $width [optional] <p>The column width.</p>
   * @param string $break [optional] <p>The line is broken using the optional break parameter.</p>
   * @param bool   $cut   [optional] <p>
   *                      If the cut is set to true, the string is
   *                      always wrapped at or before the specified width. So if you have
   *                      a word that is larger than the given width, it is broken apart.
   *                      </p>
   *
   * @return string <p>The given string wrapped at the specified column.</p>
   */
  public static function wordwrap($str, $width = 75, $break = "\n", $cut = false)
  {
    $str = (string)$str;
    $break = (string)$break;

    if (!isset($str[0], $break[0])) {
      return '';
    }

    $w = '';
    $strSplit = explode($break, $str);
    $count = count($strSplit);

    $chars = array();
    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0; $i < $count; ++$i) {

      if ($i) {
        $chars[] = $break;
        $w .= '#';
      }

      $c = $strSplit[$i];
      unset($strSplit[$i]);

      foreach (self::split($c) as $c) {
        $chars[] = $c;
        $w .= ' ' === $c ? ' ' : '?';
      }
    }

    $strReturn = '';
    $j = 0;
    $b = $i = -1;
    $w = wordwrap($w, $width, '#', $cut);

    while (false !== $b = self::strpos($w, '#', $b + 1)) {
      for (++$i; $i < $b; ++$i) {
        $strReturn .= $chars[$j];
        unset($chars[$j++]);
      }

      if ($break === $chars[$j] || ' ' === $chars[$j]) {
        unset($chars[$j++]);
      }

      $strReturn .= $break;
    }

    return $strReturn . implode('', $chars);
  }
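
  // Usage sketch (illustrative only): the method builds an ASCII mask with one
  // byte per character ("?" for a character, " " for a space, "#" for an existing
  // $break), lets the native wordwrap() break the mask, and then copies the real,
  // possibly multibyte, characters back in. Widths are therefore counted in
  // characters, not bytes:
  //
  //   UTF8::wordwrap('Düsseldorf ist schön', 10, "\n", true);
  //   // "Düsseldorf\nist schön"
  //
  // The native wordwrap() with the same arguments counts bytes, so it would cut
  // the 10-character but 11-byte word "Düsseldorf".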

  /**
   * Returns an array of Unicode White Space characters.
   *
   * @return array <p>An array with numeric code point as key and White Space Character as value.</p>
   */
  public static function ws()
  {
    return self::$WHITESPACE;
  }

}