Push: master (1847a1...3fa890), by Lars, created 07:47, status: Completed

UTF8 (rating D)

Complexity

Total Complexity 1028

Size/Duplication

Total Lines 8025
Duplicated Lines 11.69 %

Coupling/Cohesion

Components 2
Dependencies 2

Test Coverage

Coverage 85.69%

Importance

Changes 0

Metric   Value
wmc 1028
lcom 2
cbo 2
dl 938
loc 8025
ccs 2246
cts 2621
cp 0.8569
rs 4.4102
c 0
b 0
f 0
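
Two of the summary figures above can be reproduced from the raw metric rows. The sketch below assumes that dl and loc stand for duplicated lines and total lines, and that ccs and cts stand for covered and total statements. The report does not spell these abbreviations out, so the mapping is an inference from the matching numbers.

```php
<?php
// Figures copied from the metric table above.
$metrics = array('dl' => 938, 'loc' => 8025, 'ccs' => 2246, 'cts' => 2621);

// Assumption: dl / loc is the share of duplicated lines, in percent.
$duplication = round($metrics['dl'] / $metrics['loc'] * 100, 2);

// Assumption: ccs / cts is the coverage ratio reported as cp.
$coverage = round($metrics['ccs'] / $metrics['cts'], 4);

printf("Duplicated lines: %.2f %%\n", $duplication);
printf("Coverage ratio:   %.4f\n", $coverage);
```

Under these assumptions the results agree with the reported 11.69 % duplicated lines and the cp value of 0.8569.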

178 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 4 1
A add_bom_to_string() 0 8 2
C file_get_contents() 0 43 8
A file_has_bom() 0 4 1
C filter() 11 54 13
A filter_input() 10 10 2
A filter_input_array() 10 10 2
A filter_var() 10 10 2
A filter_var_array() 10 10 2
A fits_inside() 0 4 1
A fix_simple_utf8() 19 19 3
B fix_utf8() 0 24 4
D getCharDirection() 0 114 119
A getData() 0 10 2
A hasBom() 0 4 1
A hex_to_chr() 0 4 1
A hex_to_int() 0 14 3
A html_decode() 0 4 1
B html_encode() 0 38 5
B htmlentities() 0 38 6
A htmlspecialchars() 0 8 2
A iconv_loaded() 0 15 3
A int_to_chr() 0 4 1
A int_to_hex() 0 12 3
A intlChar_loaded() 0 8 2
A intl_loaded() 0 4 2
A isAscii() 0 4 1
A isBase64() 0 4 1
A isBinary() 0 4 1
A isBom() 0 4 1
A isHtml() 0 4 1
A isJson() 0 4 1
A isUtf16() 0 4 1
A isUtf32() 0 4 1
A isUtf8() 0 4 1
A is_ascii() 0 10 2
A is_base64() 0 15 4
B is_binary() 0 23 6
A is_bom() 0 10 3
A is_html() 0 19 3
C is_utf16() 48 48 12
C is_utf32() 48 48 12
D is_utf8() 26 141 27
A json_encode() 12 12 2
A lcfirst() 0 15 2
A lcword() 0 4 1
C lcwords() 20 38 8
A ltrim() 15 15 4
A max() 8 8 2
A max_chr_width() 0 9 2
A mbstring_loaded() 0 10 3
A utf8_fix_win1252_chars() 0 4 1
A whitespace_table() 0 4 1
B words_limit() 0 27 5
C wordwrap() 0 51 10
A ws() 0 4 1
A access() 0 16 3
A binary_to_str() 0 13 3
A bom() 0 4 1
A callback() 0 4 1
B checkForSupport() 0 31 4
C chr() 9 70 14
A chr_and_parse_int() 0 4 1
A chr_map() 0 6 1
A chr_size_list() 0 15 2
B chr_to_decimal() 0 32 6
A chr_to_hex() 0 14 3
A chr_to_int() 0 4 1
A chunk_split() 0 4 1
B clean() 0 35 4
A cleanup() 20 20 2
B codepoints() 0 26 3
A count_chars() 0 4 1
A decimal_to_chr() 0 10 2
C encode() 34 78 21
A getSupportInfo() 0 16 4
C html_entity_decode() 9 75 15
A is_binary_file() 0 12 2
B is_json() 0 24 5
A json_decode() 12 12 2
A mbstring_overloaded() 0 12 3
A min() 8 8 2
A normalizeEncoding() 0 4 1
B normalize_encoding() 0 98 6
A normalize_msword() 18 18 3
B normalize_whitespace() 0 35 6
A strip_whitespace() 0 10 2
B number_format() 0 26 3
C ord() 0 53 14
A parse_str() 0 14 4
A pcre_utf8_support() 0 5 1
D range() 14 38 9
B rawurldecode() 31 31 6
A removeBOM() 0 4 1
B remove_bom() 0 20 5
A remove_duplicates() 0 15 4
A remove_invisible_characters() 0 20 3
B replace_diamond_question_mark() 0 36 5
A rtrim() 15 15 4
C rxClass() 0 40 8
A showSupport() 0 12 3
B single_chr_html_encode() 0 22 5
D split() 17 114 25
C str_detect_encoding() 0 90 11
A str_ends_with() 0 15 3
A str_iends_with() 0 15 3
A str_ireplace() 0 18 3
A str_istarts_with() 15 15 3
B str_limit_after_word() 0 31 5
C str_pad() 9 41 7
A str_repeat() 0 6 1
A str_replace() 0 4 1
A str_replace_first() 0 10 2
A str_shuffle() 0 8 1
A str_sort() 0 16 3
B str_split() 0 36 6
A str_starts_with() 15 15 3
A str_to_binary() 0 8 1
C str_to_words() 0 51 11
A str_transliterate() 0 4 1
B str_word_count() 0 30 5
A strcasecmp() 0 4 1
A strchr() 0 4 1
A strcmp() 0 8 2
C strcspn() 7 26 7
A strichr() 0 4 1
A string() 0 13 1
A string_has_bom() 0 10 3
A strip_tags() 14 14 3
D stripos() 9 44 10
C stristr() 7 67 16
D strlen() 18 95 25
A strnatcasecmp() 0 4 1
A strnatcmp() 0 4 2
A strncasecmp() 0 4 1
A strncmp() 0 7 1
A strpbrk() 0 15 3
F strpos() 18 129 32
A strrchr() 16 16 3
A strrev() 0 10 2
A strrichr() 15 15 3
C strripos() 26 66 16
F strrpos() 26 85 21
B strspn() 7 17 6
C strstr() 7 58 13
B strtocasefold() 0 37 6
D strtolower() 45 45 9
A strtonatfold() 0 5 1
D strtoupper() 44 44 9
C strtr() 0 33 7
A strwidth() 0 15 3
B array_change_key_case() 0 27 6
F substr() 16 107 28
B substr_compare() 0 26 6
C substr_count() 7 71 16
B substr_ileft() 24 24 5
B substr_iright() 24 24 5
B substr_left() 24 24 5
C substr_replace() 20 77 17
B substr_right() 23 23 5
B swapCase() 0 34 5
A toAscii() 0 4 1
A toIso8859() 0 4 1
A toLatin1() 0 4 1
A toUTF8() 0 4 1
F to_ascii() 6 150 28
B to_iso8859() 0 22 4
A to_latin1() 0 4 1
F to_utf8() 5 102 33
A to_utf8_convert() 0 15 2
A trim() 0 15 4
A ucfirst() 0 21 3
A ucword() 0 4 1
C ucwords() 20 57 11
B urldecode() 31 31 6
B urldecode_fix_win1252_chars() 0 229 1
C utf8_decode() 5 56 11
B utf8_encode() 0 31 5

How to fix

Duplicated Code

Duplicated code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems, and corresponding solutions are:
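
One frequent case is a pair of methods that differ only in a flag. In the table above, str_starts_with() and str_istarts_with() are each reported with 15 duplicated lines. The sketch below shows how such a pair could delegate to one shared helper. The class name PrefixCheckSketch, the helper startsWith() and the byte-wise comparison are assumptions for illustration and do not reflect the library's actual implementation, which is multi-byte aware.

```php
<?php
// Hypothetical sketch; not the library's actual code.
final class PrefixCheckSketch
{
  // Shared body that both public variants delegate to.
  private static function startsWith($haystack, $needle, $ignoreCase)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if ($haystack === '' || $needle === '') {
      return false;
    }

    $length = strlen($needle);

    if ($ignoreCase) {
      // Byte-wise, ASCII-only case folding (assumption made for brevity).
      return strncasecmp($haystack, $needle, $length) === 0;
    }

    return strncmp($haystack, $needle, $length) === 0;
  }

  public static function str_starts_with($haystack, $needle)
  {
    return self::startsWith($haystack, $needle, false);
  }

  public static function str_istarts_with($haystack, $needle)
  {
    return self::startsWith($haystack, $needle, true);
  }
}
```

The two public methods shrink to one line each, so a later fix to the comparison logic only has to be made in one place.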

Complex Class

Tip: Before tackling complexity, eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like UTF8 often do many different things. To break such a class down, first identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
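
In the method table, several names share the "bom" token (bom(), string_has_bom(), add_bom_to_string(), remove_bom()), which makes them a plausible Extract Class candidate. The following is a minimal sketch of that refactoring. The class names Utf8BomSketch and Utf8FacadeSketch are hypothetical, and the bodies handle only the UTF-8 BOM, which is a simplification of the behavior suggested by the listing.

```php
<?php
// Hypothetical extracted class; names and bodies are illustrative only.
final class Utf8BomSketch
{
  const UTF8_BOM = "\xef\xbb\xbf";

  // True when the string starts with the three UTF-8 BOM bytes.
  public static function has($str)
  {
    return strncmp((string)$str, self::UTF8_BOM, 3) === 0;
  }

  // Prepends the BOM unless one is already present.
  public static function add($str)
  {
    return self::has($str) ? (string)$str : self::UTF8_BOM . $str;
  }

  // Strips a leading BOM if there is one.
  public static function remove($str)
  {
    return self::has($str) ? (string)substr((string)$str, 3) : (string)$str;
  }
}

// The original class can keep thin delegating methods for compatibility.
final class Utf8FacadeSketch
{
  public static function string_has_bom($str)
  {
    return Utf8BomSketch::has($str);
  }

  public static function add_bom_to_string($str)
  {
    return Utf8BomSketch::add($str);
  }

  public static function remove_bom($str)
  {
    return Utf8BomSketch::remove($str);
  }
}
```

Keeping the old method names as delegates lets existing callers continue to work while the BOM logic moves into its own, smaller class.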

While breaking up the class, it is a good idea to analyze how other classes use UTF8 and, based on these observations, apply Extract Interface as well.
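
As an illustration of Extract Interface, suppose a caller only ever needs to know whether a string is valid UTF-8. The sketch below is written under that assumption. The interface EncodingCheckerSketch, the class PcreEncodingCheckerSketch and the function acceptComment() are hypothetical names. The check relies on preg_match() with the /u modifier, which fails on invalid UTF-8 input; it is not the library's own is_utf8() logic.

```php
<?php
// Hypothetical narrow interface: only what this kind of caller needs.
interface EncodingCheckerSketch
{
  public function isUtf8($str);
}

// One possible implementation, based on PCRE's UTF-8 mode.
final class PcreEncodingCheckerSketch implements EncodingCheckerSketch
{
  public function isUtf8($str)
  {
    // preg_match() returns false (not 0) when the subject is invalid UTF-8.
    return preg_match('//u', (string)$str) === 1;
  }
}

// The caller depends on the interface, not on a large static helper class.
function acceptComment(EncodingCheckerSketch $checker, $comment)
{
  return $checker->isUtf8($comment) ? (string)$comment : '';
}
```

A caller written this way can be tested with a stub checker and no longer needs to know about the other methods of the original class.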

<?php

namespace voku\helper;

/**
 * UTF8-Helper-Class
 *
 * @package voku\helper
 */
final class UTF8
{
  // (CRLF|([ZWNJ-ZWJ]|T+|L*(LV?V+|LV|LVT)T*|L+|[^Control])[Extend]*|[Control])
  // This regular expression is a work around for http://bugs.exim.org/1279
  const GRAPHEME_CLUSTER_RX = '(?:\r\n|(?:[ -~\x{200C}\x{200D}]|[ᆨ-ᇹ]+|[ᄀ-ᅟ]*(?:[가개갸걔거게겨계고과괘괴교구궈궤귀규그긔기까깨꺄꺠꺼께껴꼐꼬꽈꽤꾀꾜꾸꿔꿰뀌뀨끄끠끼나내냐냬너네녀녜노놔놰뇌뇨누눠눼뉘뉴느늬니다대댜댸더데뎌뎨도돠돼되됴두둬뒈뒤듀드듸디따때땨떄떠떼뗘뗴또똬뙈뙤뚀뚜뚸뛔뛰뜌뜨띄띠라래랴럐러레려례로롸뢔뢰료루뤄뤠뤼류르릐리마매먀먜머메며몌모뫄뫠뫼묘무뭐뭬뮈뮤므믜미바배뱌뱨버베벼볘보봐봬뵈뵤부붜붸뷔뷰브븨비빠빼뺘뺴뻐뻬뼈뼤뽀뽜뽸뾔뾰뿌뿨쀄쀠쀼쁘쁴삐사새샤섀서세셔셰소솨쇄쇠쇼수숴쉐쉬슈스싀시싸쌔쌰썌써쎄쎠쎼쏘쏴쐐쐬쑈쑤쒀쒜쒸쓔쓰씌씨아애야얘어에여예오와왜외요우워웨위유으의이자재쟈쟤저제져졔조좌좨죄죠주줘줴쥐쥬즈즤지짜째쨔쨰쩌쩨쪄쪠쪼쫘쫴쬐쬬쭈쭤쮀쮜쮸쯔쯰찌차채챠챼처체쳐쳬초촤쵀최쵸추춰췌취츄츠츼치카캐캬컈커케켜켸코콰쾌쾨쿄쿠쿼퀘퀴큐크킈키타태탸턔터테텨톄토톼퇘퇴툐투퉈퉤튀튜트틔티파패퍄퍠퍼페펴폐포퐈퐤푀표푸풔풰퓌퓨프픠피하해햐햬허헤혀혜호화홰회효후훠훼휘휴흐희히]?[ᅠ-ᆢ]+|[가-힣])[ᆨ-ᇹ]*|[ᄀ-ᅟ]+|[^\p{Cc}\p{Cf}\p{Zl}\p{Zp}])[\p{Mn}\p{Me}\x{09BE}\x{09D7}\x{0B3E}\x{0B57}\x{0BBE}\x{0BD7}\x{0CC2}\x{0CD5}\x{0CD6}\x{0D3E}\x{0D57}\x{0DCF}\x{0DDF}\x{200C}\x{200D}\x{1D165}\x{1D16E}-\x{1D172}]*|[\p{Cc}\p{Cf}\p{Zl}\p{Zp}])';

  /**
   * @var array
   */
  private static $WIN1252_TO_UTF8 = array(
      128 => "\xe2\x82\xac", // EURO SIGN
      130 => "\xe2\x80\x9a", // SINGLE LOW-9 QUOTATION MARK
      131 => "\xc6\x92", // LATIN SMALL LETTER F WITH HOOK
      132 => "\xe2\x80\x9e", // DOUBLE LOW-9 QUOTATION MARK
      133 => "\xe2\x80\xa6", // HORIZONTAL ELLIPSIS
      134 => "\xe2\x80\xa0", // DAGGER
      135 => "\xe2\x80\xa1", // DOUBLE DAGGER
      136 => "\xcb\x86", // MODIFIER LETTER CIRCUMFLEX ACCENT
      137 => "\xe2\x80\xb0", // PER MILLE SIGN
      138 => "\xc5\xa0", // LATIN CAPITAL LETTER S WITH CARON
      139 => "\xe2\x80\xb9", // SINGLE LEFT-POINTING ANGLE QUOTE
      140 => "\xc5\x92", // LATIN CAPITAL LIGATURE OE
      142 => "\xc5\xbd", // LATIN CAPITAL LETTER Z WITH CARON
      145 => "\xe2\x80\x98", // LEFT SINGLE QUOTATION MARK
      146 => "\xe2\x80\x99", // RIGHT SINGLE QUOTATION MARK
      147 => "\xe2\x80\x9c", // LEFT DOUBLE QUOTATION MARK
      148 => "\xe2\x80\x9d", // RIGHT DOUBLE QUOTATION MARK
      149 => "\xe2\x80\xa2", // BULLET
      150 => "\xe2\x80\x93", // EN DASH
      151 => "\xe2\x80\x94", // EM DASH
      152 => "\xcb\x9c", // SMALL TILDE
      153 => "\xe2\x84\xa2", // TRADE MARK SIGN
      154 => "\xc5\xa1", // LATIN SMALL LETTER S WITH CARON
      155 => "\xe2\x80\xba", // SINGLE RIGHT-POINTING ANGLE QUOTE
      156 => "\xc5\x93", // LATIN SMALL LIGATURE OE
      158 => "\xc5\xbe", // LATIN SMALL LETTER Z WITH CARON
      159 => "\xc5\xb8", // LATIN CAPITAL LETTER Y WITH DIAERESIS
      164 => "\xc3\xb1", // ñ
      165 => "\xc3\x91", // Ñ
  );

  /**
   * @var array
   */
  private static $CP1252_TO_UTF8 = array(
      '€' => '€',
      '‚' => '‚',
      'ƒ' => 'ƒ',
      '„' => '„',
      '…' => '…',
      '†' => '†',
      '‡' => '‡',
      'ˆ' => 'ˆ',
      '‰' => '‰',
      'Š' => 'Š',
      '‹' => '‹',
      'Œ' => 'Œ',
      'Ž' => 'Ž',
      '‘' => '‘',
      '’' => '’',
      '“' => '“',
      '”' => '”',
      '•' => '•',
      '–' => '–',
      '—' => '—',
      '˜' => '˜',
      '™' => '™',
      'š' => 'š',
      '›' => '›',
      'œ' => 'œ',
      'ž' => 'ž',
      'Ÿ' => 'Ÿ',
  );

  /**
   * Bom => Byte-Length
   *
   * INFO: https://en.wikipedia.org/wiki/Byte_order_mark
   *
   * @var array
   */
  private static $BOM = array(
      "\xef\xbb\xbf"     => 3, // UTF-8 BOM
      ''              => 6, // UTF-8 BOM as "WINDOWS-1252" (one char has [maybe] more then one byte ...)
      "\x00\x00\xfe\xff" => 4, // UTF-32 (BE) BOM
      '  þÿ'             => 6, // UTF-32 (BE) BOM as "WINDOWS-1252"
      "\xff\xfe\x00\x00" => 4, // UTF-32 (LE) BOM
      'ÿþ  '             => 6, // UTF-32 (LE) BOM as "WINDOWS-1252"
      "\xfe\xff"         => 2, // UTF-16 (BE) BOM
      'þÿ'               => 4, // UTF-16 (BE) BOM as "WINDOWS-1252"
      "\xff\xfe"         => 2, // UTF-16 (LE) BOM
      'ÿþ'               => 4, // UTF-16 (LE) BOM as "WINDOWS-1252"
  );

  /**
   * Numeric code point => UTF-8 Character
   *
   * url: http://www.w3schools.com/charsets/ref_utf_punctuation.asp
   *
   * @var array
   */
  private static $WHITESPACE = array(
    // NUL Byte
    0     => "\x0",
    // Tab
    9     => "\x9",
    // New Line
    10    => "\xa",
    // Vertical Tab
    11    => "\xb",
    // Carriage Return
    13    => "\xd",
    // Ordinary Space
    32    => "\x20",
    // NO-BREAK SPACE
    160   => "\xc2\xa0",
    // OGHAM SPACE MARK
    5760  => "\xe1\x9a\x80",
    // MONGOLIAN VOWEL SEPARATOR
    6158  => "\xe1\xa0\x8e",
    // EN QUAD
    8192  => "\xe2\x80\x80",
    // EM QUAD
    8193  => "\xe2\x80\x81",
    // EN SPACE
    8194  => "\xe2\x80\x82",
    // EM SPACE
    8195  => "\xe2\x80\x83",
    // THREE-PER-EM SPACE
    8196  => "\xe2\x80\x84",
    // FOUR-PER-EM SPACE
    8197  => "\xe2\x80\x85",
    // SIX-PER-EM SPACE
    8198  => "\xe2\x80\x86",
    // FIGURE SPACE
    8199  => "\xe2\x80\x87",
    // PUNCTUATION SPACE
    8200  => "\xe2\x80\x88",
    // THIN SPACE
    8201  => "\xe2\x80\x89",
    // HAIR SPACE
    8202  => "\xe2\x80\x8a",
    // LINE SEPARATOR
    8232  => "\xe2\x80\xa8",
    // PARAGRAPH SEPARATOR
    8233  => "\xe2\x80\xa9",
    // NARROW NO-BREAK SPACE
    8239  => "\xe2\x80\xaf",
    // MEDIUM MATHEMATICAL SPACE
    8287  => "\xe2\x81\x9f",
    // IDEOGRAPHIC SPACE
    12288 => "\xe3\x80\x80",
  );

  /**
   * @var array
   */
  private static $WHITESPACE_TABLE = array(
      'SPACE'                     => "\x20",
      'NO-BREAK SPACE'            => "\xc2\xa0",
      'OGHAM SPACE MARK'          => "\xe1\x9a\x80",
      'EN QUAD'                   => "\xe2\x80\x80",
      'EM QUAD'                   => "\xe2\x80\x81",
      'EN SPACE'                  => "\xe2\x80\x82",
      'EM SPACE'                  => "\xe2\x80\x83",
      'THREE-PER-EM SPACE'        => "\xe2\x80\x84",
      'FOUR-PER-EM SPACE'         => "\xe2\x80\x85",
      'SIX-PER-EM SPACE'          => "\xe2\x80\x86",
      'FIGURE SPACE'              => "\xe2\x80\x87",
      'PUNCTUATION SPACE'         => "\xe2\x80\x88",
      'THIN SPACE'                => "\xe2\x80\x89",
      'HAIR SPACE'                => "\xe2\x80\x8a",
      'LINE SEPARATOR'            => "\xe2\x80\xa8",
      'PARAGRAPH SEPARATOR'       => "\xe2\x80\xa9",
      'ZERO WIDTH SPACE'          => "\xe2\x80\x8b",
      'NARROW NO-BREAK SPACE'     => "\xe2\x80\xaf",
      'MEDIUM MATHEMATICAL SPACE' => "\xe2\x81\x9f",
      'IDEOGRAPHIC SPACE'         => "\xe3\x80\x80",
  );

  /**
   * bidirectional text chars
   *
   * url: https://www.w3.org/International/questions/qa-bidi-unicode-controls
   *
   * @var array
   */
  private static $BIDI_UNI_CODE_CONTROLS_TABLE = array(
    // LEFT-TO-RIGHT EMBEDDING (use -> dir = "ltr")
    8234 => "\xE2\x80\xAA",
    // RIGHT-TO-LEFT EMBEDDING (use -> dir = "rtl")
    8235 => "\xE2\x80\xAB",
    // POP DIRECTIONAL FORMATTING // (use -> </bdo>)
    8236 => "\xE2\x80\xAC",
    // LEFT-TO-RIGHT OVERRIDE // (use -> <bdo dir = "ltr">)
    8237 => "\xE2\x80\xAD",
    // RIGHT-TO-LEFT OVERRIDE // (use -> <bdo dir = "rtl">)
    8238 => "\xE2\x80\xAE",
    // LEFT-TO-RIGHT ISOLATE // (use -> dir = "ltr")
    8294 => "\xE2\x81\xA6",
    // RIGHT-TO-LEFT ISOLATE // (use -> dir = "rtl")
    8295 => "\xE2\x81\xA7",
    // FIRST STRONG ISOLATE // (use -> dir = "auto")
    8296 => "\xE2\x81\xA8",
    // POP DIRECTIONAL ISOLATE
    8297 => "\xE2\x81\xA9",
  );

  /**
   * @var array
   */
  private static $COMMON_CASE_FOLD = array(
      'ſ'            => 's',
      "\xCD\x85"     => 'ι',
      'ς'            => 'σ',
      "\xCF\x90"     => 'β',
      "\xCF\x91"     => 'θ',
      "\xCF\x95"     => 'φ',
      "\xCF\x96"     => 'π',
      "\xCF\xB0"     => 'κ',
      "\xCF\xB1"     => 'ρ',
      "\xCF\xB5"     => 'ε',
      "\xE1\xBA\x9B" => "\xE1\xB9\xA1",
      "\xE1\xBE\xBE" => 'ι',
  );

  /**
   * @var array
   */
  private static $BROKEN_UTF8_FIX = array(
      "\xc2\x80" => "\xe2\x82\xac", // EURO SIGN
      "\xc2\x82" => "\xe2\x80\x9a", // SINGLE LOW-9 QUOTATION MARK
      "\xc2\x83" => "\xc6\x92", // LATIN SMALL LETTER F WITH HOOK
      "\xc2\x84" => "\xe2\x80\x9e", // DOUBLE LOW-9 QUOTATION MARK
      "\xc2\x85" => "\xe2\x80\xa6", // HORIZONTAL ELLIPSIS
      "\xc2\x86" => "\xe2\x80\xa0", // DAGGER
      "\xc2\x87" => "\xe2\x80\xa1", // DOUBLE DAGGER
      "\xc2\x88" => "\xcb\x86", // MODIFIER LETTER CIRCUMFLEX ACCENT
      "\xc2\x89" => "\xe2\x80\xb0", // PER MILLE SIGN
      "\xc2\x8a" => "\xc5\xa0", // LATIN CAPITAL LETTER S WITH CARON
      "\xc2\x8b" => "\xe2\x80\xb9", // SINGLE LEFT-POINTING ANGLE QUOTE
      "\xc2\x8c" => "\xc5\x92", // LATIN CAPITAL LIGATURE OE
      "\xc2\x8e" => "\xc5\xbd", // LATIN CAPITAL LETTER Z WITH CARON
      "\xc2\x91" => "\xe2\x80\x98", // LEFT SINGLE QUOTATION MARK
      "\xc2\x92" => "\xe2\x80\x99", // RIGHT SINGLE QUOTATION MARK
      "\xc2\x93" => "\xe2\x80\x9c", // LEFT DOUBLE QUOTATION MARK
      "\xc2\x94" => "\xe2\x80\x9d", // RIGHT DOUBLE QUOTATION MARK
      "\xc2\x95" => "\xe2\x80\xa2", // BULLET
      "\xc2\x96" => "\xe2\x80\x93", // EN DASH
      "\xc2\x97" => "\xe2\x80\x94", // EM DASH
      "\xc2\x98" => "\xcb\x9c", // SMALL TILDE
      "\xc2\x99" => "\xe2\x84\xa2", // TRADE MARK SIGN
      "\xc2\x9a" => "\xc5\xa1", // LATIN SMALL LETTER S WITH CARON
      "\xc2\x9b" => "\xe2\x80\xba", // SINGLE RIGHT-POINTING ANGLE QUOTE
      "\xc2\x9c" => "\xc5\x93", // LATIN SMALL LIGATURE OE
      "\xc2\x9e" => "\xc5\xbe", // LATIN SMALL LETTER Z WITH CARON
      "\xc2\x9f" => "\xc5\xb8", // LATIN CAPITAL LETTER Y WITH DIAERESIS
      'ü'       => 'ü',
      'ä'       => 'ä',
      'ö'       => 'ö',
      'Ö'       => 'Ö',
      'ß'       => 'ß',
      'Ã '       => 'à',
      'á'       => 'á',
      'â'       => 'â',
      'ã'       => 'ã',
      'ù'       => 'ù',
      'ú'       => 'ú',
      'û'       => 'û',
      'Ù'       => 'Ù',
      'Ú'       => 'Ú',
      'Û'       => 'Û',
      'Ü'       => 'Ü',
      'ò'       => 'ò',
      'ó'       => 'ó',
      'ô'       => 'ô',
      'è'       => 'è',
      'é'       => 'é',
      'ê'       => 'ê',
      'ë'       => 'ë',
      'À'       => 'À',
      'Á'       => 'Á',
      'Â'       => 'Â',
      'Ã'       => 'Ã',
      'Ä'       => 'Ä',
      'Ã…'       => 'Å',
      'Ç'       => 'Ç',
      'È'       => 'È',
      'É'       => 'É',
      'Ê'       => 'Ê',
      'Ë'       => 'Ë',
      'ÃŒ'       => 'Ì',
      'Í'       => 'Í',
      'ÃŽ'       => 'Î',
      'Ï'       => 'Ï',
      'Ñ'       => 'Ñ',
      'Ã’'       => 'Ò',
      'Ó'       => 'Ó',
      'Ô'       => 'Ô',
      'Õ'       => 'Õ',
      'Ø'       => 'Ø',
      'Ã¥'       => 'å',
      'æ'       => 'æ',
      'ç'       => 'ç',
      'ì'       => 'ì',
      'í'       => 'í',
      'î'       => 'î',
      'ï'       => 'ï',
      'ð'       => 'ð',
      'ñ'       => 'ñ',
      'õ'       => 'õ',
      'ø'       => 'ø',
      'ý'       => 'ý',
      'ÿ'       => 'ÿ',
      '€'      => '€',
      '’'      => '’',
  );

  /**
   * @var array
   */
  private static $UTF8_TO_WIN1252 = array(
      "\xe2\x82\xac" => "\x80", // EURO SIGN
      "\xe2\x80\x9a" => "\x82", // SINGLE LOW-9 QUOTATION MARK
      "\xc6\x92"     => "\x83", // LATIN SMALL LETTER F WITH HOOK
      "\xe2\x80\x9e" => "\x84", // DOUBLE LOW-9 QUOTATION MARK
      "\xe2\x80\xa6" => "\x85", // HORIZONTAL ELLIPSIS
      "\xe2\x80\xa0" => "\x86", // DAGGER
      "\xe2\x80\xa1" => "\x87", // DOUBLE DAGGER
      "\xcb\x86"     => "\x88", // MODIFIER LETTER CIRCUMFLEX ACCENT
      "\xe2\x80\xb0" => "\x89", // PER MILLE SIGN
      "\xc5\xa0"     => "\x8a", // LATIN CAPITAL LETTER S WITH CARON
      "\xe2\x80\xb9" => "\x8b", // SINGLE LEFT-POINTING ANGLE QUOTE
      "\xc5\x92"     => "\x8c", // LATIN CAPITAL LIGATURE OE
      "\xc5\xbd"     => "\x8e", // LATIN CAPITAL LETTER Z WITH CARON
      "\xe2\x80\x98" => "\x91", // LEFT SINGLE QUOTATION MARK
      "\xe2\x80\x99" => "\x92", // RIGHT SINGLE QUOTATION MARK
      "\xe2\x80\x9c" => "\x93", // LEFT DOUBLE QUOTATION MARK
      "\xe2\x80\x9d" => "\x94", // RIGHT DOUBLE QUOTATION MARK
      "\xe2\x80\xa2" => "\x95", // BULLET
      "\xe2\x80\x93" => "\x96", // EN DASH
      "\xe2\x80\x94" => "\x97", // EM DASH
      "\xcb\x9c"     => "\x98", // SMALL TILDE
      "\xe2\x84\xa2" => "\x99", // TRADE MARK SIGN
      "\xc5\xa1"     => "\x9a", // LATIN SMALL LETTER S WITH CARON
      "\xe2\x80\xba" => "\x9b", // SINGLE RIGHT-POINTING ANGLE QUOTE
      "\xc5\x93"     => "\x9c", // LATIN SMALL LIGATURE OE
      "\xc5\xbe"     => "\x9e", // LATIN SMALL LETTER Z WITH CARON
      "\xc5\xb8"     => "\x9f", // LATIN CAPITAL LETTER Y WITH DIAERESIS
  );

  /**
   * @var array
   */
  private static $UTF8_MSWORD = array(
      "\xc2\xab"     => '"', // « (U+00AB) in UTF-8
      "\xc2\xbb"     => '"', // » (U+00BB) in UTF-8
      "\xe2\x80\x98" => "'", // ‘ (U+2018) in UTF-8
      "\xe2\x80\x99" => "'", // ’ (U+2019) in UTF-8
      "\xe2\x80\x9a" => "'", // ‚ (U+201A) in UTF-8
      "\xe2\x80\x9b" => "'", // ‛ (U+201B) in UTF-8
      "\xe2\x80\x9c" => '"', // “ (U+201C) in UTF-8
      "\xe2\x80\x9d" => '"', // ” (U+201D) in UTF-8
      "\xe2\x80\x9e" => '"', // „ (U+201E) in UTF-8
      "\xe2\x80\x9f" => '"', // ‟ (U+201F) in UTF-8
      "\xe2\x80\xb9" => "'", // ‹ (U+2039) in UTF-8
      "\xe2\x80\xba" => "'", // › (U+203A) in UTF-8
      "\xe2\x80\x93" => '-', // – (U+2013) in UTF-8
      "\xe2\x80\x94" => '-', // — (U+2014) in UTF-8
      "\xe2\x80\xa6" => '...' // … (U+2026) in UTF-8
  );

  /**
   * @var array
   */
  private static $ICONV_ENCODING = array(
      'ANSI_X3.4-1968',
      'ANSI_X3.4-1986',
      'ASCII',
      'CP367',
      'IBM367',
      'ISO-IR-6',
      'ISO646-US',
      'ISO_646.IRV:1991',
      'US',
      'US-ASCII',
      'CSASCII',
      'UTF-8',
      'ISO-10646-UCS-2',
      'UCS-2',
      'CSUNICODE',
      'UCS-2BE',
      'UNICODE-1-1',
      'UNICODEBIG',
      'CSUNICODE11',
      'UCS-2LE',
      'UNICODELITTLE',
      'ISO-10646-UCS-4',
      'UCS-4',
      'CSUCS4',
      'UCS-4BE',
      'UCS-4LE',
      'UTF-16',
      'UTF-16BE',
      'UTF-16LE',
      'UTF-32',
      'UTF-32BE',
      'UTF-32LE',
      'UNICODE-1-1-UTF-7',
      'UTF-7',
      'CSUNICODE11UTF7',
      'UCS-2-INTERNAL',
      'UCS-2-SWAPPED',
      'UCS-4-INTERNAL',
      'UCS-4-SWAPPED',
      'C99',
      'JAVA',
      'CP819',
      'IBM819',
      'ISO-8859-1',
      'ISO-IR-100',
      'ISO8859-1',
      'ISO_8859-1',
      'ISO_8859-1:1987',
      'L1',
      'LATIN1',
      'CSISOLATIN1',
      'ISO-8859-2',
      'ISO-IR-101',
      'ISO8859-2',
      'ISO_8859-2',
      'ISO_8859-2:1987',
      'L2',
      'LATIN2',
      'CSISOLATIN2',
      'ISO-8859-3',
      'ISO-IR-109',
      'ISO8859-3',
      'ISO_8859-3',
      'ISO_8859-3:1988',
      'L3',
      'LATIN3',
      'CSISOLATIN3',
      'ISO-8859-4',
      'ISO-IR-110',
      'ISO8859-4',
      'ISO_8859-4',
      'ISO_8859-4:1988',
      'L4',
      'LATIN4',
      'CSISOLATIN4',
      'CYRILLIC',
      'ISO-8859-5',
      'ISO-IR-144',
      'ISO8859-5',
      'ISO_8859-5',
      'ISO_8859-5:1988',
      'CSISOLATINCYRILLIC',
      'ARABIC',
      'ASMO-708',
      'ECMA-114',
      'ISO-8859-6',
      'ISO-IR-127',
      'ISO8859-6',
      'ISO_8859-6',
      'ISO_8859-6:1987',
      'CSISOLATINARABIC',
      'ECMA-118',
      'ELOT_928',
      'GREEK',
      'GREEK8',
      'ISO-8859-7',
      'ISO-IR-126',
      'ISO8859-7',
      'ISO_8859-7',
      'ISO_8859-7:1987',
      'ISO_8859-7:2003',
      'CSISOLATINGREEK',
      'HEBREW',
      'ISO-8859-8',
      'ISO-IR-138',
      'ISO8859-8',
      'ISO_8859-8',
      'ISO_8859-8:1988',
      'CSISOLATINHEBREW',
      'ISO-8859-9',
      'ISO-IR-148',
      'ISO8859-9',
      'ISO_8859-9',
      'ISO_8859-9:1989',
      'L5',
      'LATIN5',
      'CSISOLATIN5',
      'ISO-8859-10',
      'ISO-IR-157',
      'ISO8859-10',
      'ISO_8859-10',
      'ISO_8859-10:1992',
      'L6',
      'LATIN6',
      'CSISOLATIN6',
      'ISO-8859-11',
      'ISO8859-11',
      'ISO_8859-11',
      'ISO-8859-13',
      'ISO-IR-179',
      'ISO8859-13',
      'ISO_8859-13',
      'L7',
      'LATIN7',
      'ISO-8859-14',
      'ISO-CELTIC',
      'ISO-IR-199',
      'ISO8859-14',
      'ISO_8859-14',
      'ISO_8859-14:1998',
      'L8',
      'LATIN8',
      'ISO-8859-15',
      'ISO-IR-203',
      'ISO8859-15',
      'ISO_8859-15',
      'ISO_8859-15:1998',
      'LATIN-9',
      'ISO-8859-16',
      'ISO-IR-226',
      'ISO8859-16',
      'ISO_8859-16',
      'ISO_8859-16:2001',
      'L10',
      'LATIN10',
      'KOI8-R',
      'CSKOI8R',
      'KOI8-U',
      'KOI8-RU',
      'CP1250',
      'MS-EE',
      'WINDOWS-1250',
      'CP1251',
      'MS-CYRL',
      'WINDOWS-1251',
      'CP1252',
      'MS-ANSI',
      'WINDOWS-1252',
      'CP1253',
      'MS-GREEK',
      'WINDOWS-1253',
      'CP1254',
      'MS-TURK',
      'WINDOWS-1254',
      'CP1255',
      'MS-HEBR',
      'WINDOWS-1255',
      'CP1256',
      'MS-ARAB',
      'WINDOWS-1256',
      'CP1257',
      'WINBALTRIM',
      'WINDOWS-1257',
      'CP1258',
      'WINDOWS-1258',
      '850',
      'CP850',
      'IBM850',
      'CSPC850MULTILINGUAL',
      '862',
      'CP862',
      'IBM862',
      'CSPC862LATINHEBREW',
      '866',
      'CP866',
      'IBM866',
      'CSIBM866',
      'MAC',
      'MACINTOSH',
      'MACROMAN',
      'CSMACINTOSH',
      'MACCENTRALEUROPE',
      'MACICELAND',
      'MACCROATIAN',
      'MACROMANIA',
      'MACCYRILLIC',
      'MACUKRAINE',
      'MACGREEK',
      'MACTURKISH',
      'MACHEBREW',
      'MACARABIC',
      'MACTHAI',
      'HP-ROMAN8',
      'R8',
      'ROMAN8',
      'CSHPROMAN8',
      'NEXTSTEP',
      'ARMSCII-8',
      'GEORGIAN-ACADEMY',
      'GEORGIAN-PS',
      'KOI8-T',
      'CP154',
      'CYRILLIC-ASIAN',
      'PT154',
      'PTCP154',
      'CSPTCP154',
      'KZ-1048',
      'RK1048',
      'STRK1048-2002',
      'CSKZ1048',
      'MULELAO-1',
      'CP1133',
      'IBM-CP1133',
      'ISO-IR-166',
      'TIS-620',
      'TIS620',
      'TIS620-0',
      'TIS620.2529-1',
      'TIS620.2533-0',
      'TIS620.2533-1',
      'CP874',
      'WINDOWS-874',
      'VISCII',
      'VISCII1.1-1',
      'CSVISCII',
      'TCVN',
      'TCVN-5712',
      'TCVN5712-1',
      'TCVN5712-1:1993',
      'ISO-IR-14',
      'ISO646-JP',
      'JIS_C6220-1969-RO',
      'JP',
      'CSISO14JISC6220RO',
      'JISX0201-1976',
      'JIS_X0201',
      'X0201',
      'CSHALFWIDTHKATAKANA',
      'ISO-IR-87',
      'JIS0208',
      'JIS_C6226-1983',
      'JIS_X0208',
      'JIS_X0208-1983',
      'JIS_X0208-1990',
      'X0208',
      'CSISO87JISX0208',
      'ISO-IR-159',
      'JIS_X0212',
      'JIS_X0212-1990',
      'JIS_X0212.1990-0',
      'X0212',
      'CSISO159JISX02121990',
      'CN',
      'GB_1988-80',
      'ISO-IR-57',
      'ISO646-CN',
      'CSISO57GB1988',
      'CHINESE',
      'GB_2312-80',
      'ISO-IR-58',
      'CSISO58GB231280',
      'CN-GB-ISOIR165',
      'ISO-IR-165',
      'ISO-IR-149',
      'KOREAN',
      'KSC_5601',
      'KS_C_5601-1987',
      'KS_C_5601-1989',
      'CSKSC56011987',
      'EUC-JP',
      'EUCJP',
      'EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE',
      'CSEUCPKDFMTJAPANESE',
      'MS_KANJI',
      'SHIFT-JIS',
      'SHIFT_JIS',
      'SJIS',
      'CSSHIFTJIS',
      'CP932',
      'ISO-2022-JP',
      'CSISO2022JP',
      'ISO-2022-JP-1',
      'ISO-2022-JP-2',
      'CSISO2022JP2',
      'CN-GB',
      'EUC-CN',
      'EUCCN',
      'GB2312',
      'CSGB2312',
      'GBK',
      'CP936',
      'MS936',
      'WINDOWS-936',
      'GB18030',
      'ISO-2022-CN',
      'CSISO2022CN',
      'ISO-2022-CN-EXT',
      'HZ',
      'HZ-GB-2312',
      'EUC-TW',
      'EUCTW',
      'CSEUCTW',
      'BIG-5',
      'BIG-FIVE',
      'BIG5',
      'BIGFIVE',
      'CN-BIG5',
      'CSBIG5',
      'CP950',
      'BIG5-HKSCS:1999',
      'BIG5-HKSCS:2001',
      'BIG5-HKSCS',
      'BIG5-HKSCS:2004',
      'BIG5HKSCS',
      'EUC-KR',
      'EUCKR',
      'CSEUCKR',
      'CP949',
      'UHC',
      'CP1361',
      'JOHAB',
      'ISO-2022-KR',
      'CSISO2022KR',
      'CP856',
      'CP922',
      'CP943',
      'CP1046',
      'CP1124',
      'CP1129',
      'CP1161',
      'IBM-1161',
      'IBM1161',
      'CSIBM1161',
      'CP1162',
      'IBM-1162',
      'IBM1162',
      'CSIBM1162',
      'CP1163',
      'IBM-1163',
      'IBM1163',
      'CSIBM1163',
      'DEC-KANJI',
      'DEC-HANYU',
      '437',
      'CP437',
      'IBM437',
      'CSPC8CODEPAGE437',
      'CP737',
      'CP775',
      'IBM775',
      'CSPC775BALTIC',
      '852',
      'CP852',
      'IBM852',
      'CSPCP852',
      'CP853',
      '855',
      'CP855',
      'IBM855',
      'CSIBM855',
      '857',
      'CP857',
      'IBM857',
      'CSIBM857',
      'CP858',
      '860',
      'CP860',
      'IBM860',
      'CSIBM860',
      '861',
      'CP-IS',
      'CP861',
      'IBM861',
      'CSIBM861',
      '863',
      'CP863',
      'IBM863',
      'CSIBM863',
      'CP864',
      'IBM864',
      'CSIBM864',
      '865',
      'CP865',
      'IBM865',
      'CSIBM865',
      '869',
      'CP-GR',
      'CP869',
      'IBM869',
      'CSIBM869',
      'CP1125',
      'EUC-JISX0213',
      'SHIFT_JISX0213',
      'ISO-2022-JP-3',
      'BIG5-2003',
      'ISO-IR-230',
      'TDS565',
      'ATARI',
      'ATARIST',
      'RISCOS-LATIN1',
  );
808
  /**
809
   * @var array
810
   */
811
  private static $SUPPORT = array();
812
813
  /**
814
   * __construct()
815
   */
816 16
  public function __construct()
817
  {
818 16
    self::checkForSupport();
819 16
  }

  /**
   * Return the character at the specified position, similar to $str[1].
   *
   * @param string $str <p>A UTF-8 string.</p>
   * @param int    $pos <p>The position of the character to return.</p>
   *
   * @return string <p>Single multi-byte character.</p>
   */
  public static function access($str, $pos)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $pos = (int)$pos;

    if ($pos < 0) {
      return '';
    }

    return (string)self::substr($str, $pos, 1);
  }

  /**
   * Prepends the UTF-8 BOM character to the string and returns the whole string.
   *
   * INFO: If a BOM is already present, the input string is returned unchanged.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string <p>The output string that contains BOM.</p>
   */
  public static function add_bom_to_string($str)
  {
    if (self::string_has_bom($str) === false) {
      $str = self::bom() . $str;
    }

    return $str;
  }

  /**
   * Convert a binary string into a string.
   *
   * @param mixed $bin <p>A string of binary digits (1|0).</p>
   *
   * @return string
   */
  public static function binary_to_str($bin)
  {
    if (!isset($bin[0])) {
      return '';
    }

    $convert = base_convert($bin, 2, 16);
    if ($convert === '0') {
      return '';
    }

    return pack('H*', $convert);
  }
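
The PHP manual warns that `base_convert()` may lose precision on large numbers, so long bit strings may not survive the conversion above. A minimal alternative sketch converts one octet at a time; `bits_to_bytes()` is a hypothetical helper, and rejecting input that is not a whole number of octets is an assumption of this sketch, not the library's behaviour.

```php
<?php
// Hypothetical helper: converts a string of "0"/"1" digits to raw bytes,
// eight bits at a time, so no large intermediate number is needed.
function bits_to_bytes($bits)
{
    $bits = (string)$bits;

    // Assumption of this sketch: only whole octets of binary digits are accepted.
    if ($bits === '' || strlen($bits) % 8 !== 0 || preg_match('/^[01]+\z/', $bits) !== 1) {
        return '';
    }

    $bytes = '';
    foreach (str_split($bits, 8) as $octet) {
        $bytes .= chr(bindec($octet));
    }

    return $bytes;
}

echo bits_to_bytes('0100100001101001'), "\n"; // bytes 0x48 0x69
```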

  /**
   * Returns the UTF-8 Byte Order Mark Character.
   *
   * INFO: take a look at UTF8::$bom for e.g. UTF-16 and UTF-32 BOM values
   *
   * @return string UTF-8 Byte Order Mark
   */
  public static function bom()
  {
    return "\xef\xbb\xbf";
  }

  /**
   * @alias of UTF8::chr_map()
   *
   * @see   UTF8::chr_map()
   *
   * @param string|array $callback
   * @param string       $str
   *
   * @return array
   */
  public static function callback($callback, $str)
  {
    return self::chr_map($callback, $str);
  }

  /**
   * This method will auto-detect your server environment for UTF-8 support.
   *
   * INFO: You don't need to run it manually; it is triggered when needed.
   */
  public static function checkForSupport()
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {

      self::$SUPPORT['already_checked_via_portable_utf8'] = true;

      // http://php.net/manual/en/book.mbstring.php
      self::$SUPPORT['mbstring'] = self::mbstring_loaded();
      self::$SUPPORT['mbstring_func_overload'] = self::mbstring_overloaded();

      // http://php.net/manual/en/book.iconv.php
      self::$SUPPORT['iconv'] = self::iconv_loaded();

      // http://php.net/manual/en/book.intl.php
      self::$SUPPORT['intl'] = self::intl_loaded();
      self::$SUPPORT['intl__transliterator_list_ids'] = array();
      if (
          self::$SUPPORT['intl'] === true
          &&
          function_exists('transliterator_list_ids') === true
      ) {
        self::$SUPPORT['intl__transliterator_list_ids'] = transliterator_list_ids();
      }

      // http://php.net/manual/en/class.intlchar.php
      self::$SUPPORT['intlChar'] = self::intlChar_loaded();

      // http://php.net/manual/en/book.pcre.php
      self::$SUPPORT['pcre_utf8'] = self::pcre_utf8_support();
    }
  }
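
The detection step can be approximated with core PHP functions. The sketch below is a guess at what the `*_loaded()` helpers check, since their bodies are not shown here; the array keys simply mirror the ones used above.

```php
<?php
// Hypothetical capability map; the checks are simplified assumptions,
// not the library's actual *_loaded() implementations.
$support = array(
    'mbstring'  => extension_loaded('mbstring'),
    'iconv'     => extension_loaded('iconv'),
    'intl'      => extension_loaded('intl'),
    'intlChar'  => class_exists('IntlChar'),
    // preg_match() fails for the "u" modifier when PCRE lacks UTF-8 support.
    'pcre_utf8' => @preg_match('/^.$/u', "\xC3\xB1") === 1,
);

foreach ($support as $name => $available) {
    echo $name, ': ', $available ? 'yes' : 'no', "\n";
}
```

Caching the result once, as the method above does with the `already_checked_via_portable_utf8` flag, avoids repeating these checks on every call.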

  /**
   * Generates a UTF-8 encoded character from the given code point.
   *
   * INFO: opposite to UTF8::ord()
   *
   * @param int    $code_point <p>The code point for which to generate a character.</p>
   * @param string $encoding   [optional] <p>Default is UTF-8</p>
   *
   * @return string|null <p>Multi-Byte character, returns null on failure or empty input.</p>
   */
  public static function chr($code_point, $encoding = 'UTF-8')
  {
    // init
    static $CHAR_CACHE = array();

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (
        $encoding !== 'UTF-8'
        &&
        $encoding !== 'WINDOWS-1252'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::chr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    $cacheKey = $code_point . $encoding;
    if (isset($CHAR_CACHE[$cacheKey]) === true) {
      return $CHAR_CACHE[$cacheKey];
    }

    if (self::$SUPPORT['intlChar'] === true) {
      $str = \IntlChar::chr($code_point);

      if ($encoding !== 'UTF-8') {
        $str = \mb_convert_encoding($str, $encoding, 'UTF-8');
      }

      $CHAR_CACHE[$cacheKey] = $str;
      return $str;
    }

    // check type of code_point, only if there is no support for "\IntlChar"
    if ((int)$code_point !== $code_point) {
      $CHAR_CACHE[$cacheKey] = null;
      return null;
    }

    if ($code_point <= 0x7F) {
      $str = self::chr_and_parse_int($code_point);
    } elseif ($code_point <= 0x7FF) {
      $str = self::chr_and_parse_int(($code_point >> 6) + 0xC0) .
             self::chr_and_parse_int(($code_point & 0x3F) + 0x80);
    } elseif ($code_point <= 0xFFFF) {
      $str = self::chr_and_parse_int(($code_point >> 12) + 0xE0) .
             self::chr_and_parse_int((($code_point >> 6) & 0x3F) + 0x80) .
             self::chr_and_parse_int(($code_point & 0x3F) + 0x80);
    } else {
      $str = self::chr_and_parse_int(($code_point >> 18) + 0xF0) .
             self::chr_and_parse_int((($code_point >> 12) & 0x3F) + 0x80) .
             self::chr_and_parse_int((($code_point >> 6) & 0x3F) + 0x80) .
             self::chr_and_parse_int(($code_point & 0x3F) + 0x80);
    }

    if ($encoding !== 'UTF-8') {
      $str = \mb_convert_encoding($str, $encoding, 'UTF-8');
    }

    // add into static cache
    $CHAR_CACHE[$cacheKey] = $str;

    return $str;
  }
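
The bit arithmetic in the fallback branch can be isolated into a small stand-alone function. This is a sketch only: `codepoint_to_utf8()` is a hypothetical name, it covers UTF-8 output only, and the explicit range check is an addition of this sketch that the method above does not perform.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function codepoint_to_utf8($cp)
{
    // Assumption of this sketch: reject non-integers and out-of-range values.
    if (!is_int($cp) || $cp < 0 || $cp > 0x10FFFF) {
        return null;
    }

    if ($cp <= 0x7F) {
        // 0xxxxxxx
        return chr($cp);
    }

    if ($cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }

    if ($cp <= 0xFFFF) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }

    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(codepoint_to_utf8(0x20AC)), "\n"; // EURO SIGN, three bytes
```

Each continuation byte carries six payload bits, which is why the shifts step down by 6 and every continuation byte is masked with `0x3F`.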

  /**
   * @param int $int
   *
   * @return string
   */
  private static function chr_and_parse_int($int)
  {
    return chr((int)$int);
  }

  /**
   * Applies callback to all characters of a string.
   *
   * @param string|array $callback <p>The callback function.</p>
   * @param string       $str      <p>UTF-8 string to run callback on.</p>
   *
   * @return array <p>The outcome of callback.</p>
   */
  public static function chr_map($callback, $str)
  {
    $chars = self::split($str);

    return array_map($callback, $chars);
  }

  /**
   * Generates an array of byte length of each character of a Unicode string.
   *
   * 1 byte => U+0000  - U+007F
   * 2 byte => U+0080  - U+07FF
   * 3 byte => U+0800  - U+FFFF
   * 4 byte => U+10000 - U+10FFFF
   *
   * @param string $str <p>The original Unicode string.</p>
   *
   * @return array <p>An array of byte lengths of each character.</p>
   */
  public static function chr_size_list($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return array();
    }

    return array_map(
        function ($data) {
          return UTF8::strlen($data, '8BIT');
        },
        self::split($str)
    );
  }

  /**
   * Get a decimal code representation of a specific character.
   *
   * @param string $char <p>The input character.</p>
   *
   * @return int
   */
  public static function chr_to_decimal($char)
  {
    $char = (string)$char;
    $code = self::ord($char[0]);
    $bytes = 1;

    if (!($code & 0x80)) {
      // 0xxxxxxx
      return $code;
    }

    if (($code & 0xe0) === 0xc0) {
      // 110xxxxx
      $bytes = 2;
      $code &= ~0xc0;
    } elseif (($code & 0xf0) === 0xe0) {
      // 1110xxxx
      $bytes = 3;
      $code &= ~0xe0;
    } elseif (($code & 0xf8) === 0xf0) {
      // 11110xxx
      $bytes = 4;
      $code &= ~0xf0;
    }

    for ($i = 2; $i <= $bytes; $i++) {
      // 10xxxxxx
      $code = ($code << 6) + (self::ord($char[$i - 1]) & ~0x80);
    }

    return $code;
  }
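
The decoding direction can be sketched the same way, using only the native byte-level `ord()`. `utf8_to_codepoint()` is a hypothetical helper; returning `null` for empty, truncated or malformed input is a choice made for this sketch and differs from the method above, which does not validate its input.

```php
<?php
// Hypothetical helper: decode the first UTF-8 character of a string
// into its Unicode code point.
function utf8_to_codepoint($char)
{
    $char = (string)$char;
    if ($char === '') {
        return null;
    }

    $lead = ord($char[0]);

    if ($lead < 0x80) {
        return $lead;             // 0xxxxxxx
    }

    if (($lead & 0xE0) === 0xC0) {
        $len = 2;
        $code = $lead & 0x1F;     // 110xxxxx
    } elseif (($lead & 0xF0) === 0xE0) {
        $len = 3;
        $code = $lead & 0x0F;     // 1110xxxx
    } elseif (($lead & 0xF8) === 0xF0) {
        $len = 4;
        $code = $lead & 0x07;     // 11110xxx
    } else {
        return null;              // stray continuation byte or invalid lead byte
    }

    if (strlen($char) < $len) {
        return null;              // truncated sequence
    }

    for ($i = 1; $i < $len; $i++) {
        $byte = ord($char[$i]);
        if (($byte & 0xC0) !== 0x80) {
            return null;          // not a 10xxxxxx continuation byte
        }
        $code = ($code << 6) | ($byte & 0x3F);
    }

    return $code;
}

printf("U+%04X\n", utf8_to_codepoint("\xE2\x82\xAC"));
```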

  /**
   * Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character.
   *
   * @param string $char <p>The input character</p>
   * @param string $pfix [optional] <p>The prefix of the returned code point, default is "U+".</p>
   *
   * @return string <p>The code point encoded as U+xxxx</p>
   */
  public static function chr_to_hex($char, $pfix = 'U+')
  {
    $char = (string)$char;

    if (!isset($char[0])) {
      return '';
    }

    if ($char === '&#0;') {
      $char = '';
    }

    return self::int_to_hex(self::ord($char), $pfix);
  }

  /**
   * alias for "UTF8::chr_to_decimal()"
   *
   * @see UTF8::chr_to_decimal()
   *
   * @param string $chr
   *
   * @return int
   */
  public static function chr_to_int($chr)
  {
    return self::chr_to_decimal($chr);
  }

  /**
   * Splits a string into smaller chunks and multiple lines, using the specified line ending character.
   *
   * @param string $body     <p>The original string to be split.</p>
   * @param int    $chunklen [optional] <p>The maximum character length of a chunk.</p>
   * @param string $end      [optional] <p>The character(s) to be inserted between the chunks
   *                         (nothing is appended after the last chunk).</p>
   *
   * @return string <p>The chunked string</p>
   */
  public static function chunk_split($body, $chunklen = 76, $end = "\r\n")
  {
    return implode($end, self::split($body, $chunklen));
  }
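
Because the chunks are joined with `implode()`, the separator ends up between chunks only. The sketch below reproduces that behaviour with a UTF-8 aware regular expression instead of `UTF8::split()`; `utf8_chunk_join()` is a hypothetical name, and the clamp on `$chunklen` is an assumption added to stay within PCRE's quantifier limit.

```php
<?php
// Hypothetical helper: cut a UTF-8 string into chunks of at most $chunklen
// characters (not bytes) and join them with $end.
function utf8_chunk_join($body, $chunklen = 76, $end = "\r\n")
{
    // Assumption of this sketch: keep the length inside PCRE's quantifier range.
    $chunklen = max(1, min(65535, (int)$chunklen));

    // "u" = UTF-8 mode, "s" = let "." also match newlines.
    if (!preg_match_all('/.{1,' . $chunklen . '}/us', (string)$body, $matches)) {
        return (string)$body; // empty or invalid UTF-8 input
    }

    return implode($end, $matches[0]);
}

$text = "\xC3\xA4\xC3\xB6\xC3\xBCab"; // "äöüab"
echo utf8_chunk_join($text, 2, '|'), "\n";
```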

  /**
   * Accepts a string and removes all non-UTF-8 characters from it + extras if needed.
   *
   * @param string $str                     <p>The string to be sanitized.</p>
   * @param bool   $remove_bom              [optional] <p>Set to true, if you need to remove UTF-BOM.</p>
   * @param bool   $normalize_whitespace    [optional] <p>Set to true, if you need to normalize the whitespace.</p>
   * @param bool   $normalize_msword        [optional] <p>Set to true, if you need to normalize MS Word chars e.g.: "…"
   *                                        => "..."</p>
   * @param bool   $keep_non_breaking_space [optional] <p>Set to true, to keep non-breaking-spaces, in combination with
   *                                        $normalize_whitespace</p>
   *
   * @return string <p>Clean UTF-8 encoded string.</p>
   */
  public static function clean($str, $remove_bom = false, $normalize_whitespace = false, $normalize_msword = false, $keep_non_breaking_space = false)
  {
    // http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
    // caused connection reset problem on larger strings

    $regx = '/
      (
        (?: [\x00-\x7F]               # single-byte sequences   0xxxxxxx
        |   [\xC0-\xDF][\x80-\xBF]    # double-byte sequences   110xxxxx 10xxxxxx
        |   [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences   1110xxxx 10xxxxxx * 2
        |   [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3
        ){1,100}                      # ...1 to 100 times per match
      )
    | ( [\x80-\xBF] )                 # invalid byte in range 10000000 - 10111111
    | ( [\xC0-\xFF] )                 # invalid byte in range 11000000 - 11111111
    /x';
    $str = preg_replace($regx, '$1', $str);

    $str = self::replace_diamond_question_mark($str, '');
    $str = self::remove_invisible_characters($str);

    if ($normalize_whitespace === true) {
      $str = self::normalize_whitespace($str, $keep_non_breaking_space);
    }

    if ($normalize_msword === true) {
      $str = self::normalize_msword($str);
    }

    if ($remove_bom === true) {
      $str = self::remove_bom($str);
    }

    return $str;
  }
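
The regular-expression step can be tried in isolation. The sketch below reuses the same pattern in a hypothetical `strip_invalid_utf8()` function and leaves out the later normalisation steps. Runs of structurally valid sequences are captured in group 1 and kept; a stray byte matches one of the other alternatives, leaves group 1 empty, and is therefore dropped.

```php
<?php
// Hypothetical helper: only the preg_replace() step of clean(), nothing else.
function strip_invalid_utf8($str)
{
    $regx = '/
      (
        (?: [\x00-\x7F]               # single-byte sequences
        |   [\xC0-\xDF][\x80-\xBF]    # double-byte sequences
        |   [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences
        |   [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequences
        ){1,100}                      # 1 to 100 times per match
      )
    | ( [\x80-\xBF] )                 # stray continuation byte
    | ( [\xC0-\xFF] )                 # stray lead byte
    /x';

    // Valid runs are replaced by themselves ($1); stray bytes by nothing.
    return preg_replace($regx, '$1', (string)$str);
}

echo bin2hex(strip_invalid_utf8("a\xFFb")), "\n";
echo strip_invalid_utf8("h\xC3\xA9llo"), "\n";
```

Note that the pattern checks byte structure only: an overlong pair such as `\xC0\x80` still matches the double-byte alternative and passes through unchanged.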

  // Scrutinizer issue (Duplication): This method seems to be duplicated in your project.
  // Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or
  // more different places, we strongly encourage you to look into extracting the code into a single class or
  // operation.
  /**
   * Clean up a string: fix the UTF-8 encoding and keep only printable UTF-8 chars.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function cleanup($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // fix ISO <-> UTF-8 errors
    $str = self::fix_simple_utf8($str);

    // remove all non-UTF-8 symbols
    // && remove diamond question mark (�)
    // && remove invisible characters (e.g. "\0")
    // && remove BOM
    // && normalize whitespace chars (but keep non-breaking-spaces)
    $str = self::clean($str, true, true, false, true);

    return (string)$str;
  }

  /**
   * Accepts a string or an array of strings and returns an array of Unicode code points.
   *
   * INFO: opposite to UTF8::string()
   *
   * @param string|string[] $arg        <p>A UTF-8 encoded string or an array of such strings.</p>
   * @param bool            $u_style    [optional] <p>If true, code points are returned in U+xxxx format;
   *                                    by default, code points are returned as integers.</p>
   *
   * @return array <p>The array of code points.</p>
   */
  public static function codepoints($arg, $u_style = false)
  {
    if (is_string($arg) === true) {
      $arg = self::split($arg);
    }

    $arg = array_map(
        array(
            '\\voku\\helper\\UTF8',
            'ord',
        ),
        $arg
    );

    if ($u_style) {
      $arg = array_map(
          array(
              '\\voku\\helper\\UTF8',
              'int_to_hex',
          ),
          $arg
      );
    }

    return $arg;
  }

  /**
   * Returns the count of each character used in a string.
   *
   * @param string $str       <p>The input string.</p>
   * @param bool   $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return array <p>An associative array with characters as keys and
   *               their count as values.</p>
   */
  public static function count_chars($str, $cleanUtf8 = false)
  {
    return array_count_values(self::split($str, 1, $cleanUtf8));
  }

  /**
   * Converts an int value into a UTF-8 character.
   *
   * @param mixed $int
   *
   * @return string
   */
  public static function decimal_to_chr($int)
  {
    if (Bootup::is_php('5.4') === true) {
      $flags = ENT_QUOTES | ENT_HTML5;
    } else {
      $flags = ENT_QUOTES;
    }

    return self::html_entity_decode('&#' . $int . ';', $flags);
  }

  /**
   * Encode a string with a new charset-encoding.
   *
   * INFO:  Unlike "UTF8::utf8_encode()", this function also tries to fix broken / double encoding,
   *        so you can also call it on a string that is already UTF-8 without corrupting it.
   *
   * @param string $encoding <p>e.g. 'UTF-16', 'UTF-8', 'ISO-8859-1', etc.</p>
   * @param string $str      <p>The input string</p>
   * @param bool   $force    [optional] <p>Force the new encoding (we try to fix broken / double encoding for
   *                         UTF-8)<br> otherwise we auto-detect the current string-encoding</p>
   *
   * @return string
   */
  public static function encode($encoding, $str, $force = true)
  {
    $str = (string)$str;
    $encoding = (string)$encoding;

    if (!isset($str[0], $encoding[0])) {
      return $str;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    $encodingDetected = self::str_detect_encoding($str);

    if (
        $encodingDetected !== false
        &&
        (
            $force === true
            ||
            $encodingDetected !== $encoding
        )
    ) {

      if (
          $encoding === 'UTF-8'
          &&
          (
              $force === true
              || $encodingDetected === 'UTF-8'
              || $encodingDetected === 'WINDOWS-1252'
              || $encodingDetected === 'ISO-8859-1'
          )
      ) {
        return self::to_utf8($str);
      }

      if (
          $encoding === 'ISO-8859-1'
          &&
          (
              $force === true
              || $encodingDetected === 'ISO-8859-1'
              || $encodingDetected === 'WINDOWS-1252'
              || $encodingDetected === 'UTF-8'
          )
      ) {
        return self::to_iso8859($str);
      }

      if (
          $encoding !== 'UTF-8'
          &&
          $encoding !== 'WINDOWS-1252'
          &&
          self::$SUPPORT['mbstring'] === false
      ) {
        trigger_error('UTF8::encode() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
      }

      $strEncoded = \mb_convert_encoding(
          $str,
          $encoding,
          $encodingDetected
      );

      if ($strEncoded) {
        return $strEncoded;
      }
    }

    return $str;
  }

  /**
   * Reads entire file into a string.
   *
   * WARNING: do not use UTF-8 Option ($convertToUtf8) for binary-files (e.g.: images) !!!
   *
   * @link http://php.net/manual/en/function.file-get-contents.php
   *
   * @param string        $filename      <p>
   *                                     Name of the file to read.
   *                                     </p>
   * @param int|false     $flags         [optional] <p>
   *                                     Prior to PHP 6, this parameter is called
   *                                     use_include_path and is a bool.
   *                                     As of PHP 5 the FILE_USE_INCLUDE_PATH can be used
   *                                     to trigger include path
   *                                     search.
   *                                     </p>
   *                                     <p>
   *                                     The value of flags can be any combination of
   *                                     the following flags (with some restrictions), joined with the
   *                                     binary OR (|)
   *                                     operator.
   *                                     </p>
   *                                     <p>
   *                                     <table>
   *                                     Available flags
   *                                     <tr valign="top">
   *                                     <td>Flag</td>
   *                                     <td>Description</td>
   *                                     </tr>
   *                                     <tr valign="top">
   *                                     <td>
   *                                     FILE_USE_INCLUDE_PATH
   *                                     </td>
   *                                     <td>
   *                                     Search for filename in the include directory.
   *                                     See include_path for more
   *                                     information.
   *                                     </td>
   *                                     </tr>
   *                                     <tr valign="top">
   *                                     <td>
   *                                     FILE_TEXT
   *                                     </td>
   *                                     <td>
   *                                     As of PHP 6, the default encoding of the read
   *                                     data is UTF-8. You can specify a different encoding by creating a
   *                                     custom context or by changing the default using
   *                                     stream_default_encoding. This flag cannot be
   *                                     used with FILE_BINARY.
   *                                     </td>
   *                                     </tr>
   *                                     <tr valign="top">
   *                                     <td>
   *                                     FILE_BINARY
   *                                     </td>
   *                                     <td>
   *                                     With this flag, the file is read in binary mode. This is the default
   *                                     setting and cannot be used with FILE_TEXT.
   *                                     </td>
   *                                     </tr>
   *                                     </table>
   *                                     </p>
   * @param resource|null $context       [optional] <p>
   *                                     A valid context resource created with
   *                                     stream_context_create. If you don't need to use a
   *                                     custom context, you can skip this parameter by passing null.
   *                                     </p>
   * @param int|null $offset             [optional] <p>
   *                                     The offset where the reading starts.
   *                                     </p>
   * @param int|null $maxLength          [optional] <p>
   *                                     Maximum length of data read. The default is to read until end
   *                                     of file is reached.
   *                                     </p>
   * @param int      $timeout            <p>The time in seconds for the timeout.</p>
   *
   * @param boolean  $convertToUtf8      <strong>WARNING!!!</strong> <p>Do not use this option for binary data,
   *                                     e.g. images or PDF files, because they contain bytes that are not valid UTF-8.</p>
   *
   * @return string|false <p>The function returns the read data or false on failure.</p>
   */
  public static function file_get_contents($filename, $flags = null, $context = null, $offset = null, $maxLength = null, $timeout = 10, $convertToUtf8 = true)
  {
    // init
    $timeout = (int)$timeout;
    $filename = filter_var($filename, FILTER_SANITIZE_STRING);

    if ($timeout && $context === null) {
      $context = stream_context_create(
          array(
              'http' =>
                  array(
                      'timeout' => $timeout,
                  ),
          )
      );
    }

    if (!$flags) {
      $flags = false;
    }

    if ($offset === null) {
      $offset = 0;
    }

    if (is_int($maxLength) === true) {
      $data = file_get_contents($filename, $flags, $context, $offset, $maxLength);
    } else {
      $data = file_get_contents($filename, $flags, $context, $offset);
    }

    // return false on error
    if ($data === false) {
      return false;
    }

    if ($convertToUtf8 === true) {
      $data = self::encode('UTF-8', $data, false);
      // Scrutinizer issue (Bug): It seems like $data can also be of type array; however,
      // voku\helper\UTF8::cleanup() does only seem to accept string, maybe add an additional type check?
      $data = self::cleanup($data);
    }

    return $data;
  }
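
The timeout handling can be illustrated without the class. In this sketch `read_with_timeout()` is a hypothetical helper and the temporary file exists only for demonstration. The `timeout` option of the `http` context applies to HTTP(S) streams, so it has no effect on the local file read shown here.

```php
<?php
// Hypothetical helper: read a file or URL with an HTTP read timeout.
function read_with_timeout($filename, $timeout = 10)
{
    $context = stream_context_create(
        array(
            'http' => array(
                'timeout' => (int)$timeout, // seconds, used by the http(s) wrapper
            ),
        )
    );

    // Returns the data as a string, or false on failure.
    return file_get_contents($filename, false, $context);
}

// Demonstration with a temporary local file.
$tmp = tempnam(sys_get_temp_dir(), 'utf8');
file_put_contents($tmp, "h\xC3\xA9llo");
$data = read_with_timeout($tmp);
unlink($tmp);

var_dump($data !== false);
```

Callers should compare the result with `false` using `===`, because a successfully read empty file is also falsy.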

  /**
   * Checks if a file starts with BOM (Byte Order Mark) character.
   *
   * @param string $file_path <p>Path to a valid file.</p>
   *
   * @return bool <p><strong>true</strong> if the file has BOM at the start, <strong>false</strong> otherwise.</p>
   */
  public static function file_has_bom($file_path)
  {
    return self::string_has_bom(file_get_contents($file_path));
  }

  /**
   * Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * @param mixed  $var
   * @param int    $normalization_form
   * @param string $leading_combining
   *
   * @return mixed
   */
  public static function filter($var, $normalization_form = 4 /* n::NFC */, $leading_combining = '◌')
  {
    switch (gettype($var)) {
      case 'array':
        foreach ($var as $k => $v) {
          /** @noinspection AlterInForeachInspection */
          $var[$k] = self::filter($v, $normalization_form, $leading_combining);
        }
        break;
      case 'object':
        foreach ($var as $k => $v) {
          $var->{$k} = self::filter($v, $normalization_form, $leading_combining);
        }
        break;
      case 'string':
        if (false !== strpos($var, "\r")) {
          // Workaround https://bugs.php.net/65732
          $var = str_replace(array("\r\n", "\r"), "\n", $var);
        }

        if (self::is_ascii($var) === false) {
          /** @noinspection PhpUndefinedClassInspection */
          if (\Normalizer::isNormalized($var, $normalization_form)) {
            $n = '-';
          } else {
            /** @noinspection PhpUndefinedClassInspection */
            $n = \Normalizer::normalize($var, $normalization_form);

            if (isset($n[0])) {
              $var = $n;
            } else {
              $var = self::encode('UTF-8', $var, true);
            }
          }

          if (
              $var[0] >= "\x80"
              &&
              isset($n[0], $leading_combining[0])
              &&
              preg_match('/^\p{Mn}/u', $var)
          ) {
            // Prevent leading combining chars
            // for NFC-safe concatenations.
            $var = $leading_combining . $var;
          }
        }

        break;
    }

    return $var;
  }

  // Scrutinizer issue (Duplication): This method seems to be duplicated in your project.
  /**
   * "filter_input()"-wrapper which normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets a specific external variable by name and optionally filters it
   *
   * @link  http://php.net/manual/en/function.filter-input.php
   *
   * @param int    $type          <p>
   *                              One of <b>INPUT_GET</b>, <b>INPUT_POST</b>,
   *                              <b>INPUT_COOKIE</b>, <b>INPUT_SERVER</b>, or
   *                              <b>INPUT_ENV</b>.
   *                              </p>
   * @param string $variable_name <p>
   *                              Name of a variable to get.
   *                              </p>
   * @param int    $filter        [optional] <p>
   *                              The ID of the filter to apply. The
   *                              manual page lists the available filters.
   *                              </p>
   * @param mixed  $options       [optional] <p>
   *                              Associative array of options or bitwise disjunction of flags. If filter
   *                              accepts options, flags can be provided in "flags" field of array.
   *                              </p>
   *
   * @return mixed Value of the requested variable on success, <b>FALSE</b> if the filter fails,
   * or <b>NULL</b> if the <i>variable_name</i> variable is not set.
   * If the flag <b>FILTER_NULL_ON_FAILURE</b> is used, it
   * returns <b>FALSE</b> if the variable is not set and <b>NULL</b> if the filter fails.
   * @since 5.2.0
   */
  public static function filter_input($type, $variable_name, $filter = FILTER_DEFAULT, $options = null)
  {
    if (4 > func_num_args()) {
      $var = filter_input($type, $variable_name, $filter);
    } else {
      $var = filter_input($type, $variable_name, $filter, $options);
    }

    return self::filter($var);
  }

  // Scrutinizer issue (Duplication): This method seems to be duplicated in your project.
  /**
   * "filter_input_array()"-wrapper which normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets external variables and optionally filters them
   *
   * @link  http://php.net/manual/en/function.filter-input-array.php
   *
   * @param int   $type       <p>
   *                          One of <b>INPUT_GET</b>, <b>INPUT_POST</b>,
   *                          <b>INPUT_COOKIE</b>, <b>INPUT_SERVER</b>, or
   *                          <b>INPUT_ENV</b>.
   *                          </p>
   * @param mixed $definition [optional] <p>
   *                          An array defining the arguments. A valid key is a string
   *                          containing a variable name and a valid value is either a filter type, or an array
   *                          optionally specifying the filter, flags and options. If the value is an
   *                          array, valid keys are filter which specifies the
   *                          filter type,
   *                          flags which specifies any flags that apply to the
   *                          filter, and options which specifies any options that
   *                          apply to the filter. See the example below for a better understanding.
   *                          </p>
   *                          <p>
   *                          This parameter can be also an integer holding a filter constant. Then all values in the
   *                          input array are filtered by this filter.
   *                          </p>
   * @param bool  $add_empty  [optional] <p>
   *                          Add missing keys as <b>NULL</b> to the return value.
   *                          </p>
   *
   * @return mixed An array containing the values of the requested variables on success, or <b>FALSE</b>
   * on failure. An array value will be <b>FALSE</b> if the filter fails, or <b>NULL</b> if
   * the variable is not set. Or if the flag <b>FILTER_NULL_ON_FAILURE</b>
   * is used, it returns <b>FALSE</b> if the variable is not set and <b>NULL</b> if the filter
   * fails.
   * @since 5.2.0
   */
  public static function filter_input_array($type, $definition = null, $add_empty = true)
  {
    if (2 > func_num_args()) {
      $a = filter_input_array($type);
    } else {
1698
      $a = filter_input_array($type, $definition, $add_empty);
1699
    }
1700
1701
    return self::filter($a);
1702
  }

  /**
   * "filter_var()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Filters a variable with a specified filter
   *
   * @link  http://php.net/manual/en/function.filter-var.php
   *
   * @param mixed $variable <p>
   *                        Value to filter.
   *                        </p>
   * @param int   $filter   [optional] <p>
   *                        The ID of the filter to apply. The
   *                        manual page lists the available filters.
   *                        </p>
   * @param mixed $options  [optional] <p>
   *                        Associative array of options or bitwise disjunction of flags. If the filter
   *                        accepts options, flags can be provided in the "flags" field of the array. For
   *                        the "callback" filter, a callable type should be passed. The
   *                        callback must accept one argument, the value to be filtered, and return
   *                        the value after filtering/sanitizing it.
   *                        </p>
   *                        <p>
   *                        <code>
   *                        // for filters that accept options, use this format
   *                        $options = array(
   *                          'options' => array(
   *                            'default' => 3, // value to return if the filter fails
   *                            // other options here
   *                            'min_range' => 0
   *                          ),
   *                          'flags' => FILTER_FLAG_ALLOW_OCTAL,
   *                        );
   *                        $var = filter_var('0755', FILTER_VALIDATE_INT, $options);
   *                        // for filters that only accept flags, you can pass them directly
   *                        $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
   *                        // for filters that only accept flags, you can also pass them as an array
   *                        $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN,
   *                          array('flags' => FILTER_NULL_ON_FAILURE));
   *                        // callback validate filter
   *                        function foo($value)
   *                        {
   *                          // Expected format: Surname, GivenNames
   *                          if (strpos($value, ", ") === false) return false;
   *                          list($surname, $givennames) = explode(", ", $value, 2);
   *                          $empty = (empty($surname) || empty($givennames));
   *                          $notstrings = (!is_string($surname) || !is_string($givennames));
   *                          if ($empty || $notstrings) {
   *                            return false;
   *                          } else {
   *                            return $value;
   *                          }
   *                        }
   *                        $var = filter_var('Doe, Jane Sue', FILTER_CALLBACK, array('options' => 'foo'));
   *                        </code>
   *                        </p>
   *
   * @return mixed the filtered data, or <b>FALSE</b> if the filter fails.
   * @since 5.2.0
   */
  public static function filter_var($variable, $filter = FILTER_DEFAULT, $options = null)
  {
    if (3 > func_num_args()) {
      $variable = filter_var($variable, $filter);
    } else {
      $variable = filter_var($variable, $filter, $options);
    }

    return self::filter($variable);
  }

  /**
   * "filter_var_array()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.
   *
   * Gets multiple variables and optionally filters them
   *
   * @link  http://php.net/manual/en/function.filter-var-array.php
   *
   * @param array $data       <p>
   *                          An array with string keys containing the data to filter.
   *                          </p>
   * @param mixed $definition [optional] <p>
   *                          An array defining the arguments. A valid key is a string
   *                          containing a variable name and a valid value is either a
   *                          filter type, or an
   *                          array optionally specifying the filter, flags and options.
   *                          If the value is an array, valid keys are filter
   *                          which specifies the filter type,
   *                          flags which specifies any flags that apply to the
   *                          filter, and options which specifies any options that
   *                          apply to the filter. See the linked manual page for examples.
   *                          </p>
   *                          <p>
   *                          This parameter can also be an integer holding a filter constant. Then all values in the
   *                          input array are filtered by this filter.
   *                          </p>
   * @param bool  $add_empty  [optional] <p>
   *                          Add missing keys as <b>NULL</b> to the return value.
   *                          </p>
   *
   * @return mixed An array containing the values of the requested variables on success, or <b>FALSE</b>
   * on failure. An array value will be <b>FALSE</b> if the filter fails, or <b>NULL</b> if
   * the variable is not set.
   * @since 5.2.0
   */
  public static function filter_var_array($data, $definition = null, $add_empty = true)
  {
    if (2 > func_num_args()) {
      $a = filter_var_array($data);
    } else {
      $a = filter_var_array($data, $definition, $add_empty);
    }

    return self::filter($a);
  }

  /**
   * Check whether the number of Unicode characters is not more than the specified integer.
   *
   * @param string $str      The original string to be checked.
   * @param int    $box_size The size in number of chars to be checked against string.
   *
   * @return bool true if the string length is less than or equal to $box_size, false otherwise.
   */
  public static function fits_inside($str, $box_size)
  {
    return (self::strlen($str) <= $box_size);
  }
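
  /*
The check above counts characters, not bytes. A minimal standalone sketch of that difference, assuming the mbstring extension is available; fits_inside_sketch() is a hypothetical helper for illustration only, with mb_strlen() standing in for UTF8::strlen():

```php
<?php
// Hypothetical standalone helper: mb_strlen() stands in for UTF8::strlen().
function fits_inside_sketch($str, $box_size)
{
    return mb_strlen((string)$str, 'UTF-8') <= $box_size;
}

// "héllo" written as explicit UTF-8 bytes: 5 characters, 6 bytes.
$str = "h\xC3\xA9llo";

var_dump(fits_inside_sketch($str, 5)); // bool(true): 5 characters fit
var_dump(strlen($str) <= 5);           // bool(false): the byte length is 6
```

A byte-based comparison would reject strings that visually fit, which is presumably why the method delegates to a character-aware length function.
  */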

  /**
   * Try to fix simple broken UTF-8 strings.
   *
   * INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.
   *
   * If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO-8859-1
   * (ignoring Windows-1252 chars from 80 to 9F), use this function to fix it.
   * See: http://en.wikipedia.org/wiki/Windows-1252
   *
   * @param string $str <p>The input string</p>
   *
   * @return string
   */
  public static function fix_simple_utf8($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    static $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = null;
    static $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = null;

    if ($BROKEN_UTF8_TO_UTF8_KEYS_CACHE === null) {
      $BROKEN_UTF8_TO_UTF8_KEYS_CACHE = array_keys(self::$BROKEN_UTF8_FIX);
      $BROKEN_UTF8_TO_UTF8_VALUES_CACHE = array_values(self::$BROKEN_UTF8_FIX);
    }

    return str_replace($BROKEN_UTF8_TO_UTF8_KEYS_CACHE, $BROKEN_UTF8_TO_UTF8_VALUES_CACHE, $str);
  }

  /**
   * Fix a double (or multiple) encoded UTF8 string.
   *
   * @param string|string[] $str <p>You can use a string or an array of strings.</p>
   *
   * @return string|string[] <p>Will return the fixed input-"array" or
   *                         the fixed input-"string".</p>
   */
  public static function fix_utf8($str)
  {
    if (is_array($str) === true) {

      /** @noinspection ForeachSourceInspection */
      foreach ($str as $k => $v) {
        /** @noinspection AlterInForeachInspection */
        /** @noinspection OffsetOperationsInspection */
        $str[$k] = self::fix_utf8($v);
      }

      return $str;
    }

    $last = '';
    while ($last !== $str) {
      $last = $str;
      $str = self::to_utf8(
          self::utf8_decode($str)
      );
    }

    return $str;
  }
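
  /*
The loop above repeats a decode step until the string stops changing. The sketch below illustrates that fixed-point idea in isolation. It is NOT the library's algorithm: fix_double_encoded_sketch() is a hypothetical helper that assumes each extra layer came from reading UTF-8 bytes as ISO-8859-1, and it relies on the mbstring extension.

```php
<?php
// Hypothetical helper, not the library's method: undo "UTF-8 read as ISO-8859-1" layers.
function fix_double_encoded_sketch($str)
{
    $last = '';
    while ($last !== $str) {
        $last = $str;
        // Strip one assumed layer: map every code point back to a single ISO-8859-1 byte.
        $decoded = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
        // Accept the result only if it changed, nothing was lost, and it is still valid UTF-8.
        if (
            $decoded !== $str
            && mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1') === $str
            && mb_check_encoding($decoded, 'UTF-8')
        ) {
            $str = $decoded;
        }
    }

    return $str;
}

$once   = "caf\xC3\xA9";                                     // "café" as UTF-8 bytes
$twice  = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');  // encoded a second time
$thrice = mb_convert_encoding($twice, 'UTF-8', 'ISO-8859-1'); // and a third time

var_dump(fix_double_encoded_sketch($twice) === $once);
var_dump(fix_double_encoded_sketch($thrice) === $once);
```

The round-trip guard keeps text outside the ISO-8859-1 range untouched. Text that legitimately contains sequences such as "Ã©" would still be altered, so a sketch like this is only suitable when double encoding is already suspected.
  */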

  /**
   * Get the text direction of a specific character.
   *
   * @param string $char
   *
   * @return string <p>'RTL' or 'LTR'</p>
   */
  public static function getCharDirection($char)
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['intlChar'] === true) {
      $tmpReturn = \IntlChar::charDirection($char);

      // from "IntlChar"-Class
      $charDirection = array(
          'RTL' => array(1, 13, 14, 15, 21),
          'LTR' => array(0, 11, 12, 20),
      );

      if (in_array($tmpReturn, $charDirection['LTR'], true)) {
        return 'LTR';
      }

      if (in_array($tmpReturn, $charDirection['RTL'], true)) {
        return 'RTL';
      }
    }

    $c = static::chr_to_decimal($char);

    if (!(0x5be <= $c && 0x10b7f >= $c)) {
      return 'LTR';
    }

    if (0x85e >= $c) {

      if (0x5be === $c ||
          0x5c0 === $c ||
          0x5c3 === $c ||
          0x5c6 === $c ||
          (0x5d0 <= $c && 0x5ea >= $c) ||
          (0x5f0 <= $c && 0x5f4 >= $c) ||
          0x608 === $c ||
          0x60b === $c ||
          0x60d === $c ||
          0x61b === $c ||
          (0x61e <= $c && 0x64a >= $c) ||
          (0x66d <= $c && 0x66f >= $c) ||
          (0x671 <= $c && 0x6d5 >= $c) ||
          (0x6e5 <= $c && 0x6e6 >= $c) ||
          (0x6ee <= $c && 0x6ef >= $c) ||
          (0x6fa <= $c && 0x70d >= $c) ||
          0x710 === $c ||
          (0x712 <= $c && 0x72f >= $c) ||
          (0x74d <= $c && 0x7a5 >= $c) ||
          0x7b1 === $c ||
          (0x7c0 <= $c && 0x7ea >= $c) ||
          (0x7f4 <= $c && 0x7f5 >= $c) ||
          0x7fa === $c ||
          (0x800 <= $c && 0x815 >= $c) ||
          0x81a === $c ||
          0x824 === $c ||
          0x828 === $c ||
          (0x830 <= $c && 0x83e >= $c) ||
          (0x840 <= $c && 0x858 >= $c) ||
          0x85e === $c
      ) {
        return 'RTL';
      }

    } elseif (0x200f === $c) {

      return 'RTL';

    } elseif (0xfb1d <= $c) {

      if (0xfb1d === $c ||
          (0xfb1f <= $c && 0xfb28 >= $c) ||
          (0xfb2a <= $c && 0xfb36 >= $c) ||
          (0xfb38 <= $c && 0xfb3c >= $c) ||
          0xfb3e === $c ||
          (0xfb40 <= $c && 0xfb41 >= $c) ||
          (0xfb43 <= $c && 0xfb44 >= $c) ||
          (0xfb46 <= $c && 0xfbc1 >= $c) ||
          (0xfbd3 <= $c && 0xfd3d >= $c) ||
          (0xfd50 <= $c && 0xfd8f >= $c) ||
          (0xfd92 <= $c && 0xfdc7 >= $c) ||
          (0xfdf0 <= $c && 0xfdfc >= $c) ||
          (0xfe70 <= $c && 0xfe74 >= $c) ||
          (0xfe76 <= $c && 0xfefc >= $c) ||
          (0x10800 <= $c && 0x10805 >= $c) ||
          0x10808 === $c ||
          (0x1080a <= $c && 0x10835 >= $c) ||
          (0x10837 <= $c && 0x10838 >= $c) ||
          0x1083c === $c ||
          (0x1083f <= $c && 0x10855 >= $c) ||
          (0x10857 <= $c && 0x1085f >= $c) ||
          (0x10900 <= $c && 0x1091b >= $c) ||
          (0x10920 <= $c && 0x10939 >= $c) ||
          0x1093f === $c ||
          0x10a00 === $c ||
          (0x10a10 <= $c && 0x10a13 >= $c) ||
          (0x10a15 <= $c && 0x10a17 >= $c) ||
          (0x10a19 <= $c && 0x10a33 >= $c) ||
          (0x10a40 <= $c && 0x10a47 >= $c) ||
          (0x10a50 <= $c && 0x10a58 >= $c) ||
          (0x10a60 <= $c && 0x10a7f >= $c) ||
          (0x10b00 <= $c && 0x10b35 >= $c) ||
          (0x10b40 <= $c && 0x10b55 >= $c) ||
          (0x10b58 <= $c && 0x10b72 >= $c) ||
          (0x10b78 <= $c && 0x10b7f >= $c)
      ) {
        return 'RTL';
      }
    }

    return 'LTR';
  }
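
  /*
The fallback branch above is a code-point range lookup. A table-driven sketch of the same idea follows, deliberately reduced to two of the ranges listed above; char_direction_sketch() is a hypothetical helper, and mb_ord() (PHP 7.2+ with mbstring) stands in for chr_to_decimal().

```php
<?php
// Hypothetical, heavily reduced table: only two of the ranges from the method above.
function char_direction_sketch($char)
{
    $rtlRanges = array(
        array(0x5d0, 0x5ea), // Hebrew letters
        array(0x61e, 0x64a), // part of the Arabic block
    );

    $c = mb_ord($char, 'UTF-8'); // assumes PHP 7.2+ with mbstring

    foreach ($rtlRanges as $range) {
        if ($range[0] <= $c && $c <= $range[1]) {
            return 'RTL';
        }
    }

    return 'LTR';
}

var_dump(char_direction_sketch("\xD7\x90")); // U+05D0 HEBREW LETTER ALEF: RTL
var_dump(char_direction_sketch("\xD8\xA8")); // U+0628 ARABIC LETTER BEH: RTL
var_dump(char_direction_sketch('a'));        // LTR
```

A data table like this is easier to extend than a long boolean chain, at the cost of a loop per lookup; the method above keeps the explicit chain.
  */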

  /**
   * Get data from "/data/*.php".
   *
   * @param string $file
   *
   * @return bool|string|array|int <p>Will return false on error.</p>
   */
  private static function getData($file)
  {
    $file = __DIR__ . '/data/' . $file . '.php';
    if (file_exists($file)) {
      /** @noinspection PhpIncludeInspection */
      return require $file;
    }

    return false;
  }

  /**
   * Check for php-support.
   *
   * @param string|null $key
   *
   * @return mixed <p>Return the full support-"array", if $key === null<br>
   *               return bool-value, if $key is used and available<br>
   *               otherwise return null</p>
   */
  public static function getSupportInfo($key = null)
  {
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if ($key === null) {
      return self::$SUPPORT;
    }

    if (!isset(self::$SUPPORT[$key])) {
      return null;
    }

    return self::$SUPPORT[$key];
  }

  /**
   * alias for "UTF8::string_has_bom()"
   *
   * @see UTF8::string_has_bom()
   *
   * @param string $str
   *
   * @return bool
   *
   * @deprecated <p>use "UTF8::string_has_bom()"</p>
   */
  public static function hasBom($str)
  {
    return self::string_has_bom($str);
  }

  /**
   * Converts a hexadecimal value into a UTF-8 character.
   *
   * @param string $hexdec <p>The hexadecimal value.</p>
   *
   * @return string|false <p>One single UTF-8 character.</p>
   */
  public static function hex_to_chr($hexdec)
  {
    return self::decimal_to_chr(hexdec($hexdec));
  }

  /**
   * Converts hexadecimal U+xxxx code point representation to integer.
   *
   * INFO: opposite to UTF8::int_to_hex()
   *
   * @param string $hexDec <p>The hexadecimal code point representation.</p>
   *
   * @return int|false <p>The code point, or false on failure.</p>
   */
  public static function hex_to_int($hexDec)
  {
    $hexDec = (string)$hexDec;

    if (!isset($hexDec[0])) {
      return false;
    }

    if (preg_match('/^(?:\\\u|U\+|)([a-z0-9]{4,6})$/i', $hexDec, $match)) {
      return intval($match[1], 16);
    }

    return false;
  }
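
  /*
The pattern above accepts three spellings of a code point: a "U+" prefix, a "\u" prefix, or no prefix, each followed by four to six characters. The sketch below copies that logic into a hypothetical standalone function (hex_to_int_sketch() is not part of the class) to show which inputs pass.

```php
<?php
// Hypothetical standalone copy of the parsing logic shown above.
function hex_to_int_sketch($hexDec)
{
    $hexDec = (string)$hexDec;

    if (preg_match('/^(?:\\\u|U\+|)([a-z0-9]{4,6})$/i', $hexDec, $match)) {
        return intval($match[1], 16);
    }

    return false;
}

var_dump(hex_to_int_sketch('U+00f1'));  // int(241)
var_dump(hex_to_int_sketch('\u00f1'));  // int(241), single quotes keep the backslash literal
var_dump(hex_to_int_sketch('00f1'));    // int(241)
var_dump(hex_to_int_sketch('f1'));      // bool(false), fewer than four characters
```

Note that the character class is [a-z0-9], so letters beyond "f" also match the pattern; callers that need strict validation may want to check the input with ctype_xdigit() first.
  */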

  /**
   * alias for "UTF8::html_entity_decode()"
   *
   * @see UTF8::html_entity_decode()
   *
   * @param string $str
   * @param int    $flags
   * @param string $encoding
   *
   * @return string
   */
  public static function html_decode($str, $flags = null, $encoding = 'UTF-8')
  {
    return self::html_entity_decode($str, $flags, $encoding);
  }

  /**
   * Converts a UTF-8 string to a series of HTML numbered entities.
   *
   * INFO: opposite to UTF8::html_decode()
   *
   * @param string $str            <p>The Unicode string to be encoded as numbered entities.</p>
   * @param bool   $keepAsciiChars [optional] <p>Keep ASCII chars.</p>
   * @param string $encoding       [optional] <p>Default is UTF-8</p>
   *
   * @return string <p>HTML numbered entities.</p>
   */
  public static function html_encode($str, $keepAsciiChars = false, $encoding = 'UTF-8')
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    # INFO: http://stackoverflow.com/questions/35854535/better-explanation-of-convmap-in-mb-encode-numericentity
    if (function_exists('mb_encode_numericentity')) {

      $startCode = 0x00;
      if ($keepAsciiChars === true) {
        $startCode = 0x80;
      }

      return mb_encode_numericentity(
          $str,
          // convmap entries come in groups of four: start, end, offset, mask
          array($startCode, 0xfffff, 0, 0xfffff),
          $encoding
      );
    }

    return implode(
        '',
        array_map(
            function ($data) use ($keepAsciiChars, $encoding) {
              return UTF8::single_chr_html_encode($data, $keepAsciiChars, $encoding);
            },
            self::split($str)
        )
    );
  }
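
  /*
The $keepAsciiChars switch above only changes the first code point of the conversion map. A small sketch of that effect with plain mb_encode_numericentity() follows, assuming the mbstring extension; the variable names are illustrative only.

```php
<?php
// A convmap is a flat list of groups of four: first code point, last code point, offset, mask.
$all      = array(0x00, 0xfffff, 0, 0xfffff); // encode every character
$nonAscii = array(0x80, 0xfffff, 0, 0xfffff); // leave ASCII untouched

$str = "f\xC3\xB2\xC3\xB4"; // "fòô" as UTF-8 bytes

echo mb_encode_numericentity($str, $all, 'UTF-8'), "\n";      // &#102;&#242;&#244;
echo mb_encode_numericentity($str, $nonAscii, 'UTF-8'), "\n"; // f&#242;&#244;
```

Starting the range at 0x80 is what keeps markup and plain ASCII text readable while still escaping every non-ASCII character.
  */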

  /**
   * UTF-8 version of html_entity_decode()
   *
   * The reason we are not using html_entity_decode() by itself is because
   * while it is not technically correct to leave out the semicolon
   * at the end of an entity most browsers will still interpret the entity
   * correctly. html_entity_decode() does not convert entities without
   * semicolons, so we are left with our own little solution here. Bummer.
   *
   * Convert all HTML entities to their applicable characters
   *
   * INFO: opposite to UTF8::html_encode()
   *
   * @link http://php.net/manual/en/function.html-entity-decode.php
   *
   * @param string $str      <p>
   *                         The input string.
   *                         </p>
   * @param int    $flags    [optional] <p>
   *                         A bitmask of one or more of the following flags, which specify how to handle quotes and
   *                         which document type to use. The native default is ENT_COMPAT | ENT_HTML401; if
   *                         <b>NULL</b> is passed, this method uses ENT_QUOTES | ENT_HTML5 on PHP >= 5.4
   *                         and ENT_QUOTES otherwise.
   *                         <table>
   *                         Available <i>flags</i> constants
   *                         <tr valign="top">
   *                         <td>Constant Name</td>
   *                         <td>Description</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_COMPAT</b></td>
   *                         <td>Will convert double-quotes and leave single-quotes alone.</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_QUOTES</b></td>
   *                         <td>Will convert both double and single quotes.</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_NOQUOTES</b></td>
   *                         <td>Will leave both double and single quotes unconverted.</td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_HTML401</b></td>
   *                         <td>
   *                         Handle code as HTML 4.01.
   *                         </td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_XML1</b></td>
   *                         <td>
   *                         Handle code as XML 1.
   *                         </td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_XHTML</b></td>
   *                         <td>
   *                         Handle code as XHTML.
   *                         </td>
   *                         </tr>
   *                         <tr valign="top">
   *                         <td><b>ENT_HTML5</b></td>
   *                         <td>
   *                         Handle code as HTML 5.
   *                         </td>
   *                         </tr>
   *                         </table>
   *                         </p>
   * @param string $encoding [optional] <p>Encoding to use.</p>
   *
   * @return string <p>The decoded string.</p>
   */
  public static function html_entity_decode($str, $flags = null, $encoding = 'UTF-8')
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if (!isset($str[3])) { // examples: &; || &x;
      return $str;
    }

    if (
        strpos($str, '&') === false
        ||
        (
            strpos($str, '&#') === false
            &&
            strpos($str, ';') === false
        )
    ) {
      return $str;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($flags === null) {
      if (Bootup::is_php('5.4') === true) {
        $flags = ENT_QUOTES | ENT_HTML5;
      } else {
        $flags = ENT_QUOTES;
      }
    }

    if (
        $encoding !== 'UTF-8'
        &&
        $encoding !== 'WINDOWS-1252'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::html_entity_decode() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    do {
      $str_compare = $str;

      $str = preg_replace_callback(
          "/&#\d{2,6};/",
          function ($matches) use ($encoding) {
            $returnTmp = \mb_convert_encoding($matches[0], $encoding, 'HTML-ENTITIES');

            if ($returnTmp !== '"' && $returnTmp !== "'") {
              return $returnTmp;
            }

            return $matches[0];
          },
          $str
      );

      // decode numeric & UTF16 two byte entities
      $str = html_entity_decode(
          preg_replace('/(&#(?:x0*[0-9a-f]{2,6}(?![0-9a-f;])|(?:0*\d{2,6}(?![0-9;]))))/iS', '$1;', $str),
          $flags,
          $encoding
      );

    } while ($str_compare !== $str);

    return $str;
  }
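
  /*
The docblock above explains that the native html_entity_decode() skips numeric entities without a trailing semicolon. The sketch below isolates the repair step, reusing the pattern from the method body and then calling the native function; the sample input and variable names are illustrative, and ENT_HTML5 assumes PHP 5.4 or newer.

```php
<?php
// Pattern taken from the method above: match numeric entities that lack
// the trailing semicolon so that one can be appended.
$pattern = '/(&#(?:x0*[0-9a-f]{2,6}(?![0-9a-f;])|(?:0*\d{2,6}(?![0-9;]))))/iS';

$input    = 'caf&#233 &amp; cr&#232;me';          // first entity has no semicolon
$repaired = preg_replace($pattern, '$1;', $input); // caf&#233; &amp; cr&#232;me
$decoded  = html_entity_decode($repaired, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($repaired);
var_dump($decoded); // "café & crème" as UTF-8
```

Entities that already end in a semicolon are left alone by the negative lookahead, so the repair step is safe to run on every pass of the do/while loop.
  */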
2326
2327
  /**
2328
   * Convert all applicable characters to HTML entities: UTF-8 version of htmlentities()
2329
   *
2330
   * @link http://php.net/manual/en/function.htmlentities.php
2331
   *
2332
   * @param string $str           <p>
2333
   *                              The input string.
2334
   *                              </p>
2335
   * @param int    $flags         [optional] <p>
2336
   *                              A bitmask of one or more of the following flags, which specify how to handle quotes,
2337
   *                              invalid code unit sequences and the used document type. The default is
2338
   *                              ENT_COMPAT | ENT_HTML401.
2339
   *                              <table>
2340
   *                              Available <i>flags</i> constants
2341
   *                              <tr valign="top">
2342
   *                              <td>Constant Name</td>
2343
   *                              <td>Description</td>
2344
   *                              </tr>
2345
   *                              <tr valign="top">
2346
   *                              <td><b>ENT_COMPAT</b></td>
2347
   *                              <td>Will convert double-quotes and leave single-quotes alone.</td>
2348
   *                              </tr>
2349
   *                              <tr valign="top">
2350
   *                              <td><b>ENT_QUOTES</b></td>
2351
   *                              <td>Will convert both double and single quotes.</td>
2352
   *                              </tr>
2353
   *                              <tr valign="top">
2354
   *                              <td><b>ENT_NOQUOTES</b></td>
2355
   *                              <td>Will leave both double and single quotes unconverted.</td>
2356
   *                              </tr>
2357
   *                              <tr valign="top">
2358
   *                              <td><b>ENT_IGNORE</b></td>
2359
   *                              <td>
2360
   *                              Silently discard invalid code unit sequences instead of returning
2361
   *                              an empty string. Using this flag is discouraged as it
2362
   *                              may have security implications.
2363
   *                              </td>
2364
   *                              </tr>
2365
   *                              <tr valign="top">
2366
   *                              <td><b>ENT_SUBSTITUTE</b></td>
2367
   *                              <td>
2368
   *                              Replace invalid code unit sequences with a Unicode Replacement Character
2369
   *                              U+FFFD (UTF-8) or &#38;#38;#FFFD; (otherwise) instead of returning an empty string.
2370
   *                              </td>
2371
   *                              </tr>
2372
   *                              <tr valign="top">
2373
   *                              <td><b>ENT_DISALLOWED</b></td>
2374
   *                              <td>
2375
   *                              Replace invalid code points for the given document type with a
2376
   *                              Unicode Replacement Character U+FFFD (UTF-8) or &#38;#38;#FFFD;
2377
   *                              (otherwise) instead of leaving them as is. This may be useful, for
2378
   *                              instance, to ensure the well-formedness of XML documents with
2379
   *                              embedded external content.
2380
   *                              </td>
2381
   *                              </tr>
2382
   *                              <tr valign="top">
2383
   *                              <td><b>ENT_HTML401</b></td>
2384
   *                              <td>
2385
   *                              Handle code as HTML 4.01.
2386
   *                              </td>
2387
   *                              </tr>
2388
   *                              <tr valign="top">
2389
   *                              <td><b>ENT_XML1</b></td>
2390
   *                              <td>
2391
   *                              Handle code as XML 1.
2392
   *                              </td>
2393
   *                              </tr>
2394
   *                              <tr valign="top">
2395
   *                              <td><b>ENT_XHTML</b></td>
2396
   *                              <td>
2397
   *                              Handle code as XHTML.
2398
   *                              </td>
2399
   *                              </tr>
2400
   *                              <tr valign="top">
2401
   *                              <td><b>ENT_HTML5</b></td>
2402
   *                              <td>
2403
   *                              Handle code as HTML 5.
2404
   *                              </td>
2405
   *                              </tr>
2406
   *                              </table>
2407
   *                              </p>
2408
   * @param string $encoding      [optional] <p>
2409
   *                              Like <b>htmlspecialchars</b>,
2410
   *                              <b>htmlentities</b> takes an optional third argument
2411
   *                              <i>encoding</i> which defines encoding used in
2412
   *                              conversion.
2413
   *                              Although this argument is technically optional, you are highly
2414
   *                              encouraged to specify the correct value for your code.
2415
   *                              </p>
2416
   * @param bool   $double_encode [optional] <p>
2417
   *                              When <i>double_encode</i> is turned off PHP will not
2418
   *                              encode existing html entities. The default is to convert everything.
2419
   *                              </p>
2420
   *
2421
   *
2422
   * @return string <p>
   * The encoded string.
   * </p>
   * <p>
   * If the input <i>string</i> contains an invalid code unit
   * sequence within the given <i>encoding</i>, an empty string
   * will be returned, unless either the <b>ENT_IGNORE</b> or
   * <b>ENT_SUBSTITUTE</b> flag is set.
   * </p>
2429
   */
2430 2
  public static function htmlentities($str, $flags = ENT_COMPAT, $encoding = 'UTF-8', $double_encode = true)
2431
  {
2432 2
    if ($encoding !== 'UTF-8') {
2433 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
2434 1
    }
2435
2436 2
    $str = htmlentities($str, $flags, $encoding, $double_encode);
2437
2438
    /**
2439
     * PHP doesn't replace a backslash with its HTML entity, since backslashes are
     * mostly used to escape characters when inserting into a database. Since
     * we're using a decent database layer, we don't need that, and we replace
     * each backslash with its HTML entity equivalent.
2443
     *
2444
     * https://github.com/forkcms/library/blob/master/spoon/filter/filter.php#L303
2445
     */
2446 2
    $str = str_replace('\\', '&#92;', $str);
2447
2448 2
    if ($encoding !== 'UTF-8') {
2449 1
      return $str;
2450
    }
2451
2452 2
    $byteLengths = self::chr_size_list($str);
2453 2
    $search = array();
2454 2
    $replacements = array();
2455 2
    foreach ($byteLengths as $counter => $byteLength) {
2456 2
      if ($byteLength >= 3) {
2457 1
        $char = self::access($str, $counter);
2458
2459 1
        if (!isset($replacements[$char])) {
2460 1
          $search[$char] = $char;
2461 1
          $replacements[$char] = self::html_encode($char);
2462 1
        }
2463 1
      }
2464 2
    }
2465
2466 2
    return str_replace($search, $replacements, $str);
2467
  }
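The first two steps of the method above (native entity encoding, then backslash replacement) can be sketched with PHP built-ins only. The helper name `encode_with_backslash()` is hypothetical and not part of the library, and the per-character pass for characters of three or more bytes is deliberately left out.

```php
<?php
// Hypothetical helper (not part of the UTF8 class): mirrors only the first
// two steps of the method above, using PHP built-ins.
function encode_with_backslash($str, $flags = ENT_COMPAT, $encoding = 'UTF-8')
{
    // Step 1: let PHP convert every character that has a named entity.
    $str = htmlentities($str, $flags, $encoding);

    // Step 2: htmlentities() leaves backslashes alone, so encode them by hand.
    return str_replace('\\', '&#92;', $str);
}

// "\xC3\xA9" is the UTF-8 byte sequence for "é".
echo encode_with_backslash("a\\b \"c\" \xC3\xA9"), "\n";
```

The order matters in this sketch: replacing the backslash after `htmlentities()` keeps the `&` in `&#92;` from being encoded a second time.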
2468
2469
  /**
2470
   * Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()
2471
   *
2472
   * INFO: Take a look at "UTF8::htmlentities()"
2473
   *
2474
   * @link http://php.net/manual/en/function.htmlspecialchars.php
2475
   *
2476
   * @param string $str           <p>
2477
   *                              The string being converted.
2478
   *                              </p>
2479
   * @param int    $flags         [optional] <p>
2480
   *                              A bitmask of one or more of the following flags, which specify how to handle quotes,
2481
   *                              invalid code unit sequences and the used document type. The default is
2482
   *                              ENT_COMPAT | ENT_HTML401.
2483
   *                              <table>
2484
   *                              Available <i>flags</i> constants
2485
   *                              <tr valign="top">
2486
   *                              <td>Constant Name</td>
2487
   *                              <td>Description</td>
2488
   *                              </tr>
2489
   *                              <tr valign="top">
2490
   *                              <td><b>ENT_COMPAT</b></td>
2491
   *                              <td>Will convert double-quotes and leave single-quotes alone.</td>
2492
   *                              </tr>
2493
   *                              <tr valign="top">
2494
   *                              <td><b>ENT_QUOTES</b></td>
2495
   *                              <td>Will convert both double and single quotes.</td>
2496
   *                              </tr>
2497
   *                              <tr valign="top">
2498
   *                              <td><b>ENT_NOQUOTES</b></td>
2499
   *                              <td>Will leave both double and single quotes unconverted.</td>
2500
   *                              </tr>
2501
   *                              <tr valign="top">
2502
   *                              <td><b>ENT_IGNORE</b></td>
2503
   *                              <td>
2504
   *                              Silently discard invalid code unit sequences instead of returning
2505
   *                              an empty string. Using this flag is discouraged as it
2506
   *                              may have security implications.
2507
   *                              </td>
2508
   *                              </tr>
2509
   *                              <tr valign="top">
2510
   *                              <td><b>ENT_SUBSTITUTE</b></td>
2511
   *                              <td>
2512
   *                              Replace invalid code unit sequences with a Unicode Replacement Character
2513
   *                              U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
2514
   *                              </td>
2515
   *                              </tr>
2516
   *                              <tr valign="top">
2517
   *                              <td><b>ENT_DISALLOWED</b></td>
2518
   *                              <td>
2519
   *                              Replace invalid code points for the given document type with a
2520
   *                              Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD;
2521
   *                              (otherwise) instead of leaving them as is. This may be useful, for
2522
   *                              instance, to ensure the well-formedness of XML documents with
2523
   *                              embedded external content.
2524
   *                              </td>
2525
   *                              </tr>
2526
   *                              <tr valign="top">
2527
   *                              <td><b>ENT_HTML401</b></td>
2528
   *                              <td>
2529
   *                              Handle code as HTML 4.01.
2530
   *                              </td>
2531
   *                              </tr>
2532
   *                              <tr valign="top">
2533
   *                              <td><b>ENT_XML1</b></td>
2534
   *                              <td>
2535
   *                              Handle code as XML 1.
2536
   *                              </td>
2537
   *                              </tr>
2538
   *                              <tr valign="top">
2539
   *                              <td><b>ENT_XHTML</b></td>
2540
   *                              <td>
2541
   *                              Handle code as XHTML.
2542
   *                              </td>
2543
   *                              </tr>
2544
   *                              <tr valign="top">
2545
   *                              <td><b>ENT_HTML5</b></td>
2546
   *                              <td>
2547
   *                              Handle code as HTML 5.
2548
   *                              </td>
2549
   *                              </tr>
2550
   *                              </table>
2551
   *                              </p>
2552
   * @param string $encoding      [optional] <p>
2553
   *                              Defines encoding used in conversion.
2554
   *                              </p>
2555
   *                              <p>
2556
   *                              For the purposes of this function, the encodings
2557
   *                              ISO-8859-1, ISO-8859-15,
2558
   *                              UTF-8, cp866,
2559
   *                              cp1251, cp1252, and
2560
   *                              KOI8-R are effectively equivalent, provided the
2561
   *                              <i>string</i> itself is valid for the encoding, as
2562
   *                              the characters affected by <b>htmlspecialchars</b> occupy
2563
   *                              the same positions in all of these encodings.
2564
   *                              </p>
2565
   * @param bool   $double_encode [optional] <p>
2566
   *                              When <i>double_encode</i> is turned off PHP will not
2567
   *                              encode existing html entities, the default is to convert everything.
2568
   *                              </p>
2569
   *
2570
   * @return string <p>
   * The converted string.
   * </p>
   * <p>
   * If the input <i>string</i> contains an invalid code unit
   * sequence within the given <i>encoding</i>, an empty string
   * will be returned, unless either the <b>ENT_IGNORE</b> or
   * <b>ENT_SUBSTITUTE</b> flag is set.
   * </p>
2577
   */
2578 1
  public static function htmlspecialchars($str, $flags = ENT_COMPAT, $encoding = 'UTF-8', $double_encode = true)
2579
  {
2580 1
    if ($encoding !== 'UTF-8') {
2581 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
2582 1
    }
2583
2584 1
    return htmlspecialchars($str, $flags, $encoding, $double_encode);
2585
  }
2586
2587
  /**
2588
   * Checks whether iconv is available on the server.
2589
   *
2590
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
2591
   */
2592 1
  public static function iconv_loaded()
2593
  {
2594 1
    $return = extension_loaded('iconv') ? true : false;
2595
2596
    // INFO: "default_charset" is already set by the "Bootup"-class
2597
2598 1
    if ($return === true && Bootup::is_php('5.6') === false) {
2599
      // INFO: "iconv_set_encoding" is deprecated since PHP >= 5.6
2600 1
      iconv_set_encoding('input_encoding', 'UTF-8');
2601 1
      iconv_set_encoding('output_encoding', 'UTF-8');
2602 1
      iconv_set_encoding('internal_encoding', 'UTF-8');
2603 1
    }
2604
2605 1
    return $return;
2606
  }
2607
2608
  /**
2609
   * alias for "UTF8::decimal_to_chr()"
2610
   *
2611
   * @see UTF8::decimal_to_chr()
2612
   *
2613
   * @param mixed $int
2614
   *
2615
   * @return string
2616
   */
2617 2
  public static function int_to_chr($int)
2618
  {
2619 2
    return self::decimal_to_chr($int);
2620
  }
2621
2622
  /**
2623
   * Converts Integer to hexadecimal U+xxxx code point representation.
2624
   *
2625
   * INFO: opposite to UTF8::hex_to_int()
2626
   *
2627
   * @param int    $int  <p>The integer to be converted to hexadecimal code point.</p>
2628
   * @param string $pfix [optional]
2629
   *
2630
   * @return string <p>The code point, or empty string on failure.</p>
2631
   */
2632 3
  public static function int_to_hex($int, $pfix = 'U+')
2633
  {
2634 3
    if ((int)$int === $int) {
2635 3
      $hex = dechex($int);
2636
2637 3
      $hex = (strlen($hex) < 4 ? substr('0000' . $hex, -4) : $hex);
2638
2639 3
      return $pfix . $hex;
2640
    }
2641
2642 1
    return '';
2643
  }
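The padding rule above (at least four hex digits, longer values left untouched) can be sketched as a stand-alone function. The name `code_point_label()` is hypothetical; `str_pad()` is used here in place of the `substr()` expression, which should give the same result for non-negative integers.

```php
<?php
// Hypothetical stand-alone version of the padding logic shown above.
function code_point_label($int, $prefix = 'U+')
{
    if (!is_int($int)) {
        return ''; // same contract as above: empty string on failure
    }

    // dechex() yields lowercase digits; pad to at least four of them.
    return $prefix . str_pad(dechex($int), 4, '0', STR_PAD_LEFT);
}

echo code_point_label(0x41), "\n";    // U+0041
echo code_point_label(0x1F600), "\n"; // U+1f600
```

Note that the hex digits come out in lowercase, because `dechex()` returns lowercase letters.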
2644
2645
  /**
2646
   * Checks whether intl-char is available on the server.
2647
   *
2648
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
2649
   */
2650 1
  public static function intlChar_loaded()
2651
  {
2652
    return (
2653 1
        Bootup::is_php('7.0') === true
2654 1
        &&
2655
        class_exists('IntlChar') === true
2656 1
    );
2657
  }
2658
2659
  /**
2660
   * Checks whether intl is available on the server.
2661
   *
2662
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
2663
   */
2664 4
  public static function intl_loaded()
2665
  {
2666 4
    return extension_loaded('intl') ? true : false;
2667
  }
2668
2669
  /**
2670
   * alias for "UTF8::is_ascii()"
2671
   *
2672
   * @see UTF8::is_ascii()
2673
   *
2674
   * @param string $str
2675
   *
2676
   * @return boolean
2677
   *
2678
   * @deprecated <p>use "UTF8::is_ascii()"</p>
2679
   */
2680
  public static function isAscii($str)
2681
  {
2682
    return self::is_ascii($str);
2683
  }
2684
2685
  /**
2686
   * alias for "UTF8::is_base64()"
2687
   *
2688
   * @see UTF8::is_base64()
2689
   *
2690
   * @param string $str
2691
   *
2692
   * @return bool
2693
   *
2694
   * @deprecated <p>use "UTF8::is_base64()"</p>
2695
   */
2696
  public static function isBase64($str)
2697
  {
2698
    return self::is_base64($str);
2699
  }
2700
2701
  /**
2702
   * alias for "UTF8::is_binary()"
2703
   *
2704
   * @see UTF8::is_binary()
2705
   *
2706
   * @param string $str
2707
   *
2708
   * @return bool
2709
   *
2710
   * @deprecated <p>use "UTF8::is_binary()"</p>
2711
   */
2712
  public static function isBinary($str)
2713
  {
2714
    return self::is_binary($str);
2715
  }
2716
2717
  /**
2718
   * alias for "UTF8::is_bom()"
2719
   *
2720
   * @see UTF8::is_bom()
2721
   *
2722
   * @param string $utf8_chr
2723
   *
2724
   * @return boolean
2725
   *
2726
   * @deprecated <p>use "UTF8::is_bom()"</p>
2727
   */
2728
  public static function isBom($utf8_chr)
2729
  {
2730
    return self::is_bom($utf8_chr);
2731
  }
2732
2733
  /**
2734
   * alias for "UTF8::is_html()"
2735
   *
2736
   * @see UTF8::is_html()
2737
   *
2738
   * @param string $str
2739
   *
2740
   * @return boolean
2741
   *
2742
   * @deprecated <p>use "UTF8::is_html()"</p>
2743
   */
2744
  public static function isHtml($str)
2745
  {
2746
    return self::is_html($str);
2747
  }
2748
2749
  /**
2750
   * alias for "UTF8::is_json()"
2751
   *
2752
   * @see UTF8::is_json()
2753
   *
2754
   * @param string $str
2755
   *
2756
   * @return bool
2757
   *
2758
   * @deprecated <p>use "UTF8::is_json()"</p>
2759
   */
2760
  public static function isJson($str)
2761
  {
2762
    return self::is_json($str);
2763
  }
2764
2765
  /**
2766
   * alias for "UTF8::is_utf16()"
2767
   *
2768
   * @see UTF8::is_utf16()
2769
   *
2770
   * @param string $str
2771
   *
2772
   * @return int|false false if it's not UTF-16, 1 for UTF-16LE, 2 for UTF-16BE.
2773
   *
2774
   * @deprecated <p>use "UTF8::is_utf16()"</p>
2775
   */
2776
  public static function isUtf16($str)
2777
  {
2778
    return self::is_utf16($str);
2779
  }
2780
2781
  /**
2782
   * alias for "UTF8::is_utf32()"
2783
   *
2784
   * @see UTF8::is_utf32()
2785
   *
2786
   * @param string $str
2787
   *
2788
   * @return int|false false if it's not UTF-32, 1 for UTF-32LE, 2 for UTF-32BE.
2789
   *
2790
   * @deprecated <p>use "UTF8::is_utf32()"</p>
2791
   */
2792
  public static function isUtf32($str)
2793
  {
2794
    return self::is_utf32($str);
2795
  }
2796
2797
  /**
2798
   * alias for "UTF8::is_utf8()"
2799
   *
2800
   * @see UTF8::is_utf8()
2801
   *
2802
   * @param string $str
2803
   * @param bool   $strict
2804
   *
2805
   * @return bool
2806
   *
2807
   * @deprecated <p>use "UTF8::is_utf8()"</p>
2808
   */
2809
  public static function isUtf8($str, $strict = false)
2810
  {
2811
    return self::is_utf8($str, $strict);
2812
  }
2813
2814
  /**
2815
   * Checks if a string is 7 bit ASCII.
2816
   *
2817
   * @param string $str <p>The string to check.</p>
2818
   *
2819
   * @return bool <p>
2820
   *              <strong>true</strong> if it is ASCII<br>
2821
   *              <strong>false</strong> otherwise
2822
   *              </p>
2823
   */
2824 55
  public static function is_ascii($str)
2825
  {
2826 55
    $str = (string)$str;
2827
2828 55
    if (!isset($str[0])) {
2829 6
      return true;
2830
    }
2831
2832 54
    return (bool)!preg_match('/[^\x09\x10\x13\x0A\x0D\x20-\x7E]/', $str);
2833
  }
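The character class used above accepts tab, LF, CR, the printable range 0x20-0x7E, and the bytes 0x10 and 0x13. It rejects other control characters such as NUL, although they are 7-bit values. The sketch below contrasts that class with a stricter reading of "7 bit ASCII"; both function names are hypothetical.

```php
<?php
// Same character class as the method above.
function matches_library_class($str)
{
    return !preg_match('/[^\x09\x10\x13\x0A\x0D\x20-\x7E]/', $str);
}

// Hypothetical stricter reading of "7 bit ASCII": every byte is 0x00-0x7F.
function is_seven_bit($str)
{
    return !preg_match('/[^\x00-\x7F]/', $str);
}

var_dump(matches_library_class("plain text\n")); // bool(true)
var_dump(matches_library_class("caf\xC3\xA9"));  // bool(false)
var_dump(matches_library_class("a\x00b"));       // bool(false)
var_dump(is_seven_bit("a\x00b"));                // bool(true)
```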
2834
2835
  /**
2836
   * Returns true if the string is base64 encoded, false otherwise.
2837
   *
2838
   * @param string $str <p>The input string.</p>
2839
   *
2840
   * @return bool <p>Whether or not $str is base64 encoded.</p>
2841
   */
2842 1
  public static function is_base64($str)
2843
  {
2844 1
    $str = (string)$str;
2845
2846 1
    if (!isset($str[0])) {
2847 1
      return false;
2848
    }
2849
2850 1
    $base64String = (string)base64_decode($str, true);
2851 1
    if ($base64String !== '' && base64_encode($base64String) === $str) {
2852 1
      return true;
2853
    }
2854
2855 1
    return false;
2856
  }
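The round-trip idea behind this check (decode strictly, re-encode, compare with the input) can be sketched as follows. The function name is hypothetical. The sketch compares the decoded value against `false` and the empty string explicitly, so a payload that decodes to `"0"` is still accepted.

```php
<?php
// Hypothetical stand-alone version of the round-trip check above.
function round_trips_as_base64($str)
{
    if ($str === '') {
        return false;
    }

    // Strict mode returns false when the input holds non-alphabet characters.
    $decoded = base64_decode($str, true);
    if ($decoded === false || $decoded === '') {
        return false;
    }

    // Re-encoding must reproduce the input exactly.
    return base64_encode($decoded) === $str;
}

var_dump(round_trips_as_base64('aGVsbG8='));    // encoded "hello"
var_dump(round_trips_as_base64('not base64!')); // contains non-alphabet characters
```

A round-trip check like this is only a heuristic: short alphanumeric words whose length is a multiple of four can pass it without ever having been encoded.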
2857
2858
  /**
2859
   * Check if the input is binary (this looks like a hack).
2860
   *
2861
   * @param mixed $input
2862
   *
2863
   * @return bool
2864
   */
2865 16
  public static function is_binary($input)
2866
  {
2867 16
    $input = (string)$input;
2868
2869 16
    if (!isset($input[0])) {
2870 4
      return false;
2871
    }
2872
2873 16
    if (preg_match('~^[01]+$~', $input)) {
2874 4
      return true;
2875
    }
2876
2877 16
    $testLength = strlen($input);
2878 16
    if ($testLength && substr_count($input, "\x0") / $testLength > 0.3) {
2879 5
      return true;
2880
    }
2881
2882 15
    if (substr_count($input, "\x00") > 0) {
2883 1
      return true;
2884
    }
2885
2886 15
    return false;
2887
  }
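The heuristic above comes down to two signals: a string made only of the characters `0` and `1`, or the presence of NUL bytes. The 30% ratio test is subsumed by the later "any NUL byte" test, so this sketch merges them into one check. The function name is hypothetical.

```php
<?php
// Hypothetical stand-alone version of the heuristic above.
function looks_binary($input)
{
    $input = (string)$input;
    if ($input === '') {
        return false;
    }

    // A string made only of "0" and "1" is treated as a bit string.
    if (preg_match('~^[01]+$~', $input)) {
        return true;
    }

    // Any NUL byte marks the input as binary.
    return strpos($input, "\x00") !== false;
}

var_dump(looks_binary('0101'));    // bit string
var_dump(looks_binary('hello'));   // plain text
var_dump(looks_binary("a\x00b"));  // contains a NUL byte
```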
2888
2889
  /**
2890
   * Check if the file is binary.
2891
   *
2892
   * @param string $file
2893
   *
2894
   * @return boolean
2895
   */
2896 1
  public static function is_binary_file($file)
2897
  {
2898
    // fopen() reports failure by returning false (with a warning), not by
    // throwing, so check the handle instead of relying on try/catch.
    $fp = fopen($file, 'rb');
    if ($fp === false) {
      return false;
    }

    $block = fread($fp, 512);
    fclose($fp);

    return self::is_binary((string)$block);
2907
  }
2908
2909
  /**
2910
   * Checks if the given string is equal to any "Byte Order Mark".
2911
   *
2912
   * WARNING: Use "UTF8::string_has_bom()" if you want to check whether a string contains a BOM.
2913
   *
2914
   * @param string $str <p>The input string.</p>
2915
   *
2916
   * @return bool <p><strong>true</strong> if $str is a Byte Order Mark, <strong>false</strong> otherwise.</p>
2917
   */
2918 1
  public static function is_bom($str)
2919
  {
2920 1
    foreach (self::$BOM as $bomString => $bomByteLength) {
2921 1
      if ($str === $bomString) {
2922 1
        return true;
2923
      }
2924 1
    }
2925
2926 1
    return false;
2927
  }
2928
2929
  /**
2930
   * Check if the string contains any HTML tags, e.g. "<lall>".
2931
   *
2932
   * @param string $str <p>The input string.</p>
2933
   *
2934
   * @return boolean
2935
   */
2936 1
  public static function is_html($str)
2937
  {
2938 1
    $str = (string)$str;
2939
2940 1
    if (!isset($str[0])) {
2941 1
      return false;
2942
    }
2943
2944
    // init
2945 1
    $matches = array();
2946
2947 1
    preg_match("/<\/?\w+(?:(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)*+\s*|\s*)\/?>/", $str, $matches);
2948
2949 1
    if (count($matches) === 0) {
2950 1
      return false;
2951
    }
2952
2953 1
    return true;
2954
  }
2955
2956
  /**
2957
   * Try to check if "$str" is a JSON string.
2958
   *
2959
   * @param string $str <p>The input string.</p>
2960
   *
2961
   * @return bool
2962
   */
2963 1
  public static function is_json($str)
2964
  {
2965 1
    $str = (string)$str;
2966
2967 1
    if (!isset($str[0])) {
2968 1
      return false;
2969
    }
2970
2971 1
    $json = self::json_decode($str);
2972
2973
    if (
2974
        (
2975 1
            is_object($json) === true
2976 1
            ||
2977 1
            is_array($json) === true
2978 1
        )
2979 1
        &&
2980 1
        json_last_error() === JSON_ERROR_NONE
2981 1
    ) {
2982 1
      return true;
2983
    }
2984
2985 1
    return false;
2986
  }
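The rule above accepts only input that decodes to an object or an array without a decoder error, so bare scalars are rejected even though PHP can decode them. A minimal sketch follows, assuming the native `json_decode()` in place of the class's own wrapper; the function name is hypothetical.

```php
<?php
// Hypothetical stand-alone version built on the native json_decode().
function decodes_to_structure($str)
{
    if ($str === '') {
        return false;
    }

    $json = json_decode($str);

    // Scalars such as 42 or "x" decode without error but are rejected here.
    return (is_object($json) || is_array($json))
        && json_last_error() === JSON_ERROR_NONE;
}

var_dump(decodes_to_structure('{"a":1}')); // object
var_dump(decodes_to_structure('[1,2]'));   // array
var_dump(decodes_to_structure('42'));      // scalar
var_dump(decodes_to_structure('{bad'));    // syntax error
```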
2987
2988
  /**
2989
   * Check if the string is UTF-16.
2990
   *
2991
   * @param string $str <p>The input string.</p>
2992
   *
2993
   * @return int|false <p>
2994
   *                   <strong>false</strong> if it's not UTF-16,<br>
2995
   *                   <strong>1</strong> for UTF-16LE,<br>
2996
   *                   <strong>2</strong> for UTF-16BE.
2997
   *                   </p>
2998
   */
  public static function is_utf16($str)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

3000
  {
3001 5
    $str = self::remove_bom($str);
3002
3003 5
    if (self::is_binary($str) === true) {
3004
3005 5
      $maybeUTF16LE = 0;
3006 5
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16LE');
3007 5
      if ($test) {
3008 5
        $test2 = \mb_convert_encoding($test, 'UTF-16LE', 'UTF-8');
3009 5
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16LE');
3010 5
        if ($test3 === $test) {
3011 5
          $strChars = self::count_chars($str, true);
3012 5
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
3013 4
            if (in_array($test3char, $strChars, true) === true) {
3014 2
              $maybeUTF16LE++;
3015 2
            }
3016 5
          }
3017 5
        }
3018 5
      }
3019
3020 5
      $maybeUTF16BE = 0;
3021 5
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-16BE');
3022 5
      if ($test) {
3023 5
        $test2 = \mb_convert_encoding($test, 'UTF-16BE', 'UTF-8');
3024 5
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-16BE');
3025 5
        if ($test3 === $test) {
3026 5
          $strChars = self::count_chars($str, true);
3027 5
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
3028 4
            if (in_array($test3char, $strChars, true) === true) {
3029 3
              $maybeUTF16BE++;
3030 3
            }
3031 5
          }
3032 5
        }
3033 5
      }
3034
3035 5
      if ($maybeUTF16BE !== $maybeUTF16LE) {
3036 3
        if ($maybeUTF16LE > $maybeUTF16BE) {
3037 2
          return 1;
3038
        }
3039
3040 3
        return 2;
3041
      }
3042
3043 3
    }
3044
3045 3
    return false;
3046
  }
3047
3048
  /**
3049
   * Check if the string is UTF-32.
3050
   *
3051
   * @param string $str
3052
   *
3053
   * @return int|false <p>
3054
   *                   <strong>false</strong> if it's not UTF-32,<br>
3055
   *                   <strong>1</strong> for UTF-32LE,<br>
3056
   *                   <strong>2</strong> for UTF-32BE.
3057
   *                   </p>
3058
   */
  public static function is_utf32($str)
Duplication introduced by
This method seems to be duplicated in your project.
3060
  {
3061 3
    $str = self::remove_bom($str);
3062
3063 3
    if (self::is_binary($str) === true) {
3064
3065 3
      $maybeUTF32LE = 0;
3066 3
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32LE');
3067 3
      if ($test) {
3068 2
        $test2 = \mb_convert_encoding($test, 'UTF-32LE', 'UTF-8');
3069 2
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32LE');
3070 2
        if ($test3 === $test) {
3071 2
          $strChars = self::count_chars($str, true);
3072 2
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
3073 2
            if (in_array($test3char, $strChars, true) === true) {
3074 1
              $maybeUTF32LE++;
3075 1
            }
3076 2
          }
3077 2
        }
3078 2
      }
3079
3080 3
      $maybeUTF32BE = 0;
3081 3
      $test = \mb_convert_encoding($str, 'UTF-8', 'UTF-32BE');
3082 3
      if ($test) {
3083 2
        $test2 = \mb_convert_encoding($test, 'UTF-32BE', 'UTF-8');
3084 2
        $test3 = \mb_convert_encoding($test2, 'UTF-8', 'UTF-32BE');
3085 2
        if ($test3 === $test) {
3086 2
          $strChars = self::count_chars($str, true);
3087 2
          foreach (self::count_chars($test3, true) as $test3char => $test3charEmpty) {
3088 2
            if (in_array($test3char, $strChars, true) === true) {
3089 1
              $maybeUTF32BE++;
3090 1
            }
3091 2
          }
3092 2
        }
3093 2
      }
3094
3095 3
      if ($maybeUTF32BE !== $maybeUTF32LE) {
3096 1
        if ($maybeUTF32LE > $maybeUTF32BE) {
3097 1
          return 1;
3098
        }
3099
3100 1
        return 2;
3101
      }
3102
3103 3
    }
3104
3105 3
    return false;
3106
  }
3107
3108
  /**
3109
   * Checks whether the passed string contains only byte sequences that appear to be valid UTF-8 characters.
3110
   *
3111
   * @see    http://hsivonen.iki.fi/php-utf8/
3112
   *
3113
   * @param string $str    <p>The string to be checked.</p>
3114
   * @param bool   $strict <p>Check also if the string is not UTF-16 or UTF-32.</p>
3115
   *
3116
   * @return bool
3117
   */
3118 60
  public static function is_utf8($str, $strict = false)
3119
  {
3120 60
    $str = (string)$str;
3121
3122 60
    if (!isset($str[0])) {
3123 3
      return true;
3124
    }
3125
3126 58
    if ($strict === true) {
3127 1
      if (self::is_utf16($str) !== false) {
3128 1
        return false;
3129
      }
3130
3131
      if (self::is_utf32($str) !== false) {
3132
        return false;
3133
      }
3134
    }
3135
3136 58
    if (self::pcre_utf8_support() !== true) {
3137
3138
      // If even just the first character can be matched, when the /u
3139
      // modifier is used, then it's valid UTF-8. If the UTF-8 is somehow
3140
      // invalid, nothing at all will match, even if the string contains
3141
      // some valid sequences
3142
      return (preg_match('/^.{1}/us', $str, $ar) === 1);
3143
    }
3144
3145 58
    $mState = 0; // cached expected number of octets after the current octet
3146
    // until the beginning of the next UTF8 character sequence
3147 58
    $mUcs4 = 0; // cached Unicode character
3148 58
    $mBytes = 1; // cached expected number of octets in the current sequence
3149
3150 58
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
3151
      self::checkForSupport();
3152
    }
3153
    if (self::$SUPPORT['mbstring_func_overload'] === true) {
3155
      $len = \mb_strlen($str, '8BIT');
3156
    } else {
3157 58
      $len = strlen($str);
3158
    }
3159
3160
    /** @noinspection ForeachInvariantsInspection */
3161 58
    for ($i = 0; $i < $len; $i++) {
3162 58
      $in = ord($str[$i]);
3163 58
      if ($mState === 0) {
3164
        // When mState is zero we expect either a US-ASCII character or a
3165
        // multi-octet sequence.
3166 58
        if (0 === (0x80 & $in)) {
3167
          // US-ASCII, pass straight through.
3168 52
          $mBytes = 1;
        } elseif (0xC0 === (0xE0 & $in)) {
3170
          // First octet of 2 octet sequence.
3171 48
          $mUcs4 = $in;
3172 48
          $mUcs4 = ($mUcs4 & 0x1F) << 6;
3173 48
          $mState = 1;
3174 48
          $mBytes = 2;
3175 55
        } elseif (0xE0 === (0xF0 & $in)) {
3176
          // First octet of 3 octet sequence.
3177 29
          $mUcs4 = $in;
3178 29
          $mUcs4 = ($mUcs4 & 0x0F) << 12;
3179 29
          $mState = 2;
3180 29
          $mBytes = 3;
        } elseif (0xF0 === (0xF8 & $in)) {
3182
          // First octet of 4 octet sequence.
3183 11
          $mUcs4 = $in;
3184 11
          $mUcs4 = ($mUcs4 & 0x07) << 18;
3185 11
          $mState = 3;
3186 11
          $mBytes = 4;
3187 22
        } elseif (0xF8 === (0xFC & $in)) {
3188
          /* First octet of 5 octet sequence.
3189
          *
3190
          * This is illegal because the encoded codepoint must be either
3191
          * (a) not the shortest form or
3192
          * (b) outside the Unicode range of 0-0x10FFFF.
3193
          * Rather than trying to resynchronize, we will carry on until the end
3194
          * of the sequence and let the later error handling code catch it.
3195
          */
3196 4
          $mUcs4 = $in;
3197 4
          $mUcs4 = ($mUcs4 & 0x03) << 24;
3198 4
          $mState = 4;
3199 4
          $mBytes = 5;
        } elseif (0xFC === (0xFE & $in)) {
3201
          // First octet of 6 octet sequence, see comments for 5 octet sequence.
3202 4
          $mUcs4 = $in;
3203 4
          $mUcs4 = ($mUcs4 & 1) << 30;
3204 4
          $mState = 5;
3205 4
          $mBytes = 6;
3206 4
        } else {
3207
          /* Current octet is neither in the US-ASCII range nor a legal first
3208
           * octet of a multi-octet sequence.
3209
           */
3210 6
          return false;
3211
        }
3212 57
      } else {
3213
        // When mState is non-zero, we expect a continuation of the multi-octet
3214
        // sequence
3215 52
        if (0x80 === (0xC0 & $in)) {
3216
          // Legal continuation.
3217 48
          $shift = ($mState - 1) * 6;
3218 48
          $tmp = $in;
3219 48
          $tmp = ($tmp & 0x0000003F) << $shift;
3220 48
          $mUcs4 |= $tmp;
3221
          /**
3222
           * End of the multi-octet sequence. mUcs4 now contains the final
3223
           * Unicode code point to be output
3224
           */
3225 48
          if (0 === --$mState) {
3226
            /*
3227
            * Check for illegal sequences and code points.
3228
            */
3229
            // From Unicode 3.1, non-shortest form is illegal
3230
            if (
3231 48
                (2 === $mBytes && $mUcs4 < 0x0080) ||
3232 48
                (3 === $mBytes && $mUcs4 < 0x0800) ||
3233 48
                (4 === $mBytes && $mUcs4 < 0x10000) ||
3234 48
                (4 < $mBytes) ||
3235
                // From Unicode 3.2, surrogate characters are illegal.
3236 48
                (($mUcs4 & 0xFFFFF800) === 0xD800) ||
3237
                // Code points outside the Unicode range are illegal.
3238 48
                ($mUcs4 > 0x10FFFF)
3239 48
            ) {
3240 7
              return false;
3241
            }
3242
            // initialize UTF8 cache
3243 48
            $mState = 0;
3244 48
            $mUcs4 = 0;
3245 48
            $mBytes = 1;
3246 48
          }
3247 48
        } else {
3248
          /**
3249
           *((0xC0 & (*in) != 0x80) && (mState != 0))
3250
           * Incomplete multi-octet sequence.
3251
           */
3252 26
          return false;
3253
        }
3254
      }
3255 57
    }
3256
3257 27
    return true;
3258
  }
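The state machine above rejects overlong encodings, surrogate code points, values above U+10FFFF, and truncated sequences. A short way to observe the same kinds of rejection is PCRE's own validity check under the `/u` modifier. This is a swapped-in technique for illustration, not the algorithm used by the method, and it assumes a PCRE build with UTF-8 support. The function name is hypothetical.

```php
<?php
// Swapped-in technique: PCRE validates the whole subject when /u is set, and
// preg_match() returns false (not 0) if the subject is malformed UTF-8.
function pcre_accepts_utf8($str)
{
    return preg_match('//u', $str) === 1;
}

var_dump(pcre_accepts_utf8("gr\xC3\xBC\xC3\x9F")); // valid two-byte sequences
var_dump(pcre_accepts_utf8("\xC0\x80"));           // overlong encoding of U+0000
var_dump(pcre_accepts_utf8("\xED\xA0\x80"));       // surrogate U+D800
var_dump(pcre_accepts_utf8("\xF4\x90\x80\x80"));   // above U+10FFFF
var_dump(pcre_accepts_utf8("\xC3"));               // truncated sequence
```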
3259
3260
  /**
3261
   * (PHP 5 &gt;= 5.2.0, PECL json &gt;= 1.2.0)<br/>
3262
   * Decodes a JSON string
3263
   *
3264
   * @link http://php.net/manual/en/function.json-decode.php
3265
   *
3266
   * @param string $json    <p>
3267
   *                        The <i>json</i> string being decoded.
3268
   *                        </p>
3269
   *                        <p>
3270
   *                        This function only works with UTF-8 encoded strings.
3271
   *                        </p>
3272
   *                        <p>PHP implements a superset of
3273
   *                        JSON - it will also encode and decode scalar types and <b>NULL</b>. The JSON standard
3274
   *                        only supports these values when they are nested inside an array or an object.
3275
   *                        </p>
3276
   * @param bool   $assoc   [optional] <p>
3277
   *                        When <b>TRUE</b>, returned objects will be converted into
3278
   *                        associative arrays.
3279
   *                        </p>
3280
   * @param int    $depth   [optional] <p>
3281
   *                        User specified recursion depth.
3282
   *                        </p>
3283
   * @param int    $options [optional] <p>
3284
   *                        Bitmask of JSON decode options. Currently only
3285
   *                        <b>JSON_BIGINT_AS_STRING</b>
3286
   *                        is supported (default is to cast large integers as floats)
3287
   *                        </p>
3288
   *
3289
   * @return mixed the value encoded in <i>json</i> in appropriate
3290
   * PHP type. Values true, false and
3291
   * null (case-insensitive) are returned as <b>TRUE</b>, <b>FALSE</b>
3292
   * and <b>NULL</b> respectively. <b>NULL</b> is returned if the
3293
   * <i>json</i> cannot be decoded or if the encoded
3294
   * data is deeper than the recursion limit.
3295
   */
3296 2 View Code Duplication
  public static function json_decode($json, $assoc = false, $depth = 512, $options = 0)
3297
  {
3298 2
    $json = (string)self::filter($json);
3299
3300 2
    if (Bootup::is_php('5.4') === true) {
3301
      $json = json_decode($json, $assoc, $depth, $options);
3302
    } else {
3303 2
      $json = json_decode($json, $assoc, $depth);
3304
    }
3305
3306 2
    return $json;
3307
  }
3308
3309
  /**
3310
   * (PHP 5 &gt;= 5.2.0, PECL json &gt;= 1.2.0)<br/>
3311
   * Returns the JSON representation of a value.
3312
   *
3313
   * @link http://php.net/manual/en/function.json-encode.php
3314
   *
3315
   * @param mixed $value   <p>
3316
   *                       The <i>value</i> being encoded. Can be any type except
3317
   *                       a resource.
3318
   *                       </p>
3319
   *                       <p>
3320
   *                       All string data must be UTF-8 encoded.
3321
   *                       </p>
3322
   *                       <p>PHP implements a superset of
3323
   *                       JSON - it will also encode and decode scalar types and <b>NULL</b>. The JSON standard
3324
   *                       only supports these values when they are nested inside an array or an object.
3325
   *                       </p>
3326
   * @param int   $options [optional] <p>
3327
   *                       Bitmask consisting of <b>JSON_HEX_QUOT</b>,
3328
   *                       <b>JSON_HEX_TAG</b>,
3329
   *                       <b>JSON_HEX_AMP</b>,
3330
   *                       <b>JSON_HEX_APOS</b>,
3331
   *                       <b>JSON_NUMERIC_CHECK</b>,
3332
   *                       <b>JSON_PRETTY_PRINT</b>,
3333
   *                       <b>JSON_UNESCAPED_SLASHES</b>,
3334
   *                       <b>JSON_FORCE_OBJECT</b>,
3335
   *                       <b>JSON_UNESCAPED_UNICODE</b>. The behaviour of these
3336
   *                       constants is described on
3337
   *                       the JSON constants page.
3338
   *                       </p>
3339
   * @param int   $depth   [optional] <p>
3340
   *                       Set the maximum depth. Must be greater than zero.
3341
   *                       </p>
3342
   *
3343
   * @return string a JSON encoded string on success or <b>FALSE</b> on failure.
3344
   */
3345 2 View Code Duplication
  public static function json_encode($value, $options = 0, $depth = 512)
3346
  {
3347 2
    $value = self::filter($value);
3348
3349 2
    if (Bootup::is_php('5.5') === true) {
3350
      $json = json_encode($value, $options, $depth);
3351
    } else {
3352 2
      $json = json_encode($value, $options);
3353
    }
3354
3355 2
    return $json;
3356
  }
3357
3358
  /**
3359
   * Makes string's first char lowercase.
3360
   *
3361
   * @param string $str <p>The input string</p>
3362
   * @param string  $encoding  [optional] <p>Set the charset.</p>
3363
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
3364
   *
3365
   * @return string <p>The resulting string</p>
3366
   */
3367 7
  public static function lcfirst($str, $encoding = 'UTF-8', $cleanUtf8 = false)
3368
  {
3369 7
    $strPartTwo = self::substr($str, 1, null, $encoding, $cleanUtf8);
3370 7
    if ($strPartTwo === false) {
3371
      $strPartTwo = '';
3372
    }
3373
3374 7
    $strPartOne = self::strtolower(
3375 7
        (string)self::substr($str, 0, 1, $encoding, $cleanUtf8),
3376 7
        $encoding,
3377
        $cleanUtf8
3378 7
    );
3379
3380 7
    return $strPartOne . $strPartTwo;
3381
  }
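The same split-lowercase-rejoin idea can be sketched outside the class. This is only an illustration, assuming the mbstring extension is loaded; `lcfirst_sketch` is a hypothetical name and it skips the `$cleanUtf8` handling of the method above.

```php
<?php
// Hypothetical stand-alone equivalent; assumes the mbstring extension.
function lcfirst_sketch(string $str, string $encoding = 'UTF-8'): string
{
    // First character and the remainder, counted in characters, not bytes.
    $first = (string) mb_substr($str, 0, 1, $encoding);
    $rest = (string) mb_substr($str, 1, null, $encoding);

    return mb_strtolower($first, $encoding) . $rest;
}
```

Unlike the native `lcfirst()`, which works on single bytes, this lowercases a multi-byte first character such as `Ä`.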
3382
3383
  /**
3384
   * alias for "UTF8::lcfirst()"
3385
   *
3386
   * @see UTF8::lcfirst()
3387
   *
3388
   * @param string  $word
3389
   * @param string  $encoding
3390
   * @param boolean $cleanUtf8
3391
   *
3392
   * @return string
3393
   */
3394 1
  public static function lcword($word, $encoding = 'UTF-8', $cleanUtf8 = false)
3395
  {
3396 1
    return self::lcfirst($word, $encoding, $cleanUtf8);
3397
  }
3398
3399
  /**
3400
   * Lowercase for all words in the string.
3401
   *
3402
   * @param string   $str        <p>The input string.</p>
3403
   * @param string[] $exceptions [optional] <p>Exclusion for some words.</p>
3404
   * @param string   $charlist   [optional] <p>Additional chars that belong to words and do not start a new word.</p>
3405
   * @param string   $encoding   [optional] <p>Set the charset.</p>
3406
   * @param boolean  $cleanUtf8  [optional] <p>Remove non UTF-8 chars from the string.</p>
3407
   *
3408
   * @return string
3409
   */
3410 1
  public static function lcwords($str, $exceptions = array(), $charlist = '', $encoding = 'UTF-8', $cleanUtf8 = false)
3411
  {
3412 1
    if (!$str) {
3413 1
      return '';
3414
    }
3415
3416 1
    $words = self::str_to_words($str, $charlist);
3417 1
    $newWords = array();
3418
3419 1
    if (count($exceptions) > 0) {
3420 1
      $useExceptions = true;
3421 1
    } else {
3422 1
      $useExceptions = false;
3423
    }
3424
3425 1 View Code Duplication
    foreach ($words as $word) {
3426
3427 1
      if (!$word) {
3428 1
        continue;
3429
      }
3430
3431
      if (
3432
          $useExceptions === false
3433 1
          ||
3434
          (
3435
              $useExceptions === true
3436 1
              &&
3437 1
              !in_array($word, $exceptions, true)
3438 1
          )
3439 1
      ) {
3440 1
        $word = self::lcfirst($word, $encoding, $cleanUtf8);
3441 1
      }
3442
3443 1
      $newWords[] = $word;
3444 1
    }
3445
3446 1
    return implode('', $newWords);
3447
  }
3448
3449
  /**
3450
   * Strip whitespace or other characters from beginning of a UTF-8 string.
3451
   *
3452
   * @param string $str   <p>The string to be trimmed</p>
3453
   * @param string $chars <p>Optional characters to be stripped</p>
3454
   *
3455
   * @return string <p>The string with unwanted characters stripped from the left.</p>
3456
   */
3457 24 View Code Duplication
  public static function ltrim($str = '', $chars = INF)
3458
  {
3459 24
    $str = (string)$str;
3460
3461 24
    if (!isset($str[0])) {
3462 2
      return '';
3463
    }
3464
3465
    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
3466 23
    if ($chars === INF || !$chars) {
3467 2
      return preg_replace('/^[\pZ\pC]+/u', '', $str);
3468
    }
3469
3470 23
    return preg_replace('/^' . self::rxClass($chars) . '+/u', '', $str);
3471
  }
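A minimal sketch of the default branch (no `$chars` given) may help to show what the `\pZ` and `\pC` classes buy over the native `ltrim()`. The helper name `ltrim_unicode_sketch` is hypothetical, and PCRE with Unicode support is assumed.

```php
<?php
// Hypothetical helper; only the default branch (no $chars given) is sketched.
function ltrim_unicode_sketch(string $str): string
{
    // \pZ = Unicode separators (all kinds of spaces),
    // \pC = control, format and other invisible characters.
    return (string) preg_replace('/^[\pZ\pC]+/u', '', $str);
}
```

The native `ltrim()` leaves a leading no-break space (U+00A0) untouched, whereas the Unicode-aware pattern removes it.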
3472
3473
  /**
3474
   * Returns the UTF-8 character with the maximum code point in the given data.
3475
   *
3476
   * @param mixed $arg <p>A UTF-8 encoded string or an array of such strings.</p>
3477
   *
3478
   * @return string <p>The character with the highest code point.</p>
3479
   */
3480 1 View Code Duplication
  public static function max($arg)
3481
  {
3482 1
    if (is_array($arg) === true) {
3483 1
      $arg = implode('', $arg);
3484 1
    }
3485
3486 1
    return self::chr(max(self::codepoints($arg)));
3487
  }
3488
3489
  /**
3490
   * Calculates and returns the maximum number of bytes taken by any
3491
   * UTF-8 encoded character in the given string.
3492
   *
3493
   * @param string $str <p>The original Unicode string.</p>
3494
   *
3495
   * @return int <p>The maximum byte length among the given characters.</p>
3496
   */
3497 1
  public static function max_chr_width($str)
3498
  {
3499 1
    $bytes = self::chr_size_list($str);
3500 1
    if (count($bytes) > 0) {
3501 1
      return (int)max($bytes);
3502
    }
3503
3504 1
    return 0;
3505
  }
3506
3507
  /**
3508
   * Checks whether mbstring is available on the server.
3509
   *
3510
   * @return bool <p><strong>true</strong> if available, <strong>false</strong> otherwise.</p>
3511
   */
3512 12
  public static function mbstring_loaded()
3513
  {
3514 12
    $return = extension_loaded('mbstring');
3515
3516 12
    if ($return === true) {
3517 12
      \mb_internal_encoding('UTF-8');
3518 12
    }
3519
3520 12
    return $return;
3521
  }
3522
  /**
   * Checks whether mbstring function overloading is enabled for string functions.
   *
   * @return bool <p><strong>true</strong> if "mbstring.func_overload" includes MB_OVERLOAD_STRING, <strong>false</strong> otherwise.</p>
   */
3523 1
  private static function mbstring_overloaded()
3524
  {
3525
    if (
3526 1
        defined('MB_OVERLOAD_STRING')
3527 1
        &&
3528 1
        ini_get('mbstring.func_overload') & MB_OVERLOAD_STRING
3529 1
    ) {
3530
      return true;
3531
    }
3532
3533 1
    return false;
3534
  }
3535
3536
  /**
3537
   * Returns the UTF-8 character with the minimum code point in the given data.
3538
   *
3539
   * @param mixed $arg <strong>A UTF-8 encoded string or an array of such strings.</strong>
3540
   *
3541
   * @return string <p>The character with the lowest code point.</p>
3542
   */
3543 1 View Code Duplication
  public static function min($arg)
3544
  {
3545 1
    if (is_array($arg) === true) {
3546 1
      $arg = implode('', $arg);
3547 1
    }
3548
3549 1
    return self::chr(min(self::codepoints($arg)));
3550
  }
3551
3552
  /**
3553
   * alias for "UTF8::normalize_encoding()"
3554
   *
3555
   * @see UTF8::normalize_encoding()
3556
   *
3557
   * @param string $encoding
3558
   * @param mixed  $fallback
3559
   *
3560
   * @return string
3561
   *
3562
   * @deprecated <p>use "UTF8::normalize_encoding()"</p>
3563
   */
3564
  public static function normalizeEncoding($encoding, $fallback = false)
3565
  {
3566
    return self::normalize_encoding($encoding, $fallback);
3567
  }
3568
3569
  /**
3570
   * Normalize the encoding-"name" input.
3571
   *
3572
   * @param string $encoding <p>e.g.: ISO, UTF8, WINDOWS-1251 etc.</p>
3573
   * @param mixed  $fallback <p>e.g.: UTF-8</p>
3574
   *
3575
   * @return string|mixed <p>e.g.: ISO-8859-1, UTF-8, WINDOWS-1251 etc.; $fallback is returned if $encoding is empty.</p>
3576
   */
3577 80
  public static function normalize_encoding($encoding, $fallback = false)
3578
  {
3579 80
    static $STATIC_NORMALIZE_ENCODING_CACHE = array();
3580
3581 80
    if (!$encoding) {
3582 3
      return $fallback;
3583
    }
3584
3585 79
    if ('UTF-8' === $encoding) {
3586 1
      return $encoding;
3587
    }
3588
3589 79
    if (in_array($encoding, self::$ICONV_ENCODING, true)) {
3590 7
      return $encoding;
3591
    }
3592
3593 78
    if (isset($STATIC_NORMALIZE_ENCODING_CACHE[$encoding])) {
3594 77
      return $STATIC_NORMALIZE_ENCODING_CACHE[$encoding];
3595
    }
3596
3597 5
    $encodingOrig = $encoding;
3598 5
    $encoding = strtoupper($encoding);
3599 5
    $encodingUpperHelper = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);
3600
3601
    $equivalences = array(
3602 5
        'ISO8859'     => 'ISO-8859-1',
3603 5
        'ISO88591'    => 'ISO-8859-1',
3604 5
        'ISO'         => 'ISO-8859-1',
3605 5
        'LATIN'       => 'ISO-8859-1',
3606 5
        'LATIN1'      => 'ISO-8859-1', // Western European
3607 5
        'ISO88592'    => 'ISO-8859-2',
3608 5
        'LATIN2'      => 'ISO-8859-2', // Central European
3609 5
        'ISO88593'    => 'ISO-8859-3',
3610 5
        'LATIN3'      => 'ISO-8859-3', // Southern European
3611 5
        'ISO88594'    => 'ISO-8859-4',
3612 5
        'LATIN4'      => 'ISO-8859-4', // Northern European
3613 5
        'ISO88595'    => 'ISO-8859-5', // Cyrillic
3614 5
        'ISO88596'    => 'ISO-8859-6', // Arabic
3615 5
        'ISO88597'    => 'ISO-8859-7', // Greek
3616 5
        'ISO88598'    => 'ISO-8859-8', // Hebrew
3617 5
        'ISO88599'    => 'ISO-8859-9',
3618 5
        'LATIN5'      => 'ISO-8859-9', // Turkish
3619 5
        'ISO885911'   => 'ISO-8859-11',
3620 5
        'TIS620'      => 'ISO-8859-11', // Thai
3621 5
        'ISO885910'   => 'ISO-8859-10',
3622 5
        'LATIN6'      => 'ISO-8859-10', // Nordic
3623 5
        'ISO885913'   => 'ISO-8859-13',
3624 5
        'LATIN7'      => 'ISO-8859-13', // Baltic
3625 5
        'ISO885914'   => 'ISO-8859-14',
3626 5
        'LATIN8'      => 'ISO-8859-14', // Celtic
3627 5
        'ISO885915'   => 'ISO-8859-15',
3628 5
        'LATIN9'      => 'ISO-8859-15', // Western European (with some extra chars e.g. €)
3629 5
        'ISO885916'   => 'ISO-8859-16',
3630 5
        'LATIN10'     => 'ISO-8859-16', // Southeast European
3631 5
        'CP1250'      => 'WINDOWS-1250',
3632 5
        'WIN1250'     => 'WINDOWS-1250',
3633 5
        'WINDOWS1250' => 'WINDOWS-1250',
3634 5
        'CP1251'      => 'WINDOWS-1251',
3635 5
        'WIN1251'     => 'WINDOWS-1251',
3636 5
        'WINDOWS1251' => 'WINDOWS-1251',
3637 5
        'CP1252'      => 'WINDOWS-1252',
3638 5
        'WIN1252'     => 'WINDOWS-1252',
3639 5
        'WINDOWS1252' => 'WINDOWS-1252',
3640 5
        'CP1253'      => 'WINDOWS-1253',
3641 5
        'WIN1253'     => 'WINDOWS-1253',
3642 5
        'WINDOWS1253' => 'WINDOWS-1253',
3643 5
        'CP1254'      => 'WINDOWS-1254',
3644 5
        'WIN1254'     => 'WINDOWS-1254',
3645 5
        'WINDOWS1254' => 'WINDOWS-1254',
3646 5
        'CP1255'      => 'WINDOWS-1255',
3647 5
        'WIN1255'     => 'WINDOWS-1255',
3648 5
        'WINDOWS1255' => 'WINDOWS-1255',
3649 5
        'CP1256'      => 'WINDOWS-1256',
3650 5
        'WIN1256'     => 'WINDOWS-1256',
3651 5
        'WINDOWS1256' => 'WINDOWS-1256',
3652 5
        'CP1257'      => 'WINDOWS-1257',
3653 5
        'WIN1257'     => 'WINDOWS-1257',
3654 5
        'WINDOWS1257' => 'WINDOWS-1257',
3655 5
        'CP1258'      => 'WINDOWS-1258',
3656 5
        'WIN1258'     => 'WINDOWS-1258',
3657 5
        'WINDOWS1258' => 'WINDOWS-1258',
3658 5
        'UTF16'       => 'UTF-16',
3659 5
        'UTF32'       => 'UTF-32',
3660 5
        'UTF8'        => 'UTF-8',
3661 5
        'UTF'         => 'UTF-8',
3662 5
        'UTF7'        => 'UTF-7',
3663 5
        '8BIT'        => 'CP850',
3664 5
        'BINARY'      => 'CP850',
3665 5
    );
3666
3667 5
    if (!empty($equivalences[$encodingUpperHelper])) {
3668 5
      $encoding = $equivalences[$encodingUpperHelper];
3669 5
    }
3670
3671 5
    $STATIC_NORMALIZE_ENCODING_CACHE[$encodingOrig] = $encoding;
3672
3673 5
    return $encoding;
3674
  }
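The lookup above boils down to three steps: upper-case the name, strip punctuation to build a key, then map the key through an alias table. A heavily shortened sketch follows; `normalize_encoding_sketch` and its five-entry table are hypothetical and leave out the static cache and the iconv list check.

```php
<?php
// Hypothetical, shortened version of the lookup; the real table is much longer.
function normalize_encoding_sketch(string $encoding, string $fallback = 'UTF-8'): string
{
    if ($encoding === '') {
        return $fallback;
    }

    $aliases = [
        'UTF8'     => 'UTF-8',
        'LATIN1'   => 'ISO-8859-1',
        'ISO88591' => 'ISO-8859-1',
        'CP1252'   => 'WINDOWS-1252',
        'WIN1252'  => 'WINDOWS-1252',
    ];

    $upper = strtoupper($encoding);
    // Drop everything except letters, digits and whitespace, e.g. "UTF-8" -> "UTF8".
    $key = (string) preg_replace('/[^A-Z0-9\s]/', '', $upper);

    // Unknown names are passed through upper-cased.
    return $aliases[$key] ?? $upper;
}
```

Because punctuation is stripped before the lookup, spellings such as `utf-8`, `Latin_1` and `win-1252` all resolve to a canonical name.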
3675
3676
  /**
3677
   * Normalize some MS Word special characters.
3678
   *
3679
   * @param string $str <p>The string to be normalized.</p>
3680
   *
3681
   * @return string
3682
   */
3683 16 View Code Duplication
  public static function normalize_msword($str)
3684
  {
3685 16
    $str = (string)$str;
3686
3687 16
    if (!isset($str[0])) {
3688 1
      return '';
3689
    }
3690
3691 16
    static $UTF8_MSWORD_KEYS_CACHE = null;
3692 16
    static $UTF8_MSWORD_VALUES_CACHE = null;
3693
3694 16
    if ($UTF8_MSWORD_KEYS_CACHE === null) {
3695 1
      $UTF8_MSWORD_KEYS_CACHE = array_keys(self::$UTF8_MSWORD);
3696 1
      $UTF8_MSWORD_VALUES_CACHE = array_values(self::$UTF8_MSWORD);
3697 1
    }
3698
3699 16
    return str_replace($UTF8_MSWORD_KEYS_CACHE, $UTF8_MSWORD_VALUES_CACHE, $str);
3700
  }
3701
3702
  /**
3703
   * Normalize the whitespace.
3704
   *
3705
   * @param string $str                     <p>The string to be normalized.</p>
3706
   * @param bool   $keepNonBreakingSpace    [optional] <p>Set to true, to keep non-breaking-spaces.</p>
3707
   * @param bool   $keepBidiUnicodeControls [optional] <p>Set to true, to keep non-printable (for the web)
3708
   *                                        bidirectional text chars.</p>
3709
   *
3710
   * @return string
3711
   */
3712 37
  public static function normalize_whitespace($str, $keepNonBreakingSpace = false, $keepBidiUnicodeControls = false)
3713
  {
3714 37
    $str = (string)$str;
3715
3716 37
    if (!isset($str[0])) {
3717 4
      return '';
3718
    }
3719
3720 37
    static $WHITESPACE_CACHE = array();
3721 37
    $cacheKey = (int)$keepNonBreakingSpace;
3722
3723 37
    if (!isset($WHITESPACE_CACHE[$cacheKey])) {
3724
3725 2
      $WHITESPACE_CACHE[$cacheKey] = self::$WHITESPACE_TABLE;
3726
3727 2
      if ($keepNonBreakingSpace === true) {
3728
        /** @noinspection OffsetOperationsInspection */
3729 1
        unset($WHITESPACE_CACHE[$cacheKey]['NO-BREAK SPACE']);
3730 1
      }
3731
3732 2
      $WHITESPACE_CACHE[$cacheKey] = array_values($WHITESPACE_CACHE[$cacheKey]);
3733 2
    }
3734
3735 37
    if ($keepBidiUnicodeControls === false) {
3736 37
      static $BIDI_UNICODE_CONTROLS_CACHE = null;
3737
3738 37
      if ($BIDI_UNICODE_CONTROLS_CACHE === null) {
3739 1
        $BIDI_UNICODE_CONTROLS_CACHE = array_values(self::$BIDI_UNI_CODE_CONTROLS_TABLE);
3740 1
      }
3741
3742 37
      $str = str_replace($BIDI_UNICODE_CONTROLS_CACHE, '', $str);
3743 37
    }
3744
3745 37
    return str_replace($WHITESPACE_CACHE[$cacheKey], ' ', $str);
3746
  }
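The core of the method above is a plain `str_replace()` from a list of whitespace variants to an ASCII space. A minimal sketch, assuming a tiny hypothetical table of three characters (the class uses its own, longer `$WHITESPACE_TABLE`):

```php
<?php
// Hypothetical helper with a three-entry table; the real table is longer.
function normalize_whitespace_sketch(string $str): string
{
    $whitespace = [
        "\u{00A0}", // NO-BREAK SPACE
        "\u{2009}", // THIN SPACE
        "\u{3000}", // IDEOGRAPHIC SPACE
    ];

    // Each listed character is replaced by a single ASCII space.
    return str_replace($whitespace, ' ', $str);
}
```

Since `str_replace()` compares byte sequences, no regex or mbstring support is needed for this step.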
3747
3748
  /**
3749
   * Strip all whitespace characters. This includes tabs and newline
3750
   * characters, as well as multibyte whitespace such as the thin space
3751
   * and ideographic space.
3752
   *
3753
   * @param string $str
3754
   *
3755
   * @return string
3756
   */
3757 12
  public static function strip_whitespace($str)
3758
  {
3759 12
    $str = (string)$str;
3760
3761 12
    if (!isset($str[0])) {
3762 1
      return '';
3763
    }
3764
3765 11
    return (string)preg_replace('/[[:space:]]+/u', '', $str);
3766
  }
3767
3768
  /**
3769
   * Format a number with grouped thousands.
3770
   *
3771
   * @param float  $number
3772
   * @param int    $decimals
3773
   * @param string $dec_point
3774
   * @param string $thousands_sep
3775
   *
3776
   * @return string
3777
   *
3778
   * @deprecated <p>This has nothing to do with UTF-8.</p>
3779
   */
3780
  public static function number_format($number, $decimals = 0, $dec_point = '.', $thousands_sep = ',')
3781
  {
3782
    $thousands_sep = (string)$thousands_sep;
3783
    $dec_point = (string)$dec_point;
3784
    $number = (float)$number;
3785
3786
    if (
3787
        isset($thousands_sep[1], $dec_point[1])
3788
        &&
3789
        Bootup::is_php('5.4') === true
3790
    ) {
3791
      return str_replace(
3792
          array(
3793
              '.',
3794
              ',',
3795
          ),
3796
          array(
3797
              $dec_point,
3798
              $thousands_sep,
3799
          ),
3800
          number_format($number, $decimals, '.', ',')
3801
      );
3802
    }
3803
3804
    return number_format($number, $decimals, $dec_point, $thousands_sep);
3805
  }
3806
3807
  /**
3808
   * Calculates Unicode code point of the given UTF-8 encoded character.
3809
   *
3810
   * INFO: opposite to UTF8::chr()
3811
   *
3812
   * @param string      $chr      <p>The character of which to calculate code point.</p>
3813
   * @param string|null $encoding [optional] <p>Default is UTF-8</p>
3814
   *
3815
   * @return int <p>
3816
   *             Unicode code point of the given character,<br>
3817
   *             0 on invalid UTF-8 byte sequence.
3818
   *             </p>
3819
   */
3820 23
  public static function ord($chr, $encoding = 'UTF-8')
3821
  {
3822
    // init
3823 23
    static $CHAR_CACHE = array();
3824 23
    $encoding = (string)$encoding;
3825
3826
    // save the original string
3827 23
    $chr_orig = $chr;
3828
3829 23
    if ($encoding !== 'UTF-8') {
3830 2
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
3831
3832
      // check again, if it's still not UTF-8
3833
      /** @noinspection NotOptimalIfConditionsInspection */
3834 2
      if ($encoding !== 'UTF-8') {
3835 2
        $chr = (string)\mb_convert_encoding($chr, 'UTF-8', $encoding);
3836 2
      }
3837 2
    }
3838
3839 23
    $cacheKey = $chr_orig . $encoding;
3840 23
    if (isset($CHAR_CACHE[$cacheKey]) === true) {
3841 23
      return $CHAR_CACHE[$cacheKey];
3842
    }
3843
3844 11
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
3845
      self::checkForSupport();
3846
    }
3847
3848 11
    if (self::$SUPPORT['intlChar'] === true) {
3849
      $code = \IntlChar::ord($chr);
3850
      if ($code) {
3851
        return $CHAR_CACHE[$cacheKey] = $code;
3852
      }
3853
    }
3854
3855
    /** @noinspection CallableParameterUseCaseInTypeContextInspection */
3856 11
    $chr = unpack('C*', (string)self::substr($chr, 0, 4, '8BIT'));
3857 11
    $code = $chr ? $chr[1] : 0;
3858
3859 11
    if (0xF0 <= $code && isset($chr[4])) {
3860 1
      return $CHAR_CACHE[$cacheKey] = (($code - 0xF0) << 18) + (($chr[2] - 0x80) << 12) + (($chr[3] - 0x80) << 6) + $chr[4] - 0x80;
3861
    }
3862
3863 11
    if (0xE0 <= $code && isset($chr[3])) {
3864 4
      return $CHAR_CACHE[$cacheKey] = (($code - 0xE0) << 12) + (($chr[2] - 0x80) << 6) + $chr[3] - 0x80;
3865
    }
3866
3867 11
    if (0xC0 <= $code && isset($chr[2])) {
3868 7
      return $CHAR_CACHE[$cacheKey] = (($code - 0xC0) << 6) + $chr[2] - 0x80;
3869
    }
3870
3871 10
    return $CHAR_CACHE[$cacheKey] = $code;
3872
  }
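The byte arithmetic in the fallback branch above can be tried on its own. The sketch below assumes a hypothetical helper name (`utf8_ord_sketch`) and, like the fallback, does not validate continuation bytes; it leaves out the cache, the encoding conversion and the `IntlChar` shortcut.

```php
<?php
// Hypothetical helper: decodes the first UTF-8 character of $chr by hand.
function utf8_ord_sketch(string $chr): int
{
    if ($chr === '') {
        return 0;
    }

    // unpack('C*', ...) yields a 1-based array of unsigned byte values.
    $bytes = unpack('C*', substr($chr, 0, 4));
    $code = $bytes[1];

    if ($code >= 0xF0 && isset($bytes[4])) { // 4-byte sequence
        return (($code - 0xF0) << 18) + (($bytes[2] - 0x80) << 12)
            + (($bytes[3] - 0x80) << 6) + $bytes[4] - 0x80;
    }

    if ($code >= 0xE0 && isset($bytes[3])) { // 3-byte sequence
        return (($code - 0xE0) << 12) + (($bytes[2] - 0x80) << 6) + $bytes[3] - 0x80;
    }

    if ($code >= 0xC0 && isset($bytes[2])) { // 2-byte sequence
        return (($code - 0xC0) << 6) + $bytes[2] - 0x80;
    }

    return $code; // ASCII, or a lone byte
}
```

For the euro sign (bytes `E2 82 AC`) this gives `(2 << 12) + (2 << 6) + 0x2C`, which is `0x20AC`.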
3873
3874
  /**
3875
   * Parses the string into an array (into the second parameter).
3876
   *
3877
   * WARNING: Unlike "parse_str()", this method does not (re-)place variables in the current scope,
3878
   *          if the second parameter is not set!
3879
   *
3880
   * @link http://php.net/manual/en/function.parse-str.php
3881
   *
3882
   * @param string  $str       <p>The input string.</p>
3883
   * @param array   $result    <p>The result will be returned into this reference parameter.</p>
3884
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
3885
   *
3886
   * @return bool <p>Will return <strong>false</strong> if PHP can't parse the string or if $result is empty.</p>
3887
   */
3888 1
  public static function parse_str($str, &$result, $cleanUtf8 = false)
3889
  {
3890 1
    if ($cleanUtf8 === true) {
3891 1
      $str = self::clean($str);
3892 1
    }
3893
3894
    /** @noinspection PhpVoidFunctionResultUsedInspection */
3895 1
    $return = \mb_parse_str($str, $result);
3896 1
    if ($return === false || empty($result)) {
3897 1
      return false;
3898
    }
3899
3900 1
    return true;
3901
  }
3902
3903
  /**
3904
   * Checks if the /u modifier is available that enables Unicode support in PCRE.
3905
   *
3906
   * @return bool <p><strong>true</strong> if support is available, <strong>false</strong> otherwise.</p>
3907
   */
3908 58
  public static function pcre_utf8_support()
3909
  {
3910
    /** @noinspection PhpUsageOfSilenceOperatorInspection */
3911 58
    return (bool)@preg_match('//u', '');
3912
  }
3913
3914
  /**
3915
   * Create an array containing a range of UTF-8 characters.
3916
   *
3917
   * @param mixed $var1 <p>Numeric or hexadecimal code points, or a UTF-8 character to start from.</p>
3918
   * @param mixed $var2 <p>Numeric or hexadecimal code points, or a UTF-8 character to end at.</p>
3919
   *
3920
   * @return array
3921
   */
3922 1
  public static function range($var1, $var2)
3923
  {
3924 1
    if (!$var1 || !$var2) {
3925 1
      return array();
3926
    }
3927
3928 1 View Code Duplication
    if (ctype_digit((string)$var1)) {
3929 1
      $start = (int)$var1;
3930 1
    } elseif (ctype_xdigit($var1)) {
3931
      $start = (int)self::hex_to_int($var1);
3932
    } else {
3933 1
      $start = self::ord($var1);
3934
    }
3935
3936 1
    if (!$start) {
3937
      return array();
3938
    }
3939
3940 1 View Code Duplication
    if (ctype_digit((string)$var2)) {
3941 1
      $end = (int)$var2;
3942 1
    } elseif (ctype_xdigit($var2)) {
3943
      $end = (int)self::hex_to_int($var2);
3944
    } else {
3945 1
      $end = self::ord($var2);
3946
    }
3947
3948 1
    if (!$end) {
3949
      return array();
3950
    }
3951
3952 1
    return array_map(
3953
        array(
3954 1
            '\\voku\\helper\\UTF8',
3955 1
            'chr',
3956 1
        ),
3957 1
        range($start, $end)
3958 1
    );
3959
  }
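For character arguments, the same range can be built with mbstring alone. This is a sketch under the assumption of PHP 7.2 or later with mbstring (`mb_ord()` and `mb_chr()`); `utf8_range_sketch` is a hypothetical name and it does not handle the numeric or hexadecimal inputs that the method above accepts.

```php
<?php
// Hypothetical stand-alone variant for character arguments only;
// assumes PHP >= 7.2 with mbstring (mb_ord() / mb_chr()).
function utf8_range_sketch(string $from, string $to): array
{
    if ($from === '' || $to === '') {
        return [];
    }

    $start = mb_ord($from, 'UTF-8');
    $end = mb_ord($to, 'UTF-8');
    if (!$start || !$end) {
        return [];
    }

    // range() produces the code points, mb_chr() turns each back into a character.
    return array_map(
        static function (int $codePoint) {
            return (string) mb_chr($codePoint, 'UTF-8');
        },
        range($start, $end)
    );
}
```

As with the native `range()`, a start value above the end value yields a descending list.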
3960
3961
  /**
3962
   * Multi decode html entity & fix urlencoded-win1252-chars.
3963
   *
3964
   * e.g:
3965
   * 'test+test'                     => 'test+test'
3966
   * 'D&#252;sseldorf'               => 'Düsseldorf'
3967
   * 'D%FCsseldorf'                  => 'Düsseldorf'
3968
   * 'D&#xFC;sseldorf'               => 'Düsseldorf'
3969
   * 'D%26%23xFC%3Bsseldorf'         => 'Düsseldorf'
3970
   * 'DÃ¼sseldorf'                   => 'Düsseldorf'
3971
   * 'D%C3%BCsseldorf'               => 'Düsseldorf'
3972
   * 'D%C3%83%C2%BCsseldorf'         => 'Düsseldorf'
3973
   * 'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'
3974
   *
3975
   * @param string $str          <p>The input string.</p>
3976
   * @param bool   $multi_decode <p>Decode as often as possible.</p>
3977
   *
3978
   * @return string
3979
   */
3980 2 View Code Duplication
  public static function rawurldecode($str, $multi_decode = true)
3981
  {
3982 2
    $str = (string)$str;
3983
3984 2
    if (!isset($str[0])) {
3985 1
      return '';
3986
    }
3987
3988 2
    $pattern = '/%u([0-9a-f]{3,4})/i';
3989 2
    if (preg_match($pattern, $str)) {
3990 1
      $str = preg_replace($pattern, '&#x\\1;', rawurldecode($str));
3991 1
    }
3992
3993 2
    $flags = Bootup::is_php('5.4') === true ? ENT_QUOTES | ENT_HTML5 : ENT_QUOTES;
3994
3995
    do {
3996 2
      $str_compare = $str;
3997
3998 2
      $str = self::fix_simple_utf8(
3999 2
          rawurldecode(
4000 2
              self::html_entity_decode(
4001 2
                  self::to_utf8($str),
                  // Analyzer note: to_utf8() can also return an array, while html_entity_decode()
                  // only accepts a string; an additional type check may prevent trouble.
4002
                  $flags
4003 2
              )
4004 2
          )
4005 2
      );
4006
4007 2
    } while ($multi_decode === true && $str_compare !== $str);
4008
4009 2
    return (string)$str;
4010
  }
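The "decode as often as possible" loop is a fixed-point iteration: decode, compare with the previous value, stop when nothing changes. A minimal sketch of just that loop, assuming a hypothetical helper name and covering only percent-decoding (no entity decoding and no UTF-8 repair):

```php
<?php
// Hypothetical helper: repeats rawurldecode() until the string stops changing.
function multi_rawurldecode_sketch(string $str): string
{
    do {
        $previous = $str;
        $str = rawurldecode($str);
    } while ($previous !== $str);

    return $str;
}
```

A doubly encoded input such as `D%25C3%25BCsseldorf` needs two passes; a third pass confirms that the result is stable. A literal `+` is left alone, because `rawurldecode()` does not treat it as a space.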
4011
4012
  /**
4013
   * alias for "UTF8::remove_bom()"
4014
   *
4015
   * @see UTF8::remove_bom()
4016
   *
4017
   * @param string $str
4018
   *
4019
   * @return string
4020
   *
4021
   * @deprecated <p>use "UTF8::remove_bom()"</p>
4022
   */
4023
  public static function removeBOM($str)
4024
  {
4025
    return self::remove_bom($str);
4026
  }
4027
4028
  /**
4029
   * Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.
4030
   *
4031
   * @param string $str <p>The input string.</p>
4032
   *
4033
   * @return string <p>String without UTF-BOM</p>
4034
   */
4035 40
  public static function remove_bom($str)
4036
  {
4037 40
    $str = (string)$str;
4038
4039 40
    if (!isset($str[0])) {
4040 5
      return '';
4041
    }
4042
4043 40
    foreach (self::$BOM as $bomString => $bomByteLength) {
4044 40
      if (0 === self::strpos($str, $bomString, 0, '8BIT')) {
4045 5
        $strTmp = self::substr($str, $bomByteLength, null, '8BIT');
4046 5
        if ($strTmp === false) {
4047
          $strTmp = '';
4048
        }
4049 5
        $str = (string)$strTmp;
4050 5
      }
4051 40
    }
4052
4053 40
    return $str;
4054
  }
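For the UTF-8 case, BOM removal is a three-byte prefix check. The sketch below assumes a hypothetical helper name and covers only the UTF-8 BOM (`EF BB BF`), whereas the method above iterates over the class's own `$BOM` table.

```php
<?php
// Hypothetical helper covering only the UTF-8 BOM (bytes EF BB BF).
function remove_utf8_bom_sketch(string $str): string
{
    $bom = "\xEF\xBB\xBF";

    // Byte-wise prefix comparison; no multi-byte functions are needed here.
    if (strncmp($str, $bom, 3) === 0) {
        return (string) substr($str, 3);
    }

    return $str;
}
```

The comparison works on raw bytes, which is why the method above passes `'8BIT'` as the encoding to its own `strpos()` and `substr()` wrappers.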
4055
4056
  /**
4057
   * Removes duplicate occurrences of a string in another string.
4058
   *
4059
   * @param string          $str  <p>The base string.</p>
4060
   * @param string|string[] $what <p>String to search for in the base string.</p>
4061
   *
4062
   * @return string <p>The result string with removed duplicates.</p>
4063
   */
4064 1
  public static function remove_duplicates($str, $what = ' ')
4065
  {
4066 1
    if (is_string($what) === true) {
4067 1
      $what = array($what);
4068 1
    }
4069
4070 1
    if (is_array($what) === true) {
4071
      /** @noinspection ForeachSourceInspection */
4072 1
      foreach ($what as $item) {
4073 1
        $str = preg_replace('/(' . preg_quote($item, '/') . ')+/', $item, $str);
4074 1
      }
4075 1
    }
4076
4077 1
    return $str;
4078
  }
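The collapsing of repeated substrings can be sketched on its own as follows. This is an illustration under assumptions, not the library code: `collapse_repeats_sketch` is a hypothetical name, it accepts a single needle only, and it uses `preg_replace_callback()` so that a needle containing `$1` or `\1` is inserted literally rather than being read as a back-reference.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function collapse_repeats_sketch(string $str, string $what = ' '): string
{
    if ($what === '') {
        return $str; // nothing to collapse
    }

    // One or more consecutive copies of the (quoted) needle.
    $pattern = '/(?:' . preg_quote($what, '/') . ')+/';

    $result = preg_replace_callback(
        $pattern,
        static function () use ($what) {
            return $what; // replace every run with a single copy
        },
        $str
    );

    return $result === null ? $str : $result;
}
```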
4079
4080
  /**
4081
   * Remove invisible characters from a string.
4082
   *
4083
   * e.g.: This prevents sandwiching null characters between ascii characters, like Java\0script.
4084
   *
4085
   * Copied from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php
4086
   *
4087
   * @param string $str
4088
   * @param bool   $url_encoded
4089
   * @param string $replacement
4090
   *
4091
   * @return string
4092
   */
4093 62
  public static function remove_invisible_characters($str, $url_encoded = true, $replacement = '')
4094
  {
4095
    // init
4096 62
    $non_displayables = array();
4097
4098
    // every control character except newline (dec 10),
4099
    // carriage return (dec 13) and horizontal tab (dec 09)
4100 62
    if ($url_encoded) {
4101 62
      $non_displayables[] = '/%0[0-8bcef]/i'; // url encoded 00-08, 11, 12, 14, 15 (case-insensitive hex digits)
4102 62
      $non_displayables[] = '/%1[0-9a-f]/i'; // url encoded 16-31 (case-insensitive hex digits)
4103 62
    }
4104
4105 62
    $non_displayables[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S'; // 00-08, 11, 12, 14-31, 127
4106
4107
    do {
4108 62
      $str = preg_replace($non_displayables, $replacement, $str, -1, $count);
4109 62
    } while ($count !== 0);
4110
4111 62
    return $str;
4112
  }
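The loop in this method matters because removing one sequence can expose another. The following self-contained sketch shows the same idea; `strip_control_chars_sketch` is a hypothetical name, and the `i` modifier on the URL-encoded patterns is an assumption added here so that upper-case hex digits such as `%0B` are caught as well.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function strip_control_chars_sketch(string $str, bool $urlEncoded = true): string
{
    $patterns = [];

    if ($urlEncoded) {
        $patterns[] = '/%0[0-8bcef]/i'; // url encoded 00-08, 11, 12, 14, 15
        $patterns[] = '/%1[0-9a-f]/i';  // url encoded 16-31
    }

    // Raw control characters except tab (09), newline (10), carriage return (13).
    $patterns[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S';

    // Repeat until a pass removes nothing: "%%0000" becomes "%00" after
    // the first pass and only disappears on the second.
    do {
        $str = (string)preg_replace($patterns, '', $str, -1, $count);
    } while ($count > 0);

    return $str;
}
```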
4113
4114
  /**
4115
   * Replace the diamond question mark (�) and invalid-UTF8 chars with the replacement.
4116
   *
4117
   * @param string $str                <p>The input string</p>
4118
   * @param string $replacementChar    <p>The replacement character.</p>
4119
   * @param bool   $processInvalidUtf8 <p>Convert invalid UTF-8 chars </p>
4120
   *
4121
   * @return string
4122
   */
4123 62
  public static function replace_diamond_question_mark($str, $replacementChar = '', $processInvalidUtf8 = true)
4124
  {
4125 62
    $str = (string)$str;
4126
4127 62
    if (!isset($str[0])) {
4128 4
      return '';
4129
    }
4130
4131 62
    if ($processInvalidUtf8 === true) {
4132 62
      $replacementCharHelper = $replacementChar;
4133 62
      if ($replacementChar === '') {
4134 62
        $replacementCharHelper = 'none';
4135 62
      }
4136
4137 62
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
4138
        self::checkForSupport();
4139
      }
4140
4141 62
      $save = \mb_substitute_character();
4142 62
      \mb_substitute_character($replacementCharHelper);
4143 62
      $str = \mb_convert_encoding($str, 'UTF-8', 'UTF-8');
4144 62
      \mb_substitute_character($save);
4145 62
    }
4146
4147 62
    return str_replace(
4148
        array(
4149 62
            "\xEF\xBF\xBD",
4150 62
            '�',
4151 62
        ),
4152
        array(
4153 62
            $replacementChar,
4154 62
            $replacementChar,
4155 62
        ),
4156
        $str
4157 62
    );
4158
  }
4159
4160
  /**
4161
   * Strip whitespace or other characters from end of a UTF-8 string.
4162
   *
4163
   * @param string $str   <p>The string to be trimmed.</p>
4164
   * @param string $chars <p>Optional characters to be stripped.</p>
4165
   *
4166
   * @return string <p>The string with unwanted characters stripped from the right.</p>
4167
   */
4168 23 View Code Duplication
  public static function rtrim($str = '', $chars = INF)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

4169
  {
4170 23
    $str = (string)$str;
4171
4172 23
    if (!isset($str[0])) {
4173 5
      return '';
4174
    }
4175
4176
    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
4177 19
    if ($chars === INF || !$chars) {
4178 3
      return preg_replace('/[\pZ\pC]+$/u', '', $str);
4179
    }
4180
4181 18
    return preg_replace('/' . self::rxClass($chars) . '+$/u', '', $str);
4182
  }
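A minimal sketch of the Unicode-aware right trim, independent of `rxClass()`. The name `rtrim_utf8_sketch` is hypothetical, the custom character list is simply quoted into a character class (a simplification of what `rxClass()` does), and the `D` modifier is an addition made here so that `$` only matches at the very end of the string.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function rtrim_utf8_sketch(string $str, ?string $chars = null): string
{
    if ($str === '') {
        return '';
    }

    if ($chars === null || $chars === '') {
        // \pZ = separators (including no-break space), \pC = control characters.
        return (string)preg_replace('/[\pZ\pC]+$/uD', '', $str);
    }

    // With the "u" modifier every multibyte character in the class counts
    // as one character, so "–" is trimmed as a unit rather than byte by byte.
    $class = '[' . preg_quote($chars, '/') . ']';

    return (string)preg_replace('/' . $class . '+$/uD', '', $str);
}
```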
4183
4184
  /**
4185
   * Build a regex character class (or a non-capturing alternation) from the given characters.
4186
   *
4187
   * @param string $s
4188
   * @param string $class
4189
   *
4190
   * @return string
4191
   */
4192 60
  private static function rxClass($s, $class = '')
4193
  {
4194 60
    static $RX_CLASSS_CACHE = array();
4195
4196 60
    $cacheKey = $s . $class;
4197
4198 60
    if (isset($RX_CLASSS_CACHE[$cacheKey])) {
4199 48
      return $RX_CLASSS_CACHE[$cacheKey];
4200
    }
4201
4202
    /** @noinspection CallableParameterUseCaseInTypeContextInspection */
4203 20
    $class = array($class);
4204
4205
    /** @noinspection SuspiciousLoopInspection */
4206 20
    foreach (self::str_split($s) as $s) {
4207 19
      if ('-' === $s) {
4208
        $class[0] = '-' . $class[0];
4209 19
      } elseif (!isset($s[2])) {
4210 19
        $class[0] .= preg_quote($s, '/');
4211 19
      } elseif (1 === self::strlen($s)) {
4212 2
        $class[0] .= $s;
4213 2
      } else {
4214
        $class[] = $s;
4215
      }
4216 20
    }
4217
4218 20
    if ($class[0]) {
4219 20
      $class[0] = '[' . $class[0] . ']';
4220 20
    }
4221
4222 20
    if (1 === count($class)) {
4223 20
      $return = $class[0];
4224 20
    } else {
4225
      $return = '(?:' . implode('|', $class) . ')';
4226
    }
4227
4228 20
    $RX_CLASSS_CACHE[$cacheKey] = $return;
4229
4230 20
    return $return;
4231
  }
4232
4233
  /**
4234
   * WARNING: Print native UTF-8 support (libs), e.g. for debugging.
4235
   */
4236 1
  public static function showSupport()
4237
  {
4238 1
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
4239
      self::checkForSupport();
4240
    }
4241
4242 1
    echo '<pre>';
4243 1
    foreach (self::$SUPPORT as $key => $value) {
4244 1
      echo $key . ' - ' . print_r($value, true) . "\n<br>";
4245 1
    }
4246 1
    echo '</pre>';
4247 1
  }
4248
4249
  /**
4250
   * Converts a UTF-8 character to HTML Numbered Entity like "&#123;".
4251
   *
4252
   * @param string $char           <p>The Unicode character to be encoded as numbered entity.</p>
4253
   * @param bool   $keepAsciiChars <p>Set to <strong>true</strong> to keep ASCII chars.</p>
4254
   * @param string $encoding       [optional] <p>Default is UTF-8</p>
4255
   *
4256
   * @return string <p>The HTML numbered entity.</p>
4257
   */
4258 1
  public static function single_chr_html_encode($char, $keepAsciiChars = false, $encoding = 'UTF-8')
4259
  {
4260 1
    $char = (string)$char;
4261
4262 1
    if (!isset($char[0])) {
4263 1
      return '';
4264
    }
4265
4266
    if (
4267
        $keepAsciiChars === true
4268 1
        &&
4269 1
        self::is_ascii($char) === true
4270 1
    ) {
4271 1
      return $char;
4272
    }
4273
4274 1
    if ($encoding !== 'UTF-8') {
4275 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
4276 1
    }
4277
4278 1
    return '&#' . self::ord($char, $encoding) . ';';
4279
  }
4280
4281
  /**
4282
   * Convert a string to an array of Unicode characters.
4283
   *
4284
   * @param string  $str       <p>The string to split into array.</p>
4285
   * @param int     $length    [optional] <p>Max character length of each array element.</p>
4286
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
4287
   *
4288
   * @return string[] <p>An array containing chunks of the string.</p>
4289
   */
4290 39
  public static function split($str, $length = 1, $cleanUtf8 = false)
4291
  {
4292 39
    $str = (string)$str;
4293
4294 39
    if (!isset($str[0])) {
4295 3
      return array();
4296
    }
4297
4298
    // init
4299 38
    $ret = array();
4300
4301 38
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
4302
      self::checkForSupport();
4303
    }
4304
4305 38
    if ($cleanUtf8 === true) {
4306 7
      $str = self::clean($str);
4307 7
    }
4308
4309 38
    if (self::$SUPPORT['pcre_utf8'] === true) {
4310
4311 38
      preg_match_all('/./us', $str, $retArray);
4312 38
      if (isset($retArray[0])) {
4313 38
        $ret = $retArray[0];
4314 38
      }
4315 38
      unset($retArray);
4316
4317 38
    } else {
4318
4319
      // fallback
4320
4321 2
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
4322
        self::checkForSupport();
4323
      }
4324
4325 2 View Code Duplication
      if (self::$SUPPORT['mbstring_func_overload'] === true) {
4326
        $len = \mb_strlen($str, '8BIT');
4327
      } else {
4328 2
        $len = strlen($str);
4329
      }
4330
4331
      /** @noinspection ForeachInvariantsInspection */
4332 2
      for ($i = 0; $i < $len; $i++) {
4333
4334 2
        if (($str[$i] & "\x80") === "\x00") {
4335
4336 2
          $ret[] = $str[$i];
4337
4338 2
        } elseif (
4339 2
            isset($str[$i + 1])
4340 2
            &&
4341 2
            ($str[$i] & "\xE0") === "\xC0"
4342 2
        ) {
4343
4344
          if (($str[$i + 1] & "\xC0") === "\x80") {
4345
            $ret[] = $str[$i] . $str[$i + 1];
4346
4347
            $i++;
4348
          }
4349
4350 View Code Duplication
        } elseif (
4351 2
            isset($str[$i + 2])
4352 2
            &&
4353 2
            ($str[$i] & "\xF0") === "\xE0"
4354 2
        ) {
4355
4356
          if (
4357 2
              ($str[$i + 1] & "\xC0") === "\x80"
4358 2
              &&
4359 2
              ($str[$i + 2] & "\xC0") === "\x80"
4360 2
          ) {
4361 2
            $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2];
4362
4363 2
            $i += 2;
4364 2
          }
4365
4366 2
        } elseif (
4367
            isset($str[$i + 3])
4368
            &&
4369
            ($str[$i] & "\xF8") === "\xF0"
4370
        ) {
4371
4372 View Code Duplication
          if (
4373
              ($str[$i + 1] & "\xC0") === "\x80"
4374
              &&
4375
              ($str[$i + 2] & "\xC0") === "\x80"
4376
              &&
4377
              ($str[$i + 3] & "\xC0") === "\x80"
4378
          ) {
4379
            $ret[] = $str[$i] . $str[$i + 1] . $str[$i + 2] . $str[$i + 3];
4380
4381
            $i += 3;
4382
          }
4383
4384
        }
4385 2
      }
4386
    }
4387
4388 38
    if ($length > 1) {
4389 5
      $ret = array_chunk($ret, $length);
4390
4391 5
      return array_map(
4392
          function ($item) {
4393 5
            return implode('', $item);
4394 5
          }, $ret
4395 5
      );
4396
    }
4397
4398 34
    if (isset($ret[0]) && $ret[0] === '') {
4399
      return array();
4400
    }
4401
4402 34
    return $ret;
4403
  }
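The PCRE branch of this method, followed by the chunking step, can be sketched in isolation. `utf8_chunks_sketch` is a hypothetical name; the byte-wise fallback for builds without PCRE UTF-8 support is left out on purpose.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function utf8_chunks_sketch(string $str, int $length = 1): array
{
    if ($str === '') {
        return [];
    }

    // "u" makes "." match one UTF-8 character, "s" lets it match newlines too.
    preg_match_all('/./us', $str, $matches);
    $chars = $matches[0] ?? [];

    if ($length > 1) {
        // Group the characters and glue each group back into a string.
        return array_map(
            static function (array $chunk): string {
                return implode('', $chunk);
            },
            array_chunk($chars, $length)
        );
    }

    return $chars;
}
```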
4404
4405
  /**
4406
   * Optimized "\mb_detect_encoding()"-function -> with support for UTF-16 and UTF-32.
4407
   *
4408
   * @param string $str <p>The input string.</p>
4409
   *
4410
   * @return false|string <p>
4411
   *                      The detected string-encoding e.g. UTF-8 or UTF-16BE,<br>
4412
   *                      otherwise it will return false.
4413
   *                      </p>
4414
   */
4415 12
  public static function str_detect_encoding($str)
4416
  {
4417
    //
4418
    // 1.) check binary strings (010001001...) like UTF-16 / UTF-32
4419
    //
4420
4421 12
    if (self::is_binary($str) === true) {
4422
4423 3
      if (self::is_utf16($str) === 1) {
4424 1
        return 'UTF-16LE';
4425
      }
4426
4427 3
      if (self::is_utf16($str) === 2) {
4428 1
        return 'UTF-16BE';
4429
      }
4430
4431 2
      if (self::is_utf32($str) === 1) {
4432
        return 'UTF-32LE';
4433
      }
4434
4435 2
      if (self::is_utf32($str) === 2) {
4436
        return 'UTF-32BE';
4437
      }
4438
4439 2
    }
4440
4441
    //
4442
    // 2.) simple check for ASCII chars
4443
    //
4444
4445 12
    if (self::is_ascii($str) === true) {
4446 3
      return 'ASCII';
4447
    }
4448
4449
    //
4450
    // 3.) simple check for UTF-8 chars
4451
    //
4452
4453 12
    if (self::is_utf8($str) === true) {
4454 9
      return 'UTF-8';
4455
    }
4456
4457
    //
4458
    // 4.) check via "\mb_detect_encoding()"
4459
    //
4460
    // INFO: "\mb_detect_encoding()" always fails to detect UTF-16, UTF-32, UCS2 and UCS4.
4461
4462
    $detectOrder = array(
4463 7
        'ISO-8859-1',
4464 7
        'ISO-8859-2',
4465 7
        'ISO-8859-3',
4466 7
        'ISO-8859-4',
4467 7
        'ISO-8859-5',
4468 7
        'ISO-8859-6',
4469 7
        'ISO-8859-7',
4470 7
        'ISO-8859-8',
4471 7
        'ISO-8859-9',
4472 7
        'ISO-8859-10',
4473 7
        'ISO-8859-13',
4474 7
        'ISO-8859-14',
4475 7
        'ISO-8859-15',
4476 7
        'ISO-8859-16',
4477 7
        'WINDOWS-1251',
4478 7
        'WINDOWS-1252',
4479 7
        'WINDOWS-1254',
4480 7
        'ISO-2022-JP',
4481 7
        'JIS',
4482 7
        'EUC-JP',
4483 7
    );
4484
4485 7
    $encoding = \mb_detect_encoding($str, $detectOrder, true);
4486 7
    if ($encoding) {
4487 7
      return $encoding;
4488
    }
4489
4490
    //
4491
    // 5.) check via "iconv()"
4492
    //
4493
4494
    $md5 = md5($str);
4495
    foreach (self::$ICONV_ENCODING as $encodingTmp) {
4496
      # INFO: //IGNORE and //TRANSLIT still throw notice
4497
      /** @noinspection PhpUsageOfSilenceOperatorInspection */
4498
      if (md5(@\iconv($encodingTmp, $encodingTmp . '//IGNORE', $str)) === $md5) {
4499
        return $encodingTmp;
4500
      }
4501
    }
4502
4503
    return false;
4504
  }
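The detection cascade (cheap checks first, heuristics last) can be illustrated with a reduced sketch. It is an assumption-laden simplification: `detect_encoding_sketch` is a hypothetical name, the UTF-16/UTF-32 step and the iconv step are omitted, and the fallback list is shortened to two entries taken from the list above.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function detect_encoding_sketch(string $str)
{
    // 1.) Only bytes 0x00-0x7F: plain ASCII.
    if (preg_match('/[^\x00-\x7F]/', $str) === 0) {
        return 'ASCII';
    }

    // 2.) With the "u" modifier PCRE rejects malformed UTF-8, so a
    // successful match of the empty pattern means the string is valid UTF-8.
    if (preg_match('//u', $str) === 1) {
        return 'UTF-8';
    }

    // 3.) Fall back to mbstring when it is available (strict mode).
    if (function_exists('mb_detect_encoding')) {
        $guess = mb_detect_encoding($str, ['ISO-8859-1', 'WINDOWS-1252'], true);
        if ($guess !== false) {
            return $guess;
        }
    }

    return false;
}
```

The ordering is the point of the design: ASCII and UTF-8 can be verified exactly, whereas single-byte encodings can only be guessed, so they come last.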
4505
4506
  /**
4507
   * Check if the string ends with the given substring.
4508
   *
4509
   * @param string $haystack <p>The string to search in.</p>
4510
   * @param string $needle   <p>The substring to search for.</p>
4511
   *
4512
   * @return bool
4513
   */
4514 2
  public static function str_ends_with($haystack, $needle)
4515
  {
4516 2
    $haystack = (string)$haystack;
4517 2
    $needle = (string)$needle;
4518
4519 2
    if (!isset($haystack[0], $needle[0])) {
4520 1
      return false;
4521
    }
4522
4523 2
    if (substr($haystack, -strlen($needle)) === $needle) {
4524 2
      return true;
4525
    }
4526
4527 2
    return false;
4528
  }
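A byte-wise suffix check is sufficient for UTF-8 because an identical byte sequence at the end of the string is also an identical character sequence. A minimal sketch, with the hypothetical name `ends_with_sketch` and the same "empty input returns false" convention as above:

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function ends_with_sketch(string $haystack, string $needle): bool
{
    if ($haystack === '' || $needle === '') {
        return false;
    }

    $len = strlen($needle);

    // Guard against a needle that is longer than the haystack, then
    // compare the last $len bytes without creating a substring copy.
    return $len <= strlen($haystack)
        && substr_compare($haystack, $needle, -$len) === 0;
}
```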
4529
4530
  /**
4531
   * Check if the string ends with the given substring, case insensitive.
4532
   *
4533
   * @param string $haystack <p>The string to search in.</p>
4534
   * @param string $needle   <p>The substring to search for.</p>
4535
   *
4536
   * @return bool
4537
   */
4538 2
  public static function str_iends_with($haystack, $needle)
4539
  {
4540 2
    $haystack = (string)$haystack;
4541 2
    $needle = (string)$needle;
4542
4543 2
    if (!isset($haystack[0], $needle[0])) {
4544 1
      return false;
4545
    }
4546
4547 2
    if (self::strcasecmp(substr($haystack, -strlen($needle)), $needle) === 0) {
4548 2
      return true;
4549
    }
4550
4551 2
    return false;
4552
  }
4553
4554
  /**
4555
   * Case-insensitive and UTF-8 safe version of <function>str_replace</function>.
4556
   *
4557
   * @link  http://php.net/manual/en/function.str-ireplace.php
4558
   *
4559
   * @param mixed $search  <p>
4560
   *                       Every replacement with search array is
4561
   *                       performed on the result of previous replacement.
4562
   *                       </p>
4563
   * @param mixed $replace <p>The replacement value(s); a string or an array of strings.
4564
   *                       </p>
4565
   * @param mixed $subject <p>
4566
   *                       If subject is an array, then the search and
4567
   *                       replace is performed with every entry of
4568
   *                       subject, and the return value is an array as
4569
   *                       well.
4570
   *                       </p>
4571
   * @param int   $count   [optional] <p>
4572
   *                       The number of matched and replaced needles will
4573
   *                       be returned in count which is passed by
4574
   *                       reference.
4575
   *                       </p>
4576
   *
4577
   * @return mixed <p>A string or an array of replacements.</p>
4578
   */
4579 26
  public static function str_ireplace($search, $replace, $subject, &$count = null)
4580
  {
    $search = (array)$search;

    /** @noinspection AlterInForeachInspection */
    foreach ($search as &$s) {
      // cast to string; an empty needle becomes a pattern that can never match
      if ('' === $s .= '') {
        $s = '/^(?<=.)$/';
      } else {
        $s = '/' . preg_quote($s, '/') . '/ui';
      }
    }
    unset($s); // break the reference left over from the loop

    // collect the number of replacements in its own variable instead of reusing $replace
    $subject = preg_replace($search, $replace, $subject, -1, $replacements);
    $count = $replacements; // used as reference parameter

    return $subject;
  }
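The core of the case-insensitive replacement is the `/ui` pattern built from the quoted needle. The sketch below isolates that for a single string needle; `ireplace_utf8_sketch` is a hypothetical name, and the use of `preg_replace_callback()` (so that `$` and `\` in the replacement are taken literally) is a choice made for this example rather than what the method above does.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function ireplace_utf8_sketch(string $search, string $replace, string $subject, ?int &$count = null): string
{
    if ($search === '') {
        $count = 0;

        return $subject; // an empty needle never matches
    }

    // "u" = treat pattern and subject as UTF-8, "i" = Unicode-aware caseless match.
    $pattern = '/' . preg_quote($search, '/') . '/ui';

    $result = preg_replace_callback(
        $pattern,
        static function () use ($replace) {
            return $replace;
        },
        $subject,
        -1,
        $count
    );

    return $result === null ? $subject : $result;
}
```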
4597
4598
  /**
4599
   * Check if the string starts with the given substring, case insensitive.
4600
   *
4601
   * @param string $haystack <p>The string to search in.</p>
4602
   * @param string $needle   <p>The substring to search for.</p>
4603
   *
4604
   * @return bool
4605
   */
4606 2 View Code Duplication
  public static function str_istarts_with($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.
4607
  {
4608 2
    $haystack = (string)$haystack;
4609 2
    $needle = (string)$needle;
4610
4611 2
    if (!isset($haystack[0], $needle[0])) {
4612 1
      return false;
4613
    }
4614
4615 2
    if (self::stripos($haystack, $needle) === 0) {
4616 2
      return true;
4617
    }
4618
4619 2
    return false;
4620
  }
4621
4622
  /**
4623
   * Limit the number of characters in a string, cutting at a word boundary where possible.
4624
   *
4625
   * @param string $str
4626
   * @param int    $length
4627
   * @param string $strAddOn
4628
   *
4629
   * @return string
4630
   */
4631 1
  public static function str_limit_after_word($str, $length = 100, $strAddOn = '...')
4632
  {
4633 1
    $str = (string)$str;
4634
4635 1
    if (!isset($str[0])) {
4636 1
      return '';
4637
    }
4638
4639 1
    $length = (int)$length;
4640
4641 1
    if (self::strlen($str) <= $length) {
4642 1
      return $str;
4643
    }
4644
4645 1
    if (self::substr($str, $length - 1, 1) === ' ') {
4646 1
      return (string)self::substr($str, 0, $length - 1) . $strAddOn;
4647
    }
4648
4649 1
    $str = (string)self::substr($str, 0, $length);
4650 1
    $array = explode(' ', $str);
4651 1
    array_pop($array);
4652 1
    $new_str = implode(' ', $array);
4653
4654 1
    if ($new_str === '') {
4655 1
      $str = (string)self::substr($str, 0, $length - 1) . $strAddOn;
4656 1
    } else {
4657 1
      $str = $new_str . $strAddOn;
4658
    }
4659
4660 1
    return $str;
4661
  }
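The word-boundary truncation can be sketched without the class as shown below. This is a simplified variant under assumptions: `limit_after_word_sketch` is a hypothetical name, and when the cut contains no space it keeps all `$length` characters, which differs slightly from the method above.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function limit_after_word_sketch(string $str, int $length = 100, string $addOn = '...'): string
{
    if ($str === '') {
        return '';
    }

    // Split into UTF-8 characters so the limit counts characters, not bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || count($chars) <= $length) {
        return $str;
    }

    $cut = implode('', array_slice($chars, 0, max(0, $length)));

    // A space is a single byte in UTF-8, so a byte search is safe here.
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace === false) {
        return $cut . $addOn;
    }

    // Drop the partial word after the last space.
    return substr($cut, 0, $lastSpace) . $addOn;
}
```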
4662
4663
  /**
4664
   * Pad a UTF-8 string to given length with another string.
4665
   *
4666
   * @param string $str        <p>The input string.</p>
4667
   * @param int    $pad_length <p>The length of return string.</p>
4668
   * @param string $pad_string [optional] <p>String to use for padding the input string.</p>
4669
   * @param int    $pad_type   [optional] <p>
4670
   *                           Can be <strong>STR_PAD_RIGHT</strong> (default),
4671
   *                           <strong>STR_PAD_LEFT</strong> or <strong>STR_PAD_BOTH</strong>
4672
   *                           </p>
4673
   *
4674
   * @return string <strong>Returns the padded string</strong>
4675
   */
4676 2
  public static function str_pad($str, $pad_length, $pad_string = ' ', $pad_type = STR_PAD_RIGHT)
4677
  {
4678 2
    $str_length = self::strlen($str);
4679
4680
    if (
4681 2
        is_int($pad_length) === true
4682 2
        &&
4683
        $pad_length > 0
4684 2
        &&
4685
        $pad_length >= $str_length
4686 2
    ) {
4687 2
      $ps_length = self::strlen($pad_string);
4688
4689 2
      $diff = $pad_length - $str_length;
4690
4691
      switch ($pad_type) {
4692 2 View Code Duplication
        case STR_PAD_LEFT:
4693 2
          $pre = str_repeat($pad_string, (int)ceil($diff / $ps_length));
4694 2
          $pre = (string)self::substr($pre, 0, $diff);
4695 2
          $post = '';
4696 2
          break;
4697
4698 2
        case STR_PAD_BOTH:
4699 2
          $pre = str_repeat($pad_string, (int)ceil($diff / $ps_length / 2));
4700 2
          $pre = (string)self::substr($pre, 0, (int)($diff / 2));
4701 2
          $post = str_repeat($pad_string, (int)ceil($diff / $ps_length / 2));
4702 2
          $post = (string)self::substr($post, 0, (int)ceil($diff / 2));
4703 2
          break;
4704
4705 2
        case STR_PAD_RIGHT:
4706 2 View Code Duplication
        default:
4707 2
          $post = str_repeat($pad_string, (int)ceil($diff / $ps_length));
4708 2
          $post = (string)self::substr($post, 0, $diff);
4709 2
          $pre = '';
4710 2
      }
4711
4712 2
      return $pre . $str . $post;
4713
    }
4714
4715 2
    return $str;
4716
  }
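For comparison, here is a compact sketch of character-based padding. It assumes the mbstring extension is loaded and PHP 7 or later (for `intdiv()`); `pad_utf8_sketch` is a hypothetical name. It returns the input unchanged for an empty pad string, which avoids the division by zero that `$diff / $ps_length` would otherwise hit.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8; requires ext-mbstring.
function pad_utf8_sketch(string $str, int $padLength, string $padString = ' ', int $padType = STR_PAD_RIGHT): string
{
    $diff = $padLength - mb_strlen($str, 'UTF-8');
    if ($diff <= 0 || $padString === '') {
        return $str;
    }

    // Build exactly $n characters of padding by repeating and then cutting.
    $make = static function (int $n) use ($padString): string {
        if ($n <= 0) {
            return '';
        }
        $repeat = (int)ceil($n / mb_strlen($padString, 'UTF-8'));

        return mb_substr(str_repeat($padString, $repeat), 0, $n, 'UTF-8');
    };

    switch ($padType) {
        case STR_PAD_LEFT:
            return $make($diff) . $str;
        case STR_PAD_BOTH:
            $left = intdiv($diff, 2); // the right side gets the extra character
            return $make($left) . $str . $make($diff - $left);
        default:
            return $str . $make($diff);
    }
}
```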
4717
4718
  /**
4719
   * Repeat a string.
4720
   *
4721
   * @param string $str        <p>
4722
   *                           The string to be repeated.
4723
   *                           </p>
4724
   * @param int    $multiplier <p>
4725
   *                           Number of times the input string should be
4726
   *                           repeated.
4727
   *                           </p>
4728
   *                           <p>
4729
   *                           multiplier has to be greater than or equal to 0.
4730
   *                           If the multiplier is set to 0, the function
4731
   *                           will return an empty string.
4732
   *                           </p>
4733
   *
4734
   * @return string <p>The repeated string.</p>
4735
   */
4736 1
  public static function str_repeat($str, $multiplier)
4737
  {
4738 1
    $str = self::filter($str);
4739
4740 1
    return str_repeat($str, $multiplier);
4741
  }
4742
4743
  /**
4744
   * INFO: This is only a wrapper for "str_replace()" -> the original function is already UTF-8 safe.
4745
   *
4746
   * Replace all occurrences of the search string with the replacement string
4747
   *
4748
   * @link http://php.net/manual/en/function.str-replace.php
4749
   *
4750
   * @param mixed $search  <p>
4751
   *                       The value being searched for, otherwise known as the needle.
4752
   *                       An array may be used to designate multiple needles.
4753
   *                       </p>
4754
   * @param mixed $replace <p>
4755
   *                       The replacement value that replaces found search
4756
   *                       values. An array may be used to designate multiple replacements.
4757
   *                       </p>
4758
   * @param mixed $subject <p>
4759
   *                       The string or array being searched and replaced on,
4760
   *                       otherwise known as the haystack.
4761
   *                       </p>
4762
   *                       <p>
4763
   *                       If subject is an array, then the search and
4764
   *                       replace is performed with every entry of
4765
   *                       subject, and the return value is an array as
4766
   *                       well.
4767
   *                       </p>
4768
   * @param int   $count   [optional] If passed, this will hold the number of matched and replaced needles.
4769
   *
4770
   * @return mixed <p>This function returns a string or an array with the replaced values.</p>
4771
   */
4772 12
  public static function str_replace($search, $replace, $subject, &$count = null)
4773
  {
4774 12
    return str_replace($search, $replace, $subject, $count);
4775
  }
4776
4777
  /**
4778
   * Replace the first "$search"-term with the "$replace"-term.
4779
   *
4780
   * @param string $search
4781
   * @param string $replace
4782
   * @param string $subject
4783
   *
4784
   * @return string
4785
   */
4786 1
  public static function str_replace_first($search, $replace, $subject)
4787
  {
4788 1
    $pos = self::strpos($subject, $search);
4789
4790 1
    if ($pos !== false) {
4791 1
      return self::substr_replace($subject, $replace, $pos, self::strlen($search));
4792
    }
4793
4794 1
    return $subject;
4795
  }
4796
4797
  /**
4798
   * Shuffles all the characters in the string.
4799
   *
4800
   * @param string $str <p>The input string</p>
4801
   *
4802
   * @return string <p>The shuffled string.</p>
4803
   */
4804 1
  public static function str_shuffle($str)
4805
  {
4806 1
    $array = self::split($str);
4807
4808 1
    shuffle($array);
4809
4810 1
    return implode('', $array);
4811
  }
4812
4813
  /**
4814
   * Sort all characters according to code points.
4815
   *
4816
   * @param string $str    <p>A UTF-8 string.</p>
4817
   * @param bool   $unique <p>Sort unique. If <strong>true</strong>, repeated characters are ignored.</p>
4818
   * @param bool   $desc   <p>If <strong>true</strong>, will sort characters in reverse code point order.</p>
4819
   *
4820
   * @return string <p>String of sorted characters.</p>
4821
   */
4822 1
  public static function str_sort($str, $unique = false, $desc = false)
4823
  {
4824 1
    $array = self::codepoints($str);
4825
4826 1
    if ($unique) {
4827 1
      $array = array_flip(array_flip($array));
4828 1
    }
4829
4830 1
    if ($desc) {
4831 1
      arsort($array);
4832 1
    } else {
4833 1
      asort($array);
4834
    }
4835
4836 1
    return self::string($array);
4837
  }
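Sorting by code point does not strictly require converting to integers first: for valid UTF-8, byte-wise string order and code point order coincide. The sketch below relies on that property; `sort_chars_sketch` is a hypothetical name and the approach differs from the `codepoints()` route taken above.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function sort_chars_sketch(string $str, bool $unique = false, bool $desc = false): string
{
    // Split into UTF-8 characters.
    preg_match_all('/./us', $str, $matches);
    $chars = $matches[0] ?? [];

    if ($unique) {
        $chars = array_unique($chars);
    }

    // SORT_STRING compares byte-wise, which matches code point order in UTF-8.
    if ($desc) {
        rsort($chars, SORT_STRING);
    } else {
        sort($chars, SORT_STRING);
    }

    return implode('', $chars);
}
```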
4838
4839
  /**
4840
   * Split a string into an array.
4841
   *
4842
   * @param string $str
4843
   * @param int    $len
4844
   *
4845
   * @return array
4846
   */
4847 23
  public static function str_split($str, $len = 1)
4848
  {
4849 23
    $str = (string)$str;
4850
4851 23
    if (!isset($str[0])) {
4852 1
      return array();
4853
    }
4854
4855 22
    $len = (int)$len;
4856
4857 22
    if ($len < 1) {
4858
      return str_split($str, $len);
4859
    }
4860
4861
    /** @noinspection PhpInternalEntityUsedInspection */
4862 22
    preg_match_all('/' . self::GRAPHEME_CLUSTER_RX . '/u', $str, $a);
4863 22
    $a = $a[0];
4864
4865 22
    if ($len === 1) {
4866 22
      return $a;
4867
    }
4868
4869 1
    $arrayOutput = array();
4870 1
    $p = -1;
4871
4872
    /** @noinspection PhpForeachArrayIsUsedAsValueInspection */
4873 1
    foreach ($a as $l => $a) {
4874 1
      if ($l % $len) {
4875 1
        $arrayOutput[$p] .= $a;
4876 1
      } else {
4877 1
        $arrayOutput[++$p] = $a;
4878
      }
4879 1
    }
4880
4881 1
    return $arrayOutput;
4882
  }
4883
4884
  /**
4885
   * Check if the string starts with the given substring.
4886
   *
4887
   * @param string $haystack <p>The string to search in.</p>
4888
   * @param string $needle   <p>The substring to search for.</p>
4889
   *
4890
   * @return bool
4891
   */
4892 2 View Code Duplication
  public static function str_starts_with($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.
4893
  {
4894 2
    $haystack = (string)$haystack;
4895 2
    $needle = (string)$needle;
4896
4897 2
    if (!isset($haystack[0], $needle[0])) {
4898 1
      return false;
4899
    }
4900
4901 2
    if (strpos($haystack, $needle) === 0) {
4902 2
      return true;
4903
    }
4904
4905 2
    return false;
4906
  }
4907
4908
  /**
4909
   * Get a binary representation of a specific string.
4910
   *
4911
   * @param string $str <p>The input string.</p>
4912
   *
4913
   * @return string
4914
   */
4915 1
  public static function str_to_binary($str)
4916
  {
4917 1
    $str = (string)$str;
4918
4919 1
    $value = unpack('H*', $str);
4920
4921 1
    return base_convert($value[1], 16, 2);
4922
  }
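Note that `base_convert()` works through floating-point arithmetic and may lose precision on large numbers, so the approach above is only reliable for very short strings. A per-byte conversion avoids that limit. The sketch below is an alternative, not the library's method: `to_binary_sketch` is a hypothetical name, and it always emits eight bits per byte, so leading zeros are kept.

```php
<?php
// Hypothetical helper, not part of voku\helper\UTF8.
function to_binary_sketch(string $str): string
{
    if ($str === '') {
        return '';
    }

    $bits = '';
    // str_split() without a length yields single bytes.
    foreach (str_split($str) as $byte) {
        // %08b formats the byte value as a zero-padded 8-digit binary number.
        $bits .= sprintf('%08b', ord($byte));
    }

    return $bits;
}
```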
4923
4924
  /**
4925
   * Convert a string into an array of words.
4926
   *
4927
   * @param string   $str
4928
   * @param string   $charList <p>Additional chars for the definition of "words".</p>
4929
   * @param bool     $removeEmptyValues <p>Remove empty values.</p>
4930
   * @param null|int $removeShortValues
4931
   *
4932
   * @return array
4933
   */
4934 10
  public static function str_to_words($str, $charList = '', $removeEmptyValues = false, $removeShortValues = null)
4935
  {
4936 10
    $str = (string)$str;
4937
4938 10
    if ($removeShortValues !== null) {
4939 1
      $removeShortValues = (int)$removeShortValues;
4940 1
    }
4941
4942 10
    if (!isset($str[0])) {
4943 2
      if ($removeEmptyValues === true) {
4944
        return array();
4945
      }
4946
4947 2
      return array('');
4948
    }
4949
4950 10
    $charList = self::rxClass($charList, '\pL');
4951
4952 10
    $return = \preg_split("/({$charList}+(?:[\p{Pd}’']{$charList}+)*)/u", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
4953
4954
    if (
4955
        $removeShortValues === null
4956 10
        &&
4957
        $removeEmptyValues === false
4958 10
    ) {
4959 10
      return $return;
4960
    }
4961
4962 1
    $tmpReturn = array();
4963 1
    foreach ($return as $returnValue) {
4964
      if (
4965
          $removeShortValues !== null
4966 1
          &&
4967 1
          self::strlen($returnValue) <= $removeShortValues
4968 1
      ) {
4969 1
        continue;
4970
      }
4971
4972
      if (
4973
          $removeEmptyValues === true
4974 1
          &&
4975 1
          trim($returnValue) === ''
4976 1
      ) {
4977 1
        continue;
4978
      }
4979
4980 1
      $tmpReturn[] = $returnValue;
4981 1
    }
4982
4983 1
    return $tmpReturn;
4984
  }
4985
4986
  /**
4987
   * alias for "UTF8::to_ascii()"
4988
   *
4989
   * @see UTF8::to_ascii()
4990
   *
4991
   * @param string $str
4992
   * @param string $unknown
4993
   * @param bool   $strict
4994
   *
4995
   * @return string
4996
   */
4997 7
  public static function str_transliterate($str, $unknown = '?', $strict = false)
4998
  {
4999 7
    return self::to_ascii($str, $unknown, $strict);
5000
  }
5001
5002
  /**
5003
   * Counts number of words in the UTF-8 string.
5004
   *
5005
   * @param string $str      <p>The input string.</p>
5006
   * @param int    $format   [optional] <p>
5007
   *                         <strong>0</strong> => return a number of words (default)<br>
5008
   *                         <strong>1</strong> => return an array of words<br>
5009
   *                         <strong>2</strong> => return an array of words with word-offset as key
5010
   *                         </p>
5011
   * @param string $charlist [optional] <p>Additional chars that belong to words and do not start a new word.</p>
5012
   *
5013
   * @return array|int <p>The number of words in the string, or an array of words, depending on $format.</p>
5014
   */
5015 1
  public static function str_word_count($str, $format = 0, $charlist = '')
5016
  {
5017 1
    $strParts = self::str_to_words($str, $charlist);
5018
5019 1
    $len = count($strParts);
5020
5021 1
    if ($format === 1) {
5022
5023 1
      $numberOfWords = array();
5024 1
      for ($i = 1; $i < $len; $i += 2) {
5025 1
        $numberOfWords[] = $strParts[$i];
5026 1
      }
5027
5028 1
    } elseif ($format === 2) {
5029
5030 1
      $numberOfWords = array();
5031 1
      $offset = self::strlen($strParts[0]);
5032 1
      for ($i = 1; $i < $len; $i += 2) {
5033 1
        $numberOfWords[$offset] = $strParts[$i];
5034 1
        $offset += self::strlen($strParts[$i]) + self::strlen($strParts[$i + 1]);
5035 1
      }
5036
5037 1
    } else {
5038
5039 1
      $numberOfWords = ($len - 1) / 2;
5040
5041
    }
5042
5043 1
    return $numberOfWords;
5044
  }
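
The three `$format` modes can be illustrated with a standalone sketch. It does not use `UTF8::str_to_words()`; the helper name `sketch_word_count()` and the word pattern (letters and apostrophes) are assumptions made only for this example, and it assumes PHP 7 with PCRE Unicode support.

```php
<?php
// Hypothetical helper, not part of the library: treats runs of letters
// and apostrophes as words and everything else as a separator.
function sketch_word_count(string $str, int $format = 0)
{
    // PREG_SPLIT_OFFSET_CAPTURE yields [word, byteOffset] pairs.
    $parts = preg_split(
        "/[^\\p{L}']+/u",
        $str,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE
    );

    if ($format === 0) {
        return count($parts);
    }

    $words = array();
    foreach ($parts as $part) {
        list($word, $byteOffset) = $part;
        if ($format === 1) {
            $words[] = $word;
        } else {
            // Convert the byte offset into a character offset.
            $charOffset = (int) preg_match_all('/./us', substr($str, 0, $byteOffset));
            $words[$charOffset] = $word;
        }
    }

    return $words;
}
```

With format 2 the keys are character offsets rather than byte offsets, which is the point of routing the offsets through a UTF-8 aware length function.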

  /**
   * Case-insensitive string comparison.
   *
   * INFO: Case-insensitive version of UTF8::strcmp()
   *
   * @param string $str1
   * @param string $str2
   *
   * @return int <p>
   *             <strong>&lt; 0</strong> if str1 is less than str2;<br>
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br>
   *             <strong>0</strong> if they are equal.
   *             </p>
   */
  public static function strcasecmp($str1, $str2)
  {
    return self::strcmp(self::strtocasefold($str1), self::strtocasefold($str2));
  }

  /**
   * alias for "UTF8::strstr()"
   *
   * @see UTF8::strstr()
   *
   * @param string  $haystack
   * @param string  $needle
   * @param bool    $before_needle
   * @param string  $encoding
   * @param boolean $cleanUtf8
   *
   * @return string|false
   */
  public static function strchr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    return self::strstr($haystack, $needle, $before_needle, $encoding, $cleanUtf8);
  }

  /**
   * Case-sensitive string comparison.
   *
   * @param string $str1
   * @param string $str2
   *
   * @return int  <p>
   *              <strong>&lt; 0</strong> if str1 is less than str2<br>
   *              <strong>&gt; 0</strong> if str1 is greater than str2<br>
   *              <strong>0</strong> if they are equal.
   *              </p>
   */
  public static function strcmp($str1, $str2)
  {
    /** @noinspection PhpUndefinedClassInspection */
    return $str1 . '' === $str2 . '' ? 0 : strcmp(
        \Normalizer::normalize($str1, \Normalizer::NFD),
        \Normalizer::normalize($str2, \Normalizer::NFD)
    );
  }

  /**
   * Find length of initial segment not matching mask.
   *
   * @param string $str
   * @param string $charList
   * @param int    $offset
   * @param int    $length
   *
   * @return int|null <p>The length in characters, or null if $charList or the (sub)string is empty.</p>
   */
  public static function strcspn($str, $charList, $offset = 0, $length = null)
  {
    if ('' === $charList .= '') {
      return null;
    }

    if ($offset || $length !== null) {
      $strTmp = self::substr($str, $offset, $length);
      if ($strTmp === false) {
        return null;
      }
      $str = (string)$strTmp;
    }

    $str = (string)$str;
    if (!isset($str[0])) {
      return null;
    }

    if (preg_match('/^(.*?)' . self::rxClass($charList) . '/us', $str, $matches)) {
      return self::strlen($matches[1]);
    }

    return self::strlen($str);
  }
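
The regex approach used here (lazily capture everything before the first character of the mask, then count the captured characters) can be sketched without the library. `sketch_strcspn()` is a hypothetical name, and the character class is built with `preg_quote()` instead of the library's `rxClass()` helper, whose implementation is not shown in this section. The sketch assumes PHP 7.1+ and leaves out `$offset` and `$length`.

```php
<?php
// Hypothetical stand-in for the method above, without $offset / $length.
function sketch_strcspn(string $str, string $charList): ?int
{
    if ($charList === '' || $str === '') {
        return null; // mirrors the null returns of the original
    }

    // preg_quote() escapes regex metacharacters, including "-" and "]".
    $class = '[' . preg_quote($charList, '/') . ']';

    if (preg_match('/^(.*?)' . $class . '/us', $str, $matches)) {
        // Count characters (code points), not bytes, in the prefix.
        return (int) preg_match_all('/./us', $matches[1]);
    }

    // No mask character found: the whole string is the initial segment.
    return (int) preg_match_all('/./us', $str);
}
```

For a string such as "äbc-d" and the mask "-", the native `strcspn()` would report 4 because "ä" occupies two bytes, while the character-based count is 3.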

  /**
   * alias for "UTF8::stristr()"
   *
   * @see UTF8::stristr()
   *
   * @param string  $haystack
   * @param string  $needle
   * @param bool    $before_needle
   * @param string  $encoding
   * @param boolean $cleanUtf8
   *
   * @return string|false
   */
  public static function strichr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    return self::stristr($haystack, $needle, $before_needle, $encoding, $cleanUtf8);
  }

  /**
   * Create a UTF-8 string from code points.
   *
   * INFO: opposite to UTF8::codepoints()
   *
   * @param array $array <p>Integer or Hexadecimal codepoints.</p>
   *
   * @return string <p>UTF-8 encoded string.</p>
   */
  public static function string(array $array)
  {
    return implode(
        '',
        array_map(
            array(
                '\\voku\\helper\\UTF8',
                'chr',
            ),
            $array
        )
    );
  }
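
As a rough illustration of what "create a UTF-8 string from code points" involves, the sketch below encodes integer code points by hand and joins them with the same `implode()` / `array_map()` pattern. The names `sketch_chr()` and `sketch_string()` are hypothetical. Unlike `UTF8::chr()`, this version accepts integers only (no hexadecimal strings) and does not validate surrogates or out-of-range values.

```php
<?php
// Hypothetical encoder for a single integer code point (no validation).
function sketch_chr(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp); // 1 byte: 0xxxxxxx
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }

    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Same shape as the method above: map every code point, then join.
function sketch_string(array $codePoints): string
{
    return implode('', array_map('sketch_chr', $codePoints));
}
```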

  /**
   * Checks if string starts with "BOM" (Byte Order Mark Character) character.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return bool <p><strong>true</strong> if the string has BOM at the start, <strong>false</strong> otherwise.</p>
   */
  public static function string_has_bom($str)
  {
    foreach (self::$BOM as $bomString => $bomByteLength) {
      if (0 === strpos($str, $bomString)) {
        return true;
      }
    }

    return false;
  }
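
A minimal sketch of the same prefix test, assuming a hard-coded list of common byte order marks. The contents of the library's `self::$BOM` table are not shown in this section, so the list and the name `sketch_has_bom()` are assumptions for illustration only.

```php
<?php
// Hypothetical BOM check; the BOM list below is an assumption and may
// differ from the library's own self::$BOM table.
function sketch_has_bom(string $str): bool
{
    $boms = array(
        "\xEF\xBB\xBF",     // UTF-8
        "\x00\x00\xFE\xFF", // UTF-32 (big endian)
        "\xFF\xFE\x00\x00", // UTF-32 (little endian)
        "\xFE\xFF",         // UTF-16 (big endian)
        "\xFF\xFE",         // UTF-16 (little endian)
    );

    foreach ($boms as $bom) {
        // strncmp() is binary safe and only inspects the leading bytes.
        if (strncmp($str, $bom, strlen($bom)) === 0) {
            return true;
        }
    }

    return false;
}
```

Comparing only the first bytes with `strncmp()` avoids scanning the whole string, which `strpos()` does when the BOM is absent.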

  /**
   * Strip HTML and PHP tags from a string + clean invalid UTF-8.
   *
   * @link http://php.net/manual/en/function.strip-tags.php
   *
   * @param string  $str            <p>
   *                                The input string.
   *                                </p>
   * @param string  $allowable_tags [optional] <p>
   *                                You can use the optional second parameter to specify tags which should
   *                                not be stripped.
   *                                </p>
   *                                <p>
   *                                HTML comments and PHP tags are also stripped. This is hardcoded and
   *                                can not be changed with allowable_tags.
   *                                </p>
   * @param boolean $cleanUtf8      [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string <p>The stripped string.</p>
   */
  public static function strip_tags($str, $allowable_tags = null, $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8 === true) {
      $str = self::clean($str);
    }

    return strip_tags($str, $allowable_tags);
  }

  /**
   * Finds position of first occurrence of a string within another, case insensitive.
   *
   * @link http://php.net/manual/en/function.mb-stripos.php
   *
   * @param string  $haystack  <p>The string from which to get the position of the first occurrence of needle.</p>
   * @param string  $needle    <p>The string to find in haystack.</p>
   * @param int     $offset    [optional] <p>The position in haystack to start searching.</p>
   * @param string  $encoding  [optional] <p>Set the charset.</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>
   *                   Return the numeric position of the first occurrence of needle in the haystack string,<br>
   *                   or false if needle is not found.
   *                   </p>
   */
  public static function stripos($haystack, $needle, $offset = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $offset = (int)$offset;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return the wrong position
      // if invalid characters are found in $haystack before $needle
      $haystack = self::clean($haystack);
      $needle = self::clean($needle);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_stripos()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_stripos($haystack, $needle, $offset);
    }

    // fallback to "mb_"-function via polyfill
    return \mb_stripos($haystack, $needle, $offset, $encoding);
  }

  /**
   * Returns all of haystack starting from and including the first occurrence of needle to the end.
   *
   * @param string  $haystack      <p>The input string. Must be valid UTF-8.</p>
   * @param string  $needle        <p>The string to look for. Must be valid UTF-8.</p>
   * @param bool    $before_needle [optional] <p>
   *                               If <b>TRUE</b>, stristr() returns the part of the
   *                               haystack before the first occurrence of the needle (excluding the needle).
   *                               </p>
   * @param string  $encoding      [optional] <p>Set the charset for e.g. "\mb_" function</p>
   * @param boolean $cleanUtf8     [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return false|string A sub-string,<br>or <strong>false</strong> if needle is not found.
   */
  public static function stristr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $before_needle = (bool)$before_needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return the wrong position
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    // compare against '' explicitly: a needle of "0" is falsy but not empty
    if ($needle === '') {
      return $haystack;
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::stristr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_stristr($haystack, $needle, $before_needle, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_stristr()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_stristr($haystack, $needle, $before_needle);
    }

    if (self::is_ascii($needle) && self::is_ascii($haystack)) {
      return stristr($haystack, $needle, $before_needle);
    }

    preg_match('/^(.*?)' . preg_quote($needle, '/') . '/usi', $haystack, $match);

    if (!isset($match[1])) {
      return false;
    }

    if ($before_needle) {
      return $match[1];
    }

    return self::substr($haystack, self::strlen($match[1]));
  }

  /**
   * Get the string length, not the byte-length!
   *
   * @link     http://php.net/manual/en/function.mb-strlen.php
   *
   * @param string  $str       <p>The string being checked for length.</p>
   * @param string  $encoding  [optional] <p>Set the charset.</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return int <p>The number of characters in the string $str having character encoding $encoding. (One multi-byte
   *             character counted as +1)</p>
   */
  public static function strlen($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return 0;
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    switch ($encoding) {
      case 'ASCII':
      case 'CP850':
      case '8BIT':
        if (
            $encoding === 'CP850'
            &&
            self::$SUPPORT['mbstring_func_overload'] === false
        ) {
          return strlen($str);
        }

        return \mb_strlen($str, '8BIT');
    }

    if ($cleanUtf8 === true) {
      // "\mb_strlen" and "\iconv_strlen" return the wrong length
      // if invalid characters are found in $str
      $str = self::clean($str);
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
        &&
        self::$SUPPORT['iconv'] === false
    ) {
      trigger_error('UTF8::strlen() without mbstring / iconv cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['iconv'] === true
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      return \iconv_strlen($str, $encoding);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strlen($str, $encoding);
    }

    if (self::$SUPPORT['iconv'] === true) {
      return \iconv_strlen($str, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_strlen()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_strlen($str);
    }

    if (self::is_ascii($str)) {
      return strlen($str);
    }

    // fallback via vanilla php
    preg_match_all('/./us', $str, $parts);
    $returnTmp = count($parts[0]);
    if ($returnTmp !== 0) {
      return $returnTmp;
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strlen($str, $encoding);
  }
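
The last two branches (ASCII shortcut, then the "vanilla php" regex count) are what remains when neither mbstring, iconv nor intl is available. A standalone sketch of just that path, assuming valid UTF-8 input; `sketch_strlen()` is a hypothetical name and the ASCII test is a plain byte-range regex rather than `UTF8::is_ascii()`:

```php
<?php
// Hypothetical reduction of the fallback path: count code points
// using PCRE only, assuming $str is valid UTF-8.
function sketch_strlen(string $str): int
{
    if ($str === '') {
        return 0;
    }

    // Pure ASCII: every byte is one character, so strlen() is exact.
    if (!preg_match('/[\x80-\xFF]/', $str)) {
        return strlen($str);
    }

    // "." with the "u" and "s" modifiers matches one code point at a
    // time, including newlines; the match count is the length.
    return (int) preg_match_all('/./us', $str);
}
```

Note that this counts code points, not grapheme clusters, so a base letter followed by a combining mark counts as two.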

  /**
   * Case insensitive string comparisons using a "natural order" algorithm.
   *
   * INFO: natural order version of UTF8::strcasecmp()
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   *
   * @return int <strong>&lt; 0</strong> if str1 is less than str2<br>
   *             <strong>&gt; 0</strong> if str1 is greater than str2<br>
   *             <strong>0</strong> if they are equal
   */
  public static function strnatcasecmp($str1, $str2)
  {
    return self::strnatcmp(self::strtocasefold($str1), self::strtocasefold($str2));
  }

  /**
   * String comparisons using a "natural order" algorithm
   *
   * INFO: natural order version of UTF8::strcmp()
   *
   * @link  http://php.net/manual/en/function.strnatcmp.php
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   *
   * @return int <strong>&lt; 0</strong> if str1 is less than str2;<br>
   *             <strong>&gt; 0</strong> if str1 is greater than str2;<br>
   *             <strong>0</strong> if they are equal
   */
  public static function strnatcmp($str1, $str2)
  {
    return $str1 . '' === $str2 . '' ? 0 : strnatcmp(self::strtonatfold($str1), self::strtonatfold($str2));
  }

  /**
   * Case-insensitive string comparison of the first n characters.
   *
   * @link  http://php.net/manual/en/function.strncasecmp.php
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   * @param int    $len  <p>The length of strings to be used in the comparison.</p>
   *
   * @return int <strong>&lt; 0</strong> if <i>str1</i> is less than <i>str2</i>;<br>
   *             <strong>&gt; 0</strong> if <i>str1</i> is greater than <i>str2</i>;<br>
   *             <strong>0</strong> if they are equal
   */
  public static function strncasecmp($str1, $str2, $len)
  {
    return self::strncmp(self::strtocasefold($str1), self::strtocasefold($str2), $len);
  }

  /**
   * String comparison of the first n characters.
   *
   * @link  http://php.net/manual/en/function.strncmp.php
   *
   * @param string $str1 <p>The first string.</p>
   * @param string $str2 <p>The second string.</p>
   * @param int    $len  <p>Number of characters to use in the comparison.</p>
   *
   * @return int <strong>&lt; 0</strong> if <i>str1</i> is less than <i>str2</i>;<br>
   *             <strong>&gt; 0</strong> if <i>str1</i> is greater than <i>str2</i>;<br>
   *             <strong>0</strong> if they are equal
   */
  public static function strncmp($str1, $str2, $len)
  {
    $str1 = (string)self::substr($str1, 0, $len);
    $str2 = (string)self::substr($str2, 0, $len);

    return self::strcmp($str1, $str2);
  }

  /**
   * Search a string for any of a set of characters.
   *
   * @link  http://php.net/manual/en/function.strpbrk.php
   *
   * @param string $haystack  <p>The string where char_list is looked for.</p>
   * @param string $char_list <p>This parameter is case sensitive.</p>
   *
   * @return string|false <p>The string starting from the first character found, or false if none is found.</p>
   */
  public static function strpbrk($haystack, $char_list)
  {
    $haystack = (string)$haystack;
    $char_list = (string)$char_list;

    if (!isset($haystack[0], $char_list[0])) {
      return false;
    }

    if (preg_match('/' . self::rxClass($char_list) . '/us', $haystack, $m)) {
      return substr($haystack, strpos($haystack, $m[0]));
    }

    return false;
  }

  /**
   * Find position of first occurrence of string in a string.
   *
   * @link http://php.net/manual/en/function.mb-strpos.php
   *
   * @param string  $haystack  <p>The string from which to get the position of the first occurrence of needle.</p>
   * @param string  $needle    <p>The string to find in haystack.</p>
   * @param int     $offset    [optional] <p>The search offset. If it is not specified, 0 is used.</p>
   * @param string  $encoding  [optional] <p>Set the charset.</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>
   *                   The numeric position of the first occurrence of needle in the haystack string.<br>
   *                   If needle is not found it returns false.
   *                   </p>
   */
  public static function strpos($haystack, $needle, $offset = 0, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // iconv and mbstring do not support integer $needle;
    // this check has to run before $needle is cast to a string
    if ((int)$needle === $needle && $needle >= 0) {
      $needle = (string)self::chr($needle);
    }

    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    // init
    $offset = (int)$offset;

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return the wrong position
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding === 'CP850'
        &&
        self::$SUPPORT['mbstring_func_overload'] === false
    ) {
      return strpos($haystack, $needle, $offset);
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['iconv'] === false
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strpos() without mbstring / iconv cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (
        $offset >= 0 // iconv_strpos() can't handle negative offset
        &&
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
        &&
        self::$SUPPORT['iconv'] === true
    ) {
      // ignore invalid negative offset to keep compatibility
      // with php < 5.5.35, < 5.6.21, < 7.0.6
      return \iconv_strpos($haystack, $needle, $offset > 0 ? $offset : 0, $encoding);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strpos($haystack, $needle, $offset, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_strpos()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_strpos($haystack, $needle, $offset);
    }

    if (
        $offset >= 0 // iconv_strpos() can't handle negative offset
        &&
        self::$SUPPORT['iconv'] === true
    ) {
      // ignore invalid negative offset to keep compatibility
      // with php < 5.5.35, < 5.6.21, < 7.0.6
      return \iconv_strpos($haystack, $needle, $offset > 0 ? $offset : 0, $encoding);
    }

    $haystackIsAscii = self::is_ascii($haystack);
    if ($haystackIsAscii && self::is_ascii($needle)) {
      return strpos($haystack, $needle, $offset);
    }

    // fallback via vanilla php

    if ($haystackIsAscii) {
      $haystackTmp = substr($haystack, $offset);
    } else {
      $haystackTmp = self::substr($haystack, $offset);
    }
    if ($haystackTmp === false) {
      $haystackTmp = '';
    }
    $haystack = (string)$haystackTmp;

    if ($offset < 0) {
      $offset = 0;
    }

    $pos = strpos($haystack, $needle);
    if ($pos === false) {
      return false;
    }

    $returnTmp = $offset + self::strlen(substr($haystack, 0, $pos));
    if ($returnTmp !== false) {
      return $returnTmp;
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strpos($haystack, $needle, $offset, $encoding);
  }
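
The "vanilla php" fallback near the end of this method finds the needle with the byte-oriented `strpos()` and then converts the byte position into a character position by measuring the prefix. A minimal standalone sketch of that idea, without `$offset` handling and assuming valid UTF-8 input; `sketch_strpos()` is a hypothetical name:

```php
<?php
// Hypothetical reduction of the fallback: byte search, then convert
// the byte position into a character (code point) position.
function sketch_strpos(string $haystack, string $needle)
{
    if ($haystack === '' || $needle === '') {
        return false;
    }

    // In valid UTF-8 a byte-level match of a valid needle always
    // starts on a character boundary.
    $bytePos = strpos($haystack, $needle);
    if ($bytePos === false) {
        return false;
    }

    // Number of characters before the match equals the character offset.
    return (int) preg_match_all('/./us', substr($haystack, 0, $bytePos));
}
```

For the haystack "äöx" and the needle "x", the native function reports byte position 4, while the character position is 2.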

  /**
   * Finds the last occurrence of a character in a string within another.
   *
   * @link http://php.net/manual/en/function.mb-strrchr.php
   *
   * @param string $haystack      <p>The string from which to get the last occurrence of needle.</p>
   * @param string $needle        <p>The string to find in haystack</p>
   * @param bool   $before_needle [optional] <p>
   *                              Determines which portion of haystack
   *                              this function returns.
   *                              If set to true, it returns all of haystack
   *                              from the beginning to the last occurrence of needle.
   *                              If set to false, it returns all of haystack
   *                              from the last occurrence of needle to the end.
   *                              </p>
   * @param string $encoding      [optional] <p>
   *                              Character encoding name to use.
   *                              Defaults to UTF-8.
   *                              </p>
   * @param bool   $cleanUtf8     [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string|false The portion of haystack or false if needle is not found.
   */
  public static function strrchr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return the wrong position
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strrchr($haystack, $needle, $before_needle, $encoding);
  }

  /**
   * Reverses characters order in the string.
   *
   * @param string $str The input string
   *
   * @return string The string with characters in the reverse sequence
   */
  public static function strrev($str)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    return implode('', array_reverse(self::split($str)));
  }
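
The split, reverse, join pattern can be tried without the library by splitting on the empty pattern in Unicode mode. This is a sketch assuming valid UTF-8 input; `sketch_strrev()` is a hypothetical name and `preg_split()` stands in for `UTF8::split()`, whose implementation is not shown in this section.

```php
<?php
// Hypothetical character-wise reverse using PCRE only.
function sketch_strrev(string $str): string
{
    if ($str === '') {
        return '';
    }

    // The empty pattern with the "u" modifier splits between code
    // points, so multi-byte characters stay intact.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    return implode('', array_reverse($chars));
}
```

The native `strrev()` reverses bytes and would break the two-byte sequence of a character such as "ä". Reversing by code point still separates combining marks from their base letters, so it is not a grapheme-aware reverse.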

  /**
   * Finds the last occurrence of a character in a string within another, case insensitive.
   *
   * @link http://php.net/manual/en/function.mb-strrichr.php
   *
   * @param string  $haystack      <p>The string from which to get the last occurrence of needle.</p>
   * @param string  $needle        <p>The string to find in haystack.</p>
   * @param bool    $before_needle [optional] <p>
   *                               Determines which portion of haystack
   *                               this function returns.
   *                               If set to true, it returns all of haystack
   *                               from the beginning to the last occurrence of needle.
   *                               If set to false, it returns all of haystack
   *                               from the last occurrence of needle to the end.
   *                               </p>
   * @param string  $encoding      [optional] <p>
   *                               Character encoding name to use.
   *                               Defaults to UTF-8.
   *                               </p>
   * @param boolean $cleanUtf8     [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string|false <p>The portion of haystack or<br>false if needle is not found.</p>
   */
  public static function strrichr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return the wrong position
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    return \mb_strrichr($haystack, $needle, $before_needle, $encoding);
  }

  /**
   * Find position of last occurrence of a case-insensitive string.
   *
   * @param string     $haystack  <p>The string to look in.</p>
   * @param string|int $needle    <p>The string to look for.<br>Or a code point as int.</p>
   * @param int        $offset    [optional] <p>Number of characters to ignore in the beginning or end.</p>
   * @param string     $encoding  [optional] <p>Set the charset.</p>
   * @param boolean    $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>
   *                   The numeric position of the last occurrence of needle in the haystack string.<br>If needle is
   *                   not found, it returns false.
   *                   </p>
   */
  public static function strripos($haystack, $needle, $offset = 0, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ((int)$needle === $needle && $needle >= 0) {
      $needle = (string)self::chr($needle);
    }

    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $offset = (int)$offset;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (
        $cleanUtf8 === true
        ||
        $encoding === true // INFO: the "bool"-check is only a fallback for old versions
    ) {
      // \mb_strripos and \iconv_strripos are not tolerant to invalid characters

      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strripos() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strripos($haystack, $needle, $offset, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_strripos()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_strripos($haystack, $needle, $offset);
    }

    // fallback via vanilla php

    return self::strrpos(self::strtoupper($haystack), self::strtoupper($needle), $offset, $encoding, $cleanUtf8);
  }

  /**
   * Find position of last occurrence of a string in a string.
   *
   * @link http://php.net/manual/en/function.mb-strrpos.php
   *
   * @param string     $haystack  <p>The string being checked, for the last occurrence of needle</p>
   * @param string|int $needle    <p>The string to find in haystack.<br>Or a code point as int.</p>
   * @param int        $offset    [optional] <p>May be specified to begin searching an arbitrary number of characters
   *                              into the string. Negative values will stop searching at an arbitrary point prior to
   *                              the end of the string.
   *                              </p>
   * @param string     $encoding  [optional] <p>Set the charset.</p>
   * @param boolean    $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return int|false <p>The numeric position of the last occurrence of needle in the haystack string.<br>If needle
   *                   is not found, it returns false.</p>
   */
  public static function strrpos($haystack, $needle, $offset = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ((int)$needle === $needle && $needle >= 0) {
      $needle = (string)self::chr($needle);
    }

    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;
    $offset = (int)$offset;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if (
        $cleanUtf8 === true
        ||
        $encoding === true // INFO: the "bool"-check is only a fallback for old versions
    ) {
      // \mb_strrpos and \iconv_strrpos are not tolerant to invalid characters
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strrpos() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strrpos($haystack, $needle, $offset, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_strrpos()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_strrpos($haystack, $needle, $offset);
    }

    // fallback via vanilla php

    $haystackTmp = null;
    if ($offset > 0) {
      $haystackTmp = self::substr($haystack, $offset);
    } elseif ($offset < 0) {
      $haystackTmp = self::substr($haystack, 0, $offset);
      $offset = 0;
    }

    if ($haystackTmp !== null) {
      if ($haystackTmp === false) {
        $haystackTmp = '';
      }
      $haystack = (string)$haystackTmp;
    }

    $pos = strrpos($haystack, $needle);
    if ($pos === false) {
      return false;
    }

    return $offset + self::strlen(substr($haystack, 0, $pos));
  }
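
  // Usage sketch (illustrative only, not part of the library). It assumes the class
  // is imported as UTF8 and that the mbstring extension is loaded, so the call is
  // delegated to \mb_strrpos():
  //
  //   UTF8::strrpos('äb-äb', 'äb');  // 3, counted in characters
  //   \strrpos('äb-äb', 'äb');       // 4, counted in bytes ("ä" is two bytes in UTF-8)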

  /**
   * Finds the length of the initial segment of a string consisting entirely of characters contained within a given
   * mask.
   *
   * @param string $str    <p>The input string.</p>
   * @param string $mask   <p>The mask of chars</p>
   * @param int    $offset [optional]
   * @param int    $length [optional]
   *
   * @return int
   */
  public static function strspn($str, $mask, $offset = 0, $length = null)
  {
    if ($offset || $length !== null) {
      $strTmp = self::substr($str, $offset, $length);
      if ($strTmp === false) {
        $strTmp = '';
      }
      $str = (string)$strTmp;
    }

    $str = (string)$str;
    if (!isset($str[0], $mask[0])) {
      return 0;
    }

    // collect the match in a separate variable instead of overwriting $str
    return preg_match('/^' . self::rxClass($mask) . '+/u', $str, $matches) ? self::strlen($matches[0]) : 0;
  }
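
  // Usage sketch (illustrative only). It assumes that rxClass(), which is not shown
  // in this excerpt, turns the mask into a regex character class:
  //
  //   UTF8::strspn('ééa', 'é');  // counts characters, so 2 would be expected here
  //   \strspn('ééa', 'é');       // 4, because the native function counts bytes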

  /**
   * Returns part of haystack string from the first occurrence of needle to the end of haystack.
   *
   * @param string  $haystack      <p>The input string. Must be valid UTF-8.</p>
   * @param string  $needle        <p>The string to look for. Must be valid UTF-8.</p>
   * @param bool    $before_needle [optional] <p>
   *                               If <b>TRUE</b>, strstr() returns the part of the
   *                               haystack before the first occurrence of the needle (excluding the needle).
   *                               </p>
   * @param string  $encoding      [optional] <p>Set the charset.</p>
   * @param boolean $cleanUtf8     [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string|false A sub-string,<br>or <strong>false</strong> if needle is not found.
   */
  public static function strstr($haystack, $needle, $before_needle = false, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0], $needle[0])) {
      return false;
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return the wrong position
      // if invalid characters are found in $haystack before $needle
      $needle = self::clean($needle);
      $haystack = self::clean($haystack);
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::strstr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_strstr($haystack, $needle, $before_needle, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_strstr()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_strstr($haystack, $needle, $before_needle);
    }

    preg_match('/^(.*?)' . preg_quote($needle, '/') . '/us', $haystack, $match);

    if (!isset($match[1])) {
      return false;
    }

    if ($before_needle) {
      return $match[1];
    }

    return self::substr($haystack, self::strlen($match[1]));
  }
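
  // Usage sketch (illustrative only, not part of the library; assumes the class is
  // imported as UTF8). Both the mbstring path and the regex fallback above should
  // give the same results for valid UTF-8 input:
  //
  //   UTF8::strstr('user@example.com', '@');        // '@example.com'
  //   UTF8::strstr('user@example.com', '@', true);  // 'user'
  //   UTF8::strstr('user@example.com', '#');        // false, needle not found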

  /**
   * Unicode transformation for case-less matching.
   *
   * @link http://unicode.org/reports/tr21/tr21-5.html
   *
   * @param string  $str       <p>The input string.</p>
   * @param bool    $full      [optional] <p>
   *                           <b>true</b>, replace full case folding chars (default)<br>
   *                           <b>false</b>, use only limited static array [UTF8::$COMMON_CASE_FOLD]
   *                           </p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string
   */
  public static function strtocasefold($str, $full = true, $cleanUtf8 = false)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    static $COMMON_CASE_FOLD_KEYS_CACHE = null;
    static $COMMON_CASE_FOLD_VALUES_CACHE = null;

    if ($COMMON_CASE_FOLD_KEYS_CACHE === null) {
      $COMMON_CASE_FOLD_KEYS_CACHE = array_keys(self::$COMMON_CASE_FOLD);
      $COMMON_CASE_FOLD_VALUES_CACHE = array_values(self::$COMMON_CASE_FOLD);
    }

    $str = (string)str_replace($COMMON_CASE_FOLD_KEYS_CACHE, $COMMON_CASE_FOLD_VALUES_CACHE, $str);

    if ($full) {

      static $FULL_CASE_FOLD = null;

      if ($FULL_CASE_FOLD === null) {
        $FULL_CASE_FOLD = self::getData('caseFolding_full');
      }

      /** @noinspection OffsetOperationsInspection */
      $str = (string)str_replace($FULL_CASE_FOLD[0], $FULL_CASE_FOLD[1], $str);
    }

    if ($cleanUtf8 === true) {
      $str = self::clean($str);
    }

    return self::strtolower($str);
  }

  /**
   * Make a string lowercase.
   *
   * @link http://php.net/manual/en/function.mb-strtolower.php
   *
   * @param string      $str       <p>The string being lowercased.</p>
   * @param string      $encoding  [optional] <p>Set the charset for e.g. "\mb_" function</p>
   * @param boolean     $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   * @param string|null $lang      [optional] <p>Set the language for special cases: az, el, lt, tr</p>
   *
   * @return string str with all alphabetic characters converted to lowercase.
   */
  public static function strtolower($str, $encoding = 'UTF-8', $cleanUtf8 = false, $lang = null)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8 === true) {
      // strip invalid UTF-8 sequences before changing the case
      $str = self::clean($str);
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($lang !== null) {
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (
          self::$SUPPORT['intl'] === true
          &&
          Bootup::is_php('5.4') === true
      ) {

        $langCode = $lang . '-Lower';
        if (!in_array($langCode, self::$SUPPORT['intl__transliterator_list_ids'], true)) {
          trigger_error('UTF8::strtolower() without intl for special language: ' . $lang, E_USER_WARNING);

          $langCode = 'Any-Lower';
        }

        return transliterator_transliterate($langCode, $str);
      }

      trigger_error('UTF8::strtolower() without intl + PHP >= 5.4 cannot handle the "lang"-parameter: ' . $lang, E_USER_WARNING);
    }

    return \mb_strtolower($str, $encoding);
  }

  /**
   * Generic case sensitive transformation for collation matching.
   *
   * @param string $str <p>The input string</p>
   *
   * @return string
   */
  private static function strtonatfold($str)
  {
    /** @noinspection PhpUndefinedClassInspection */
    return preg_replace('/\p{Mn}+/u', '', \Normalizer::normalize($str, \Normalizer::NFD));
  }

  /**
   * Make a string uppercase.
   *
   * @link http://php.net/manual/en/function.mb-strtoupper.php
   *
   * @param string      $str       <p>The string being uppercased.</p>
   * @param string      $encoding  [optional] <p>Set the charset.</p>
   * @param boolean     $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   * @param string|null $lang      [optional] <p>Set the language for special cases: az, el, lt, tr</p>
   *
   * @return string str with all alphabetic characters converted to uppercase.
   */
  public static function strtoupper($str, $encoding = 'UTF-8', $cleanUtf8 = false, $lang = null)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($cleanUtf8 === true) {
      // strip invalid UTF-8 sequences before changing the case
      $str = self::clean($str);
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($lang !== null) {
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (
          self::$SUPPORT['intl'] === true
          &&
          Bootup::is_php('5.4') === true
      ) {

        $langCode = $lang . '-Upper';
        if (!in_array($langCode, self::$SUPPORT['intl__transliterator_list_ids'], true)) {
          trigger_error('UTF8::strtoupper() without intl for special language: ' . $lang, E_USER_WARNING);

          $langCode = 'Any-Upper';
        }

        return transliterator_transliterate($langCode, $str);
      }

      trigger_error('UTF8::strtoupper() without intl + PHP >= 5.4 cannot handle the "lang"-parameter: ' . $lang, E_USER_WARNING);
    }

    return \mb_strtoupper($str, $encoding);
  }

  /**
   * Translate characters or replace sub-strings.
   *
   * @link  http://php.net/manual/en/function.strtr.php
   *
   * @param string          $str  <p>The string being translated.</p>
   * @param string|string[] $from <p>The characters to translate from.</p>
   * @param string|string[] $to   <p>The characters to translate to.</p>
   *
   * @return string <p>
   *                This function returns a copy of str, translating all occurrences of each character in from to the
   *                corresponding character in to.
   *                </p>
   */
  public static function strtr($str, $from, $to = INF)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($from === $to) {
      return $str;
    }

    if (INF !== $to) {
      $from = self::str_split($from);
      $to = self::str_split($to);
      $countFrom = count($from);
      $countTo = count($to);

      if ($countFrom > $countTo) {
        $from = array_slice($from, 0, $countTo);
      } elseif ($countFrom < $countTo) {
        $to = array_slice($to, 0, $countFrom);
      }

      $from = array_combine($from, $to);
    }

    if (is_string($from)) {
      return str_replace($from, '', $str);
    }

    return strtr($str, $from);
  }
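
  // Usage sketch (illustrative only). It assumes that UTF8::str_split(), which is
  // not shown in this excerpt, splits a string into single UTF-8 characters:
  //
  //   UTF8::strtr('Hällo', 'ä', 'a');            // should yield 'Hallo'
  //   UTF8::strtr('Hällo', array('ä' => 'ae'));  // pair form, passed on to \strtr()
  //
  // The native \strtr('Hällo', 'ä', 'a') works on bytes and would corrupt the
  // two-byte "ä", which is why the characters are split and paired up first.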

  /**
   * Return the width of a string.
   *
   * @param string  $str       <p>The input string.</p>
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return int
   */
  public static function strwidth($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // iconv and mbstring are not tolerant to invalid encoding
      // further, their behaviour is inconsistent with that of PHP's substr
      $str = self::clean($str);
    }

    // fallback to "mb_"-function via polyfill
    return \mb_strwidth($str, $encoding);
  }

  /**
   * Changes all keys in an array.
   *
   * @param array $array <p>The array to work on</p>
   * @param int   $case  [optional] <p> Either <strong>CASE_UPPER</strong><br>
   *                  or <strong>CASE_LOWER</strong> (default)</p>
   *
   * @return array|false <p>An array with its keys lower or uppercased, or false if
   *                     input is not an array.</p>
   */
  public static function array_change_key_case($array, $case = CASE_LOWER)
  {
    if (!is_array($array)) {
      return false;
    }

    if (
        $case !== CASE_LOWER
        &&
        $case !== CASE_UPPER
    ) {
      $case = CASE_UPPER;
    }

    $return = array();
    foreach ($array as $key => $value) {
      if ($case === CASE_LOWER) {
        $key = self::strtolower($key);
      } else {
        $key = self::strtoupper($key);
      }

      $return[$key] = $value;
    }

    return $return;
  }

  /**
   * Get part of a string.
   *
   * @link http://php.net/manual/en/function.mb-substr.php
   *
   * @param string  $str       <p>The string being checked.</p>
   * @param int     $offset    <p>The first position used in str.</p>
   * @param int     $length    [optional] <p>The maximum length of the returned string.</p>
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string|false <p>The portion of <i>str</i> specified by the <i>offset</i> and
   *                      <i>length</i> parameters.</p><p>If <i>str</i> is shorter than <i>offset</i>
   *                      characters long, <b>FALSE</b> will be returned.</p>
   */
  public static function substr($str, $offset = 0, $length = null, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // Empty string
    if ($length === 0) {
      return '';
    }

    if ($cleanUtf8 === true) {
      // iconv and mbstring are not tolerant to invalid encoding
      // further, their behaviour is inconsistent with that of PHP's substr
      $str = self::clean($str);
    }

    // Whole string
    if (!$offset && $length === null) {
      return $str;
    }

    $str_length = 0;
    if ($offset || $length === null) {
      $str_length = (int)self::strlen($str, $encoding);
    }

    // Impossible
    if ($offset && $offset > $str_length) {
      return false;
    }

    if ($length === null) {
      $length = $str_length;
    } else {
      $length = (int)$length;
    }

    if (
        $encoding === 'UTF-8'
        ||
        $encoding === true || $encoding === false // INFO: the "bool"-check is only a fallback for old versions
    ) {
      $encoding = 'UTF-8';
    } else {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (
        $encoding === 'CP850'
        &&
        self::$SUPPORT['mbstring_func_overload'] === false
    ) {
      return substr($str, $offset, $length === null ? $str_length : $length);
    }

    if (
        $encoding !== 'UTF-8'
        &&
        self::$SUPPORT['mbstring'] === false
    ) {
      trigger_error('UTF8::substr() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
    }

    if (self::$SUPPORT['mbstring'] === true) {
      return \mb_substr($str, $offset, $length, $encoding);
    }

    if (
        $encoding === 'UTF-8' // INFO: "grapheme_substr()" can't handle other encodings
        &&
        self::$SUPPORT['intl'] === true
        &&
        Bootup::is_php('5.4') === true
    ) {
      return \grapheme_substr($str, $offset, $length);
    }

    if (
        $length >= 0 // "iconv_substr()" can't handle negative length
        &&
        self::$SUPPORT['iconv'] === true
    ) {
      return \iconv_substr($str, $offset, $length);
    }

    if (self::is_ascii($str)) {
      return ($length === null) ?
          substr($str, $offset) :
          substr($str, $offset, $length);
    }

    // fallback via vanilla php

    // split to array, and remove invalid characters
    $array = self::split($str);

    // extract relevant part, and join to make string again
    return implode('', array_slice($array, $offset, $length));
  }
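
  // Usage sketch (illustrative only, not part of the library; assumes the class is
  // imported as UTF8 and that the mbstring extension is loaded):
  //
  //   UTF8::substr('déjà vu', 0, 4);  // 'déjà', four characters
  //   \substr('déjà vu', 0, 4);       // 'déj', four bytes ("é" takes two of them)
  //   UTF8::substr('abc', 5);         // false, the offset is beyond the string length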

  /**
   * Binary safe comparison of two strings from an offset, up to length characters.
   *
   * @param string  $str1               <p>The main string being compared.</p>
   * @param string  $str2               <p>The secondary string being compared.</p>
   * @param int     $offset             [optional] <p>The start position for the comparison. If negative, it starts
   *                                    counting from the end of the string.</p>
   * @param int     $length             [optional] <p>The length of the comparison. The default value is the largest of
   *                                    the length of the str compared to the length of main_str less the offset.</p>
   * @param boolean $case_insensitivity [optional] <p>If case_insensitivity is TRUE, comparison is case
   *                                    insensitive.</p>
   *
   * @return int <p>
   *             <strong>&lt; 0</strong> if str1 is less than str2;<br>
   *             <strong>&gt; 0</strong> if str1 is greater than str2,<br>
   *             <strong>0</strong> if they are equal.
   *             </p>
   */
  public static function substr_compare($str1, $str2, $offset = 0, $length = null, $case_insensitivity = false)
  {
    if (
        $offset !== 0
        ||
        $length !== null
    ) {
      $str1Tmp = self::substr($str1, $offset, $length);
      if ($str1Tmp === false) {
        $str1Tmp = '';
      }
      $str1 = (string)$str1Tmp;

      $str2Tmp = self::substr($str2, 0, self::strlen($str1));
      if ($str2Tmp === false) {
        $str2Tmp = '';
      }
      $str2 = (string)$str2Tmp;
    }

    if ($case_insensitivity === true) {
      return self::strcasecmp($str1, $str2);
    }

    return self::strcmp($str1, $str2);
  }
6574
6575
  /**
6576
   * Count the number of substring occurrences.
6577
   *
6578
   * @link  http://php.net/manual/en/function.substr-count.php
6579
   *
6580
   * @param string  $haystack  <p>The string to search in.</p>
6581
   * @param string  $needle    <p>The substring to search for.</p>
6582
   * @param int     $offset    [optional] <p>The offset where to start counting.</p>
6583
   * @param int     $length    [optional] <p>
6584
   *                           The maximum length after the specified offset to search for the
6585
   *                           substring. It outputs a warning if the offset plus the length is
6586
   *                           greater than the haystack length.
6587
   *                           </p>
6588
   * @param string  $encoding  <p>Set the charset.</p>
6589
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
6590
   *
6591
   * @return int|false <p>This functions returns an integer or false if there isn't a string.</p>
6592
   */
6593 1
  public static function substr_count($haystack, $needle, $offset = 0, $length = null, $encoding = 'UTF-8', $cleanUtf8 = false)
6594
  {
6595
    // init
6596 1
    $haystack = (string)$haystack;
6597 1
    $needle = (string)$needle;
6598
6599 1
    if (!isset($haystack[0], $needle[0])) {
6600 1
      return false;
6601
    }
6602
6603 1
    if ($offset || $length !== null) {
6604
6605 1
      if ($length === null) {
6606 1
        $length = (int)self::strlen($haystack);
6607 1
      }
6608
6609 1
      $offset = (int)$offset;
6610 1
      $length = (int)$length;
6611
6612
      if (
6613
          (
6614
              $length !== 0
6615 1
              &&
6616
              $offset !== 0
6617 1
          )
6618 1
          &&
6619 1
          $length + $offset <= 0
6620 1
          &&
6621 1
          Bootup::is_php('7.1') === false // output from "substr_count()" have changed in PHP 7.1
6622 1
      ) {
6623 1
        return false;
6624
      }
6625
6626 1
      $haystackTmp = self::substr($haystack, $offset, $length, $encoding);
6627 1
      if ($haystackTmp === false) {
6628
        $haystackTmp = '';
6629
      }
6630 1
      $haystack = (string)$haystackTmp;
6631 1
    }
6632
6633 1
    if ($encoding !== 'UTF-8') {
6634 1
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
6635 1
    }
6636
6637 1
    if ($cleanUtf8 === true) {
6638
      // "\mb_strpos" and "\iconv_strpos" returns wrong position,
6639
      // if invalid characters are found in $haystack before $needle
6640
      $needle = self::clean($needle);
6641
      $haystack = self::clean($haystack);
6642
    }
6643
6644 1
    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
6645
      self::checkForSupport();
6646
    }
6647
6648 View Code Duplication
    if (
6649
        $encoding !== 'UTF-8'
6650 1
        &&
6651 1
        self::$SUPPORT['mbstring'] === false
6652 1
    ) {
6653
      trigger_error('UTF8::substr_count() without mbstring cannot handle "' . $encoding . '" encoding', E_USER_WARNING);
6654
    }
6655
6656 1
    if (self::$SUPPORT['mbstring'] === true) {
6657 1
      return \mb_substr_count($haystack, $needle, $encoding);
6658
    }
6659
6660
    preg_match_all('/' . preg_quote($needle, '/') . '/us', $haystack, $matches, PREG_SET_ORDER);
6661
6662
    return count($matches);
6663
  }
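
The fallback branch of substr_count() above counts matches with PCRE when mbstring is not available. The following is a minimal standalone sketch of that idea, not the library's code: the function name count_occurrences_pcre() is hypothetical, and it assumes both inputs are valid UTF-8.

```php
<?php
// Hypothetical helper (not part of the library): counts non-overlapping
// occurrences of $needle in a UTF-8 $haystack using only PCRE.
function count_occurrences_pcre(string $haystack, string $needle): int
{
    if ($haystack === '' || $needle === '') {
        return 0;
    }

    // preg_quote() escapes regex metacharacters in the needle, so it is
    // matched literally; the "u" modifier makes PCRE treat pattern and
    // subject as UTF-8.
    $count = preg_match_all('/' . preg_quote($needle, '/') . '/us', $haystack);

    // preg_match_all() returns false on error (for example invalid UTF-8).
    return $count === false ? 0 : $count;
}
```

Unlike the method above, this sketch returns 0 instead of false for empty input; that is a simplification chosen for the example.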

  /**
   * Removes a prefix ($needle) from the start of the string ($haystack), case insensitive.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_ileft($haystack, $needle)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_istarts_with($haystack, $needle) === true) {
      $haystackTmp = self::substr($haystack, self::strlen($needle));
      if ($haystackTmp === false) {
        $haystackTmp = '';
      }
      $haystack = (string)$haystackTmp;
    }

    return $haystack;
  }

  /**
   * Removes a suffix ($needle) from the end of the string ($haystack), case insensitive.
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_iright($haystack, $needle)
  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_iends_with($haystack, $needle) === true) {
      $haystackTmp = self::substr($haystack, 0, self::strlen($haystack) - self::strlen($needle));
      if ($haystackTmp === false) {
        $haystackTmp = '';
      }
      $haystack = (string)$haystackTmp;
    }

    return $haystack;
  }

  /**
   * Removes a prefix ($needle) from the start of the string ($haystack).
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_left($haystack, $needle)
  {
    // init
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_starts_with($haystack, $needle) === true) {
      $haystackTmp = self::substr($haystack, self::strlen($needle));
      if ($haystackTmp === false) {
        $haystackTmp = '';
      }
      $haystack = (string)$haystackTmp;
    }

    return $haystack;
  }

  /**
   * Replace text within a portion of a string.
   *
   * source: https://gist.github.com/stemar/8287074
   *
   * @param string|string[] $str              <p>The input string or an array of strings.</p>
   * @param string|string[] $replacement      <p>The replacement string or an array of strings.</p>
   * @param int|int[]       $offset           <p>
   *                                          If offset is positive, the replacing will begin at the offset'th
   *                                          character into string.
   *                                          <br><br>
   *                                          If offset is negative, the replacing will begin at the offset'th character
   *                                          from the end of string.
   *                                          </p>
   * @param int|int[]|null  $length           [optional] <p>If given and is positive, it represents the length of the
   *                                          portion of string which is to be replaced. If it is negative, it
   *                                          represents the number of characters from the end of string at which to
   *                                          stop replacing. If it is not given, then it will default to strlen(
   *                                          string ); i.e. end the replacing at the end of string. Of course, if
   *                                          length is zero then this function will have the effect of inserting
   *                                          replacement into string at the given offset.</p>
   *
   * @return string|string[] <p>The result string is returned. If string is an array then array is returned.</p>
   */
  public static function substr_replace($str, $replacement, $offset, $length = null)
  {
    if (is_array($str) === true) {
      $num = count($str);

      // the replacement
      if (is_array($replacement) === true) {
        $replacement = array_slice($replacement, 0, $num);
      } else {
        $replacement = array_pad(array($replacement), $num, $replacement);
      }

      // the offset
      if (is_array($offset) === true) {
        $offset = array_slice($offset, 0, $num);
        foreach ($offset as &$valueTmp) {
          $valueTmp = (int)$valueTmp === $valueTmp ? $valueTmp : 0;
        }
        unset($valueTmp);
      } else {
        $offset = array_pad(array($offset), $num, $offset);
      }

      // the length
      if (!isset($length)) {
        $length = array_fill(0, $num, 0);
      } elseif (is_array($length) === true) {
        $length = array_slice($length, 0, $num);
        foreach ($length as &$valueTmpV2) {
          if (isset($valueTmpV2)) {
            $valueTmpV2 = (int)$valueTmpV2 === $valueTmpV2 ? $valueTmpV2 : $num;
          } else {
            $valueTmpV2 = 0;
          }
        }
        unset($valueTmpV2);
      } else {
        $length = array_pad(array($length), $num, $length);
      }

      // recursive call
      return array_map(array('\\voku\\helper\\UTF8', 'substr_replace'), $str, $replacement, $offset, $length);
Bug Best Practice introduced by
The return type of return array_map(array('...ent, $offset, $length); (array) is incompatible with the return type documented by voku\helper\UTF8::substr_replace of type string|string[].

If you return a value from a function or method, it should be a sub-type of the type that is given by the parent type, e.g. an interface or abstract method. This is more formally defined by the Liskov substitution principle, and guarantees that classes that depend on the parent type can use any instance of a child type interchangeably. This principle also belongs to the SOLID principles for object oriented design.

Let’s take a look at an example:

class Author {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

abstract class Post {
    public function getAuthor() {
        return 'Johannes';
    }
}

class BlogPost extends Post {
    public function getAuthor() {
        return new Author('Johannes');
    }
}

class ForumPost extends Post { /* ... */ }

function my_function(Post $post) {
    echo strtoupper($post->getAuthor());
}

Our function my_function expects a Post object, and outputs the author of the post. The base class Post returns a simple string and outputting a simple string will work just fine. However, the child class BlogPost which is a sub-type of Post instead decided to return an object, and is therefore violating the SOLID principles. If a BlogPost were passed to my_function, PHP would not complain, but ultimately fail when executing the strtoupper call in its body.

    }

    if (is_array($replacement) === true) {
      if (count($replacement) > 0) {
        $replacement = $replacement[0];
      } else {
        $replacement = '';
      }
    }

    // init
    $str = (string)$str;
    $replacement = (string)$replacement;

    if (!isset($str[0])) {
      return $replacement;
    }

    if (self::is_ascii($str)) {
      return ($length === null) ?
          substr_replace($str, $replacement, $offset) :
          substr_replace($str, $replacement, $offset, $length);
    }

    preg_match_all('/./us', $str, $smatches);
    preg_match_all('/./us', $replacement, $rmatches);

    if ($length === null) {
      $length = (int)self::strlen($str);
    }

    array_splice($smatches[0], $offset, $length, $rmatches[0]);

    return implode('', $smatches[0]);
  }
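
For non-ASCII input, substr_replace() above splits both strings into arrays of characters and lets array_splice() do the replacement. A minimal standalone sketch of that technique follows; the function name replace_chars() is hypothetical, it handles a single string only, and it assumes valid UTF-8 input.

```php
<?php
// Hypothetical helper (not part of the library): character-aware
// replacement built on array_splice(), for a single UTF-8 string.
function replace_chars(string $str, string $replacement, int $offset, ?int $length = null): string
{
    // Split both strings into arrays of UTF-8 characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $replacementChars = preg_split('//u', $replacement, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on invalid UTF-8; give up in that case.
    if ($chars === false || $replacementChars === false) {
        return $str;
    }

    // No length given: replace everything from $offset to the end.
    if ($length === null) {
        $length = count($chars);
    }

    // array_splice() understands negative offsets and lengths in the same
    // way substr_replace() does, but counts array elements (characters).
    array_splice($chars, $offset, $length, $replacementChars);

    return implode('', $chars);
}
```

Because the arithmetic happens on array elements rather than bytes, a multibyte character is never cut in half, at the cost of building two temporary arrays.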

  /**
   * Removes a suffix ($needle) from the end of the string ($haystack).
   *
   * @param string $haystack <p>The string to search in.</p>
   * @param string $needle   <p>The substring to search for.</p>
   *
   * @return string <p>Return the sub-string.</p>
   */
  public static function substr_right($haystack, $needle)
  {
    $haystack = (string)$haystack;
    $needle = (string)$needle;

    if (!isset($haystack[0])) {
      return '';
    }

    if (!isset($needle[0])) {
      return $haystack;
    }

    if (self::str_ends_with($haystack, $needle) === true) {
      $haystackTmp = self::substr($haystack, 0, self::strlen($haystack) - self::strlen($needle));
      if ($haystackTmp === false) {
        $haystackTmp = '';
      }
      $haystack = (string)$haystackTmp;
    }

    return $haystack;
  }

  /**
   * Returns a case swapped version of the string.
   *
   * @param string  $str       <p>The input string.</p>
   * @param string  $encoding  [optional] <p>Default is UTF-8</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string <p>Each character's case swapped.</p>
   */
  public static function swapCase($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    if ($encoding !== 'UTF-8') {
      $encoding = self::normalize_encoding($encoding, 'UTF-8');
    }

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
      // if invalid characters are found in $haystack before $needle
      $str = self::clean($str);
    }

    $strSwappedCase = preg_replace_callback(
        '/[\S]/u',
        function ($match) use ($encoding) {
          $matchToUpper = UTF8::strtoupper($match[0], $encoding);

          if ($match[0] === $matchToUpper) {
            return UTF8::strtolower($match[0], $encoding);
          }

          return $matchToUpper;
        },
        $str
    );

    return $strSwappedCase;
  }

  /**
   * alias for "UTF8::to_ascii()"
   *
   * @see UTF8::to_ascii()
   *
   * @param string $s
   * @param string $subst_chr
   * @param bool   $strict
   *
   * @return string
   *
   * @deprecated <p>use "UTF8::to_ascii()"</p>
   */
  public static function toAscii($s, $subst_chr = '?', $strict = false)
  {
    return self::to_ascii($s, $subst_chr, $strict);
  }

  /**
   * alias for "UTF8::to_iso8859()"
   *
   * @see UTF8::to_iso8859()
   *
   * @param string $str
   *
   * @return string|string[]
   *
   * @deprecated <p>use "UTF8::to_iso8859()"</p>
   */
  public static function toIso8859($str)
  {
    return self::to_iso8859($str);
  }

  /**
   * alias for "UTF8::to_latin1()"
   *
   * @see UTF8::to_latin1()
   *
   * @param string $str
   *
   * @return string
   *
   * @deprecated <p>use "UTF8::to_latin1()"</p>
   */
  public static function toLatin1($str)
  {
    return self::to_latin1($str);
  }

  /**
   * alias for "UTF8::to_utf8()"
   *
   * @see UTF8::to_utf8()
   *
   * @param string $str
   *
   * @return string
   *
   * @deprecated <p>use "UTF8::to_utf8()"</p>
   */
  public static function toUTF8($str)
  {
    return self::to_utf8($str);
  }

  /**
   * Convert a string into ASCII.
   *
   * @param string $str     <p>The input string.</p>
   * @param string $unknown [optional] <p>Character to use if a character is unknown. (default is ?)</p>
   * @param bool   $strict  [optional] <p>Use "transliterator_transliterate()" from PHP-Intl | WARNING: bad
   *                        performance</p>
   *
   * @return string
   */
  public static function to_ascii($str, $unknown = '?', $strict = false)
  {
    static $UTF8_TO_ASCII;

    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // check if we only have ASCII, first (better performance)
    if (self::is_ascii($str) === true) {
      return $str;
    }

    $str = self::clean($str, true, true, true);

    // check again, if we only have ASCII, now ...
    if (self::is_ascii($str) === true) {
      return $str;
    }

    if ($strict === true) {
      if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
        self::checkForSupport();
      }

      if (
          self::$SUPPORT['intl'] === true
          &&
          Bootup::is_php('5.4') === true
      ) {

        // HACK for issue from "transliterator_transliterate()"
        $str = str_replace(
            'ℌ',
            'H',
            $str
        );

        $str = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; NFC; Any-Latin; Latin-ASCII;', $str);

        // check again, if we only have ASCII, now ...
        if (self::is_ascii($str) === true) {
          return $str;
        }

      }
    }

    preg_match_all('/.{1}|[^\x00]{1,1}$/us', $str, $ar);
    $chars = $ar[0];
    foreach ($chars as &$c) {

      $ordC0 = ord($c[0]);

      // ASCII - next please
      if ($ordC0 >= 0 && $ordC0 <= 127) {
        continue;
      }

      $ordC1 = ord($c[1]);

      // 2-byte sequence
      if ($ordC0 >= 192 && $ordC0 <= 223) {
        $ord = ($ordC0 - 192) * 64 + ($ordC1 - 128);
      }

      if ($ordC0 >= 224) {
        $ordC2 = ord($c[2]);

        if ($ordC0 <= 239) {
          $ord = ($ordC0 - 224) * 4096 + ($ordC1 - 128) * 64 + ($ordC2 - 128);
        }

        if ($ordC0 >= 240) {
          $ordC3 = ord($c[3]);

          if ($ordC0 <= 247) {
            $ord = ($ordC0 - 240) * 262144 + ($ordC1 - 128) * 4096 + ($ordC2 - 128) * 64 + ($ordC3 - 128);
          }

          if ($ordC0 >= 248) {
            $ordC4 = ord($c[4]);

            if ($ordC0 <= 251) {
              $ord = ($ordC0 - 248) * 16777216 + ($ordC1 - 128) * 262144 + ($ordC2 - 128) * 4096 + ($ordC3 - 128) * 64 + ($ordC4 - 128);
            }

            if ($ordC0 >= 252) {
              $ordC5 = ord($c[5]);

              if ($ordC0 <= 253) {
                $ord = ($ordC0 - 252) * 1073741824 + ($ordC1 - 128) * 16777216 + ($ordC2 - 128) * 262144 + ($ordC3 - 128) * 4096 + ($ordC4 - 128) * 64 + ($ordC5 - 128);
              }
            }
          }
        }
      }

      if ($ordC0 === 254 || $ordC0 === 255) {
        $c = $unknown;
        continue;
      }

      if (!isset($ord)) {
        $c = $unknown;
        continue;
      }

      $bank = $ord >> 8;
      if (!isset($UTF8_TO_ASCII[$bank])) {
        $UTF8_TO_ASCII[$bank] = self::getData(sprintf('x%02x', $bank));
        if ($UTF8_TO_ASCII[$bank] === false) {
          $UTF8_TO_ASCII[$bank] = array();
        }
      }

      $newchar = $ord & 255;

      if (isset($UTF8_TO_ASCII[$bank], $UTF8_TO_ASCII[$bank][$newchar])) {

        // keep for debugging
        /*
        echo "file: " . sprintf('x%02x', $bank) . "\n";
        echo "char: " . $c . "\n";
        echo "ord: " . $ord . "\n";
        echo "newchar: " . $newchar . "\n";
        echo "ascii: " . $UTF8_TO_ASCII[$bank][$newchar] . "\n";
        echo "bank:" . $bank . "\n\n";
        */

        $c = $UTF8_TO_ASCII[$bank][$newchar];
      } else {

        // keep for debugging missing chars
        /*
        echo "file: " . sprintf('x%02x', $bank) . "\n";
        echo "char: " . $c . "\n";
        echo "ord: " . $ord . "\n";
        echo "newchar: " . $newchar . "\n";
        echo "bank:" . $bank . "\n\n";
        */

        $c = $unknown;
      }
    }

    return implode('', $chars);
  }
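
The loop in to_ascii() above first turns each character into a code point and then splits that number into a "bank" (the upper bits, used as the name of a data file) and an index inside the bank (the low 8 bits). The sketch below reproduces only that arithmetic for 1 to 4 byte sequences; both function names are hypothetical, and the input is assumed to be exactly one valid UTF-8 character.

```php
<?php
// Hypothetical helper (not part of the library): decode one valid UTF-8
// character into its Unicode code point, using the arithmetic shown above.
function utf8_char_to_code_point(string $char): int
{
    $b0 = ord($char[0]);

    if ($b0 <= 127) {                 // 1 byte (ASCII)
        return $b0;
    }
    if ($b0 >= 192 && $b0 <= 223) {   // 2 bytes
        return ($b0 - 192) * 64 + (ord($char[1]) - 128);
    }
    if ($b0 >= 224 && $b0 <= 239) {   // 3 bytes
        return ($b0 - 224) * 4096 + (ord($char[1]) - 128) * 64 + (ord($char[2]) - 128);
    }
    if ($b0 >= 240 && $b0 <= 247) {   // 4 bytes
        return ($b0 - 240) * 262144 + (ord($char[1]) - 128) * 4096
            + (ord($char[2]) - 128) * 64 + (ord($char[3]) - 128);
    }

    throw new InvalidArgumentException('Not a valid UTF-8 lead byte.');
}

// Hypothetical helper: split a code point into the bank name and the
// index inside that bank, mirroring "$ord >> 8" and "$ord & 255" above.
function code_point_to_bank(int $codePoint): array
{
    return array(
        'file'  => sprintf('x%02x', $codePoint >> 8),
        'index' => $codePoint & 255,
    );
}
```

For the euro sign (bytes E2 82 AC) the first function yields 8364 (U+20AC), so the lookup would go to bank "x20" at index 172.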

  /**
   * Convert a string into "ISO-8859"-encoding (Latin-1).
   *
   * @param string|string[] $str
   *
   * @return string|string[]
   */
  public static function to_iso8859($str)
  {
    if (is_array($str) === true) {

      /** @noinspection ForeachSourceInspection */
      foreach ($str as $k => $v) {
        /** @noinspection AlterInForeachInspection */
        /** @noinspection OffsetOperationsInspection */
        $str[$k] = self::to_iso8859($v);
      }

      return $str;
    }

    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    return self::utf8_decode($str);
  }

  /**
   * alias for "UTF8::to_iso8859()"
   *
   * @see UTF8::to_iso8859()
   *
   * @param string|string[] $str
   *
   * @return string|string[]
   */
  public static function to_latin1($str)
  {
    return self::to_iso8859($str);
  }

  /**
   * This function leaves UTF-8 characters alone, while converting almost all non-UTF8 to UTF8.
   *
   * <ul>
   * <li>It decodes UTF-8 codepoints and unicode escape sequences.</li>
   * <li>It assumes that the encoding of the original string is either WINDOWS-1252 or ISO-8859.</li>
   * <li>WARNING: It does not remove invalid UTF-8 characters, so you may need to use "UTF8::clean()" for this
   * case.</li>
   * </ul>
   *
   * @param string|string[] $str                    <p>Any string or array.</p>
   * @param bool            $decodeHtmlEntityToUtf8 <p>Set to true, if you need to decode html-entities.</p>
   *
   * @return string|string[] <p>The UTF-8 encoded string.</p>
   */
  public static function to_utf8($str, $decodeHtmlEntityToUtf8 = false)
  {
    if (is_array($str) === true) {
      /** @noinspection ForeachSourceInspection */
      foreach ($str as $k => $v) {
        /** @noinspection AlterInForeachInspection */
        /** @noinspection OffsetOperationsInspection */
        $str[$k] = self::to_utf8($v, $decodeHtmlEntityToUtf8);
      }

      return $str;
    }

    $str = (string)$str;

    if (!isset($str[0])) {
      return $str;
    }

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['mbstring_func_overload'] === true) {
      $max = \mb_strlen($str, '8BIT');
    } else {
      $max = strlen($str);
    }

    $buf = '';

    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0; $i < $max; $i++) {
      $c1 = $str[$i];

      if ($c1 >= "\xC0") { // should be converted to UTF8, if it's not UTF8 already

        if ($c1 <= "\xDF") { // looks like 2 bytes UTF8

          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];

          if ($c2 >= "\x80" && $c2 <= "\xBF") { // yeah, almost sure it's UTF8 already
            $buf .= $c1 . $c2;
            $i++;
          } else { // not valid UTF8 - convert it
            $buf .= self::to_utf8_convert($c1);
          }

        } elseif ($c1 >= "\xE0" && $c1 <= "\xEF") { // looks like 3 bytes UTF8

          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
          $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];

          if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF") { // yeah, almost sure it's UTF8 already
            $buf .= $c1 . $c2 . $c3;
            $i += 2;
          } else { // not valid UTF8 - convert it
            $buf .= self::to_utf8_convert($c1);
          }

        } elseif ($c1 >= "\xF0" && $c1 <= "\xF7") { // looks like 4 bytes UTF8

          $c2 = $i + 1 >= $max ? "\x00" : $str[$i + 1];
          $c3 = $i + 2 >= $max ? "\x00" : $str[$i + 2];
          $c4 = $i + 3 >= $max ? "\x00" : $str[$i + 3];

          if ($c2 >= "\x80" && $c2 <= "\xBF" && $c3 >= "\x80" && $c3 <= "\xBF" && $c4 >= "\x80" && $c4 <= "\xBF") { // yeah, almost sure it's UTF8 already
            $buf .= $c1 . $c2 . $c3 . $c4;
            $i += 3;
          } else { // not valid UTF8 - convert it
            $buf .= self::to_utf8_convert($c1);
          }

        } else { // doesn't look like UTF8, but should be converted
          $buf .= self::to_utf8_convert($c1);
        }

      } elseif (($c1 & "\xC0") === "\x80") { // needs conversion

        $buf .= self::to_utf8_convert($c1);

      } else { // it doesn't need conversion
        $buf .= $c1;
      }
    }

    // decode unicode escape sequences
    $buf = preg_replace_callback(
        '/\\\\u([0-9a-f]{4})/i',
        function ($match) {
          return \mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
        },
        $buf
    );

    // decode html-entities
    if ($decodeHtmlEntityToUtf8 === true) {
      $buf = self::html_entity_decode($buf);
    }

    return $buf;
  }

  /**
   * @param string $int <p>A single byte (a one-character string, despite the parameter name).</p>
   *
   * @return string
   */
  private static function to_utf8_convert($int)
  {
    $buf = '';

    $ordC1 = ord($int);
    if (isset(self::$WIN1252_TO_UTF8[$ordC1])) { // found in Windows-1252 special cases
      $buf .= self::$WIN1252_TO_UTF8[$ordC1];
    } else {
      $cc1 = self::chr_and_parse_int($ordC1 / 64) | "\xC0";
      $cc2 = ($int & "\x3F") | "\x80";
      $buf .= $cc1 . $cc2;
    }

    return $buf;
  }
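
The else branch of to_utf8_convert() above builds a two-byte UTF-8 sequence from a single ISO-8859-1 byte: the top two bits go into the lead byte (0xC0 plus value / 64), the low six bits into the continuation byte (0x80 plus value & 0x3F). A minimal sketch of just that step follows; the function name is hypothetical, and the Windows-1252 lookup table used by the method above is deliberately left out.

```php
<?php
// Hypothetical helper (not part of the library): encode one ISO-8859-1
// byte as UTF-8. The Windows-1252 special cases are NOT handled here.
function latin1_byte_to_utf8(string $byte): string
{
    $ord = ord($byte);

    // Bytes below 0x80 are plain ASCII and identical in UTF-8.
    if ($ord < 0x80) {
        return $byte;
    }

    // Lead byte: 110xxxxx, carrying the top two bits of the value.
    $lead = chr(0xC0 | ($ord >> 6));
    // Continuation byte: 10xxxxxx, carrying the low six bits.
    $continuation = chr(0x80 | ($ord & 0x3F));

    return $lead . $continuation;
}
```

For example, the Latin-1 byte FC ("ü") becomes the two bytes C3 BC.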

  /**
   * Strip whitespace or other characters from beginning or end of a UTF-8 string.
   *
   * INFO: This is slower than "trim()"
   *
   * We can only use the original function if the string / chars contain only 7-bit (ASCII) characters,
   * but the check for ASCII (7-bit) costs more time than we can save here.
   *
   * @param string $str   <p>The string to be trimmed</p>
   * @param string $chars [optional] <p>Optional characters to be stripped</p>
   *
   * @return string <p>The trimmed string.</p>
   */
  public static function trim($str = '', $chars = INF)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // Info: http://nadeausoftware.com/articles/2007/9/php_tip_how_strip_punctuation_characters_web_page#Unicodecharactercategories
    if ($chars === INF || !$chars) {
      return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
    }

    return self::rtrim(self::ltrim($str, $chars), $chars);
  }
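
When no character list is given, trim() above relies on the Unicode property classes \pZ (separators, including the no-break space) and \pC (control and other invisible characters). A minimal standalone sketch of that default branch, with a hypothetical function name and assuming valid UTF-8 input:

```php
<?php
// Hypothetical helper (not part of the library): strip Unicode separators
// (\pZ) and control/format characters (\pC) from both ends of a string.
function trim_unicode(string $str): string
{
    $result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);

    // preg_replace() returns null on error (for example invalid UTF-8);
    // fall back to the untouched input in that case.
    return $result === null ? $str : $result;
}
```

PHP's native trim() would leave a leading no-break space (bytes C2 A0) in place, which is the main reason for the regex-based approach.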

  /**
   * Makes string's first char uppercase.
   *
   * @param string  $str       <p>The input string.</p>
   * @param string  $encoding  [optional] <p>Set the charset.</p>
   * @param boolean $cleanUtf8 [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string <p>The resulting string</p>
   */
  public static function ucfirst($str, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
      // if invalid characters are found in $haystack before $needle
      $str = self::clean($str);
    }

    $strPartTwo = self::substr($str, 1, null, $encoding);
    if ($strPartTwo === false) {
      $strPartTwo = '';
    }

    $strPartOne = self::strtoupper(
        (string)self::substr($str, 0, 1, $encoding),
        $encoding,
        $cleanUtf8
    );

    return $strPartOne . $strPartTwo;
  }

  /**
   * alias for "UTF8::ucfirst()"
   *
   * @see UTF8::ucfirst()
   *
   * @param string  $word
   * @param string  $encoding
   * @param boolean $cleanUtf8
   *
   * @return string
   */
  public static function ucword($word, $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    return self::ucfirst($word, $encoding, $cleanUtf8);
  }

  /**
   * Uppercase for all words in the string.
   *
   * @param string   $str        <p>The input string.</p>
   * @param string[] $exceptions [optional] <p>Exclusion for some words.</p>
   * @param string   $charlist   [optional] <p>Additional chars that belong to words and do not start a new word.</p>
   * @param string   $encoding   [optional] <p>Set the charset.</p>
   * @param boolean  $cleanUtf8  [optional] <p>Remove non UTF-8 chars from the string.</p>
   *
   * @return string
   */
  public static function ucwords($str, $exceptions = array(), $charlist = '', $encoding = 'UTF-8', $cleanUtf8 = false)
  {
    if (!$str) {
      return '';
    }

    // INFO: mb_convert_case($str, MB_CASE_TITLE);
    // -> MB_CASE_TITLE does not only uppercase the first letter, it also lowercases all other letters

    if ($cleanUtf8 === true) {
      // "\mb_strpos" and "\iconv_strpos" return a wrong position,
      // if invalid characters are found in $haystack before $needle
      $str = self::clean($str);
    }

    $usePhpDefaultFunctions = !(bool)($charlist . implode('', $exceptions));

    if (
        $usePhpDefaultFunctions === true
        &&
        self::is_ascii($str) === true
    ) {
      return ucwords($str);
    }

    $words = self::str_to_words($str, $charlist);
    $newWords = array();

    if (count($exceptions) > 0) {
      $useExceptions = true;
    } else {
      $useExceptions = false;
    }

    foreach ($words as $word) {

      if (!$word) {
        continue;
      }

      if (
          $useExceptions === false
          ||
          (
              $useExceptions === true
              &&
              !in_array($word, $exceptions, true)
          )
      ) {
        $word = self::ucfirst($word, $encoding);
      }

      $newWords[] = $word;
    }

    return implode('', $newWords);
  }
7496
7497
  /**
   * Decodes HTML entities and URL-encoded strings (repeatedly, if requested) and fixes urlencoded win1252 chars.
   *
   * e.g:
   * 'test+test'                     => 'test test'
   * 'D&#252;sseldorf'               => 'Düsseldorf'
   * 'D%FCsseldorf'                  => 'Düsseldorf'
   * 'D&#xFC;sseldorf'               => 'Düsseldorf'
   * 'D%26%23xFC%3Bsseldorf'         => 'Düsseldorf'
   * 'DÃ¼sseldorf'                   => 'Düsseldorf'
   * 'D%C3%BCsseldorf'               => 'Düsseldorf'
   * 'D%C3%83%C2%BCsseldorf'         => 'Düsseldorf'
   * 'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'
   *
   * @param string $str          <p>The input string.</p>
   * @param bool   $multi_decode <p>Decode as often as possible.</p>
   *
   * @return string
   */
  public static function urldecode($str, $multi_decode = true)
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // convert non-standard "%uXXXX" escapes into numeric HTML entities
    $pattern = '/%u([0-9a-f]{3,4})/i';
    if (preg_match($pattern, $str)) {
      $str = preg_replace($pattern, '&#x\\1;', urldecode($str));
    }

    $flags = Bootup::is_php('5.4') === true ? ENT_QUOTES | ENT_HTML5 : ENT_QUOTES;

    // decode until the string no longer changes (or only once, if $multi_decode is false)
    do {
      $str_compare = $str;

      $str = self::fix_simple_utf8(
          urldecode(
              self::html_entity_decode(
                  self::to_utf8($str),
                  $flags
              )
          )
      );

    } while ($multi_decode === true && $str_compare !== $str);

    return (string)$str;
  }

  /**
   * Returns an array that maps "urlencoded" win1252 chars to UTF-8.
   *
   * @deprecated <p>use the "UTF8::urldecode()" function to decode a string</p>
   *
   * @return array
   */
  public static function urldecode_fix_win1252_chars()
  {
    return array(
        '%20' => ' ',
        '%21' => '!',
        '%22' => '"',
        '%23' => '#',
        '%24' => '$',
        '%25' => '%',
        '%26' => '&',
        '%27' => "'",
        '%28' => '(',
        '%29' => ')',
        '%2A' => '*',
        '%2B' => '+',
        '%2C' => ',',
        '%2D' => '-',
        '%2E' => '.',
        '%2F' => '/',
        '%30' => '0',
        '%31' => '1',
        '%32' => '2',
        '%33' => '3',
        '%34' => '4',
        '%35' => '5',
        '%36' => '6',
        '%37' => '7',
        '%38' => '8',
        '%39' => '9',
        '%3A' => ':',
        '%3B' => ';',
        '%3C' => '<',
        '%3D' => '=',
        '%3E' => '>',
        '%3F' => '?',
        '%40' => '@',
        '%41' => 'A',
        '%42' => 'B',
        '%43' => 'C',
        '%44' => 'D',
        '%45' => 'E',
        '%46' => 'F',
        '%47' => 'G',
        '%48' => 'H',
        '%49' => 'I',
        '%4A' => 'J',
        '%4B' => 'K',
        '%4C' => 'L',
        '%4D' => 'M',
        '%4E' => 'N',
        '%4F' => 'O',
        '%50' => 'P',
        '%51' => 'Q',
        '%52' => 'R',
        '%53' => 'S',
        '%54' => 'T',
        '%55' => 'U',
        '%56' => 'V',
        '%57' => 'W',
        '%58' => 'X',
        '%59' => 'Y',
        '%5A' => 'Z',
        '%5B' => '[',
        '%5C' => '\\',
        '%5D' => ']',
        '%5E' => '^',
        '%5F' => '_',
        '%60' => '`',
        '%61' => 'a',
        '%62' => 'b',
        '%63' => 'c',
        '%64' => 'd',
        '%65' => 'e',
        '%66' => 'f',
        '%67' => 'g',
        '%68' => 'h',
        '%69' => 'i',
        '%6A' => 'j',
        '%6B' => 'k',
        '%6C' => 'l',
        '%6D' => 'm',
        '%6E' => 'n',
        '%6F' => 'o',
        '%70' => 'p',
        '%71' => 'q',
        '%72' => 'r',
        '%73' => 's',
        '%74' => 't',
        '%75' => 'u',
        '%76' => 'v',
        '%77' => 'w',
        '%78' => 'x',
        '%79' => 'y',
        '%7A' => 'z',
        '%7B' => '{',
        '%7C' => '|',
        '%7D' => '}',
        '%7E' => '~',
        '%7F' => '',
        '%80' => '€',
        '%81' => '',
        '%82' => '‚',
        '%83' => 'ƒ',
        '%84' => '„',
        '%85' => '…',
        '%86' => '†',
        '%87' => '‡',
        '%88' => 'ˆ',
        '%89' => '‰',
        '%8A' => 'Š',
        '%8B' => '‹',
        '%8C' => 'Œ',
        '%8D' => '',
        '%8E' => 'Ž',
        '%8F' => '',
        '%90' => '',
        '%91' => '‘',
        '%92' => '’',
        '%93' => '“',
        '%94' => '”',
        '%95' => '•',
        '%96' => '–',
        '%97' => '—',
        '%98' => '˜',
        '%99' => '™',
        '%9A' => 'š',
        '%9B' => '›',
        '%9C' => 'œ',
        '%9D' => '',
        '%9E' => 'ž',
        '%9F' => 'Ÿ',
        '%A0' => '',
        '%A1' => '¡',
        '%A2' => '¢',
        '%A3' => '£',
        '%A4' => '¤',
        '%A5' => '¥',
        '%A6' => '¦',
        '%A7' => '§',
        '%A8' => '¨',
        '%A9' => '©',
        '%AA' => 'ª',
        '%AB' => '«',
        '%AC' => '¬',
        '%AD' => '',
        '%AE' => '®',
        '%AF' => '¯',
        '%B0' => '°',
        '%B1' => '±',
        '%B2' => '²',
        '%B3' => '³',
        '%B4' => '´',
        '%B5' => 'µ',
        '%B6' => '¶',
        '%B7' => '·',
        '%B8' => '¸',
        '%B9' => '¹',
        '%BA' => 'º',
        '%BB' => '»',
        '%BC' => '¼',
        '%BD' => '½',
        '%BE' => '¾',
        '%BF' => '¿',
        '%C0' => 'À',
        '%C1' => 'Á',
        '%C2' => 'Â',
        '%C3' => 'Ã',
        '%C4' => 'Ä',
        '%C5' => 'Å',
        '%C6' => 'Æ',
        '%C7' => 'Ç',
        '%C8' => 'È',
        '%C9' => 'É',
        '%CA' => 'Ê',
        '%CB' => 'Ë',
        '%CC' => 'Ì',
        '%CD' => 'Í',
        '%CE' => 'Î',
        '%CF' => 'Ï',
        '%D0' => 'Ð',
        '%D1' => 'Ñ',
        '%D2' => 'Ò',
        '%D3' => 'Ó',
        '%D4' => 'Ô',
        '%D5' => 'Õ',
        '%D6' => 'Ö',
        '%D7' => '×',
        '%D8' => 'Ø',
        '%D9' => 'Ù',
        '%DA' => 'Ú',
        '%DB' => 'Û',
        '%DC' => 'Ü',
        '%DD' => 'Ý',
        '%DE' => 'Þ',
        '%DF' => 'ß',
        '%E0' => 'à',
        '%E1' => 'á',
        '%E2' => 'â',
        '%E3' => 'ã',
        '%E4' => 'ä',
        '%E5' => 'å',
        '%E6' => 'æ',
        '%E7' => 'ç',
        '%E8' => 'è',
        '%E9' => 'é',
        '%EA' => 'ê',
        '%EB' => 'ë',
        '%EC' => 'ì',
        '%ED' => 'í',
        '%EE' => 'î',
        '%EF' => 'ï',
        '%F0' => 'ð',
        '%F1' => 'ñ',
        '%F2' => 'ò',
        '%F3' => 'ó',
        '%F4' => 'ô',
        '%F5' => 'õ',
        '%F6' => 'ö',
        '%F7' => '÷',
        '%F8' => 'ø',
        '%F9' => 'ù',
        '%FA' => 'ú',
        '%FB' => 'û',
        '%FC' => 'ü',
        '%FD' => 'ý',
        '%FE' => 'þ',
        '%FF' => 'ÿ',
    );
  }

  /**
   * Decodes a UTF-8 string to ISO-8859-1.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function utf8_decode($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $str = (string)self::to_utf8($str);

    static $UTF8_TO_WIN1252_KEYS_CACHE = null;
    static $UTF8_TO_WIN1252_VALUES_CACHE = null;

    if ($UTF8_TO_WIN1252_KEYS_CACHE === null) {
      $UTF8_TO_WIN1252_KEYS_CACHE = array_keys(self::$UTF8_TO_WIN1252);
      $UTF8_TO_WIN1252_VALUES_CACHE = array_values(self::$UTF8_TO_WIN1252);
    }

    /** @noinspection PhpInternalEntityUsedInspection */
    $str = str_replace($UTF8_TO_WIN1252_KEYS_CACHE, $UTF8_TO_WIN1252_VALUES_CACHE, $str);

    if (!isset(self::$SUPPORT['already_checked_via_portable_utf8'])) {
      self::checkForSupport();
    }

    if (self::$SUPPORT['mbstring_func_overload'] === true) {
      $len = \mb_strlen($str, '8BIT');
    } else {
      $len = strlen($str);
    }

    // rewrite the string in place: $i reads the UTF-8 bytes, $j writes the single-byte result
    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0, $j = 0; $i < $len; ++$i, ++$j) {
      switch ($str[$i] & "\xF0") {
        case "\xC0":
        case "\xD0":
          // two-byte sequence: only code points below 256 fit into one byte
          $c = (ord($str[$i] & "\x1F") << 6) | ord($str[++$i] & "\x3F");
          $str[$j] = $c < 256 ? self::chr_and_parse_int($c) : '?';
          break;

        /** @noinspection PhpMissingBreakStatementInspection */
        case "\xF0":
          // four-byte sequence: skip one extra byte, then handle it like a three-byte sequence
          ++$i;
          // no break
        case "\xE0":
          // three-byte sequence: cannot be represented, replace it with "?"
          $str[$j] = '?';
          $i += 2;
          break;

        default:
          $str[$j] = $str[$i];
      }
    }

    return (string)self::substr($str, 0, $j, '8BIT');
  }

  /**
   * Encodes an ISO-8859-1 string to UTF-8.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   */
  public static function utf8_encode($str)
  {
    // init
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    $strTmp = \utf8_encode($str);

    // the polyfill may return false
    if ($strTmp === false) {
      return '';
    }

    $str = (string)$strTmp;
    if (false === strpos($str, "\xC2")) {
      return $str;
    }

    static $CP1252_TO_UTF8_KEYS_CACHE = null;
    static $CP1252_TO_UTF8_VALUES_CACHE = null;

    if ($CP1252_TO_UTF8_KEYS_CACHE === null) {
      $CP1252_TO_UTF8_KEYS_CACHE = array_keys(self::$CP1252_TO_UTF8);
      $CP1252_TO_UTF8_VALUES_CACHE = array_values(self::$CP1252_TO_UTF8);
    }

    return str_replace($CP1252_TO_UTF8_KEYS_CACHE, $CP1252_TO_UTF8_VALUES_CACHE, $str);
  }

  /**
   * Fixes utf8-win1252 chars.
   *
   * @param string $str <p>The input string.</p>
   *
   * @return string
   *
   * @deprecated <p>use "UTF8::fix_simple_utf8()"</p>
   */
  public static function utf8_fix_win1252_chars($str)
  {
    return self::fix_simple_utf8($str);
  }

  /**
   * Returns an array with all utf8 whitespace characters.
   *
   * @see   : http://www.bogofilter.org/pipermail/bogofilter/2003-March/001889.html
   *
   * @author: Derek E. [email protected]
   *
   * @return array <p>
   *               An array with all known whitespace characters as values and the type of whitespace as keys
   *               as defined in above URL.
   *               </p>
   */
  public static function whitespace_table()
  {
    return self::$WHITESPACE_TABLE;
  }

  /**
   * Limit the number of words in a string.
   *
   * @param string $str      <p>The input string.</p>
   * @param int    $limit    <p>The limit of words as integer.</p>
   * @param string $strAddOn <p>Replacement for the stripped string.</p>
   *
   * @return string
   */
  public static function words_limit($str, $limit = 100, $strAddOn = '...')
  {
    $str = (string)$str;

    if (!isset($str[0])) {
      return '';
    }

    // init
    $limit = (int)$limit;

    if ($limit < 1) {
      return '';
    }

    preg_match('/^\s*+(?:\S++\s*+){1,' . $limit . '}/u', $str, $matches);

    if (
        !isset($matches[0])
        ||
        self::strlen($str) === self::strlen($matches[0])
    ) {
      return $str;
    }

    return self::rtrim($matches[0]) . $strAddOn;
  }

  /**
   * Wraps a string to a given number of characters.
   *
   * @link  http://php.net/manual/en/function.wordwrap.php
   *
   * @param string $str   <p>The input string.</p>
   * @param int    $width [optional] <p>The column width.</p>
   * @param string $break [optional] <p>The line is broken using the optional break parameter.</p>
   * @param bool   $cut   [optional] <p>
   *                      If the cut is set to true, the string is
   *                      always wrapped at or before the specified width. So if you have
   *                      a word that is larger than the given width, it is broken apart.
   *                      </p>
   *
   * @return string <p>The given string wrapped at the specified column.</p>
   */
  public static function wordwrap($str, $width = 75, $break = "\n", $cut = false)
  {
    $str = (string)$str;
    $break = (string)$break;

    if (!isset($str[0], $break[0])) {
      return '';
    }

    $w = '';
    $strSplit = explode($break, $str);
    $count = count($strSplit);

    $chars = array();
    /** @noinspection ForeachInvariantsInspection */
    for ($i = 0; $i < $count; ++$i) {

      if ($i) {
        $chars[] = $break;
        $w .= '#';
      }

      $c = $strSplit[$i];
      unset($strSplit[$i]);

      foreach (self::split($c) as $c) {
        $chars[] = $c;
        $w .= ' ' === $c ? ' ' : '?';
      }
    }

    $strReturn = '';
    $j = 0;
    $b = $i = -1;
    $w = wordwrap($w, $width, '#', $cut);

    while (false !== $b = self::strpos($w, '#', $b + 1)) {
      for (++$i; $i < $b; ++$i) {
        $strReturn .= $chars[$j];
        unset($chars[$j++]);
      }

      if ($break === $chars[$j] || ' ' === $chars[$j]) {
        unset($chars[$j++]);
      }

      $strReturn .= $break;
    }

    return $strReturn . implode('', $chars);
  }

  /**
   * Returns an array of Unicode White Space characters.
   *
   * @return array <p>An array with numeric code point as key and White Space Character as value.</p>
   */
  public static function ws()
  {
    return self::$WHITESPACE;
  }

}
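
The decode loop in UTF8::urldecode() keeps decoding until the string stops changing. The following is only a minimal standalone sketch of that idea using PHP built-ins. The function name `decode_until_stable()` and the `$maxRounds` guard are assumptions for illustration and are not part of the class; the sketch also leaves out the `to_utf8()` and `fix_simple_utf8()` steps, so it does not repair double-encoded win1252 input.

```php
<?php

/**
 * Hypothetical helper (not part of the UTF8 class): decodes HTML entities
 * and percent-escapes repeatedly until a fixed point is reached.
 */
function decode_until_stable(string $str, int $maxRounds = 10): string
{
    $flags = ENT_QUOTES | ENT_HTML5;

    // $maxRounds is an assumed safety limit; the original loop has no such limit
    for ($round = 0; $round < $maxRounds; ++$round) {
        $previous = $str;

        // one round: resolve HTML entities first, then percent-escapes
        $str = urldecode(html_entity_decode($str, $flags, 'UTF-8'));

        if ($str === $previous) {
            break; // nothing changed, so the string is fully decoded
        }
    }

    return $str;
}

echo decode_until_stable('test+test'), "\n";             // test test
echo decode_until_stable('D%26%23xFC%3Bsseldorf'), "\n"; // Düsseldorf
echo decode_until_stable('D%25C3%25BCsseldorf'), "\n";   // Düsseldorf
```

Each round can only remove one layer of encoding, which is why a single pass is not enough for input such as 'D%26%23xFC%3Bsseldorf'.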
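
The regular expression in UTF8::words_limit() matches at most `$limit` runs of non-whitespace from the start of the string. As a rough standalone sketch of the same technique (the name `limit_words()` is hypothetical, and plain `strlen()` and `rtrim()` stand in for the class's own UTF-8 aware helpers):

```php
<?php

/**
 * Hypothetical helper (not part of the UTF8 class): keeps at most $limit
 * words and appends $suffix when something was cut off.
 */
function limit_words(string $str, int $limit = 100, string $suffix = '...'): string
{
    if ($str === '' || $limit < 1) {
        return '';
    }

    // possessive quantifiers: optional leading whitespace, then 1..$limit words
    $pattern = '/^\s*+(?:\S++\s*+){1,' . $limit . '}/u';

    // no match (e.g. whitespace only) or the match covers everything: nothing to cut
    if (!preg_match($pattern, $str, $matches) || strlen($matches[0]) === strlen($str)) {
        return $str;
    }

    return rtrim($matches[0]) . $suffix;
}

echo limit_words('The quick brown fox jumps', 3), "\n"; // The quick brown...
echo limit_words('only two', 5), "\n";                  // only two
```

Comparing byte lengths is sufficient here because the match is always a prefix of the input.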