Passed
Branch development (e0e718)
by Nils
04:45
created

AntiXSS   D

Complexity

Total Complexity 72

Size/Duplication

Total Lines 2811
Duplicated Lines 1.96 %

Coupling/Cohesion

Components 2
Dependencies 2

Importance

Changes 0
Metric  Value
dl      55
loc     2811
rs      4.6746
c       0
b       0
f       0
wmc     72
lcom    2
cbo     2

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

Complex Class

Tip: Before tackling complexity, eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like AntiXSS often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
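The prefix heuristic can be tried out with a few lines of reflection. The following is only a minimal sketch: the class `Example`, its method names, and the helper `groupMethodsByPrefix` are hypothetical stand-ins and are not taken from AntiXSS.

```php
<?php

// Hypothetical example class standing in for a large, multi-purpose class.
class Example
{
    public function entityDecode() {}
    public function entityEncode() {}
    public function filterAttributes() {}
    public function filterTags() {}
    public function run() {}
}

/**
 * Group a class's method names by their leading lowercase word,
 * e.g. "entityDecode" and "entityEncode" both fall under "entity".
 */
function groupMethodsByPrefix($className)
{
    $groups = array();
    $reflection = new ReflectionClass($className);
    foreach ($reflection->getMethods() as $method) {
        $name = $method->getName();
        // The prefix is everything before the first uppercase letter.
        preg_match('/^[a-z_]+/', $name, $match);
        $prefix = isset($match[0]) ? $match[0] : $name;
        $groups[$prefix][] = $name;
    }
    ksort($groups);

    return $groups;
}

// Keep only prefixes shared by two or more methods: candidate components.
$candidates = array_filter(groupMethodsByPrefix('Example'), function ($names) {
    return count($names) > 1;
});
print_r($candidates);
```

Prefixes that survive the filter are only candidates. Whether they really form a cohesive component still has to be checked against the fields they touch.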

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
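As a rough illustration of Extract Class, the sketch below moves a character-to-entity table and its lookup logic out of a larger class into a component of its own. The names `EntityFallbackMap`, `Sanitizer`, `encode`, `decodeEntity` and `escape` are assumptions made for this example. They are not the actual AntiXSS API.

```php
<?php

// Hypothetical extracted component: owns the character-to-entity table
// that would otherwise live as a private static field on the large class.
final class EntityFallbackMap
{
    /** @var array literal character => named HTML entity */
    private $map;

    public function __construct(array $map)
    {
        $this->map = $map;
    }

    /** Replace every mapped character in $input with its entity. */
    public function encode($input)
    {
        return strtr($input, $this->map);
    }

    /** Return the literal character for an entity, or null if unknown. */
    public function decodeEntity($entity)
    {
        $char = array_search($entity, $this->map, true);

        return $char === false ? null : $char;
    }
}

// Hypothetical slimmed-down owner class: it delegates to the extracted
// component instead of carrying the table and lookup logic itself.
final class Sanitizer
{
    private $entities;

    public function __construct(EntityFallbackMap $entities)
    {
        $this->entities = $entities;
    }

    public function escape($input)
    {
        return $this->entities->encode($input);
    }
}

$map = new EntityFallbackMap(array('<' => '&lt;', '>' => '&gt;', '"' => '&quot;'));
$sanitizer = new Sanitizer($map);
echo $sanitizer->escape('<b>"hi"</b>'), "\n";
```

Passing the map in through the constructor keeps the large table out of the owner class and lets tests supply a small one.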

While breaking up the class, it is a good idea to analyze how other classes use AntiXSS, and based on these observations, apply Extract Interface, too.
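Extract Interface could look roughly like the sketch below, assuming callers only ever need a single cleaning method. The interface `XssCleaner`, the classes `EscapingCleaner` and `CommentForm`, and the method `clean` are hypothetical. `EscapingCleaner` merely escapes its input and is not a substitute for a real XSS filter.

```php
<?php

// Hypothetical narrow interface: only what callers were observed to need.
interface XssCleaner
{
    /** @return string the cleaned input */
    public function clean($input);
}

// Stand-in implementation; the real AntiXSS logic is not reproduced here.
final class EscapingCleaner implements XssCleaner
{
    public function clean($input)
    {
        return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
    }
}

// A caller that depends on the interface rather than on a concrete class.
final class CommentForm
{
    private $cleaner;

    public function __construct(XssCleaner $cleaner)
    {
        $this->cleaner = $cleaner;
    }

    public function render($rawComment)
    {
        return '<p>' . $this->cleaner->clean($rawComment) . '</p>';
    }
}

$form = new CommentForm(new EscapingCleaner());
echo $form->render('<b>x</b> & y'), "\n";
```

Because `CommentForm` only knows the interface, the concrete cleaner can be split up or replaced without touching its callers.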

<?php

namespace protect\AntiXSS;

use protect\AntiXSS\bootup;
use protect\AntiXSS\UTF8;
require_once(dirname(__FILE__)."/bootup.php");
require_once(dirname(__FILE__)."/UTF8.php");

/**
 * Anti XSS library
 *
 * ported from "CodeIgniter"
 *
 * @author      EllisLab Dev Team
 * @author      Lars Moelleken
 * @copyright   Copyright (c) 2008 - 2014, EllisLab, Inc. (http://ellislab.com/)
 * @copyright   Copyright (c) 2014 - 2015, British Columbia Institute of Technology (http://bcit.ca/)
 * @copyright   Copyright (c) 2015 - 2017, Lars Moelleken (https://moelleken.org/)
 *
 * @license     http://opensource.org/licenses/MIT	MIT License
 */
final class AntiXSS
{

  /**
   * @var array
   */
  private static $entitiesFallback = array(
30
      "\t" => '&Tab;',
31
      "\n" => '&NewLine;',
32
      '!'  => '&excl;',
33
      '"'  => '&quot;',
34
      '#'  => '&num;',
35
      '$'  => '&dollar;',
36
      '%'  => '&percnt;',
37
      '&'  => '&amp;',
38
      "'"  => '&apos;',
39
      '('  => '&lpar;',
40
      ')'  => '&rpar;',
41
      '*'  => '&ast;',
42
      '+'  => '&plus;',
43
      ','  => '&comma;',
44
      '.'  => '&period;',
45
      '/'  => '&sol;',
46
      ':'  => '&colon;',
47
      ';'  => '&semi;',
48
      '<'  => '&lt;',
49
      '<⃒' => '&nvlt;',
50
      '='  => '&equals;',
51
      '=⃥' => '&bne;',
52
      '>'  => '&gt;',
53
      '>⃒' => '&nvgt;',
54
      '?'  => '&quest;',
55
      '@'  => '&commat;',
56
      '['  => '&lbrack;',
57
      ']'  => '&rsqb;',
58
      '^'  => '&Hat;',
59
      '_'  => '&lowbar;',
60
      '`'  => '&grave;',
61
      'fj' => '&fjlig;',
62
      '{'  => '&lbrace;',
63
      '|'  => '&vert;',
64
      '}'  => '&rcub;',
65
      ' '  => '&nbsp;',
66
      '¡'  => '&iexcl;',
67
      '¢'  => '&cent;',
68
      '£'  => '&pound;',
69
      '¤'  => '&curren;',
70
      '¥'  => '&yen;',
71
      '¦'  => '&brvbar;',
72
      '§'  => '&sect;',
73
      '¨'  => '&DoubleDot;',
74
      '©'  => '&copy;',
75
      'ª'  => '&ordf;',
76
      '«'  => '&laquo;',
77
      '¬'  => '&not;',
78
      '­'  => '&shy;',
79
      '®'  => '&reg;',
80
      '¯'  => '&macr;',
81
      '°'  => '&deg;',
82
      '±'  => '&plusmn;',
83
      '²'  => '&sup2;',
84
      '³'  => '&sup3;',
85
      '´'  => '&DiacriticalAcute;',
86
      'µ'  => '&micro;',
87
      '¶'  => '&para;',
88
      '·'  => '&CenterDot;',
89
      '¸'  => '&Cedilla;',
90
      '¹'  => '&sup1;',
91
      'º'  => '&ordm;',
92
      '»'  => '&raquo;',
93
      '¼'  => '&frac14;',
94
      '½'  => '&half;',
95
      '¾'  => '&frac34;',
96
      '¿'  => '&iquest;',
97
      'À'  => '&Agrave;',
98
      'Á'  => '&Aacute;',
99
      'Â'  => '&Acirc;',
100
      'Ã'  => '&Atilde;',
101
      'Ä'  => '&Auml;',
102
      'Å'  => '&Aring;',
103
      'Æ'  => '&AElig;',
104
      'Ç'  => '&Ccedil;',
105
      'È'  => '&Egrave;',
106
      'É'  => '&Eacute;',
107
      'Ê'  => '&Ecirc;',
108
      'Ë'  => '&Euml;',
109
      'Ì'  => '&Igrave;',
110
      'Í'  => '&Iacute;',
111
      'Î'  => '&Icirc;',
112
      'Ï'  => '&Iuml;',
113
      'Ð'  => '&ETH;',
114
      'Ñ'  => '&Ntilde;',
115
      'Ò'  => '&Ograve;',
116
      'Ó'  => '&Oacute;',
117
      'Ô'  => '&Ocirc;',
118
      'Õ'  => '&Otilde;',
119
      'Ö'  => '&Ouml;',
120
      '×'  => '&times;',
121
      'Ø'  => '&Oslash;',
122
      'Ù'  => '&Ugrave;',
123
      'Ú'  => '&Uacute;',
124
      'Û'  => '&Ucirc;',
125
      'Ü'  => '&Uuml;',
126
      'Ý'  => '&Yacute;',
127
      'Þ'  => '&THORN;',
128
      'ß'  => '&szlig;',
129
      'à'  => '&agrave;',
130
      'á'  => '&aacute;',
131
      'â'  => '&acirc;',
132
      'ã'  => '&atilde;',
133
      'ä'  => '&auml;',
134
      'å'  => '&aring;',
135
      'æ'  => '&aelig;',
136
      'ç'  => '&ccedil;',
137
      'è'  => '&egrave;',
138
      'é'  => '&eacute;',
139
      'ê'  => '&ecirc;',
140
      'ë'  => '&euml;',
141
      'ì'  => '&igrave;',
142
      'í'  => '&iacute;',
143
      'î'  => '&icirc;',
144
      'ï'  => '&iuml;',
145
      'ð'  => '&eth;',
146
      'ñ'  => '&ntilde;',
147
      'ò'  => '&ograve;',
148
      'ó'  => '&oacute;',
149
      'ô'  => '&ocirc;',
150
      'õ'  => '&otilde;',
151
      'ö'  => '&ouml;',
152
      '÷'  => '&divide;',
153
      'ø'  => '&oslash;',
154
      'ù'  => '&ugrave;',
155
      'ú'  => '&uacute;',
156
      'û'  => '&ucirc;',
157
      'ü'  => '&uuml;',
158
      'ý'  => '&yacute;',
159
      'þ'  => '&thorn;',
160
      'ÿ'  => '&yuml;',
161
      'Ā'  => '&Amacr;',
162
      'ā'  => '&amacr;',
163
      'Ă'  => '&Abreve;',
164
      'ă'  => '&abreve;',
165
      'Ą'  => '&Aogon;',
166
      'ą'  => '&aogon;',
167
      'Ć'  => '&Cacute;',
168
      'ć'  => '&cacute;',
169
      'Ĉ'  => '&Ccirc;',
170
      'ĉ'  => '&ccirc;',
171
      'Ċ'  => '&Cdot;',
172
      'ċ'  => '&cdot;',
173
      'Č'  => '&Ccaron;',
174
      'č'  => '&ccaron;',
175
      'Ď'  => '&Dcaron;',
176
      'ď'  => '&dcaron;',
177
      'Đ'  => '&Dstrok;',
178
      'đ'  => '&dstrok;',
179
      'Ē'  => '&Emacr;',
180
      'ē'  => '&emacr;',
181
      'Ė'  => '&Edot;',
182
      'ė'  => '&edot;',
183
      'Ę'  => '&Eogon;',
184
      'ę'  => '&eogon;',
185
      'Ě'  => '&Ecaron;',
186
      'ě'  => '&ecaron;',
187
      'Ĝ'  => '&Gcirc;',
188
      'ĝ'  => '&gcirc;',
189
      'Ğ'  => '&Gbreve;',
190
      'ğ'  => '&gbreve;',
191
      'Ġ'  => '&Gdot;',
192
      'ġ'  => '&gdot;',
193
      'Ģ'  => '&Gcedil;',
194
      'Ĥ'  => '&Hcirc;',
195
      'ĥ'  => '&hcirc;',
196
      'Ħ'  => '&Hstrok;',
197
      'ħ'  => '&hstrok;',
198
      'Ĩ'  => '&Itilde;',
199
      'ĩ'  => '&itilde;',
200
      'Ī'  => '&Imacr;',
201
      'ī'  => '&imacr;',
202
      'Į'  => '&Iogon;',
203
      'į'  => '&iogon;',
204
      'İ'  => '&Idot;',
205
      'ı'  => '&inodot;',
206
      'Ĳ'  => '&IJlig;',
207
      'ĳ'  => '&ijlig;',
208
      'Ĵ'  => '&Jcirc;',
209
      'ĵ'  => '&jcirc;',
210
      'Ķ'  => '&Kcedil;',
211
      'ķ'  => '&kcedil;',
212
      'ĸ'  => '&kgreen;',
213
      'Ĺ'  => '&Lacute;',
214
      'ĺ'  => '&lacute;',
215
      'Ļ'  => '&Lcedil;',
216
      'ļ'  => '&lcedil;',
217
      'Ľ'  => '&Lcaron;',
218
      'ľ'  => '&lcaron;',
219
      'Ŀ'  => '&Lmidot;',
220
      'ŀ'  => '&lmidot;',
221
      'Ł'  => '&Lstrok;',
222
      'ł'  => '&lstrok;',
223
      'Ń'  => '&Nacute;',
224
      'ń'  => '&nacute;',
225
      'Ņ'  => '&Ncedil;',
226
      'ņ'  => '&ncedil;',
227
      'Ň'  => '&Ncaron;',
228
      'ň'  => '&ncaron;',
229
      'ŉ'  => '&napos;',
230
      'Ŋ'  => '&ENG;',
231
      'ŋ'  => '&eng;',
232
      'Ō'  => '&Omacr;',
233
      'ō'  => '&omacr;',
234
      'Ő'  => '&Odblac;',
235
      'ő'  => '&odblac;',
236
      'Œ'  => '&OElig;',
237
      'œ'  => '&oelig;',
238
      'Ŕ'  => '&Racute;',
239
      'ŕ'  => '&racute;',
240
      'Ŗ'  => '&Rcedil;',
241
      'ŗ'  => '&rcedil;',
242
      'Ř'  => '&Rcaron;',
243
      'ř'  => '&rcaron;',
244
      'Ś'  => '&Sacute;',
245
      'ś'  => '&sacute;',
246
      'Ŝ'  => '&Scirc;',
247
      'ŝ'  => '&scirc;',
248
      'Ş'  => '&Scedil;',
249
      'ş'  => '&scedil;',
250
      'Š'  => '&Scaron;',
251
      'š'  => '&scaron;',
252
      'Ţ'  => '&Tcedil;',
253
      'ţ'  => '&tcedil;',
254
      'Ť'  => '&Tcaron;',
255
      'ť'  => '&tcaron;',
256
      'Ŧ'  => '&Tstrok;',
257
      'ŧ'  => '&tstrok;',
258
      'Ũ'  => '&Utilde;',
259
      'ũ'  => '&utilde;',
260
      'Ū'  => '&Umacr;',
261
      'ū'  => '&umacr;',
262
      'Ŭ'  => '&Ubreve;',
263
      'ŭ'  => '&ubreve;',
264
      'Ů'  => '&Uring;',
265
      'ů'  => '&uring;',
266
      'Ű'  => '&Udblac;',
267
      'ű'  => '&udblac;',
268
      'Ų'  => '&Uogon;',
269
      'ų'  => '&uogon;',
270
      'Ŵ'  => '&Wcirc;',
271
      'ŵ'  => '&wcirc;',
272
      'Ŷ'  => '&Ycirc;',
273
      'ŷ'  => '&ycirc;',
274
      'Ÿ'  => '&Yuml;',
275
      'Ź'  => '&Zacute;',
276
      'ź'  => '&zacute;',
277
      'Ż'  => '&Zdot;',
278
      'ż'  => '&zdot;',
279
      'Ž'  => '&Zcaron;',
280
      'ž'  => '&zcaron;',
281
      'ƒ'  => '&fnof;',
282
      'Ƶ'  => '&imped;',
283
      'ǵ'  => '&gacute;',
284
      'ȷ'  => '&jmath;',
285
      'ˆ'  => '&circ;',
286
      'ˇ'  => '&Hacek;',
287
      '˘'  => '&Breve;',
288
      '˙'  => '&dot;',
289
      '˚'  => '&ring;',
290
      '˛'  => '&ogon;',
291
      '˜'  => '&DiacriticalTilde;',
292
      '˝'  => '&DiacriticalDoubleAcute;',
293
      '̑'  => '&DownBreve;',
294
      'Α'  => '&Alpha;',
295
      'Β'  => '&Beta;',
296
      'Γ'  => '&Gamma;',
297
      'Δ'  => '&Delta;',
298
      'Ε'  => '&Epsilon;',
299
      'Ζ'  => '&Zeta;',
300
      'Η'  => '&Eta;',
301
      'Θ'  => '&Theta;',
302
      'Ι'  => '&Iota;',
303
      'Κ'  => '&Kappa;',
304
      'Λ'  => '&Lambda;',
305
      'Μ'  => '&Mu;',
306
      'Ν'  => '&Nu;',
307
      'Ξ'  => '&Xi;',
308
      'Ο'  => '&Omicron;',
309
      'Π'  => '&Pi;',
310
      'Ρ'  => '&Rho;',
311
      'Σ'  => '&Sigma;',
312
      'Τ'  => '&Tau;',
313
      'Υ'  => '&Upsilon;',
314
      'Φ'  => '&Phi;',
315
      'Χ'  => '&Chi;',
316
      'Ψ'  => '&Psi;',
317
      'Ω'  => '&Omega;',
318
      'α'  => '&alpha;',
319
      'β'  => '&beta;',
320
      'γ'  => '&gamma;',
321
      'δ'  => '&delta;',
322
      'ε'  => '&epsi;',
323
      'ζ'  => '&zeta;',
324
      'η'  => '&eta;',
325
      'θ'  => '&theta;',
326
      'ι'  => '&iota;',
327
      'κ'  => '&kappa;',
328
      'λ'  => '&lambda;',
329
      'μ'  => '&mu;',
330
      'ν'  => '&nu;',
331
      'ξ'  => '&xi;',
332
      'ο'  => '&omicron;',
333
      'π'  => '&pi;',
334
      'ρ'  => '&rho;',
335
      'ς'  => '&sigmav;',
336
      'σ'  => '&sigma;',
337
      'τ'  => '&tau;',
338
      'υ'  => '&upsi;',
339
      'φ'  => '&phi;',
340
      'χ'  => '&chi;',
341
      'ψ'  => '&psi;',
342
      'ω'  => '&omega;',
343
      'ϑ'  => '&thetasym;',
344
      'ϒ'  => '&upsih;',
345
      'ϕ'  => '&straightphi;',
346
      'ϖ'  => '&piv;',
347
      'Ϝ'  => '&Gammad;',
348
      'ϝ'  => '&gammad;',
349
      'ϰ'  => '&varkappa;',
350
      'ϱ'  => '&rhov;',
351
      'ϵ'  => '&straightepsilon;',
352
      '϶'  => '&backepsilon;',
353
      'Ё'  => '&IOcy;',
354
      'Ђ'  => '&DJcy;',
355
      'Ѓ'  => '&GJcy;',
356
      'Є'  => '&Jukcy;',
357
      'Ѕ'  => '&DScy;',
358
      'І'  => '&Iukcy;',
359
      'Ї'  => '&YIcy;',
360
      'Ј'  => '&Jsercy;',
361
      'Љ'  => '&LJcy;',
362
      'Њ'  => '&NJcy;',
363
      'Ћ'  => '&TSHcy;',
364
      'Ќ'  => '&KJcy;',
365
      'Ў'  => '&Ubrcy;',
366
      'Џ'  => '&DZcy;',
367
      'А'  => '&Acy;',
368
      'Б'  => '&Bcy;',
369
      'В'  => '&Vcy;',
370
      'Г'  => '&Gcy;',
371
      'Д'  => '&Dcy;',
372
      'Е'  => '&IEcy;',
373
      'Ж'  => '&ZHcy;',
374
      'З'  => '&Zcy;',
375
      'И'  => '&Icy;',
376
      'Й'  => '&Jcy;',
377
      'К'  => '&Kcy;',
378
      'Л'  => '&Lcy;',
379
      'М'  => '&Mcy;',
380
      'Н'  => '&Ncy;',
381
      'О'  => '&Ocy;',
382
      'П'  => '&Pcy;',
383
      'Р'  => '&Rcy;',
384
      'С'  => '&Scy;',
385
      'Т'  => '&Tcy;',
386
      'У'  => '&Ucy;',
387
      'Ф'  => '&Fcy;',
388
      'Х'  => '&KHcy;',
389
      'Ц'  => '&TScy;',
390
      'Ч'  => '&CHcy;',
391
      'Ш'  => '&SHcy;',
392
      'Щ'  => '&SHCHcy;',
393
      'Ъ'  => '&HARDcy;',
394
      'Ы'  => '&Ycy;',
395
      'Ь'  => '&SOFTcy;',
396
      'Э'  => '&Ecy;',
397
      'Ю'  => '&YUcy;',
398
      'Я'  => '&YAcy;',
399
      'а'  => '&acy;',
400
      'б'  => '&bcy;',
401
      'в'  => '&vcy;',
402
      'г'  => '&gcy;',
403
      'д'  => '&dcy;',
404
      'е'  => '&iecy;',
405
      'ж'  => '&zhcy;',
406
      'з'  => '&zcy;',
407
      'и'  => '&icy;',
408
      'й'  => '&jcy;',
409
      'к'  => '&kcy;',
410
      'л'  => '&lcy;',
411
      'м'  => '&mcy;',
412
      'н'  => '&ncy;',
413
      'о'  => '&ocy;',
414
      'п'  => '&pcy;',
415
      'р'  => '&rcy;',
416
      'с'  => '&scy;',
417
      'т'  => '&tcy;',
418
      'у'  => '&ucy;',
419
      'ф'  => '&fcy;',
420
      'х'  => '&khcy;',
421
      'ц'  => '&tscy;',
422
      'ч'  => '&chcy;',
423
      'ш'  => '&shcy;',
424
      'щ'  => '&shchcy;',
425
      'ъ'  => '&hardcy;',
426
      'ы'  => '&ycy;',
427
      'ь'  => '&softcy;',
428
      'э'  => '&ecy;',
429
      'ю'  => '&yucy;',
430
      'я'  => '&yacy;',
431
      'ё'  => '&iocy;',
432
      'ђ'  => '&djcy;',
433
      'ѓ'  => '&gjcy;',
434
      'є'  => '&jukcy;',
435
      'ѕ'  => '&dscy;',
436
      'і'  => '&iukcy;',
437
      'ї'  => '&yicy;',
438
      'ј'  => '&jsercy;',
439
      'љ'  => '&ljcy;',
440
      'њ'  => '&njcy;',
441
      'ћ'  => '&tshcy;',
442
      'ќ'  => '&kjcy;',
443
      'ў'  => '&ubrcy;',
444
      'џ'  => '&dzcy;',
445
      ' '  => '&ensp;',
446
      ' '  => '&emsp;',
447
      ' '  => '&emsp13;',
448
      ' '  => '&emsp14;',
449
      ' '  => '&numsp;',
450
      ' '  => '&puncsp;',
451
      ' '  => '&ThinSpace;',
452
      ' '  => '&hairsp;',
453
      '​'  => '&ZeroWidthSpace;',
454
      '‌'  => '&zwnj;',
455
      '‍'  => '&zwj;',
456
      '‎'  => '&lrm;',
457
      '‏'  => '&rlm;',
458
      '‐'  => '&hyphen;',
459
      '–'  => '&ndash;',
460
      '—'  => '&mdash;',
461
      '―'  => '&horbar;',
462
      '‖'  => '&Verbar;',
463
      '‘'  => '&OpenCurlyQuote;',
464
      '’'  => '&rsquo;',
465
      '‚'  => '&sbquo;',
466
      '“'  => '&OpenCurlyDoubleQuote;',
467
      '”'  => '&rdquo;',
468
      '„'  => '&bdquo;',
469
      '†'  => '&dagger;',
470
      '‡'  => '&Dagger;',
471
      '•'  => '&bull;',
472
      '‥'  => '&nldr;',
473
      '…'  => '&hellip;',
474
      '‰'  => '&permil;',
475
      '‱'  => '&pertenk;',
476
      '′'  => '&prime;',
477
      '″'  => '&Prime;',
478
      '‴'  => '&tprime;',
479
      '‵'  => '&backprime;',
480
      '‹'  => '&lsaquo;',
481
      '›'  => '&rsaquo;',
482
      '‾'  => '&oline;',
483
      '⁁'  => '&caret;',
484
      '⁃'  => '&hybull;',
485
      '⁄'  => '&frasl;',
486
      '⁏'  => '&bsemi;',
487
      '⁗'  => '&qprime;',
488
      ' '  => '&MediumSpace;',
489
      '  ' => '&ThickSpace;',
490
      '⁠'  => '&NoBreak;',
491
      '⁡'  => '&af;',
492
      '⁢'  => '&InvisibleTimes;',
493
      '⁣'  => '&ic;',
494
      '€'  => '&euro;',
495
      '⃛'  => '&TripleDot;',
496
      '⃜'  => '&DotDot;',
497
      'ℂ'  => '&complexes;',
498
      '℅'  => '&incare;',
499
      'ℊ'  => '&gscr;',
500
      'ℋ'  => '&HilbertSpace;',
501
      'ℌ'  => '&Hfr;',
502
      'ℍ'  => '&Hopf;',
503
      'ℎ'  => '&planckh;',
504
      'ℏ'  => '&planck;',
505
      'ℐ'  => '&imagline;',
506
      'ℑ'  => '&Ifr;',
507
      'ℒ'  => '&lagran;',
508
      'ℓ'  => '&ell;',
509
      'ℕ'  => '&naturals;',
510
      '№'  => '&numero;',
511
      '℗'  => '&copysr;',
512
      '℘'  => '&wp;',
513
      'ℙ'  => '&primes;',
514
      'ℚ'  => '&rationals;',
515
      'ℛ'  => '&realine;',
516
      'ℜ'  => '&Rfr;',
517
      'ℝ'  => '&Ropf;',
518
      '℞'  => '&rx;',
519
      '™'  => '&trade;',
520
      'ℤ'  => '&Zopf;',
521
      '℧'  => '&mho;',
522
      'ℨ'  => '&Zfr;',
523
      '℩'  => '&iiota;',
524
      'ℬ'  => '&Bscr;',
525
      'ℭ'  => '&Cfr;',
526
      'ℯ'  => '&escr;',
527
      'ℰ'  => '&expectation;',
528
      'ℱ'  => '&Fouriertrf;',
529
      'ℳ'  => '&Mellintrf;',
530
      'ℴ'  => '&orderof;',
531
      'ℵ'  => '&aleph;',
532
      'ℶ'  => '&beth;',
533
      'ℷ'  => '&gimel;',
534
      'ℸ'  => '&daleth;',
535
      'ⅅ'  => '&CapitalDifferentialD;',
536
      'ⅆ'  => '&DifferentialD;',
537
      'ⅇ'  => '&exponentiale;',
538
      'ⅈ'  => '&ImaginaryI;',
539
      '⅓'  => '&frac13;',
540
      '⅔'  => '&frac23;',
541
      '⅕'  => '&frac15;',
542
      '⅖'  => '&frac25;',
543
      '⅗'  => '&frac35;',
544
      '⅘'  => '&frac45;',
545
      '⅙'  => '&frac16;',
546
      '⅚'  => '&frac56;',
547
      '⅛'  => '&frac18;',
548
      '⅜'  => '&frac38;',
549
      '⅝'  => '&frac58;',
550
      '⅞'  => '&frac78;',
551
      '←'  => '&larr;',
552
      '↑'  => '&uarr;',
553
      '→'  => '&srarr;',
554
      '↓'  => '&darr;',
555
      '↔'  => '&harr;',
556
      '↕'  => '&UpDownArrow;',
557
      '↖'  => '&nwarrow;',
558
      '↗'  => '&UpperRightArrow;',
559
      '↘'  => '&LowerRightArrow;',
560
      '↙'  => '&swarr;',
561
      '↚'  => '&nleftarrow;',
562
      '↛'  => '&nrarr;',
563
      '↝'  => '&rarrw;',
564
      '↝̸' => '&nrarrw;',
565
      '↞'  => '&Larr;',
566
      '↟'  => '&Uarr;',
567
      '↠'  => '&twoheadrightarrow;',
568
      '↡'  => '&Darr;',
569
      '↢'  => '&larrtl;',
570
      '↣'  => '&rarrtl;',
571
      '↤'  => '&LeftTeeArrow;',
572
      '↥'  => '&UpTeeArrow;',
573
      '↦'  => '&map;',
574
      '↧'  => '&DownTeeArrow;',
575
      '↩'  => '&larrhk;',
576
      '↪'  => '&rarrhk;',
577
      '↫'  => '&larrlp;',
578
      '↬'  => '&looparrowright;',
579
      '↭'  => '&harrw;',
580
      '↮'  => '&nleftrightarrow;',
581
      '↰'  => '&Lsh;',
582
      '↱'  => '&rsh;',
583
      '↲'  => '&ldsh;',
584
      '↳'  => '&rdsh;',
585
      '↵'  => '&crarr;',
586
      '↶'  => '&curvearrowleft;',
587
      '↷'  => '&curarr;',
588
      '↺'  => '&olarr;',
589
      '↻'  => '&orarr;',
590
      '↼'  => '&leftharpoonup;',
591
      '↽'  => '&leftharpoondown;',
592
      '↾'  => '&RightUpVector;',
593
      '↿'  => '&uharl;',
594
      '⇀'  => '&rharu;',
595
      '⇁'  => '&rhard;',
596
      '⇂'  => '&RightDownVector;',
597
      '⇃'  => '&dharl;',
598
      '⇄'  => '&rightleftarrows;',
599
      '⇅'  => '&udarr;',
600
      '⇆'  => '&lrarr;',
601
      '⇇'  => '&llarr;',
602
      '⇈'  => '&upuparrows;',
603
      '⇉'  => '&rrarr;',
604
      '⇊'  => '&downdownarrows;',
605
      '⇋'  => '&leftrightharpoons;',
606
      '⇌'  => '&rightleftharpoons;',
607
      '⇍'  => '&nLeftarrow;',
608
      '⇎'  => '&nhArr;',
609
      '⇏'  => '&nrArr;',
610
      '⇐'  => '&DoubleLeftArrow;',
611
      '⇑'  => '&DoubleUpArrow;',
612
      '⇒'  => '&Implies;',
613
      '⇓'  => '&Downarrow;',
614
      '⇔'  => '&hArr;',
615
      '⇕'  => '&Updownarrow;',
616
      '⇖'  => '&nwArr;',
617
      '⇗'  => '&neArr;',
618
      '⇘'  => '&seArr;',
619
      '⇙'  => '&swArr;',
620
      '⇚'  => '&lAarr;',
621
      '⇛'  => '&rAarr;',
622
      '⇝'  => '&zigrarr;',
623
      '⇤'  => '&LeftArrowBar;',
624
      '⇥'  => '&RightArrowBar;',
625
      '⇵'  => '&DownArrowUpArrow;',
626
      '⇽'  => '&loarr;',
627
      '⇾'  => '&roarr;',
628
      '⇿'  => '&hoarr;',
629
      '∀'  => '&forall;',
630
      '∁'  => '&comp;',
631
      '∂'  => '&part;',
632
      '∂̸' => '&npart;',
633
      '∃'  => '&Exists;',
634
      '∄'  => '&nexist;',
635
      '∅'  => '&empty;',
636
      '∇'  => '&nabla;',
637
      '∈'  => '&isinv;',
638
      '∉'  => '&notin;',
639
      '∋'  => '&ReverseElement;',
640
      '∌'  => '&notniva;',
641
      '∏'  => '&prod;',
642
      '∐'  => '&Coproduct;',
643
      '∑'  => '&sum;',
644
      '−'  => '&minus;',
645
      '∓'  => '&MinusPlus;',
646
      '∔'  => '&plusdo;',
647
      '∖'  => '&ssetmn;',
648
      '∗'  => '&lowast;',
649
      '∘'  => '&compfn;',
650
      '√'  => '&Sqrt;',
651
      '∝'  => '&prop;',
652
      '∞'  => '&infin;',
653
      '∟'  => '&angrt;',
654
      '∠'  => '&angle;',
655
      '∠⃒' => '&nang;',
656
      '∡'  => '&angmsd;',
657
      '∢'  => '&angsph;',
658
      '∣'  => '&mid;',
659
      '∤'  => '&nshortmid;',
660
      '∥'  => '&shortparallel;',
661
      '∦'  => '&nparallel;',
662
      '∧'  => '&and;',
663
      '∨'  => '&or;',
664
      '∩'  => '&cap;',
665
      '∩︀' => '&caps;',
666
      '∪'  => '&cup;',
667
      '∪︀' => '&cups;',
668
      '∫'  => '&Integral;',
669
      '∬'  => '&Int;',
670
      '∭'  => '&tint;',
671
      '∮'  => '&ContourIntegral;',
672
      '∯'  => '&DoubleContourIntegral;',
673
      '∰'  => '&Cconint;',
674
      '∱'  => '&cwint;',
675
      '∲'  => '&cwconint;',
676
      '∳'  => '&awconint;',
677
      '∴'  => '&there4;',
678
      '∵'  => '&Because;',
679
      '∶'  => '&ratio;',
680
      '∷'  => '&Colon;',
681
      '∸'  => '&minusd;',
682
      '∺'  => '&mDDot;',
683
      '∻'  => '&homtht;',
684
      '∼'  => '&sim;',
685
      '∼⃒' => '&nvsim;',
686
      '∽'  => '&bsim;',
687
      '∽̱' => '&race;',
688
      '∾'  => '&ac;',
689
      '∾̳' => '&acE;',
690
      '∿'  => '&acd;',
691
      '≀'  => '&wr;',
692
      '≁'  => '&NotTilde;',
693
      '≂'  => '&esim;',
694
      '≂̸' => '&nesim;',
695
      '≃'  => '&simeq;',
696
      '≄'  => '&nsime;',
697
      '≅'  => '&TildeFullEqual;',
698
      '≆'  => '&simne;',
699
      '≇'  => '&ncong;',
700
      '≈'  => '&approx;',
701
      '≉'  => '&napprox;',
702
      '≊'  => '&ape;',
703
      '≋'  => '&apid;',
704
      '≋̸' => '&napid;',
705
      '≌'  => '&bcong;',
706
      '≍'  => '&CupCap;',
707
      '≍⃒' => '&nvap;',
708
      '≎'  => '&bump;',
709
      '≎̸' => '&nbump;',
710
      '≏'  => '&HumpEqual;',
711
      '≏̸' => '&nbumpe;',
712
      '≐'  => '&esdot;',
713
      '≐̸' => '&nedot;',
714
      '≑'  => '&doteqdot;',
715
      '≒'  => '&fallingdotseq;',
716
      '≓'  => '&risingdotseq;',
717
      '≔'  => '&coloneq;',
718
      '≕'  => '&eqcolon;',
719
      '≖'  => '&ecir;',
720
      '≗'  => '&circeq;',
721
      '≙'  => '&wedgeq;',
722
      '≚'  => '&veeeq;',
723
      '≜'  => '&triangleq;',
724
      '≟'  => '&equest;',
725
      '≠'  => '&NotEqual;',
726
      '≡'  => '&Congruent;',
727
      '≡⃥' => '&bnequiv;',
728
      '≢'  => '&NotCongruent;',
729
      '≤'  => '&leq;',
730
      '≤⃒' => '&nvle;',
731
      '≥'  => '&ge;',
732
      '≥⃒' => '&nvge;',
733
      '≦'  => '&lE;',
734
      '≦̸' => '&nlE;',
735
      '≧'  => '&geqq;',
736
      '≧̸' => '&NotGreaterFullEqual;',
737
      '≨'  => '&lneqq;',
738
      '≨︀' => '&lvertneqq;',
739
      '≩'  => '&gneqq;',
740
      '≩︀' => '&gvertneqq;',
741
      '≪'  => '&ll;',
742
      '≪̸' => '&nLtv;',
743
      '≪⃒' => '&nLt;',
744
      '≫'  => '&gg;',
745
      '≫̸' => '&NotGreaterGreater;',
746
      '≫⃒' => '&nGt;',
747
      '≬'  => '&between;',
748
      '≭'  => '&NotCupCap;',
749
      '≮'  => '&NotLess;',
750
      '≯'  => '&ngtr;',
751
      '≰'  => '&NotLessEqual;',
752
      '≱'  => '&ngeq;',
753
      '≲'  => '&LessTilde;',
754
      '≳'  => '&GreaterTilde;',
755
      '≴'  => '&nlsim;',
756
      '≵'  => '&ngsim;',
757
      '≶'  => '&lessgtr;',
758
      '≷'  => '&gl;',
759
      '≸'  => '&ntlg;',
760
      '≹'  => '&NotGreaterLess;',
761
      '≺'  => '&prec;',
762
      '≻'  => '&succ;',
763
      '≼'  => '&PrecedesSlantEqual;',
764
      '≽'  => '&succcurlyeq;',
765
      '≾'  => '&precsim;',
766
      '≿'  => '&SucceedsTilde;',
767
      '≿̸' => '&NotSucceedsTilde;',
768
      '⊀'  => '&npr;',
769
      '⊁'  => '&NotSucceeds;',
770
      '⊂'  => '&sub;',
771
      '⊂⃒' => '&vnsub;',
772
      '⊃'  => '&sup;',
773
      '⊃⃒' => '&nsupset;',
774
      '⊄'  => '&nsub;',
775
      '⊅'  => '&nsup;',
776
      '⊆'  => '&SubsetEqual;',
777
      '⊇'  => '&supe;',
778
      '⊈'  => '&NotSubsetEqual;',
779
      '⊉'  => '&NotSupersetEqual;',
780
      '⊊'  => '&subsetneq;',
781
      '⊊︀' => '&vsubne;',
782
      '⊋'  => '&supsetneq;',
783
      '⊋︀' => '&vsupne;',
784
      '⊍'  => '&cupdot;',
785
      '⊎'  => '&UnionPlus;',
786
      '⊏'  => '&sqsub;',
787
      '⊏̸' => '&NotSquareSubset;',
788
      '⊐'  => '&sqsupset;',
789
      '⊐̸' => '&NotSquareSuperset;',
790
      '⊑'  => '&SquareSubsetEqual;',
791
      '⊒'  => '&SquareSupersetEqual;',
792
      '⊓'  => '&sqcap;',
793
      '⊓︀' => '&sqcaps;',
794
      '⊔'  => '&sqcup;',
795
      '⊔︀' => '&sqcups;',
796
      '⊕'  => '&CirclePlus;',
797
      '⊖'  => '&ominus;',
798
      '⊗'  => '&CircleTimes;',
799
      '⊘'  => '&osol;',
800
      '⊙'  => '&CircleDot;',
801
      '⊚'  => '&ocir;',
802
      '⊛'  => '&oast;',
803
      '⊝'  => '&odash;',
804
      '⊞'  => '&boxplus;',
805
      '⊟'  => '&boxminus;',
806
      '⊠'  => '&timesb;',
807
      '⊡'  => '&sdotb;',
808
      '⊢'  => '&vdash;',
809
      '⊣'  => '&dashv;',
810
      '⊤'  => '&DownTee;',
811
      '⊥'  => '&perp;',
812
      '⊧'  => '&models;',
813
      '⊨'  => '&DoubleRightTee;',
814
      '⊩'  => '&Vdash;',
815
      '⊪'  => '&Vvdash;',
816
      '⊫'  => '&VDash;',
817
      '⊬'  => '&nvdash;',
818
      '⊭'  => '&nvDash;',
819
      '⊮'  => '&nVdash;',
820
      '⊯'  => '&nVDash;',
821
      '⊰'  => '&prurel;',
822
      '⊲'  => '&vartriangleleft;',
823
      '⊳'  => '&vrtri;',
824
      '⊴'  => '&LeftTriangleEqual;',
825
      '⊴⃒' => '&nvltrie;',
826
      '⊵'  => '&RightTriangleEqual;',
827
      '⊵⃒' => '&nvrtrie;',
828
      '⊶'  => '&origof;',
829
      '⊷'  => '&imof;',
830
      '⊸'  => '&mumap;',
831
      '⊹'  => '&hercon;',
832
      '⊺'  => '&intcal;',
833
      '⊻'  => '&veebar;',
834
      '⊽'  => '&barvee;',
835
      '⊾'  => '&angrtvb;',
836
      '⊿'  => '&lrtri;',
837
      '⋀'  => '&xwedge;',
838
      '⋁'  => '&xvee;',
839
      '⋂'  => '&bigcap;',
840
      '⋃'  => '&bigcup;',
841
      '⋄'  => '&diamond;',
842
      '⋅'  => '&sdot;',
843
      '⋆'  => '&Star;',
844
      '⋇'  => '&divonx;',
845
      '⋈'  => '&bowtie;',
846
      '⋉'  => '&ltimes;',
847
      '⋊'  => '&rtimes;',
848
      '⋋'  => '&lthree;',
849
      '⋌'  => '&rthree;',
850
      '⋍'  => '&backsimeq;',
851
      '⋎'  => '&curlyvee;',
852
      '⋏'  => '&curlywedge;',
853
      '⋐'  => '&Sub;',
854
      '⋑'  => '&Supset;',
855
      '⋒'  => '&Cap;',
856
      '⋓'  => '&Cup;',
857
      '⋔'  => '&pitchfork;',
858
      '⋕'  => '&epar;',
859
      '⋖'  => '&lessdot;',
860
      '⋗'  => '&gtrdot;',
861
      '⋘'  => '&Ll;',
862
      '⋘̸' => '&nLl;',
863
      '⋙'  => '&Gg;',
864
      '⋙̸' => '&nGg;',
865
      '⋚'  => '&lesseqgtr;',
866
      '⋚︀' => '&lesg;',
867
      '⋛'  => '&gtreqless;',
868
      '⋛︀' => '&gesl;',
869
      '⋞'  => '&curlyeqprec;',
870
      '⋟'  => '&cuesc;',
871
      '⋠'  => '&NotPrecedesSlantEqual;',
872
      '⋡'  => '&NotSucceedsSlantEqual;',
873
      '⋢'  => '&NotSquareSubsetEqual;',
874
      '⋣'  => '&NotSquareSupersetEqual;',
875
      '⋦'  => '&lnsim;',
876
      '⋧'  => '&gnsim;',
877
      '⋨'  => '&precnsim;',
878
      '⋩'  => '&scnsim;',
879
      '⋪'  => '&nltri;',
880
      '⋫'  => '&ntriangleright;',
881
      '⋬'  => '&nltrie;',
882
      '⋭'  => '&NotRightTriangleEqual;',
883
      '⋮'  => '&vellip;',
884
      '⋯'  => '&ctdot;',
885
      '⋰'  => '&utdot;',
886
      '⋱'  => '&dtdot;',
887
      '⋲'  => '&disin;',
888
      '⋳'  => '&isinsv;',
889
      '⋴'  => '&isins;',
890
      '⋵'  => '&isindot;',
891
      '⋵̸' => '&notindot;',
892
      '⋶'  => '&notinvc;',
893
      '⋷'  => '&notinvb;',
894
      '⋹'  => '&isinE;',
895
      '⋹̸' => '&notinE;',
896
      '⋺'  => '&nisd;',
897
      '⋻'  => '&xnis;',
898
      '⋼'  => '&nis;',
899
      '⋽'  => '&notnivc;',
900
      '⋾'  => '&notnivb;',
901
      '⌅'  => '&barwed;',
902
      '⌆'  => '&doublebarwedge;',
903
      '⌈'  => '&lceil;',
904
      '⌉'  => '&RightCeiling;',
905
      '⌊'  => '&LeftFloor;',
906
      '⌋'  => '&RightFloor;',
907
      '⌌'  => '&drcrop;',
908
      '⌍'  => '&dlcrop;',
909
      '⌎'  => '&urcrop;',
910
      '⌏'  => '&ulcrop;',
911
      '⌐'  => '&bnot;',
912
      '⌒'  => '&profline;',
913
      '⌓'  => '&profsurf;',
914
      '⌕'  => '&telrec;',
915
      '⌖'  => '&target;',
916
      '⌜'  => '&ulcorner;',
917
      '⌝'  => '&urcorner;',
918
      '⌞'  => '&llcorner;',
919
      '⌟'  => '&drcorn;',
920
      '⌢'  => '&frown;',
921
      '⌣'  => '&smile;',
922
      '⌭'  => '&cylcty;',
923
      '⌮'  => '&profalar;',
924
      '⌶'  => '&topbot;',
925
      '⌽'  => '&ovbar;',
926
      '⌿'  => '&solbar;',
927
      '⍼'  => '&angzarr;',
928
      '⎰'  => '&lmoust;',
929
      '⎱'  => '&rmoust;',
930
      '⎴'  => '&OverBracket;',
931
      '⎵'  => '&bbrk;',
932
      '⎶'  => '&bbrktbrk;',
933
      '⏜'  => '&OverParenthesis;',
934
      '⏝'  => '&UnderParenthesis;',
935
      '⏞'  => '&OverBrace;',
936
      '⏟'  => '&UnderBrace;',
937
      '⏢'  => '&trpezium;',
938
      '⏧'  => '&elinters;',
939
      '␣'  => '&blank;',
940
      'Ⓢ'  => '&oS;',
941
      '─'  => '&HorizontalLine;',
942
      '│'  => '&boxv;',
943
      '┌'  => '&boxdr;',
944
      '┐'  => '&boxdl;',
945
      '└'  => '&boxur;',
946
      '┘'  => '&boxul;',
947
      '├'  => '&boxvr;',
948
      '┤'  => '&boxvl;',
949
      '┬'  => '&boxhd;',
950
      '┴'  => '&boxhu;',
951
      '┼'  => '&boxvh;',
952
      '═'  => '&boxH;',
953
      '║'  => '&boxV;',
954
      '╒'  => '&boxdR;',
955
      '╓'  => '&boxDr;',
956
      '╔'  => '&boxDR;',
957
      '╕'  => '&boxdL;',
958
      '╖'  => '&boxDl;',
959
      '╗'  => '&boxDL;',
960
      '╘'  => '&boxuR;',
961
      '╙'  => '&boxUr;',
962
      '╚'  => '&boxUR;',
963
      '╛'  => '&boxuL;',
964
      '╜'  => '&boxUl;',
965
      '╝'  => '&boxUL;',
966
      '╞'  => '&boxvR;',
967
      '╟'  => '&boxVr;',
968
      '╠'  => '&boxVR;',
969
      '╡'  => '&boxvL;',
970
      '╢'  => '&boxVl;',
971
      '╣'  => '&boxVL;',
972
      '╤'  => '&boxHd;',
973
      '╥'  => '&boxhD;',
974
      '╦'  => '&boxHD;',
975
      '╧'  => '&boxHu;',
976
      '╨'  => '&boxhU;',
977
      '╩'  => '&boxHU;',
978
      '╪'  => '&boxvH;',
979
      '╫'  => '&boxVh;',
980
      '╬'  => '&boxVH;',
981
      '▀'  => '&uhblk;',
982
      '▄'  => '&lhblk;',
983
      '█'  => '&block;',
984
      '░'  => '&blk14;',
985
      '▒'  => '&blk12;',
986
      '▓'  => '&blk34;',
987
      '□'  => '&Square;',
988
      '▪'  => '&squarf;',
989
      '▫'  => '&EmptyVerySmallSquare;',
990
      '▭'  => '&rect;',
991
      '▮'  => '&marker;',
992
      '▱'  => '&fltns;',
993
      '△'  => '&bigtriangleup;',
994
      '▴'  => '&blacktriangle;',
995
      '▵'  => '&triangle;',
996
      '▸'  => '&blacktriangleright;',
997
      '▹'  => '&rtri;',
998
      '▽'  => '&bigtriangledown;',
999
      '▾'  => '&blacktriangledown;',
1000
      '▿'  => '&triangledown;',
1001
      '◂'  => '&blacktriangleleft;',
1002
      '◃'  => '&ltri;',
1003
      '◊'  => '&lozenge;',
1004
      '○'  => '&cir;',
1005
      '◬'  => '&tridot;',
1006
      '◯'  => '&bigcirc;',
1007
      '◸'  => '&ultri;',
1008
      '◹'  => '&urtri;',
1009
      '◺'  => '&lltri;',
1010
      '◻'  => '&EmptySmallSquare;',
1011
      '◼'  => '&FilledSmallSquare;',
1012
      '★'  => '&starf;',
1013
      '☆'  => '&star;',
1014
      '☎'  => '&phone;',
1015
      '♀'  => '&female;',
1016
      '♂'  => '&male;',
1017
      '♠'  => '&spadesuit;',
1018
      '♣'  => '&clubs;',
1019
      '♥'  => '&hearts;',
1020
      '♦'  => '&diamondsuit;',
1021
      '♪'  => '&sung;',
1022
      '♭'  => '&flat;',
1023
      '♮'  => '&natur;',
1024
      '♯'  => '&sharp;',
1025
      '✓'  => '&check;',
1026
      '✗'  => '&cross;',
1027
      '✠'  => '&maltese;',
1028
      '✶'  => '&sext;',
1029
      '❘'  => '&VerticalSeparator;',
1030
      '❲'  => '&lbbrk;',
1031
      '❳'  => '&rbbrk;',
1032
      '⟈'  => '&bsolhsub;',
1033
      '⟉'  => '&suphsol;',
1034
      '⟦'  => '&LeftDoubleBracket;',
1035
      '⟧'  => '&RightDoubleBracket;',
1036
      '⟨'  => '&langle;',
1037
      '⟩'  => '&RightAngleBracket;',
1038
      '⟪'  => '&Lang;',
1039
      '⟫'  => '&Rang;',
1040
      '⟬'  => '&loang;',
1041
      '⟭'  => '&roang;',
1042
      '⟵'  => '&longleftarrow;',
1043
      '⟶'  => '&LongRightArrow;',
1044
      '⟷'  => '&LongLeftRightArrow;',
1045
      '⟸'  => '&xlArr;',
1046
      '⟹'  => '&DoubleLongRightArrow;',
1047
      '⟺'  => '&xhArr;',
1048
      '⟼'  => '&xmap;',
1049
      '⟿'  => '&dzigrarr;',
1050
      '⤂'  => '&nvlArr;',
1051
      '⤃'  => '&nvrArr;',
1052
      '⤄'  => '&nvHarr;',
1053
      '⤅'  => '&Map;',
1054
      '⤌'  => '&lbarr;',
1055
      '⤍'  => '&bkarow;',
1056
      '⤎'  => '&lBarr;',
1057
      '⤏'  => '&dbkarow;',
1058
      '⤐'  => '&drbkarow;',
1059
      '⤑'  => '&DDotrahd;',
1060
      '⤒'  => '&UpArrowBar;',
1061
      '⤓'  => '&DownArrowBar;',
1062
      '⤖'  => '&Rarrtl;',
1063
      '⤙'  => '&latail;',
1064
      '⤚'  => '&ratail;',
1065
      '⤛'  => '&lAtail;',
1066
      '⤜'  => '&rAtail;',
1067
      '⤝'  => '&larrfs;',
1068
      '⤞'  => '&rarrfs;',
1069
      '⤟'  => '&larrbfs;',
1070
      '⤠'  => '&rarrbfs;',
1071
      '⤣'  => '&nwarhk;',
1072
      '⤤'  => '&nearhk;',
1073
      '⤥'  => '&searhk;',
1074
      '⤦'  => '&swarhk;',
1075
      '⤧'  => '&nwnear;',
1076
      '⤨'  => '&toea;',
1077
      '⤩'  => '&seswar;',
1078
      '⤪'  => '&swnwar;',
1079
      '⤳'  => '&rarrc;',
1080
      '⤳̸' => '&nrarrc;',
1081
      '⤵'  => '&cudarrr;',
1082
      '⤶'  => '&ldca;',
1083
      '⤷'  => '&rdca;',
1084
      '⤸'  => '&cudarrl;',
1085
      '⤹'  => '&larrpl;',
1086
      '⤼'  => '&curarrm;',
1087
      '⤽'  => '&cularrp;',
1088
      '⥅'  => '&rarrpl;',
1089
      '⥈'  => '&harrcir;',
1090
      '⥉'  => '&Uarrocir;',
1091
      '⥊'  => '&lurdshar;',
1092
      '⥋'  => '&ldrushar;',
1093
      '⥎'  => '&LeftRightVector;',
1094
      '⥏'  => '&RightUpDownVector;',
1095
      '⥐'  => '&DownLeftRightVector;',
1096
      '⥑'  => '&LeftUpDownVector;',
1097
      '⥒'  => '&LeftVectorBar;',
1098
      '⥓'  => '&RightVectorBar;',
1099
      '⥔'  => '&RightUpVectorBar;',
1100
      '⥕'  => '&RightDownVectorBar;',
1101
      '⥖'  => '&DownLeftVectorBar;',
1102
      '⥗'  => '&DownRightVectorBar;',
1103
      '⥘'  => '&LeftUpVectorBar;',
1104
      '⥙'  => '&LeftDownVectorBar;',
1105
      '⥚'  => '&LeftTeeVector;',
1106
      '⥛'  => '&RightTeeVector;',
1107
      '⥜'  => '&RightUpTeeVector;',
1108
      '⥝'  => '&RightDownTeeVector;',
1109
      '⥞'  => '&DownLeftTeeVector;',
1110
      '⥟'  => '&DownRightTeeVector;',
1111
      '⥠'  => '&LeftUpTeeVector;',
1112
      '⥡'  => '&LeftDownTeeVector;',
1113
      '⥢'  => '&lHar;',
1114
      '⥣'  => '&uHar;',
1115
      '⥤'  => '&rHar;',
1116
      '⥥'  => '&dHar;',
1117
      '⥦'  => '&luruhar;',
1118
      '⥧'  => '&ldrdhar;',
1119
      '⥨'  => '&ruluhar;',
1120
      '⥩'  => '&rdldhar;',
1121
      '⥪'  => '&lharul;',
1122
      '⥫'  => '&llhard;',
1123
      '⥬'  => '&rharul;',
1124
      '⥭'  => '&lrhard;',
1125
      '⥮'  => '&udhar;',
1126
      '⥯'  => '&ReverseUpEquilibrium;',
1127
      '⥰'  => '&RoundImplies;',
1128
      '⥱'  => '&erarr;',
1129
      '⥲'  => '&simrarr;',
1130
      '⥳'  => '&larrsim;',
1131
      '⥴'  => '&rarrsim;',
1132
      '⥵'  => '&rarrap;',
1133
      '⥶'  => '&ltlarr;',
1134
      '⥸'  => '&gtrarr;',
1135
      '⥹'  => '&subrarr;',
1136
      '⥻'  => '&suplarr;',
1137
      '⥼'  => '&lfisht;',
1138
      '⥽'  => '&rfisht;',
1139
      '⥾'  => '&ufisht;',
1140
      '⥿'  => '&dfisht;',
1141
      '⦅'  => '&lopar;',
1142
      '⦆'  => '&ropar;',
1143
      '⦋'  => '&lbrke;',
1144
      '⦌'  => '&rbrke;',
1145
      '⦍'  => '&lbrkslu;',
1146
      '⦎'  => '&rbrksld;',
1147
      '⦏'  => '&lbrksld;',
1148
      '⦐'  => '&rbrkslu;',
1149
      '⦑'  => '&langd;',
1150
      '⦒'  => '&rangd;',
1151
      '⦓'  => '&lparlt;',
1152
      '⦔'  => '&rpargt;',
1153
      '⦕'  => '&gtlPar;',
1154
      '⦖'  => '&ltrPar;',
1155
      '⦚'  => '&vzigzag;',
1156
      '⦜'  => '&vangrt;',
1157
      '⦝'  => '&angrtvbd;',
1158
      '⦤'  => '&ange;',
1159
      '⦥'  => '&range;',
1160
      '⦦'  => '&dwangle;',
1161
      '⦧'  => '&uwangle;',
1162
      '⦨'  => '&angmsdaa;',
1163
      '⦩'  => '&angmsdab;',
1164
      '⦪'  => '&angmsdac;',
1165
      '⦫'  => '&angmsdad;',
1166
      '⦬'  => '&angmsdae;',
1167
      '⦭'  => '&angmsdaf;',
1168
      '⦮'  => '&angmsdag;',
1169
      '⦯'  => '&angmsdah;',
1170
      '⦰'  => '&bemptyv;',
1171
      '⦱'  => '&demptyv;',
1172
      '⦲'  => '&cemptyv;',
1173
      '⦳'  => '&raemptyv;',
1174
      '⦴'  => '&laemptyv;',
1175
      '⦵'  => '&ohbar;',
1176
      '⦶'  => '&omid;',
1177
      '⦷'  => '&opar;',
1178
      '⦹'  => '&operp;',
1179
      '⦻'  => '&olcross;',
1180
      '⦼'  => '&odsold;',
1181
      '⦾'  => '&olcir;',
1182
      '⦿'  => '&ofcir;',
1183
      '⧀'  => '&olt;',
1184
      '⧁'  => '&ogt;',
1185
      '⧂'  => '&cirscir;',
1186
      '⧃'  => '&cirE;',
1187
      '⧄'  => '&solb;',
1188
      '⧅'  => '&bsolb;',
1189
      '⧉'  => '&boxbox;',
1190
      '⧍'  => '&trisb;',
1191
      '⧎'  => '&rtriltri;',
1192
      '⧏'  => '&LeftTriangleBar;',
1193
      '⧏̸' => '&NotLeftTriangleBar;',
1194
      '⧐'  => '&RightTriangleBar;',
1195
      '⧐̸' => '&NotRightTriangleBar;',
1196
      '⧜'  => '&iinfin;',
1197
      '⧝'  => '&infintie;',
1198
      '⧞'  => '&nvinfin;',
1199
      '⧣'  => '&eparsl;',
1200
      '⧤'  => '&smeparsl;',
1201
      '⧥'  => '&eqvparsl;',
1202
      '⧫'  => '&lozf;',
1203
      '⧴'  => '&RuleDelayed;',
1204
      '⧶'  => '&dsol;',
1205
      '⨀'  => '&xodot;',
1206
      '⨁'  => '&bigoplus;',
1207
      '⨂'  => '&bigotimes;',
1208
      '⨄'  => '&biguplus;',
1209
      '⨆'  => '&bigsqcup;',
1210
      '⨌'  => '&iiiint;',
1211
      '⨍'  => '&fpartint;',
1212
      '⨐'  => '&cirfnint;',
1213
      '⨑'  => '&awint;',
1214
      '⨒'  => '&rppolint;',
1215
      '⨓'  => '&scpolint;',
1216
      '⨔'  => '&npolint;',
1217
      '⨕'  => '&pointint;',
1218
      '⨖'  => '&quatint;',
1219
      '⨗'  => '&intlarhk;',
1220
      '⨢'  => '&pluscir;',
1221
      '⨣'  => '&plusacir;',
1222
      '⨤'  => '&simplus;',
1223
      '⨥'  => '&plusdu;',
1224
      '⨦'  => '&plussim;',
1225
      '⨧'  => '&plustwo;',
1226
      '⨩'  => '&mcomma;',
1227
      '⨪'  => '&minusdu;',
1228
      '⨭'  => '&loplus;',
1229
      '⨮'  => '&roplus;',
1230
      '⨯'  => '&Cross;',
1231
      '⨰'  => '&timesd;',
1232
      '⨱'  => '&timesbar;',
1233
      '⨳'  => '&smashp;',
1234
      '⨴'  => '&lotimes;',
1235
      '⨵'  => '&rotimes;',
1236
      '⨶'  => '&otimesas;',
1237
      '⨷'  => '&Otimes;',
1238
      '⨸'  => '&odiv;',
1239
      '⨹'  => '&triplus;',
1240
      '⨺'  => '&triminus;',
1241
      '⨻'  => '&tritime;',
1242
      '⨼'  => '&iprod;',
1243
      '⨿'  => '&amalg;',
1244
      '⩀'  => '&capdot;',
1245
      '⩂'  => '&ncup;',
1246
      '⩃'  => '&ncap;',
1247
      '⩄'  => '&capand;',
1248
      '⩅'  => '&cupor;',
1249
      '⩆'  => '&cupcap;',
1250
      '⩇'  => '&capcup;',
1251
      '⩈'  => '&cupbrcap;',
1252
      '⩉'  => '&capbrcup;',
1253
      '⩊'  => '&cupcup;',
1254
      '⩋'  => '&capcap;',
1255
      '⩌'  => '&ccups;',
1256
      '⩍'  => '&ccaps;',
1257
      '⩐'  => '&ccupssm;',
1258
      '⩓'  => '&And;',
1259
      '⩔'  => '&Or;',
1260
      '⩕'  => '&andand;',
1261
      '⩖'  => '&oror;',
1262
      '⩗'  => '&orslope;',
1263
      '⩘'  => '&andslope;',
1264
      '⩚'  => '&andv;',
1265
      '⩛'  => '&orv;',
1266
      '⩜'  => '&andd;',
1267
      '⩝'  => '&ord;',
1268
      '⩟'  => '&wedbar;',
1269
      '⩦'  => '&sdote;',
1270
      '⩪'  => '&simdot;',
1271
      '⩭'  => '&congdot;',
1272
      '⩭̸' => '&ncongdot;',
1273
      '⩮'  => '&easter;',
1274
      '⩯'  => '&apacir;',
1275
      '⩰'  => '&apE;',
1276
      '⩰̸' => '&napE;',
1277
      '⩱'  => '&eplus;',
1278
      '⩲'  => '&pluse;',
1279
      '⩳'  => '&Esim;',
1280
      '⩴'  => '&Colone;',
1281
      '⩵'  => '&Equal;',
1282
      '⩷'  => '&ddotseq;',
1283
      '⩸'  => '&equivDD;',
1284
      '⩹'  => '&ltcir;',
1285
      '⩺'  => '&gtcir;',
1286
      '⩻'  => '&ltquest;',
1287
      '⩼'  => '&gtquest;',
1288
      '⩽'  => '&les;',
1289
      '⩽̸' => '&nles;',
1290
      '⩾'  => '&ges;',
1291
      '⩾̸' => '&nges;',
1292
      '⩿'  => '&lesdot;',
1293
      '⪀'  => '&gesdot;',
1294
      '⪁'  => '&lesdoto;',
1295
      '⪂'  => '&gesdoto;',
1296
      '⪃'  => '&lesdotor;',
1297
      '⪄'  => '&gesdotol;',
1298
      '⪅'  => '&lap;',
1299
      '⪆'  => '&gap;',
1300
      '⪇'  => '&lne;',
1301
      '⪈'  => '&gne;',
1302
      '⪉'  => '&lnap;',
1303
      '⪊'  => '&gnap;',
1304
      '⪋'  => '&lesseqqgtr;',
1305
      '⪌'  => '&gEl;',
1306
      '⪍'  => '&lsime;',
1307
      '⪎'  => '&gsime;',
1308
      '⪏'  => '&lsimg;',
1309
      '⪐'  => '&gsiml;',
1310
      '⪑'  => '&lgE;',
1311
      '⪒'  => '&glE;',
1312
      '⪓'  => '&lesges;',
1313
      '⪔'  => '&gesles;',
1314
      '⪕'  => '&els;',
1315
      '⪖'  => '&egs;',
1316
      '⪗'  => '&elsdot;',
1317
      '⪘'  => '&egsdot;',
1318
      '⪙'  => '&el;',
1319
      '⪚'  => '&eg;',
1320
      '⪝'  => '&siml;',
1321
      '⪞'  => '&simg;',
1322
      '⪟'  => '&simlE;',
1323
      '⪠'  => '&simgE;',
1324
      '⪡'  => '&LessLess;',
1325
      '⪡̸' => '&NotNestedLessLess;',
1326
      '⪢'  => '&GreaterGreater;',
1327
      '⪢̸' => '&NotNestedGreaterGreater;',
1328
      '⪤'  => '&glj;',
1329
      '⪥'  => '&gla;',
1330
      '⪦'  => '&ltcc;',
1331
      '⪧'  => '&gtcc;',
1332
      '⪨'  => '&lescc;',
1333
      '⪩'  => '&gescc;',
1334
      '⪪'  => '&smt;',
1335
      '⪫'  => '&lat;',
1336
      '⪬'  => '&smte;',
1337
      '⪬︀' => '&smtes;',
1338
      '⪭'  => '&late;',
1339
      '⪭︀' => '&lates;',
1340
      '⪮'  => '&bumpE;',
1341
      '⪯'  => '&preceq;',
1342
      '⪯̸' => '&NotPrecedesEqual;',
1343
      '⪰'  => '&SucceedsEqual;',
1344
      '⪰̸' => '&NotSucceedsEqual;',
1345
      '⪳'  => '&prE;',
1346
      '⪴'  => '&scE;',
1347
      '⪵'  => '&precneqq;',
1348
      '⪶'  => '&scnE;',
1349
      '⪷'  => '&precapprox;',
1350
      '⪸'  => '&succapprox;',
1351
      '⪹'  => '&precnapprox;',
1352
      '⪺'  => '&succnapprox;',
1353
      '⪻'  => '&Pr;',
1354
      '⪼'  => '&Sc;',
1355
      '⪽'  => '&subdot;',
1356
      '⪾'  => '&supdot;',
1357
      '⪿'  => '&subplus;',
1358
      '⫀'  => '&supplus;',
1359
      '⫁'  => '&submult;',
1360
      '⫂'  => '&supmult;',
1361
      '⫃'  => '&subedot;',
1362
      '⫄'  => '&supedot;',
1363
      '⫅'  => '&subE;',
1364
      '⫅̸' => '&nsubE;',
1365
      '⫆'  => '&supseteqq;',
1366
      '⫆̸' => '&nsupseteqq;',
1367
      '⫇'  => '&subsim;',
1368
      '⫈'  => '&supsim;',
1369
      '⫋'  => '&subsetneqq;',
1370
      '⫋︀' => '&vsubnE;',
1371
      '⫌'  => '&supnE;',
1372
      '⫌︀' => '&varsupsetneqq;',
1373
      '⫏'  => '&csub;',
1374
      '⫐'  => '&csup;',
1375
      '⫑'  => '&csube;',
1376
      '⫒'  => '&csupe;',
1377
      '⫓'  => '&subsup;',
1378
      '⫔'  => '&supsub;',
1379
      '⫕'  => '&subsub;',
1380
      '⫖'  => '&supsup;',
1381
      '⫗'  => '&suphsub;',
1382
      '⫘'  => '&supdsub;',
1383
      '⫙'  => '&forkv;',
1384
      '⫚'  => '&topfork;',
1385
      '⫛'  => '&mlcp;',
1386
      '⫤'  => '&Dashv;',
1387
      '⫦'  => '&Vdashl;',
1388
      '⫧'  => '&Barv;',
1389
      '⫨'  => '&vBar;',
1390
      '⫩'  => '&vBarv;',
1391
      '⫫'  => '&Vbar;',
1392
      '⫬'  => '&Not;',
1393
      '⫭'  => '&bNot;',
1394
      '⫮'  => '&rnmid;',
1395
      '⫯'  => '&cirmid;',
1396
      '⫰'  => '&midcir;',
1397
      '⫱'  => '&topcir;',
1398
      '⫲'  => '&nhpar;',
1399
      '⫳'  => '&parsim;',
1400
      '⫽'  => '&parsl;',
1401
      '⫽⃥' => '&nparsl;',
1402
      'ﬀ'  => '&fflig;',
      'ﬁ'  => '&filig;',
      'ﬂ'  => '&fllig;',
      'ﬃ'  => '&ffilig;',
      'ﬄ'  => '&ffllig;',
1407
      '𝒜' => '&Ascr;',
1408
      '𝒞' => '&Cscr;',
1409
      '𝒟' => '&Dscr;',
1410
      '𝒢' => '&Gscr;',
1411
      '𝒥' => '&Jscr;',
1412
      '𝒦' => '&Kscr;',
1413
      '𝒩' => '&Nscr;',
1414
      '𝒪' => '&Oscr;',
1415
      '𝒫' => '&Pscr;',
1416
      '𝒬' => '&Qscr;',
1417
      '𝒮' => '&Sscr;',
1418
      '𝒯' => '&Tscr;',
1419
      '𝒰' => '&Uscr;',
1420
      '𝒱' => '&Vscr;',
1421
      '𝒲' => '&Wscr;',
1422
      '𝒳' => '&Xscr;',
1423
      '𝒴' => '&Yscr;',
1424
      '𝒵' => '&Zscr;',
1425
      '𝒶' => '&ascr;',
1426
      '𝒷' => '&bscr;',
1427
      '𝒸' => '&cscr;',
1428
      '𝒹' => '&dscr;',
1429
      '𝒻' => '&fscr;',
1430
      '𝒽' => '&hscr;',
1431
      '𝒾' => '&iscr;',
1432
      '𝒿' => '&jscr;',
1433
      '𝓀' => '&kscr;',
1434
      '𝓁' => '&lscr;',
1435
      '𝓂' => '&mscr;',
1436
      '𝓃' => '&nscr;',
1437
      '𝓅' => '&pscr;',
1438
      '𝓆' => '&qscr;',
1439
      '𝓇' => '&rscr;',
1440
      '𝓈' => '&sscr;',
1441
      '𝓉' => '&tscr;',
1442
      '𝓊' => '&uscr;',
1443
      '𝓋' => '&vscr;',
1444
      '𝓌' => '&wscr;',
1445
      '𝓍' => '&xscr;',
1446
      '𝓎' => '&yscr;',
1447
      '𝓏' => '&zscr;',
1448
      '𝔄' => '&Afr;',
1449
      '𝔅' => '&Bfr;',
1450
      '𝔇' => '&Dfr;',
1451
      '𝔈' => '&Efr;',
1452
      '𝔉' => '&Ffr;',
1453
      '𝔊' => '&Gfr;',
1454
      '𝔍' => '&Jfr;',
1455
      '𝔎' => '&Kfr;',
1456
      '𝔏' => '&Lfr;',
1457
      '𝔐' => '&Mfr;',
1458
      '𝔑' => '&Nfr;',
1459
      '𝔒' => '&Ofr;',
1460
      '𝔓' => '&Pfr;',
1461
      '𝔔' => '&Qfr;',
1462
      '𝔖' => '&Sfr;',
1463
      '𝔗' => '&Tfr;',
1464
      '𝔘' => '&Ufr;',
1465
      '𝔙' => '&Vfr;',
1466
      '𝔚' => '&Wfr;',
1467
      '𝔛' => '&Xfr;',
1468
      '𝔜' => '&Yfr;',
1469
      '𝔞' => '&afr;',
1470
      '𝔟' => '&bfr;',
1471
      '𝔠' => '&cfr;',
1472
      '𝔡' => '&dfr;',
1473
      '𝔢' => '&efr;',
1474
      '𝔣' => '&ffr;',
1475
      '𝔤' => '&gfr;',
1476
      '𝔥' => '&hfr;',
1477
      '𝔦' => '&ifr;',
1478
      '𝔧' => '&jfr;',
1479
      '𝔨' => '&kfr;',
1480
      '𝔩' => '&lfr;',
1481
      '𝔪' => '&mfr;',
1482
      '𝔫' => '&nfr;',
1483
      '𝔬' => '&ofr;',
1484
      '𝔭' => '&pfr;',
1485
      '𝔮' => '&qfr;',
1486
      '𝔯' => '&rfr;',
1487
      '𝔰' => '&sfr;',
1488
      '𝔱' => '&tfr;',
1489
      '𝔲' => '&ufr;',
1490
      '𝔳' => '&vfr;',
1491
      '𝔴' => '&wfr;',
1492
      '𝔵' => '&xfr;',
1493
      '𝔶' => '&yfr;',
1494
      '𝔷' => '&zfr;',
1495
      '𝔸' => '&Aopf;',
1496
      '𝔹' => '&Bopf;',
1497
      '𝔻' => '&Dopf;',
1498
      '𝔼' => '&Eopf;',
1499
      '𝔽' => '&Fopf;',
1500
      '𝔾' => '&Gopf;',
1501
      '𝕀' => '&Iopf;',
1502
      '𝕁' => '&Jopf;',
1503
      '𝕂' => '&Kopf;',
1504
      '𝕃' => '&Lopf;',
1505
      '𝕄' => '&Mopf;',
1506
      '𝕆' => '&Oopf;',
1507
      '𝕊' => '&Sopf;',
1508
      '𝕋' => '&Topf;',
1509
      '𝕌' => '&Uopf;',
1510
      '𝕍' => '&Vopf;',
1511
      '𝕎' => '&Wopf;',
1512
      '𝕏' => '&Xopf;',
1513
      '𝕐' => '&Yopf;',
1514
      '𝕒' => '&aopf;',
1515
      '𝕓' => '&bopf;',
1516
      '𝕔' => '&copf;',
1517
      '𝕕' => '&dopf;',
1518
      '𝕖' => '&eopf;',
1519
      '𝕗' => '&fopf;',
1520
      '𝕘' => '&gopf;',
1521
      '𝕙' => '&hopf;',
1522
      '𝕚' => '&iopf;',
1523
      '𝕛' => '&jopf;',
1524
      '𝕜' => '&kopf;',
1525
      '𝕝' => '&lopf;',
1526
      '𝕞' => '&mopf;',
1527
      '𝕟' => '&nopf;',
1528
      '𝕠' => '&oopf;',
1529
      '𝕡' => '&popf;',
1530
      '𝕢' => '&qopf;',
1531
      '𝕣' => '&ropf;',
1532
      '𝕤' => '&sopf;',
1533
      '𝕥' => '&topf;',
1534
      '𝕦' => '&uopf;',
1535
      '𝕧' => '&vopf;',
1536
      '𝕨' => '&wopf;',
1537
      '𝕩' => '&xopf;',
1538
      '𝕪' => '&yopf;',
1539
      '𝕫' => '&zopf;',
1540
  );
1541
1542
  /**
1543
   * List of never allowed regex replacements.
1544
   *
1545
   * @var  array
1546
   */
1547
  private static $_never_allowed_regex = array(
1548
    // default javascript
1549
    'javascript\s*:',
1550
    // default javascript (document + window)
1551
    '(document|(document\.)?window)\.(location|on\w*)',
1552
    // Java: jar-protocol is an XSS hazard
1553
    'jar\s*:',
1554
    // Mac (will not run the script, but open it in AppleScript Editor)
1555
    'applescript\s*:',
1556
    // IE: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#VBscript_in_an_image
1557
    'vbscript\s*:',
1558
    // IE, surprise!
1559
    'wscript\s*:',
1560
    // IE
1561
    'jscript\s*:',
1562
    // IE: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#VBscript_in_an_image
1563
    'vbs\s*:',
1564
    // https://html5sec.org/#behavior
1565
    'behavior\s*:',
1566
    // "Redirect 30x" directives (e.g. "Redirect 302")
1567
    'Redirect\s+30\d',
1568
    // data-attribute + base64
1569
    "([\"'])?data\s*:[^\\1]*?base64[^\\1]*?,[^\\1]*?\\1?",
1570
    // remove Netscape 4 JS entities
1571
    '&\s*\{[^}]*(\}\s*;?|$)',
1572
    // old IE, old Netscape
1573
    'expression\s*(\(|&\#40;)',
1574
    // old Netscape
1575
    'mocha\s*:',
1576
    // old Netscape
1577
    'livescript\s*:',
1578
    // default view source
1579
    'view-source\s*:',
1580
  );
1581
1582
  /**
1583
   * List of never allowed strings, afterwards.
1584
   *
1585
   * @var array
1586
   */
1587
  private static $_never_allowed_str_afterwards = array(
1588
      'FSCommand',
1589
      'onAbort',
1590
      'onActivate',
1591
      'onAttribute',
1592
      'onAfterPrint',
1593
      'onAfterScriptExecute',
1594
      'onAfterUpdate',
1595
      'onAnimationEnd',
1596
      'onAnimationIteration',
1597
      'onAnimationStart',
1598
      'onAriaRequest',
1599
      'onAutoComplete',
1600
      'onAutoCompleteError',
1601
      'onBeforeActivate',
1602
      'onBeforeCopy',
1603
      'onBeforeCut',
1604
      'onBeforeDeactivate',
1605
      'onBeforeEditFocus',
1606
      'onBeforePaste',
1607
      'onBeforePrint',
1608
      'onBeforeScriptExecute',
1609
      'onBeforeUnload',
1610
      'onBeforeUpdate',
1611
      'onBegin',
1612
      'onBlur',
1613
      'onBounce',
1614
      'onCancel',
1615
      'onCanPlay',
1616
      'onCanPlayThrough',
1617
      'onCellChange',
1618
      'onChange',
1619
      'onClick',
1620
      'onClose',
1621
      'onCommand',
1622
      'onCompassNeedsCalibration',
1623
      'onContextMenu',
1624
      'onControlSelect',
1625
      'onCopy',
1626
      'onCueChange',
1627
      'onCut',
1628
      'onDataAvailable',
1629
      'onDataSetChanged',
1630
      'onDataSetComplete',
1631
      'onDblClick',
1632
      'onDeactivate',
1633
      'onDeviceLight',
1634
      'onDeviceMotion',
1635
      'onDeviceOrientation',
1636
      'onDeviceProximity',
1637
      'onDrag',
1638
      'onDragDrop',
1639
      'onDragEnd',
1640
      'onDragEnter',
1641
      'onDragLeave',
1642
      'onDragOver',
1643
      'onDragStart',
1644
      'onDrop',
1645
      'onDurationChange',
1646
      'onEmptied',
1647
      'onEnd',
1648
      'onEnded',
1649
      'onError',
1650
      'onErrorUpdate',
1651
      'onExit',
1652
      'onFilterChange',
1653
      'onFinish',
1654
      'onFocus',
1655
      'onFocusIn',
1656
      'onFocusOut',
1657
      'onFormChange',
1658
      'onFormInput',
1659
      'onFullScreenChange',
1660
      'onFullScreenError',
1661
      'onGotPointerCapture',
1662
      'onHashChange',
1663
      'onHelp',
1664
      'onInput',
1665
      'onInvalid',
1666
      'onKeyDown',
1667
      'onKeyPress',
1668
      'onKeyUp',
1669
      'onLanguageChange',
1670
      'onLayoutComplete',
1671
      'onLoad',
1672
      'onLoadedData',
1673
      'onLoadedMetaData',
1674
      'onLoadStart',
1675
      'onLoseCapture',
1676
      'onLostPointerCapture',
1677
      'onMediaComplete',
1678
      'onMediaError',
1679
      'onMessage',
1680
      'onMouseDown',
1681
      'onMouseEnter',
1682
      'onMouseLeave',
1683
      'onMouseMove',
1684
      'onMouseOut',
1685
      'onMouseOver',
1686
      'onMouseUp',
1687
      'onMouseWheel',
1688
      'onMove',
1689
      'onMoveEnd',
1690
      'onMoveStart',
1691
      'onMozFullScreenChange',
1692
      'onMozFullScreenError',
1693
      'onMozPointerLockChange',
1694
      'onMozPointerLockError',
1695
      'onMsContentZoom',
1696
      'onMsFullScreenChange',
1697
      'onMsFullScreenError',
1698
      'onMsGestureChange',
1699
      'onMsGestureDoubleTap',
1700
      'onMsGestureEnd',
1701
      'onMsGestureHold',
1702
      'onMsGestureStart',
1703
      'onMsGestureTap',
1704
      'onMsGotPointerCapture',
1705
      'onMsInertiaStart',
1706
      'onMsLostPointerCapture',
1707
      'onMsManipulationStateChanged',
1708
      'onMsPointerCancel',
1709
      'onMsPointerDown',
1710
      'onMsPointerEnter',
1711
      'onMsPointerLeave',
1712
      'onMsPointerMove',
1713
      'onMsPointerOut',
1714
      'onMsPointerOver',
1715
      'onMsPointerUp',
1716
      'onMsSiteModeJumpListItemRemoved',
1717
      'onMsThumbnailClick',
1718
      'onOffline',
1719
      'onOnline',
1720
      'onOutOfSync',
1721
      'onPage',
1722
      'onPageHide',
1723
      'onPageShow',
1724
      'onPaste',
1725
      'onPause',
1726
      'onPlay',
1727
      'onPlaying',
1728
      'onPointerCancel',
1729
      'onPointerDown',
1730
      'onPointerEnter',
1731
      'onPointerLeave',
1732
      'onPointerLockChange',
1733
      'onPointerLockError',
1734
      'onPointerMove',
1735
      'onPointerOut',
1736
      'onPointerOver',
1737
      'onPointerUp',
1738
      'onPopState',
1739
      'onProgress',
1740
      'onPropertyChange',
1741
      'onRateChange',
1742
      'onReadyStateChange',
1743
      'onReceived',
1744
      'onRepeat',
1745
      'onReset',
1746
      'onResize',
1747
      'onResizeEnd',
1748
      'onResizeStart',
1749
      'onResume',
1750
      'onReverse',
1751
      'onRowDelete',
1752
      'onRowEnter',
1753
      'onRowExit',
1754
      'onRowInserted',
1755
      'onRowsDelete',
1756
      'onRowsEnter',
1757
      'onRowsExit',
1758
      'onRowsInserted',
1759
      'onScroll',
1760
      'onSearch',
1761
      'onSeek',
1762
      'onSeeked',
1763
      'onSeeking',
1764
      'onSelect',
1765
      'onSelectionChange',
1766
      'onSelectStart',
1767
      'onStalled',
1768
      'onStorage',
1769
      'onStorageCommit',
1770
      'onStart',
1771
      'onStop',
1772
      'onShow',
1773
      'onSyncRestored',
1774
      'onSubmit',
1775
      'onSuspend',
1776
      'onSynchRestored',
1777
      'onTimeError',
1778
      'onTimeUpdate',
1779
      'onTrackChange',
1780
      'onTransitionEnd',
1781
      'onToggle',
1782
      'onUnload',
1783
      'onURLFlip',
1784
      'onUserProximity',
1785
      'onVolumeChange',
1786
      'onWaiting',
1787
      'onWebKitAnimationEnd',
1788
      'onWebKitAnimationIteration',
1789
      'onWebKitAnimationStart',
1790
      'onWebKitFullScreenChange',
1791
      'onWebKitFullScreenError',
1792
      'onWebKitTransitionEnd',
1793
      'onWheel',
1794
      'seekSegmentTime',
1795
      'userid',
1796
      'datasrc',
1797
      'datafld',
1798
      'dataformatas',
1799
      'ev:handler',
1800
      'ev:event',
1801
      '0;url',
1802
  );
1803
1804
  /**
1805
   * https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#Event_Handlers
1806
   *
1807
   * @var array
1808
   */
1809
  private $_evil_attributes = array(
1810
      'on\w*',
1811
      'style',
1812
      'xmlns',
1813
      'formaction',
1814
      'form',
1815
      'xlink:href',
1816
      'seekSegmentTime',
1817
      'FSCommand',
1818
      'eval',
1819
  );
1820
1821
  /**
1822
   * XSS Hash - random Hash for protecting URLs.
1823
   *
1824
   * @var  string
1825
   */
1826
  private $_xss_hash;
1827
1828
  /**
1829
   * The replacement-string for not allowed strings.
1830
   *
1831
   * @var string
1832
   */
1833
  private $_replacement = '';
1834
1835
  /**
1836
   * List of never allowed strings.
1837
   *
1838
   * @var  array
1839
   */
1840
  private $_never_allowed_str = array();
1841
1842
  /**
1843
   * If your DB (MySQL) encoding is "utf8" and not "utf8mb4", then 4-byte UTF-8
   * characters cannot be stored, and someone can use them to create stored XSS attacks.
1845
   *
1846
   * @var bool
1847
   */
1848
  private $_stripe_4byte_chars = false;
1849
1850
  /**
1851
   * @var bool|null
1852
   */
1853
  private $xss_found = null;
1854
1855
  /**
1856
   * __construct()
1857
   */
1858
  public function __construct()
1859
  {
1860
    $this->_initNeverAllowedStr();
1861
  }
1862
1863
  /**
1864
   * Compact exploded words.
1865
   *
1866
   * <p>
1867
   * <br />
1868
   * INFO: Callback method for xss_clean() to remove whitespace from things like 'j a v a s c r i p t'.
1869
   * </p>
1870
   *
1871
   * @param  array $matches
1872
   *
1873
   * @return  string
1874
   */
1875
  private function _compact_exploded_words_callback($matches)
1876
  {
1877
    return preg_replace('/(?:\s+|"|\042|\'|\047|\+)*+/', '', $matches[1]) . $matches[2];
1878
  }
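
  // Illustrative sketch only (not part of the original class): the regex that
  // invokes this callback is not shown in this section, so the shape of
  // $matches below is an assumption. $matches[1] is taken to hold the exploded
  // word and $matches[2] the character that follows it.
  //
  //   $matches = array('j a v a s c r i p t:', 'j a v a s c r i p t', ':');
  //   $word    = preg_replace('/(?:\s+|"|\042|\'|\047|\+)*+/', '', $matches[1]);
  //   // $word . $matches[2] is now 'javascript:'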
1879
1880
  /**
1881
   * HTML-Entity decode callback.
1882
   *
1883
   * @param array $match
1884
   *
1885
   * @return string
1886
   */
1887
  private function _decode_entity($match)
1888
  {
1889
    // init
1890
    $this->_xss_hash();
1891
1892
    $match = $match[0];
1893
1894
    // protect GET variables in URLs
1895
    $match = preg_replace('|\?([a-z\_0-9\-]+)\=([a-z\_0-9\-/]+)|i', $this->_xss_hash . '::GET_FIRST' . '\\1=\\2', $match);
1896
    $match = preg_replace('|\&([a-z\_0-9\-]+)\=([a-z\_0-9\-/]+)|i', $this->_xss_hash . '::GET_NEXT' . '\\1=\\2', $match);
1897
1898
    // un-protect URL GET vars
1899
    return str_replace(
1900
        array(
1901
            $this->_xss_hash . '::GET_FIRST',
1902
            $this->_xss_hash . '::GET_NEXT',
1903
        ),
1904
        array(
1905
            '?',
1906
            '&',
1907
        ),
1908
        $this->_entity_decode($match)
1909
    );
1910
  }
1911
1912
  /**
1913
   * @param string $str
1914
   *
1915
   * @return string
1916
   */
1917
  private function _do($str)
1918
  {
1919
    $str = (string)$str;
1920
    $strInt = (int)$str;
1921
    $strFloat = (float)$str;
1922
    if (
1923
        !$str
1924
        ||
1925
        "$strInt" == $str
1926
        ||
1927
        "$strFloat" == $str
1928
    ) {
1929
1930
      // no xss found
1931
      if ($this->xss_found !== true) {
1932
        $this->xss_found = false;
1933
      }
1934
1935
      return $str;
1936
    }
1937
1938
    // remove all non-UTF-8 characters and
    // NULL characters (ignored by some browsers)
1941
    $str = UTF8::clean($str, true, true, false);
1942
1943
    // decode UTF-7 characters
1944
    $str = $this->_repack_utf7($str);
1945
1946
    // decode the string
1947
    $str = $this->_decode_string($str);
1948
1949
    // remove all 4-byte characters (U+10000 and above) if needed
1950
    if ($this->_stripe_4byte_chars === true) {
1951
      $str = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $str);
1952
    }
1953
1954
    // backup the string (for later comparison)
1955
    $str_backup = $str;
1956
1957
    // remove strings that are never allowed
1958
    $str = $this->_do_never_allowed($str);
1959
1960
    // corrects words before the browser will do it
1961
    $str = $this->_compact_exploded_javascript($str);
1962
1963
    // remove disallowed javascript calls in links, images etc.
1964
    $str = $this->_remove_disallowed_javascript($str);
1965
1966
    // remove evil attributes such as style, onclick and xmlns
1967
    $str = $this->_remove_evil_attributes($str);
1968
1969
    // sanitize naughty HTML elements
1970
    $str = $this->_sanitize_naughty_html($str);
1971
1972
    // sanitize naughty JavaScript elements
1973
    $str = $this->_sanitize_naughty_javascript($str);
1974
1975
    // final clean up
1976
    //
1977
    // -> This adds a bit of extra precaution in case something got through the above filters.
1978
    $str = $this->_do_never_allowed_afterwards($str);
1979
1980
    // check for xss
1981
    if ($this->xss_found !== true) {
1982
      $this->xss_found = !($str_backup === $str);
1983
    }
1984
1985
    return $str;
1986
  }
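
  // Illustrative sketch only (not part of the original class): two of the
  // steps above can be checked in isolation. The variable names are
  // hypothetical.
  //
  //   // numeric short-circuit: purely numeric input skips the filter chain
  //   $str = '42';
  //   var_dump(((string)(int)$str) == $str);   // bool(true)
  //
  //   // 4-byte stripping (only applied when $_stripe_4byte_chars is true):
  //   // U+1F600 is encoded in UTF-8 as the bytes F0 9F 98 80
  //   $stripped = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', "a\xF0\x9F\x98\x80b");
  //   // $stripped is now 'ab'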
1987
1988
  /**
1989
   * Remove never allowed strings.
1990
   *
1991
   * @param string $str
1992
   *
1993
   * @return string
1994
   */
1995
  private function _do_never_allowed($str)
1996
  {
1997
    static $NEVER_ALLOWED_CACHE = array();
1998
    $NEVER_ALLOWED_CACHE['keys'] = null;
1999
    $NEVER_ALLOWED_CACHE['regex'] = null;
2000
2001
    if (null === $NEVER_ALLOWED_CACHE['keys']) {
2002
      $NEVER_ALLOWED_CACHE['keys'] = array_keys($this->_never_allowed_str);
2003
    }
2004
    $str = str_ireplace($NEVER_ALLOWED_CACHE['keys'], $this->_never_allowed_str, $str);
2005
2006
    if (null === $NEVER_ALLOWED_CACHE['regex']) {
2007
      $NEVER_ALLOWED_CACHE['regex'] = implode('|', self::$_never_allowed_regex);
2008
    }
2009
    $str = preg_replace('#' . $NEVER_ALLOWED_CACHE['regex'] . '#is', $this->_replacement, $str);
2010
2011
    return (string)$str;
2012
  }
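
  // Illustrative sketch only (not part of the original class): how the joined
  // pattern list behaves, using two of the entries from $_never_allowed_regex
  // and an empty replacement. The variable names are hypothetical.
  //
  //   $regex = implode('|', array('javascript\s*:', 'vbscript\s*:'));
  //   $clean = preg_replace('#' . $regex . '#is', '', '<a href="JaVaScRiPt :alert(1)">x</a>');
  //   // $clean is now '<a href="alert(1)">x</a>'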
2013
2014
  /**
2015
   * Remove never allowed strings, afterwards.
2016
   *
2017
   * <p>
2018
   * <br />
2019
   * INFO: also cleans up some strings when there is no HTML tag
2020
   * </p>
2021
   *
2022
   * @param string $str
2023
   *
2024
   * @return  string
2025
   */
2026
  private function _do_never_allowed_afterwards($str)
2027
  {
2028
    static $NEVER_ALLOWED_STR_AFTERWARDS_CACHE;
2029
2030
    if (null === $NEVER_ALLOWED_STR_AFTERWARDS_CACHE) {
2031
      foreach (self::$_never_allowed_str_afterwards as &$neverAllowedStr) {
2032
        $neverAllowedStr .= '.*=';
2033
      }
2034
2035
      $NEVER_ALLOWED_STR_AFTERWARDS_CACHE = implode('|', self::$_never_allowed_str_afterwards);
2036
    }
2037
2038
    $str = preg_replace('#' . $NEVER_ALLOWED_STR_AFTERWARDS_CACHE . '#isU', $this->_replacement, $str);
2039
2040
    return (string)$str;
2041
  }
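
  // Illustrative sketch only (not part of the original class): each entry gets
  // '.*=' appended and the "U" modifier makes '.*' lazy, so a match runs from
  // the keyword to the next '='. Because the match is case-insensitive and not
  // tied to a tag, ordinary text can be affected as well. The variable names
  // are hypothetical.
  //
  //   $regex = implode('|', array('onClick.*=', 'userid.*='));
  //   $a = preg_replace('#' . $regex . '#isU', '', '<b onclick="x()">hi</b>');
  //   // $a is now '<b "x()">hi</b>'
  //   $b = preg_replace('#' . $regex . '#isU', '', 'userid=5');
  //   // $b is now '5'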
2042
2043
  /**
2044
   * Entity-decoding.
2045
   *
2046
   * @param string $str
2047
   *
2048
   * @return string
2049
   */
2050
  private function _entity_decode($str)
2051
  {
2052
    static $HTML_ENTITIES_CACHE;
2053
2054
    /** @noinspection UsageOfSilenceOperatorInspection */
2055
    /** @noinspection PhpUsageOfSilenceOperatorInspection */
2056
    // HHVM doesn't support "ENT_DISALLOWED" && "ENT_SUBSTITUTE"
2057
    $flags = Bootup::is_php('5.4') ?
2058
        ENT_QUOTES | ENT_HTML5 | @ENT_DISALLOWED | @ENT_SUBSTITUTE :
2059
        ENT_QUOTES;
2060
2061
    // decode
2062
    if (strpos($str, $this->_xss_hash) !== false) {
2063
      $str = UTF8::html_entity_decode($str, $flags);
2064
    } else {
2065
      $str = UTF8::rawurldecode($str);
2066
    }
2067
2068
    // decode again, e.g. for HHVM, PHP 5.3, misconfigured applications ...
2069
    if (preg_match_all('/&[A-Za-z]{2,}[;]{0}/', $str, $matches)) {
2070
2071
      if (null === $HTML_ENTITIES_CACHE) {
2072
2073
        // links:
2074
        // - http://dev.w3.org/html5/html-author/charref
2075
        // - http://www.w3schools.com/charsets/ref_html_entities_n.asp
2076
        $entitiesSecurity = array(
2077
            '&#x00000;'          => '',
2078
            '&#0;'               => '',
2079
            '&#x00001;'          => '',
2080
            '&#1;'               => '',
2081
            '&nvgt;'             => '',
2082
            '&#61253;'           => '',
2083
            '&#x0EF45;'          => '',
2084
            '&shy;'              => '',
2085
            '&#x000AD;'          => '',
2086
            '&#173;'             => '',
2087
            '&colon;'            => ':',
2088
            '&#x0003A;'          => ':',
2089
            '&#58;'              => ':',
2090
            '&lpar;'             => '(',
2091
            '&#x00028;'          => '(',
2092
            '&#40;'              => '(',
2093
            '&rpar;'             => ')',
2094
            '&#x00029;'          => ')',
2095
            '&#41;'              => ')',
2096
            '&quest;'            => '?',
2097
            '&#x0003F;'          => '?',
2098
            '&#63;'              => '?',
2099
            '&sol;'              => '/',
2100
            '&#x0002F;'          => '/',
2101
            '&#47;'              => '/',
2102
            '&apos;'             => '\'',
2103
            '&#x00027;'          => '\'',
2104
            '&#039;'             => '\'',
2105
            '&#39;'              => '\'',
2106
            '&#x27;'             => '\'',
2107
            '&bsol;'             => '\'',
2108
            '&#x0005C;'          => '\\',
2109
            '&#92;'              => '\\',
2110
            '&comma;'            => ',',
2111
            '&#x0002C;'          => ',',
2112
            '&#44;'              => ',',
2113
            '&period;'           => '.',
2114
            '&#x0002E;'          => '.',
2115
            '&quot;'             => '"',
2116
            '&QUOT;'             => '"',
2117
            '&#x00022;'          => '"',
2118
            '&#34;'              => '"',
2119
            '&grave;'            => '`',
2120
            '&DiacriticalGrave;' => '`',
2121
            '&#x00060;'          => '`',
2122
            '&#96;'              => '`',
2123
            '&#46;'              => '.',
2124
            '&equals;'           => '=',
2125
            '&#x0003D;'          => '=',
2126
            '&#61;'              => '=',
2127
            '&newline;'          => "\n",
2128
            '&#x0000A;'          => "\n",
2129
            '&#10;'              => "\n",
2130
            '&tab;'              => "\t",
2131
            '&#x00009;'          => "\t",
2132
            '&#9;'               => "\t",
2133
        );
2134
2135
        $HTML_ENTITIES_CACHE = array_merge(
2136
            $entitiesSecurity,
2137
            array_flip(get_html_translation_table(HTML_ENTITIES, $flags)),
2138
            array_flip(self::$entitiesFallback)
2139
        );
2140
      }
2141
2142
      $replace = array();
2143
      foreach ($matches[0] as $match) {
2144
        $match .= ';';
2145
        if (isset($HTML_ENTITIES_CACHE[$match])) {
2146
          $replace[$match] = $HTML_ENTITIES_CACHE[$match];
2147
        }
2148
      }
2149
2150
      if (count($replace) > 0) {
2151
        $str = str_replace(array_keys($replace), array_values($replace), $str);
2152
      }
2153
    }
2154
2155
    return $str;
2156
  }

  /**
   * Filters tag attributes for consistency and safety.
   *
   * @param string $str
   *
   * @return string
   */
  private function _filter_attributes($str)
  {
    if ($str === '') {
      return '';
    }

    $out = '';
    if (
        preg_match_all('#\s*[A-Za-z\-]+\s*=\s*("|\042|\'|\047)([^\\1]*?)\\1#', $str, $matches)
        ||
        (
            $this->_replacement
            &&
            preg_match_all('#\s*[a-zA-Z\-]+\s*=' . preg_quote($this->_replacement, '#') . '$#', $str, $matches)
        )
    ) {
      foreach ($matches[0] as $match) {
        $out .= $match;
      }
    }

    return $out;
  }

  /**
   * Initialize "$this->_never_allowed_str".
   */
  private function _initNeverAllowedStr()
  {
    $this->_never_allowed_str = array(
        'document.cookie' => $this->_replacement,
        'document.write'  => $this->_replacement,
        '.parentNode'     => $this->_replacement,
        '.innerHTML'      => $this->_replacement,
        '.appendChild'    => $this->_replacement,
        '-moz-binding'    => $this->_replacement,
        '<!--'            => '&lt;!--',
        '-->'             => '--&gt;',
        '<?'              => '&lt;?',
        '?>'              => '?&gt;',
        '<![CDATA['       => '&lt;![CDATA[',
        '<!ENTITY'        => '&lt;!ENTITY',
        '<!DOCTYPE'       => '&lt;!DOCTYPE',
        '<!ATTLIST'       => '&lt;!ATTLIST',
        '<comment>'       => '&lt;comment&gt;',
    );
  }
2212
2213
  /**
2214
   * Callback method for xss_clean() to sanitize links.
2215
   *
2216
   * <p>
2217
   * <br />
2218
   * INFO: This limits the PCRE backtracks, making it more performance friendly
2219
   * and prevents PREG_BACKTRACK_LIMIT_ERROR from being triggered in
2220
   * PHP 5.2+ on link-heavy strings.
2221
   * </p>
2222
   *
2223
   * @param array $match
2224
   *
2225
   * @return string
2226
   */
2227
  private function _js_link_removal_callback($match)
2228
  {
2229
    return $this->_js_removal_calback($match, 'href');
2230
  }

  /**
   * Callback method for xss_clean() to sanitize tags.
   *
   * <p>
   * <br />
   * INFO: This limits PCRE backtracking, which improves performance
   * and prevents PREG_BACKTRACK_LIMIT_ERROR from being triggered in
   * PHP 5.2+ on tag-heavy strings.
   * </p>
   *
   * @param array  $match  <p>Regex matches: [0] is the whole tag, [1] its attribute string.</p>
   * @param string $search <p>Attribute name to inspect, e.g. "href" or "src".</p>
   *
   * @return string
   */
  private function _js_removal_calback($match, $search)
  {
    if (!$match[0]) {
      return '';
    }

    // init
    $replacer = $this->_filter_attributes(str_replace(array('<', '>',), '', $match[1]));
    $pattern = '#' . $search . '=.*(?:\(.+([^\)]*?)(?:\)|$)|javascript:|view-source:|livescript:|wscript:|vbscript:|mocha:|charset=|window\.|document\.|\.cookie|<script|d\s*a\s*t\s*a\s*:)#is';

    $matchInner = array();
    preg_match($pattern, $match[1], $matchInner);
    if (count($matchInner) > 0) {
      $replacer = (string)preg_replace(
          $pattern,
          $search . '="' . $this->_replacement . '"',
          $replacer
      );
    }

    return str_ireplace($match[1], $replacer, $match[0]);
  }
2269
2270
  /**
2271
   * Callback method for xss_clean() to sanitize image tags.
2272
   *
2273
   * <p>
2274
   * <br />
2275
   * INFO: This limits the PCRE backtracks, making it more performance friendly
2276
   * and prevents PREG_BACKTRACK_LIMIT_ERROR from being triggered in
2277
   * PHP 5.2+ on image tag heavy strings.
2278
   * </p>
2279
   *
2280
   * @param array $match
2281
   *
2282
   * @return string
2283
   */
2284
  private function _js_src_removal_callback($match)
2285
  {
2286
    return $this->_js_removal_calback($match, 'src');
2287
  }

  /**
   * Sanitize naughty HTML.
   *
   * <p>
   * <br />
   * Callback method for AntiXSS->_sanitize_naughty_html() to remove naughty HTML elements.
   * </p>
   *
   * @param array $matches
   *
   * @return string
   */
  private function _sanitize_naughty_html_callback($matches)
  {
    return '&lt;' . $matches[1] . $matches[2] . $matches[3] // encode opening brace
           // encode captured opening or closing brace to prevent recursive vectors:
           . str_replace(
               array(
                   '>',
                   '<',
               ),
               array(
                   '&gt;',
                   '&lt;',
               ),
               $matches[4]
           );
  }

  /**
   * Add some strings to the "_evil_attributes"-array.
   *
   * @param array $strings
   *
   * @return $this
   */
  public function addEvilAttributes(array $strings)
  {
    $this->_evil_attributes = array_merge($strings, $this->_evil_attributes);

    return $this;
  }

  /**
   * Compact any exploded words.
   *
   * <p>
   * <br />
   * INFO: This corrects words like:  j a v a s c r i p t
   * <br />
   * These words are compacted back to their correct state.
   * </p>
   *
   * @param string $str
   *
   * @return string
   */
  private function _compact_exploded_javascript($str)
  {
    static $WORDS_CACHE;

    $words = array(
        'javascript',
        'expression',
        'view-source',
        'vbscript',
        'jscript',
        'wscript',
        'vbs',
        'script',
        'base64',
        'applet',
        'alert',
        'document',
        'write',
        'cookie',
        'window',
        'confirm',
        'prompt',
        'eval',
    );

    foreach ($words as $word) {

      if (!isset($WORDS_CACHE[$word])) {
        // allow whitespace, "+" and quotes between every character of the word
        $regex = '(?:\s|\+|"|\042|\'|\047)*';
        $word = $WORDS_CACHE[$word] = substr(
            chunk_split($word, 1, $regex),
            0,
            -strlen($regex)
        );
      } else {
        $word = $WORDS_CACHE[$word];
      }

      // We only want to do this when the word is followed by a non-word character.
      // That way valid stuff like "dealer to" does not become "dealerto".
      $str = preg_replace_callback(
          '#(' . $word . ')(\W)#is',
          array(
              $this,
              '_compact_exploded_words_callback',
          ),
          $str
      );
    }

    return (string)$str;
  }

  /**
   * Decode the html-tags via "UTF8::html_entity_decode()" or the string via "UTF8::rawurldecode()".
   *
   * @param string $str
   *
   * @return string
   */
  private function _decode_string($str)
  {
    // init
    $regExForHtmlTags = '/<\w+.*+/si';

    if (preg_match($regExForHtmlTags, $str, $matches) === 1) {
      $str = preg_replace_callback(
          $regExForHtmlTags,
          array(
              $this,
              '_decode_entity',
          ),
          $str
      );
    } else {
      $str = UTF8::rawurldecode($str);
    }

    return $str;
  }

  /**
   * Check if the "AntiXSS->xss_clean()"-method found an XSS attack in the last run.
   *
   * @return bool|null <p>Will return null if "xss_clean()" has not been run at all.</p>
   */
  public function isXssFound()
  {
    return $this->xss_found;
  }

  /**
   * Remove some strings from the "_evil_attributes"-array.
   *
   * <p>
   * <br />
   * WARNING: Use this method only if you have a really good reason.
   * </p>
   *
   * @param array $strings
   *
   * @return $this
   */
  public function removeEvilAttributes(array $strings)
  {
    // Keep every evil attribute that is not listed in $strings.
    // (Diffing the intersection against the full list would always yield an empty array.)
    $this->_evil_attributes = array_diff($this->_evil_attributes, $strings);

    return $this;
  }

  /**
   * Remove disallowed JavaScript in links or img tags.
   *
   * <p>
   * <br />
   * We used to do some version comparisons and use stripos(),
   * but that is very slow compared to these simplified non-capturing
   * preg_match() calls, especially if the pattern exists in the string.
   * </p>
   *
   * <p>
   * <br />
   * Note: It was reported that not only space characters, but all in
   * the following pattern can be parsed as separators between a tag name
   * and its attributes: [\d\s"\'`;,\/\=\(\x00\x0B\x09\x0C]
   * ... however, UTF8::clean() above already strips the
   * hex-encoded ones, so we'll skip them below.
   * </p>
   *
   * @param string $str
   *
   * @return string
   */
  private function _remove_disallowed_javascript($str)
  {
    // repeat until a full pass no longer changes the string
    do {
      $original = $str;

      if (stripos($str, '<a') !== false) {
        $str = preg_replace_callback(
            '#<a[^a-z0-9>]+([^>]*?)(?:>|$)#i',
            array(
                $this,
                '_js_link_removal_callback',
            ),
            $str
        );
      }

      if (stripos($str, '<img') !== false) {
        $str = preg_replace_callback(
            '#<img[^a-z0-9]+([^>]*?)(?:\s?/?>|$)#i',
            array(
                $this,
                '_js_src_removal_callback',
            ),
            $str
        );
      }

      if (stripos($str, '<audio') !== false) {
        $str = preg_replace_callback(
            '#<audio[^a-z0-9]+([^>]*?)(?:\s?/?>|$)#i',
            array(
                $this,
                '_js_src_removal_callback',
            ),
            $str
        );
      }

      if (stripos($str, '<video') !== false) {
        $str = preg_replace_callback(
            '#<video[^a-z0-9]+([^>]*?)(?:\s?/?>|$)#i',
            array(
                $this,
                '_js_src_removal_callback',
            ),
            $str
        );
      }

      if (stripos($str, '<source') !== false) {
        $str = preg_replace_callback(
            '#<source[^a-z0-9]+([^>]*?)(?:\s?/?>|$)#i',
            array(
                $this,
                '_js_src_removal_callback',
            ),
            $str
        );
      }

      if (stripos($str, 'script') !== false) {
        // US-ASCII: ¼ === <
        $str = preg_replace('#(?:¼|<)/*(?:script).*(?:¾|>)#isuU', $this->_replacement, $str);
      }
    } while ($original !== $str);

    return (string)$str;
  }

  /**
   * Remove Evil HTML Attributes (like event handlers and style).
   *
   * It removes the evil attribute and either:
   *
   *  - Everything up until a space. For example, everything between the pipes:
   *
   * <code>
   *   <a |style=document.write('hello');alert('world');| class=link>
   * </code>
   *
   *  - Everything inside the quotes. For example, everything between the pipes:
   *
   * <code>
   *   <a |style="document.write('hello'); alert('world');"| class="link">
   * </code>
   *
   * @param string $str <p>The string to check.</p>
   *
   * @return string <p>The string with the evil attributes removed.</p>
   */
  private function _remove_evil_attributes($str)
  {
    $evil_attributes_string = implode('|', $this->_evil_attributes);

    // replace style-attribute, first (if needed)
    if (in_array('style', $this->_evil_attributes, true)) {
      do {
        $count = $temp_count = 0;

        // "(?:...)" is a non-capturing group; the original "(:?...)" was a typo that matched the same text
        $str = preg_replace('/(<[^>]+)(?<!\w)(style="(?:[^"]*?)"|style=\'(?:[^\']*?)\')/i', '$1' . $this->_replacement, $str, -1, $temp_count);
        $count += $temp_count;

      } while ($count);
    }

    do {
      $count = $temp_count = 0;

      // find occurrences of illegal attribute strings with and without quotes (042 ["] and 047 ['] are octal quotes)
      $str = preg_replace('/(<[^>]+)(?<!\w)(' . $evil_attributes_string . ')\s*=\s*(?:(?:"|\042|\'|\047)(?:[^\\2]*?)(?:\\2)|[^\s>]*)/is', '$1' . $this->_replacement, $str, -1, $temp_count);
      $count += $temp_count;

    } while ($count);

    return (string)$str;
  }

  /**
   * UTF-7 decoding function.
   *
   * @param string $str <p>HTML document whose UTF-7 encoded ASCII parts are recoded back to ASCII.</p>
   *
   * @return string
   */
  private function _repack_utf7($str)
  {
    return preg_replace_callback(
        '#\+([0-9a-zA-Z/]+)\-#',
        array($this, '_repack_utf7_callback'),
        $str
    );
  }

  /**
   * Additional UTF-7 decoding function.
   *
   * @param array $str <p>Regex matches; [1] holds the base64 part of a UTF-7 sequence to recode back to ASCII.</p>
   *
   * @return string
   */
  private function _repack_utf7_callback($str)
  {
    $strTmp = base64_decode($str[1]);

    if ($strTmp === false) {
      // a preg callback must return a string, so hand back the unchanged match
      return $str[0];
    }

    $str = preg_replace_callback(
        '/^((?:\x00.)*?)((?:[^\x00].)+)/us',
        array($this, '_repack_utf7_callback_back'),
        $strTmp
    );

    return preg_replace('/\x00(.)/us', '$1', $str);
  }

  /**
   * Additional UTF-7 encoding function.
   *
   * @param array $str <p>Regex matches; [1] is kept as-is, [2] is re-encoded as a UTF-7 sequence.</p>
   *
   * @return string
   */
  private function _repack_utf7_callback_back($str)
  {
    return $str[1] . '+' . rtrim(base64_encode($str[2]), '=') . '-';
  }

  /**
   * Sanitize naughty HTML elements.
   *
   * <p>
   * <br />
   *
   * If a tag containing any of the words in the list
   * below is found, the tag gets converted to entities.
   *
   * <br /><br />
   *
   * So this: <blink>
   * <br />
   * Becomes: &lt;blink&gt;
   * </p>
   *
   * @param string $str
   *
   * @return string
   */
  private function _sanitize_naughty_html($str)
  {
    $naughty = 'alert|prompt|confirm|applet|audio|basefont|base|behavior|bgsound|blink|body|embed|expression|form|frameset|frame|head|html|ilayer|iframe|input|button|select|isindex|layer|link|meta|keygen|object|plaintext|style|script|textarea|title|math|video|source|svg|xml|xss|eval';
    $str = preg_replace_callback(
        '#<(/*\s*)(' . $naughty . ')([^><]*)([><]*)#i',
        array(
            $this,
            '_sanitize_naughty_html_callback',
        ),
        $str
    );

    return (string)$str;
  }

  /**
   * Sanitize naughty scripting elements.
   *
   * <p>
   * <br />
   *
   * Similar to above, only instead of looking for
   * tags it looks for PHP and JavaScript commands
   * that are disallowed. Rather than removing the
   * code, it simply converts the parentheses to entities,
   * rendering the code un-executable.
   *
   * <br /><br />
   *
   * For example:  <pre>eval('some code')</pre>
   * <br />
   * Becomes:      <pre>eval&#40;'some code'&#41;</pre>
   * </p>
   *
   * @param string $str
   *
   * @return string
   */
  private function _sanitize_naughty_javascript($str)
  {
    $str = preg_replace(
        '#(alert|eval|prompt|confirm|cmd|passthru|exec|expression|system|fopen|fsockopen|file|file_get_contents|readfile|unlink)(\s*)\((.*)\)#siU',
        '\\1\\2&#40;\\3&#41;',
        $str
    );

    return (string)$str;
  }

  /**
   * Set the replacement-string for not allowed strings.
   *
   * @param string $string
   *
   * @return $this
   */
  public function setReplacement($string)
  {
    $this->_replacement = (string)$string;

    $this->_initNeverAllowedStr();

    return $this;
  }

  /**
   * Set the option to strip 4-byte chars.
   *
   * <p>
   * <br />
   * INFO: use it if your DB (MySQL) can't use "utf8mb4" -> preventing stored XSS-attacks
   * </p>
   *
   * @param bool $bool
   *
   * @return $this
   */
  public function setStripe4byteChars($bool)
  {
    $this->_stripe_4byte_chars = (bool)$bool;

    return $this;
  }

  /**
   * XSS Clean
   *
   * <p>
   * <br />
   * Sanitizes data so that "Cross Site Scripting" hacks can be
   * prevented. This method does a fair amount of work but
   * it is extremely thorough, designed to prevent even the
   * most obscure XSS attempts. But keep in mind that nothing
   * is ever 100% foolproof...
   * </p>
   *
   * <p>
   * <br />
   * <strong>Note:</strong> Should only be used to deal with data upon submission.
   *   It's not something that should be used for general
   *   runtime processing.
   * </p>
   *
   * @link http://channel.bitflux.ch/wiki/XSS_Prevention
   *    Based in part on some code and ideas from Bitflux.
   *
   * @link http://ha.ckers.org/xss.html
   *    To help develop this script I used this great list of
   *    vulnerabilities along with a few other hacks I've
   *    harvested from examining vulnerabilities in other programs.
   *
   * @param string|array $str <p>input data e.g. string or array</p>
   *
   * @return string|array <p>
   *                      string: will return a string, if the input is a string<br />
   *                      array: will return an array, if the input is an array<br />
   *                      </p>
   */
  public function xss_clean($str)
  {
    // reset
    $this->xss_found = null;

    // check for an array of strings
    if (is_array($str) === true) {
      // iterate by value: the element is overwritten via its key, so no reference is needed
      foreach ($str as $key => $value) {
        $str[$key] = $this->xss_clean($value);
      }

      return $str;
    }

    // process: repeat until a full pass no longer changes the string
    do {
      $old_str = $str;
      $str = $this->_do($str);
    } while ($old_str !== $str);

    return $str;
  }

  /**
   * Generates the XSS hash if needed and returns it.
   *
   * @return string <p>XSS hash</p>
   */
  private function _xss_hash()
  {
    if ($this->_xss_hash === null) {
      $rand = Bootup::get_random_bytes(16);

      if (!$rand) {
        $this->_xss_hash = md5(uniqid(mt_rand(), true));
      } else {
        $this->_xss_hash = bin2hex($rand);
      }
    }

    return 'voku::anti-xss::' . $this->_xss_hash;
  }

}
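
In the listing above, `xss_clean()` re-runs its sanitizing pass until the string stops changing. The following is a minimal standalone sketch of why such a fixed-point loop matters. It assumes a hypothetical single-pass filter, `strip_once()`, and a hypothetical helper, `clean_until_stable()`; both are illustrative stand-ins and not part of the AntiXSS class or its `_do()` method.

```php
<?php

/**
 * Hypothetical single pass: removes every literal "<script" found in one scan.
 * (Illustrative stand-in only; the real class does far more per pass.)
 */
function strip_once($str)
{
    return str_ireplace('<script', '', $str);
}

/**
 * Re-applies $pass until the output no longer changes (a fixed point).
 */
function clean_until_stable($str, $pass)
{
    do {
        $old = $str;
        $str = call_user_func($pass, $str);
    } while ($old !== $str);

    return $str;
}

// A nested payload: removing the inner "<script" re-assembles a new one.
$nested = '<scr<scriptipt>alert(1)';

$single = strip_once($nested);                       // one pass leaves "<script>alert(1)"
$stable = clean_until_stable($nested, 'strip_once'); // looping ends with ">alert(1)"
```

A single pass can leave behind exactly the string it was meant to remove, because the removal itself joins the surrounding fragments. Looping until the input and output are identical closes that gap, at the cost of at least one extra pass per call.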