Completed
Push — master ( dca322...47e653 )
by Julito
08:54
created

kses5   F

Complexity

Total Complexity 115

Size/Duplication

Total Lines 1056
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 304
dl 0
loc 1056
rs 2
c 0
b 0
f 0
wmc 115

How to fix   Complexity   

Complex Class

Complex classes like kses5 often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
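
As a rough illustration of Extract Class applied to this file: the protocol-related members ($allowed_protocols, AddProtocol(), RemoveProtocol(), DumpProtocols()) all share the "Protocol" suffix and could plausibly move into their own class. The sketch below is hypothetical. The class name KsesProtocolList and its method names are assumptions made for illustration and are not part of kses5, and the normalization only approximates what AddProtocol() does.

```php
<?php
// Hypothetical helper extracted from kses5. The class name and method names
// are illustrative assumptions, not part of the library.
class KsesProtocolList
{
    /** @var string[] Lower-case protocol names without a trailing ':', kept sorted */
    private $protocols = array();

    public function __construct(array $protocols = array('http', 'ftp', 'mailto'))
    {
        foreach ($protocols as $protocol) {
            $this->add($protocol);
        }
    }

    /** Adds one protocol; returns false for empty or non-string input. */
    public function add($protocol)
    {
        $protocol = $this->normalize($protocol);
        if ($protocol === '') {
            return false;
        }
        if (!in_array($protocol, $this->protocols, true)) {
            $this->protocols[] = $protocol;
            sort($this->protocols);
        }
        return true;
    }

    /** Removes one protocol if present; returns false for empty or non-string input. */
    public function remove($protocol)
    {
        $protocol = $this->normalize($protocol);
        // array_values() re-indexes the list after array_diff() leaves gaps in the keys.
        $this->protocols = array_values(array_diff($this->protocols, array($protocol)));
        return $protocol !== '';
    }

    public function contains($protocol)
    {
        $protocol = $this->normalize($protocol);
        return $protocol !== '' && in_array($protocol, $this->protocols, true);
    }

    public function toArray()
    {
        return $this->protocols;
    }

    /** Trims whitespace and trailing colons, then lower-cases the name. */
    private function normalize($protocol)
    {
        if (!is_string($protocol)) {
            return '';
        }
        return strtolower(rtrim(trim($protocol), ':'));
    }
}

$list = new KsesProtocolList();
$list->add('HTTPS:');
$list->remove('ftp');
print_r($list->toArray());
```

With a class like this in place, kses5 would hold one KsesProtocolList instance and delegate to it, which removes the duplicated validation in AddProtocol() and RemoveProtocol().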

While breaking up the class, it is a good idea to analyze how other classes use kses5, and based on these observations, apply Extract Interface, too.
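
One way to picture Extract Interface here: callers that only sanitize strings need nothing from kses5 beyond Parse(). The interface name HtmlSanitizer, the stand-in class StripAllTagsSanitizer and the function renderComment() below are all hypothetical names chosen for this sketch; kses5 itself does not declare an interface.

```php
<?php
// Hypothetical interface capturing the single method most callers would use.
interface HtmlSanitizer
{
    /**
     * @param string $string Untrusted markup
     * @return string Sanitized markup
     */
    public function Parse($string = "");
}

// Stand-in implementation so that the sketch runs on its own. In an actual
// refactoring, kses5 would declare "implements HtmlSanitizer" instead.
class StripAllTagsSanitizer implements HtmlSanitizer
{
    public function Parse($string = "")
    {
        return strip_tags($string);
    }
}

// Callers depend on the interface rather than on the concrete class.
function renderComment(HtmlSanitizer $sanitizer, $raw)
{
    return '<p>' . $sanitizer->Parse($raw) . '</p>';
}

echo renderComment(new StripAllTagsSanitizer(), 'Hello <b>world</b>');
```

Code written against the interface can then be tested with a trivial sanitizer and switched to kses5 (or a replacement) without changes.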

<?php

	/*
	 * ==========================================================================================
	 *
	 * This program is free software and open source software; you can redistribute
	 * it and/or modify it under the terms of the GNU General Public License as
	 * published by the Free Software Foundation; either version 2 of the License,
	 * or (at your option) any later version.
	 *
	 * This program is distributed in the hope that it will be useful, but WITHOUT
	 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
	 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
	 * more details.
	 *
	 * You should have received a copy of the GNU General Public License along
	 * with this program; if not, write to the Free Software Foundation, Inc.,
	 * 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA  or visit
	 * http://www.gnu.org/licenses/gpl.html
	 *
	 * ==========================================================================================
	 */

	/**
	*	Class file for PHP5 OOP version of kses
	*
	*	This is an updated version of kses to work with PHP5 that works under E_STRICT.
	*
	*	This version is a bit of a rewrite to match my own coding style and use some of the
	*	capabilities allowed in PHP5.  Since this was a significant rewrite, but it still
	*	maintains backward compatibility syntax-wise, the version number is now 1.0.0.  Any
	*	minor changes that do not break compatibility will be indicated in the second or third
	*	digits.  Anything that breaks compatibility will change the major version number.
	*
	*	PHP5 specific changes:
	*	+ Private methods are now in place
	*	+ __construct() is now used rather than the standard class name 'kses()'
	*	+ Kses will not load in any version less than PHP5
	*	Other modifications:
	*	+ PHPdoc style documentation has been added to the class.  See http://www.phpdoc.org/ for more info.
	*	+ Method names have been changed to reflect status as verbs
	*	+ One line methods have been folded into the code
	*	+ Some methods are now deprecated due to nomenclature style change.  See method documentation for specifics.
	*	+ Kses5 now works in E_STRICT
	*	+ Version number is 1.0.0 to reflect serious code changes
	*	+ Addition of methods AddProtocols(), filterKsesTextHook(), RemoveProtocol(), RemoveProtocols() and SetProtocols()
	*	+ Deprecated _hook(), Protocols()
	*
	*	@package    kses
	*	@subpackage kses5
	*/

	//	version_compare() avoids the single-character check, which would misread PHP 10 and later as "1".
	if(version_compare(PHP_VERSION, '5.0.0', '<'))
	{
		die("Class kses requires PHP 5 or higher.");
	}

	/**
	*	Only install KSES5 once
	*/
	if(!defined('KSES_CLASS_PHP5'))
	{
		define('KSES_CLASS_PHP5', true);

	/**
	*	Kses strips evil scripts!
	*
	*	This class provides the capability for removing unwanted HTML/XHTML, attributes from
	*	tags, and protocols contained in links.  The net result is a much more powerful tool
	*	than the PHP internal strip_tags()
	*
	*	This is a fork of a slick piece of procedural code called 'kses' written by Ulf Härnhammar.
	*
	*	The original class for PHP4 was basically a wrapper around all of the functions in
	*	the procedural code written by Ulf, and was released 7/25/2003.
	*
	*	This version is a bit of a rewrite to match my own coding style and use some of the
	*	capabilities allowed in PHP5.  Since this was a significant rewrite, but it still
	*	maintains backward compatibility syntax-wise, the version number is now 1.0.0.  Any
	*	minor changes that do not break compatibility will be indicated in the second or third
	*	digits.  Anything that breaks compatibility will change the major version number.
	*
	*	PHP5 specific changes:
	*	+ Private methods are now in place
	*	+ __construct() is now used rather than the standard class name 'kses()'
	*	+ Kses5 will not load in any version less than PHP5
	*	Other modifications:
	*	+ PHPdoc style documentation has been added to the class.  See http://www.phpdoc.org/ for more info.
	*	+ Method names have been changed to reflect status as verbs
	*	+ One line methods have been folded into the code
	*	+ Some methods are now deprecated due to nomenclature style change.  See method documentation for specifics.
	*	+ Kses now works in E_STRICT
	*	+ Initial Version number set to 1.0.0 to reflect serious code changes
	*	+ Addition of methods AddProtocols(), filterKsesTextHook(), RemoveProtocol(), RemoveProtocols() and SetProtocols()
	*	+ Deprecated _hook(), Protocols()
	*	+ Integrated code from kses 0.2.2 into class.
	*	+ Added methods DumpProtocols(), DumpElements()
	*
	*	@author     Richard R. Vásquez, Jr. (Original procedural code by Ulf Härnhammar)
	*	@link       http://sourceforge.net/projects/kses/ Home Page for Kses
	*	@link       http://chaos.org/contact/ Contact page with current email address for Richard Vásquez
	*	@copyright  Richard R. Vásquez, Jr. 2005
	*	@version    PHP5 OOP 1.0.2
	*	@license    http://www.gnu.org/licenses/gpl.html GNU Public License
	*	@package    kses
	*/
		class kses5
		{
			/**#@+
			 *	@access private
			 *	@var array
			 */
			private $allowed_protocols;
			private $allowed_html;
			/**#@-*/

			/**
			 *	Constructor for kses.
			 *
			 *	This sets a default collection of protocols allowed in links, and creates an
			 *	empty set of allowed HTML tags.
			 *	@since PHP5 OOP 1.0.0
			 */
			public function __construct()
			{
				/**
				 *	You could add protocols such as https, news, gopher, irc, etc.
				 *
				 *	The base values the original kses provided were:
				 *		'http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'gopher', 'mailto'
				 */
				$this->allowed_protocols = array('http', 'ftp', 'mailto');
				$this->allowed_html      = array();
			}

			/**
			 *	Basic task of kses - parses $string and strips it as required.
			 *
			 *	This method strips all the disallowed (X)HTML tags, attributes
			 *	and protocols from the input $string.
			 *
			 *	@access public
			 *	@param string $string String to be stripped of 'evil scripts'
			 *	@return string The stripped string
			 *	@since PHP4 OOP 0.0.1
			 */
			public function Parse($string = "")
			{
				//	get_magic_quotes_gpc() was removed in PHP 8, so only call it where it exists.
				if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
				{
					$string = stripslashes($string);
				}
				$string = $this->removeNulls($string);
				//	Remove JavaScript entities from early Netscape 4 versions
				$string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
				$string = $this->normalizeEntities($string);
				$string = $this->filterKsesTextHook($string);
				//	The /e modifier was removed in PHP 7, so a callback is used instead.
				//	Closures that reference $this require PHP 5.4 or later.
				$string = preg_replace_callback(
					'%(<[^>]*(>|$)|>)%',
					function ($matches) { return $this->stripTags($matches[1]); },
					$string
				);
				return $string;
			}

			/**
			 *	Allows for single/batch addition of protocols
			 *
			 *	This method accepts one argument that can be either a string
			 *	or an array of strings.  Invalid data will be ignored.
			 *
			 *	The argument will be processed, and each string will be added
			 *	via AddProtocol().
			 *
			 *	@access public
			 *	@param mixed , A string or array of protocols that will be added to the internal list of allowed protocols.
			 *	@return bool Status of adding valid protocols.
			 *	@see AddProtocol()
			 *	@since PHP5 OOP 1.0.0
			 */
			public function AddProtocols()
			{
				$c_args = func_num_args();
				if($c_args != 1)
				{
					trigger_error("kses5::AddProtocols() did not receive an argument.", E_USER_WARNING);
					return false;
				}

				$protocol_data = func_get_arg(0);

				if(is_array($protocol_data) && count($protocol_data) > 0)
				{
					foreach($protocol_data as $protocol)
					{
						$this->AddProtocol($protocol);
					}
					return true;
				}
				elseif(is_string($protocol_data))
				{
					$this->AddProtocol($protocol_data);
					return true;
				}
				else
				{
					trigger_error("kses5::AddProtocols() did not receive a string or an array.", E_USER_WARNING);
					return false;
				}
			}

			/**
			 *	Allows for single/batch addition of protocols
			 *
			 *	@deprecated Use AddProtocols()
			 *	@see AddProtocols()
			 *	@return bool
			 *	@since PHP4 OOP 0.0.1
			 */
			public function Protocols()
			{
				$c_args = func_num_args();
				if($c_args != 1)
				{
					trigger_error("kses5::Protocols() did not receive an argument.", E_USER_WARNING);
					return false;
				}

				return $this->AddProtocols(func_get_arg(0));
			}

			/**
			 *	Adds a single protocol to $this->allowed_protocols.
			 *
			 *	This method accepts a string argument and adds it to
			 *	the list of allowed protocols to keep when performing
			 *	Parse().
			 *
			 *	@access public
			 *	@param string $protocol The name of the protocol to be added.
			 *	@return bool Status of adding valid protocol.
			 *	@since PHP4 OOP 0.0.1
			 */
			public function AddProtocol($protocol = "")
			{
				if(!is_string($protocol))
				{
					trigger_error("kses5::AddProtocol() requires a string.", E_USER_WARNING);
					return false;
				}

				// Remove any inadvertent ':' at the end of the protocol.
				if(substr($protocol, strlen($protocol) - 1, 1) == ":")
				{
					$protocol = substr($protocol, 0, strlen($protocol) - 1);
				}

				$protocol = strtolower(trim($protocol));
				if($protocol == "")
				{
					trigger_error("kses5::AddProtocol() tried to add an empty/NULL protocol.", E_USER_WARNING);
					return false;
				}

				//	prevent duplicate protocols from being added.
				if(!in_array($protocol, $this->allowed_protocols))
				{
					array_push($this->allowed_protocols, $protocol);
					sort($this->allowed_protocols);
				}
				return true;
			}

			/**
			 *	Removes a single protocol from $this->allowed_protocols.
			 *
			 *	This method accepts a string argument and removes it from
			 *	the list of allowed protocols to keep when performing
			 *	Parse().
			 *
			 *	@access public
			 *	@param string $protocol The name of the protocol to be removed.
			 *	@return bool Status of removing valid protocol.
			 *	@since PHP5 OOP 1.0.0
			 */
			public function RemoveProtocol($protocol = "")
			{
				if(!is_string($protocol))
				{
					trigger_error("kses5::RemoveProtocol() requires a string.", E_USER_WARNING);
					return false;
				}

				// Remove any inadvertent ':' at the end of the protocol.
				if(substr($protocol, strlen($protocol) - 1, 1) == ":")
				{
					$protocol = substr($protocol, 0, strlen($protocol) - 1);
				}

				$protocol = strtolower(trim($protocol));
				if($protocol == "")
				{
					trigger_error("kses5::RemoveProtocol() tried to remove an empty/NULL protocol.", E_USER_WARNING);
					return false;
				}

				//	Ensures that the protocol exists before removing it.
				if(in_array($protocol, $this->allowed_protocols))
				{
					$this->allowed_protocols = array_diff($this->allowed_protocols, array($protocol));
					sort($this->allowed_protocols);
				}

				return true;
			}

			/**
			 *	Allows for single/batch removal of protocols
			 *
			 *	This method accepts one argument that can be either a string
			 *	or an array of strings.  Invalid data will be ignored.
			 *
			 *	The argument will be processed, and each string will be removed
			 *	via RemoveProtocol().
			 *
			 *	@access public
			 *	@param mixed , A string or array of protocols that will be removed from the internal list of allowed protocols.
			 *	@return bool Status of removing valid protocols.
			 *	@see RemoveProtocol()
			 *	@since PHP5 OOP 1.0.0
			 */
			public function RemoveProtocols()
			{
				$c_args = func_num_args();
				if($c_args != 1)
				{
					return false;
				}

				$protocol_data = func_get_arg(0);

				if(is_array($protocol_data) && count($protocol_data) > 0)
				{
					foreach($protocol_data as $protocol)
					{
						$this->RemoveProtocol($protocol);
					}
					//	Report success for the array case as well, matching the documented bool return.
					return true;
				}
				elseif(is_string($protocol_data))
				{
					$this->RemoveProtocol($protocol_data);
					return true;
				}
				else
				{
					trigger_error("kses5::RemoveProtocols() did not receive a string or an array.", E_USER_WARNING);
					return false;
				}
			}

			/**
			 *	Allows for single/batch replacement of protocols
			 *
			 *	This method accepts one argument that can be either a string
			 *	or an array of strings.  Invalid data will be ignored.
			 *
			 *	Existing protocols will be removed, then the argument will be
			 *	processed, and each string will be added via AddProtocol().
			 *
			 *	@access public
			 *	@param mixed , A string or array of protocols that will be the new internal list of allowed protocols.
			 *	@return bool Status of replacing valid protocols.
			 *	@since PHP5 OOP 1.0.1
			 *	@see AddProtocol()
			 */
			public function SetProtocols()
			{
				$c_args = func_num_args();
				if($c_args != 1)
				{
					trigger_error("kses5::SetProtocols() did not receive an argument.", E_USER_WARNING);
					return false;
				}

				$protocol_data = func_get_arg(0);

				if(is_array($protocol_data) && count($protocol_data) > 0)
				{
					$this->allowed_protocols = array();
					foreach($protocol_data as $protocol)
					{
						$this->AddProtocol($protocol);
					}
					return true;
				}
				elseif(is_string($protocol_data))
				{
					$this->allowed_protocols = array();
					$this->AddProtocol($protocol_data);
					return true;
				}
				else
				{
					trigger_error("kses5::SetProtocols() did not receive a string or an array.", E_USER_WARNING);
					return false;
				}
			}

			/**
			 *	Raw dump of allowed protocols
			 *
			 *	This returns an indexed array of allowed protocols for a particular KSES
			 *	instantiation.
			 *
			 *	@access public
			 *	@return array The list of allowed protocols.
			 *	@since PHP5 OOP 1.0.2
			 */
			public function DumpProtocols()
			{
				return $this->allowed_protocols;
			}

			/**
			 *	Raw dump of allowed (X)HTML elements
			 *
			 *	This returns an associative array, keyed by tag name, of the allowed (X)HTML
			 *	elements and their attributes for a particular KSES instantiation.
			 *
			 *	@access public
			 *	@return array The list of allowed elements.
			 *	@since PHP5 OOP 1.0.2
			 */
			public function DumpElements()
			{
				return $this->allowed_html;
			}


			/**
			 *	Adds valid (X)HTML with corresponding attributes that will be kept when stripping 'evil scripts'.
			 *
			 *	This method accepts a tag name and an optional array of allowed
			 *	attributes.  Invalid data triggers a warning and returns false.
			 *
			 *	@access public
			 *	@param string $tag (X)HTML tag that will be allowed after stripping text.
			 *	@param array $attribs Associative array of allowed attributes - key => attribute name - value => attribute parameter
			 *	@return bool Status of Adding (X)HTML and attributes.
			 *	@since PHP4 OOP 0.0.1
			 */
			public function AddHTML($tag = "", $attribs = array())
			{
				if(!is_string($tag))
				{
					trigger_error("kses5::AddHTML() requires the tag to be a string", E_USER_WARNING);
					return false;
				}

				$tag = strtolower(trim($tag));
				if($tag == "")
				{
					trigger_error("kses5::AddHTML() tried to add an empty/NULL tag", E_USER_WARNING);
					return false;
				}

				if(!is_array($attribs))
				{
					trigger_error("kses5::AddHTML() requires an array (even an empty one) of attributes for '$tag'", E_USER_WARNING);
					return false;
				}

				$new_attribs = array();
				if(is_array($attribs) && count($attribs) > 0)
				{
					foreach($attribs as $idx1 => $val1)
					{
						$new_idx1 = strtolower($idx1);
						$new_val1 = $attribs[$idx1];

						//	Lower-case the keys of nested check arrays as well.
						if(is_array($new_val1) && count($new_val1) > 0)
						{
							$tmp_val = array();
							foreach($new_val1 as $idx2 => $val2)
							{
								$new_idx2 = strtolower($idx2);
								$tmp_val[$new_idx2] = $val2;
							}
							$new_val1 = $tmp_val;
						}

						$new_attribs[$new_idx1] = $new_val1;
					}
				}

				$this->allowed_html[$tag] = $new_attribs;
				return true;
			}

			/**
			 *	This method removes any NULL characters in $string.
			 *
			 *	@access private
			 *	@param string $string
			 *	@return string String without any NULL/chr(173)
			 *	@since PHP4 OOP 0.0.1
			 */
			private function removeNulls($string)
			{
				$string = preg_replace('/\0+/', '', $string);
				$string = preg_replace('/(\\\\0)+/', '', $string);
				return $string;
			}

			/**
			 *	Normalizes HTML entities
			 *
			 *	This function normalizes HTML entities. It will convert "AT&T" to the correct
			 *	"AT&amp;T", "&#00058;" to "&#58;", "&#XYZZY;" to "&amp;#XYZZY;" and so on.
			 *
			 *	@access private
			 *	@param string $string
			 *	@return string String with normalized entities
			 *	@since PHP4 OOP 0.0.1
			 */
			private function normalizeEntities($string)
			{
				# Disarm all entities by converting & to &amp;
				$string = str_replace('&', '&amp;', $string);

				#	TODO: Change back (Keep?) the allowed entities in our entity white list

				#	Keeps entities that start with [A-Za-z]
				$string = preg_replace(
					'/&amp;([A-Za-z][A-Za-z0-9]{0,19});/',
					'&\\1;',
					$string
				);

				#	Change numeric entities to valid 16 bit values
				#	(callback instead of the /e modifier, which was removed in PHP 7;
				#	closures that reference $this require PHP 5.4 or later)

				$string = preg_replace_callback(
					'/&amp;#0*([0-9]{1,5});/',
					function ($matches) { return $this->normalizeEntities16bit($matches[1]); },
					$string
				);

				#	Change &#xHH and &#xHHHH (hex digits) to 16 bit hex values
				$string = preg_replace(
					'/&amp;#([Xx])0*(([0-9A-Fa-f]{2}){1,2});/',
					'&#\\1\\2;',
					$string
				);

				return $string;
			}

			/**
			 *	Helper method used by normalizeEntities()
			 *
			 *	This method helps normalizeEntities() to only accept 16 bit values
			 *	and nothing more for &#number; entities.
			 *
			 *	This method helps normalizeEntities() during a preg_replace()
			 *	where a &#(0)*XXXXX; occurs.  The '(0)*XXXXXX' value is converted to
			 *	a number and the result is returned as a numeric entity if the number
			 *	is less than 65536.  Otherwise, the value is returned 'as is'.
			 *
			 *	@access private
			 *	@param string $i
			 *	@return string Normalized numeric entity
			 *	@see normalizeEntities()
			 *	@since PHP4 OOP 0.0.1
			 */
			private function normalizeEntities16bit($i)
			{
				return (($i > 65535) ? "&amp;#$i;" : "&#$i;");
			}

			/**
			 *	Allows for additional user defined modifications to text.
			 *
			 *	This method allows for additional modifications to be performed on
			 *	a string that's being run through Parse().  Currently, it returns the
			 *	input string 'as is'.
			 *
			 *	This method is provided for users to extend the kses class for their own
			 *	requirements.  It is declared public, as documented, because a private
			 *	method could not be overridden by a subclass.
			 *
			 *	@access public
			 *	@param string $string String to perform additional modifications on.
			 *	@return string User modified string.
			 *	@see Parse()
			 *	@since PHP5 OOP 1.0.0
			 */
			public function filterKsesTextHook($string)
			{
				return $string;
			}

			/**
			 *	Allows for additional user defined modifications to text.
			 *
			 *	@deprecated use filterKsesTextHook()
			 *	@param string $string
			 *	@return string
			 *	@see filterKsesTextHook()
			 *	@since PHP4 OOP 0.0.1
			 */
			private function _hook($string)
			{
				return $this->filterKsesTextHook($string);
			}

			/**
			 *	This method goes through an array, and changes the keys to all lower case.
			 *
			 *	@access private
			 *	@param array $in_array Associative array
			 *	@return array Modified array
			 *	@since PHP4 OOP 0.0.1
			 */
			private function makeArrayKeysLowerCase($in_array)
			{
				$out_array = array();

				if(is_array($in_array) && count($in_array) > 0)
				{
					foreach ($in_array as $in_key => $in_val)
					{
						$out_key = strtolower($in_key);
						$out_array[$out_key] = array();

						if(is_array($in_val) && count($in_val) > 0)
						{
							foreach ($in_val as $in_key2 => $in_val2)
							{
								$out_key2 = strtolower($in_key2);
								$out_array[$out_key][$out_key2] = $in_val2;
							}
						}
					}
				}

				return $out_array;
			}

			/**
			 *	This method strips out disallowed and/or mangled (X)HTML tags along with assigned attributes.
			 *
			 *	This method does a lot of work. It rejects some very malformed things
			 *	like <:::>. It returns an empty string if the element isn't allowed (look
			 *	ma, no strip_tags()!). Otherwise it splits the tag into an element and an
			 *	allowed attribute list.
			 *
			 *	@access private
			 *	@param string $string
			 *	@return string Modified string minus disallowed/mangled (X)HTML and attributes
			 *	@since PHP4 OOP 0.0.1
			 */
			private function stripTags($string)
			{
				$string = preg_replace('%\\\\"%', '"', $string);

				if (substr($string, 0, 1) != '<')
				{
					# It matched a ">" character
					return '&gt;';
				}

				if (!preg_match('%^<\s*(/\s*)?([a-zA-Z0-9]+)([^>]*)>?$%', $string, $matches))
				{
					# It's seriously malformed
					return '';
				}

				$slash    = trim($matches[1]);
				$elem     = $matches[2];
				$attrlist = $matches[3];

				if (
					!isset($this->allowed_html[strtolower($elem)]) ||
					!is_array($this->allowed_html[strtolower($elem)]))
				{
					#	Found an HTML element not in the white list
					return '';
				}

				if ($slash != '')
				{
					# No attributes are allowed for closing elements
					return "<$slash$elem>";
				}

				return $this->stripAttributes("$slash$elem", $attrlist);
			}

			/**
			 *	This method strips out disallowed attributes for (X)HTML tags.
			 *
			 *	This method removes all attributes if none are allowed for this element.
			 *	If some are allowed it calls combAttributes() to split them further, and then it
			 *	builds up new HTML code from the data that combAttributes() returns. It also
			 *	removes "<" and ">" characters, if there are any left. One more thing it
			 *	does is to check if the tag has a closing XHTML slash, and if it does,
			 *	it puts one in the returned code as well.
			 *
			 *	@access private
			 *	@param string $element (X)HTML tag to check
			 *	@param string $attr Text containing attributes to check for validity.
			 *	@return string Resulting valid (X)HTML or ''
			 *	@see combAttributes()
			 *	@since PHP4 OOP 0.0.1
			 */
			private function stripAttributes($element, $attr)
			{
				# Is there a closing XHTML slash at the end of the attributes?
				$xhtml_slash = '';
				if (preg_match('%\s/\s*$%', $attr))
				{
					$xhtml_slash = ' /';
				}

				# Are any attributes allowed at all for this element?
				if (
					!isset($this->allowed_html[strtolower($element)]) ||
					count($this->allowed_html[strtolower($element)]) == 0
				)
				{
					return "<$element$xhtml_slash>";
				}

				# Split it
				$attrarr = $this->combAttributes($attr);

				# Go through $attrarr, and save the allowed attributes for this element
				# in $attr2
				$attr2 = '';
				if(is_array($attrarr) && count($attrarr) > 0)
				{
					foreach ($attrarr as $arreach)
					{
						if(!isset($this->allowed_html[strtolower($element)][strtolower($arreach['name'])]))
						{
							continue;
						}

						$current = $this->allowed_html[strtolower($element)][strtolower($arreach['name'])];

						if (!is_array($current))
						{
							# there are no checks
							$attr2 .= ' '.$arreach['whole'];
						}
						else
						{
							# there are some checks
							$ok = true;
							if(is_array($current) && count($current) > 0)
							{
								foreach ($current as $currkey => $currval)
								{
									if (!$this->checkAttributeValue($arreach['value'], $arreach['vless'], $currkey, $currval))
									{
										$ok = false;
										break;
									}
								}
							}

							if ($ok)
							{
								# it passed them
								$attr2 .= ' '.$arreach['whole'];
							}
						}
					}
				}

				# Remove any "<" or ">" characters
				$attr2 = preg_replace('/[<>]/', '', $attr2);
				return "<$element$attr2$xhtml_slash>";
			}

			/**
			 *	This method combs through an attribute list string and returns an associative array of attributes and values.
			 *
			 *	This method does a lot of work. It parses an attribute list into an array
			 *	with attribute data, and tries to do the right thing even if it gets weird
			 *	input. It will add quotes around attribute values that don't have any quotes
			 *	or apostrophes around them, to make it easier to produce HTML code that will
			 *	conform to W3C's HTML specification. It will also remove bad URL protocols
			 *	from attribute values.
			 *
			 *	@access private
			 *	@param string $attr Text containing tag attributes for parsing
			 *	@return array Associative array containing data on attribute and value
			 *	@since PHP4 OOP 0.0.1
			 */
			private function combAttributes($attr)
			{
				$attrarr  = array();
				$mode     = 0;
				$attrname = '';

				# Loop through the whole attribute list

				while (strlen($attr) != 0)
				{
					# Was the last operation successful?
					$working = 0;

					switch ($mode)
					{
						case 0:	# attribute name, href for instance
							if (preg_match('/^([-a-zA-Z]+)/', $attr, $match))
							{
								$attrname = $match[1];
								$working = $mode = 1;
								$attr = preg_replace('/^[-a-zA-Z]+/', '', $attr);
							}
							break;
						case 1:	# equals sign or valueless ("selected")
							if (preg_match('/^\s*=\s*/', $attr)) # equals sign
							{
								$working = 1;
								$mode    = 2;
								$attr    = preg_replace('/^\s*=\s*/', '', $attr);
								break;
							}
							if (preg_match('/^\s+/', $attr)) # valueless
							{
								$working   = 1;
								$mode      = 0;
								$attrarr[] = array(
									'name'  => $attrname,
									'value' => '',
									'whole' => $attrname,
									'vless' => 'y'
								);
								$attr      = preg_replace('/^\s+/', '', $attr);
							}
							break;
						case 2: # attribute value, a URL after href= for instance
							if (preg_match('/^"([^"]*)"(\s+|$)/', $attr, $match)) # "value"
							{
								$thisval   = $this->removeBadProtocols($match[1]);
								$attrarr[] = array(
									'name'  => $attrname,
									'value' => $thisval,
									'whole' => $attrname . '="' . $thisval . '"',
									'vless' => 'n'
								);
								$working   = 1;
								$mode      = 0;
								$attr      = preg_replace('/^"[^"]*"(\s+|$)/', '', $attr);
								break;
							}
							if (preg_match("/^'([^']*)'(\s+|$)/", $attr, $match)) # 'value'
							{
								$thisval   = $this->removeBadProtocols($match[1]);
								$attrarr[] = array(
									'name'  => $attrname,
									'value' => $thisval,
									'whole' => "$attrname='$thisval'",
									'vless' => 'n'
								);
								$working   = 1;
								$mode      = 0;
								$attr      = preg_replace("/^'[^']*'(\s+|$)/", '', $attr);
								break;
							}
							if (preg_match("%^([^\s\"']+)(\s+|$)%", $attr, $match)) # value
							{
								$thisval   = $this->removeBadProtocols($match[1]);
								$attrarr[] = array(
									'name'  => $attrname,
									'value' => $thisval,
									'whole' => $attrname . '="' . $thisval . '"',
									'vless' => 'n'
								);
								# We add quotes to conform to W3C's HTML spec.
								$working   = 1;
								$mode      = 0;
								$attr      = preg_replace("%^[^\s\"']+(\s+|$)%", '', $attr);
							}
							break;
					}

					if ($working == 0) # not well formed, remove and try again
					{
						$attr = preg_replace('/^("[^"]*("|$)|\'[^\']*(\'|$)|\S)*\s*/', '', $attr);
						$mode = 0;
					}
				}

				# special case, for when the attribute list ends with a valueless
				# attribute like "selected"
				if ($mode == 1)
				{
					$attrarr[] = array(
						'name'  => $attrname,
						'value' => '',
						'whole' => $attrname,
						'vless' => 'y'
					);
				}

				return $attrarr;
			}

908
			/**
909
			 *	This method removes disallowed protocols.
910
			 *
911
			 *	This method removes all non-allowed protocols from the beginning of
912
			 *	$string. It ignores whitespace and the case of the letters, and it does
913
			 *	understand HTML entities. It does its work in a while loop, so it won't be
914
			 *	fooled by a string like "javascript:javascript:alert(57)".
915
			 *
916
			 *	@access private
917
			 *	@param string $string String to check for protocols
918
			 *	@return string String with removed protocols
919
			 *	@since PHP4 OOP 0.0.1
920
			 */
921
			private function removeBadProtocols($string)
922
			{
923
				$string  = $this->RemoveNulls($string);
924
				$string = preg_replace('/\xad+/', '', $string); # deals with Opera "feature"
925
				$string2 = $string . 'a';
926
927
				$string2 = preg_split('/:|&#58;|&#x3a;/i', $string, 2);
928
				if(isset($string2[1]) && !preg_match('%/\?%',$string2[0]))
929
				{
930
					$string = $this->filterProtocols($string2[0]).trim($string2[1]);
931
				}
932
				return $string;
933
			}

			/**
			 *	Helper method used by removeBadProtocols()
			 *
			 *	This function normalizes a URL protocol (decodes entities, strips
			 *	whitespace, nulls and soft hyphens, lowercases it) and checks whether it
			 *	is in the whitelist.
			 *
			 *	@access private
			 *	@param string $string Protocol to check, without the trailing colon
			 *	@return string The normalized protocol followed by ":" if it is allowed, otherwise an empty string
			 *	@see removeBadProtocols()
			 *	@since PHP4 OOP 0.0.1
			 */
			private function filterProtocols($string)
			{
				$string = $this->decodeEntities($string);
				$string = preg_replace('/\s/', '', $string);
				$string = $this->removeNulls($string);
				$string = preg_replace('/\xad+/', '', $string); # deals with Opera "feature"
				$string = strtolower($string);

				if(is_array($this->allowed_protocols) && count($this->allowed_protocols) > 0)
				{
					foreach ($this->allowed_protocols as $one_protocol)
					{
						if (strtolower($one_protocol) == $string)
						{
							return "$string:";
						}
					}
				}

				return '';
			}

			/**
			 *	Controller method for performing checks on attribute values.
			 *
			 *	This method calls the appropriate method as specified by $checkname with
			 *	the parameters $value, $checkvalue, and $vless, and returns the result
			 *	of the call. If no method exists for $checkname, the check passes.
			 *
			 *	This method's functionality can be expanded by creating new methods
			 *	that would match checkAttributeValue[$checkname].
			 *
			 *	Current checks implemented are: "maxlen", "minlen", "maxval", "minval" and "valueless"
			 *
			 *	@access private
			 *	@param string $value The value of the attribute to be checked.
			 *	@param string $vless Indicates whether the attribute was given without a value ("y" or "n")
			 *	@param string $checkname The check to be performed
			 *	@param string $checkvalue The value that is to be checked against
			 *	@return bool Indicates whether the check passed or not
			 *	@since PHP5 OOP 1.0.0
			 */
			private function checkAttributeValue($value, $vless, $checkname, $checkvalue)
			{
				$ok = true;
				$check_attribute_method_name  = 'checkAttributeValue' . ucfirst(strtolower($checkname));
				if(method_exists($this, $check_attribute_method_name))
				{
					$ok = $this->$check_attribute_method_name($value, $checkvalue, $vless);
				}

				return $ok;
			}

			/**
			 *	Helper method invoked by checkAttributeValue().
			 *
			 *	The maxlen check makes sure that the attribute value has a length not
			 *	greater than the given value. This can be used to avoid Buffer Overflows
			 *	in WWW clients and various Internet servers.
			 *
			 *	@access private
			 *	@param string $value The value of the attribute to be checked.
			 *	@param int $checkvalue The maximum value allowed
			 *	@return bool Indicates whether the check passed or not
			 *	@see checkAttributeValue()
			 *	@since PHP5 OOP 1.0.0
			 */
			private function checkAttributeValueMaxlen($value, $checkvalue)
			{
				if (strlen($value) > intval($checkvalue))
				{
					return false;
				}
				return true;
			}

			/**
			 *	Helper method invoked by checkAttributeValue().
			 *
			 *	The minlen check makes sure that the attribute value has a length not
			 *	smaller than the given value.
			 *
			 *	@access private
			 *	@param string $value The value of the attribute to be checked.
			 *	@param int $checkvalue The minimum value allowed
			 *	@return bool Indicates whether the check passed or not
			 *	@see checkAttributeValue()
			 *	@since PHP5 OOP 1.0.0
			 */
			private function checkAttributeValueMinlen($value, $checkvalue)
			{
				if (strlen($value) < intval($checkvalue))
				{
					return false;
				}
				return true;
			}

			/**
			 *	Helper method invoked by checkAttributeValue().
			 *
			 *	The maxval check does two things: it checks that the attribute value is
			 *	an integer from 0 and up, without an excessive amount of zeroes or
			 *	whitespace (to avoid Buffer Overflows). It also checks that the attribute
			 *	value is not greater than the given value.
			 *
			 *	This check can be used to avoid Denial of Service attacks.
			 *
			 *	@access private
			 *	@param int $value The value of the attribute to be checked.
			 *	@param int $checkvalue The maximum numeric value allowed
			 *	@return bool Indicates whether the check passed or not
			 *	@see checkAttributeValue()
			 *	@since PHP5 OOP 1.0.0
			 */
			private function checkAttributeValueMaxval($value, $checkvalue)
			{
				if (!preg_match('/^\s{0,6}[0-9]{1,6}\s{0,6}$/', $value))
				{
					return false;
				}
				if (intval($value) > intval($checkvalue))
				{
					return false;
				}
				return true;
			}

			/**
			 *	Helper method invoked by checkAttributeValue().
			 *
			 *	The minval check checks that the attribute value is a non-negative
			 *	integer, and that it is not smaller than the given value.
			 *
			 *	@access private
			 *	@param int $value The value of the attribute to be checked.
			 *	@param int $checkvalue The minimum numeric value allowed
			 *	@return bool Indicates whether the check passed or not
			 *	@see checkAttributeValue()
			 *	@since PHP5 OOP 1.0.0
			 */
			private function checkAttributeValueMinval($value, $checkvalue)
			{
				if (!preg_match('/^\s{0,6}[0-9]{1,6}\s{0,6}$/', $value))
				{
					return false;
				}
				if (intval($value) < intval($checkvalue))
				{
					return false;
				}
				return true;
			}

			/**
			 *	Helper method invoked by checkAttributeValue().
			 *
			 *	The valueless check checks if the attribute has a value
			 *	(like <a href="blah">) or not (<option selected>). If the given value
			 *	is a "y" or a "Y", the attribute must not have a value.
			 *
			 *	If the given value is an "n" or an "N", the attribute must have one.
			 *
			 *	@access private
			 *	@param string $value The value of the attribute; it is ignored for this test
			 *	@param string $checkvalue "y" if the attribute must be valueless, "n" if it must have a value
			 *	@param string $vless "y" if the attribute was given without a value, "n" otherwise
			 *	@return bool Indicates whether the check passed or not
			 *	@see checkAttributeValue()
			 *	@since PHP5 OOP 1.0.0
			 */
			private function checkAttributeValueValueless($value, $checkvalue, $vless)
			{
				if (strtolower($checkvalue) != $vless)
				{
					return false;
				}
				return true;
			}

			/**
			 *	Decodes numeric HTML entities
			 *
			 *	This method decodes numeric HTML entities (&#65; and &#x41;). It doesn't
			 *	do anything with other entities like &auml;, but we don't need them in the
			 *	URL protocol white listing system anyway.
			 *
			 *	@access private
			 *	@param string $string The string whose numeric entities are to be decoded.
			 *	@return string String with numeric entities decoded
			 *	@since PHP4 OOP 0.0.1
			 */
			private function decodeEntities($string)
			{
				# Callbacks are used because the /e modifier is unavailable as of PHP 7.
				$string = preg_replace_callback(
					'/&#([0-9]+);/',
					function ($match) { return chr((int) $match[1]); },
					$string
				);
				$string = preg_replace_callback(
					'/&#[Xx]([0-9A-Fa-f]+);/',
					function ($match) { return chr((int) hexdec($match[1])); },
					$string
				);
				return $string;
			}

			/**
			 *	Returns PHP5 OOP version # of kses.
			 *
			 *	Since this class has been refactored and documented and proven to work,
			 *	I'm fixing the version number at 1.0.0.
			 *
			 *	This version is syntax compatible with the PHP4 OOP version 0.0.2.  Future
			 *	versions may not be syntax compatible.
			 *
			 *	@access public
			 *	@return string Version number
			 *	@since PHP4 OOP 0.0.1
			 */
			public function Version()
			{
				return 'PHP5 OOP 1.0.2';
			}
		}
	}
?>