HTMLText - Code Metrics - Inspection of "Add new committers" - sminnee/silverstripe-framework

Completed

Push: new-committers (29cb6f...bcba16)

by Sam

created 2016-04-19 10:25 UTC

HTMLText (rating: B)


Complexity

Total Complexity	40

Size/Duplication

Total Lines	251
Duplicated Lines	1.99 %

Coupling/Cohesion

Components	2
Dependencies	5

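Metric	Value
wmc (weighted methods per class)	40
lcom (lack of cohesion of methods)	2
cbo (coupling between objects)	5
dl (duplicated lines)	5
loc (lines of code)	251
rs	8.2608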

11 Methods

Rating	Name	Duplication	Size	Complexity
A	setOptions()	0	16	4
A	FirstSentence()	5	16	4
A	AbsoluteLinks()	0	3	1
A	prepValueForDB()	0	3	1
A	exists()	0	20	4
A	scaffoldFormField()	0	3	1
A	scaffoldSearchField()	0	3	1
A	__construct()	0	7	2
C	Summary()	0	59	14
A	forTemplate()	0	8	2
B	whitelistContent()	0	22	6
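The class-level numbers can be cross-checked against this table. This assumes `wmc` is the standard weighted-methods-per-class metric, i.e. the sum of per-method complexities. Under that assumption, the eleven complexity values add up to 40, and 5 duplicated lines out of 251 gives the reported 1.99 %. The sketch below reproduces that arithmetic. The values are copied from the table, and the function names are illustrative only.

```php
<?php
// Per-method complexity values, copied from the method table above.
function methodComplexities() {
    return array(
        'setOptions' => 4, 'FirstSentence' => 4, 'AbsoluteLinks' => 1,
        'prepValueForDB' => 1, 'exists' => 4, 'scaffoldFormField' => 1,
        'scaffoldSearchField' => 1, '__construct' => 2, 'Summary' => 14,
        'forTemplate' => 2, 'whitelistContent' => 6,
    );
}

// Weighted methods per class: the sum of the per-method complexities.
function weightedMethodsPerClass(array $complexities) {
    return array_sum($complexities);
}

// Duplicated lines as a percentage of all lines, rounded to two decimals.
function duplicationPercent($duplicatedLines, $totalLines) {
    return round($duplicatedLines / $totalLines * 100, 2);
}
```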

How to fix: Duplicated Code, Complexity

Duplicated Code

Duplicated code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems and their solutions are:

If you have the same expression in different places: Extract expression to a method
If you have the same method in different sub-classes: Extract method, and pull up field to the parent class
If you have the same code in unrelated classes: Consider extracting the code to a new class, and injecting that class
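In HTMLText, the flagged duplication is in `FirstSentence()`. The same abbreviation-aware sentence-end check also appears inline in `Summary()`: one accepts only `.`, the other `!`, `?` or `.`. The sketch below applies the first fix above, extracting that expression into a method. `isSentenceEnd()` and `firstSentenceEnd()` are hypothetical names, not SilverStripe API.

```php
<?php
// Hypothetical helper for the check that Summary() and FirstSentence() each inline.
// $terminators lists the punctuation characters that may end a sentence.
function isSentenceEnd($word, $terminators = '.!?') {
    $endsWithTerminator = preg_match('/[' . preg_quote($terminators, '/') . ']$/', $word) === 1;
    // Same abbreviation list as the original code.
    $isAbbreviation = preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $word) === 1;
    return $endsWithTerminator && !$isAbbreviation;
}

// Index of the first word that ends a sentence, or -1 if there is none.
function firstSentenceEnd(array $words, $terminators = '.!?') {
    foreach ($words as $i => $word) {
        if (isSentenceEnd($word, $terminators)) {
            return $i;
        }
    }
    return -1;
}
```

`Summary()` would call `isSentenceEnd($word, '.')`. `FirstSentence()` would use the default terminators.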

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This often can reduce the size of classes significantly.

Complex classes like HTMLText often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use HTMLText, and based on these observations, apply Extract Interface, too.
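One way to apply Extract Class and Extract Interface here is to move the whitelist handling into its own component. That covers the parsing in `setOptions()` and the XPath built in `whitelistContent()`. The sketch below uses hypothetical names (`WhitelistQuery`, `HtmlElementWhitelist`) and reproduces the query string the current code builds. It is an illustration, not the framework's design.

```php
<?php
// Hypothetical interface extracted from how HTMLText uses its whitelist.
interface WhitelistQuery {
    /** XPath expression selecting the nodes that should be removed. */
    public function removalXPath();
}

// Hypothetical class holding the state that HTMLText keeps in $whitelist.
class HtmlElementWhitelist implements WhitelistQuery {
    private $tags = array();
    private $allowText = false;

    /** @param array|string $whitelist Array of tags, or a comma-separated string */
    public function __construct($whitelist) {
        $items = is_array($whitelist) ? $whitelist : preg_split('/,\s*/', $whitelist);
        foreach ($items as $item) {
            if ($item === 'text()') {
                $this->allowText = true; // keep text nodes directly under <body>
            } else {
                $this->tags[] = $item;
            }
        }
    }

    public function removalXPath() {
        $conditions = array();
        foreach ($this->tags as $tag) {
            $conditions[] = 'not(self::' . $tag . ')';
        }
        // With no element tags allowed, every element under <body> is removed.
        $query = $conditions ? '//body//*[' . implode(' and ', $conditions) . ']' : '//body//*';
        return $this->allowText ? $query : $query . ' | //body/text()';
    }
}
```

A dedicated object would also sidestep the `boolean`/`array` type confusion reported for `$whitelist` in this file. The sketch falls back to `//body//*` when only `text()` is whitelisted. Joining an empty condition list would otherwise yield `//body//*[]`, which is not valid XPath.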

<?php
/**
 * Represents a large text field that contains HTML content.
 * This behaves similarly to {@link Text}, but the template processor won't escape any HTML content within it.
 *
 * @see HTMLVarchar
 * @see Text
 * @see Varchar
 *
 * @package framework
 * @subpackage model
 */
class HTMLText extends Text {
	private static $escape_type = 'xml';


	private static $casting = array(

		"AbsoluteLinks" => "HTMLText",
		"BigSummary" => "HTMLText",
		"ContextSummary" => "HTMLText",
		"FirstParagraph" => "HTMLText",
		"FirstSentence" => "HTMLText",
		"LimitCharacters" => "HTMLText",
		"LimitSentences" => "HTMLText",
		"Lower" => "HTMLText",
		"LowerCase" => "HTMLText",
		"Summary" => "HTMLText",
		"Upper" => "HTMLText",
		"UpperCase" => "HTMLText",
		'EscapeXML' => 'HTMLText',
		'LimitWordCount' => 'HTMLText',
		'LimitWordCountXML' => 'HTMLText',
		'NoHTML' => 'Text',
	);

	protected $processShortcodes = true;

	protected $whitelist = false;

	public function __construct($name = null, $options = array()) {
		if(is_string($options)) {
			$options = array('whitelist' => $options);
		}

		return parent::__construct($name, $options);

	}

	/**
	 * @param array $options
	 *
	 * Options accepted in addition to those provided by Text:
	 *
	 *   - shortcodes: If true, shortcodes will be turned into the appropriate HTML.
	 *                 If false, shortcodes will not be processed.
	 *
	 *   - whitelist: If provided, an array or a comma-separated list of elements that will be allowed to be stored
	 *                (be careful relying on this for XSS protection: some seemingly safe elements allow
	 *                attributes that can be exploited, for instance <img onload="exploiting_code();" src="..." />)
	 *                Text nodes outside of HTML tags are filtered out by default, but may be included by adding
	 *                the text() directive. E.g. 'link,meta,text()' will allow only <link /> <meta /> and text at
	 *                the root level.
	 */
	public function setOptions(array $options = array()) {
		parent::setOptions($options);

		if(array_key_exists("shortcodes", $options)) {
			$this->processShortcodes = !!$options["shortcodes"];
		}

		if(array_key_exists("whitelist", $options)) {
			if(is_array($options['whitelist'])) {
				$this->whitelist = $options['whitelist'];

			}
			else {
				$this->whitelist = preg_split('/,\s*/', $options['whitelist']);

			}
		}
	}

	/**
	 * Create a summary of the content. This will be some section of the first paragraph, limited by
	 * $maxWords. All internal tags are stripped out - the return value is a string
	 *
	 * This is roughly the HTML-aware equivalent of Text#Summary, although the logic for summarising is not exactly
	 * the same
	 *
	 * @param int $maxWords Maximum number of words to return - may return less, but never more. Pass -1 for no limit
	 * @param int $flex Number of words to search through when looking for a nice cut point
	 * @param string $add What to add to the end of the summary if we cut at a less-than-ideal cut point
	 * @return string A nice(ish) summary with no html tags (but possibly still some html entities)
	 *
	 * @see framework/core/model/fieldtypes/Text#Summary($maxWords)
	 */
	public function Summary($maxWords = 50, $flex = 15, $add = '...') {
		$str = false;

		/* First we need the text of the first paragraph, without tags. Try using SimpleXML first */
		if (class_exists('SimpleXMLElement')) {
			$doc = new DOMDocument();

			// Catch warnings thrown by loadHTML and turn them into a failure boolean rather than a SilverStripe error
			set_error_handler(function($errno, $errstr) { throw new Exception('HTML Parse Error: ' . $errstr); }, E_ALL);
			// Non-breaking spaces get converted into odd characters, so replace them with plain spaces
			$value = str_replace('&nbsp;', ' ', $this->value);
			try {
				$res = $doc->loadHTML('<meta content="text/html; charset=utf-8" http-equiv="Content-type"/>' . $value);
			}
			catch (Exception $e) { $res = false; }
			restore_error_handler();

			if ($res) {
				$xml = simplexml_import_dom($doc);
				$res = $xml->xpath('//p');
				if (!empty($res)) $str = strip_tags($res[0]->asXML());
			}
		}

		/* If that failed, most likely the passed HTML is broken. Use a simple regex + a custom, more brutal strip_tags.
		 * We don't use strip_tags because that does very badly on broken HTML */
		if (!$str) {
			/* See if we can pull a paragraph out*/

			// Strip out any images in case there's one at the beginning. Not doing this will return a blank paragraph
			$str = preg_replace('{^\s*(<.+?>)*<img[^>]*>}', '', $this->value);
			if (preg_match('{<p(\s[^<>]*)?>(.*[A-Za-z]+.*)</p>}', $str, $matches)) $str = $matches[2];

			/* If _that_ failed, just use the whole text */
			if (!$str) $str = $this->value;

			/* Now pull out all the html-alike stuff */
			/* Take out anything that is obviously a tag */
			$str = preg_replace('{</?[a-zA-Z]+[^<>]*>}', '', $str);
			/* Strip out any leftover tag fragments. Textual < or > should already be encoded to &lt; or &gt; */
			$str = preg_replace('{</|<|>}', '', $str);
		}

		/* Now split into words. If we are under the maxWords limit, just return the whole string (re-implode for
		 * whitespace normalization) */
		$words = preg_split('/\s+/', $str);
		if ($maxWords == -1 || count($words) <= $maxWords) return implode(' ', $words);

		/* Otherwise work backwards, starting from the last word within the limit, looking for a sentence ending
		 * (we try to avoid abbreviations, but aren't very good at it) */
		for ($i = $maxWords - 1; $i >= $maxWords - $flex && $i >= 0; $i--) {
			if (preg_match('/\.$/', $words[$i]) && !preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $words[$i])) {
				return implode(' ', array_slice($words, 0, $i+1));
			}
		}

		// If we didn't find a sentence ending quickly enough, just cut at the maxWords point and append $add (default '...')
		return implode(' ', array_slice($words, 0, $maxWords)) . $add;
	}

	/**
	 * Returns the first sentence from the first paragraph. If it can't figure out what the first paragraph is (or
	 * there isn't one), it returns the same as Summary()
	 *
	 * This is the HTML-aware equivalent of Text#FirstSentence
	 *
	 * @see framework/core/model/fieldtypes/Text#FirstSentence()
	 */
	public function FirstSentence() {
		/* Use summary's html processing logic to get the first paragraph */
		$paragraph = $this->Summary(-1);

		/* Then look for the first sentence ending. We could probably use a nice regex, but for now this will do */
		$words = preg_split('/\s+/', $paragraph);
		foreach ($words as $i => $word) {

			if (preg_match('/(!|\?|\.)$/', $word) && !preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $word)) {
				return implode(' ', array_slice($words, 0, $i+1));
			}
		}

		/* If we didn't find a sentence ending, use the summary. We re-call rather than using paragraph so that
		 * Summary will limit the result this time */
		return $this->Summary();
	}

	/**
	 * Return the value of the field with relative links converted to absolute urls (with placeholders parsed).
	 * @return string
	 */
	public function AbsoluteLinks() {
		return HTTP::absoluteURLs($this->forTemplate());
	}

	public function forTemplate() {
		if ($this->processShortcodes) {
			return ShortcodeParser::get_active()->parse($this->value);
		}
		else {
			return $this->value;
		}
	}

	public function prepValueForDB($value) {
		return parent::prepValueForDB($this->whitelistContent($value));
	}

	/**
	 * Filter the given $value string through the whitelist filter
	 *
	 * @param string $value Input html content
	 * @return string Value with all non-whitelisted content stripped (if applicable)
	 */
	public function whitelistContent($value) {
		if($this->whitelist) {
			$dom = Injector::inst()->create('HTMLValue', $value);

			$query = array();
			$textFilter = ' | //body/text()';
			foreach ($this->whitelist as $tag) {

				if($tag === 'text()') {
					$textFilter = ''; // Disable text filter if allowed
				} else {
					$query[] = 'not(self::'.$tag.')';
				}
			}

			foreach($dom->query('//body//*['.implode(' and ', $query).']'.$textFilter) as $el) {
				if ($el->parentNode) $el->parentNode->removeChild($el);
			}

			$value = $dom->getContent();
		}
		return $value;
	}

	/**
	 * Returns true if the field has meaningful content.
	 * Excludes empty markup such as <h1></h1>, <p></p>, etc.
	 *
	 * @return boolean
	 */
	public function exists() {
		// If it's blank, it's blank
		if(!parent::exists()) {
			return false;
		}

		// If it's got a content tag
		if(preg_match('/<(img|embed|object|iframe|meta|source|link)[^>]*>/i', $this->value)) {
			return true;
		}

		// If it's just one or two tags on its own (and not the above) it's empty.
		// This might be <p></p> or <h1></h1> or whatever.
		if(preg_match('/^[\\s]*(<[^>]+>[\\s]*){1,2}$/', $this->value)) {
			return false;
		}

		// Otherwise, treat it as genuine content
		return true;
	}

	public function scaffoldFormField($title = null, $params = null) {
		return new HtmlEditorField($this->name, $title);
	}

	public function scaffoldSearchField($title = null, $params = null) {
		return new TextField($this->name, $title);
	}

}
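The word-limiting half of `Summary()` can be tried on its own. The standalone sketch below assumes the tags have already been stripped; `summariseWords()` is a hypothetical name. Unlike the class method, it trims the input first. The backward search starts at index `$maxWords - 1`, the last word inside the limit. As a result the output never exceeds `$maxWords` words, as the `Summary()` docblock promises.

```php
<?php
// Standalone sketch of the word-limiting step of Summary(), for tag-free text.
function summariseWords($text, $maxWords = 50, $flex = 15, $add = '...') {
    // trim() is an addition here: it avoids an empty first "word" from leading whitespace.
    $words = preg_split('/\s+/', trim($text));
    if ($maxWords == -1 || count($words) <= $maxWords) {
        return implode(' ', $words);
    }
    // Search backwards from the last word inside the limit for a full stop
    // that is not an abbreviation such as "Dr." or "Mr.".
    for ($i = $maxWords - 1; $i >= $maxWords - $flex && $i >= 0; $i--) {
        if (preg_match('/\.$/', $words[$i]) && !preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $words[$i])) {
            return implode(' ', array_slice($words, 0, $i + 1));
        }
    }
    // No clean cut point within $flex words: hard cut and append the marker.
    return implode(' ', array_slice($words, 0, $maxWords)) . $add;
}
```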

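The DOM step of `whitelistContent()` can likewise be sketched without the framework. This version assumes PHP's bundled DOM extension in place of SilverStripe's `HTMLValue` wrapper. It takes the XPath expression as input; for a whitelist of `p`, the class builds `//body//*[not(self::p)] | //body/text()`. `removeMatchingNodes()` is a hypothetical name.

```php
<?php
// Standalone sketch of the DOM step of whitelistContent(), using PHP's DOM
// extension instead of SilverStripe's HTMLValue wrapper (an assumption).
function removeMatchingNodes($html, $xpathExpression) {
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about HTML5 tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The meta tag makes libxml read the input as UTF-8, the same trick Summary() uses.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The node list returned by query() is not live, so removing nodes while iterating is safe.
    foreach ($xpath->query($xpathExpression) as $node) {
        if ($node->parentNode) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialise only what is left inside <body>.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}
```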

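The heuristic in `exists()` can be checked in isolation. In this sketch, `htmlHasContent()` is a hypothetical name. Its first check stands in for `parent::exists()`, assumed here to mean "neither null nor an empty string".

```php
<?php
// Standalone sketch of the HTMLText::exists() heuristic.
function htmlHasContent($value) {
    // Stand-in (assumption) for parent::exists().
    if ($value === null || $value === '') {
        return false;
    }
    // Embedded media or metadata tags count as content even without text.
    if (preg_match('/<(img|embed|object|iframe|meta|source|link)[^>]*>/i', $value)) {
        return true;
    }
    // One or two bare tags with only whitespace around them, e.g. "<p></p>", are empty.
    if (preg_match('/^\s*(<[^>]+>\s*){1,2}$/', $value)) {
        return false;
    }
    return true;
}
```

The two-tag rule is only a rough heuristic. For example, `<div><p></p></div>` has four tags, so it is reported as content.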


Issues: 8 (none ignored), all introduced 2015-12-18 04:35 UTC. Line numbers refer to the inspected file.

Line 14, private static $escape_type = 'xml'; (Comprehensibility): Consider using a different property name, as you override a private property of the parent class.
Line 16, private static $casting = array( (Comprehensibility): Consider using a different property name, as you override a private property of the parent class.
Line 44, return parent::__construct($name, $options); (Bug): Constructors do not have meaningful return values; anything returned from here is discarded. Are you sure this is correct?
Line 71, $this->whitelist = $options['whitelist']; (Documentation Bug): `$options['whitelist']` of type `array` seems incompatible with the declared type `boolean` of property `$whitelist`. Either this assignment is in error, or the assigned type should be added to the documentation/type hint for that property.
Line 74, $this->whitelist = preg_split('/,\s*/', $options['whitelist']); (Documentation Bug): `preg_split('/,\s*/', $options['whitelist'])` of type `array` seems incompatible with the declared type `boolean` of property `$whitelist`. Either this assignment is in error, or the assigned type should be added to the documentation/type hint for that property.
Line 119, if (!$str) { (Bug Best Practice): The expression `$str` of type `string|false` is loosely compared to `false`; this is ambiguous if the string can be empty. You might want to use `=== false` explicitly. Under loose comparison (`==`, `!=`, or `switch` conditions), values of different types may be equal, and the empty string is a special case:
    '' == false    // true
    '' == null     // true
    'ab' == false  // false
    'ab' == null   // false
    // Strict comparison is often better:
    '' === false   // false
    '' === null    // false
Line 167, foreach ($words as $i => $word) { in FirstSentence() (Duplication): This code seems to be duplicated across your project. If the same code is needed in three or more places, consider extracting it into a single class or operation.
Line 211, foreach ($this->whitelist as $tag) { (Bug): The expression `$this->whitelist` of type `boolean` is not traversable.
