Completed
Push — new-committers ( 29cb6f...bcba16 )
by Sam
12:18 queued 33s
created

HTMLText   B

Complexity

Total Complexity 40

Size/Duplication

Total Lines 251
Duplicated Lines 1.99 %

Coupling/Cohesion

Components 2
Dependencies 5
Metric Value
wmc 40
lcom 2
cbo 5
dl 5
loc 251
rs 8.2608

11 Methods

Rating  Name                   Duplication  Size  Complexity
A       setOptions()           0            16    4
A       FirstSentence()        5            16    4
A       AbsoluteLinks()        0            3     1
A       prepValueForDB()       0            3     1
A       exists()               0            20    4
A       scaffoldFormField()    0            3     1
A       scaffoldSearchField()  0            3     1
A       __construct()          0            7     2
C       Summary()              0            59    14
A       forTemplate()          0            8     2
B       whitelistContent()     0            22    6

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.
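
In this file, the flagged duplication is the sentence-ending test shared by Summary() and FirstSentence(): both check whether a word ends a sentence and is not a known abbreviation. A minimal sketch of pulling that test into one place follows. The function names endsSentence() and firstSentenceOf() are hypothetical and not part of the original class.

```php
<?php
/**
 * Hypothetical helper (not in the original class): true if $word ends a
 * sentence and is not one of the abbreviations the original code skips.
 *
 * @param string $word        A single whitespace-delimited word
 * @param string $terminators Characters treated as sentence endings
 * @return bool
 */
function endsSentence($word, $terminators = '.!?') {
	$class = preg_quote($terminators, '/');
	if (!preg_match('/[' . $class . ']$/', $word)) {
		return false;
	}
	// Same abbreviation list as in Summary() and FirstSentence()
	return !preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $word);
}

/**
 * Return the words up to and including the first sentence ending,
 * or null if no sentence ending is found.
 *
 * @param string $text Plain text with tags already stripped
 * @return string|null
 */
function firstSentenceOf($text) {
	$words = preg_split('/\s+/', trim($text));
	foreach ($words as $i => $word) {
		if (endsSentence($word)) {
			return implode(' ', array_slice($words, 0, $i + 1));
		}
	}
	return null;
}
```

Summary() only treats a period as a sentence ending, so under this sketch it would call endsSentence($word, '.') while FirstSentence() would use the default terminators.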

Common duplication problems, and corresponding solutions are:

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like HTMLText often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
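
As one illustration of Extract Class for this file: the whitelist parsing in setOptions() and the XPath construction in whitelistContent() both revolve around $whitelist, so they are a plausible candidate component. The sketch below assumes a hypothetical class name, TagWhitelist, and a hypothetical method, removalQuery(); neither exists in the framework.

```php
<?php
/**
 * Hypothetical extracted class: owns whitelist parsing and the XPath
 * expression that selects nodes to remove. Not part of the framework.
 */
class TagWhitelist {
	/** @var string[] Allowed element names */
	private $tags = array();

	/** @var bool Whether root-level text nodes are allowed */
	private $allowText = false;

	/**
	 * @param string|string[] $spec Comma-separated list or array of tags;
	 *                              'text()' allows root-level text
	 */
	public function __construct($spec) {
		$items = is_array($spec) ? $spec : preg_split('/,\s*/', $spec);
		foreach ($items as $item) {
			if ($item === 'text()') {
				$this->allowText = true;
			} else {
				$this->tags[] = $item;
			}
		}
	}

	/**
	 * Build the XPath query matching everything that is not whitelisted.
	 *
	 * @return string
	 */
	public function removalQuery() {
		$conditions = array();
		foreach ($this->tags as $tag) {
			$conditions[] = 'not(self::' . $tag . ')';
		}
		// Unlike the original, omit the predicate when no tags are listed,
		// so the expression stays valid XPath
		$predicate = $conditions ? '[' . implode(' and ', $conditions) . ']' : '';
		$query = '//body//*' . $predicate;
		return $this->allowText ? $query : $query . ' | //body/text()';
	}
}
```

With this split, whitelistContent() would only need to run the query and remove the matched nodes, and the parsing rules could be tested without a DOM.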

While breaking up the class, it is a good idea to analyze how other classes use HTMLText, and based on these observations, apply Extract Interface, too.
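
A rough sketch of what Extract Interface could look like, assuming callers such as the template layer only need forTemplate(). The interface name TemplateRenderable, the stand-in class and the render() function are hypothetical.

```php
<?php
/** Hypothetical interface capturing how a template consumes the field */
interface TemplateRenderable {
	/** @return string Value ready for output in a template */
	public function forTemplate();
}

/** Hypothetical stand-in for a field class implementing the interface */
class PlainHTMLValue implements TemplateRenderable {
	/** @var string */
	private $value;

	public function __construct($value) {
		$this->value = $value;
	}

	public function forTemplate() {
		return $this->value;
	}
}

/** Callers depend on the interface rather than on the concrete class */
function render(TemplateRenderable $field) {
	return $field->forTemplate();
}
```

Which methods belong on the interface depends on how other classes actually use HTMLText, so the single method here is only a placeholder.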

<?php
/**
 * Represents a large text field that contains HTML content.
 * This behaves similarly to {@link Text}, but the template processor won't escape any HTML content within it.
 *
 * @see HTMLVarchar
 * @see Text
 * @see Varchar
 *
 * @package framework
 * @subpackage model
 */
class HTMLText extends Text {
	private static $escape_type = 'xml';
Issue (Comprehensibility): Consider using a different property name as you override a private property of the parent class.

	private static $casting = array(
Issue (Comprehensibility): Consider using a different property name as you override a private property of the parent class.
		"AbsoluteLinks" => "HTMLText",
		"BigSummary" => "HTMLText",
		"ContextSummary" => "HTMLText",
		"FirstParagraph" => "HTMLText",
		"FirstSentence" => "HTMLText",
		"LimitCharacters" => "HTMLText",
		"LimitSentences" => "HTMLText",
		"Lower" => "HTMLText",
		"LowerCase" => "HTMLText",
		"Summary" => "HTMLText",
		"Upper" => "HTMLText",
		"UpperCase" => "HTMLText",
		'EscapeXML' => 'HTMLText',
		'LimitWordCount' => 'HTMLText',
		'LimitWordCountXML' => 'HTMLText',
		'NoHTML' => 'Text',
	);

	protected $processShortcodes = true;

	protected $whitelist = false;

	public function __construct($name = null, $options = array()) {
		if(is_string($options)) {
			$options = array('whitelist' => $options);
		}

		return parent::__construct($name, $options);
Issue (Bug): Constructors do not have meaningful return values, anything that is returned from here is discarded. Are you sure this is correct?
	}

	/**
	 * @param array $options
	 *
	 * Options accepted in addition to those provided by Text:
	 *
	 *   - shortcodes: If true, shortcodes will be turned into the appropriate HTML.
	 *                 If false, shortcodes will not be processed.
	 *
	 *   - whitelist: If provided, a comma-separated list of elements that will be allowed to be stored
	 *                (be careful about relying on this for XSS protection - some seemingly-safe elements allow
	 *                attributes that can be exploited, for instance <img onload="exploiting_code();" src="..." />)
	 *                Text nodes outside of HTML tags are filtered out by default, but may be included by adding
	 *                the text() directive. E.g. 'link,meta,text()' will allow only <link /> <meta /> and text at
	 *                the root level.
	 */
	public function setOptions(array $options = array()) {
		parent::setOptions($options);

		if(array_key_exists("shortcodes", $options)) {
			$this->processShortcodes = !!$options["shortcodes"];
		}

		if(array_key_exists("whitelist", $options)) {
			if(is_array($options['whitelist'])) {
				$this->whitelist = $options['whitelist'];
Issue (Documentation Bug): It seems like $options['whitelist'] of type array is incompatible with the declared type boolean of property $whitelist.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.
			}
			else {
				$this->whitelist = preg_split('/,\s*/', $options['whitelist']);
Issue (Documentation Bug): It seems like preg_split('/,\\s*/', $options['whitelist']) of type array is incompatible with the declared type boolean of property $whitelist.
			}
		}
	}

	/**
	 * Create a summary of the content. This will be some section of the first paragraph, limited by
	 * $maxWords. All internal tags are stripped out - the return value is a string
	 *
	 * This is sort of the HTML aware equivalent to Text#Summary, although the logic for summarising is not exactly
	 * the same
	 *
	 * @param int $maxWords Maximum number of words to return - may return less, but never more. Pass -1 for no limit
	 * @param int $flex Number of words to search through when looking for a nice cut point
	 * @param string $add What to add to the end of the summary if we cut at a less-than-ideal cut point
	 * @return string A nice(ish) summary with no html tags (but possibly still some html entities)
	 *
	 * @see framework/core/model/fieldtypes/Text#Summary($maxWords)
	 */
	public function Summary($maxWords = 50, $flex = 15, $add = '...') {
		$str = false;

		/* First we need the text of the first paragraph, without tags. Try using SimpleXML first */
		if (class_exists('SimpleXMLElement')) {
			$doc = new DOMDocument();

			// Catch warnings thrown by loadHTML and turn them into a failure boolean rather than a SilverStripe error
			set_error_handler(create_function('$no, $str', 'throw new Exception("HTML Parse Error: ".$str);'), E_ALL);
			// Nonbreaking spaces get converted into weird characters, so strip them
			$value = str_replace('&nbsp;', ' ', $this->value);
			try {
				$res = $doc->loadHTML('<meta content="text/html; charset=utf-8" http-equiv="Content-type"/>' . $value);
			}
			catch (Exception $e) { $res = false; }
			restore_error_handler();

			if ($res) {
				$xml = simplexml_import_dom($doc);
				$res = $xml->xpath('//p');
				if (!empty($res)) $str = strip_tags($res[0]->asXML());
			}
		}

		/* If that failed, most likely the passed HTML is broken. Use a simple regex + a custom more brutal strip_tags.
		 * We don't use strip_tags because that does very badly on broken HTML */
		if (!$str) {
Issue (Bug Best Practice): The expression $str of type string|false is loosely compared to false; this is ambiguous if the string can be empty. You might want to explicitly use === false instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For string values, the empty string '' is a special case, in particular the following results might be unexpected:

''   == false // true
''   == null  // true
'ab' == false // false
'ab' == null  // false

// It is often better to use strict comparison
'' === false // false
'' === null  // false
			/* See if we can pull a paragraph out */

			// Strip out any images in case there's one at the beginning. Not doing this will return a blank paragraph
			$str = preg_replace('{^\s*(<.+?>)*<img[^>]*>}', '', $this->value);
			if (preg_match('{<p(\s[^<>]*)?>(.*[A-Za-z]+.*)</p>}', $str, $matches)) $str = $matches[2];

			/* If _that_ failed, just use the whole text */
			if (!$str) $str = $this->value;

			/* Now pull out all the html-alike stuff */
			/* Take out anything that is obviously a tag */
			$str = preg_replace('{</?[a-zA-Z]+[^<>]*>}', '', $str);
			/* Strip out any left over looking bits. Textual < or > should already be encoded to &lt; or &gt; */
			$str = preg_replace('{</|<|>}', '', $str);
		}

		/* Now split into words. If we are under the maxWords limit, just return the whole string (re-implode for
		 * whitespace normalization) */
		$words = preg_split('/\s+/', $str);
		if ($maxWords == -1 || count($words) <= $maxWords) return implode(' ', $words);

		/* Otherwise work backwards looking for a sentence ending (we try to avoid abbreviations, but aren't
		 * very good at it) */
		for ($i = $maxWords; $i >= $maxWords - $flex && $i >= 0; $i--) {
			if (preg_match('/\.$/', $words[$i]) && !preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $words[$i])) {
				return implode(' ', array_slice($words, 0, $i+1));
			}
		}

		// If we didn't find a sentence ending quickly enough, just cut at the maxWords point and add '...' to the end
		return implode(' ', array_slice($words, 0, $maxWords)) . $add;
	}

	/**
	 * Returns the first sentence from the first paragraph. If it can't figure out what the first paragraph is (or
	 * there isn't one), it returns the same as Summary()
	 *
	 * This is the HTML aware equivalent to Text#FirstSentence
	 *
	 * @see framework/core/model/fieldtypes/Text#FirstSentence()
	 */
	public function FirstSentence() {
		/* Use summary's html processing logic to get the first paragraph */
		$paragraph = $this->Summary(-1);

		/* Then look for the first sentence ending. We could probably use a nice regex, but for now this will do */
		$words = preg_split('/\s+/', $paragraph);
		foreach ($words as $i => $word) {
Issue (Duplication): This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
			if (preg_match('/(!|\?|\.)$/', $word) && !preg_match('/(Dr|Mr|Mrs|Ms|Miss|Sr|Jr|No)\.$/i', $word)) {
				return implode(' ', array_slice($words, 0, $i+1));
			}
		}

		/* If we didn't find a sentence ending, use the summary. We re-call rather than using paragraph so that
		 * Summary will limit the result this time */
		return $this->Summary();
	}

	/**
	 * Return the value of the field with relative links converted to absolute urls (with placeholders parsed).
	 * @return string
	 */
	public function AbsoluteLinks() {
		return HTTP::absoluteURLs($this->forTemplate());
	}

	public function forTemplate() {
		if ($this->processShortcodes) {
			return ShortcodeParser::get_active()->parse($this->value);
		}
		else {
			return $this->value;
		}
	}

	public function prepValueForDB($value) {
		return parent::prepValueForDB($this->whitelistContent($value));
	}

	/**
	 * Filter the given $value string through the whitelist filter
	 *
	 * @param string $value Input html content
	 * @return string Value with all non-whitelisted content stripped (if applicable)
	 */
	public function whitelistContent($value) {
		if($this->whitelist) {
			$dom = Injector::inst()->create('HTMLValue', $value);

			$query = array();
			$textFilter = ' | //body/text()';
			foreach ($this->whitelist as $tag) {
Issue (Bug): The expression $this->whitelist of type boolean is not traversable.
				if($tag === 'text()') {
					$textFilter = ''; // Disable text filter if allowed
				} else {
					$query[] = 'not(self::'.$tag.')';
				}
			}

			foreach($dom->query('//body//*['.implode(' and ', $query).']'.$textFilter) as $el) {
				if ($el->parentNode) $el->parentNode->removeChild($el);
			}

			$value = $dom->getContent();
		}
		return $value;
	}

	/**
	 * Returns true if the field has meaningful content.
	 * Excludes null content like <h1></h1>, <p></p>, etc.
	 *
	 * @return boolean
	 */
	public function exists() {
		// If it's blank, it's blank
		if(!parent::exists()) {
			return false;
		}

		// If it's got a content tag
		if(preg_match('/<(img|embed|object|iframe|meta|source|link)[^>]*>/i', $this->value)) {
			return true;
		}

		// If it's just one or two tags on its own (and not the above) it's empty.
		// This might be <p></p> or <h1></h1> or whatever.
		if(preg_match('/^[\\s]*(<[^>]+>[\\s]*){1,2}$/', $this->value)) {
			return false;
		}

		// Otherwise its content is genuine content
		return true;
	}

	public function scaffoldFormField($title = null, $params = null) {
		return new HtmlEditorField($this->name, $title);
	}

	public function scaffoldSearchField($title = null, $params = null) {
		return new TextField($this->name, $title);
	}

}
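
The constructor and $whitelist findings flagged in the listing above could be addressed roughly as follows. This is a sketch under stated assumptions: WhitelistedField is a hypothetical stand-in with no framework parent, and getWhitelistTags() is an invented accessor, so only the shape of the fix carries over to HTMLText.

```php
<?php
/**
 * Hypothetical stand-in for HTMLText (no framework parent), showing only
 * the constructor and the $whitelist handling.
 */
class WhitelistedField {
	/**
	 * Allowed tag names, or false when no whitelist is configured.
	 * Documenting both types addresses the "declared type boolean" finding.
	 *
	 * @var string[]|false
	 */
	protected $whitelist = false;

	public function __construct($name = null, $options = array()) {
		if (is_string($options)) {
			$options = array('whitelist' => $options);
		}
		// In a real subclass this would be parent::__construct($name, $options);
		// without "return", since constructor return values are discarded.
		$this->setOptions($options);
	}

	public function setOptions(array $options = array()) {
		if (array_key_exists('whitelist', $options)) {
			$spec = $options['whitelist'];
			$this->whitelist = is_array($spec) ? $spec : preg_split('/,\s*/', $spec);
		}
	}

	/**
	 * Always returns an array, so callers can iterate without first
	 * checking for the boolean "no whitelist" state.
	 *
	 * @return string[]
	 */
	public function getWhitelistTags() {
		return is_array($this->whitelist) ? $this->whitelist : array();
	}
}
```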
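
Summary() builds its temporary error handler with create_function(), which was deprecated in PHP 7.2 and removed in PHP 8.0. A possible replacement using a closure is sketched below. The helper name callWithErrorsAsFailure() is hypothetical, and the sketch assumes PHP 5.5 or later for the finally block.

```php
<?php
/**
 * Run $callback with PHP errors converted to exceptions, restoring the
 * previous error handler afterwards. Hypothetical helper, not part of
 * the original class.
 *
 * @param callable $callback
 * @return mixed The callback's return value, or false if an error was
 *               raised or the callback itself threw an Exception
 */
function callWithErrorsAsFailure($callback) {
	// Closure equivalent of the create_function() handler in Summary()
	set_error_handler(function ($no, $str) {
		throw new Exception('HTML Parse Error: ' . $str);
	}, E_ALL);

	try {
		$result = $callback();
	} catch (Exception $e) {
		$result = false;
	} finally {
		// Runs on both paths, so the temporary handler never leaks
		restore_error_handler();
	}
	return $result;
}
```

In Summary(), the call to $doc->loadHTML(...) would be the body of the callback, and $res would receive the helper's return value.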