ElggAutoP::addParagraphs()   Rating: F

Complexity

Conditions 32
Paths 10384

Size

Total Lines 129

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 86
CRAP Score 33.4932

Importance

Changes 0

Metric  Value
cc      32
nc      10384
nop     1
dl      0
loc     129
ccs     86
cts     97
cp      0.8866
crap    33.4932
rs      0
c       0
b       0
f       0
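
As a sanity check on the figures above, the CRAP score can be reproduced from cc and cp. This assumes the standard CRAP definition and reads ccs and cts as covered and total statements; neither reading is confirmed by the report itself, though 86/97 does match the reported cp.

```latex
\[
\mathrm{CRAP} = \mathrm{cc}^2 \,(1 - \mathrm{cp})^3 + \mathrm{cc}
\]
\[
\mathrm{cp} = \frac{\mathrm{ccs}}{\mathrm{cts}} = \frac{86}{97} \approx 0.8866,
\qquad
\mathrm{CRAP} \approx 32^2 \cdot (1 - 0.8866)^3 + 32
\approx 1024 \cdot 0.001458 + 32 \approx 33.49
\]
```

Under this formula, full coverage would only remove the first term, so the score could not fall below the complexity of 32. Lowering cc is what moves the number.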

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, particularly when they have good names. A small method is also usually much easier to name well.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
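
The advice above can be sketched on this class. In process(), the loop under the comment "strip AUTOPs that only have comments/whitespace" is a candidate: the comment already suggests a name. The function name autopHasContent() and the small XML fixture are hypothetical, chosen for illustration only.

```php
<?php
// Hypothetical helper extracted from the commented loop in process().
// It returns true when an AUTOP holds visible text or any child element.
function autopHasContent(DOMElement $autop): bool {
	if (trim($autop->textContent) !== '') {
		return true;
	}
	foreach ($autop->childNodes as $node) {
		if ($node->nodeType === XML_ELEMENT_NODE) {
			return true;
		}
	}
	return false;
}

// Illustrative fixture: text, comment plus whitespace only, and a lone element.
$doc = new DOMDocument();
$doc->loadXML('<root><autop>text</autop><autop> <!-- note --> </autop><autop><img/></autop></root>');
foreach ($doc->getElementsByTagName('autop') as $autop) {
	var_dump(autopHasContent($autop));
}
```

With a helper like this, the loop body in process() would shrink to a single condition that decides whether to set the "r" attribute.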

Commonly applied refactorings include Extract Method.

<?php

/**
 * Create wrapper P and BR elements in HTML depending on newlines. Useful when
 * users use newlines to signal line and paragraph breaks. In all cases output
 * should be well-formed markup.
 *
 * In DIV elements, Ps are only added when there would be at
 * least two of them.
 *
 * @package    Elgg.Core
 * @subpackage Output
 */
class ElggAutoP {

	public $encoding = 'UTF-8';

	/**
	 * @var DOMDocument
	 */
	protected $_doc = null;

	/**
	 * @var DOMXPath
	 */
	protected $_xpath = null;

	protected $_blocks = 'address article area aside blockquote caption col colgroup dd
		details div dl dt fieldset figure figcaption footer form h1 h2 h3 h4 h5 h6 header
		hr hgroup legend map math menu nav noscript p pre section select style summary
		table tbody td tfoot th thead tr ul ol option li';

	/**
	 * @var array
	 */
	protected $_inlines = 'a abbr audio b button canvas caption cite code command datalist
		del dfn em embed i iframe img input ins kbd keygen label map mark meter object
		output progress q rp rt ruby s samp script select small source span strong style
		sub sup textarea time var video wbr';

	/**
	 * Descend into these elements to add Ps
	 *
	 * @var array
	 */
	protected $_descendList = 'article aside blockquote body details div footer form
		header section';

	/**
	 * Add Ps inside these elements
	 *
	 * @var array
	 */
	protected $_alterList = 'article aside blockquote body details div footer header
		section';

	/** @var string */
	protected $_unique = '';

	/**
	 * Constructor
	 */
	public function __construct() {
		$this->_blocks = preg_split('@\\s+@', $this->_blocks);
		// Issue (Documentation Bug): It seems like preg_split('@\\s+@', $this->_blocks)
		// of type array is incompatible with the declared type string of property
		// $_blocks.
		//
		// Our type inference engine has found an assignment to a property that is
		// incompatible with the declared type of that property.
		//
		// Either this assignment is in error or the assigned type should be added to
		// the documentation/type hint for that property.
		$this->_descendList = preg_split('@\\s+@', $this->_descendList);
		$this->_alterList = preg_split('@\\s+@', $this->_alterList);
		$this->_inlines = preg_split('@\\s+@', $this->_inlines);
		$this->_unique = md5(__FILE__);
	}

	/**
	 * Create wrapper P and BR elements in HTML depending on newlines. Useful when
	 * users use newlines to signal line and paragraph breaks. In all cases output
	 * should be well-formed markup.
	 *
	 * In DIV, LI, TD, and TH elements, Ps are only added when there would be at
	 * least two of them.
	 *
	 * @param string $html snippet
	 * @return string|false output or false if parse error occurred
	 */
	public function process($html) {
		// normalize whitespace
		$html = str_replace(array("\r\n", "\r"), "\n", $html);

		// allows preserving entities untouched
		$html = str_replace('&', $this->_unique . 'AMP', $html);

		$this->_doc = new DOMDocument();

		// parse to DOM, suppressing loadHTML warnings
		// http://www.php.net/manual/en/domdocument.loadhtml.php#95463
		libxml_use_internal_errors(true);

		// Do not load entities. May be unnecessary, better safe than sorry
		$disable_load_entities = libxml_disable_entity_loader(true);

		if (!$this->_doc->loadHTML("<html><meta http-equiv='content-type' "
				. "content='text/html; charset={$this->encoding}'><body>{$html}</body>"
				. "</html>")) {

			libxml_disable_entity_loader($disable_load_entities);
			return false;
		}

		libxml_disable_entity_loader($disable_load_entities);

		$this->_xpath = new DOMXPath($this->_doc);
		// start processing recursively at the BODY element
		$nodeList = $this->_xpath->query('//body[1]');
		$this->addParagraphs($nodeList->item(0));
		// Issue (Compatibility): $nodeList->item(0) of type object<DOMNode> is not a
		// sub-type of object<DOMElement>. It seems like you assume a child class of
		// the class DOMNode to be always present.
		//
		// This check looks for parameters that are defined as one type in their type
		// hint or doc comment but seem to be used as a narrower type, i.e. an
		// implementation of an interface or a subclass.
		//
		// Consider changing the type of the parameter or doing an instanceof check
		// before assuming your parameter is of the expected type.

		// serialize back to HTML
		$html = $this->_doc->saveHTML();

		// Note: we create <autop> elements, which will later be converted to paragraphs

		// split AUTOPs into multiples at /\n\n+/
		$html = preg_replace('/(' . $this->_unique . 'NL){2,}/', '</autop><autop>', $html);
		$html = str_replace(array($this->_unique . 'BR', $this->_unique . 'NL', '<br>'),
				'<br />',
				$html);
		$html = str_replace('<br /></autop>', '</autop>', $html);

		// re-parse so we can handle new AUTOP elements

		// Do not load entities. May be unnecessary, better safe than sorry
		$disable_load_entities = libxml_disable_entity_loader(true);

		if (!$this->_doc->loadHTML($html)) {
			libxml_disable_entity_loader($disable_load_entities);
			return false;
		}

		libxml_disable_entity_loader($disable_load_entities);

		// must re-create XPath object after DOM load
		$this->_xpath = new DOMXPath($this->_doc);

		// strip AUTOPs that only have comments/whitespace
		foreach ($this->_xpath->query('//autop') as $autop) {
			/* @var DOMElement $autop */
			$hasContent = false;
			if (trim($autop->textContent) !== '') {
				$hasContent = true;
			} else {
				foreach ($autop->childNodes as $node) {
					if ($node->nodeType === XML_ELEMENT_NODE) {
						$hasContent = true;
						break;
					}
				}
			}
			if (!$hasContent) {
				// mark to be later replaced w/ preg_replace (faster than moving nodes out)
				$autop->setAttribute("r", "1");
			}
		}

		// If a DIV contains a single AUTOP, remove it
		foreach ($this->_xpath->query('//div') as $el) {
			/* @var DOMElement $el */
			$autops = $this->_xpath->query('./autop', $el);
			if ($autops->length === 1) {
				$firstAutop = $autops->item(0);
				/* @var DOMElement $firstAutop */
				$firstAutop->setAttribute("r", "1");
			}
		}

		$html = $this->_doc->saveHTML();

		// trim to the contents of BODY
		$bodyStart = strpos($html, '<body>');
		$bodyEnd = strpos($html, '</body>', $bodyStart + 6);
		$html = substr($html, $bodyStart + 6, $bodyEnd - $bodyStart - 6);

		// strip AUTOPs that should be removed
		$html = preg_replace('@<autop r="1">(.*?)</autop>@', '\\1', $html);

		// commit to converting AUTOPs to Ps
		$html = str_replace('<autop>', "\n<p>", $html);
		$html = str_replace('</autop>', "</p>\n", $html);

		$html = str_replace('<br>', '<br />', $html);
		$html = str_replace($this->_unique . 'AMP', '&', $html);
		return $html;
	}

	/**
	 * Add P and BR elements as necessary
	 *
	 * @param DOMElement $el DOM element
	 * @return void
	 */
	protected function addParagraphs(DOMElement $el) {
		// no need to call recursively, just queue up
		$elsToProcess = array($el);
		$inlinesToProcess = array();
		while ($el = array_shift($elsToProcess)) {
			// if true, we can alter all child nodes, if not, we'll just call
			// addParagraphs on each element in the descendInto list
			$alterInline = in_array($el->nodeName, $this->_alterList);

			// inside affected elements, we want to trim leading whitespace from
			// the first text node
			$ltrimFirstTextNode = true;

			// should we open a new AUTOP element to move inline elements into?
			$openP = true;
			$autop = null;

			// after BR, ignore a newline
			$isFollowingBr = false;

			$node = $el->firstChild;
			while (null !== $node) {
				if ($alterInline) {
					if ($openP) {
						$openP = false;
						// create a P to move inline content into (this may be removed later)
						$autop = $el->insertBefore($this->_doc->createElement('autop'), $node);
					}
				}

				$isElement = ($node->nodeType === XML_ELEMENT_NODE);
				if ($isElement) {
					$isBlock = in_array($node->nodeName, $this->_blocks);
					if (!$isBlock) {
						// if we start with an inline element we don't need to do this
						$ltrimFirstTextNode = false;
					}
				} else {
					$isBlock = false;
				}

				if ($alterInline) {
					$isText = ($node->nodeType === XML_TEXT_NODE);
					$isLastInline = (! $node->nextSibling
							|| ($node->nextSibling->nodeType === XML_ELEMENT_NODE
								&& in_array($node->nextSibling->nodeName, $this->_blocks)));
					if ($isElement) {
						$isFollowingBr = ($node->nodeName === 'br');
					}

					if ($isText) {
						$nodeText = $node->nodeValue;

						if ($ltrimFirstTextNode) {
							// we're at the beginning of a sequence of text/inline elements
							$nodeText = ltrim($nodeText);
							$ltrimFirstTextNode = false;
						}
						if ($isFollowingBr && preg_match('@^[ \\t]*\\n[ \\t]*@', $nodeText, $m)) {
							// if a user ends a line with <br>, don't add a second BR
							$nodeText = substr($nodeText, strlen($m[0]));
						}
						if ($isLastInline) {
							// we're at the end of a sequence of text/inline elements
							$nodeText = rtrim($nodeText);
						}
						$nodeText = str_replace("\n", $this->_unique . 'NL', $nodeText);
						$tmpNode = $node;
						$node = $node->nextSibling; // move loop to next node

						// alter node in place, then move into AUTOP
						$tmpNode->nodeValue = $nodeText;
						$autop->appendChild($tmpNode);

						continue;
					}
				}
				if ($isBlock || ! $node->nextSibling) {
					if ($isBlock) {
						if (in_array($node->nodeName, $this->_descendList)) {
							$elsToProcess[] = $node;
							//$this->addParagraphs($node);
						}
					}
					$openP = true;
					$ltrimFirstTextNode = true;
				}
				if ($alterInline) {
					if (! $isBlock) {
						$tmpNode = $node;
						if ($isElement && false !== strpos($tmpNode->textContent, "\n")) {
							$inlinesToProcess[] = $tmpNode;
						}
						$node = $node->nextSibling;
						$autop->appendChild($tmpNode);
						continue;
					}
				}

				$node = $node->nextSibling;
			}
		}

		// handle inline nodes
		// no need to recurse, just queue up
		while ($el = array_shift($inlinesToProcess)) {
			$ignoreLeadingNewline = false;
			foreach ($el->childNodes as $node) {
				if ($node->nodeType === XML_ELEMENT_NODE) {
					// match on the element name, as the block loop above does; an
					// element's nodeValue is its text content, which is empty for BR
					if ($node->nodeName === 'br') {
						$ignoreLeadingNewline = true;
					} else {
						$ignoreLeadingNewline = false;
						if (false !== strpos($node->textContent, "\n")) {
							$inlinesToProcess[] = $node;
						}
					}
					continue;
				} elseif ($node->nodeType === XML_TEXT_NODE) {
					$text = $node->nodeValue;
					if ($text[0] === "\n" && $ignoreLeadingNewline) {
						$text = substr($text, 1);
						$ignoreLeadingNewline = false;
					}
					$node->nodeValue = str_replace("\n", $this->_unique . 'BR', $text);
				}
			}
		}
	}
}
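
One way to chip away at the 32 conditions in addParagraphs() is to move compound tests into named predicates. The sketch below does this for the $isLastInline expression. The free function isLastInline() and its $blocks parameter are hypothetical stand-ins for a protected method reading $this->_blocks, and the XML fixture is illustrative only.

```php
<?php
// Hypothetical predicate mirroring the $isLastInline expression:
// a node ends an inline run when nothing follows it, or when the
// next sibling is a block-level element.
function isLastInline(DOMNode $node, array $blocks): bool {
	$next = $node->nextSibling;
	if ($next === null) {
		return true;
	}
	return $next->nodeType === XML_ELEMENT_NODE
		&& in_array($next->nodeName, $blocks, true);
}

// Illustrative fixture: text, inline element, block element, trailing text.
$doc = new DOMDocument();
$doc->loadXML('<body>intro<em>inline</em><div>block</div>tail</body>');
$blocks = array('div', 'p');
foreach ($doc->documentElement->childNodes as $child) {
	printf("%s: %s\n", $child->nodeName, isLastInline($child, $blocks) ? 'last' : 'not last');
}
```

The same treatment could apply to the block test and to the text-node branch. Each extraction removes conditions from the method body without changing behavior.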