HTMLEditorSanitiser   F

Complexity

Total Complexity 74

Size/Duplication

Total Lines 356
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 139
c 0
b 0
f 0
dl 0
loc 356
rs 2.48
wmc 74

9 Methods

Rating   Name                     Duplication   Size   Complexity
A        getRuleForAttribute()    0             11     4
F        addValidElements()       0             95     27
A        getRuleForElement()      0             11     4
B        elementMatchesRule()     0             30     8
A        attributeMatchesRule()   0             14     4
A        __construct()            0             10     3
A        patternToRegex()         0             3      1
C        sanitise()               0             64     17
A        addRelValue()            0             13     6

How to fix   Complexity   

Complex Class

Complex classes like HTMLEditorSanitiser often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share a prefix or suffix. In this class, for example, getRuleForElement() and getRuleForAttribute() share a prefix, while elementMatchesRule() and attributeMatchesRule() share a suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
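As a minimal sketch of Extract Class applied to this file, the rel handling in addRelValue() could move into a small class of its own. The class name LinkRelPolicy and its apply() method are hypothetical and not part of SilverStripe; the branching simply mirrors addRelValue() from the listing.

```php
<?php

// Hypothetical extracted class (an assumption for illustration, not SilverStripe API).
// It holds only the rel-handling logic that addRelValue() keeps inside the sanitiser.
final class LinkRelPolicy
{
    /** @var string|null */
    private $relValue;

    public function __construct(?string $relValue)
    {
        $this->relValue = $relValue;
    }

    public function apply(DOMElement $el): void
    {
        // null disables the behaviour, mirroring the null check in sanitise()
        if ($this->relValue === null) {
            return;
        }

        if ($el->getAttribute('target') && $el->getAttribute('rel') !== $this->relValue) {
            if ($this->relValue !== '') {
                $el->setAttribute('rel', $this->relValue);
            } else {
                $el->removeAttribute('rel');
            }
        } elseif ($el->getAttribute('rel') === $this->relValue && !$el->getAttribute('target')) {
            // target was removed, so the rel value added earlier can go too
            $el->removeAttribute('rel');
        }
    }
}

// Usage: build a link with a target attribute and apply the policy
$doc = new DOMDocument();
$link = $doc->createElement('a');
$link->setAttribute('target', '_blank');
$doc->appendChild($link);

(new LinkRelPolicy('noopener noreferrer'))->apply($link);
echo $doc->saveHTML($link), PHP_EOL;
```

With this split, sanitise() would only need to call apply() on each link, and the rel rules could be tested without building a full sanitiser.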

While breaking up the class, it is a good idea to analyze how other classes use HTMLEditorSanitiser, and based on these observations, apply Extract Interface, too.
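A rough sketch of what Extract Interface might look like here. The interface name DocumentSanitiser, the DOMDocument parameter, and the ScriptStrippingSanitiser implementation are all assumptions chosen so the example runs without SilverStripe; the real sanitise() method accepts an HTMLValue.

```php
<?php

// Hypothetical interface (an assumption for illustration only).
// Callers depend on this contract rather than on a concrete sanitiser class.
interface DocumentSanitiser
{
    public function sanitise(DOMDocument $doc): void;
}

// Trivial stand-in implementation, used only to show a caller working against the interface
final class ScriptStrippingSanitiser implements DocumentSanitiser
{
    public function sanitise(DOMDocument $doc): void
    {
        // Copy the live node list first so removals do not disturb iteration
        foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
            $script->parentNode->removeChild($script);
        }
    }
}

// The caller only knows about the interface
function cleanMarkup(DocumentSanitiser $sanitiser, string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $sanitiser->sanitise($doc);
    return $doc->saveHTML();
}

echo cleanMarkup(new ScriptStrippingSanitiser(), '<p>Hi</p><script>x()</script>');
```

The point of the design is that code calling cleanMarkup() can swap in a different sanitiser, or a test double, without changes.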

<?php

namespace SilverStripe\Forms\HTMLEditor;

use DOMAttr;
use DOMElement;
use SilverStripe\Core\Config\Configurable;
use SilverStripe\Core\Injector\Injectable;
use SilverStripe\View\Parsers\HTMLValue;
use stdClass;

/**
 * Sanitises an HTMLValue so its contents are the elements and attributes that are whitelisted
 * using the same configuration as TinyMCE
 *
 * See www.tinymce.com/wiki.php/configuration:valid_elements for details on the spec of TinyMCE's
 * whitelist configuration
 */
class HTMLEditorSanitiser
{
    use Configurable;
    use Injectable;

    /**
     * rel attribute to add to link elements which have a target attribute (usually "_blank")
     * this is done to prevent reverse tabnabbing - see https://www.owasp.org/index.php/Reverse_Tabnabbing
     * noopener includes the behaviour we want, though some browsers don't yet support it and rely
     * upon using noreferrer instead - see https://caniuse.com/rel-noopener for current browser compatibility
     * set this to null if you would like to disable this behaviour
     * set this to an empty string if you would like to remove rel attributes that were previously set
     *
     * @var string
     */
    private static $link_rel_value = 'noopener noreferrer';

    /** @var [stdClass] - $element => $rule hash for whitelist element rules where the element name isn't a pattern */
    // Documentation Bug: The doc comment [stdClass] at position 0 could not be parsed: Unknown type name '[' at position 0 in [stdClass].
    protected $elements = array();
    /** @var [stdClass] - Sequential list of whitelist element rules where the element name is a pattern */
    // Documentation Bug: The doc comment [stdClass] at position 0 could not be parsed: Unknown type name '[' at position 0 in [stdClass].
    protected $elementPatterns = array();

    /** @var [stdClass] - The list of attributes that apply to all further whitelisted elements added */
    // Documentation Bug: The doc comment [stdClass] at position 0 could not be parsed: Unknown type name '[' at position 0 in [stdClass].
    protected $globalAttributes = array();

    /**
     * Construct a sanitiser from a given HTMLEditorConfig
     *
     * Note that we build data structures from the current state of HTMLEditorConfig - later changes to
     * the passed instance won't cause this instance to update its whitelist
     *
     * @param HTMLEditorConfig $config
     */
    public function __construct(HTMLEditorConfig $config)
    {
        $valid = $config->getOption('valid_elements');
        if ($valid) {
            $this->addValidElements($valid);
        }

        $valid = $config->getOption('extended_valid_elements');
        if ($valid) {
            $this->addValidElements($valid);
        }
    }

    /**
     * Given a TinyMCE pattern (close to unix glob style), create a regex that does the match
     *
     * @param $str - The TinyMCE pattern
     * @return string - The equivalent regex
     */
    // Documentation Bug: The doc comment - at position 0 could not be parsed: Unknown type name '-' at position 0 in -.
    protected function patternToRegex($str)
    {
        return '/^' . preg_replace('/([?+*])/', '.$1', $str) . '$/';
    }

    /**
     * Given a valid_elements string, parse out the actual element and attribute rules and add to the
     * internal whitelist
     *
     * Logic based heavily on JavaScript version from tiny_mce_src.js
     *
     * @param string $validElements - The valid_elements or extended_valid_elements string to add to the whitelist
     */
    protected function addValidElements($validElements)
    {
        $elementRuleRegExp = '/^([#+\-])?([^\[\/]+)(?:\/([^\[]+))?(?:\[([^\]]+)\])?$/';
        $attrRuleRegExp = '/^([!\-])?(\w+::\w+|[^=:<]+)?(?:([=:<])(.*))?$/';
        $hasPatternsRegExp = '/[*?+]/';

        foreach (explode(',', $validElements) as $validElement) {
            if (preg_match($elementRuleRegExp, $validElement, $matches)) {
                $prefix = isset($matches[1]) ? $matches[1] : null;
                $elementName = isset($matches[2]) ? $matches[2] : null;
                $outputName = isset($matches[3]) ? $matches[3] : null;
                $attrData = isset($matches[4]) ? $matches[4] : null;

                // Create the new element
                $element = new stdClass();
                $element->attributes = array();
                $element->attributePatterns = array();

                $element->attributesRequired = array();
                $element->attributesDefault = array();
                $element->attributesForced = array();

                foreach (array('#' => 'paddEmpty', '-' => 'removeEmpty') as $match => $means) {
                    $element->$means = ($prefix === $match);
                }

                // Bug Best Practice: The expression $this->globalAttributes of type array is implicitly converted
                // to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make
                // it clear that you intend to check for an array without elements.
                // This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP
                // an empty array is considered to be equal (but not identical) to false, this is not always apparent.
                // Consider making the comparison explicit by using empty(..) or ! empty(...) instead.

                // Copy attributes from global rule into current rule
                if ($this->globalAttributes) {
                    $element->attributes = array_merge($element->attributes, $this->globalAttributes);
                }

                // Attributes defined
                if ($attrData) {
                    foreach (explode('|', $attrData) as $attr) {
                        if (preg_match($attrRuleRegExp, $attr, $matches)) {
                            $attr = new stdClass();

                            $attrType = isset($matches[1]) ? $matches[1] : null;
                            $attrName = isset($matches[2]) ? str_replace('::', ':', $matches[2]) : null;
                            $prefix = isset($matches[3]) ? $matches[3] : null;
                            $value = isset($matches[4]) ? $matches[4] : null;

                            // Required
                            if ($attrType === '!') {
                                $element->attributesRequired[] = $attrName;
                                $attr->required = true;
                            } elseif ($attrType === '-') {
                                // Denied from global
                                unset($element->attributes[$attrName]);
                                continue;
                            }

                            // Default value
                            if ($prefix) {
                                if ($prefix === '=') { // Default value
                                    $element->attributesDefault[$attrName] = $value;
                                    $attr->defaultValue = $value;
                                } elseif ($prefix === ':') {
                                    // Forced value
                                    $element->attributesForced[$attrName] = $value;
                                    $attr->forcedValue = $value;
                                } elseif ($prefix === '<') {
                                    // Required values
                                    $attr->validValues = explode('?', $value);
                                }
                            }

                            // Check for attribute patterns
                            if (preg_match($hasPatternsRegExp, $attrName)) {
                                $attr->pattern = $this->patternToRegex($attrName);
                                $element->attributePatterns[] = $attr;
                            } else {
                                $element->attributes[$attrName] = $attr;
                            }
                        }
                    }
                }

                // Bug Best Practice: The expression $this->globalAttributes of type array is implicitly converted
                // to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make
                // it clear that you intend to check for an array without elements.

                // Global rule, store away these for later usage
                if (!$this->globalAttributes && $elementName == '@') {
                    $this->globalAttributes = $element->attributes;
                }

                // Handle substitute elements such as b/strong
                if ($outputName) {
                    $element->outputName = $elementName;
                    $this->elements[$outputName] = $element;
                }

                // Add pattern or exact element
                if (preg_match($hasPatternsRegExp, $elementName)) {
                    $element->pattern = $this->patternToRegex($elementName);
                    $this->elementPatterns[] = $element;
                } else {
                    $this->elements[$elementName] = $element;
                }
            }
        }
    }

    /**
     * Given an element tag, return the rule structure for that element
     * @param string $tag The element tag
     * @return stdClass The element rule
     */
    protected function getRuleForElement($tag)
    {
        if (isset($this->elements[$tag])) {
            return $this->elements[$tag];
        }
        foreach ($this->elementPatterns as $element) {
            if (preg_match($element->pattern, $tag)) {
                return $element;
            }
        }
        return null;
    }

    /**
     * Given an attribute name, return the rule structure for that attribute
     *
     * @param object $elementRule
     * @param string $name The attribute name
     * @return stdClass The attribute rule
     */
    protected function getRuleForAttribute($elementRule, $name)
    {
        if (isset($elementRule->attributes[$name])) {
            return $elementRule->attributes[$name];
        }
        foreach ($elementRule->attributePatterns as $attribute) {
            if (preg_match($attribute->pattern, $name)) {
                return $attribute;
            }
        }
        return null;
    }

    /**
     * Given a DOMElement and an element rule, check if that element passes the rule
     * @param DOMElement $element The element to check
     * @param stdClass $rule The rule to check against
     * @return bool True if the element passes (and so can be kept), false if it fails (and so needs stripping)
     */
    protected function elementMatchesRule($element, $rule = null)
    {
        // If the rule doesn't exist at all, the element isn't allowed
        if (!$rule) {
            return false;
        }

        // If the rule has attributes required, check them to see if this element has at least one
        if ($rule->attributesRequired) {
            $hasMatch = false;

            foreach ($rule->attributesRequired as $attr) {
                if ($element->getAttribute($attr)) {
                    $hasMatch = true;
                    break;
                }
            }

            if (!$hasMatch) {
                return false;
            }
        }

        // If the rule says to remove empty elements, and this element is empty, remove it
        if ($rule->removeEmpty && !$element->firstChild) {
            return false;
        }

        // No further tests required, element passes
        return true;
    }

    /**
     * Given a DOMAttr and an attribute rule, check if that attribute passes the rule
     * @param DOMAttr $attr - the attribute to check
     * @param stdClass $rule - the rule to check against
     * @return bool - true if the attribute passes (and so can be kept), false if it fails (and so needs stripping)
     */
    protected function attributeMatchesRule($attr, $rule = null)
    {
        // If the rule doesn't exist at all, the attribute isn't allowed
        if (!$rule) {
            return false;
        }

        // If the rule has a set of valid values, check them to see if this attribute is one
        if (isset($rule->validValues) && !in_array($attr->value, $rule->validValues)) {
            return false;
        }

        // No further tests required, attribute passes
        return true;
    }

    /**
     * Given an HTMLValue instance, removes any elements and attributes that are
     * not explicitly included in the whitelist passed to __construct on instance creation
     *
     * @param HTMLValue $html - The HTMLValue to remove any non-whitelisted elements & attributes from
     */
    public function sanitise(HTMLValue $html)
    {
        // Bug Best Practice: The expression $this->elements of type array is implicitly converted to a boolean;
        // are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you
        // intend to check for an array without elements.
        // Bug Best Practice: The expression $this->elementPatterns of type array is implicitly converted to a
        // boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear
        // that you intend to check for an array without elements.
        if (!$this->elements && !$this->elementPatterns) {
            return;
        }

        $linkRelValue = $this->config()->get('link_rel_value');
        $doc = $html->getDocument();

        /** @var DOMElement $el */
        foreach ($html->query('//body//*') as $el) {
            $elementRule = $this->getRuleForElement($el->tagName);

            // If this element isn't allowed, strip it
            if (!$this->elementMatchesRule($el, $elementRule)) {
                // If it's a script or style, we don't keep contents
                if ($el->tagName === 'script' || $el->tagName === 'style') {
                    $el->parentNode->removeChild($el);
                } else {
                    // Otherwise we replace this node with all its children
                    // First, create a new fragment with all of $el's children moved into it
                    $frag = $doc->createDocumentFragment();
                    while ($el->firstChild) {
                        $frag->appendChild($el->firstChild);
                    }

                    // Then replace $el with the fragment's contents (which used to be its children)
                    $el->parentNode->replaceChild($frag, $el);
                }
            } else {
                // Otherwise tidy the element
                // First, if we're supposed to pad & this element is empty, fix that
                if ($elementRule->paddEmpty && !$el->firstChild) {
                    $el->nodeValue = '&nbsp;';
                }

                // Then filter out any non-whitelisted attributes
                $children = $el->attributes;
                $i = $children->length;
                while ($i--) {
                    $attr = $children->item($i);
                    $attributeRule = $this->getRuleForAttribute($elementRule, $attr->name);

                    // If this attribute isn't allowed, strip it
                    if (!$this->attributeMatchesRule($attr, $attributeRule)) {
                        $el->removeAttributeNode($attr);
                    }
                }

                // Then enforce any default attributes
                foreach ($elementRule->attributesDefault as $attr => $default) {
                    if (!$el->getAttribute($attr)) {
                        $el->setAttribute($attr, $default);
                    }
                }

                // And any forced attributes
                foreach ($elementRule->attributesForced as $attr => $forced) {
                    $el->setAttribute($attr, $forced);
                }
            }

            if ($el->tagName === 'a' && $linkRelValue !== null) {
                $this->addRelValue($el, $linkRelValue);
            }
        }
    }

    /**
     * Adds rel="noopener noreferrer" to link elements with a target attribute
     *
     * @param DOMElement $el
     * @param string|null $linkRelValue
     */
    private function addRelValue(DOMElement $el, $linkRelValue)
    {
        // user has checked the checkbox 'open link in new window'
        if ($el->getAttribute('target') && $el->getAttribute('rel') !== $linkRelValue) {
            if ($linkRelValue !== '') {
                $el->setAttribute('rel', $linkRelValue);
            } else {
                $el->removeAttribute('rel');
            }
        } elseif ($el->getAttribute('rel') === $linkRelValue && !$el->getAttribute('target')) {
            // user previously checked 'open link in new window' and noopener was added,
            // now user has unchecked the checkbox so we can remove noopener
            $el->removeAttribute('rel');
        }
    }
}
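
To make the two regular expressions at the heart of the class easier to follow, here is a small standalone sketch. The free function patternToRegex() is a copy of the method body from the listing, and $elementRuleRegExp is the same expression used in addValidElements(); the sample inputs ('data-*', 'h?', '-strong/b[class|style]') are illustrative assumptions, not values taken from any real configuration.

```php
<?php

// Standalone copy of the one-line conversion in patternToRegex():
// each of ?, + and * is prefixed with "." so it applies to "any character"
function patternToRegex(string $pattern): string
{
    return '/^' . preg_replace('/([?+*])/', '.$1', $pattern) . '$/';
}

// Same expression as $elementRuleRegExp in addValidElements()
$elementRuleRegExp = '/^([#+\-])?([^\[\/]+)(?:\/([^\[]+))?(?:\[([^\]]+)\])?$/';

// A glob-style attribute pattern becomes an anchored regex
$regex = patternToRegex('data-*');          // '/^data-.*$/'
var_dump(preg_match($regex, 'data-id'));    // int(1)
var_dump(preg_match($regex, 'class'));      // int(0)

// "h?" becomes '/^h.?$/', which matches "h1" but not "div"
var_dump(preg_match(patternToRegex('h?'), 'h1'));   // int(1)
var_dump(preg_match(patternToRegex('h?'), 'div'));  // int(0)

// One element rule is split into prefix, name, alternate name and attribute list
preg_match($elementRuleRegExp, '-strong/b[class|style]', $m);
// $m[1] === '-'            (prefix: remove the element when empty)
// $m[2] === 'strong'       (element name)
// $m[3] === 'b'            (name after the slash)
// $m[4] === 'class|style'  (attribute rules, later split on "|")
print_r(array_slice($m, 1));
```

Note that the conversion turns `?` into `.?`, so a pattern such as `h?` also matches a bare `h`; that follows directly from the replacement string shown above.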