Passed
Push — 4 ( a124cc...4d662d )
by Steve
27:21 queued 20:26
created

HTMLEditorSanitiser::sanitise()   D

Complexity

Conditions 20
Paths 10

Size

Total Lines 75
Code Lines 36

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 20
eloc 36
c 0
b 0
f 0
nc 10
nop 1
dl 0
loc 75
rs 4.1666

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
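As a minimal sketch of extracting a commented block into a named method: the listing's `sanitise()` contains a commented step that replaces a disallowed node with its children. The standalone function below does the same thing. The name `unwrapElement` is hypothetical, derived from that comment, and is not part of SilverStripe.

```php
<?php

/**
 * Hypothetical helper (not part of SilverStripe): replace an element
 * with its own children, keeping them in their original order.
 */
function unwrapElement(DOMElement $el): void
{
    // Move every child of $el into a new document fragment
    $frag = $el->ownerDocument->createDocumentFragment();
    while ($el->firstChild) {
        $frag->appendChild($el->firstChild);
    }

    // Swap $el for the fragment's contents
    $el->parentNode->replaceChild($frag, $el);
}

// Usage: strip a <font> wrapper but keep what it contains
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hello <font>big <b>world</b></font></p></body></html>');

$font = $doc->getElementsByTagName('font')->item(0);
unwrapElement($font);

$paragraph = $doc->getElementsByTagName('p')->item(0);
echo $doc->saveHTML($paragraph), PHP_EOL;
```

After the call the `<font>` wrapper is gone, while its text and the nested `<b>` element remain inside the paragraph. At the call site, the comment is replaced by a method name that says the same thing.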

<?php

namespace SilverStripe\Forms\HTMLEditor;

use DOMAttr;
use DOMElement;
use SilverStripe\Core\Config\Configurable;
use SilverStripe\Core\Injector\Injectable;
use SilverStripe\View\Parsers\HTMLValue;
use stdClass;

/**
 * Sanitises an HTMLValue so its contents are the elements and attributes that are whitelisted
 * using the same configuration as TinyMCE
 *
 * See www.tinymce.com/wiki.php/configuration:valid_elements for details on the spec of TinyMCE's
 * whitelist configuration
 */
class HTMLEditorSanitiser
{
    use Configurable;
    use Injectable;

    /**
     * rel attribute to add to link elements which have a target attribute (usually "_blank")
     * this is done to prevent reverse tabnabbing - see https://www.owasp.org/index.php/Reverse_Tabnabbing
     * noopener includes the behaviour we want, though some browsers don't yet support it and rely
     * upon using noreferrer instead - see https://caniuse.com/rel-noopener for current browser compatibility
     * set this to null if you would like to disable this behaviour
     * set this to an empty string if you would like to remove rel attributes that were previously set
     *
     * @var string
     */
    private static $link_rel_value = 'noopener noreferrer';

    /** @var stdClass - $element => $rule hash for whitelist element rules where the element name isn't a pattern */
    protected $elements = [];
    /** @var stdClass - Sequential list of whitelist element rules where the element name is a pattern */
    protected $elementPatterns = [];

    /** @var stdClass - The list of attributes that apply to all further whitelisted elements added */
    protected $globalAttributes = [];

    /**
     * Construct a sanitiser from a given HTMLEditorConfig
     *
     * Note that we build data structures from the current state of HTMLEditorConfig - later changes to
     * the passed instance won't cause this instance to update its whitelist
     *
     * @param HTMLEditorConfig $config
     */
    public function __construct(HTMLEditorConfig $config)
    {
        $valid = $config->getOption('valid_elements');
        if ($valid) {
            $this->addValidElements($valid);
        }

        $valid = $config->getOption('extended_valid_elements');
        if ($valid) {
            $this->addValidElements($valid);
        }
    }

    // [Scrutinizer] Documentation Bug (on the "@param $str" line below): The doc comment - at position 0
    // could not be parsed: Unknown type name '-' at position 0 in -.
    /**
     * Given a TinyMCE pattern (close to unix glob style), create a regex that does the match
     *
     * @param $str - The TinyMCE pattern
     * @return string - The equivalent regex
     */
    protected function patternToRegex($str)
    {
        return '/^' . preg_replace('/([?+*])/', '.$1', $str ?? '') . '$/';
    }

    /**
     * Given a valid_elements string, parse out the actual element and attribute rules and add to the
     * internal whitelist
     *
     * Logic based heavily on javascript version from tiny_mce_src.js
     *
     * @param string $validElements - The valid_elements or extended_valid_elements string to add to the whitelist
     */
    protected function addValidElements($validElements)
    {
        $elementRuleRegExp = '/^([#+\-])?([^\[\/]+)(?:\/([^\[]+))?(?:\[([^\]]+)\])?$/';
        $attrRuleRegExp = '/^([!\-])?(\w+::\w+|[^=:<]+)?(?:([=:<])(.*))?$/';
        $hasPatternsRegExp = '/[*?+]/';

        foreach (explode(',', $validElements ?? '') as $validElement) {
            if (preg_match($elementRuleRegExp ?? '', $validElement ?? '', $matches)) {
                $prefix = isset($matches[1]) ? $matches[1] : null;
                $elementName = isset($matches[2]) ? $matches[2] : null;
                $outputName = isset($matches[3]) ? $matches[3] : null;
                $attrData = isset($matches[4]) ? $matches[4] : null;

                // Create the new element
                $element = new stdClass();
                $element->attributes = [];
                $element->attributePatterns = [];

                $element->attributesRequired = [];
                $element->attributesDefault = [];
                $element->attributesForced = [];

                foreach (['#' => 'paddEmpty', '-' => 'removeEmpty'] as $match => $means) {
                    $element->$means = ($prefix === $match);
                }

                // Copy attributes from global rule into current rule
                if ($this->globalAttributes) {
                    // [Scrutinizer] Bug: $this->globalAttributes of type stdClass is incompatible with the
                    // type array expected by parameter $arrays of array_merge(). (Ignorable by Annotation)
                    // If this is a false-positive, the issue can be ignored via the ignore-type annotation:
                    //     array_merge($element->attributes, /** @scrutinizer ignore-type */ $this->globalAttributes);
                    $element->attributes = array_merge($element->attributes, $this->globalAttributes);
                }

                // Attributes defined
                if ($attrData) {
                    foreach (explode('|', $attrData ?? '') as $attr) {
                        if (preg_match($attrRuleRegExp ?? '', $attr ?? '', $matches)) {
                            $attr = new stdClass();

                            $attrType = isset($matches[1]) ? $matches[1] : null;
                            $attrName = isset($matches[2]) ? str_replace('::', ':', $matches[2]) : null;
                            $prefix = isset($matches[3]) ? $matches[3] : null;
                            $value = isset($matches[4]) ? $matches[4] : null;

                            // Required
                            if ($attrType === '!') {
                                $element->attributesRequired[] = $attrName;
                                $attr->required = true;
                            } elseif ($attrType === '-') {
                                // Denied from global
                                unset($element->attributes[$attrName]);
                                continue;
                            }

                            // Default value
                            if ($prefix) {
                                if ($prefix === '=') { // Default value
                                    $element->attributesDefault[$attrName] = $value;
                                    $attr->defaultValue = $value;
                                } elseif ($prefix === ':') {
                                    // Forced value
                                    $element->attributesForced[$attrName] = $value;
                                    $attr->forcedValue = $value;
                                } elseif ($prefix === '<') {
                                    // Required values
                                    $attr->validValues = explode('?', $value ?? '');
                                }
                            }

                            // Check for attribute patterns
                            if (preg_match($hasPatternsRegExp ?? '', $attrName ?? '')) {
                                $attr->pattern = $this->patternToRegex($attrName);
                                $element->attributePatterns[] = $attr;
                            } else {
                                $element->attributes[$attrName] = $attr;
                            }
                        }
                    }
                }

                // Global rule, store away these for later usage
                if (!$this->globalAttributes && $elementName == '@') {
                    // [Scrutinizer] Documentation Bug: It seems like $element->attributes of type array or array
                    // is incompatible with the declared type stdClass of property $globalAttributes.
                    // The type inference engine has found an assignment to a property that is incompatible with
                    // the declared type of that property. Either this assignment is in error or the assigned
                    // type should be added to the documentation/type hint for that property.
                    $this->globalAttributes = $element->attributes;
                }

                // Handle substitute elements such as b/strong
                if ($outputName) {
                    $element->outputName = $elementName;
                    $this->elements[$outputName] = $element;
                }

                // Add pattern or exact element
                if (preg_match($hasPatternsRegExp ?? '', $elementName ?? '')) {
                    $element->pattern = $this->patternToRegex($elementName);
                    $this->elementPatterns[] = $element;
                } else {
                    $this->elements[$elementName] = $element;
                }
            }
        }
    }

    /**
     * Given an element tag, return the rule structure for that element
     * @param string $tag The element tag
     * @return stdClass The element rule
     */
    protected function getRuleForElement($tag)
    {
        if (isset($this->elements[$tag])) {
            return $this->elements[$tag];
        }
        foreach ($this->elementPatterns as $element) {
            if (preg_match($element->pattern ?? '', $tag ?? '')) {
                return $element;
            }
        }
        return null;
    }

    /**
     * Given an attribute name, return the rule structure for that attribute
     *
     * @param object $elementRule
     * @param string $name The attribute name
     * @return stdClass The attribute rule
     */
    protected function getRuleForAttribute($elementRule, $name)
    {
        if (isset($elementRule->attributes[$name])) {
            return $elementRule->attributes[$name];
        }
        foreach ($elementRule->attributePatterns as $attribute) {
            if (preg_match($attribute->pattern ?? '', $name ?? '')) {
                return $attribute;
            }
        }
        return null;
    }

    /**
     * Given a DOMElement and an element rule, check if that element passes the rule
     * @param DOMElement $element The element to check
     * @param stdClass $rule The rule to check against
     * @return bool True if the element passes (and so can be kept), false if it fails (and so needs stripping)
     */
    protected function elementMatchesRule($element, $rule = null)
    {
        // If the rule doesn't exist at all, the element isn't allowed
        if (!$rule) {
            return false;
        }

        // If the rule has attributes required, check them to see if this element has at least one
        if ($rule->attributesRequired) {
            $hasMatch = false;

            foreach ($rule->attributesRequired as $attr) {
                if ($element->getAttribute($attr)) {
                    $hasMatch = true;
                    break;
                }
            }

            if (!$hasMatch) {
                return false;
            }
        }

        // If the rule says to remove empty elements, and this element is empty, remove it
        if ($rule->removeEmpty && !$element->firstChild) {
            return false;
        }

        // No further tests required, element passes
        return true;
    }

    /**
     * Given a DOMAttr and an attribute rule, check if that attribute passes the rule
     * @param DOMAttr $attr - the attribute to check
     * @param stdClass $rule - the rule to check against
     * @return bool - true if the attribute passes (and so can be kept), false if it fails (and so needs stripping)
     */
    protected function attributeMatchesRule($attr, $rule = null)
    {
        // If the rule doesn't exist at all, the attribute isn't allowed
        if (!$rule) {
            return false;
        }

        // If the rule has a set of valid values, check them to see if this attribute is one
        if (isset($rule->validValues) && !in_array($attr->value, $rule->validValues ?? [])) {
            return false;
        }

        // No further tests required, attribute passes
        return true;
    }

    /**
     * Given an HTMLValue instance, will remove any elements and attributes that are
     * not explicitly included in the whitelist passed to __construct on instance creation
     *
     * @param HTMLValue $html - The HTMLValue to remove any non-whitelisted elements & attributes from
     */
    public function sanitise(HTMLValue $html)
    {
        if (!$this->elements && !$this->elementPatterns) {
            return;
        }

        $linkRelValue = $this->config()->get('link_rel_value');
        $doc = $html->getDocument();

        /** @var DOMElement $el */
        foreach ($html->query('//body//*') as $el) {
            $elementRule = $this->getRuleForElement($el->tagName);

            // If this element isn't allowed, strip it
            if (!$this->elementMatchesRule($el, $elementRule)) {
                // If it's a script or style, we don't keep contents
                if ($el->tagName === 'script' || $el->tagName === 'style') {
                    $el->parentNode->removeChild($el);
                } else {
                    // Otherwise we replace this node with all its children
                    // First, create a new fragment with all of $el's children moved into it
                    $frag = $doc->createDocumentFragment();
                    while ($el->firstChild) {
                        $frag->appendChild($el->firstChild);
                    }

                    // Then replace $el with the fragment's contents (which used to be its children)
                    $el->parentNode->replaceChild($frag, $el);
                }
            } else {
                // Otherwise tidy the element
                // First, if we're supposed to pad & this element is empty, fix that
                if ($elementRule->paddEmpty && !$el->firstChild) {
                    $el->nodeValue = '&nbsp;';
                }

                // Then filter out any non-whitelisted attributes
                $children = $el->attributes;
                $i = $children->length;
                while ($i--) {
                    $attr = $children->item($i);
                    $attributeRule = $this->getRuleForAttribute($elementRule, $attr->name);

                    // If this attribute isn't allowed, strip it
                    if (!$this->attributeMatchesRule($attr, $attributeRule)) {
                        $el->removeAttributeNode($attr);
                    }
                }

                // Then enforce any default attributes
                foreach ($elementRule->attributesDefault as $attr => $default) {
                    if (!$el->getAttribute($attr)) {
                        $el->setAttribute($attr, $default);
                    }
                }

                // And any forced attributes
                foreach ($elementRule->attributesForced as $attr => $forced) {
                    $el->setAttribute($attr, $forced);
                }

                // Matches "javascript:" with any arbitrary linebreaks in between the characters.
                // [Scrutinizer] Bug: It seems like str_split('javascript:') can also be of type true; however,
                // parameter $pieces of implode() does only seem to accept array, maybe add an additional type
                // check? (Ignorable by Annotation) If this is a false-positive, the issue can be ignored via
                // the ignore-type annotation:
                //     implode('\v*', /** @scrutinizer ignore-type */ str_split('javascript:'))
                $regex = '/^\s*' . implode('\v*', str_split('javascript:')) . '/';
                // Strip out javascript execution in href or src attributes.
                foreach (['src', 'href'] as $dangerAttribute) {
                    if ($el->hasAttribute($dangerAttribute)) {
                        if (preg_match($regex, $el->getAttribute($dangerAttribute))) {
                            $el->removeAttribute($dangerAttribute);
                        }
                    }
                }
            }

            if ($el->tagName === 'a' && $linkRelValue !== null) {
                $this->addRelValue($el, $linkRelValue);
            }
        }
    }

    /**
     * Adds rel="noopener noreferrer" to link elements with a target attribute
     *
     * @param DOMElement $el
     * @param string|null $linkRelValue
     */
    private function addRelValue(DOMElement $el, $linkRelValue)
    {
        // user has checked the checkbox 'open link in new window'
        if ($el->getAttribute('target') && $el->getAttribute('rel') !== $linkRelValue) {
            if ($linkRelValue !== '') {
                $el->setAttribute('rel', $linkRelValue);
            } else {
                $el->removeAttribute('rel');
            }
        } elseif ($el->getAttribute('rel') === $linkRelValue && !$el->getAttribute('target')) {
            // user previously checked 'open link in new window' and noopener was added,
            // now user has unchecked the checkbox so we can remove noopener
            $el->removeAttribute('rel');
        }
    }
}
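
The report rates `sanitise()` for both Long Method and Complexity. One possible way to shrink it is to move the "javascript:" stripping step into a helper of its own. The sketch below is standalone and only illustrative: the function name `stripJavascriptUrls` and its `$attributes` parameter are assumptions, not part of SilverStripe. It reuses the regular expression from the listing, so it is case-sensitive in the same way.

```php
<?php

/**
 * Hypothetical helper (not part of SilverStripe): remove the given URL
 * attributes from an element when their value starts with "javascript:",
 * allowing vertical whitespace between the characters.
 */
function stripJavascriptUrls(DOMElement $el, array $attributes = ['src', 'href']): void
{
    // Same pattern as in the listing: "javascript:" with optional line breaks between letters
    $regex = '/^\s*' . implode('\v*', str_split('javascript:')) . '/';

    foreach ($attributes as $name) {
        if ($el->hasAttribute($name) && preg_match($regex, $el->getAttribute($name))) {
            $el->removeAttribute($name);
        }
    }
}

// Usage: one link with an obfuscated javascript: URL, one ordinary link
$doc = new DOMDocument();

$unsafe = $doc->createElement('a');
$unsafe->setAttribute('href', "java\nscript:alert(1)");

$safe = $doc->createElement('a');
$safe->setAttribute('href', 'https://example.com/');

stripJavascriptUrls($unsafe);
stripJavascriptUrls($safe);

var_dump($unsafe->hasAttribute('href')); // bool(false)
var_dump($safe->hasAttribute('href'));   // bool(true)
```

With a helper like this, the corresponding block in `sanitise()` would become a single call. That removes one loop and two nested conditions from the method body, which should lower the condition count the report measures.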