Completed: push to master (cf38d9...2ecc50) by Rob, created 01:50

Xss (rating: A)

Complexity

Total Complexity 41

Size/Duplication

Total Lines 374
Duplicated Lines 0 %

Coupling/Cohesion

Components 1
Dependencies 10

Test Coverage

Coverage 77.52%

Importance

Changes 0
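Metric  Value   Meaning
wmc     41      weighted methods per class (sum of the 15 method complexities)
lcom    1       lack of cohesion of methods
cbo     10      coupling between objects (the 10 dependencies)
dl      0       duplicated lines
loc     374     lines of code
ccs     100     covered statements
cts     129     total statements
cp      0.7752  coverage ratio (ccs / cts = 100 / 129)
rs      9.1199
c       0
b       0
f       0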

15 Methods

Rating  Name              Duplication  Size  Complexity
A       __construct()     0            4     1
A       init()            0            17    1
A       setString()       0            4     1
A       cleanArray()      0            4     1
A       isXssFound()      0            4     1
A       _get_data()       0            5     1
A       cleanString()     0            4     1
A       clean()           0            17    3
A       process()         0            16    3
B       _do()             0            66    6
A       decodeString()    0            19    3
A       decodeEntity()    0            24    5
B       _entity_decode()  0            100   6
B       cleanUrl()        0            25    6
A       result()          0            8     2

How to fix: Complexity

Complex Class

Complex classes like Xss often do a lot of different things. To break such a class down, first identify a cohesive component within it. A common approach is to look for fields and methods that share a prefix or suffix; in Xss, for example, decodeString(), decodeEntity() and _entity_decode() all deal with decoding. The cohesion graph can also help you spot unconnected or weakly connected components.

Once you have determined which fields belong together, apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use Xss and, based on those observations, apply Extract Interface as well.
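A minimal sketch of Extract Class combined with Extract Interface, assuming the decoding methods are the component pulled out of Xss. EntityDecoderInterface, HtmlEntityDecoder and SlimXss are hypothetical names, not part of soteria, and the decoder is reduced to a single html_entity_decode() call rather than the full decodeString()/decodeEntity()/_entity_decode() chain.

```php
<?php
// Hypothetical names throughout: none of these classes exist in soteria.

// Extract Interface: callers depend on this contract, not on a concrete class.
interface EntityDecoderInterface
{
    public function decode(string $str): string;
}

// Extract Class: the entity-decoding responsibility moves out of Xss.
final class HtmlEntityDecoder implements EntityDecoderInterface
{
    private const FLAGS = ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE;

    public function decode(string $str): string
    {
        return html_entity_decode($str, self::FLAGS, 'UTF-8');
    }
}

// The slimmed-down class receives the extracted component via its constructor.
final class SlimXss
{
    /** @var EntityDecoderInterface */
    private $decoder;

    public function __construct(EntityDecoderInterface $decoder)
    {
        $this->decoder = $decoder;
    }

    public function decodeEntities(string $str): string
    {
        return $this->decoder->decode($str);
    }
}

$xss = new SlimXss(new HtmlEntityDecoder());
$decoded = $xss->decodeEntities('&lt;b&gt;bold&lt;/b&gt;'); // "<b>bold</b>"
```

Because SlimXss only sees the interface, a different decoder (for example one that also applies the extra security entity table) could be swapped in, or tested on its own, without touching the class.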

<?php

namespace devtoolboxuk\soteria\handlers;

use devtoolboxuk\soteria\models\SoteriaModel;
use devtoolboxuk\soteria\voku\Resources\Attributes;

use devtoolboxuk\soteria\voku\Resources\Exploded;
use devtoolboxuk\soteria\voku\Resources\Html;
use devtoolboxuk\soteria\voku\Resources\JavaScript;
use devtoolboxuk\soteria\voku\Resources\NeverAllowed;

use devtoolboxuk\soteria\voku\Resources\System;
use devtoolboxuk\soteria\voku\Resources\Utf7;
use devtoolboxuk\soteria\voku\Resources\Utf8;
use devtoolboxuk\soteria\voku\Resources\StringResource;

class Xss
Coding Style introduced by
The property $_xss_found is not named in camelCase.

This check marks property names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.
{

    private $_xss_found = null;
    private $neverAllowed;
    private $exploded;
    private $string;
    private $attributes;
    private $javascript;
    private $html;
    private $utf7;
    private $utf8;
    private $strings;
    private $system;
    private $input;
    private $output;

    function __construct()
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice introduced by
It is recommended to declare an explicit visibility for __construct.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also yourself, how this method should be consumed.

If you are not sure which visibility to choose, start with the most restrictive visibility and raise it as needed: start with private, raise it to protected only if a subclass needs access, and to public only if an external class needs access.
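Applied to the constructor and init(), the advice might look like the sketch below. XssVisibilityDemo is a hypothetical stand-in for Xss; it assumes nothing outside the class calls init(), and it also renames $_xss_found to camelCase as the property inspection suggests.

```php
<?php
// Hypothetical stand-in for Xss, showing explicit visibility on every method.
final class XssVisibilityDemo
{
    /** @var bool|null null = not checked yet; otherwise the result of the last check */
    private $xssFound = null;

    /** @var array<string, object> */
    private $resources = [];

    // Public: external code must be able to construct the object.
    public function __construct()
    {
        $this->init();
    }

    // Private: only the constructor calls init(), so start with the most restrictive level.
    private function init(): void
    {
        $this->resources = ['placeholder' => new \stdClass()];
    }

    public function isXssFound(): ?bool
    {
        return $this->xssFound;
    }
}

$demo = new XssVisibilityDemo();
```

If a subclass or an external caller later turns out to need init(), its visibility can be raised to protected or public at that point.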
    {
        $this->init();
    }

    function init()
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Comprehensibility Best Practice introduced by
It is recommended to declare an explicit visibility for init.
    {
        $this->neverAllowed = new NeverAllowed();

        $this->exploded = new Exploded();

        $this->attributes = new Attributes();
        $this->javascript = new JavaScript();
        $this->html = new Html();

        $this->system = new System();

        $this->utf7 = new Utf7();
        $this->utf8 = new Utf8();

        $this->strings = new StringResource();
    }

    function setString($str)
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Comprehensibility Best Practice introduced by
It is recommended to declare an explicit visibility for setString.
    {
        $this->string = $str;
    }

    public function cleanArray($array)
    {
        return $this->clean($array);
    }

    /**
     * @param $str
     * @return array|mixed
     */
    public function clean($str)
    {
        // reset
        $this->_xss_found = null;

        // check for an array of strings
        if (\is_array($str)) {
            foreach ($str as $key => $value) {
                $str[$key] = $this->clean($value);
            }
            return $str;
        }

        $this->input = $str;
        $this->output = $this->process($str);
        return $this->output;
    }

    private function process($str)
Coding Style Naming introduced by
The variable $old_str is not named in camelCase.

This check marks variable names that have not been written in camelCase.
    {

        // process
        do {
            $old_str = $str;
            $str = $this->_do($str);
        } while ($old_str !== $str);

        // keep the old value, if there wasn't any XSS attack
        if ($this->_xss_found !== true) {
            $str = $this->input;
        }

        return $str;
    }

    /**
     * @param StringResource $str
     *
     * @return mixed
     */
    private function _do($str)
Coding Style Naming introduced by
The method _do is not named in camelCase.

This check marks method names that have not been written in camelCase.

Coding Style Naming introduced by
The variable $str_backup is not named in camelCase.

Complexity introduced by
This operation has 10 execution paths, which reaches the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.
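One way to act on this is to express the fixed sequence of filters in _do() as data, so the method body shrinks to two loops. The sketch assumes plain callables stand in for the soteria resource calls (remove_bom, naughtyHtml and so on); XssPipeline and both filters are hypothetical. Like _do(), it only counts changes made after normalisation as evidence of XSS.

```php
<?php
// Hypothetical sketch: XssPipeline is not part of soteria.
final class XssPipeline
{
    /** @var callable[] normalisation steps (BOM removal, decoding, ...) */
    private $normalizers;

    /** @var callable[] sanitising steps (never-allowed strings, naughty HTML, ...) */
    private $sanitizers;

    public function __construct(array $normalizers, array $sanitizers)
    {
        $this->normalizers = $normalizers;
        $this->sanitizers = $sanitizers;
    }

    /**
     * @return array{0: string, 1: bool} filtered string and whether a sanitiser changed it
     */
    public function run(string $str): array
    {
        foreach ($this->normalizers as $step) {
            $str = $step($str);
        }

        // Same idea as $str_backup in _do(): compare only after normalisation.
        $backup = $str;

        foreach ($this->sanitizers as $step) {
            $str = $step($str);
        }

        return [$str, $str !== $backup];
    }
}

$pipeline = new XssPipeline(
    // Stand-in normaliser: strip a leading UTF-8 BOM.
    [function (string $s): string { return (string) preg_replace('/^\xEF\xBB\xBF/', '', $s); }],
    // Stand-in sanitiser: drop literal <script> tags (far weaker than naughtyHtml()).
    [function (string $s): string { return str_ireplace('<script>', '', $s); }]
);

[$clean, $xssFound] = $pipeline->run("\xEF\xBB\xBFhi<script>");
```

Keeping the two lists separate preserves the rule that BOM removal or decoding alone never marks the input as XSS, while adding or reordering a filter no longer adds a branch to the method.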
    {
        $str = (string)$str;
        $strInt = (int)$str;
        $strFloat = (float)$str;
        if (!$str || (string)$strInt === $str || (string)$strFloat === $str) {
Coding Style introduced by
Blank line found at start of control structure

            // no xss found
            if ($this->_xss_found !== true) {
                $this->_xss_found = false;
            }

            return $str;
        }

        // remove the BOM from UTF-8 / UTF-16 / UTF-32 strings
        $str = $this->utf8->remove_bom($str);

        // replace the diamond question mark (�) and invalid-UTF8 chars
        $str = $this->utf8->replace_diamond_question_mark($str, '');

        // replace invisible characters with one single space
        $str = $this->utf8->remove_invisible_characters($str, true, ' ');

        $str = $this->utf8->normalize_whitespace($str);
        $str = $this->strings->replace($str);

        // decode UTF-7 characters
        $str = $this->utf7->repack($str);

        // decode the string
        $str = $this->decodeString($str); // RW Partly DONE

        // backup the string (for later comparison)
        $str_backup = $str;

        // remove strings that are never allowed
        $str = $this->neverAllowed->doNeverAllowed($str); //RW DONE

        // corrects words before the browser will do it
        $str = $this->exploded->compactExplodedString($str); //RW DONE

        // remove disallowed javascript calls in links, images etc.
        $str = $this->javascript->removeDisallowedJavascript($str);

        // remove evil attributes such as style, onclick and xmlns
        $str = $this->attributes->removeEvilAttributes($str);

        // sanitize naughty JavaScript elements
        $str = $this->javascript->naughtyJavascript($str);

        // sanitize naughty HTML elements
        $str = $this->html->naughtyHtml($str);

        // final clean up
        //
        // -> This adds a bit of extra precaution in case something got through the above filters.
        $str = $this->neverAllowed->doNeverAllowedAfterwards($str);

        // check for xss
        if ($this->_xss_found !== true) {
            $this->_xss_found = !($str_backup === $str);
        }

        return $str;
    }

    public function decodeString($str)
    {
        // init
        $regExForHtmlTags = '/<\p{L}+.*+/us';

        if (strpos($str, '<') !== false && preg_match($regExForHtmlTags, $str, $matches) === 1) {
            $str = (string)preg_replace_callback(
                $regExForHtmlTags,
                function ($matches) {
                    return $this->decodeEntity($matches);
                },
                $str
            );
        } else {
Coding Style introduced by
The method decodeString uses an else expression. Else is never necessary, and the code can be simplified to work without it.
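A sketch of decodeString() rewritten with a guard clause instead of else. It is written as free functions so it runs on its own: decodeEntityStub() is a hypothetical stand-in for decodeEntity() that only decodes HTML entities, and PHP's built-in rawurldecode() replaces $this->utf8->rawurldecode().

```php
<?php
// Hypothetical stand-in for Xss::decodeEntity(): decodes HTML entities in the match.
function decodeEntityStub(array $matches): string
{
    return html_entity_decode($matches[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

function decodeStringWithoutElse(string $str): string
{
    $regExForHtmlTags = '/<\p{L}+.*+/us';

    // Guard clause: no HTML-like tag, so URL-decode and leave early.
    if (strpos($str, '<') === false || preg_match($regExForHtmlTags, $str) !== 1) {
        return rawurldecode($str);
    }

    // Only tag-bearing input reaches the entity-decoding path.
    return (string) preg_replace_callback($regExForHtmlTags, 'decodeEntityStub', $str);
}
```

Both branches keep their original behaviour; only the control flow changes. The guard condition is the negation of the original one, so `&&` becomes `||` and `=== 1` becomes `!== 1`.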
            $str = $this->utf8->rawurldecode($str);
        }

        return $str;
    }

    private function decodeEntity(array $match)
    {
        // init
        $str = $match[0];

        // protect GET variables without XSS in URLs
        if (preg_match_all("/[\?|&]?[\\p{L}0-9_\-\[\]]+\s*=\s*(?<wrapped>\"|\042|'|\047)(?<attr>[^\\1]*?)\\g{wrapped}/ui", $str, $matches)) {
            if (isset($matches['attr'])) {
                foreach ($matches['attr'] as $matchInner) {
                    $tmpAntiXss = clone $this;
                    $urlPartClean = $tmpAntiXss->clean($matchInner);

                    if ($tmpAntiXss->isXssFound() === true) {
                        $this->_xss_found = true;
                        $str = \str_replace($matchInner, $this->utf8->rawurldecode($urlPartClean), $str);
                    }
                }
            }
        } else {
Coding Style introduced by
The method decodeEntity uses an else expression. Else is never necessary, and the code can be simplified to work without it.
            $str = $this->_entity_decode($this->utf8->rawurldecode($str));
        }

        return $str;
    }

    /**
     * @return bool|null
     */
    public function isXssFound()
    {
        return $this->_xss_found;
    }

    /**
     * Entity-decoding.
     *
     * @param string $str
     *
     * @return string
     */
    private function _entity_decode($str)
Coding Style Naming introduced by
The method _entity_decode is not named in camelCase.

Coding Style Naming introduced by
The variable $HTML_ENTITIES_CACHE is not named in camelCase.

Complexity introduced by
This operation has 13 execution paths, which exceeds the configured maximum of 10.

Coding Style introduced by
Method name "Xss::_entity_decode" is not in camel caps format
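To reduce the 13 execution paths, the lazily built lookup table and the second decoding pass could live in separate units. The sketch below is hypothetical: EntityTable and decodeAgain() are illustrative names, and the table is a four-entry excerpt instead of the full $entitiesSecurity list merged with get_html_translation_table(). Like the original, it only replaces entities that end in ";". The pattern /&[A-Za-z]{2,}/ is equivalent to the original one, whose trailing [;]{0} matches nothing.

```php
<?php
// Hypothetical sketch: builds the lookup table once, like static $HTML_ENTITIES_CACHE.
final class EntityTable
{
    /** @var array<string, string>|null */
    private static $cache = null;

    public static function get(): array
    {
        if (self::$cache === null) {
            // Tiny excerpt of the security table; the real one is much longer.
            self::$cache = [
                '&colon;' => ':',
                '&lpar;' => '(',
                '&rpar;' => ')',
                '&bsol;' => '\\',
            ];
        }

        return self::$cache;
    }
}

// Second pass: replace "&name;" entities that the first html_entity_decode() left alone.
function decodeAgain(string $str): string
{
    if (!preg_match_all('/&[A-Za-z]{2,}/', $str, $matches)) {
        return $str;
    }

    $table = EntityTable::get();
    $replace = [];
    foreach ($matches[0] as $name) {
        $entity = $name . ';';
        if (isset($table[$entity])) {
            $replace[$entity] = $table[$entity];
        }
    }

    return str_replace(array_keys($replace), array_values($replace), $str);
}
```

Each unit now has at most two branches, and the table can be tested independently of the replacement logic.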
    {
        static $HTML_ENTITIES_CACHE;

        $flags = ENT_QUOTES | ENT_HTML5 | ENT_DISALLOWED | ENT_SUBSTITUTE;

        // decode
        $str = html_entity_decode($str, $flags);

        // decode again, e.g. for HHVM or misconfigured applications ...
        if (preg_match_all('/(?<html_entity>&[A-Za-z]{2,}[;]{0})/', $str, $matches)) {
            if ($HTML_ENTITIES_CACHE === null) {
Coding Style introduced by
Blank line found at start of control structure

                // links:
                // - http://dev.w3.org/html5/html-author/charref
                // - http://www.w3schools.com/charsets/ref_html_entities_n.asp
                $entitiesSecurity = [
                    '&#x00000;' => '',
                    '&#0;' => '',
                    '&#x00001;' => '',
                    '&#1;' => '',
                    '&nvgt;' => '',
                    '&#61253;' => '',
                    '&#x0EF45;' => '',
                    '&shy;' => '',
                    '&#x000AD;' => '',
                    '&#173;' => '',
                    '&colon;' => ':',
                    '&#x0003A;' => ':',
                    '&#58;' => ':',
                    '&lpar;' => '(',
                    '&#x00028;' => '(',
                    '&#40;' => '(',
                    '&rpar;' => ')',
                    '&#x00029;' => ')',
                    '&#41;' => ')',
                    '&quest;' => '?',
                    '&#x0003F;' => '?',
                    '&#63;' => '?',
                    '&sol;' => '/',
                    '&#x0002F;' => '/',
                    '&#47;' => '/',
                    '&apos;' => '\'',
                    '&#x00027;' => '\'',
                    '&#039;' => '\'',
                    '&#39;' => '\'',
                    '&#x27;' => '\'',
                    '&bsol;' => '\\',
                    '&#x0005C;' => '\\',
                    '&#92;' => '\\',
                    '&comma;' => ',',
                    '&#x0002C;' => ',',
                    '&#44;' => ',',
                    '&period;' => '.',
                    '&#x0002E;' => '.',
                    '&quot;' => '"',
                    '&QUOT;' => '"',
                    '&#x00022;' => '"',
                    '&#34;' => '"',
                    '&grave;' => '`',
                    '&DiacriticalGrave;' => '`',
                    '&#x00060;' => '`',
                    '&#96;' => '`',
                    '&#46;' => '.',
                    '&equals;' => '=',
                    '&#x0003D;' => '=',
                    '&#61;' => '=',
                    '&newline;' => "\n",
                    '&#x0000A;' => "\n",
                    '&#10;' => "\n",
                    '&tab;' => "\t",
                    '&#x00009;' => "\t",
                    '&#9;' => "\t",
                ];

                $HTML_ENTITIES_CACHE = \array_merge(
                    $entitiesSecurity,
                    \array_flip(\get_html_translation_table(HTML_ENTITIES, $flags)),
                    \array_flip($this->_get_data('entities_fallback'))
                );
            }

            $search = [];
            $replace = [];
            foreach ($matches['html_entity'] as $match) {
                $match .= ';';
                if (isset($HTML_ENTITIES_CACHE[$match])) {
                    $search[$match] = $match;
                    $replace[$match] = $HTML_ENTITIES_CACHE[$match];
                }
            }

            if (\count($replace) > 0) {
                $str = \str_replace($search, $replace, $str);
            }
        }

        return $str;
    }

    private function _get_data($file)
Coding Style Naming introduced by
The method _get_data is not named in camelCase.

Coding Style introduced by
Method name "Xss::_get_data" is not in camel caps format
    {
        /** @noinspection PhpIncludeInspection */
        return include __DIR__ . '/../voku/Data/' . $file . '.php';
    }

    public function cleanString($str)
    {
        return $this->clean($str);
    }

    /**
     * @param $str
     * @return array|mixed
     */
    public function cleanUrl($str)
Coding Style Naming introduced by
The variable $decode_str is not named in camelCase.

Complexity introduced by
This operation has 18 execution paths, which exceeds the configured maximum of 10.
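A possible decomposition of cleanUrl() into one dispatcher and one loop, sketched with free functions. sanitizeOnce() is a hypothetical stand-in for _do() that only URL-decodes and strips "javascript:". Note one deliberate difference: the loop here stops when a pass leaves the string unchanged (the pattern process() uses), whereas cleanUrl() compares rawurldecode($str) with the result of _do().

```php
<?php
// Hypothetical stand-in for Xss::_do(): URL-decode, then strip "javascript:".
function sanitizeOnce(string $str): string
{
    return str_ireplace('javascript:', '', rawurldecode($str));
}

// Dispatcher: each input type gets one early return, keeping the path count low.
function cleanUrlValue($value)
{
    if (is_array($value)) {
        return array_map('cleanUrlValue', $value); // keys are preserved
    }
    if ($value === null || is_numeric($value)) {
        return $value;
    }

    return cleanUrlString((string) $value);
}

// Repeat until a pass no longer changes the string (a fixed point).
function cleanUrlString(string $str): string
{
    do {
        $previous = $str;
        $str = sanitizeOnce($str);
    } while ($str !== $previous);

    return $str;
}
```

Repeating until nothing changes matters for nested payloads such as "javajavascript:script:x", which only become harmless after a second pass.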
    {
        $this->input = $str;
        $str = $this->clean($str);

        if (is_numeric($str) || is_null($str)) {
            return $str;
        }

        if (is_array($str)) {
            foreach ($str as $key => $value) {
                $str[$key] = $this->cleanUrl($value);
            }
            return $str;
        }

        do {
            $decode_str = rawurldecode($str);
            $str = $this->_do($str);
Documentation introduced by
$str is of type object|string|boolean, but the function expects an object<devtoolboxuk\sote...sources\StringResource>.

It seems the type of the argument is not accepted by the function or method being called.

In some cases, in particular if PHP’s automatic type juggling kicks in, this might be fine. In other cases, however, this might be a bug.

We suggest adding an explicit type cast, as in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend using
acceptsInteger((int) $x);
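In this file the mismatch appears to come from _do()'s docblock, which declares @param StringResource although the method immediately casts its argument with (string). A sketch of a signature that states the real contract, using a hypothetical doPass() in place of _do(); the scalar type declaration and the explicit cast at the call site make the inspection's concern visible in code.

```php
<?php
/**
 * Hypothetical stand-in for Xss::_do() with a docblock that matches the code.
 *
 * @param string $str
 *
 * @return string
 */
function doPass(string $str): string
{
    // Placeholder for the real filter chain.
    return trim($str);
}

$value = 123;                      // any non-string value
$result = doPass((string) $value); // explicit cast, as the inspection suggests
```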
        } while ($decode_str !== $str);

        $this->output = $str;

        return $str;
    }

    function result()
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Comprehensibility Best Practice introduced by
It is recommended to declare an explicit visibility for result.
    {
        $valid = false;
        if (!$this->_xss_found) {
Bug Best Practice introduced by
The expression $this->_xss_found of type null|boolean is loosely compared to false; this is ambiguous if the boolean can be false. You might want to explicitly use !== null instead.

If an expression can have both false and null as possible values, it is good practice to always use strict comparison to clearly distinguish between the two.

$a = canBeFalseAndNull();

// Instead of
if ( ! $a) { }

// Better use one of the explicit versions:
if ($a !== null) { }
if ($a !== false) { }
if ($a !== null && $a !== false) { }
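Applied to result(), the strict version might read as below. isValidResult() is a hypothetical helper; $xssFound follows the three states of $_xss_found (null before any check, then true or false). The outcome matches the original !$this->_xss_found, but the null case is now handled explicitly instead of by loose comparison.

```php
<?php
// Hypothetical helper mirroring the check in Xss::result().
// null  => nothing has been checked yet (treated as valid, as in the original)
// false => input was checked and no XSS was found
// true  => XSS was found
function isValidResult(?bool $xssFound): bool
{
    return $xssFound === null || $xssFound === false;
}
```

The single expression `$xssFound !== true` would be equivalent; spelling out both accepted states documents that null is deliberately treated as valid.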
            $valid = true;
        }
        return new SoteriaModel($this->input, $this->output, $valid);
    }
}