Completed
Push — master ( d14a3f...1d5cf0 )
by Rob
01:51
created

Xss::_entity_decode()   B

Complexity

Conditions 6
Paths 5

Size

Total Lines 100

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 6
CRAP Score 19.1203

Importance

Changes 0
Metric Value
dl 0
loc 100
ccs 6
cts 21
cp 0.2857
rs 7.3777
c 0
b 0
f 0
cc 6
nc 5
nop 1
crap 19.1203
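
The CRAP score in the table can be reproduced from the other metrics. The sketch below assumes the commonly cited Change Risk Anti-Patterns formula, cc² · (1 − coverage)³ + cc, and assumes that `cc` is the cyclomatic complexity and `cp` the covered proportion (`ccs / cts`, here 6 / 21). The report does not state the formula or define its abbreviations, so treat both as assumptions. The function name `crapScore` is hypothetical.

```php
<?php

/**
 * CRAP score under the commonly cited formula (an assumption here):
 *   crap = cc^2 * (1 - coverage)^3 + cc
 */
function crapScore(int $cyclomaticComplexity, float $coverage): float
{
    return $cyclomaticComplexity ** 2 * (1.0 - $coverage) ** 3 + $cyclomaticComplexity;
}

// Values from the metric table: cc = 6, cp = 0.2857.
echo round(crapScore(6, 0.2857), 4), PHP_EOL; // prints 19.1203
```

Under this formula a fully covered method scores exactly its cyclomatic complexity, so raising coverage is the cheapest way to lower the score without touching the code.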

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
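
As a minimal sketch of that advice, the example below turns each commented step of a function into a small method named after its comment. All function names are hypothetical and are not taken from the Xss class.

```php
<?php

// Before: one function, with comments marking each step.
function normalizeInputBefore(string $input): string
{
    // trim surrounding whitespace
    $input = trim($input);

    // collapse runs of whitespace into one space
    $input = (string) preg_replace('/\s+/', ' ', $input);

    return $input;
}

// After: each commented step becomes a small function named after its comment.
function trimSurroundingWhitespace(string $input): string
{
    return trim($input);
}

function collapseWhitespaceRuns(string $input): string
{
    return (string) preg_replace('/\s+/', ' ', $input);
}

function normalizeInputAfter(string $input): string
{
    return collapseWhitespaceRuns(trimSurroundingWhitespace($input));
}
```

Both versions return the same result; the second one reads as a list of named steps and no longer needs the comments.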

Commonly applied refactorings include:

<?php

namespace devtoolboxuk\soteria\handlers;

use devtoolboxuk\soteria\models\SoteriaModel;

use devtoolboxuk\soteriautf\Resources\Attributes;
use devtoolboxuk\soteriautf\Resources\Exploded;
use devtoolboxuk\soteriautf\Resources\Html;
use devtoolboxuk\soteriautf\Resources\JavaScript;
use devtoolboxuk\soteriautf\Resources\NeverAllowed;
use devtoolboxuk\soteriautf\Resources\System;
use devtoolboxuk\soteriautf\Resources\StringResource;

use devtoolboxuk\soteriautf\Utf8;
use devtoolboxuk\soteriautf\Utf7;


class Xss
Coding Style
The property $_xss_found is not named in camelCase.

This check marks property names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.
{

    private $_xss_found = null;
    private $neverAllowed;
    private $exploded;
    private $string;
    private $attributes;
    private $javascript;
    private $html;
    private $utf7;
    private $utf8;
    private $strings;
    private $system;
    private $input;
    private $output;


    function __construct()
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for __construct.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also to yourself, how this method should be consumed.

If you are not sure which visibility to choose, it is a good idea to start with the most restrictive visibility, and then raise visibility as needed, i.e. start with private, and only raise it to protected if a sub-class needs to have access, or public if an external class needs access.
    {
        $this->init();
    }

    function init()
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for init.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also to yourself, how this method should be consumed.

If you are not sure which visibility to choose, it is a good idea to start with the most restrictive visibility, and then raise visibility as needed, i.e. start with private, and only raise it to protected if a sub-class needs to have access, or public if an external class needs access.
    {

        $this->neverAllowed = new NeverAllowed();

        $this->exploded = new Exploded();

        $this->attributes = new Attributes();
        $this->javascript = new JavaScript();
        $this->html = new Html();

        $this->system = new System();

        $this->utf7 = new Utf7();
        $this->utf8 = new Utf8();

        $this->strings = new StringResource();
    }


    function setString($str)
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for setString.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also to yourself, how this method should be consumed.

If you are not sure which visibility to choose, it is a good idea to start with the most restrictive visibility, and then raise visibility as needed, i.e. start with private, and only raise it to protected if a sub-class needs to have access, or public if an external class needs access.
    {
        $this->string = $str;
    }

    public function cleanArray($array)
    {
        return $this->clean($array);
    }

    /**
     * @param $str
     * @return array|mixed
     */
    public function clean($str)
    {
        // reset
        $this->_xss_found = null;

        // check for an array of strings
        if (\is_array($str)) {
            foreach ($str as $key => $value) {
                $str[$key] = $this->clean($value);
            }
            return $str;
        }

        $this->input = $str;
        $this->output = $this->process($str);
        return $this->output;
    }

    private function process($str)
Coding Style Naming
The variable $old_str is not named in camelCase.

This check marks variable names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.
    {

        // process
        do {
            $old_str = $str;
            $str = $this->_do($str);
        } while ($old_str !== $str);

        // keep the old value, if there wasn't any XSS attack
        if ($this->_xss_found !== true) {
            $str = $this->input;
        }

        return $str;
    }

    /**
     * @param StringResource $str
     *
     * @return mixed
     */
    private function _do($str)
Coding Style Naming
The method _do is not named in camelCase.

This check marks method names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Coding Style Naming
The variable $str_backup is not named in camelCase.

This check marks variable names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Complexity
This operation has 10 execution paths which exceeds the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.
    {
        $str = (string)$str;
        $strInt = (int)$str;
        $strFloat = (float)$str;
        if (!$str || (string)$strInt === $str || (string)$strFloat === $str) {
Coding Style
Blank line found at start of control structure

            // no xss found
            if ($this->_xss_found !== true) {
                $this->_xss_found = false;
            }

            return $str;
        }

        // remove the BOM from UTF-8 / UTF-16 / UTF-32 strings
        $str = $this->utf8->remove_bom($str);

        // replace the diamond question mark (�) and invalid-UTF8 chars
        $str = $this->utf8->replace_diamond_question_mark($str, '');

        // replace invisible characters with one single space
        $str = $this->utf8->remove_invisible_characters($str, true, ' ');

        $str = $this->utf8->normalize_whitespace($str);
        $str = $this->strings->replace($str);

        // decode UTF-7 characters
        $str = $this->utf7->repack($str);

        // decode the string
        $str = $this->decodeString($str); // RW Partly DONE

        // backup the string (for later comparison)
        $str_backup = $str;

        // remove strings that are never allowed
        $str = $this->neverAllowed->doNeverAllowed($str); //RW DONE

        // corrects words before the browser will do it
        $str = $this->exploded->compactExplodedString($str); //RW DONE

        // remove disallowed javascript calls in links, images etc.
        $str = $this->javascript->removeDisallowedJavascript($str);

        // remove evil attributes such as style, onclick and xmlns
        $str = $this->attributes->removeEvilAttributes($str);

        // sanitize naughty JavaScript elements
        $str = $this->javascript->naughtyJavascript($str);

        // sanitize naughty HTML elements
        $str = $this->html->naughtyHtml($str);

        // final clean up
        //
        // -> This adds a bit of extra precaution in case something got through the above filters.
        $str = $this->neverAllowed->doNeverAllowedAfterwards($str);

        // check for xss
        if ($this->_xss_found !== true) {
            $this->_xss_found = !($str_backup === $str);
        }

        return $str;
    }

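
The Complexity notes on this page count execution paths. As a rough sketch of why that number grows quickly, the example below enumerates the paths through three independent, sequential conditions: each `if` is either taken or skipped, so the counts multiply. The function `tracePath` is hypothetical, and the exact counting rules the analyzer applies are not stated on this page.

```php
<?php

/**
 * Three independent, sequential conditions. Each `if` is either taken or
 * skipped, so the function has 2 * 2 * 2 = 8 distinct execution paths.
 */
function tracePath(bool $a, bool $b, bool $c): string
{
    $trace = '';
    if ($a) {
        $trace .= 'A';
    }
    if ($b) {
        $trace .= 'B';
    }
    if ($c) {
        $trace .= 'C';
    }
    return $trace;
}

// Run every input combination and collect the distinct traces.
$paths = [];
foreach ([false, true] as $a) {
    foreach ([false, true] as $b) {
        foreach ([false, true] as $c) {
            $paths[tracePath($a, $b, $c)] = true;
        }
    }
}

echo count($paths), PHP_EOL; // prints 8
```

Moving one of the conditions into its own method removes its factor from the caller's count, which is why splitting a method is the usual remedy.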
    public function decodeString($str)
    {
        // init
        $regExForHtmlTags = '/<\p{L}+.*+/us';

        if (strpos($str, '<') !== false && preg_match($regExForHtmlTags, $str, $matches) === 1) {
            $str = (string)preg_replace_callback(
                $regExForHtmlTags,
                function ($matches) {
                    return $this->decodeEntity($matches);
                },
                $str
            );
        } else {
Coding Style
The method decodeString uses an else expression. Else is never necessary and you can simplify the code to work without else.
            $str = $this->utf8->rawurldecode($str);
        }

        return $str;
    }

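
The "uses an else expression" notes can usually be addressed with an early return. The sketch below shows the general shape with two hypothetical functions; it is not a rewrite of decodeString or decodeEntity.

```php
<?php

// With else: both branches assign a value, followed by a shared return.
function describeWithElse(string $str): string
{
    if (strpos($str, '<') !== false) {
        $result = 'markup';
    } else {
        $result = 'plain';
    }

    return $result;
}

// Without else: the special case returns early, the default falls through.
function describeWithEarlyReturn(string $str): string
{
    if (strpos($str, '<') !== false) {
        return 'markup';
    }

    return 'plain';
}
```

Both functions behave identically; the second has one less level of nesting and no temporary variable.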
    private function decodeEntity(array $match)
    {
        // init
        $str = $match[0];

        // protect GET variables without XSS in URLs
        if (preg_match_all("/[\?|&]?[\\p{L}0-9_\-\[\]]+\s*=\s*(?<wrapped>\"|\042|'|\047)(?<attr>[^\\1]*?)\\g{wrapped}/ui", $str, $matches)) {
            if (isset($matches['attr'])) {
                foreach ($matches['attr'] as $matchInner) {
                    $tmpAntiXss = clone $this;
                    $urlPartClean = $tmpAntiXss->clean($matchInner);

                    if ($tmpAntiXss->isXssFound() === true) {
                        $this->_xss_found = true;
                        $str = \str_replace($matchInner, $this->utf8->rawurldecode($urlPartClean), $str);
                    }
                }
            }
        } else {
Coding Style
The method decodeEntity uses an else expression. Else is never necessary and you can simplify the code to work without else.
            $str = $this->_entity_decode($this->utf8->rawurldecode($str));
        }

        return $str;
    }

    /**
     * @return bool|null
     */
    public function isXssFound()
    {
        return $this->_xss_found;
    }

    /**
     * Entity-decoding.
     *
     * @param StringResource $str
     *
     * @return StringResource
     */
    private function _entity_decode($str)
Coding Style Naming
The method _entity_decode is not named in camelCase.

This check marks method names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Coding Style Naming
The variable $HTML_ENTITIES_CACHE is not named in camelCase.

This check marks variable names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Complexity
This operation has 13 execution paths which exceeds the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.

Coding Style
Method name "Xss::_entity_decode" is not in camel caps format
    {
        static $HTML_ENTITIES_CACHE;

        $flags = ENT_QUOTES | ENT_HTML5 | ENT_DISALLOWED | ENT_SUBSTITUTE;

        // decode
        $str = html_entity_decode($str, $flags);


        // decode-again, for e.g. HHVM or misconfigured applications ...
        if (preg_match_all('/(?<html_entity>&[A-Za-z]{2,}[;]{0})/', $str, $matches)) {
            if ($HTML_ENTITIES_CACHE === null) {
Coding Style
Blank line found at start of control structure

                // links:
                // - http://dev.w3.org/html5/html-author/charref
                // - http://www.w3schools.com/charsets/ref_html_entities_n.asp
                $entitiesSecurity = [
                    '&#x00000;' => '',
                    '&#0;' => '',
                    '&#x00001;' => '',
                    '&#1;' => '',
                    '&nvgt;' => '',
                    '&#61253;' => '',
                    '&#x0EF45;' => '',
                    '&shy;' => '',
                    '&#x000AD;' => '',
                    '&#173;' => '',
                    '&colon;' => ':',
                    '&#x0003A;' => ':',
                    '&#58;' => ':',
                    '&lpar;' => '(',
                    '&#x00028;' => '(',
                    '&#40;' => '(',
                    '&rpar;' => ')',
                    '&#x00029;' => ')',
                    '&#41;' => ')',
                    '&quest;' => '?',
                    '&#x0003F;' => '?',
                    '&#63;' => '?',
                    '&sol;' => '/',
                    '&#x0002F;' => '/',
                    '&#47;' => '/',
                    '&apos;' => '\'',
                    '&#x00027;' => '\'',
                    '&#039;' => '\'',
                    '&#39;' => '\'',
                    '&#x27;' => '\'',
                    '&bsol;' => '\\',
                    '&#x0005C;' => '\\',
                    '&#92;' => '\\',
                    '&comma;' => ',',
                    '&#x0002C;' => ',',
                    '&#44;' => ',',
                    '&period;' => '.',
                    '&#x0002E;' => '.',
                    '&quot;' => '"',
                    '&QUOT;' => '"',
                    '&#x00022;' => '"',
                    '&#34;' => '"',
                    '&grave;' => '`',
                    '&DiacriticalGrave;' => '`',
                    '&#x00060;' => '`',
                    '&#96;' => '`',
                    '&#46;' => '.',
                    '&equals;' => '=',
                    '&#x0003D;' => '=',
                    '&#61;' => '=',
                    '&newline;' => "\n",
                    '&#x0000A;' => "\n",
                    '&#10;' => "\n",
                    '&tab;' => "\t",
                    '&#x00009;' => "\t",
                    '&#9;' => "\t",
                ];

                $HTML_ENTITIES_CACHE = \array_merge(
                    $entitiesSecurity,
                    \array_flip(\get_html_translation_table(HTML_ENTITIES, $flags)),
                    \array_flip($this->_get_data('entities_fallback'))
                );
            }

            $search = [];
            $replace = [];
            foreach ($matches['html_entity'] as $match) {
                $match .= ';';
                if (isset($HTML_ENTITIES_CACHE[$match])) {
                    $search[$match] = $match;
                    $replace[$match] = $HTML_ENTITIES_CACHE[$match];
                }
            }

            if (\count($replace) > 0) {
                $str = \str_replace($search, $replace, $str);
            }
        }

        return $str;
    }

    private function _get_data($file)
Coding Style Naming
The method _get_data is not named in camelCase.

This check marks method names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Coding Style
Method name "Xss::_get_data" is not in camel caps format
    {
        /** @noinspection PhpIncludeInspection */
        return include __DIR__ . '/../vendor/devtoolboxuk/soteriautf/src/Data/' . $file . '.php';
    }

    public function cleanString($str)
    {
        return $this->clean($str);
    }

    /**
     * @param $str
     * @return array|mixed
     */
    public function cleanUrl($str)
Coding Style Naming
The variable $decode_str is not named in camelCase.

This check marks variable names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name "database connection string" becomes databaseConnectionString.

Complexity
This operation has 18 execution paths which exceeds the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.
    {
        $this->input = $str;
        $str = $this->clean($str);

        if (is_numeric($str) || is_null($str)) {
            return $str;
        }

        if (is_array($str)) {
            foreach ($str as $key => $value) {
                $str[$key] = $this->cleanUrl($value);
            }
            return $str;
        }

        do {
            $decode_str = rawurldecode($str);
            $str = $this->_do($str);
Documentation
$str is of type object|string|boolean, but the function expects an object<devtoolboxuk\sote...sources\StringResource>.

It seems like the type of the argument is not accepted by the function/method which you are calling.

In some cases, in particular if PHP’s automatic type-juggling kicks in, this might be fine. In other cases, however, this might be a bug.

We suggest adding an explicit type cast, as in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend to use
acceptsInteger((int) $x);
        } while ($decode_str !== $str);

        $this->output = $str;

        return $str;
    }

    function result()
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for result.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also to yourself, how this method should be consumed.

If you are not sure which visibility to choose, it is a good idea to start with the most restrictive visibility, and then raise visibility as needed, i.e. start with private, and only raise it to protected if a sub-class needs to have access, or public if an external class needs access.
    {
        $valid = false;
        if (!$this->_xss_found) {
Bug Best Practice
The expression $this->_xss_found of type null|boolean is loosely compared to false; this is ambiguous if the boolean can be false. You might want to explicitly use !== null instead.

If an expression can have both false and null as possible values, it is generally good practice to always use strict comparison to clearly distinguish between those two values.

$a = canBeFalseAndNull();

// Instead of
if ( ! $a) { }

// Better use one of the explicit versions:
if ($a !== null) { }
if ($a !== false) { }
if ($a !== null && $a !== false) { }
            $valid = true;
        }
        return new SoteriaModel($this->input, $this->output, $valid);
    }
}
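
To make the null|boolean note on result() concrete, the sketch below compares a loose check with a strict one for a flag that can be null (nothing checked yet), false (checked, nothing found), or true. The function names are hypothetical, and whether the null state should count as valid is a design decision this page does not settle.

```php
<?php

// Loose check, as in `if (!$this->_xss_found)`: null and false are treated alike.
function isValidLoose(?bool $xssFound): bool
{
    return !$xssFound;
}

// Strict check: only an explicit "checked, nothing found" counts as valid.
function isValidStrict(?bool $xssFound): bool
{
    return $xssFound === false;
}

var_dump(isValidLoose(null));   // bool(true)
var_dump(isValidStrict(null));  // bool(false)
var_dump(isValidStrict(false)); // bool(true)
```

With the loose check, calling result() before any clean call would report the input as valid; the strict variant makes that case visible instead of silently passing it.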