Completed
Push — master ( 7bccf6...d15e2a )
by Rob
01:53
created

Xss::cleanString()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 4

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 2

Importance

Changes 0
Metric Value
dl 0
loc 4
ccs 0
cts 2
cp 0
rs 10
c 0
b 0
f 0
cc 1
nc 1
nop 1
crap 2
<?php

namespace devtoolboxuk\soteria\handlers;

use devtoolboxuk\soteria\voku\Resources\Attributes;

use devtoolboxuk\soteria\voku\Resources\Exploded;
use devtoolboxuk\soteria\voku\Resources\Html;
use devtoolboxuk\soteria\voku\Resources\JavaScript;
use devtoolboxuk\soteria\voku\Resources\NeverAllowed;

use devtoolboxuk\soteria\voku\Resources\System;
use devtoolboxuk\soteria\voku\Resources\Utf7;
use devtoolboxuk\soteria\voku\Resources\Utf8;
use devtoolboxuk\soteria\voku\Resources\StringResource;

class Xss
Coding Style
The property $_xss_found is not named in camelCase.

This check marks property names that have not been written in camelCase.

In camelCase, names are written without any punctuation, and the start of each new word is marked by a capital letter. Thus the name database connection string becomes databaseConnectionString.
{

    private $_xss_found = null;
    private $neverAllowed;
    private $exploded;
    private $string;
    private $attributes;
    private $javascript;
    private $html;
    private $utf7;
    private $utf8;
    private $strings;
    private $system;

    function __construct()
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for __construct.

Generally, we recommend declaring visibility for all methods in your source code. This has the advantage of clearly communicating to other developers, and also to yourself, how this method should be consumed.

If you are not sure which visibility to choose, it is a good idea to start with the most restrictive visibility and then raise it as needed, i.e. start with private, and only raise it to protected if a subclass needs access, or to public if an external class needs access.
    {
        $this->init();
    }

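The visibility findings on this class can be addressed by stating each method's intended audience explicitly. The following is only a minimal sketch, assuming the constructor is part of the public API and that init() is an internal helper that no outside caller relies on; the class name XssVisibilitySketch, the $ready property, and isReady() are hypothetical and exist only to make the example observable.

```php
<?php

// Hypothetical stand-in for the reviewed class; not the project's code.
class XssVisibilitySketch
{
    /** @var bool set by init(); present only so the sketch has visible state */
    private $ready = false;

    // Constructors are invoked from outside the class, so they are public.
    public function __construct()
    {
        $this->init();
    }

    // Start with the most restrictive visibility and widen it only if needed.
    private function init()
    {
        $this->ready = true;
    }

    public function isReady()
    {
        return $this->ready;
    }
}

$sketch = new XssVisibilitySketch();
var_dump($sketch->isReady()); // bool(true)
```

Note that a method declared without a visibility keyword is public in PHP, so narrowing init() to private is a behavior change if any external code calls it.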
    function init()
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for init.
    {
        $this->neverAllowed = new NeverAllowed();

        $this->exploded = new Exploded();

        $this->attributes = new Attributes();
        $this->javascript = new JavaScript();
        $this->html = new Html();

        $this->system = new System();

        $this->utf7 = new Utf7();
        $this->utf8 = new Utf8();

        $this->strings = new StringResource();
    }

    function setString($str)
Best Practice
It is generally recommended to explicitly declare the visibility for methods.

Comprehensibility Best Practice
It is recommended to declare an explicit visibility for setString.
    {
        $this->string = $str;
    }

    public function cleanArray($array)
    {
        return $this->clean($array);
    }

    /**
     * @param $str
     * @return array|mixed
     */
    public function clean($str)
Coding Style Naming
The variable $old_str_backup is not named in camelCase.
    {
        // reset
        $this->_xss_found = null;

        // check for an array of strings
        if (\is_array($str)) {
            foreach ($str as $key => $value) {
                $str[$key] = $this->clean($value);
            }
            return $str;
        }

        $old_str_backup = $str;

        return $this->process($str, $old_str_backup);
    }

    private function process($str, $old_str_backup)
Coding Style Naming
The parameter $old_str_backup is not named in camelCase.

Coding Style Naming
The variable $old_str is not named in camelCase.

Coding Style Naming
The variable $old_str_backup is not named in camelCase.
    {

        // process
        do {
            $old_str = $str;
            $str = $this->_do($str);
        } while ($old_str !== $str);

        // keep the old value, if there wasn't any XSS attack
        if ($this->_xss_found !== true) {
            $str = $old_str_backup;
        }

        return $str;
    }

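The do/while loop in process() re-runs the filter until its output stops changing, which matters because removing one pattern can splice a new one together. A minimal sketch of that idea, assuming a toy single-pass filter; stripScriptOnce() and stripUntilStable() are hypothetical names and are far simpler than the project's _do():

```php
<?php

// Hypothetical single-pass filter: removes every literal "<script>" it sees once.
function stripScriptOnce($str)
{
    return str_replace('<script>', '', $str);
}

// Repeat the filter until a pass leaves the string unchanged (a fixed point).
function stripUntilStable($str)
{
    do {
        $previous = $str;
        $str = stripScriptOnce($str);
    } while ($previous !== $str);

    return $str;
}

$nested = '<scr<script>ipt>alert(1)';

echo stripScriptOnce($nested), "\n";  // <script>alert(1)
echo stripUntilStable($nested), "\n"; // alert(1)
```

A single pass leaves a freshly assembled tag behind; the loop only terminates once a pass makes no further change.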
    /**
     * @param StringResource $str
     *
     * @return mixed
     */
    private function _do($str)
Coding Style Naming
The method _do is not named in camelCase.

Coding Style Naming
The variable $str_backup is not named in camelCase.

Complexity
This operation has 10 execution paths which exceeds the configured maximum of 10.

A high number of execution paths generally suggests many nested conditional statements and makes the code less readable. This can usually be fixed by splitting the method into several smaller methods.

You can also find more information in the “Code” section of your repository.
    {
        $str = (string)$str;
        $strInt = (int)$str;
        $strFloat = (float)$str;
        if (!$str || (string)$strInt === $str || (string)$strFloat === $str) {
Coding Style
Blank line found at start of control structure

            // no xss found
            if ($this->_xss_found !== true) {
                $this->_xss_found = false;
            }

            return $str;
        }

        // remove the BOM from UTF-8 / UTF-16 / UTF-32 strings
        $str = $this->utf8->remove_bom($str);

        // replace the diamond question mark (�) and invalid-UTF8 chars
        $str = $this->utf8->replace_diamond_question_mark($str, '');

        // replace invisible characters with one single space
        $str = $this->utf8->remove_invisible_characters($str, true, ' ');

        $str = $this->utf8->normalize_whitespace($str);
        $str = $this->strings->replace($str);

        // decode UTF-7 characters
        $str = $this->utf7->repack($str);

        // decode the string
        $str = $this->decodeString($str); // RW Partly DONE

        // backup the string (for later comparison)
        $str_backup = $str;

        // remove strings that are never allowed
        $str = $this->neverAllowed->doNeverAllowed($str); //RW DONE

        // corrects words before the browser will do it
        $str = $this->exploded->compactExplodedString($str); //RW DONE

        // remove disallowed javascript calls in links, images etc.
        $str = $this->javascript->removeDisallowedJavascript($str);

        // remove evil attributes such as style, onclick and xmlns
        $str = $this->attributes->removeEvilAttributes($str);

        // sanitize naughty JavaScript elements
        $str = $this->javascript->naughtyJavascript($str);

        // sanitize naughty HTML elements
        $str = $this->html->naughtyHtml($str);

        // final clean up
        //
        // -> This adds a bit of extra precaution in case something got through the above filters.
        $str = $this->neverAllowed->doNeverAllowedAfterwards($str);

        // check for xss
        if ($this->_xss_found !== true) {
            $this->_xss_found = !($str_backup === $str);
        }

        return $str;
    }

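The complexity finding on _do() recommends splitting the method into smaller ones. One possible decomposition is sketched here with free functions: the early-exit check moves into its own predicate and the filter chain into a loop. The names isTriviallySafe(), applyFilters(), and sanitize() are hypothetical, the two filters are toy stand-ins for the project's resource classes, and the $_xss_found bookkeeping is deliberately left out.

```php
<?php

// Hypothetical predicate: empty values and plain integer/float strings need no filtering.
function isTriviallySafe($str)
{
    return !$str
        || (string)(int)$str === $str
        || (string)(float)$str === $str;
}

// Hypothetical helper: feeds the string through each filter in order.
function applyFilters($str, array $filters)
{
    foreach ($filters as $filter) {
        $str = $filter($str);
    }

    return $str;
}

function sanitize($str, array $filters)
{
    $str = (string)$str;

    if (isTriviallySafe($str)) {
        return $str;
    }

    return applyFilters($str, $filters);
}

// Toy filters standing in for the real cleaning steps.
$filters = [
    'trim',
    function ($s) {
        return str_replace("\0", '', $s);
    },
];

var_dump(sanitize('42', $filters));
var_dump(sanitize("  a\0b ", $filters));
```

Each function now has one or two branches, so the path count that the check measures is spread across small units instead of accumulating in a single method.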
    public function decodeString($str)
    {
        // init
        $regExForHtmlTags = '/<\p{L}+.*+/us';

        if (strpos($str, '<') !== false && preg_match($regExForHtmlTags, $str, $matches) === 1) {
            $str = (string)preg_replace_callback(
                $regExForHtmlTags,
                function ($matches) {
                    return $this->decodeEntity($matches);
                },
                $str
            );
        } else {
Coding Style
The method decodeString uses an else expression. Else is never necessary and you can simplify the code to work without else.
            $str = $this->utf8->rawurldecode($str);
        }

        return $str;
    }

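The else finding on decodeString() can be resolved with a guard clause: handle the simpler branch first and return, so the remaining code needs no else. This is a sketch only; decodeStringSketch() and decodeTagsSketch() are hypothetical, and decodeTagsSketch() merely stands in for the class's decodeEntity() so that the example runs on its own.

```php
<?php

// Stand-in for the class's decodeEntity(); hypothetical and much simpler.
function decodeTagsSketch($str)
{
    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5);
}

function decodeStringSketch($str)
{
    $regExForHtmlTags = '/<\p{L}+.*+/us';

    // Guard clause: without an HTML tag, URL decoding is all that is needed.
    if (strpos($str, '<') === false || preg_match($regExForHtmlTags, $str) !== 1) {
        return rawurldecode($str);
    }

    return (string)preg_replace_callback(
        $regExForHtmlTags,
        function ($matches) {
            return decodeTagsSketch($matches[0]);
        },
        $str
    );
}

echo decodeStringSketch('a%20b'), "\n";        // a b
echo decodeStringSketch('<b>&amp;</b>'), "\n"; // <b>&</b>
```

The guard is the logical negation of the original condition, so both branches behave as before; only the nesting changes.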
    private function decodeEntity(array $match)
    {
        // init
        $str = $match[0];

        // protect GET variables without XSS in URLs
        if (preg_match_all("/[\?|&]?[\\p{L}0-9_\-\[\]]+\s*=\s*(?<wrapped>\"|\042|'|\047)(?<attr>[^\\1]*?)\\g{wrapped}/ui", $str, $matches)) {
            if (isset($matches['attr'])) {
                foreach ($matches['attr'] as $matchInner) {
                    $tmpAntiXss = clone $this;
                    $urlPartClean = $tmpAntiXss->clean($matchInner);

                    if ($tmpAntiXss->isXssFound() === true) {
                        $this->_xss_found = true;
                        $str = \str_replace($matchInner, $this->utf8->rawurldecode($urlPartClean), $str);
                    }
                }
            }
        } else {
Coding Style
The method decodeEntity uses an else expression. Else is never necessary and you can simplify the code to work without else.
            $str = $this->_entity_decode($this->utf8->rawurldecode($str));
        }

        return $str;
    }

    /**
     * @return bool|null
     */
    public function isXssFound()
    {
        return $this->_xss_found;
    }

    /**
     * Entity-decoding.
     *
     * @param StringResource $str
     *
     * @return StringResource
     */
    private function _entity_decode($str)
Coding Style Naming
The method _entity_decode is not named in camelCase.

Coding Style Naming
The variable $HTML_ENTITIES_CACHE is not named in camelCase.

Complexity
This operation has 13 execution paths which exceeds the configured maximum of 10.

Coding Style
Method name "Xss::_entity_decode" is not in camel caps format
    {
        static $HTML_ENTITIES_CACHE;

        $flags = ENT_QUOTES | ENT_HTML5 | ENT_DISALLOWED | ENT_SUBSTITUTE;

        // decode
        $str = html_entity_decode($str, $flags);

        // decode again, e.g. for HHVM or misconfigured applications ...
        if (preg_match_all('/(?<html_entity>&[A-Za-z]{2,}[;]{0})/', $str, $matches)) {
            if ($HTML_ENTITIES_CACHE === null) {
Coding Style
Blank line found at start of control structure

                // links:
                // - http://dev.w3.org/html5/html-author/charref
                // - http://www.w3schools.com/charsets/ref_html_entities_n.asp
                $entitiesSecurity = [
                    '&#x00000;' => '',
                    '&#0;' => '',
                    '&#x00001;' => '',
                    '&#1;' => '',
                    '&nvgt;' => '',
                    '&#61253;' => '',
                    '&#x0EF45;' => '',
                    '&shy;' => '',
                    '&#x000AD;' => '',
                    '&#173;' => '',
                    '&colon;' => ':',
                    '&#x0003A;' => ':',
                    '&#58;' => ':',
                    '&lpar;' => '(',
                    '&#x00028;' => '(',
                    '&#40;' => '(',
                    '&rpar;' => ')',
                    '&#x00029;' => ')',
                    '&#41;' => ')',
                    '&quest;' => '?',
                    '&#x0003F;' => '?',
                    '&#63;' => '?',
                    '&sol;' => '/',
                    '&#x0002F;' => '/',
                    '&#47;' => '/',
                    '&apos;' => '\'',
                    '&#x00027;' => '\'',
                    '&#039;' => '\'',
                    '&#39;' => '\'',
                    '&#x27;' => '\'',
                    '&bsol;' => '\\',
                    '&#x0005C;' => '\\',
                    '&#92;' => '\\',
                    '&comma;' => ',',
                    '&#x0002C;' => ',',
                    '&#44;' => ',',
                    '&period;' => '.',
                    '&#x0002E;' => '.',
                    '&quot;' => '"',
                    '&QUOT;' => '"',
                    '&#x00022;' => '"',
                    '&#34;' => '"',
                    '&grave;' => '`',
                    '&DiacriticalGrave;' => '`',
                    '&#x00060;' => '`',
                    '&#96;' => '`',
                    '&#46;' => '.',
                    '&equals;' => '=',
                    '&#x0003D;' => '=',
                    '&#61;' => '=',
                    '&newline;' => "\n",
                    '&#x0000A;' => "\n",
                    '&#10;' => "\n",
                    '&tab;' => "\t",
                    '&#x00009;' => "\t",
                    '&#9;' => "\t",
                ];

                $HTML_ENTITIES_CACHE = \array_merge(
                    $entitiesSecurity,
                    \array_flip(\get_html_translation_table(HTML_ENTITIES, $flags)),
                    \array_flip($this->_get_data('entities_fallback'))
                );
            }

            $search = [];
            $replace = [];
            foreach ($matches['html_entity'] as $match) {
                $match .= ';';
                if (isset($HTML_ENTITIES_CACHE[$match])) {
                    $search[$match] = $match;
                    $replace[$match] = $HTML_ENTITIES_CACHE[$match];
                }
            }

            if (\count($replace) > 0) {
                $str = \str_replace($search, $replace, $str);
            }
        }

        return $str;
    }

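The second pass in _entity_decode() finds named entities that survived html_entity_decode(), appends the semicolon to each captured name, and replaces those present in a lookup table. A reduced sketch of just that lookup step follows, assuming a three-entry table; decodeWithTable() and $table are hypothetical, and the no-op `[;]{0}` from the original pattern is omitted.

```php
<?php

// Small hypothetical lookup table; the reviewed code merges a much larger one.
$table = [
    '&colon;' => ':',
    '&lpar;'  => '(',
    '&rpar;'  => ')',
];

function decodeWithTable($str, array $table)
{
    // Capture "&name" without the trailing semicolon.
    if (!preg_match_all('/(?<html_entity>&[A-Za-z]{2,})/', $str, $matches)) {
        return $str;
    }

    $search = [];
    $replace = [];
    foreach ($matches['html_entity'] as $match) {
        $match .= ';';
        if (isset($table[$match])) {
            // Keying by entity de-duplicates repeats and keeps both arrays aligned.
            $search[$match] = $match;
            $replace[$match] = $table[$match];
        }
    }

    return $search ? str_replace($search, $replace, $str) : $str;
}

echo decodeWithTable('javascript&colon;alert&lpar;1&rpar;', $table), "\n"; // javascript:alert(1)
```

Because the search strings include the semicolon, an entity written without one (for example `&colon` alone) is matched by the pattern but left untouched by str_replace().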
    private function _get_data($file)
Coding Style Naming
The method _get_data is not named in camelCase.

Coding Style
Method name "Xss::_get_data" is not in camel caps format
    {
        /** @noinspection PhpIncludeInspection */
        return include __DIR__ . '/../voku/Data/' . $file . '.php';
    }

    public function cleanString($str)
    {
        return $this->clean($str);
    }

    /**
     * @param $str
     * @return array|mixed
     */
    public function cleanUrl($str)
Coding Style Naming
The variable $decode_str is not named in camelCase.

Complexity
This operation has 18 execution paths which exceeds the configured maximum of 10.
    {
        $str = $this->clean($str);

        if (is_numeric($str) || is_null($str)) {
            return $str;
        }

        if (is_array($str)) {
            foreach ($str as $key => $value) {
                $str[$key] = $this->cleanUrl($value);
            }
            return $str;
        }

        do {
            $decode_str = rawurldecode($str);
            $str = $this->_do($str);
Documentation
$str is of type object|string|boolean, but the function expects an object<devtoolboxuk\sote...sources\StringResource>.

It seems like the type of the argument is not accepted by the function/method which you are calling.

In some cases, in particular if PHP’s automatic type-juggling kicks in, this might be fine. In other cases, however, this might be a bug.

We suggest adding an explicit type cast like in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend using
acceptsInteger((int) $x);
        } while ($decode_str !== $str);

        return $str;
    }
}