Utf8EncodingSniff   A

Complexity

Total Complexity 31

Size/Duplication

Total Lines 188
Duplicated Lines 6.91 %

Coupling/Cohesion

Components 1
Dependencies 1

Importance

Changes 0
Metric Value
dl 13
loc 188
rs 9.8
c 0
b 0
f 0
wmc 31
lcom 1
cbo 1

6 Methods

Rating   Name                  Duplication   Size   Complexity
A        register()            0             7      1
B        process()             13            25     6
B        _checkUtf8W3c()       0             26     3
C        _checkUtf8Rfc3629()   0             34     13
B        mb_chunk_split()      0             16     5
A        mbStringToArray()     0             10     3

Duplicated Code

Duplicated code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

<?php
/**
 * CodeIgniter_Sniffs_Files_Utf8EncodingSniff.
 *
 * PHP version 5
 *
 * @category  PHP
 * @package   PHP_CodeSniffer
 * @author    Thomas Ernest <[email protected]>
 * @copyright 2006 Thomas Ernest
 * @license   http://thomas.ernest.fr/developement/php_cs/licence GNU General Public License
 * @link      http://pear.php.net/package/PHP_CodeSniffer
 */

/**
 * CodeIgniter_Sniffs_Files_Utf8EncodingSniff.
 *
 * Ensures that PHP files are encoded with Unicode (UTF-8) encoding.
 *
 * @category  PHP
 * @package   PHP_CodeSniffer
 * @author    Thomas Ernest <[email protected]>
 * @copyright 2006 Thomas Ernest
 * @license   http://thomas.ernest.fr/developement/php_cs/licence GNU General Public License
 * @link      http://pear.php.net/package/PHP_CodeSniffer
 */

namespace CodeIgniter\Sniffs\Files;

use PHP_CodeSniffer\Sniffs\Sniff;
use PHP_CodeSniffer\Files\File;

class Utf8EncodingSniff implements Sniff
{

    /**
     * Returns an array of tokens this test wants to listen for.
     *
     * @return array
     */
    public function register()
    {
        return array(
            T_OPEN_TAG
        );

    }//end register()


    /**
     * Processes this test, when one of its tokens is encountered.
     *
     * @param File $phpcsFile The current file being scanned.
     * @param int  $stackPtr  The position of the current token
     *                        in the stack passed in $tokens.
     *
     * @return void
     */
    public function process(File $phpcsFile, $stackPtr)
    {
        // We are only interested if this is the first open tag.
        if ($stackPtr !== 0) {

Duplication: This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

            if ($phpcsFile->findPrevious(T_OPEN_TAG, ($stackPtr - 1)) !== false) {
                return;
            }
        }

        $file_path = $phpcsFile->getFilename();
        $file_name = basename($file_path);
        $file_content = file_get_contents($file_path);
        if (false === mb_check_encoding($file_content, 'UTF-8')) {
            $error = 'File "' . $file_name . '" should be saved with Unicode (UTF-8) encoding.';
            $phpcsFile->addError($error, 0);

Bug: The call to addError() misses a required argument $code.

This check looks for function calls that miss required arguments.

        }
        if ( ! self::_checkUtf8W3c($file_content)) {

Duplication: This code seems to be duplicated across your project.

            $error = 'File "' . $file_name . '" should be saved with Unicode (UTF-8) encoding, but it did not successfully pass the W3C test.';

Coding Style: This line exceeds the maximum limit of 120 characters; it contains 143 characters.

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

            $phpcsFile->addError($error, 0);

Bug: The call to addError() misses a required argument $code.

        }
        if ( ! self::_checkUtf8Rfc3629($file_content)) {

Duplication: This code seems to be duplicated across your project.

            $error = 'File "' . $file_name . '" should be saved with Unicode (UTF-8) encoding, but it did not meet RFC3629 requirements.';

Coding Style: This line exceeds the maximum limit of 120 characters; it contains 138 characters.

            $phpcsFile->addError($error, 0);

Bug: The call to addError() misses a required argument $code.

        }
    }//end process()
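
The three check-and-report blocks in process() differ only in the validator they call and in the tail of the message, which is what the duplication annotations point at. One possible way to fold them into a single loop is sketched below. The function name collectEncodingErrors, the error codes (InvalidEncoding, W3cCheckFailed) and the stand-in validators are hypothetical and not part of the sniff. Inside the sniff, each returned pair would presumably be forwarded as $phpcsFile->addError($message, 0, $code), which would also supply the $code argument the annotations report as missing.

```php
<?php
declare(strict_types=1);

/**
 * Runs each named check against the content and returns one
 * [message, code] pair per failing check.
 *
 * @param string $fileName Base name used in the messages.
 * @param string $content  Raw file content.
 * @param array  $checks   Hypothetical map: error code => [validator, message suffix].
 *
 * @return array List of [message, code] pairs, empty when every check passes.
 */
function collectEncodingErrors(string $fileName, string $content, array $checks): array
{
    $errors = [];
    foreach ($checks as $code => [$isValid, $suffix]) {
        if (!$isValid($content)) {
            $message  = 'File "' . $fileName . '" should be saved with Unicode (UTF-8) encoding' . $suffix;
            $errors[] = [$message, $code];
        }
    }
    return $errors;
}

// Stand-in validators for illustration only; the sniff would plug in
// its own private checks here. The second one is NOT the W3C regex.
$checks = [
    'InvalidEncoding' => [
        static function (string $s): bool { return mb_check_encoding($s, 'UTF-8'); },
        '.',
    ],
    'W3cCheckFailed' => [
        static function (string $s): bool { return preg_match('//u', $s) === 1; },
        ', but it did not successfully pass the W3C test.',
    ],
];

// "caf\xE9" is Latin-1 encoded, so both stand-in checks fail.
$errors = collectEncodingErrors('example.php', "caf\xE9", $checks);
```

Keeping the code next to the message in one table also makes it harder to add a new check and forget its error code.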

    /**
     * Checks that the string $content contains only valid UTF-8 chars
     * using W3C's method.
     * Returns true if $content contains only UTF-8 chars, false otherwise.
     *
     * @param string $content String to check.
     *
     * @return bool true if $content contains only UTF-8 chars, false otherwise.
     *
     * @see http://w3.org/International/questions/qa-forms-utf-8.html
     */
    private static function _checkUtf8W3c($content)
    {
        $content_chunks=self::mb_chunk_split($content, 4096, '');
        foreach($content_chunks as $content_chunk)

Bug: The expression $content_chunks of type false|array is not guaranteed to be traversable. How about adding an additional type check?

There are several ways to fix this problem.

  1. If you want to be on the safe side, you can add an additional type-check:

    $collection = json_decode($data, true);
    if ( ! is_array($collection)) {
        throw new \RuntimeException('$collection must be an array.');
    }

    foreach ($collection as $item) { /** ... */ }

  2. If you are sure that the expression is traversable, you might want to add a doc comment cast to improve IDE auto-completion and static analysis:

    /** @var array $collection */
    $collection = json_decode($data, true);

    foreach ($collection as $item) { /** .. */ }

  3. Mark the issue as a false-positive: Just hover the remove button, in the top-right corner of this issue for more options.

        {
            $preg_result= preg_match(
            '%^(?:
                  [\x09\x0A\x0D\x20-\x7E]            # ASCII
                | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
                |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
                | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
                |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
                |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
                | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
                |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
            )*$%xs',
            $content_chunk
            );
            if($preg_result!==1)
            {
                return false;
            }

        }
        return true;
    }//end _checkUtf8W3c()
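
For an empty file, mb_chunk_split() returns false, so the foreach in _checkUtf8W3c() receives a non-array. A minimal sketch of the type guard suggested by the annotation is shown below, as a standalone function so it can be run on its own. The name matchesEveryChunk and the constant UTF8_W3C_PATTERN are hypothetical; the pattern itself is copied from the listing. Treating a non-array as "nothing to check" is an assumption: it gives the same result as the original for an empty file (true), without the foreach warning.

```php
<?php
declare(strict_types=1);

// Pattern copied from the listing; the constant name is an assumption.
const UTF8_W3C_PATTERN = '%^(?:
      [\x09\x0A\x0D\x20-\x7E]            # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
)*$%xs';

/**
 * Returns true when every chunk matches the pattern.
 *
 * @param mixed  $chunks  Array of strings, or false when there was no input.
 * @param string $pattern PCRE pattern each chunk must match.
 */
function matchesEveryChunk($chunks, string $pattern): bool
{
    // Guard: a non-array means there is nothing to validate.
    if (!is_array($chunks)) {
        return true;
    }
    foreach ($chunks as $chunk) {
        // preg_match() returns 1 on match, 0 on no match, false on error.
        if (preg_match($pattern, $chunk) !== 1) {
            return false;
        }
    }
    return true;
}
```

Comparing against 1, as the original does, means a PCRE error (false) is also treated as a failed check rather than silently passing.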

    /**
     * Checks that the string $content contains only valid UTF-8 chars
     * using the method described in RFC 3629.
     * Returns true if $content contains only UTF-8 chars, false otherwise.
     *
     * @param string $content String to check.
     *
     * @return bool true if $content contains only UTF-8 chars, false otherwise.
     *
     * @see http://www.php.net/manual/en/function.mb-detect-encoding.php#85294
     */
    private static function _checkUtf8Rfc3629($content)
    {
        $len = strlen($content);
        for ($i = 0; $i < $len; $i++) {
            $c = ord($content[$i]);
            if ($c > 128) {
                if (($c >= 254)) {
                    return false;
                } elseif ($c >= 252) {
                    $bits=6;

Unused Code: $bits is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

    $myVar = 'Value';
    $higher = false;

    if (rand(1, 6) > 3) {
        $higher = true;
    } else {
        $higher = false;
    }

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten for every possible time line.

                } elseif ($c >= 248) {
                    $bits=5;

Unused Code: $bits is not used, you could remove the assignment.

                } elseif ($c >= 240) {
                    $bytes = 4;
                } elseif ($c >= 224) {
                    $bytes = 3;
                } elseif ($c >= 192) {
                    $bytes = 2;
                } else {
                    return false;
                } if (($i + $bytes) > $len) {

Bug: The variable $bytes does not seem to be defined for all execution paths leading up to this point.

If you define a variable conditionally, it can happen that it is not defined for all execution paths.

Let’s take a look at an example:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;

            case 'bar':
                $x = 2;
                break;
        }

        // $x is potentially undefined here.
        echo $x;
    }

In the above example, the variable $x is defined if you pass “foo” or “bar” as argument for $a. However, since the switch statement has no default case statement, if you pass any other value, the variable $x would be undefined.

Available Fixes

  1. Check for existence of the variable explicitly:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;

            case 'bar':
                $x = 2;
                break;
        }

        if (isset($x)) { // Make sure it's always set.
            echo $x;
        }
    }

  2. Define a default value for the variable:

    function myFunction($a) {
        $x = ''; // Set a default which gets overridden for certain paths.
        switch ($a) {
            case 'foo':
                $x = 1;
                break;

            case 'bar':
                $x = 2;
                break;
        }

        echo $x;
    }

  3. Add a value for the missing path:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;

            case 'bar':
                $x = 2;
                break;

            // We add support for the missing case.
            default:
                $x = '';
                break;
        }

        echo $x;
    }

                    return false;
                } while ($bytes > 1) {
                    $i++;
                    $b = ord($content[$i]);
                    if ($b < 128 || $b > 191) {
                        return false;
                    }
                    $bytes--;
                }
            }
        }
        return true;
    }//_checkUtf8Rfc3629()
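
Two details of _checkUtf8Rfc3629() are worth spelling out. For lead bytes 248 to 253 only $bits is assigned, so at the length check $bytes is either undefined or left over from an earlier iteration. Separately, the test $c > 128 lets the byte 128 (0x80, a continuation byte with no lead byte) through as if it were ASCII. Since RFC 3629 limits UTF-8 to sequences of at most four octets, one option is to reject those lead bytes outright. The sketch below does that as a standalone function; the name hasValidUtf8Structure is hypothetical. Like the original, it only checks lead and continuation byte structure and does not reject overlong forms, surrogates, or code points above U+10FFFF.

```php
<?php
declare(strict_types=1);

/**
 * Structural UTF-8 check: every lead byte is followed by the right
 * number of continuation bytes, and no sequence is longer than 4 bytes.
 */
function hasValidUtf8Structure(string $content): bool
{
    $len = strlen($content);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($content[$i]);
        if ($c < 0x80) {
            continue;               // plain ASCII byte
        }
        if ($c >= 0xF8) {
            return false;           // 5-/6-byte leads, 0xFE, 0xFF
        } elseif ($c >= 0xF0) {
            $bytes = 4;
        } elseif ($c >= 0xE0) {
            $bytes = 3;
        } elseif ($c >= 0xC0) {
            $bytes = 2;
        } else {
            return false;           // stray continuation byte 0x80-0xBF
        }
        if ($i + $bytes > $len) {
            return false;           // sequence truncated at end of input
        }
        for ($k = 1; $k < $bytes; $k++) {
            $b = ord($content[$i + $k]);
            if ($b < 0x80 || $b > 0xBF) {
                return false;       // expected a continuation byte
            }
        }
        $i += $bytes - 1;           // skip the bytes just validated
    }
    return true;
}
```

With every branch either assigning $bytes or returning, the "not defined for all execution paths" warning no longer applies, and the unused $bits assignments disappear.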

    /**
     * Splits a string to chunks of given size
     * This helps to avoid segmentation fault errors when large text is given
     * Returns array of strings after splitting
     *
     * @param string $str String to split.
     * @param int $len number of characters per chunk
     *
     * @return array string array after splitting

Documentation: Should the return type not be false|array?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.

     *
     * @see http://php.net/manual/en/function.chunk-split.php
     */
    private static function mb_chunk_split($str, $len, $glue)
    {
        if (empty($str)) return false;
        $array = self::mbStringToArray ($str);
        $n = -1;
        $new = Array();
        foreach ($array as $char) {

Bug: The expression $array of type false|array is not guaranteed to be traversable. How about adding an additional type check?

            $n++;
            if ($n < $len) $new []= $char;
            elseif ($n == $len) {
                $new []= $glue . $char;
                $n = 0;
            }
        }
        return $new;
    }//mb_chunk_split
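
As written, mb_chunk_split() appears to append every character as its own array element (prefixing $glue once $len characters have passed) rather than building chunks of $len characters, and it returns false for empty input, which is what the return-type annotation flags. A minimal sketch of an alternative is given below, assuming the goal is only to hand the regex bounded pieces that never cut a multi-byte sequence in half. It works on bytes, always returns an array, and drops the $glue parameter. The name splitOnUtf8Boundaries is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Splits a byte string into chunks of at most $maxBytes bytes, moving
 * each cut back so it does not land on a UTF-8 continuation byte.
 *
 * @return string[] Always an array; empty for empty input.
 */
function splitOnUtf8Boundaries(string $bytes, int $maxBytes): array
{
    if ($maxBytes < 4) {
        // A UTF-8 sequence can be 4 bytes long; smaller chunks could split it.
        throw new InvalidArgumentException('$maxBytes must be at least 4.');
    }

    $chunks = [];
    $length = strlen($bytes);
    $start  = 0;
    while ($start < $length) {
        $end = min($start + $maxBytes, $length);
        // Continuation bytes look like 10xxxxxx (0x80-0xBF). If the byte
        // at the cut is one, step back towards the lead byte. The
        // "$end > $start + 1" guard guarantees progress on invalid input.
        while ($end < $length && $end > $start + 1 && (ord($bytes[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        $chunks[] = substr($bytes, $start, $end - $start);
        $start    = $end;
    }
    return $chunks;
}

$chunks = splitOnUtf8Boundaries("a\xC3\xA9\xE2\x82\xACbc", 4);
```

Because the function never returns false, the caller can iterate over the result without a type check.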
    /**
     * Supporting function for mb_chunk_split
     *
     * @param string $str
     *
     * @return array

Documentation: Should the return type not be false|array?

     *
     * @see http://php.net/manual/en/function.chunk-split.php
     */
    private static function mbStringToArray ($str)
    {
        if (empty($str)) return false;
        $len = mb_strlen($str);
        $array = array();
        for ($i = 0; $i < $len; $i++) {
            $array[] = mb_substr($str, $i, 1);
        }
        return $array;
    }
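
The early return in mbStringToArray() is what makes its return type false|array. Note also that PHP's empty() is true for the string "0", so a file containing only that character would take the false branch as well. A minimal sketch that always returns an array is shown below. The name utf8StringToChars is hypothetical, and passing the encoding explicitly (instead of relying on the mbstring internal encoding, as the original does) is an assumption about the intent.

```php
<?php
declare(strict_types=1);

/**
 * Splits a string into an array of single characters.
 *
 * @return string[] Always an array; empty for the empty string.
 */
function utf8StringToChars(string $str, string $encoding = 'UTF-8'): array
{
    $chars = [];
    // mb_strlen('') is 0, so the loop simply does not run for empty input.
    $len = mb_strlen($str, $encoding);
    for ($i = 0; $i < $len; $i++) {
        $chars[] = mb_substr($str, $i, 1, $encoding);
    }
    return $chars;
}
```

With an array guaranteed, the @return array annotation becomes accurate and callers no longer need to guard their foreach.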

}//end class

?>

Best Practice: It is not recommended to use PHP's closing tag ?> in files other than templates.

Using a closing tag in PHP files that only contain PHP code is not recommended as you might accidentally add whitespace after the closing tag which would then be output by PHP. This can cause severe problems, for example headers cannot be sent anymore.

A simple precaution is to leave off the closing tag as it is not required, and it also has no negative effects whatsoever.

Coding Style: As per coding style, files should not end with a newline character.

This check marks files that end in a newline character, i.e. an empty line.