Encoding   B

Complexity

Total Complexity 44

Size/Duplication

Total Lines 347
Duplicated Lines 5.19 %

Coupling/Cohesion

Components 2
Dependencies 2

Importance

Changes 0
Metric Value
dl 18
loc 347
rs 8.8798
c 0
b 0
f 0
wmc 44
lcom 2
cbo 2

9 Methods

Rating   Name   Duplication   Size   Complexity  
A toISO8859() 0 4 1
A toWin1252() 0 19 4
D toUTF8() 18 67 26
A fixUTF8() 0 25 4
A UTF8FixWin1252Chars() 0 4 1
A removeBOM() 0 7 2
A encode() 0 10 3
A normalizeEncoding() 0 22 2
A toLatin1() 0 4 1


Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.
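In this file the flagged duplication is the ISO 8859-1 fallback that toUTF8() repeats in several branches (the `$cc1`/`$cc2` computation). A minimal sketch of pulling it into one place follows; the function name `latin1ByteToUtf8` is hypothetical and not part of the analyzed class.

```php
<?php
// Hypothetical helper, not part of the analyzed class: re-encodes one
// ISO 8859-1 byte (0x80-0xFF) as the equivalent two-byte UTF-8 sequence.
function latin1ByteToUtf8(string $byte): string
{
    $code = ord($byte);
    // The lead byte carries the top two bits, the continuation byte the low six.
    return chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
}
```

Each branch that currently builds `$cc1 . $cc2` by hand could then append the result of this one call instead.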

Common duplication problems, and corresponding solutions are:

Complex Class

 Tip:   Before tackling complexity, eliminate any duplication first. This often reduces the size of a class significantly.

Complex classes like Encoding often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
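As a rough illustration of Extract Class for this file, the Windows-1252 lookup data could move into a class of its own. The class name `Win1252Map` and its two methods are assumptions made for this sketch, and only three entries of the original table are reproduced.

```php
<?php
// Hypothetical extracted class: owns the Windows-1252 lookup data that the
// Encoding class currently keeps in static arrays. Only three entries from
// the original table are shown.
final class Win1252Map
{
    private const TO_UTF8 = [
        128 => "\xe2\x82\xac", // euro sign
        133 => "\xe2\x80\xa6", // horizontal ellipsis
        153 => "\xe2\x84\xa2", // trade mark sign
    ];

    /** Returns the UTF-8 sequence for a Windows-1252 byte value, or null. */
    public static function toUtf8(int $byteValue): ?string
    {
        return self::TO_UTF8[$byteValue] ?? null;
    }

    /** Replaces the mapped UTF-8 sequences with their Windows-1252 bytes. */
    public static function fromUtf8(string $utf8): string
    {
        // Flip to "UTF-8 sequence => byte value", then turn each value into a byte.
        return strtr($utf8, array_map('chr', array_flip(self::TO_UTF8)));
    }
}
```

Deriving the reverse direction from a single table would also remove the need to keep two mirrored arrays in sync.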

While breaking up the class, it is a good idea to analyze how other classes use Encoding, and based on these observations, apply Extract Interface, too.
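A sketch of what Extract Interface might look like here, assuming callers only need "convert this text" and nothing else. `TextConverter`, `BomStripper` and `convertAll` are hypothetical names; the BOM check mirrors what removeBOM() does in the analyzed class.

```php
<?php
// Hypothetical interface: the narrow contract a caller needs when all it
// does is convert text.
interface TextConverter
{
    public function convert(string $text): string;
}

// Illustrative implementation that strips a leading UTF-8 byte order mark.
final class BomStripper implements TextConverter
{
    public function convert(string $text): string
    {
        return strncmp($text, "\xef\xbb\xbf", 3) === 0 ? substr($text, 3) : $text;
    }
}

// Callers depend on the interface, not on a concrete static class.
function convertAll(TextConverter $converter, array $texts): array
{
    return array_map([$converter, 'convert'], $texts);
}
```

With an interface in place, callers can be tested against a stub converter instead of the full Encoding class.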

<?php
/*
Copyright (c) 2008 Sebastián Grignoli
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the distribution.
3. Neither the name of copyright holders nor the names of its
     contributors may be used to endorse or promote products derived
     from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/

/**
 * @author   "Sebastián Grignoli" <[email protected]>
 * @package  Encoding
 * @version  1.2
 * @link     https://github.com/neitanod/forceutf8
 * @example  https://github.com/neitanod/forceutf8
 * @license  Revised BSD
 */

namespace PHPDaemon\Utils;

class Encoding
{
    use \PHPDaemon\Traits\ClassWatchdog;
    use \PHPDaemon\Traits\StaticObjectWatchdog;

    protected static $win1252ToUtf8 = array(
        128 => "\xe2\x82\xac",

        130 => "\xe2\x80\x9a",
        131 => "\xc6\x92",
        132 => "\xe2\x80\x9e",
        133 => "\xe2\x80\xa6",
        134 => "\xe2\x80\xa0",
        135 => "\xe2\x80\xa1",
        136 => "\xcb\x86",
        137 => "\xe2\x80\xb0",
        138 => "\xc5\xa0",
        139 => "\xe2\x80\xb9",
        140 => "\xc5\x92",

        142 => "\xc5\xbd",


        145 => "\xe2\x80\x98",
        146 => "\xe2\x80\x99",
        147 => "\xe2\x80\x9c",
        148 => "\xe2\x80\x9d",
        149 => "\xe2\x80\xa2",
        150 => "\xe2\x80\x93",
        151 => "\xe2\x80\x94",
        152 => "\xcb\x9c",
        153 => "\xe2\x84\xa2",
        154 => "\xc5\xa1",
        155 => "\xe2\x80\xba",
        156 => "\xc5\x93",

        158 => "\xc5\xbe",
        159 => "\xc5\xb8"
    );

    protected static $brokenUtf8ToUtf8 = array(
        "\xc2\x80" => "\xe2\x82\xac",

        "\xc2\x82" => "\xe2\x80\x9a",
        "\xc2\x83" => "\xc6\x92",
        "\xc2\x84" => "\xe2\x80\x9e",
        "\xc2\x85" => "\xe2\x80\xa6",
        "\xc2\x86" => "\xe2\x80\xa0",
        "\xc2\x87" => "\xe2\x80\xa1",
        "\xc2\x88" => "\xcb\x86",
        "\xc2\x89" => "\xe2\x80\xb0",
        "\xc2\x8a" => "\xc5\xa0",
        "\xc2\x8b" => "\xe2\x80\xb9",
        "\xc2\x8c" => "\xc5\x92",

        "\xc2\x8e" => "\xc5\xbd",


        "\xc2\x91" => "\xe2\x80\x98",
        "\xc2\x92" => "\xe2\x80\x99",
        "\xc2\x93" => "\xe2\x80\x9c",
        "\xc2\x94" => "\xe2\x80\x9d",
        "\xc2\x95" => "\xe2\x80\xa2",
        "\xc2\x96" => "\xe2\x80\x93",
        "\xc2\x97" => "\xe2\x80\x94",
        "\xc2\x98" => "\xcb\x9c",
        "\xc2\x99" => "\xe2\x84\xa2",
        "\xc2\x9a" => "\xc5\xa1",
        "\xc2\x9b" => "\xe2\x80\xba",
        "\xc2\x9c" => "\xc5\x93",

        "\xc2\x9e" => "\xc5\xbe",
        "\xc2\x9f" => "\xc5\xb8"
    );

    protected static $utf8ToWin1252 = array(
        "\xe2\x82\xac" => "\x80",

        "\xe2\x80\x9a" => "\x82",
        "\xc6\x92" => "\x83",
        "\xe2\x80\x9e" => "\x84",
        "\xe2\x80\xa6" => "\x85",
        "\xe2\x80\xa0" => "\x86",
        "\xe2\x80\xa1" => "\x87",
        "\xcb\x86" => "\x88",
        "\xe2\x80\xb0" => "\x89",
        "\xc5\xa0" => "\x8a",
        "\xe2\x80\xb9" => "\x8b",
        "\xc5\x92" => "\x8c",

        "\xc5\xbd" => "\x8e",


        "\xe2\x80\x98" => "\x91",
        "\xe2\x80\x99" => "\x92",
        "\xe2\x80\x9c" => "\x93",
        "\xe2\x80\x9d" => "\x94",
        "\xe2\x80\xa2" => "\x95",
        "\xe2\x80\x93" => "\x96",
        "\xe2\x80\x94" => "\x97",
        "\xcb\x9c" => "\x98",
        "\xe2\x84\xa2" => "\x99",
        "\xc5\xa1" => "\x9a",
        "\xe2\x80\xba" => "\x9b",
        "\xc5\x93" => "\x9c",

        "\xc5\xbe" => "\x9e",
        "\xc5\xb8" => "\x9f"
    );

    /**
     * toISO8859
     * @param  string|array $text Any string, or an array of strings
     * @return array|string       The same string, Win1252 encoded
     */
    public static function toISO8859($text)
    {
        return self::toWin1252($text);
    }

    /**
     * toWin1252
     * @param  string|array $text Any string, or an array of strings
     * @return array|string       The same string, Win1252 encoded
     */
    public static function toWin1252($text)
    {
        if (is_array($text)) {
            foreach ($text as $k => $v) {
                $text[$k] = self::toWin1252($v);
            }
            return $text;
        } elseif (is_string($text)) {
            return utf8_decode(
                str_replace(
                    array_keys(self::$utf8ToWin1252),
                    array_values(self::$utf8ToWin1252),
                    self::toUTF8($text)
                )
            );
        } else {
            return $text;
        }
    }

    /**
     * Function Encoding::toUTF8
     *
     * This function leaves UTF8 characters alone, while converting almost all non-UTF8 to UTF8.
     *
     * It assumes that the encoding of the original string is either Windows-1252 or ISO 8859-1.
     *
     * It may fail to convert characters to UTF-8 if they fall into one of these scenarios:
     *
     * 1) when any of these characters:   ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
     *    are followed by any of these:  ("group B")
     *                                    ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶•¸¹º»¼½¾¿
     * For example:   %ABREPRESENT%C9%BB. «REPRESENTÉ»
     * The "«" (%AB) character will be converted, but the "É" followed by "»" (%C9%BB)
     * is also a valid unicode character, and will be left unchanged.
     *
     * 2) when any of these: àáâãäåæçèéêëìíîï  are followed by TWO chars from group B,
     * 3) when any of these: ðñòó  are followed by THREE chars from group B.
     *
     * @name toUTF8
     * @param  string|array $text Any string, or an array of strings
     * @return array|string       The same string, UTF8 encoded
     *
     */
    public static function toUTF8($text)
    {
        if (is_array($text)) {
            foreach ($text as $k => $v) {
                $text[$k] = self::toUTF8($v);
            }
            return $text;
        } elseif (is_string($text)) {
            $max = mb_orig_strlen($text);

            $buf = "";
            for ($i = 0; $i < $max; $i++) {
                $c1 = $text[$i];
                if ($c1 >= "\xc0") { //Should be converted to UTF8, if it's not UTF8 already
                    $c2 = $i + 1 >= $max ? "\x00" : $text[$i + 1];
                    $c3 = $i + 2 >= $max ? "\x00" : $text[$i + 2];
                    $c4 = $i + 3 >= $max ? "\x00" : $text[$i + 3];
                    if ($c1 >= "\xc0" && $c1 <= "\xdf") { //looks like 2 bytes UTF8
                        if ($c2 >= "\x80" && $c2 <= "\xbf") { //yeah, almost sure it's UTF8 already
                            $buf .= $c1 . $c2;
                            $i++;
                        } else { //not valid UTF8.  Convert it.
                            $cc1 = (chr(ord($c1) >> 6) | "\xc0");
                            $cc2 = ($c1 & "\x3f") | "\x80";
                            $buf .= $cc1 . $cc2;
                        }
                    } elseif ($c1 >= "\xe0" && $c1 <= "\xef") { //looks like 3 bytes UTF8
                        if ($c2 >= "\x80" && $c2 <= "\xbf" && $c3 >= "\x80" && $c3 <= "\xbf") { //yeah, almost sure it's UTF8 already
                            $buf .= $c1 . $c2 . $c3;
                            $i = $i + 2;
                        } else { //not valid UTF8.  Convert it.
                            $cc1 = (chr(ord($c1) >> 6) | "\xc0");
                            $cc2 = ($c1 & "\x3f") | "\x80";
                            $buf .= $cc1 . $cc2;
                        }
                    } elseif ($c1 >= "\xf0" && $c1 <= "\xf7") { //looks like 4 bytes UTF8
                        if ($c2 >= "\x80" && $c2 <= "\xbf" && $c3 >= "\x80" && $c3 <= "\xbf" && $c4 >= "\x80" && $c4 <= "\xbf") { //yeah, almost sure it's UTF8 already
                            // Copy all four bytes and skip the three continuation bytes.
                            $buf .= $c1 . $c2 . $c3 . $c4;
                            $i = $i + 3;
                        } else { //not valid UTF8.  Convert it.
                            $cc1 = (chr(ord($c1) >> 6) | "\xc0");
                            $cc2 = ($c1 & "\x3f") | "\x80";
                            $buf .= $cc1 . $cc2;
                        }
                    } else { //doesn't look like UTF8, but should be converted
                        $cc1 = (chr(ord($c1) >> 6) | "\xc0");
                        $cc2 = (($c1 & "\x3f") | "\x80");
                        $buf .= $cc1 . $cc2;
                    }
                } elseif (($c1 & "\xc0") == "\x80") { // needs conversion
                    if (isset(self::$win1252ToUtf8[ord($c1)])) { //found in Windows-1252 special cases
                        $buf .= self::$win1252ToUtf8[ord($c1)];
                    } else {
                        $cc1 = (chr(ord($c1) >> 6) | "\xc0");
                        $cc2 = (($c1 & "\x3f") | "\x80");
                        $buf .= $cc1 . $cc2;
                    }
                } else { // it doesn't need conversion
                    $buf .= $c1;
                }
            }

            return $buf;
        } else {
            return $text;
        }
    }

    /**
     * fixUTF8
     * @param  string|array $text Any string, or an array of strings
     * @return array|string
     */
    public static function fixUTF8($text)
    {
        if (is_array($text)) {
            foreach ($text as $k => $v) {
                $text[$k] = self::fixUTF8($v);
            }
            return $text;
        }

        // Repeat the decode/re-encode round trip until the text stops changing.
        $last = "";
        while ($last !== $text) {
            $last = $text;
            $text = self::toUTF8(
                utf8_decode(
                    str_replace(array_keys(self::$utf8ToWin1252), array_values(self::$utf8ToWin1252), $text)
                )
            );
        }

        return self::toUTF8(
            utf8_decode(
                str_replace(array_keys(self::$utf8ToWin1252), array_values(self::$utf8ToWin1252), $text)
            )
        );
    }

    /**
     * If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO8859-1
     * (ignoring Windows-1252 chars from 80 to 9F) use this function to fix it.
     * See: http://en.wikipedia.org/wiki/Windows-1252
     * @param  string $text Any string
     * @return string
     */
    public static function UTF8FixWin1252Chars($text)
    {
        return str_replace(array_keys(self::$brokenUtf8ToUtf8), array_values(self::$brokenUtf8ToUtf8), $text);
    }

    /**
     * Remove BOM
     * @param  string $str Any string
     * @return string
     */
    public static function removeBOM($str = "")
    {
        if (substr($str, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf)) {
            $str = substr($str, 3);
        }
        return $str;
    }

    /**
     * Encode
     * @param  string       $encodingLabel Target encoding name
     * @param  string|array $text          Any string, or an array of strings
     * @return array|string|null
     */
    public static function encode($encodingLabel, $text)
    {
        $encodingLabel = self::normalizeEncoding($encodingLabel);
        if ($encodingLabel === 'UTF-8') {
            return Encoding::toUTF8($text);
        }
        if ($encodingLabel === 'ISO-8859-1') {
            return Encoding::toLatin1($text);
        }
    }

    /**
     * Normalize encoding name
     * @param  string $encodingLabel Encoding name
     * @return string
     */
    public static function normalizeEncoding($encodingLabel)
    {
        $encoding = strtoupper($encodingLabel);
        $encoding = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);
        $equivalences = array(
            'ISO88591' => 'ISO-8859-1',
            'ISO8859' => 'ISO-8859-1',
            'ISO' => 'ISO-8859-1',
            'LATIN1' => 'ISO-8859-1',
            'LATIN' => 'ISO-8859-1',
            'UTF8' => 'UTF-8',
            'UTF' => 'UTF-8',
            'WIN1252' => 'ISO-8859-1',
            'WINDOWS1252' => 'ISO-8859-1'
        );

        if (empty($equivalences[$encoding])) {
            return 'UTF-8';
        }

        return $equivalences[$encoding];
    }

    /**
     * toLatin1
     * @param  string|array $text Any string, or an array of strings
     * @return array|string       The same string, Win1252 encoded
     */
    public static function toLatin1($text)
    {
        return self::toWin1252($text);
    }
}
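
The ambiguity described in the toUTF8() doc comment (scenario 1) can be checked directly: the ISO 8859-1 bytes for "É»" are also a well-formed UTF-8 sequence, so a byte-level heuristic cannot tell them apart. The sketch below is standalone and does not call the class; `isValidUtf8` is a hypothetical helper that relies on PCRE's `/u` modifier rejecting malformed input.

```php
<?php
// Returns true when $bytes is a well-formed UTF-8 string. With the /u
// modifier, preg_match() returns false instead of 1 on malformed input.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$guillemet = "\xAB";      // "«" in ISO 8859-1; a lone continuation byte in UTF-8
$ambiguous = "\xC9\xBB";  // "É»" in ISO 8859-1, but also U+027B in UTF-8

var_dump(isValidUtf8($guillemet)); // bool(false): such a byte gets converted
var_dump(isValidUtf8($ambiguous)); // bool(true): such a pair is left unchanged
```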