Encoding - Code Metrics - kakserpom/phpdaemon - Measure and Improve Code Quality continuously with Scrutinizer

Encoding B
last analyzed 2021-06-02 17:17 UTC

Complexity

Total Complexity	44

Size/Duplication

Total Lines	347
Duplicated Lines	5.19 %

Coupling/Cohesion

Components	2
Dependencies	2

Metric	Value
dl	18
loc	347
rs	8.8798
c	0
b	0
f	0
wmc	44
lcom	2
cbo	2

9 Methods

Rating	Name	Duplication	Size	Complexity
A	toISO8859()	0	4	1
A	toWin1252()	0	19	4
D	toUTF8()	18	67	26
A	fixUTF8()	0	25	4
A	UTF8FixWin1252Chars()	0	4	1
A	removeBOM()	0	7	2
A	encode()	0	10	3
A	normalizeEncoding()	0	22	2
A	toLatin1()	0	4	1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

If you have the same expression in different places: Extract expression to a method
If you have the same method in different sub-classes: Extract method, and pull up field to the parent class
If you have the same code in unrelated classes: Consider extracting the code to a new class, and injecting that class
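
As an illustration of the first case, the two-line byte conversion that toUTF8() repeats in several of its branches could be pulled into one helper. The sketch below is a standalone assumption, not code from phpdaemon: the function name latin1ByteToUtf8 is hypothetical, and it assumes the input is a single byte in the range 0x80 to 0xFF.

```php
<?php
// Hypothetical helper (not part of phpdaemon): converts one ISO 8859-1 byte
// in the range 0x80-0xFF to its two-byte UTF-8 form.
function latin1ByteToUtf8(string $byte): string
{
    $code = ord($byte);
    $lead = chr(0xC0 | ($code >> 6));   // 110000xx: the top two bits of the byte
    $cont = chr(0x80 | ($code & 0x3F)); // 10xxxxxx: the low six bits of the byte
    return $lead . $cont;
}

// 0xE9 is "é" in ISO 8859-1; its UTF-8 form is the byte pair C3 A9.
echo bin2hex(latin1ByteToUtf8("\xE9")), "\n"; // prints "c3a9"
```

Each branch that currently builds $cc1 and $cc2 by hand could then append the result of a single call, which would likely remove most of the duplication reported for toUTF8().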

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like Encoding often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use Encoding, and based on these observations, apply Extract Interface, too.
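
A minimal sketch of what Extract Class combined with Extract Interface might look like for one small responsibility of Encoding, the BOM removal. The names TextFilter and BomRemover are hypothetical and do not exist in phpdaemon; they only illustrate the shape of the refactoring.

```php
<?php
// Hypothetical interface: one narrow contract that callers can depend on
// instead of the whole Encoding class.
interface TextFilter
{
    public function apply(string $text): string;
}

// Hypothetical extracted class holding only the BOM-related behaviour.
final class BomRemover implements TextFilter
{
    private const BOM = "\xEF\xBB\xBF"; // UTF-8 byte order mark

    public function apply(string $text): string
    {
        // Strip a leading UTF-8 byte order mark, if present.
        if (strncmp($text, self::BOM, 3) === 0) {
            return (string) substr($text, 3);
        }
        return $text;
    }
}

$filter = new BomRemover();
echo $filter->apply("\xEF\xBB\xBFhello"), "\n"; // prints "hello"
```

Code that only needs BOM stripping could then type-hint against TextFilter, which keeps it decoupled from the conversion tables and the rest of the class.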

<?php
/*
Copyright (c) 2008 Sebastián Grignoli
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the distribution.
3. Neither the name of copyright holders nor the names of its
     contributors may be used to endorse or promote products derived
     from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/

/**
 * @author   "Sebastián Grignoli" <[email protected]>
 * @package  Encoding
 * @version  1.2
 * @link     https://github.com/neitanod/forceutf8
 * @example  https://github.com/neitanod/forceutf8
 * @license  Revised BSD
 */

namespace PHPDaemon\Utils;

class Encoding
{
    use \PHPDaemon\Traits\ClassWatchdog;
    use \PHPDaemon\Traits\StaticObjectWatchdog;

    protected static $win1252ToUtf8 = array(
        128 => "\xe2\x82\xac",

        130 => "\xe2\x80\x9a",
        131 => "\xc6\x92",
        132 => "\xe2\x80\x9e",
        133 => "\xe2\x80\xa6",
        134 => "\xe2\x80\xa0",
        135 => "\xe2\x80\xa1",
        136 => "\xcb\x86",
        137 => "\xe2\x80\xb0",
        138 => "\xc5\xa0",
        139 => "\xe2\x80\xb9",
        140 => "\xc5\x92",

        142 => "\xc5\xbd",


        145 => "\xe2\x80\x98",
        146 => "\xe2\x80\x99",
        147 => "\xe2\x80\x9c",
        148 => "\xe2\x80\x9d",
        149 => "\xe2\x80\xa2",
        150 => "\xe2\x80\x93",
        151 => "\xe2\x80\x94",
        152 => "\xcb\x9c",
        153 => "\xe2\x84\xa2",
        154 => "\xc5\xa1",
        155 => "\xe2\x80\xba",
        156 => "\xc5\x93",

        158 => "\xc5\xbe",
        159 => "\xc5\xb8"
    );

    protected static $brokenUtf8ToUtf8 = array(
        "\xc2\x80" => "\xe2\x82\xac",

        "\xc2\x82" => "\xe2\x80\x9a",
        "\xc2\x83" => "\xc6\x92",
        "\xc2\x84" => "\xe2\x80\x9e",
        "\xc2\x85" => "\xe2\x80\xa6",
        "\xc2\x86" => "\xe2\x80\xa0",
        "\xc2\x87" => "\xe2\x80\xa1",
        "\xc2\x88" => "\xcb\x86",
        "\xc2\x89" => "\xe2\x80\xb0",
        "\xc2\x8a" => "\xc5\xa0",
        "\xc2\x8b" => "\xe2\x80\xb9",
        "\xc2\x8c" => "\xc5\x92",

        "\xc2\x8e" => "\xc5\xbd",


        "\xc2\x91" => "\xe2\x80\x98",
        "\xc2\x92" => "\xe2\x80\x99",
        "\xc2\x93" => "\xe2\x80\x9c",
        "\xc2\x94" => "\xe2\x80\x9d",
        "\xc2\x95" => "\xe2\x80\xa2",
        "\xc2\x96" => "\xe2\x80\x93",
        "\xc2\x97" => "\xe2\x80\x94",
        "\xc2\x98" => "\xcb\x9c",
        "\xc2\x99" => "\xe2\x84\xa2",
        "\xc2\x9a" => "\xc5\xa1",
        "\xc2\x9b" => "\xe2\x80\xba",
        "\xc2\x9c" => "\xc5\x93",

        "\xc2\x9e" => "\xc5\xbe",
        "\xc2\x9f" => "\xc5\xb8"
    );

    protected static $utf8ToWin1252 = array(
        "\xe2\x82\xac" => "\x80",

        "\xe2\x80\x9a" => "\x82",
        "\xc6\x92" => "\x83",
        "\xe2\x80\x9e" => "\x84",
        "\xe2\x80\xa6" => "\x85",
        "\xe2\x80\xa0" => "\x86",
        "\xe2\x80\xa1" => "\x87",
        "\xcb\x86" => "\x88",
        "\xe2\x80\xb0" => "\x89",
        "\xc5\xa0" => "\x8a",
        "\xe2\x80\xb9" => "\x8b",
        "\xc5\x92" => "\x8c",

        "\xc5\xbd" => "\x8e",


        "\xe2\x80\x98" => "\x91",
        "\xe2\x80\x99" => "\x92",
        "\xe2\x80\x9c" => "\x93",
        "\xe2\x80\x9d" => "\x94",
        "\xe2\x80\xa2" => "\x95",
        "\xe2\x80\x93" => "\x96",
        "\xe2\x80\x94" => "\x97",
        "\xcb\x9c" => "\x98",
        "\xe2\x84\xa2" => "\x99",
        "\xc5\xa1" => "\x9a",
        "\xe2\x80\xba" => "\x9b",
        "\xc5\x93" => "\x9c",

        "\xc5\xbe" => "\x9e",
        "\xc5\xb8" => "\x9f"
    );

    /**
     * toISO8859
     * @param  array|string $text Any string, or an array of strings
     * @return array|string       The same text, Win1252 encoded
     */
    public static function toISO8859($text)
    {
        return self::toWin1252($text);
    }

    /**
     * toWin1252
     * @param  array|string $text Any string, or an array of strings
     * @return array|string       The same text, Win1252 encoded
     */
    public static function toWin1252($text)
    {
        if (is_array($text)) {
            foreach ($text as $k => $v) {
                $text[$k] = self::toWin1252($v);
            }
            return $text;
        } elseif (is_string($text)) {
            return utf8_decode(
                str_replace(
                    array_keys(self::$utf8ToWin1252),
                    array_values(self::$utf8ToWin1252),
                    self::toUTF8($text)
                )
            );
        } else {
            return $text;
        }
    }

    /**
     * Function Encoding::toUTF8
     *
     * This function leaves UTF8 characters alone, while converting almost all non-UTF8 to UTF8.
     *
     * It assumes that the encoding of the original string is either Windows-1252 or ISO 8859-1.
     *
     * It may fail to convert characters to UTF-8 if they fall into one of these scenarios:
     *
     * 1) when any of these characters:   ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
     *    are followed by any of these:  ("group B")
     *                                    ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶•¸¹º»¼½¾¿
     * For example:   %ABREPRESENT%C9%BB. «REPRESENTÉ»
     * The "«" (%AB) character will be converted, but the "É" followed by "»" (%C9%BB)
     * is also a valid unicode character, and will be left unchanged.
     *
     * 2) when any of these: àáâãäåæçèéêëìíîï  are followed by TWO chars from group B,
     * 3) when any of these: ðñòó  are followed by THREE chars from group B.
     *
     * @name toUTF8
     * @param  array|string $text Any string, or an array of strings
     * @return array|string       The same text, UTF8 encoded
     */
    public static function toUTF8($text)
    {
        if (is_array($text)) {
            foreach ($text as $k => $v) {
                $text[$k] = self::toUTF8($v);
            }
            return $text;
        } elseif (is_string($text)) {
            $max = mb_orig_strlen($text);

            $buf = "";
            for ($i = 0; $i < $max; $i++) {
                $c1 = $text[$i];
                if ($c1 >= "\xc0") { //Should be converted to UTF8, if it's not UTF8 already
                    $c2 = $i + 1 >= $max ? "\x00" : $text[$i + 1];
                    $c3 = $i + 2 >= $max ? "\x00" : $text[$i + 2];
                    $c4 = $i + 3 >= $max ? "\x00" : $text[$i + 3];
                    if ($c1 >= "\xc0" && $c1 <= "\xdf") { //looks like 2 bytes UTF8
                        if ($c2 >= "\x80" && $c2 <= "\xbf") { //yeah, almost sure it's UTF8 already
                            $buf .= $c1 . $c2;
                            $i++;
                        } else { //not valid UTF8.  Convert it.
                            $cc1 = (chr(intdiv(ord($c1), 64)) | "\xc0");
                            $cc2 = ($c1 & "\x3f") | "\x80";
                            $buf .= $cc1 . $cc2;
                        }
                    } elseif ($c1 >= "\xe0" && $c1 <= "\xef") { //looks like 3 bytes UTF8
                        if ($c2 >= "\x80" && $c2 <= "\xbf" && $c3 >= "\x80" && $c3 <= "\xbf") { //yeah, almost sure it's UTF8 already
                            $buf .= $c1 . $c2 . $c3;
                            $i = $i + 2;
                        } else { //not valid UTF8.  Convert it.
                            $cc1 = (chr(intdiv(ord($c1), 64)) | "\xc0");
                            $cc2 = ($c1 & "\x3f") | "\x80";
                            $buf .= $cc1 . $cc2;
                        }
                    } elseif ($c1 >= "\xf0" && $c1 <= "\xf7") { //looks like 4 bytes UTF8
                        if ($c2 >= "\x80" && $c2 <= "\xbf" && $c3 >= "\x80" && $c3 <= "\xbf" && $c4 >= "\x80" && $c4 <= "\xbf") { //yeah, almost sure it's UTF8 already
                            $buf .= $c1 . $c2 . $c3 . $c4; //keep all four bytes of the sequence
                            $i = $i + 3;
                        } else { //not valid UTF8.  Convert it.
                            $cc1 = (chr(intdiv(ord($c1), 64)) | "\xc0");
                            $cc2 = ($c1 & "\x3f") | "\x80";
                            $buf .= $cc1 . $cc2;
                        }
                    } else { //doesn't look like UTF8, but should be converted
                        $cc1 = (chr(intdiv(ord($c1), 64)) | "\xc0");
                        $cc2 = (($c1 & "\x3f") | "\x80");
                        $buf .= $cc1 . $cc2;
                    }
                } elseif (($c1 & "\xc0") == "\x80") { // needs conversion
                    if (isset(self::$win1252ToUtf8[ord($c1)])) { //found in Windows-1252 special cases
                        $buf .= self::$win1252ToUtf8[ord($c1)];
                    } else {
                        $cc1 = (chr(intdiv(ord($c1), 64)) | "\xc0");
                        $cc2 = (($c1 & "\x3f") | "\x80");
                        $buf .= $cc1 . $cc2;
                    }
                } else { // it doesn't need conversion
                    $buf .= $c1;
                }
            }

            return $buf;
        } else {
            return $text;
        }
    }

    /**
     * fixUTF8
     * @param  array|string $text Any string, or an array of strings
     * @return array|string
     */
    public static function fixUTF8($text)
    {
        if (is_array($text)) {
            foreach ($text as $k => $v) {
                $text[$k] = self::fixUTF8($v);
            }
            return $text;
        }

        $last = "";
        while ($last <> $text) {
            $last = $text;
            $text = self::toUTF8(
                utf8_decode(
                    str_replace(array_keys(self::$utf8ToWin1252), array_values(self::$utf8ToWin1252), $text)
                )
            );
        }

        return self::toUTF8(
            utf8_decode(
                str_replace(array_keys(self::$utf8ToWin1252), array_values(self::$utf8ToWin1252), $text)
            )
        );
    }

    /**
     * If you received an UTF-8 string that was converted from Windows-1252 as it was ISO8859-1
     * (ignoring Windows-1252 chars from 80 to 9F) use this function to fix it.
     * See: http://en.wikipedia.org/wiki/Windows-1252
     * @param  string $text Any string
     * @return string
     */
    public static function UTF8FixWin1252Chars($text)
    {
        return str_replace(array_keys(self::$brokenUtf8ToUtf8), array_values(self::$brokenUtf8ToUtf8), $text);
    }

    /**
     * Remove BOM
     * @param  string $str Any string
     * @return string
     */
    public static function removeBOM($str = "")
    {
        if (substr($str, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf)) {
            $str = substr($str, 3);
        }
        return $str;
    }

    /**
     * Encode
     * @param  string       $encodingLabel Target encoding name, e.g. 'UTF-8' or 'ISO-8859-1'
     * @param  array|string $text          Any string, or an array of strings
     * @return array|string|null
     */
    public static function encode($encodingLabel, $text)
    {
        $encodingLabel = self::normalizeEncoding($encodingLabel);
        if ($encodingLabel === 'UTF-8') {
            return Encoding::toUTF8($text);
        }
        if ($encodingLabel === 'ISO-8859-1') {
            return Encoding::toLatin1($text);
        }
    }

    /**
     * Normalize encoding name
     * @param  string $encodingLabel Encoding name
     * @return string                Either 'UTF-8' or 'ISO-8859-1'
     */
    public static function normalizeEncoding($encodingLabel)
    {
        $encoding = strtoupper($encodingLabel);
        $encoding = preg_replace('/[^a-zA-Z0-9\s]/', '', $encoding);
        $equivalences = array(
            'ISO88591' => 'ISO-8859-1',
            'ISO8859' => 'ISO-8859-1',
            'ISO' => 'ISO-8859-1',
            'LATIN1' => 'ISO-8859-1',
            'LATIN' => 'ISO-8859-1',
            'UTF8' => 'UTF-8',
            'UTF' => 'UTF-8',
            'WIN1252' => 'ISO-8859-1',
            'WINDOWS1252' => 'ISO-8859-1'
        );

        if (empty($equivalences[$encoding])) {
            return 'UTF-8';
        }

        return $equivalences[$encoding];
    }

    /**
     * toLatin1
     * @param  array|string $text Any string, or an array of strings
     * @return array|string       The same text, Win1252 encoded
     */
    public static function toLatin1($text)
    {
        return self::toWin1252($text);
    }
}
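
The first failure scenario described in the toUTF8() doc comment can be checked by hand. The standalone sketch below does not call the Encoding class; it only repeats the lead-byte and continuation-byte range checks that the method applies, using the "É»" (%C9%BB) example from the comment, to show why such input is left unchanged. The variable names are illustrative.

```php
<?php
// ISO 8859-1 bytes for "É»": 0xC9 followed by 0xBB.
$latin1 = "\xC9\xBB";

$lead = ord($latin1[0]);
$cont = ord($latin1[1]);

// The same range checks toUTF8() uses for a two-byte sequence:
// a lead byte in 0xC0-0xDF followed by a continuation byte in 0x80-0xBF.
$looksLikeUtf8 = ($lead >= 0xC0 && $lead <= 0xDF) && ($cont >= 0x80 && $cont <= 0xBF);

// Decoding the pair as UTF-8 yields a single code point:
// five payload bits from the lead byte, six from the continuation byte.
$codePoint = (($lead & 0x1F) << 6) | ($cont & 0x3F);

printf("looks like UTF-8: %s, U+%04X\n", $looksLikeUtf8 ? 'yes' : 'no', $codePoint);
// prints "looks like UTF-8: yes, U+027B"
```

Because the byte pair is well-formed UTF-8, a detector that works only on byte ranges has no way to tell it apart from two ISO 8859-1 characters, which is the limitation the doc comment warns about.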

