Completed
Push — master ( ed92f4...60669a )
by Daniel
03:02
created

thirdparty/html5lib/HTML5/Data.php (11 issues)

Upgrade to new PHP Analysis Engine

These results are based on our legacy PHP analysis, consider migrating to our new PHP analysis engine instead. Learn more

1
<?php
2
3
// warning: this file is encoded in UTF-8!
4
5
class HTML5_Data
0 ignored issues
show
Coding Style Compatibility introduced by
PSR1 recommends that each class must be in a namespace of at least one level to avoid collisions.

You can fix this by adding a namespace to your class:

namespace YourVendor;

class YourClass { }

When choosing a vendor namespace, try to pick something that is not too generic to avoid conflicts with other libraries.

Loading history...
6
{
7
8
    // at some point this should be moved to a .ser file. Another
9
    // possible optimization is to give UTF-8 bytes, not Unicode
10
    // codepoints
11
    // XXX: Not quite sure why it's named this; this is
12
    // actually the numeric entity dereference table.
13
    protected static $realCodepointTable = array(
14
        0x00 => 0xFFFD, // REPLACEMENT CHARACTER
15
        0x0D => 0x000A, // LINE FEED (LF)
16
        0x80 => 0x20AC, // EURO SIGN ('€')
0 ignored issues
show
Unused Code Comprehensibility introduced by
38% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
17
        0x81 => 0x0081, // <control>
18
        0x82 => 0x201A, // SINGLE LOW-9 QUOTATION MARK ('‚')
19
        0x83 => 0x0192, // LATIN SMALL LETTER F WITH HOOK ('ƒ')
20
        0x84 => 0x201E, // DOUBLE LOW-9 QUOTATION MARK ('„')
21
        0x85 => 0x2026, // HORIZONTAL ELLIPSIS ('…')
0 ignored issues
show
Unused Code Comprehensibility introduced by
38% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
22
        0x86 => 0x2020, // DAGGER ('†')
0 ignored issues
show
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
23
        0x87 => 0x2021, // DOUBLE DAGGER ('‡')
0 ignored issues
show
Unused Code Comprehensibility introduced by
38% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
24
        0x88 => 0x02C6, // MODIFIER LETTER CIRCUMFLEX ACCENT ('ˆ')
25
        0x89 => 0x2030, // PER MILLE SIGN ('‰')
26
        0x8A => 0x0160, // LATIN CAPITAL LETTER S WITH CARON ('Š')
27
        0x8B => 0x2039, // SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('‹')
28
        0x8C => 0x0152, // LATIN CAPITAL LIGATURE OE ('Œ')
29
        0x8D => 0x008D, // <control>
30
        0x8E => 0x017D, // LATIN CAPITAL LETTER Z WITH CARON ('Ž')
31
        0x8F => 0x008F, // <control>
32
        0x90 => 0x0090, // <control>
33
        0x91 => 0x2018, // LEFT SINGLE QUOTATION MARK ('‘')
34
        0x92 => 0x2019, // RIGHT SINGLE QUOTATION MARK ('’')
35
        0x93 => 0x201C, // LEFT DOUBLE QUOTATION MARK ('“')
36
        0x94 => 0x201D, // RIGHT DOUBLE QUOTATION MARK ('”')
37
        0x95 => 0x2022, // BULLET ('•')
0 ignored issues
show
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
38
        0x96 => 0x2013, // EN DASH ('–')
0 ignored issues
show
Unused Code Comprehensibility introduced by
38% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
39
        0x97 => 0x2014, // EM DASH ('—')
0 ignored issues
show
Unused Code Comprehensibility introduced by
38% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
40
        0x98 => 0x02DC, // SMALL TILDE ('˜')
0 ignored issues
show
Unused Code Comprehensibility introduced by
38% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

Loading history...
41
        0x99 => 0x2122, // TRADE MARK SIGN ('™')
42
        0x9A => 0x0161, // LATIN SMALL LETTER S WITH CARON ('š')
43
        0x9B => 0x203A, // SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('›')
44
        0x9C => 0x0153, // LATIN SMALL LIGATURE OE ('œ')
45
        0x9D => 0x009D, // <control>
46
        0x9E => 0x017E, // LATIN SMALL LETTER Z WITH CARON ('ž')
47
        0x9F => 0x0178, // LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')
48
    );
49
50
    protected static $namedCharacterReferences;
51
52
    protected static $namedCharacterReferenceMaxLength;
53
54
    /**
55
     * Returns the "real" Unicode codepoint of a malformed character
56
     * reference.
57
     */
58
    public static function getRealCodepoint($ref) {
59
        if (!isset(self::$realCodepointTable[$ref])) return false;
60
        else return self::$realCodepointTable[$ref];
61
    }
62
63
    public static function getNamedCharacterReferences() {
64
        if (!self::$namedCharacterReferences) {
65
            self::$namedCharacterReferences = unserialize(
66
                file_get_contents(dirname(__FILE__) . '/named-character-references.ser'));
67
        }
68
        return self::$namedCharacterReferences;
69
    }
70
71
    /**
72
     * Converts a Unicode codepoint to sequence of UTF-8 bytes.
73
     * @note Shamelessly stolen from HTML Purifier, which is also
74
     *       shamelessly stolen from Feyd (which is in public domain).
75
     */
76
    public static function utf8chr($code) {
77
        /* We don't care: we live dangerously
78
         * if($code > 0x10FFFF or $code < 0x0 or
79
          ($code >= 0xD800 and $code <= 0xDFFF) ) {
80
            // bits are set outside the "valid" range as defined
81
            // by UNICODE 4.1.0
82
            return "\xEF\xBF\xBD";
83
          }*/
84
85
        $x = $y = $z = $w = 0;
0 ignored issues
show
$x is not used, you could remove the assignment.

This check looks for variable assignements that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten for every possible time line.

Loading history...
86
        if ($code < 0x80) {
87
            // regular ASCII character
88
            $x = $code;
89
        } else {
90
            // set up bits for UTF-8
91
            $x = ($code & 0x3F) | 0x80;
92
            if ($code < 0x800) {
93
               $y = (($code & 0x7FF) >> 6) | 0xC0;
94
            } else {
95
                $y = (($code & 0xFC0) >> 6) | 0x80;
96
                if($code < 0x10000) {
97
                    $z = (($code >> 12) & 0x0F) | 0xE0;
98
                } else {
99
                    $z = (($code >> 12) & 0x3F) | 0x80;
100
                    $w = (($code >> 18) & 0x07) | 0xF0;
101
                }
102
            }
103
        }
104
        // set up the actual character
105
        $ret = '';
106
        if($w) $ret .= chr($w);
107
        if($z) $ret .= chr($z);
108
        if($y) $ret .= chr($y);
109
        $ret .= chr($x);
110
111
        return $ret;
112
    }
113
114
}
0 ignored issues
show
According to PSR2, the closing brace of classes should be placed on the next line directly after the body.

Below you find some examples:

// Incorrect placement according to PSR2
class MyClass
{
    public function foo()
    {

    }
    // This blank line is not allowed.

}

// Correct
class MyClass
{
    public function foo()
    {

    } // No blank lines after this line.
}
Loading history...
115