Completed
Push — master ( 60ba61...279e03 )
by Auke
87:58 queued 69:03
created

h2tHtmlParser::skipToBlanksOrSlashInTag()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 3
Code Lines 2

Duplication

Lines 0
Ratio 0 %
Metric  Value
cc      1
eloc    2
nc      1
nop     0
dl      0
loc     3
rs      10
<?php

/*
 * Copyright (c) 2003 Jose Solorzano.  All rights reserved.
 * Redistribution of source must retain this copyright notice.
 *
 * Jose Solorzano (http://jexpert.us) is a software consultant.
 *
 * Contributions by:
 * - Leo West (performance improvements)
 */

define ("NODE_TYPE_START",0);
define ("NODE_TYPE_ELEMENT",1);
define ("NODE_TYPE_ENDELEMENT",2);
define ("NODE_TYPE_TEXT",3);
define ("NODE_TYPE_COMMENT",4);
define ("NODE_TYPE_DONE",5);

/**
 * Class h2tHtmlParser.
 * To use, create an instance of the class passing
 * HTML text. Then invoke parse() until it returns false.
 * When parse() returns true, $iNodeType, $iNodeName,
 * $iNodeValue and $iNodeAttributes are updated.
 *
 * To create an h2tHtmlParser instance you may also
 * use convenience functions h2tHtmlParser_ForFile
 * and h2tHtmlParser_ForURL.
 */
class h2tHtmlParser {

    /**
     * Field iNodeType.
     * May be one of the NODE_TYPE_* constants above.
     */
    var $iNodeType;

    /**
     * Field iNodeName.
     * For elements, it's the name of the element.
     */
    var $iNodeName = "";

    /**
     * Field iNodeValue.
     * For text nodes, it's the text.
     */
    var $iNodeValue = "";

    /**
     * Field iNodeAttributes.
     * A string-indexed array containing attribute values
     * of the current node. Indexes are always lowercase.
     */
    var $iNodeAttributes;

    // The following fields should be
    // considered private:

    var $iHtmlText;
    var $iHtmlTextLength;
    var $iHtmlTextIndex = 0;
    var $iHtmlCurrentChar;
    var $BOE_ARRAY;
    var $B_ARRAY;
    var $BOS_ARRAY;

    /**
     * Constructor.
     * Constructs an h2tHtmlParser instance with
     * the HTML text given.
     */
    function __construct ($aHtmlText) {
        $this->iHtmlText = $aHtmlText;
        $this->iHtmlTextLength = strlen($aHtmlText);
        $this->iNodeAttributes = array();
        $this->setTextIndex (0);

        $this->BOE_ARRAY = array (" ", "\t", "\r", "\n", "=" );
        $this->B_ARRAY = array (" ", "\t", "\r", "\n" );
        $this->BOS_ARRAY = array (" ", "\t", "\r", "\n", "/" );
    }

    /**
     * Method parse.
     * Parses the next node. Returns false only if
     * the end of the HTML text has been reached.
     * Updates values of iNode* fields.
     */
    function parse() {
        $text = $this->skipToElement();
        if ($text != "") {
            $this->iNodeType = NODE_TYPE_TEXT;
            $this->iNodeName = "Text";
            $this->iNodeValue = $text;
            return true;
        }
        return $this->readTag();
    }

    function clearAttributes() {
        $this->iNodeAttributes = array();
    }

    function readTag() {
        if ($this->iCurrentChar != "<") {
Bug: The property iCurrentChar does not exist. Did you maybe forget to declare it?

In PHP it is possible to write to properties without declaring them. For example, the following is perfectly valid PHP code:

class MyClass { }

$x = new MyClass();
$x->foo = true;

Generally, it is a good practice to explicitly declare properties to avoid accidental typos and provide IDE auto-completion:

class MyClass {
    public $foo;
}

$x = new MyClass();
$x->foo = true;
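Applied to this parser, the advice in the note could look roughly like the sketch below. The class name SketchParser is hypothetical, only the cursor handling is reproduced, and this is one possible shape of the fix rather than the project's own change. Declaring the property also avoids the dynamic-property deprecation notice that PHP 8.2 and later emit.

```php
<?php
// Hypothetical reduced class: only the cursor handling of the listing is sketched.
class SketchParser {
    public $iHtmlText;
    public $iHtmlTextLength;
    public $iHtmlTextIndex = 0;
    // Declared explicitly, so reads and writes no longer rely on a dynamic property.
    public $iCurrentChar = -1;

    function __construct($aHtmlText) {
        $this->iHtmlText = $aHtmlText;
        $this->iHtmlTextLength = strlen($aHtmlText);
        $this->setTextIndex(0);
    }

    function setTextIndex($index) {
        $this->iHtmlTextIndex = $index;
        // -1 marks the end of the input, as in the listing.
        $this->iCurrentChar = ($index >= $this->iHtmlTextLength)
            ? -1
            : $this->iHtmlText[$index];
    }
}
```

In the listing itself, the declared field is named iHtmlCurrentChar while the methods use iCurrentChar, so renaming the declaration would be the smallest change.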
            $this->iNodeType = NODE_TYPE_DONE;
            return false;
        }
        $this->clearAttributes();
        $this->skipMaxInTag ("<", 1);
        if ($this->iCurrentChar == '/') {
            $this->moveNext();
            $name = $this->skipToBlanksInTag();
            $this->iNodeType = NODE_TYPE_ENDELEMENT;
            $this->iNodeName = $name;
            $this->iNodeValue = "";
            $this->skipEndOfTag();
            return true;
        }
        $name = $this->skipToBlanksOrSlashInTag();
        if (!$this->isValidTagIdentifier ($name)) {
                $comment = false;
                if (strpos($name, "!--") === 0) {
                    $ppos = strpos($name, "--", 3);
Unused Code: $ppos is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten on every possible execution path.
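Instead of removing the assignment, the computed position could be reused in the comparison that follows it. The sketch below isolates that check in a standalone function; the name isClosedCommentToken is hypothetical and does not exist in the listing.

```php
<?php
// Hypothetical helper: true when $name is a complete "!--...--" token,
// i.e. it starts with "!--" and its first later "--" is its last two characters.
function isClosedCommentToken($name) {
    if (strpos($name, "!--") !== 0) {
        return false;
    }
    // Computed once and actually used, unlike $ppos in the listing.
    $ppos = strpos($name, "--", 3);
    return $ppos === (strlen($name) - 2);
}
```

The comparison uses ===, so a failed search (false) never equals a numeric position.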
                    if (strpos($name, "--", 3) === (strlen($name) - 2)) {
                        $this->iNodeType = NODE_TYPE_COMMENT;
                        $this->iNodeName = "Comment";
                        $this->iNodeValue = "<" . $name . ">";
                        $comment = true;
                    }
                    else {
                        $rest = $this->skipToStringInTag ("-->");
                        if ($rest != "") {
                            $this->iNodeType = NODE_TYPE_COMMENT;
                            $this->iNodeName = "Comment";
                            $this->iNodeValue = "<" . $name . $rest;
                            $comment = true;
Unused Code: $comment is not used, you could remove the assignment.

                            // Already skipped end of tag
                            return true;
                        }
                    }
                }
                if (!$comment) {
                    $this->iNodeType = NODE_TYPE_TEXT;
                    $this->iNodeName = "Text";
                    $this->iNodeValue = "<" . $name;
                    return true;
                }
        }
        else {
                $this->iNodeType = NODE_TYPE_ELEMENT;
                $this->iNodeValue = "";
                $this->iNodeName = $name;
                while ($this->skipBlanksInTag()) {
                    $attrName = $this->skipToBlanksOrEqualsInTag();
                    if ($attrName != "" && $attrName != "/") {
                        $this->skipBlanksInTag();
                        if ($this->iCurrentChar == "=") {
                            $this->skipEqualsInTag();
                            $this->skipBlanksInTag();
                            $value = $this->readValueInTag();
                            $this->iNodeAttributes[strtolower($attrName)] = $value;
                        }
                        else {
                            $this->iNodeAttributes[strtolower($attrName)] = "";
                        }
                    }
                }
        }
        $this->skipEndOfTag();
        return true;
    }

    function isValidTagIdentifier ($name) {
        return preg_match("/^[A-Za-z0-9_\\-]+$/", $name);
    }

    function skipBlanksInTag() {
        return "" != ($this->skipInTag ($this->B_ARRAY));
    }

    function skipToBlanksOrEqualsInTag() {
        return $this->skipToInTag ($this->BOE_ARRAY);
    }

    function skipToBlanksInTag() {
        return $this->skipToInTag ($this->B_ARRAY);
    }

    function skipToBlanksOrSlashInTag() {
        return $this->skipToInTag ($this->BOS_ARRAY);
    }

    function skipEqualsInTag() {
        return $this->skipMaxInTag ("=", 1);
    }

    function readValueInTag() {
        $ch = $this->iCurrentChar;
        $value = "";
Unused Code: $value is not used, you could remove the assignment.

        if ($ch == "\"") {
            $this->skipMaxInTag ("\"", 1);
            $value = $this->skipToInTag ("\"");
            $this->skipMaxInTag ("\"", 1);
        }
        else if ($ch == "'") {
            $this->skipMaxInTag ("'", 1);
            $value = $this->skipToInTag ("'");
            $this->skipMaxInTag ("'", 1);
        }
        else {
            $value = $this->skipToBlanksInTag();
        }
        return $value;
    }

    function setTextIndex ($index) {
        $this->iHtmlTextIndex = $index;
        if ($index >= $this->iHtmlTextLength) {
            $this->iCurrentChar = -1;
        }
        else {
            // Square brackets: the {} string-offset syntax was removed in PHP 8.
            $this->iCurrentChar = $this->iHtmlText[$index];
        }
    }

    function moveNext() {
        if ($this->iHtmlTextIndex < $this->iHtmlTextLength) {
            $this->setTextIndex ($this->iHtmlTextIndex + 1);
            return true;
        }
        else {
            return false;
        }
    }

    function skipEndOfTag() {
        while (($ch = $this->iCurrentChar) !== -1) {
            if ($ch == ">") {
                $this->moveNext();
                return;
            }
            $this->moveNext();
        }
    }

    function skipInTag ($chars) {
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
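One way to act on this note is to move the shared character test into a single helper and build both scanning loops on it. The sketch below uses free functions on a string and an index so that it runs on its own; charMatches, skipWhile and skipUntil are hypothetical names, and the real methods work on the parser's internal cursor instead.

```php
<?php
// Hypothetical shared helper: does $ch equal one of the entries in $chars?
function charMatches($ch, array $chars) {
    return in_array($ch, $chars, true);
}

// Consumes characters while they ARE in $chars, stopping at ">" (skipInTag-like).
// $index is advanced by reference.
function skipWhile($text, &$index, array $chars) {
    $sb = "";
    $len = strlen($text);
    while ($index < $len && $text[$index] !== ">" && charMatches($text[$index], $chars)) {
        $sb .= $text[$index++];
    }
    return $sb;
}

// Consumes characters until one IS in $chars or is ">" (skipToInTag-like).
function skipUntil($text, &$index, array $chars) {
    $sb = "";
    $len = strlen($text);
    while ($index < $len && $text[$index] !== ">" && !charMatches($text[$index], $chars)) {
        $sb .= $text[$index++];
    }
    return $sb;
}
```

With the test in one place, the two loops differ only in the negation, which makes the duplication easier to see and to remove.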
        $sb = "";
        while (($ch = $this->iCurrentChar) !== -1) {
            if ($ch == ">") {
                return $sb;
            } else {
                $match = false;
                for ($idx = 0; $idx < count($chars); $idx++) {
Performance Best Practice: It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally a good practice to compute it beforehand, and not on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}
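Hoisting count() is the fix the note proposes. A different technique, not used in the listing, is to drop the inner loop altogether and let PHP's built-in strspn() and strcspn() measure the run of characters. The sketch below assumes the delimiter set is passed as a string mask rather than an array; readUntilAny and readWhileAny are hypothetical names.

```php
<?php
// Hypothetical helper: the run of characters starting at $index that contains
// none of the characters in $stopChars and no ">".
function readUntilAny($text, $index, $stopChars) {
    // strcspn() returns the length of the leading segment with no mask characters.
    $len = strcspn($text, $stopChars . ">", $index);
    return substr($text, $index, $len);
}

// Hypothetical counterpart: the run made up only of characters in $skipChars.
function readWhileAny($text, $index, $skipChars) {
    // strspn() returns the length of the leading segment made of mask characters.
    $len = strspn($text, $skipChars, $index);
    return substr($text, $index, $len);
}
```

A caller would advance its cursor by the length of the returned string, which replaces the per-character moveNext() calls.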
                    if ($ch == $chars[$idx]) {
                        $match = true;
                        break;
                    }
                }
                if (!$match) {
                    return $sb;
                }
                $sb .= $ch;
                $this->moveNext();
            }
        }
        return $sb;
    }

    function skipMaxInTag ($chars, $maxChars) {
        // Callers pass a single-character string; count() needs an array.
        $chars = (array) $chars;
        $sb = "";
        $count = 0;
        while (($ch = $this->iCurrentChar) !== -1 && $count++ < $maxChars) {
            if ($ch == ">") {
                return $sb;
            } else {
                $match = false;
                for ($idx = 0; $idx < count($chars); $idx++) {
Performance Best Practice: It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

                    if ($ch == $chars[$idx]) {
                        $match = true;
                        break;
                    }
                }
                if (!$match) {
                    return $sb;
                }
                $sb .= $ch;
                $this->moveNext();
            }
        }
        return $sb;
    }

    function skipToInTag ($chars) {
Duplication: This method seems to be duplicated in your project.

        // readValueInTag() passes a quote character as a string; count() needs an array.
        $chars = (array) $chars;
        $sb = "";
        while (($ch = $this->iCurrentChar) !== -1) {
            $match = $ch == ">";
            if (!$match) {
                for ($idx = 0; $idx < count($chars); $idx++) {
Performance Best Practice: It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

                    if ($ch == $chars[$idx]) {
                        $match = true;
                        break;
                    }
                }
            }
            if ($match) {
                return $sb;
            }
            $sb .= $ch;
            $this->moveNext();
        }
        return $sb;
    }

    function skipToElement() {
        $sb = "";
        while (($ch = $this->iCurrentChar) !== -1) {
            if ($ch == "<") {
                return $sb;
            }
            $sb .= $ch;
            $this->moveNext();
        }
        return $sb;
    }

    /**
     * Returns text between current position and $needle,
     * inclusive, or "" if not found. The current index is moved to a point
     * after the location of $needle, or not moved at all
     * if nothing is found.
     */
    function skipToStringInTag ($needle) {
        $pos = strpos ($this->iHtmlText, $needle, $this->iHtmlTextIndex);
        if ($pos === false) {
            return "";
        }
        $top = $pos + strlen($needle);
        $retvalue = substr ($this->iHtmlText, $this->iHtmlTextIndex, $top - $this->iHtmlTextIndex);
        $this->setTextIndex ($top);
        return $retvalue;
    }
}

function h2tHtmlParser_ForFile ($fileName) {
    return h2tHtmlParser_ForURL($fileName);
}

function h2tHtmlParser_ForURL ($url) {
    $fp = fopen ($url, "r");
    $content = "";
    while (true) {
        $data = fread ($fp, 8192);
        if (strlen($data) == 0) {
            break;
        }
        $content .= $data;
    }
    fclose ($fp);
    return new h2tHtmlParser ($content);
}

?>
Best Practice: It is not recommended to use PHP's closing tag ?> in files other than templates.

Using a closing tag in PHP files that only contain PHP code is not recommended, as you might accidentally add whitespace after the closing tag, which would then be output by PHP. This can cause severe problems; for example, headers cannot be sent anymore.

A simple precaution is to leave off the closing tag, as it is not required and omitting it has no negative effects.