simple_html_dom::parse()   A

Complexity

Conditions 2
Paths 2

Size

Total Lines 12
Code Lines 7

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 1 Features 0
Metric Value
cc 2
eloc 7
c 1
b 1
f 0
nc 2
nop 0
dl 0
loc 12
rs 10
<?php
/**
 * Website: http://sourceforge.net/projects/simplehtmldom/
 * Additional projects that may be used: http://sourceforge.net/projects/debugobject/
 * Acknowledge: Jose Solorzano (https://sourceforge.net/projects/php-html/)
 * Contributions by:
 *	 Yousuke Kumakura (Attribute filters)
 *	 Vadim Voituk (Negative indexes supports of "find" method)
 *	 Antcs (Constructor with automatically load contents either text or file/url)
 *
 * all affected sections have comments starting with "PaperG"
 *
 * Paperg - Added case insensitive testing of the value of the selector.
 * Paperg - Added tag_start for the starting index of tags - NOTE: This works but not accurately.
 *  This tag_start gets counted AFTER \r\n have been crushed out, and after the remove_noise calls, so it will not reflect the REAL position of the tag in the source;
 *  it will almost always be smaller by some amount.
 *  We use this to determine how far into the file the tag in question is.  This "percentage" will never be accurate, as $dom->size is the "real" number of bytes the dom was created from,
 *  but for most purposes it's a really good estimation.
 * Paperg - Added the forceTagsClosed to the dom constructor.  Forcing tags closed is great for malformed html, but it CAN lead to parsing errors.
 * Allow the user to tell us how much they trust the html.
 * Paperg - Added the text and plaintext to the selectors for the find syntax.  plaintext implies text in the innertext of a node.  text implies that the tag is a text node.
 * This allows us to find tags based on the text they contain.
 * Created find_ancestor_tag to see if a tag is - at any level - inside of another specific tag.
 * Paperg: added parse_charset so that we know about the character set of the source document.
 *  NOTE:  If the user's system has a routine called get_last_retrieve_url_contents_content_type available, we will assume it's returning the content-type header from the
 *  last transfer or curl_exec, and we will parse that and use it in preference to any other method of charset detection.
 *
 * Found infinite loop in the case of broken html in restore_noise.  Rewrote to protect from that.
 * PaperG (John Schlick) Added get_display_size for "IMG" tags.
 *
 * Licensed under The MIT License
 * Redistributions of files must retain the above copyright notice.
 *
 * @author S.C. Chen <[email protected]>
 * @author John Schlick
 * @author Rus Carroll
 * @version 1.5 ($Rev: 210 $)
 * @package PlaceLocalInclude
 * @subpackage simple_html_dom
 */

/**
 * All of the Defines for the classes below.
 * @author S.C. Chen <[email protected]>
 */
define('HDOM_TYPE_ELEMENT', 1);
define('HDOM_TYPE_COMMENT', 2);
define('HDOM_TYPE_TEXT', 3);
define('HDOM_TYPE_ENDTAG', 4);
define('HDOM_TYPE_ROOT', 5);
define('HDOM_TYPE_UNKNOWN', 6);
define('HDOM_QUOTE_DOUBLE', 0);
define('HDOM_QUOTE_SINGLE', 1);
define('HDOM_QUOTE_NO', 3);
define('HDOM_INFO_BEGIN', 0);
define('HDOM_INFO_END', 1);
define('HDOM_INFO_QUOTE', 2);
define('HDOM_INFO_SPACE', 3);
define('HDOM_INFO_TEXT', 4);
define('HDOM_INFO_INNER', 5);
define('HDOM_INFO_OUTER', 6);
define('HDOM_INFO_ENDSPACE', 7);
define('DEFAULT_TARGET_CHARSET', 'UTF-8');
define('DEFAULT_BR_TEXT', "\r\n");
define('DEFAULT_SPAN_TEXT', " ");
define('MAX_FILE_SIZE', 600000);
// helper functions
// -----------------------------------------------------------------------------
// get html dom from file
// $maxLen is defined in the code as PHP_STREAM_COPY_ALL which is defined as -1.
// $offset defaults to 0: since PHP 7.1 a negative offset seeks from the end of the stream, which remote streams do not support.
function file_get_html($url, $use_include_path = false, $context = null, $offset = 0, $maxLen = -1, $lowercase = true, $forceTagsClosed = true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN = true, $defaultBRText = DEFAULT_BR_TEXT, $defaultSpanText = DEFAULT_SPAN_TEXT)
Unused Code: The parameter $maxLen is not used and could be removed. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-unused annotation:

function file_get_html($url, $use_include_path = false, $context = null, $offset = 0, /** @scrutinizer ignore-unused */ $maxLen = -1, $lowercase = true, $forceTagsClosed = true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN = true, $defaultBRText = DEFAULT_BR_TEXT, $defaultSpanText = DEFAULT_SPAN_TEXT)

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    // For sourceforge users: uncomment the next line and comment the retrieve_url_contents line 2 lines down if it is not already done.
    $contents = file_get_contents($url, $use_include_path, $context, $offset);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
    //$contents = retrieve_url_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE) {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}

// get html dom from string
function str_get_html($str, $lowercase = true, $forceTagsClosed = true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN = true, $defaultBRText = DEFAULT_BR_TEXT, $defaultSpanText = DEFAULT_SPAN_TEXT)
{
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    if (empty($str) || strlen($str) > MAX_FILE_SIZE) {
        $dom->clear();
        return false;
    }
    $dom->load($str, $lowercase, $stripRN);
    return $dom;
}
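
Both helpers above refuse to parse input that is empty or longer than MAX_FILE_SIZE. The following is a minimal standalone sketch of that guard, not library code: the helper name `within_size_limit` and its `$maxBytes` parameter are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper, not part of simple_html_dom: mirrors the guard that
// file_get_html() and str_get_html() apply before calling load().
function within_size_limit($contents, $maxBytes = 600000)
{
    // empty() is true for '', null and false, and also for the string '0',
    // so a document consisting only of "0" is rejected as well.
    return !empty($contents) && strlen($contents) <= $maxBytes;
}

var_dump(within_size_limit('<p>hi</p>'));             // bool(true)
var_dump(within_size_limit(''));                      // bool(false)
var_dump(within_size_limit(str_repeat('a', 600001))); // bool(false)
```

Because the guard returns false rather than throwing, callers of the real helpers presumably need to check the result before calling methods on it.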

// dump html dom tree
function dump_html_tree($node, $show_attr = true, $deep = 0)
Unused Code: The parameter $deep is not used and could be removed. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-unused annotation:

function dump_html_tree($node, $show_attr = true, /** @scrutinizer ignore-unused */ $deep = 0)

Unused Code: The parameter $show_attr is not used and could be removed. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-unused annotation:

function dump_html_tree($node, /** @scrutinizer ignore-unused */ $show_attr = true, $deep = 0)

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
{
    $node->dump($node);
}

/**
 * simple html dom node
 * PaperG - added ability for "find" routine to lowercase the value of the selector.
 * PaperG - added $tag_start to track the start position of the tag in the total byte index
 *
 * @package PlaceLocalInclude
 */
class simple_html_dom_node
{
    public $nodetype = HDOM_TYPE_TEXT;
    public $tag = 'text';
    public $attr = array();
    public $children = array();
    public $nodes = array();
    public $parent = null;
    // The "info" array - see HDOM_INFO_... for what each element contains.
    public $_ = array();
    public $tag_start = 0;
    private $dom = null;

    public function __construct($dom)
    {
        $this->dom = $dom;
        $dom->nodes[] = $this;
    }

    public function __destruct()
    {
        $this->clear();
    }

    public function __toString()
    {
        return $this->outertext();
    }

    // clean up memory due to php5 circular references memory leak...
    public function clear()
    {
        $this->dom = null;
        $this->nodes = null;
        $this->parent = null;
        $this->children = null;
    }

    // dump node's tree
    public function dump($show_attr = true, $deep = 0)
    {
        $lead = str_repeat("\t", $deep);

        echo $lead . $this->tag;
        if ($show_attr && count($this->attr) > 0) {
            echo '(';
            foreach ($this->attr as $k => $v) {
                echo "[$k]=>\"" . $this->$k . '", ';
            }
            echo ')';
        }
        echo "\n";

        if ($this->nodes) {
Bug, Best Practice: The expression $this->nodes of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
            foreach ($this->nodes as $c) {
                $c->dump($show_attr, $deep + 1);
            }
        }
    }

    // Debugging function to dump a single dom node with a bunch of information about it.
    public function dump_node($node, $echo = true)
    {
        $string = $this->tag;
        if (count($this->attr) > 0) {
            $string .= '(';
            foreach ($this->attr as $k => $v) {
                $string .= "[$k]=>\"" . $this->$k . '", ';
            }
            $string .= ')';
        }
        if (count($this->_) > 0) {
            $string .= ' $_ (';
            foreach ($this->_ as $k => $v) {
                if (is_array($v)) {
                    $string .= "[$k]=>(";
                    foreach ($v as $k2 => $v2) {
                        $string .= "[$k2]=>\"" . $v2 . '", ';
                    }
                    $string .= ")";
                } else {
                    $string .= "[$k]=>\"" . $v . '", ';
                }
            }
            $string .= ")";
        }

        if (isset($this->text)) {
Bug, Best Practice: The property text does not exist on simple_html_dom_node. Since you implemented __get, consider adding a @property annotation.
            $string .= " text: (" . $this->text . ")";
        }

        $string .= " HDOM_INNER_INFO: '";
        if (isset($node->_[HDOM_INFO_INNER])) {
            $string .= $node->_[HDOM_INFO_INNER] . "'";
        } else {
            $string .= ' NULL ';
        }

        $string .= " children: " . count($this->children);
        $string .= " nodes: " . count($this->nodes);
        $string .= " tag_start: " . $this->tag_start;
        $string .= "\n";

        if ($echo) {
            echo $string;
            return;
        } else {
            return $string;
        }
    }

    // returns the parent of node
    // If a node is passed in, it will reset the parent of the current node to that one.
    public function parent($parent = null)
    {
        // I am SURE that this doesn't work properly.
        // It fails to unset the current node from its current parent's nodes or children list first.
        if ($parent !== null) {
            $this->parent = $parent;
            $this->parent->nodes[] = $this;
            $this->parent->children[] = $this;
        }

        return $this->parent;
    }

    // verify that node has children
    public function has_child()
    {
        return !empty($this->children);
    }

    // returns children of node
    public function children($idx = -1)
    {
        if ($idx === -1) {
            return $this->children;
        }
        if (isset($this->children[$idx])) {
            return $this->children[$idx];
        }
        return null;
    }

    // returns the first child of node
    public function first_child()
    {
        if (count($this->children) > 0) {
            return $this->children[0];
        }
        return null;
    }

    // returns the last child of node
    public function last_child()
    {
        if (($count=count($this->children))>0) {
            return $this->children[$count-1];
        }
        return null;
    }

    // returns the next sibling of node
    public function next_sibling()
    {
        if ($this->parent===null) {
            return null;
        }

        $idx = 0;
        $count = count($this->parent->children);
        while ($idx<$count && $this!==$this->parent->children[$idx]) {
            ++$idx;
        }
        if (++$idx>=$count) {
            return null;
        }
        return $this->parent->children[$idx];
    }

    // returns the previous sibling of node
    public function prev_sibling()
    {
        if ($this->parent===null) {
            return null;
        }
        $idx = 0;
        $count = count($this->parent->children);
        while ($idx<$count && $this!==$this->parent->children[$idx]) {
            ++$idx;
        }
        if (--$idx<0) {
            return null;
        }
        return $this->parent->children[$idx];
    }
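
Both sibling methods scan the parent's children for the current node and then step one position forward or back. In prev_sibling(), a node that is not in the list appears to leave $idx at $count, so the decrement lands on the last child instead of returning null. The sketch below shows the same lookup with an explicit "not found" case. It is a standalone illustration: the function `sibling_at` and the stdClass stand-in nodes are hypothetical and not part of the library.

```php
<?php
// Hypothetical standalone helper: returns the element $offset positions away
// from $node in $children, or null when $node is absent or the index is out of range.
function sibling_at(array $children, $node, $offset)
{
    // A strict search compares objects by identity, like the !== test in the loops above.
    $idx = array_search($node, $children, true);
    if ($idx === false) {
        return null;
    }
    $target = $idx + $offset;
    return isset($children[$target]) ? $children[$target] : null;
}

// Plain objects stand in for DOM nodes.
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();
$children = array($a, $b, $c);
$orphan = new stdClass();

var_dump(sibling_at($children, $a, 1) === $b);    // bool(true)
var_dump(sibling_at($children, $a, -1));          // NULL
var_dump(sibling_at($children, $orphan, -1));     // NULL
```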

    // function to locate a specific ancestor tag in the path to the root.
    public function find_ancestor_tag($tag)
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $debug_object->debug_log_entry(1);
        }

        // Start by including ourselves in the comparison.
        $returnDom = $this;

        while (!is_null($returnDom)) {
            if (is_object($debug_object)) {
                $debug_object->debug_log(2, "Current tag is: " . $returnDom->tag);
            }

            if ($returnDom->tag == $tag) {
                break;
            }
            $returnDom = $returnDom->parent;
        }
        return $returnDom;
    }

    // get dom node's inner html
    public function innertext()
    {
        if (isset($this->_[HDOM_INFO_INNER])) {
            return $this->_[HDOM_INFO_INNER];
        }
        if (isset($this->_[HDOM_INFO_TEXT])) {
            return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
        }

        $ret = '';
        foreach ($this->nodes as $n) {
            $ret .= $n->outertext();
        }
        return $ret;
    }

    // get dom node's outer text (with tag)
    public function outertext()
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $text = '';
            if ($this->tag == 'text') {
                if (!empty($this->text)) {
Bug, Best Practice: The property text does not exist on simple_html_dom_node. Since you implemented __get, consider adding a @property annotation.
                    $text = " with text: " . $this->text;
                }
            }
            $debug_object->debug_log(1, 'Innertext of tag: ' . $this->tag . $text);
        }

        if ($this->tag==='root') {
            return $this->innertext();
        }
        // trigger callback
        if ($this->dom && $this->dom->callback!==null) {
            call_user_func_array($this->dom->callback, array($this));
        }

        if (isset($this->_[HDOM_INFO_OUTER])) {
            return $this->_[HDOM_INFO_OUTER];
        }
        if (isset($this->_[HDOM_INFO_TEXT])) {
            return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
        }

        // render begin tag
        if ($this->dom && $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]]) {
            $ret = $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]]->makeup();
        } else {
            $ret = "";
        }

        // render inner text
        if (isset($this->_[HDOM_INFO_INNER])) {
            // If it's a br tag...  don't return the HDOM_INNER_INFO that we may or may not have added.
            if ($this->tag != "br") {
                $ret .= $this->_[HDOM_INFO_INNER];
            }
        } else {
            if ($this->nodes) {
Bug, Best Practice: The expression $this->nodes of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.
                foreach ($this->nodes as $n) {
                    $ret .= $this->convert_text($n->outertext());
                }
            }
        }

        // render end tag
        if (isset($this->_[HDOM_INFO_END]) && $this->_[HDOM_INFO_END]!=0) {
            $ret .= '</'.$this->tag.'>';
        }
        return $ret;
    }

    // get dom node's plain text
    public function text()
    {
        if (isset($this->_[HDOM_INFO_INNER])) {
            return $this->_[HDOM_INFO_INNER];
        }
        switch ($this->nodetype) {
            case HDOM_TYPE_TEXT: return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
            case HDOM_TYPE_COMMENT: return '';
            case HDOM_TYPE_UNKNOWN: return '';
        }
        if (strcasecmp($this->tag, 'script')===0) {
            return '';
        }
        if (strcasecmp($this->tag, 'style')===0) {
            return '';
        }

        $ret = '';
        // In rare cases, (always node type 1 or HDOM_TYPE_ELEMENT - observed for some span tags, and some p tags) $this->nodes is set to NULL.
        // NOTE: This indicates that there is a problem where it's set to NULL without a clear happening.
        // WHY is this happening?
        if (!is_null($this->nodes)) {
The condition is_null($this->nodes) is always false.
            foreach ($this->nodes as $n) {
                $ret .= $this->convert_text($n->text());
            }

            // If this node is a span... add a space at the end of it so multiple spans don't run into each other.  This is plaintext after all.
            if ($this->tag == "span") {
                $ret .= $this->dom->default_span_text;
            }
        }
        return $ret;
    }

    public function xmltext()
    {
        $ret0 = $this->innertext();
        $ret1 = str_ireplace('<![CDATA[', '', $ret0);
        $ret2 = str_replace(']]>', '', $ret1);
        return $ret2;
    }
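
xmltext() takes the node's inner markup and removes the CDATA delimiters, leaving the wrapped text in place. The sketch below repeats those two replacements on a plain string. The function name `strip_cdata_markers` is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical standalone helper that repeats the two replacements made by xmltext().
function strip_cdata_markers($markup)
{
    // The opening marker is matched case-insensitively, the closing marker literally.
    $withoutOpen = str_ireplace('<![CDATA[', '', $markup);
    return str_replace(']]>', '', $withoutOpen);
}

echo strip_cdata_markers('<title><![CDATA[Tom & Jerry]]></title>'), "\n";
// prints: <title>Tom & Jerry</title>
```

Only the delimiters are removed, so characters such as `&` that were protected by the CDATA section come out unescaped.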

    // build node's text with tag
    public function makeup()
    {
        // text, comment, unknown
        if (isset($this->_[HDOM_INFO_TEXT])) {
            return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
        }
        $ret = '<'.$this->tag;
        $i = -1;

        foreach ($this->attr as $key=>$val) {
            ++$i;

            // skip removed attribute
            if ($val===null || $val===false) {
                continue;
            }
            $ret .= $this->_[HDOM_INFO_SPACE][$i][0];
            //no value attr: nowrap, checked selected...
            if ($val===true) {
                $ret .= $key;
            } else {
                switch ($this->_[HDOM_INFO_QUOTE][$i]) {
                    case HDOM_QUOTE_DOUBLE: $quote = '"'; break;
                    case HDOM_QUOTE_SINGLE: $quote = '\''; break;
                    default: $quote = '';
                }
                $ret .= $key.$this->_[HDOM_INFO_SPACE][$i][1].'='.$this->_[HDOM_INFO_SPACE][$i][2].$quote.$val.$quote;
            }
        }
        $ret = $this->dom->restore_noise($ret);
        return $ret . $this->_[HDOM_INFO_ENDSPACE] . '>';
    }

    // find elements by css selector
    //PaperG - added ability for find to lowercase the value of the selector.
    public function find($selector, $idx=null, $lowercase=false)
    {
        $selectors = $this->parse_selector($selector);
        if (($count=count($selectors))===0) {
            return array();
        }
        $found_keys = array();

        // find each selector
        for ($c=0; $c<$count; ++$c) {
            // The change on the below line was documented on the sourceforge code tracker id 2788009
            // used to be: if (($levle=count($selectors[0]))===0) return array();
            if (($levle=count($selectors[$c]))===0) {
                return array();
            }
            if (!isset($this->_[HDOM_INFO_BEGIN])) {
                return array();
            }

            $head = array($this->_[HDOM_INFO_BEGIN]=>1);

            // handle descendant selectors, no recursive!
            for ($l=0; $l<$levle; ++$l) {
                $ret = array();
                foreach ($head as $k=>$v) {
                    $n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
                    //PaperG - Pass this optional parameter on to the seek function.
                    $n->seek($selectors[$c][$l], $ret, $lowercase);
                }
                $head = $ret;
            }

            foreach ($head as $k=>$v) {
                if (!isset($found_keys[$k])) {
                    $found_keys[$k] = 1;
                }
            }
        }

        // sort keys
        ksort($found_keys);

        $found = array();
        foreach ($found_keys as $k=>$v) {
            $found[] = $this->dom->nodes[$k];
        }
        // return nth-element or array
        if (is_null($idx)) {
            return $found;
        } elseif ($idx<0) {
            $idx = count($found) + $idx;
        }
        return (isset($found[$idx])) ? $found[$idx] : null;
    }
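
The last step of find() decides what the caller gets back: the whole list when $idx is null, otherwise one element, with negative values counted from the end. The sketch below isolates that rule on a plain array. The function name `pick_match` is hypothetical.

```php
<?php
// Hypothetical standalone helper that repeats the index handling at the end of find().
function pick_match(array $found, $idx = null)
{
    if ($idx === null) {
        return $found;                  // no index: return every match
    }
    if ($idx < 0) {
        $idx = count($found) + $idx;    // -1 is the last match, -2 the one before it
    }
    return isset($found[$idx]) ? $found[$idx] : null;
}

$found = array('first', 'second', 'third');
var_dump(pick_match($found, 0));   // string(5) "first"
var_dump(pick_match($found, -1));  // string(5) "third"
var_dump(pick_match($found, 5));   // NULL
```

An out-of-range index yields null rather than an error, so code that chains a method call onto a single-element lookup should probably check for null first.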

    // seek for given conditions
    // PaperG - added parameter to allow for case insensitive testing of the value of a selector.
    protected function seek($selector, &$ret, $lowercase=false)
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $debug_object->debug_log_entry(1);
        }

        list($tag, $key, $val, $exp, $no_key) = $selector;

        // xpath index
        if ($tag && $key && is_numeric($key)) {
            $count = 0;
            foreach ($this->children as $c) {
                if ($tag==='*' || $tag===$c->tag) {
                    if (++$count==$key) {
                        $ret[$c->_[HDOM_INFO_BEGIN]] = 1;
                        return;
                    }
                }
            }
            return;
        }

        $end = (!empty($this->_[HDOM_INFO_END])) ? $this->_[HDOM_INFO_END] : 0;
        if ($end==0) {
            $parent = $this->parent;
            while (!isset($parent->_[HDOM_INFO_END]) && $parent!==null) {
                $end -= 1;
                $parent = $parent->parent;
            }
            $end += $parent->_[HDOM_INFO_END];
        }

        for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
            $node = $this->dom->nodes[$i];

            $pass = true;

            if ($tag==='*' && !$key) {
                if (in_array($node, $this->children, true)) {
                    $ret[$i] = 1;
                }
                continue;
            }

            // compare tag
            if ($tag && $tag!=$node->tag && $tag!=='*') {
                $pass=false;
            }
            // compare key
            if ($pass && $key) {
                if ($no_key) {
                    if (isset($node->attr[$key])) {
                        $pass=false;
                    }
                } else {
                    if (($key != "plaintext") && !isset($node->attr[$key])) {
                        $pass=false;
                    }
                }
            }
            // compare value
            if ($pass && $key && $val && $val!=='*') {
                // If they have told us that this is a "plaintext" search then we want the plaintext of the node - right?
                if ($key == "plaintext") {
                    // $node->plaintext actually returns $node->text();
                    $nodeKeyValue = $node->text();
                } else {
                    // this is a normal search, we want the value of that attribute of the tag.
                    $nodeKeyValue = $node->attr[$key];
                }
                if (is_object($debug_object)) {
                    $debug_object->debug_log(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);
                }

                //PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
                if ($lowercase) {
                    $check = $this->match($exp, strtolower($val), strtolower($nodeKeyValue));
                } else {
                    $check = $this->match($exp, $val, $nodeKeyValue);
                }
                if (is_object($debug_object)) {
                    $debug_object->debug_log(2, "after match: " . ($check ? "true" : "false"));
                }

                // handle multiple class
                if (!$check && strcasecmp($key, 'class')===0) {
                    foreach (explode(' ', $node->attr[$key]) as $k) {
                        // Without this, there were cases where leading, trailing, or double spaces lead to our comparing blanks - bad form.
                        if (!empty($k)) {
                            if ($lowercase) {
                                $check = $this->match($exp, strtolower($val), strtolower($k));
                            } else {
                                $check = $this->match($exp, $val, $k);
                            }
                            if ($check) {
                                break;
                            }
                        }
                    }
                }
                if (!$check) {
                    $pass = false;
                }
            }
            if ($pass) {
                $ret[$i] = 1;
            }
            unset($node);
        }
        // It's passed by reference so this is actually what this function returns.
        if (is_object($debug_object)) {
            $debug_object->debug_log(1, "EXIT - ret: ", $ret);
        }
    }

    protected function match($exp, $pattern, $value)
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $debug_object->debug_log_entry(1);
        }

        switch ($exp) {
            case '=':
                return ($value===$pattern);
            case '!=':
                return ($value!==$pattern);
            case '^=':
                return preg_match("/^".preg_quote($pattern, '/')."/", $value);
            case '$=':
                return preg_match("/".preg_quote($pattern, '/')."$/", $value);
            case '*=':
                if ($pattern[0]=='/') {
                    return preg_match($pattern, $value);
                }
                return preg_match("/".$pattern."/i", $value);
        }
    }
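
match() maps each attribute operator to a comparison: exact equality, inequality, prefix, suffix, and "contains". For "contains", the value is treated as a regular expression, used as-is when it starts with a slash. The sketch below restates that mapping as a free function returning strict booleans. The name `attr_matches` and the explicit false for unknown operators are assumptions made for the example, not library behavior.

```php
<?php
// Hypothetical standalone restatement of the operator table used by match().
function attr_matches($exp, $pattern, $value)
{
    switch ($exp) {
        case '=':
            return $value === $pattern;
        case '!=':
            return $value !== $pattern;
        case '^=':
            // Prefix test: the pattern is quoted, so it is matched literally.
            return preg_match('/^' . preg_quote($pattern, '/') . '/', $value) === 1;
        case '$=':
            // Suffix test, also literal.
            return preg_match('/' . preg_quote($pattern, '/') . '$/', $value) === 1;
        case '*=':
            // A pattern starting with "/" is taken as a complete regular expression.
            if ($pattern !== '' && $pattern[0] === '/') {
                return preg_match($pattern, $value) === 1;
            }
            // Otherwise it is wrapped unquoted and matched case-insensitively.
            return preg_match('/' . $pattern . '/i', $value) === 1;
    }
    return false; // assumption: unknown operators never match
}

var_dump(attr_matches('^=', 'http', 'https://example.com')); // bool(true)
var_dump(attr_matches('$=', '.png', 'logo.png'));            // bool(true)
var_dump(attr_matches('*=', 'LOGO', 'my-logo.png'));         // bool(true)
var_dump(attr_matches('=', 'a', 'A'));                       // bool(false)
```

Since the "contains" pattern is not quoted, regex metacharacters in the selector value keep their special meaning there.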
682
683
    protected function parse_selector($selector_string)
684
    {
685
        global $debug_object;
686
        if (is_object($debug_object)) {
687
            $debug_object->debug_log_entry(1);
688
        }
689
690
        // pattern of CSS selectors, modified from mootools
691
        // Paperg: Add the colon to the attrbute, so that it properly finds <tag attr:ibute="something" > like google does.
692
        // Note: if you try to look at this attribute, yo MUST use getAttribute since $dom->x:y will fail the php syntax check.
693
        // Notice the \[ starting the attbute?  and the @? following?  This implies that an attribute can begin with an @ sign that is not captured.
694
        // This implies that an html attribute specifier may start with an @ sign that is NOT captured by the expression.
695
        // farther study is required to determine of this should be documented or removed.
696
//		$pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
697
        $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
698
        preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
699
        if (is_object($debug_object)) {
700
            $debug_object->debug_log(2, "Matches Array: ", $matches);
701
        }
702
703
        $selectors = array();
704
        $result = array();
705
        //print_r($matches);
706
707
        foreach ($matches as $m) {
708
            $m[0] = trim($m[0]);
709
            if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') {
710
                continue;
711
            }
712
            // for browser generated xpath
713
            if ($m[1]==='tbody') {
714
                continue;
715
            }
716
            list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
717
            if (!empty($m[2])) {
718
                $key='id';
719
                $val=$m[2];
720
            }
721
            if (!empty($m[3])) {
722
                $key='class';
723
                $val=$m[3];
724
            }
725
            if (!empty($m[4])) {
726
                $key=$m[4];
727
            }
728
            if (!empty($m[5])) {
729
                $exp=$m[5];
730
            }
731
            if (!empty($m[6])) {
732
                $val=$m[6];
733
            }
734
735
            // convert to lowercase
736
            if ($this->dom->lowercase) {
737
                $tag=strtolower($tag);
738
                $key=strtolower($key);
739
            }
740
            //elements that do NOT have the specified attribute
741
            if (isset($key[0]) && $key[0]==='!') {
742
                $key=substr($key, 1);
743
                $no_key=true;
744
            }
745
746
            $result[] = array($tag, $key, $val, $exp, $no_key);
747
            if (trim($m[7])===',') {
748
                $selectors[] = $result;
749
                $result = array();
750
            }
751
        }
752
        if (count($result)>0) {
753
            $selectors[] = $result;
754
        }
755
        return $selectors;
756
    }
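
    /*
     * Illustration of the structure returned above (a sketch, not part of the
     * original source). For the selector string
     *
     *     'div#main a[href^=http], p.note'
     *
     * the result is expected to hold one entry per comma-separated group, each
     * containing one array($tag, $key, $val, $exp, $no_key) per step:
     *
     *     array(
     *         array(
     *             array('div', 'id', 'main', '=', false),
     *             array('a', 'href', 'http', '^=', false),
     *         ),
     *         array(
     *             array('p', 'class', 'note', '=', false),
     *         ),
     *     )
     *
     * Because $key is assigned from the id, then the class, then the bracketed
     * attribute, a step that combines them keeps only the last key.
     */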
757
758
    public function __get($name)
759
    {
760
        if (isset($this->attr[$name])) {
761
            return $this->convert_text($this->attr[$name]);
762
        }
763
        switch ($name) {
764
            case 'outertext': return $this->outertext();
765
            case 'innertext': return $this->innertext();
766
            case 'plaintext': return $this->text();
767
            case 'xmltext': return $this->xmltext();
768
            default: return array_key_exists($name, $this->attr);
769
        }
770
    }
771
772
    public function __set($name, $value)
773
    {
774
        global $debug_object;
775
        if (is_object($debug_object)) {
776
            $debug_object->debug_log_entry(1);
777
        }
778
779
        switch ($name) {
780
            case 'outertext':
781
                return $this->_[HDOM_INFO_OUTER] = $value;
782
            case 'innertext':
783
                if (isset($this->_[HDOM_INFO_TEXT])) {
784
                    return $this->_[HDOM_INFO_TEXT] = $value;
785
                }
786
                return $this->_[HDOM_INFO_INNER] = $value;
787
        }
788
        if (!isset($this->attr[$name])) {
789
            $this->_[HDOM_INFO_SPACE][] = array(' ', '', '');
790
            $this->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
791
        }
792
        $this->attr[$name] = $value;
793
    }
794
795
    public function __isset($name)
796
    {
797
        switch ($name) {
798
            case 'outertext': return true;
799
            case 'innertext': return true;
800
            case 'plaintext': return true;
801
        }
802
        //no value attr: nowrap, checked selected...
803
        return (array_key_exists($name, $this->attr)) ? true : isset($this->attr[$name]);
804
    }
805
806
    public function __unset($name)
807
    {
808
        if (isset($this->attr[$name])) {
809
            unset($this->attr[$name]);
810
        }
811
    }
812
813
    // PaperG - Function to convert the text from one character set to another if the two sets are not the same.
814
    public function convert_text($text)
815
    {
816
        global $debug_object;
817
        if (is_object($debug_object)) {
818
            $debug_object->debug_log_entry(1);
819
        }
820
821
        $converted_text = $text;
822
823
        $sourceCharset = "";
824
        $targetCharset = "";
825
826
        if ($this->dom) {
827
            $sourceCharset = strtoupper($this->dom->_charset);
828
            $targetCharset = strtoupper($this->dom->_target_charset);
829
        }
830
        if (is_object($debug_object)) {
831
            $debug_object->debug_log(3, "source charset: " . $sourceCharset . " target charset: " . $targetCharset);
832
        }
833
834
        if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0)) {
835
            // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
836
            if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text))) {
837
                $converted_text = $text;
838
            } else {
839
                $converted_text = iconv($sourceCharset, $targetCharset, $text);
840
            }
841
        }
842
843
        // Let's make sure that we don't leave a stray BOM at either end of any UTF-8 text we output.
844
        if ($targetCharset == 'UTF-8') {
845
            if (substr($converted_text, 0, 3) == "\xef\xbb\xbf") {
846
                $converted_text = substr($converted_text, 3);
847
            }
848
            if (substr($converted_text, -3) == "\xef\xbb\xbf") {
849
                $converted_text = substr($converted_text, 0, -3);
850
            }
851
        }
852
853
        return $converted_text;
854
    }
855
856
    /**
    * Returns true if $str is valid UTF-8 and false otherwise.
    * Only the lead-byte / continuation-byte structure is checked.
    *
    * @param mixed $str String to be tested
    * @return boolean
    */
    public static function is_utf8($str)
    {
        $bits=0;
        $len=strlen($str);
        for ($i=0; $i<$len; $i++) {
            $c=ord($str[$i]);
            // Bytes 0-127 are plain ASCII; anything from 128 up must start a multi-byte sequence.
            if ($c >= 128) {
                if (($c >= 254)) {
                    return false;
                } elseif ($c >= 252) {
                    $bits=6;
                } elseif ($c >= 248) {
                    $bits=5;
                } elseif ($c >= 240) {
                    $bits=4;
                } elseif ($c >= 224) {
                    $bits=3;
                } elseif ($c >= 192) {
                    $bits=2;
                } else {
                    // 128-191 are continuation bytes and cannot lead a sequence.
                    return false;
                }
                // The whole sequence must fit inside the string.
                if (($i+$bits) > $len) {
                    return false;
                }
                // Every following byte must be a continuation byte (128-191).
                while ($bits > 1) {
                    $i++;
                    $b=ord($str[$i]);
                    if ($b < 128 || $b > 191) {
                        return false;
                    }
                    $bits--;
                }
            }
        }
        return true;
    }
901
    
902
    /*
903
    function is_utf8($string)
904
    {
905
        //this is buggy
906
        return (utf8_encode(utf8_decode($string)) == $string);
907
    }
908
    */
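
    /*
     * Usage sketch for is_utf8() (illustrative, not part of the original
     * source). It assumes the enclosing class is simple_html_dom_node, which
     * this part of the file does not show directly; adjust the class name if
     * it differs.
     *
     *     simple_html_dom_node::is_utf8("caf\xC3\xA9"); // true: 0xC3 0xA9 is a complete two-byte sequence
     *     simple_html_dom_node::is_utf8("caf\xE9");     // false: 0xE9 announces three bytes but the string ends
     *     simple_html_dom_node::is_utf8("plain ascii"); // true: no byte is above 127
     *
     * convert_text() relies on this check to skip iconv() when the target
     * charset is UTF-8 and text labelled with another charset is in fact
     * already UTF-8.
     */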
909
910
    /**
     * Function to try a few tricks to determine the displayed size of an img on the page.
     * NOTE: This will ONLY work on an IMG tag. Returns FALSE on all other tag types.
     *
     * @author John Schlick
     * @version April 19 2012
     * @return array|false an array containing the 'height' and 'width' of the image on the page (-1 for a dimension we can't figure out), or false if this is not an img tag.
     */
    public function get_display_size()
    {
        global $debug_object;

        $width = -1;
        $height = -1;

        if ($this->tag !== 'img') {
            return false;
        }

        // See if there is a height or width attribute in the tag itself.
        if (isset($this->attr['width'])) {
            $width = $this->attr['width'];
        }

        if (isset($this->attr['height'])) {
            $height = $this->attr['height'];
        }

        // Now look for an inline style.
        if (isset($this->attr['style'])) {
            // Thanks to user gnarf from stackoverflow for this regular expression.
            $attributes = array();
            preg_match_all("/([\w-]+)\s*:\s*([^;]+)\s*;?/", $this->attr['style'], $matches, PREG_SET_ORDER);
            foreach ($matches as $match) {
                $attributes[$match[1]] = $match[2];
            }

            // If there is a width in the style attributes:
            if (isset($attributes['width']) && $width == -1) {
                // check that the last two characters are px (pixels)
                if (strtolower(substr($attributes['width'], -2)) == 'px') {
                    $proposed_width = substr($attributes['width'], 0, -2);
                    // Now make sure that it's an integer and not something stupid.
                    if (filter_var($proposed_width, FILTER_VALIDATE_INT)) {
                        $width = $proposed_width;
                    }
                }
            }

            // If there is a height in the style attributes:
            if (isset($attributes['height']) && $height == -1) {
                // check that the last two characters are px (pixels)
                if (strtolower(substr($attributes['height'], -2)) == 'px') {
                    $proposed_height = substr($attributes['height'], 0, -2);
                    // Now make sure that it's an integer and not something stupid.
                    if (filter_var($proposed_height, FILTER_VALIDATE_INT)) {
                        $height = $proposed_height;
                    }
                }
            }
        }

        // Future enhancement:
        // Look in the tag to see if there is a class or id specified that has a height or width attribute to it.

        // Far future enhancement:
        // Look at all the parent tags of this image to see if they specify a class or id that has an img selector that specifies a height or width.
        // Note that in this case, the class or id will have the img subselector for it to apply to the image.

        // Ridiculously far future development:
        // If the class or id is specified in a SEPARATE css file that's not on the page, go get it and do what we were just doing for the ones on the page.

        $result = array('height' => $height,
                        'width' => $width);
        return $result;
    }
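
    /*
     * Illustration for get_display_size() (a sketch, not part of the original
     * source). For a node parsed from
     *
     *     <img src="a.png" width="100" style="height: 50px">
     *
     * the method is expected to return
     *
     *     array('height' => '50', 'width' => '100')
     *
     * The width comes straight from the attribute; the height is taken from the
     * inline style with its "px" suffix removed. Both values are strings, not
     * integers, and a dimension that cannot be determined stays at -1.
     */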
986
987
    // camel naming conventions
    public function getAllAttributes()
    {
        return $this->attr;
    }
    public function getAttribute($name)
    {
        return $this->__get($name);
    }
    public function setAttribute($name, $value)
    {
        $this->__set($name, $value);
    }
    public function hasAttribute($name)
    {
        return $this->__isset($name);
    }
    public function removeAttribute($name)
    {
        $this->__set($name, null);
    }
    public function getElementById($id)
    {
        return $this->find("#$id", 0);
    }
    public function getElementsById($id, $idx=null)
    {
        return $this->find("#$id", $idx);
    }
    public function getElementByTagName($name)
    {
        return $this->find($name, 0);
    }
    public function getElementsByTagName($name, $idx=null)
    {
        return $this->find($name, $idx);
    }
    public function parentNode()
    {
        return $this->parent();
    }
    public function childNodes($idx=-1)
    {
        return $this->children($idx);
    }
    public function firstChild()
    {
        return $this->first_child();
    }
    public function lastChild()
    {
        return $this->last_child();
    }
    public function nextSibling()
    {
        return $this->next_sibling();
    }
    public function previousSibling()
    {
        return $this->prev_sibling();
    }
    public function hasChildNodes()
    {
        return $this->has_child();
    }
    public function nodeName()
    {
        return $this->tag;
    }
    public function appendChild($node)
    {
        $node->parent($this);
        return $node;
    }
1061
}
1062
1063
/**
1064
* simple html dom parser
1065
* Paperg - in the find routine: allow us to specify that we want case insensitive testing of the value of the selector.
1066
* Paperg - change $size from protected to public so we can easily access it
1067
* Paperg - added ForceTagsClosed in the constructor which tells us whether we trust the html or not.  Default is to NOT trust it.
1068
*
1069
* @package PlaceLocalInclude
1070
*/
1071
class simple_html_dom
1072
{
1073
    public $root = null;
1074
    public $nodes = array();
1075
    public $callback = null;
1076
    public $lowercase = false;
1077
    // Used to keep track of how large the text was when we started.
1078
    public $original_size;
1079
    public $size;
1080
    protected $pos;
1081
    protected $doc;
1082
    protected $char;
1083
    protected $cursor;
1084
    protected $parent;
1085
    protected $noise = array();
1086
    protected $token_blank = " \t\r\n";
1087
    protected $token_equal = ' =/>';
1088
    protected $token_slash = " />\r\n\t";
1089
    protected $token_attr = ' >';
1090
    // Note that this is referenced by a child node, and so it needs to be public for that node to see this information.
1091
    public $_charset = '';
1092
    public $_target_charset = '';
1093
    protected $default_br_text = "";
1094
    public $default_span_text = "";
1095
1096
    // use isset instead of in_array, performance boost about 30%...
1097
    protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'link'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
1098
    protected $block_tags = array('root'=>1, 'body'=>1, 'form'=>1, 'div'=>1, 'span'=>1, 'table'=>1);
1099
    // Known sourceforge issue #2977341
1100
    // B tags that are not closed cause us to return everything to the end of the document.
1101
    protected $optional_closing_tags = array(
1102
            'tr'=>array('tr'=>1, 'td'=>1, 'th'=>1),
1103
            'th'=>array('th'=>1),
1104
            'td'=>array('td'=>1),
1105
            'li'=>array('li'=>1),
1106
            'dt'=>array('dt'=>1, 'dd'=>1),
1107
            'dd'=>array('dd'=>1, 'dt'=>1),
1108
            'dl'=>array('dd'=>1, 'dt'=>1),
1109
            'p'=>array('p'=>1),
1110
            'nobr'=>array('nobr'=>1),
1111
            'b'=>array('b'=>1),
1112
            'option'=>array('option'=>1),
1113
    );
1114
1115
    public function __construct($str=null, $lowercase=true, $forceTagsClosed=true, $target_charset=DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        // Forcing tags to be closed implies that we don't trust the html, but it can lead to parsing errors if we SHOULD trust the html.
        // This is decided before anything is loaded, otherwise it could not affect the parse.
        if (!$forceTagsClosed) {
            $this->optional_closing_tags=array();
        }
        if ($str) {
            if (preg_match("/^https?:\/\//i", $str) || is_file($str)) {
                $this->load_file($str);
            } else {
                $this->load($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
            }
        }
        $this->_target_charset = $target_charset;
    }
1130
1131
    public function __destruct()
1132
    {
1133
        $this->clear();
1134
    }
1135
1136
    // load html from string
    public function load($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        global $debug_object;

        // prepare
        $this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
        // strip out cdata
        $this->remove_noise("'<!\[CDATA\[(.*?)\]\]>'is", true);
        // strip out comments
        $this->remove_noise("'<!--(.*?)-->'is");
        // Per sourceforge http://sourceforge.net/tracker/?func=detail&aid=2949097&group_id=218559&atid=1044037
        // Script tags removal now precedes style tag removal.
        // strip out <script> tags
        $this->remove_noise("'<\s*script[^>]*[^/]>(.*?)<\s*/\s*script\s*>'is");
        $this->remove_noise("'<\s*script\s*>(.*?)<\s*/\s*script\s*>'is");
        // strip out <style> tags
        $this->remove_noise("'<\s*style[^>]*[^/]>(.*?)<\s*/\s*style\s*>'is");
        $this->remove_noise("'<\s*style\s*>(.*?)<\s*/\s*style\s*>'is");
        // strip out preformatted tags (the pattern currently covers <code> only)
        $this->remove_noise("'<\s*(?:code)[^>]*>(.*?)<\s*/\s*(?:code)\s*>'is");
        // strip out server side scripts
        $this->remove_noise("'(<\?)(.*?)(\?>)'s", true);
        // strip smarty scripts
        $this->remove_noise("'(\{\w)(.*?)(\})'s", true);

        // parsing
        while ($this->parse());
        // end
        $this->root->_[HDOM_INFO_END] = $this->cursor;
        $this->parse_charset();

        // make load function chainable
        return $this;
    }
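
    /*
     * Usage sketch (illustrative, not part of the original source). Because
     * load() returns $this, parsing and querying can be chained. It assumes the
     * DEFAULT_* constants used in the signatures are defined elsewhere in this
     * file, and that find() without an index returns an array of nodes:
     *
     *     $dom = new simple_html_dom();
     *     $links = $dom->load('<p><a href="/a">A</a> <a href="/b">B</a></p>')->find('a');
     *     foreach ($links as $link) {
     *         echo $link->href, "\n";
     *     }
     *     $dom->clear(); // release the node references when done
     */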
1171
1172
    // load html from file
    public function load_file()
    {
        $args = func_get_args();
        $this->load(call_user_func_array('file_get_contents', $args), true);
        // Report failure if we can't properly load the dom.
        if (error_get_last()!==null) {
            $this->clear();
            return false;
        }
    }
1183
1184
    // set callback function
1185
    public function set_callback($function_name)
1186
    {
1187
        $this->callback = $function_name;
1188
    }
1189
1190
    // remove callback function
1191
    public function remove_callback()
1192
    {
1193
        $this->callback = null;
1194
    }
1195
1196
    // save dom as string
1197
    public function save($filepath='')
1198
    {
1199
        $ret = $this->root->innertext();
1200
        if ($filepath!=='') {
1201
            file_put_contents($filepath, $ret, LOCK_EX);
1202
        }
1203
        return $ret;
1204
    }
1205
1206
    // find dom node by css selector
1207
    // Paperg - allow us to specify that we want case insensitive testing of the value of the selector.
1208
    public function find($selector, $idx=null, $lowercase=false)
1209
    {
1210
        return $this->root->find($selector, $idx, $lowercase);
1211
    }
1212
1213
    // clean up memory due to php5 circular references memory leak...
1214
    public function clear()
1215
    {
1216
        foreach ($this->nodes as $n) {
            $n->clear();
            $n = null;
        }
        // This next block is documented in the sourceforge repository (2977248) as a fix for ongoing memory leaks that occur even with the use of clear.
        if (isset($this->children)) {
            foreach ($this->children as $n) {
                $n->clear();
                $n = null;
            }
        }
1227
        if (isset($this->parent)) {
1228
            $this->parent->clear();
1229
            unset($this->parent);
1230
        }
1231
        if (isset($this->root)) {
1232
            $this->root->clear();
1233
            unset($this->root);
1234
        }
1235
        unset($this->doc);
1236
        unset($this->noise);
1237
    }
1238
1239
    public function dump($show_attr=true)
1240
    {
1241
        $this->root->dump($show_attr);
1242
    }
1243
1244
    // prepare HTML data and init everything
1245
    protected function prepare($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
1246
    {
1247
        $this->clear();
1248
1249
            // set the length of content before we do anything to it.
1250
            $this->size = strlen($str);
1251
            // Save the original size of the html that we got in.  It might be useful to someone.
1252
            $this->original_size = $this->size;
1253
1254
            //before we save the string as the doc...  strip out the \r \n's if we are told to.
1255
            if ($stripRN) {
1256
                $str = str_replace("\r", " ", $str);
1257
                $str = str_replace("\n", " ", $str);
1258
1259
                    // set the length of content since we have changed it.
1260
                    $this->size = strlen($str);
1261
            }
1262
1263
        $this->doc = $str;
1264
        $this->pos = 0;
1265
        $this->cursor = 1;
1266
        $this->noise = array();
1267
        $this->nodes = array();
1268
        $this->lowercase = $lowercase;
1269
        $this->default_br_text = $defaultBRText;
1270
        $this->default_span_text = $defaultSpanText;
1271
        $this->root = new simple_html_dom_node($this);
1272
        $this->root->tag = 'root';
1273
        $this->root->_[HDOM_INFO_BEGIN] = -1;
1274
        $this->root->nodetype = HDOM_TYPE_ROOT;
1275
        $this->parent = $this->root;
1276
        if ($this->size>0) {
1277
            $this->char = $this->doc[0];
1278
        }
1279
    }
1280
1281
    // parse html content
1282
    protected function parse()
1283
    {
1284
        if (($s = $this->copy_until_char('<'))==='') {
1285
            return $this->read_tag();
1286
        }
1287
1288
            // text
1289
            $node = new simple_html_dom_node($this);
1290
        ++$this->cursor;
1291
        $node->_[HDOM_INFO_TEXT] = $s;
1292
        $this->link_nodes($node, false);
1293
        return true;
1294
    }
1295
1296
    // PAPERG - dkchou - added this to try to identify the character set of the page we have just parsed so we know better how to spit it out later.
1297
    // NOTE:  IF you provide a routine called get_last_retrieve_url_contents_content_type which returns the CURLINFO_CONTENT_TYPE from the last curl_exec
1298
    // (or the content_type header from the last transfer), we will parse THAT, and if a charset is specified, we will use it over any other mechanism.
1299
    protected function parse_charset()
1300
    {
1301
        global $debug_object;
1302
1303
        $charset = null;
1304
1305
        if (function_exists('get_last_retrieve_url_contents_content_type')) {
1306
            $contentTypeHeader = get_last_retrieve_url_contents_content_type();
1307
            $success = preg_match('/charset=(.+)/', $contentTypeHeader, $matches);
1308
            if ($success) {
1309
                $charset = $matches[1];
1310
                if (is_object($debug_object)) {
1311
                    $debug_object->debug_log(2, 'header content-type found charset of: ' . $charset);
1312
                }
1313
            }
1314
        }
1315
1316
        if (empty($charset)) {
1317
            $el = $this->root->find('meta[http-equiv=Content-Type]', 0, true);
1318
            if (!empty($el)) {
1319
                $fullvalue = $el->content;
1320
                if (is_object($debug_object)) {
1321
                    $debug_object->debug_log(2, 'meta content-type tag found' . $fullvalue);
1322
                }
1323
1324
                if (!empty($fullvalue)) {
1325
                    $success = preg_match('/charset=(.+)/i', $fullvalue, $matches);
1326
                    if ($success) {
1327
                        $charset = $matches[1];
1328
                    } else {
1329
                        // If there is a meta tag, and they don't specify the character set, research says that it's typically ISO-8859-1
1330
                                            if (is_object($debug_object)) {
1331
                                                $debug_object->debug_log(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');
1332
                                            }
1333
                        $charset = 'ISO-8859-1';
1334
                    }
1335
                }
1336
            }
1337
        }
1338
1339
        // If we couldn't find a charset above, then let's try to detect one based on the text we got...
        if (empty($charset)) {
            // Use this in case mb_detect_encoding isn't installed/loaded on this machine.
            $charset = false;
            if (function_exists('mb_detect_encoding')) {
                // Have php try to detect the encoding from the text given to us.
                $charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ));
                if (is_object($debug_object)) {
                    $debug_object->debug_log(2, 'mb_detect found: ' . $charset);
                }
            }

            // and if this doesn't work... then we just assume it's UTF-8 so that we can move on, since this will usually give us most of what we need...
            if ($charset === false) {
                if (is_object($debug_object)) {
                    $debug_object->debug_log(2, 'since mb_detect failed - using default of utf-8');
                }
                $charset = 'UTF-8';
            }
        }

        // Since CP1252 is a superset, if we get one of its subsets, we want it instead.
        if ((strtolower($charset) == strtolower('ISO-8859-1')) || (strtolower($charset) == strtolower('Latin1')) || (strtolower($charset) == strtolower('Latin-1'))) {
            if (is_object($debug_object)) {
                $debug_object->debug_log(2, 'replacing ' . $charset . ' with CP1252 as it\'s a superset');
            }
            $charset = 'CP1252';
        }
1367
1368
        if (is_object($debug_object)) {
1369
            $debug_object->debug_log(1, 'EXIT - ' . $charset);
1370
        }
1371
1372
        return $this->_charset = $charset;
1373
    }
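
    /*
     * Sketch of the optional hook mentioned above (illustrative, not part of
     * the original source). parse_charset() only checks that a global function
     * with this exact name exists and expects it to return a Content-Type
     * string. The fetch_with_curl() helper and the $last_content_type global
     * are hypothetical names chosen for this example:
     *
     *     $last_content_type = '';
     *
     *     function fetch_with_curl($url)
     *     {
     *         global $last_content_type;
     *         $ch = curl_init($url);
     *         curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
     *         $body = curl_exec($ch);
     *         // for example "text/html; charset=ISO-8859-1"; empty when the server sent none
     *         $last_content_type = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
     *         curl_close($ch);
     *         return $body;
     *     }
     *
     *     function get_last_retrieve_url_contents_content_type()
     *     {
     *         global $last_content_type;
     *         return $last_content_type;
     *     }
     *
     * With both defined, a header such as "text/html; charset=ISO-8859-1"
     * yields the charset ISO-8859-1, which parse_charset() then replaces with
     * CP1252 before storing it.
     */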
1374
1375
    // read tag info
1376
    protected function read_tag()
1377
    {
1378
        if ($this->char!=='<') {
1379
            $this->root->_[HDOM_INFO_END] = $this->cursor;
1380
            return false;
1381
        }
1382
        $begin_tag_pos = $this->pos;
1383
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
1384
1385
            // end tag
1386
            if ($this->char==='/') {
1387
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
1388
                    // This represents the change in the simple_html_dom trunk from revision 180 to 181.
1389
                    // $this->skip($this->token_blank_t);
1390
                    $this->skip($this->token_blank);
1391
                $tag = $this->copy_until_char('>');
1392
1393
                    // skip attributes in end tag
1394
                    if (($pos = strpos($tag, ' '))!==false) {
1395
                        $tag = substr($tag, 0, $pos);
1396
                    }
1397
1398
                $parent_lower = strtolower($this->parent->tag);
1399
                $tag_lower = strtolower($tag);
1400
1401
                if ($parent_lower!==$tag_lower) {
1402
                    if (isset($this->optional_closing_tags[$parent_lower]) && isset($this->block_tags[$tag_lower])) {
1403
                        $this->parent->_[HDOM_INFO_END] = 0;
1404
                        $org_parent = $this->parent;
1405
1406
                        while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower) {
1407
                            $this->parent = $this->parent->parent;
1408
                        }
1409
1410
                        if (strtolower($this->parent->tag)!==$tag_lower) {
1411
                            $this->parent = $org_parent; // restore original parent
1412
                                            if ($this->parent->parent) {
1413
                                                $this->parent = $this->parent->parent;
1414
                                            }
1415
                            $this->parent->_[HDOM_INFO_END] = $this->cursor;
1416
                            return $this->as_text_node($tag);
1417
                        }
1418
                    } elseif (($this->parent->parent) && isset($this->block_tags[$tag_lower])) {
1419
                        $this->parent->_[HDOM_INFO_END] = 0;
1420
                        $org_parent = $this->parent;
1421
1422
                        while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower) {
1423
                            $this->parent = $this->parent->parent;
1424
                        }
1425
1426
                        if (strtolower($this->parent->tag)!==$tag_lower) {
1427
                            $this->parent = $org_parent; // restore original parent
1428
                                            $this->parent->_[HDOM_INFO_END] = $this->cursor;
1429
                            return $this->as_text_node($tag);
1430
                        }
1431
                    } elseif (($this->parent->parent) && strtolower($this->parent->parent->tag)===$tag_lower) {
1432
                        $this->parent->_[HDOM_INFO_END] = 0;
1433
                        $this->parent = $this->parent->parent;
1434
                    } else {
1435
                        return $this->as_text_node($tag);
1436
                    }
1437
                }
1438
1439
                $this->parent->_[HDOM_INFO_END] = $this->cursor;
1440
                if ($this->parent->parent) {
1441
                    $this->parent = $this->parent->parent;
1442
                }
1443
1444
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
1445
                    return true;
1446
            }

        $node = new simple_html_dom_node($this);
        $node->_[HDOM_INFO_BEGIN] = $this->cursor;
        ++$this->cursor;
        $tag = $this->copy_until($this->token_slash);
        $node->tag_start = $begin_tag_pos;

        // doctype, cdata & comments...
        if (isset($tag[0]) && $tag[0]==='!') {
            $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until_char('>');

            if (isset($tag[2]) && $tag[1]==='-' && $tag[2]==='-') {
                $node->nodetype = HDOM_TYPE_COMMENT;
                $node->tag = 'comment';
            } else {
                $node->nodetype = HDOM_TYPE_UNKNOWN;
                $node->tag = 'unknown';
            }
            if ($this->char==='>') {
                $node->_[HDOM_INFO_TEXT].='>';
            }
            $this->link_nodes($node, true);
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        // text
        if (strpos($tag, '<')!==false) {
            $tag = '<' . substr($tag, 0, -1);
            $node->_[HDOM_INFO_TEXT] = $tag;
            $this->link_nodes($node, false);
            $this->char = $this->doc[--$this->pos]; // prev
            return true;
        }

        // Not a valid tag name: treat it as text.
        // The hyphen is escaped so the character class is not read as an invalid range.
        if (!preg_match("/^[\w\-:]+$/", $tag)) {
            $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until('<>');
            if ($this->char==='<') {
                $this->link_nodes($node, false);
                return true;
            }

            if ($this->char==='>') {
                $node->_[HDOM_INFO_TEXT].='>';
            }
            $this->link_nodes($node, false);
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        // begin tag
        $node->nodetype = HDOM_TYPE_ELEMENT;
        $tag_lower = strtolower($tag);
        $node->tag = ($this->lowercase) ? $tag_lower : $tag;

        // handle optional closing tags
        if (isset($this->optional_closing_tags[$tag_lower])) {
            while (isset($this->optional_closing_tags[$tag_lower][strtolower($this->parent->tag)])) {
                $this->parent->_[HDOM_INFO_END] = 0;
                $this->parent = $this->parent->parent;
            }
            $node->parent = $this->parent;
        }

        $guard = 0; // prevent infinite loop
        $space = array($this->copy_skip($this->token_blank), '', '');

        // attributes
        do {
            if ($this->char!==null && $space[0]==='') {
                break;
            }
            $name = $this->copy_until($this->token_equal);
            if ($guard===$this->pos) {
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                continue;
            }
            $guard = $this->pos;

            // handle endless '<'
            if ($this->pos>=$this->size-1 && $this->char!=='>') {
                $node->nodetype = HDOM_TYPE_TEXT;
                $node->_[HDOM_INFO_END] = 0;
                $node->_[HDOM_INFO_TEXT] = '<'.$tag . $space[0] . $name;
                $node->tag = 'text';
                $this->link_nodes($node, false);
                return true;
            }

            // handle mismatch '<'
            if ($this->doc[$this->pos-1]=='<') {
                $node->nodetype = HDOM_TYPE_TEXT;
                $node->tag = 'text';
                $node->attr = array();
                $node->_[HDOM_INFO_END] = 0;
                $node->_[HDOM_INFO_TEXT] = substr($this->doc, $begin_tag_pos, $this->pos-$begin_tag_pos-1);
                $this->pos -= 2;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $this->link_nodes($node, false);
                return true;
            }

            if ($name!=='/' && $name!=='') {
                $space[1] = $this->copy_skip($this->token_blank);
                $name = $this->restore_noise($name);
                if ($this->lowercase) {
                    $name = strtolower($name);
                }
                if ($this->char==='=') {
                    $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                    $this->parse_attr($node, $name, $space);
                } else {
                    // attributes without a value: nowrap, checked, selected...
                    $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;
                    $node->attr[$name] = true;
                    if ($this->char!='>') {
                        $this->char = $this->doc[--$this->pos]; // prev
                    }
                }
                $node->_[HDOM_INFO_SPACE][] = $space;
                $space = array($this->copy_skip($this->token_blank), '', '');
            } else {
                break;
            }
        } while ($this->char!=='>' && $this->char!=='/');

        $this->link_nodes($node, true);
        $node->_[HDOM_INFO_ENDSPACE] = $space[0];

        // check self closing
        if ($this->copy_until_char_escape('>')==='/') {
            $node->_[HDOM_INFO_ENDSPACE] .= '/';
            $node->_[HDOM_INFO_END] = 0;
        } else {
            // reset parent
            if (!isset($this->self_closing_tags[strtolower($node->tag)])) {
                $this->parent = $node;
            }
        }
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next

        // If it's a BR tag, we need to set its text to the default text.
        // This way, when we see it in plaintext, we can generate the formatting that the user wants.
        // Since a br tag never has sub nodes, this works well.
        if ($node->tag == "br") {
            $node->_[HDOM_INFO_INNER] = $this->default_br_text;
        }

        return true;
    }

    // parse attributes
    protected function parse_attr($node, $name, &$space)
    {
        // Per sourceforge: http://sourceforge.net/tracker/?func=detail&aid=3061408&group_id=218559&atid=1044037
        // If the attribute is already defined inside a tag, only pay attention to the first one as opposed to the last one.
        if (isset($node->attr[$name])) {
            return;
        }

        $space[2] = $this->copy_skip($this->token_blank);
        switch ($this->char) {
            case '"':
                $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('"'));
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                break;
            case '\'':
                $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_SINGLE;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('\''));
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                break;
            default:
                $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;
                $node->attr[$name] = $this->restore_noise($this->copy_until($this->token_attr));
        }
        // PaperG: Attributes should not have \r or \n in them; that counts as html whitespace.
        $node->attr[$name] = str_replace("\r", "", $node->attr[$name]);
        $node->attr[$name] = str_replace("\n", "", $node->attr[$name]);
        // PaperG: If this is a "class" selector, let's get rid of the preceding and trailing space, since some people leave it in the multi-class case.
        if ($name == "class") {
            $node->attr[$name] = trim($node->attr[$name]);
        }
    }

    // link node's parent
    protected function link_nodes(&$node, $is_child)
    {
        $node->parent = $this->parent;
        $this->parent->nodes[] = $node;
        if ($is_child) {
            $this->parent->children[] = $node;
        }
    }

    // as a text node
    protected function as_text_node($tag)
    {
        $node = new simple_html_dom_node($this);
        ++$this->cursor;
        $node->_[HDOM_INFO_TEXT] = '</' . $tag . '>';
        $this->link_nodes($node, false);
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
        return true;
    }

    protected function skip($chars)
    {
        $this->pos += strspn($this->doc, $chars, $this->pos);
        $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
    }

    protected function copy_skip($chars)
    {
        $pos = $this->pos;
        $len = strspn($this->doc, $chars, $pos);
        $this->pos += $len;
        $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
        if ($len===0) {
            return '';
        }
        return substr($this->doc, $pos, $len);
    }

    protected function copy_until($chars)
    {
        $pos = $this->pos;
        $len = strcspn($this->doc, $chars, $pos);
        $this->pos += $len;
        $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
        return substr($this->doc, $pos, $len);
    }

    protected function copy_until_char($char)
    {
        if ($this->char===null) {
            return '';
        }

        if (($pos = strpos($this->doc, $char, $this->pos))===false) {
            $ret = substr($this->doc, $this->pos, $this->size-$this->pos);
            $this->char = null;
            $this->pos = $this->size;
            return $ret;
        }

        if ($pos===$this->pos) {
            return '';
        }
        $pos_old = $this->pos;
        $this->char = $this->doc[$pos];
        $this->pos = $pos;
        return substr($this->doc, $pos_old, $pos-$pos_old);
    }

    protected function copy_until_char_escape($char)
    {
        if ($this->char===null) {
            return '';
        }

        $start = $this->pos;
        while (1) {
            if (($pos = strpos($this->doc, $char, $start))===false) {
                $ret = substr($this->doc, $this->pos, $this->size-$this->pos);
                $this->char = null;
                $this->pos = $this->size;
                return $ret;
            }

            if ($pos===$this->pos) {
                return '';
            }

            if ($this->doc[$pos-1]==='\\') {
                $start = $pos+1;
                continue;
            }

            $pos_old = $this->pos;
            $this->char = $this->doc[$pos];
            $this->pos = $pos;
            return substr($this->doc, $pos_old, $pos-$pos_old);
        }
    }

    // remove noise from html content
    // save the noise in the $this->noise array.
    protected function remove_noise($pattern, $remove_tag=false)
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $debug_object->debug_log_entry(1);
        }

        $count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);

        for ($i=$count-1; $i>-1; --$i) {
            $key = '___noise___'.sprintf('% 5d', count($this->noise)+1000);
            if (is_object($debug_object)) {
                $debug_object->debug_log(2, 'key is: ' . $key);
            }
            $idx = ($remove_tag) ? 0 : 1;
            $this->noise[$key] = $matches[$i][$idx][0];
            $this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));
        }

        // reset the length of content
        $this->size = strlen($this->doc);
        if ($this->size>0) {
            $this->char = $this->doc[0];
        }
    }

    // restore noise to html content
    public function restore_noise($text)
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $debug_object->debug_log_entry(1);
        }

        while (($pos=strpos($text, '___noise___'))!==false) {
            // Sometimes the markup is broken and the five key characters at pos+11..pos+15 are missing, which indicates a problem outside of this method.
            if (strlen($text) > $pos+15) {
                $key = '___noise___'.$text[$pos+11].$text[$pos+12].$text[$pos+13].$text[$pos+14].$text[$pos+15];
                if (is_object($debug_object)) {
                    $debug_object->debug_log(2, 'located key of: ' . $key);
                }

                if (isset($this->noise[$key])) {
                    $text = substr($text, 0, $pos).$this->noise[$key].substr($text, $pos+16);
                } else {
                    // do this to prevent an infinite loop.
                    $text = substr($text, 0, $pos).'UNDEFINED NOISE FOR KEY: '.$key . substr($text, $pos+16);
                }
            } else {
                // No valid key was given back to us, so we must get rid of the ___noise___ marker or the loop would never end.
                $text = substr($text, 0, $pos).'NO NUMERIC NOISE KEY' . substr($text, $pos+11);
            }
        }
        return $text;
    }

    // Sometimes we NEED one of the noise elements.
    public function search_noise($text)
    {
        global $debug_object;
        if (is_object($debug_object)) {
            $debug_object->debug_log_entry(1);
        }

        foreach ($this->noise as $noiseElement) {
            if (strpos($noiseElement, $text)!==false) {
                return $noiseElement;
            }
        }
    }

    public function __toString()
    {
        return $this->root->innertext();
    }

    public function __get($name)
    {
        switch ($name) {
            case 'outertext':
                return $this->root->innertext();
            case 'innertext':
                return $this->root->innertext();
            case 'plaintext':
                return $this->root->text();
            case 'charset':
                return $this->_charset;
            case 'target_charset':
                return $this->_target_charset;
        }
    }

    // camel naming conventions
    public function childNodes($idx=-1)
    {
        return $this->root->childNodes($idx);
    }
    public function firstChild()
    {
        return $this->root->first_child();
    }
    public function lastChild()
    {
        return $this->root->last_child();
    }
    public function createElement($name, $value=null)
    {
        // str_get_html() returns a simple_html_dom, which exposes firstChild() rather than first_child().
        return @str_get_html("<$name>$value</$name>")->firstChild();
    }
    public function createTextNode($value)
    {
        return @end(str_get_html($value)->nodes);
    }
    public function getElementById($id)
    {
        return $this->find("#$id", 0);
    }
    public function getElementsById($id, $idx=null)
    {
        return $this->find("#$id", $idx);
    }
    public function getElementByTagName($name)
    {
        return $this->find($name, 0);
    }
    public function getElementsByTagName($name, $idx=-1)
    {
        return $this->find($name, $idx);
    }
    public function loadFile()
    {
        // Forward the arguments individually; passing $args directly would hand load_file() a single array.
        $args = func_get_args();
        call_user_func_array(array($this, 'load_file'), $args);
    }
}