Completed
Push — master ( 3c077b...77d3a9 )
by Gilles
06:36
created

AbstractNode   B

Complexity

Total Complexity 48

Size/Duplication

Total Lines 459
Duplicated Lines 0 %

Coupling/Cohesion

Components 2
Dependencies 4

Importance

Changes 12
Bugs 3 Features 3
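Metric   Value    Meaning
wmc      48       weighted methods per class (Total Complexity)
c        12       changes
b        3        bugs
f        3        features
lcom     2        lack of cohesion of methods (Components)
cbo      4        coupling between objects (Dependencies)
dl       0        duplicated lines
loc      459      lines of code (Total Lines)
rs       8.4864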

26 Methods

Rating   Name                 Duplication   Size   Complexity
A        __construct()        0             4      1
B        __get()              0             17     5
A        __destruct()         0             7      1
A        __toString()         0             4      1
A        id()                 0             4      1
A        getParent()          0             4      1
A        setParent()          0             22     3
A        propagateEncoding()  0             5      1
A        isAncestor()         0             8      2
A        getAncestor()        0             12     3
A        firstChild()         0             7      1
A        lastChild()          0             7      1
A        nextSibling()        0             8      2
A        previousSibling()    0             8      2
A        getTag()             0             4      1
A        getAttributes()      0             9      2
A        getAttribute()       0             9      2
A        setAttribute()       0             6      1
A        ancestorByTag()      0             15     3
A        find()               0             16     3
B        get_display_size()   0             39     6
B        getLength()          0             15     5
-        innerHtml()          0             1      ?
-        outerHtml()          0             1      ?
-        text()               0             1      ?
-        clear()              0             1      ?

How to fix   Complexity   

Complex Class

Complex classes like AbstractNode often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use AbstractNode and, based on these observations, apply Extract Interface as well.
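In AbstractNode, one candidate component is the image-sizing logic. get_display_size() and getLength() read only the width, height and style attributes and never touch $parent or $encode. get_display_size() is also the only snake_case method in the class, which is the kind of naming signal described above. The following is a minimal sketch of what an Extract Class step could look like, not the library's design. ImageDisplaySize and its methods are hypothetical names. The class takes attribute values as a plain name => value array, the shape AbstractNode::getAttributes() returns, so that it runs without the rest of the library.

```php
<?php

/**
 * Hypothetical class extracted from AbstractNode: it owns the image-sizing
 * logic, so AbstractNode would no longer need get_display_size() or getLength().
 *
 * AbstractNode::get_display_size() could then delegate to it (sketch):
 *     return ImageDisplaySize::resolve($this->tag->name(), $this->getAttributes());
 */
final class ImageDisplaySize
{
    /**
     * @param string $tagName    tag name, e.g. 'img'
     * @param array  $attributes attribute name => value
     * @return array|false ['height' => ..., 'width' => ...] (-1 when unknown), false if not an img
     */
    public static function resolve($tagName, array $attributes)
    {
        if ($tagName !== 'img') {
            return false;
        }

        // Explicit width/height attributes take precedence over the inline style.
        $width  = isset($attributes['width']) ? $attributes['width'] : -1;
        $height = isset($attributes['height']) ? $attributes['height'] : -1;

        if (isset($attributes['style'])) {
            $style  = self::parseStyle($attributes['style']);
            $width  = self::length($style, $width, 'width');
            $height = self::length($style, $height, 'height');
        }

        return ['height' => $height, 'width' => $width];
    }

    /** Splits "a: 1px; b: 2px" into ['a' => '1px', 'b' => '2px']. */
    private static function parseStyle($style)
    {
        $pattern = '/([\w-]+)\s*:\s*([^;]+)\s*;?/';
        if (preg_match_all($pattern, $style, $matches, PREG_SET_ORDER) === false) {
            return [];
        }

        $declarations = [];
        foreach ($matches as $match) {
            $declarations[$match[1]] = trim($match[2]);
        }

        return $declarations;
    }

    /** Returns a "<int>px" style value as int, but only if no length is known yet. */
    private static function length(array $style, $length, $key)
    {
        if ($length !== -1 || ! isset($style[$key])) {
            return $length;
        }

        $value = strtolower($style[$key]);
        if (substr($value, -2) !== 'px') {
            return $length;
        }

        // Compare with false: filter_var() returns int 0 for "0".
        $pixels = filter_var(substr($value, 0, -2), FILTER_VALIDATE_INT);

        return $pixels === false ? $length : $pixels;
    }
}
```

After this extraction, the sizing rules can be exercised with plain arrays, without building a DOM tree.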

<?php
namespace PHPHtmlParser\Dom;

use PHPHtmlParser\Selector;
use PHPHtmlParser\Exceptions\CircularException;
use PHPHtmlParser\Exceptions\ParentNotFoundException;
use stringEncode\Encode;

/**
 * Dom node object.
 *
 * @property string $outerhtml
 * @property string $innerhtml
 * @property string $text
 */
abstract class AbstractNode
{

    /**
     * Contains the tag name/type.
     *
     * @var \PHPHtmlParser\Dom\Tag
     */
    protected $tag;

    /**
     * Contains a list of attributes on this tag.
     *
     * @var array
     */
    protected $attr = [];

    /**
     * Contains the parent node, or null if this node has no parent.
     *
     * @var InnerNode|null
     */
    protected $parent = null;

    /**
     * The unique id of this node, generated by spl_object_hash().
     *
     * @var string
     */
    protected $id;

    /**
     * The encoding class used to encode strings.
     *
     * @var Encode|null
     */
    protected $encode;

    /**
     * Creates a unique spl hash for this node.
     */
    public function __construct()
    {
        $this->id = spl_object_hash($this);
    }

    /**
     * Magic get method for attributes and certain methods.
     * Attributes are checked first, so an attribute named outerhtml,
     * innerhtml or text takes precedence over the matching method.
     *
     * @param string $key
     * @return mixed
     */
    public function __get($key)
    {
        // check attribute first
        if ( ! is_null($this->getAttribute($key))) {
            return $this->getAttribute($key);
        }
        switch (strtolower($key)) {
            case 'outerhtml':
                return $this->outerHtml();
            case 'innerhtml':
                return $this->innerHtml();
            case 'text':
                return $this->text();
        }

        return null;
    }

    /**
     * Attempts to clear out any object references.
     */
    public function __destruct()
    {
        $this->tag      = null;
        $this->attr     = [];
        $this->parent   = null;
        $this->children = [];
Bug introduced by
The property children does not exist. Did you maybe forget to declare it?

In PHP it is possible to write to properties without declaring them. For example, the following is perfectly valid PHP code:

class MyClass { }

$x = new MyClass();
$x->foo = true;

Generally, it is good practice to explicitly declare properties to avoid accidental typos and provide IDE auto-completion:

class MyClass {
    public $foo;
}

$x = new MyClass();
$x->foo = true;
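Beyond static analysis, this matters at runtime. On any node class that does not declare $children, the assignment creates a dynamic property, and creating dynamic properties is deprecated as of PHP 8.2. One way to resolve this is a sketch that assumes only the sub-classes able to hold children (such as InnerNode) need the array. Those classes declare the property and clear it in their own destructor, then chain to the parent destructor. SketchNode, SketchInnerNode and SketchLeafNode are hypothetical, simplified stand-ins for AbstractNode, InnerNode and a leaf node.

```php
<?php

// Hypothetical, simplified stand-in for AbstractNode: it only resets the
// properties it declares itself.
abstract class SketchNode
{
    protected $tag;
    protected $attr = [];
    protected $parent = null;

    public function __destruct()
    {
        $this->tag    = null;
        $this->attr   = [];
        $this->parent = null;
    }
}

// Hypothetical, simplified stand-in for InnerNode: the class that owns the
// children declares $children and clears it before calling the parent.
class SketchInnerNode extends SketchNode
{
    /** @var SketchNode[] */
    protected $children = [];

    public function addChild(SketchNode $child)
    {
        $this->children[] = $child;

        return $this;
    }

    public function countChildren()
    {
        return count($this->children);
    }

    public function __destruct()
    {
        $this->children = [];
        parent::__destruct();
    }
}

// A leaf node (like a text node) has no $children property at all.
class SketchLeafNode extends SketchNode
{
}
```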
    }

    /**
     * Returns the outer HTML of this node (delegates to outerHtml()).
     *
     * @return string
     */
    public function __toString()
    {
        return $this->outerHtml();
    }

    /**
     * Returns the id of this object.
     *
     * @return string
     */
    public function id()
    {
        return $this->id;
    }

    /**
     * Returns the parent of this node, or null if it has none.
     *
     * @return InnerNode|null
     */
    public function getParent()
    {
        return $this->parent;
    }

    /**
     * Sets the parent node.
     *
     * @param InnerNode $parent
     * @return $this
     * @throws CircularException
     */
    public function setParent(InnerNode $parent)
    {
        // remove from old parent
        if ( ! is_null($this->parent)) {
            if ($this->parent->id() == $parent->id()) {
                // already the parent
                return $this;
            }

            $this->parent->removeChild($this->id);
        }

        $this->parent = $parent;

        // assign child to parent
        $this->parent->addChild($this);

        // clear any cache
        $this->clear();

        return $this;
    }

    /**
     * Sets the encoding class on this node and on its tag.
     *
     * @param Encode $encode
     * @return void
     */
    public function propagateEncoding(Encode $encode)
    {
        $this->encode = $encode;
        $this->tag->setEncoding($encode);
    }

    /**
     * Checks if the given node id is an ancestor of
     * the current node.
     *
     * @param string $id
     * @return bool
     */
    public function isAncestor($id)
    {
        if ( ! is_null($this->getAncestor($id))) {
            return true;
        }

        return false;
    }

    /**
     * Attempts to get an ancestor node by the given id.
     *
     * @param string $id
     * @return null|AbstractNode
     */
    public function getAncestor($id)
    {
        if ( ! is_null($this->parent)) {
            if ($this->parent->id() == $id) {
                return $this->parent;
            }

            return $this->parent->getAncestor($id);
        }

        return null;
    }

    /**
     * Shortcut to return the first child.
     *
     * @return AbstractNode
     * @uses $this->getChild()
     */
    public function firstChild()
    {
        reset($this->children);
        $key = key($this->children);

        return $this->getChild($key);
Bug introduced by
It seems like you code against a specific sub-type and not the parent class PHPHtmlParser\Dom\AbstractNode, as the method getChild() only exists in the following sub-classes of PHPHtmlParser\Dom\AbstractNode: PHPHtmlParser\Dom\HtmlNode, PHPHtmlParser\Dom\InnerNode, PHPHtmlParser\Dom\MockNode. Maybe you want to add an explicit instanceof check for one of these?

Let’s take a look at an example:

abstract class User
{
    /** @return string */
    abstract public function getPassword();
}

class MyUser extends User
{
    public function getPassword()
    {
        // return something
    }

    public function getDisplayName()
    {
        // return some name.
    }
}

class AuthSystem
{
    public function authenticate(User $user)
    {
        $this->logger->info(sprintf('Authenticating %s.', $user->getDisplayName()));
        // do something.
    }
}

In the above example, the authenticate() method works fine as long as you only pass instances of MyUser. However, if you also want to pass a different sub-class of User that does not have a getDisplayName() method, the code will break.

Available Fixes

  1. Change the type-hint for the parameter:

    class AuthSystem
    {
        public function authenticate(MyUser $user) { /* ... */ }
    }
    
  2. Add an additional type-check:

    class AuthSystem
    {
        public function authenticate(User $user)
        {
            if ($user instanceof MyUser) {
                $this->logger->info(/** ... */);
            }
    
            // or alternatively
            if ( ! $user instanceof MyUser) {
                throw new \LogicException(
                    '$user must be an instance of MyUser, '
                   .'other instances are not supported.'
                );
            }
    
        }
    }
    
    Note: PHP Analyzer uses reverse abstract interpretation to narrow down the types inside the if block in such a case.

  3. Add the method to the parent class:

    abstract class User
    {
        /** @return string */
        abstract public function getPassword();
    
        /** @return string */
        abstract public function getDisplayName();
    }
    
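Adding getChild() to the parent class (the last option above) is awkward here, because a leaf node has no children to return. An alternative, sketched below, moves firstChild() and lastChild() into the sub-class that owns $children and getChild(). SketchParentNode is a hypothetical stand-in for InnerNode. It keys children by id, an assumption suggested by setParent() calling removeChild($this->id). It uses plain strings as children to stay self-contained. For a node without children it returns null instead of calling getChild() with a null key. array_key_first() and array_key_last() require PHP 7.3 or later.

```php
<?php

// Hypothetical stand-in for a node that can hold children (like InnerNode).
// Children are keyed by id; plain strings stand in for child nodes.
class SketchParentNode
{
    /** @var array<string, string> */
    private $children = [];

    public function addChild($id, $child)
    {
        $this->children[$id] = $child;
    }

    public function getChild($id)
    {
        if ( ! array_key_exists($id, $this->children)) {
            throw new OutOfBoundsException("No child with id '$id'.");
        }

        return $this->children[$id];
    }

    // Defined next to $children and getChild(), so no instanceof check is needed.
    public function firstChild()
    {
        $key = array_key_first($this->children);

        return $key === null ? null : $this->getChild($key);
    }

    public function lastChild()
    {
        $key = array_key_last($this->children);

        return $key === null ? null : $this->getChild($key);
    }
}
```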
    }

    /**
     * Attempts to get the last child.
     *
     * @return AbstractNode
     */
    public function lastChild()
    {
        end($this->children);
        $key = key($this->children);

        return $this->getChild($key);
Bug introduced by
The method getChild() only exists in the sub-classes PHPHtmlParser\Dom\HtmlNode, PHPHtmlParser\Dom\InnerNode and PHPHtmlParser\Dom\MockNode, not in PHPHtmlParser\Dom\AbstractNode itself. This is the same issue as the one reported for firstChild() above, with the same available fixes.
    }

    /**
     * Attempts to get the next sibling.
     *
     * @return AbstractNode
     * @throws ParentNotFoundException
     */
    public function nextSibling()
    {
        if (is_null($this->parent)) {
            throw new ParentNotFoundException('Parent is not set for this node.');
        }

        return $this->parent->nextChild($this->id);
    }

    /**
     * Attempts to get the previous sibling.
     *
     * @return AbstractNode
     * @throws ParentNotFoundException
     */
    public function previousSibling()
    {
        if (is_null($this->parent)) {
            throw new ParentNotFoundException('Parent is not set for this node.');
        }

        return $this->parent->previousChild($this->id);
    }

    /**
     * Gets the tag object of this node.
     *
     * @return Tag
     */
    public function getTag()
    {
        return $this->tag;
    }

    /**
     * A wrapper method that calls the getAttributes method on the tag
     * of this node and returns only the values, keyed by attribute name.
     *
     * @return array
     */
    public function getAttributes()
    {
        $attributes = $this->tag->getAttributes();
        foreach ($attributes as $name => $info) {
            $attributes[$name] = $info['value'];
        }

        return $attributes;
    }

    /**
     * A wrapper method that calls the getAttribute method on the tag
     * of this node and returns the attribute's value, or null if the
     * attribute is not set.
     *
     * @param string $key
     * @return mixed
     */
    public function getAttribute($key)
    {
        $attribute = $this->tag->getAttribute($key);
        if ( ! is_null($attribute)) {
            $attribute = $attribute['value'];
        }

        return $attribute;
    }

    /**
     * A wrapper method that simply calls the setAttribute method
     * on the tag of this node.
     *
     * @param string $key
     * @param string $value
     * @return $this
     */
    public function setAttribute($key, $value)
    {
        $this->tag->setAttribute($key, $value);

        return $this;
    }

    /**
     * Function to locate a specific ancestor tag in the path to the root.
     *
     * @param  string $tag
     * @return AbstractNode
     * @throws ParentNotFoundException
     */
    public function ancestorByTag($tag)
    {
        // Start by including ourselves in the comparison.
        $node = $this;

        while ( ! is_null($node)) {
            if ($node->tag->name() == $tag) {
                return $node;
            }

            $node = $node->getParent();
        }

        throw new ParentNotFoundException('Could not find an ancestor with "'.$tag.'" tag');
    }

    /**
     * Finds elements by CSS selector.
     *
     * @param string $selector
     * @param int|null $nth
     * @return array|AbstractNode
     */
    public function find($selector, $nth = null)
    {
        $selector = new Selector($selector);
        $nodes    = $selector->find($this);

        if ( ! is_null($nth)) {
            // return the nth element, or null if it does not exist
            if (isset($nodes[$nth])) {
                return $nodes[$nth];
            }

            return null;
        }

        return $nodes;
Bug Best Practice introduced by
The return type of return $nodes; (PHPHtmlParser\Dom\Collection) is incompatible with the return type documented by PHPHtmlParser\Dom\AbstractNode::find of type array|PHPHtmlParser\Dom\AbstractNode.

If you return a value from a function or method, it should be a sub-type of the type declared by the parent type, for example an interface or an abstract method. This is more formally defined by the Liskov substitution principle, and guarantees that classes that depend on the parent type can use any instance of a child type interchangeably. This principle is also one of the SOLID principles for object-oriented design.

Let’s take a look at an example:

class Author {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

abstract class Post {
    public function getAuthor() {
        return 'Johannes';
    }
}

class BlogPost extends Post {
    public function getAuthor() {
        return new Author('Johannes');
    }
}

class ForumPost extends Post { /* ... */ }

function my_function(Post $post) {
    echo strtoupper($post->getAuthor());
}

Our function my_function expects a Post object and outputs the author of the post. The base class Post returns a simple string, and outputting a simple string works just fine. However, the child class BlogPost, which is a sub-type of Post, instead returns an object and therefore violates the SOLID principles. If a BlogPost were passed to my_function, PHP would not complain at the call site, but the strtoupper() call in its body would ultimately fail.

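For find(), the mismatch is between the docblock and what the method actually returns. It can return the selector result (a PHPHtmlParser\Dom\Collection, according to the report), a single node when $nth is given, or null when that index does not exist. The sketch below mirrors only the $nth handling, with an accurate docblock. SketchCollection and pickNth() are hypothetical stand-ins. SketchCollection implements ArrayAccess, which is what the isset($nodes[$nth]) lookup in find() relies on, plus Countable. Plain strings stand in for nodes.

```php
<?php

// Hypothetical stand-in for PHPHtmlParser\Dom\Collection.
final class SketchCollection implements ArrayAccess, Countable
{
    private $items;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->items[$offset];
    }

    public function offsetSet($offset, $value): void
    {
        throw new LogicException('SketchCollection is read-only.');
    }

    public function offsetUnset($offset): void
    {
        throw new LogicException('SketchCollection is read-only.');
    }

    public function count(): int
    {
        return count($this->items);
    }
}

/**
 * Mirrors the $nth handling in AbstractNode::find() with an accurate docblock.
 *
 * @param SketchCollection $nodes result of the selector
 * @param int|null         $nth   index to pick, or null for all matches
 * @return SketchCollection|string|null (string stands in for AbstractNode)
 */
function pickNth(SketchCollection $nodes, $nth = null)
{
    if (is_null($nth)) {
        return $nodes;
    }

    return isset($nodes[$nth]) ? $nodes[$nth] : null;
}
```

Under these assumptions, the corresponding change in AbstractNode would be to document find() as returning Collection|AbstractNode|null.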
    }

    /**
     * Function to try a few tricks to determine the displayed size of an img on the page.
     * NOTE: This will ONLY work on an IMG tag. Returns FALSE on all other tag types.
     *
     * Future enhancement:
     * Look in the tag to see if there is a class or id specified that has a height or width attribute to it.
     *
     * Far future enhancement:
     * Look at all the parent tags of this image to see if they specify a class or id that has an img selector that specifies a height or width.
     * Note that in this case, the class or id will need the img sub-selector for it to apply to the image.
     *
     * Ridiculously far future development:
     * If the class or id is specified in a SEPARATE css file that's not on the page, go get it and do what we were just doing for the ones on the page.
     *
     * @author John Schlick
     * @return array|false an array containing the 'height' and 'width' of the image on the page (each -1 if we can't figure it out), or false if this is not an img tag.
     */
    public function get_display_size()
    {
        $width  = -1;
        $height = -1;

        if ($this->tag->name() != 'img') {
            return false;
        }

        // See if there is a height or width attribute in the tag itself.
        // Use $this->getAttribute(), which unwraps the 'value' entry of the
        // array returned by $this->tag->getAttribute().
        if ( ! is_null($this->getAttribute('width'))) {
            $width = $this->getAttribute('width');
        }

        if ( ! is_null($this->getAttribute('height'))) {
            $height = $this->getAttribute('height');
        }

        // Now look for an inline style.
        $style = $this->getAttribute('style');
        if ( ! is_null($style)) {
            // Thanks to user 'gnarf' from stackoverflow for this regular expression.
            $attributes = [];
            preg_match_all("/([\w-]+)\s*:\s*([^;]+)\s*;?/", $style, $matches,
                PREG_SET_ORDER);
            foreach ($matches as $match) {
Bug introduced by
The expression $matches of type null|array<integer,array<integer,string>> is not guaranteed to be traversable. How about adding an additional type check?

There are different options of fixing this problem.

  1. If you want to be on the safe side, you can add an additional type-check:

    $collection = json_decode($data, true);
    if ( ! is_array($collection)) {
        throw new \RuntimeException('$collection must be an array.');
    }
    
    foreach ($collection as $item) { /** ... */ }
    
  2. If you are sure that the expression is traversable, you might want to add a doc comment cast to improve IDE auto-completion and static analysis:

    /** @var array $collection */
    $collection = json_decode($data, true);
    
    foreach ($collection as $item) { /** .. */ }
    
  3. Mark the issue as a false positive: hover over the remove button in the top-right corner of this issue for more options.

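Following the first option, the inline-style parsing in get_display_size() can be isolated into a helper that checks the preg_match_all() result before iterating. parseInlineStyle() is a hypothetical name, not part of the library. It keeps the same regular expression and returns an empty array if preg_match_all() fails, since that function returns false on error. It also trims each value, which the original loop does not do, because the greedy [^;]+ group can capture trailing spaces.

```php
<?php

/**
 * Hypothetical helper: turns "width: 100px; height: 50px" into
 * ['width' => '100px', 'height' => '50px'].
 *
 * @param string $style the raw value of a style attribute
 * @return array<string, string> property name => value
 */
function parseInlineStyle(string $style): array
{
    $matches = [];
    // Same pattern as get_display_size(); preg_match_all() returns false on failure.
    $count = preg_match_all('/([\w-]+)\s*:\s*([^;]+)\s*;?/', $style, $matches, PREG_SET_ORDER);
    if ($count === false || ! is_array($matches)) {
        return [];
    }

    $declarations = [];
    foreach ($matches as $match) {
        // $match[1] is the property name, $match[2] its (untrimmed) value.
        $declarations[$match[1]] = trim($match[2]);
    }

    return $declarations;
}
```

With such a helper, the loop in get_display_size() would reduce to a single call on the style attribute's value.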
                $attributes[$match[1]] = $match[2];
            }

            $width  = $this->getLength($attributes, $width, 'width');
            $height = $this->getLength($attributes, $height, 'height');
        }

        $result = [
            'height' => $height,
            'width'  => $width,
        ];

        return $result;
    }

    /**
     * If the style attributes contain a pixel length for $key and no length
     * has been found yet ($length is -1), use it.
     *
     * @param array $attributes
     * @param int|string $length
     * @param string $key
     * @return int|string
     */
    protected function getLength(array $attributes, $length, $key)
    {
        if (isset($attributes[$key]) && $length == -1) {
            $value = trim($attributes[$key]);
            // check that the last two characters are px (pixels)
            if (strtolower(substr($value, -2)) == 'px') {
                // Now make sure that it's an integer and not something invalid.
                // filter_var() returns false on failure and int 0 for "0",
                // so compare against false explicitly.
                $proposed_length = filter_var(substr($value, 0, -2), FILTER_VALIDATE_INT);
                if ($proposed_length !== false) {
                    $length = $proposed_length;
                }
            }
        }

        return $length;
    }

    /**
     * Gets the inner html of this node.
     *
     * @return string
     */
    abstract public function innerHtml();

    /**
     * Gets the html of this node, including its own
     * tag.
     *
     * @return string
     */
    abstract public function outerHtml();

    /**
     * Gets the text of this node (if there is any text).
     *
     * @return string
     */
    abstract public function text();

    /**
     * Call this when something in the node tree has changed, for example
     * when a child has been added or the parent has been changed.
     *
     * @return void
     */
    abstract protected function clear();
}
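The class docblock's @property tags and __get() work together. $node->outerhtml, $node->innerhtml and $node->text dispatch to the abstract methods, but only after an attribute of the same name has been ruled out. The following simplified sketch reproduces that lookup order so it can be run in isolation. MagicNodeSketch and ParagraphSketch are hypothetical classes, and the concrete outerHtml() ignores attributes to stay short.

```php
<?php

// Hypothetical, simplified mirror of AbstractNode's __get() lookup order.
abstract class MagicNodeSketch
{
    /** @var array<string, string> */
    protected $attributes = [];

    public function getAttribute($key)
    {
        return isset($this->attributes[$key]) ? $this->attributes[$key] : null;
    }

    public function __get($key)
    {
        // Attributes are checked first, as in AbstractNode::__get().
        if ( ! is_null($this->getAttribute($key))) {
            return $this->getAttribute($key);
        }
        switch (strtolower($key)) {
            case 'outerhtml':
                return $this->outerHtml();
            case 'innerhtml':
                return $this->innerHtml();
            case 'text':
                return $this->text();
        }

        return null;
    }

    abstract public function innerHtml();
    abstract public function outerHtml();
    abstract public function text();
}

// Hypothetical concrete node: a <p> element with a single text child.
class ParagraphSketch extends MagicNodeSketch
{
    private $content;

    public function __construct($content, array $attributes = [])
    {
        $this->content    = $content;
        $this->attributes = $attributes;
    }

    public function innerHtml()
    {
        return htmlspecialchars($this->content);
    }

    public function outerHtml()
    {
        return '<p>' . $this->innerHtml() . '</p>';
    }

    public function text()
    {
        return $this->content;
    }
}
```

In this sketch, new ParagraphSketch('hello', ['text' => 'from attribute']) yields 'from attribute' for ->text, because __get() consults attributes first. Calling text() directly still returns 'hello', so explicit method calls are the safer choice when attribute names may collide.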