Passed
Push — master ( 6700a9...62f4d8 )
by Dev
12:41
created

CrawlerUrl   A

Complexity

Total Complexity 20

Size/Duplication

Total Lines 133
Duplicated Lines 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
eloc 57
dl 0
loc 133
rs 10
c 1
b 0
f 0
wmc 20

9 Methods

Rating  Name                 Duplication  Size  Complexity
A       getLinks()           0            3     1
A       getHarvester()       0            22    5
A       isRedirection()      0            12    3
A       harvestBreadcrumb()  0            7     3
A       harvestLinks()       0            10    1
A       isNetworkError()     0            11    2
A       __construct()        0            6     1
A       defaultHarvesting()  0            16    1
A       harvest()            0            11    3
<?php

namespace PiedWeb\SeoPocketCrawler;

use PiedWeb\UrlHarvester\Harvest;
use PiedWeb\UrlHarvester\Indexable;
use PiedWeb\UrlHarvester\Link;

class CrawlerUrl
{
    /** @var Harvest */
    protected $harvest;
    /** @var Url */
    protected $url;
    /** @var CrawlerConfig */
    protected $config;

    /** @var array internal links from the current Url */
    protected $links = [];

    public function __construct(Url $url, CrawlerConfig $config)
    {
        $this->url = $url;
        $this->config = $config;

        $this->harvest();
    }

    protected function harvest()
    {
        if ($this->isNetworkError()) {
            return null;
        }

        if ($this->isRedirection()) {
            return null;
        }

        $this->defaultHarvesting();
    }

    /**
     * Override this method to extend or change what is harvested, for example by adding:
     * $this->harvestBreadcrumb();
     * $this->url->setKws(','.implode(',', array_keys($this->getHarvester()->getKws())).','); // Slow ~20%
     * $this->url->setRatioTextCode($this->getHarvester()->getRatioTxtCode()); // Slow ~30%
     * $this->url->setH1($this->getHarvester()->getUniqueTag('h1') ?? '');
     */
    protected function defaultHarvesting()
    {
        $this->url->setIndexable($this->getHarvester()->indexable()); // slow ~30%

        $this->url->setMimeType($this->getHarvester()->getResponse()->getMimeType());

Bug: It seems like $this->getHarvester()->g...sponse()->getMimeType() can also be of type null; however, parameter $mimeType of PiedWeb\SeoPocketCrawler\Url::setMimeType() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        $this->url->setMimeType(/** @scrutinizer ignore-type */ $this->getHarvester()->getResponse()->getMimeType());
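
As an alternative to suppressing the issue, here is a minimal sketch of the additional type check the message suggests. It assumes an empty string is an acceptable fallback when no MIME type is available. `ResponseStub` and `UrlStub` are hypothetical stand-ins used only to keep the example self-contained; they are not the PiedWeb classes.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a response whose MIME type may be missing.
final class ResponseStub
{
    private $mimeType;

    public function __construct(?string $mimeType)
    {
        $this->mimeType = $mimeType;
    }

    public function getMimeType(): ?string
    {
        return $this->mimeType;
    }
}

// Hypothetical stand-in for a Url whose setter only accepts strings.
final class UrlStub
{
    public $mimeType = '';

    public function setMimeType(string $mimeType): void
    {
        $this->mimeType = $mimeType;
    }
}

$url = new UrlStub();

// The null coalescing operator replaces a null MIME type with '' (an assumed fallback).
$url->setMimeType((new ResponseStub(null))->getMimeType() ?? '');
```

The same `?? ''` pattern is already used in the listing for `setTitle()`, so this keeps the setter calls consistent.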

        $this->harvestLinks();

        // Old way: $this->getHarvester()->getTextAnalysis()->getWordNumber();
        $this->url->setWordCount($this->getHarvester()->getWordCount());

        $this->url->setLoadTime($this->getHarvester()->getResponse()->getInfo('total_time'));

        $this->url->setSize($this->getHarvester()->getResponse()->getInfo('size_download'));

        $this->url->setTitle($this->getHarvester()->getUniqueTag('head title') ?? '');
    }

    protected function isNetworkError()
    {
        if (!$this->getHarvester() instanceof Harvest) {
            $this->url->setIndexable(Indexable::NOT_INDEXABLE_NETWORK_ERROR);

            return true;
        }

        $this->config->getRecorder()->cache($this->getHarvester(), $this->url);

        return false;
    }

    protected function isRedirection()
    {
        if ($redir = $this->getHarvester()->getRedirectionLink()) {
            if ($redir->isInternalLink()) { // add to $links so that counters and related data get updated
                $this->links[] = $redir;
            }
            $this->url->setIndexable(Indexable::NOT_INDEXABLE_3XX);

            return true;
        }

        return false;
    }

    protected function harvestLinks()
    {
        $this->config->getRecorder()->recordOutboundLink($this->url, $this->getHarvester()->getLinks()); // ~10%
        $this->url->links = count($this->getHarvester()->getLinks());
        $this->url->links_duplicate = $this->getHarvester()->getNbrDuplicateLinks();
        $this->url->links_internal = count($this->getHarvester()->getLinks(Link::LINK_INTERNAL));
        $this->url->links_self = count($this->getHarvester()->getLinks(Link::LINK_SELF));
        $this->url->links_sub = count($this->getHarvester()->getLinks(Link::LINK_SUB));
        $this->url->links_external = count($this->getHarvester()->getLinks(Link::LINK_EXTERNAL));
        $this->links = $this->getHarvester()->getLinks(Link::LINK_INTERNAL);
    }

    public function getHarvester()
    {
        if (null !== $this->harvest) {
            return $this->harvest;
        }

        $this->harvest = Harvest::fromUrl(

Documentation Bug: It seems like PiedWeb\UrlHarvester\Har...ig->getRequestCached()) can also be of type integer. However, the property $harvest is declared as type PiedWeb\UrlHarvester\Harvest. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $account_id that can either hold an Id object or false (if there is no account id yet). Your code now assigns that value to the id property of an instance of the Account class. This class holds a proper account, so the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }
}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
// Check the value being assigned, not the object that receives it.
if ($account_id instanceof Id) {
    $account->id = $account_id;
}

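
Applied to a lazily loaded property like $harvest, one possible shape of that type check is sketched below: the result is held in a local variable and only reaches the property after an instanceof test. This is an illustration under assumptions, not the library's API. `HarvestStub`, `fetchHarvest()`, `CrawlerUrlSketch` and the error code 6 are all hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a successful harvest.
final class HarvestStub
{
}

/**
 * Hypothetical factory: returns a HarvestStub on success, or an int error code on failure.
 *
 * @return HarvestStub|int
 */
function fetchHarvest(bool $succeed)
{
    return $succeed ? new HarvestStub() : 6;
}

final class CrawlerUrlSketch
{
    /** @var HarvestStub|null */
    private $harvest;

    public function load(bool $succeed): ?HarvestStub
    {
        $result = fetchHarvest($succeed);

        // Only a real harvest object reaches the property; an error code becomes null.
        $this->harvest = $result instanceof HarvestStub ? $result : null;

        return $this->harvest;
    }
}
```

Declaring the property as `HarvestStub|null` in the docblock also documents that callers must handle the failure case.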
            $this->config->getBase().$this->url->getUri(),
            $this->config->getUserAgent(),
            'en,en-US;q=0.5',
            $this->config->getRequestCached()
        );

        if (!$this->harvest instanceof Harvest) { // could be an int corresponding to curl error
            $this->harvest = null;
        }

        if (null !== $this->harvest && null !== $this->config->getRobotsTxtCached()) {
            $this->harvest->setRobotsTxt($this->config->getRobotsTxtCached());
        }

        return $this->harvest;
    }

    public function getLinks()
    {
        return $this->links;
    }

    protected function harvestBreadcrumb()
    {
        $breadcrumb = $this->getHarvester()->getBreadCrumb();
        if (is_array($breadcrumb)) {
            $this->url->setBreadcrumbLevel(count($breadcrumb));

Bug: The method setBreadcrumbLevel() does not exist on PiedWeb\SeoPocketCrawler\Url.

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

            $this->url->/** @scrutinizer ignore-call */
                        setBreadcrumbLevel(count($breadcrumb));

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

            $this->url->setBreadcrumbFirst(isset($breadcrumb[1]) ? $breadcrumb[1]->getCleanName() : '');

Bug: The method setBreadcrumbFirst() does not exist on PiedWeb\SeoPocketCrawler\Url. The same check applies, and the issue can be ignored the same way:

            $this->url->/** @scrutinizer ignore-call */
                        setBreadcrumbFirst(isset($breadcrumb[1]) ? $breadcrumb[1]->getCleanName() : '');

            $this->url->setBreadcrumbText($this->getHarvester()->getBreadCrumb('//'));

Bug: The method setBreadcrumbText() does not exist on PiedWeb\SeoPocketCrawler\Url. The same check applies, and the issue can be ignored the same way:

            $this->url->/** @scrutinizer ignore-call */
                        setBreadcrumbText($this->getHarvester()->getBreadCrumb('//'));

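
If the breadcrumb setters are meant to live on an extended Url class rather than on Url itself, the calls can be made resolvable instead of ignored. The sketch below shows that idea with hypothetical classes (`PlainUrl`, `BreadcrumbUrl`) and a hypothetical helper (`recordBreadcrumbLevel()`); whether the project actually extends Url this way is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a Url class that has no breadcrumb setters.
class PlainUrl
{
}

// Hypothetical subclass that declares the setter, so the call can be resolved statically.
class BreadcrumbUrl extends PlainUrl
{
    public $breadcrumbLevel = 0;

    public function setBreadcrumbLevel(int $level): void
    {
        $this->breadcrumbLevel = $level;
    }
}

/**
 * Stores the breadcrumb depth only when the given object supports it.
 */
function recordBreadcrumbLevel(PlainUrl $url, array $breadcrumb): bool
{
    if (!$url instanceof BreadcrumbUrl) {
        return false; // nothing to call on a plain Url
    }

    $url->setBreadcrumbLevel(count($breadcrumb));

    return true;
}

$url = new BreadcrumbUrl();
$stored = recordBreadcrumbLevel($url, ['Home', 'Blog', 'Post']);
```

The instanceof guard narrows the type before the call, which is what lets a static analyser find the method.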
        }
    }
}
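
The docblock on defaultHarvesting() describes extending the class to change what is harvested. A minimal sketch of that pattern follows, with hypothetical stand-ins (`PageRecord`, `BaseCrawlerUrl`, `H1CrawlerUrl`) in place of the PiedWeb classes and a plain array in place of a real harvest.

```php
<?php

declare(strict_types=1);

// Hypothetical record that collects harvested values.
final class PageRecord
{
    public $title = '';
    public $h1 = '';
}

// Hypothetical base class mirroring the constructor -> harvest() -> defaultHarvesting() flow.
class BaseCrawlerUrl
{
    protected $record;
    protected $tags;

    public function __construct(PageRecord $record, array $tags)
    {
        $this->record = $record;
        $this->tags = $tags;

        $this->harvest();
    }

    protected function harvest(): void
    {
        $this->defaultHarvesting();
    }

    protected function defaultHarvesting(): void
    {
        $this->record->title = $this->tags['title'] ?? '';
    }
}

// The subclass keeps the default harvesting and adds one extra field.
class H1CrawlerUrl extends BaseCrawlerUrl
{
    protected function defaultHarvesting(): void
    {
        parent::defaultHarvesting();
        $this->record->h1 = $this->tags['h1'] ?? '';
    }
}

$record = new PageRecord();
new H1CrawlerUrl($record, ['title' => 'Home', 'h1' => 'Welcome']);
```

Because the constructor triggers harvesting, the overridden method runs as soon as the subclass is instantiated, so any extra (and possibly slow) extraction is paid for on every crawled URL.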