Completed: Pull Request #150 (master), created by Brent at 03:06

CrawlRequestFulfilled (rating A)

Complexity

Total Complexity    17

Size/Duplication

Total Lines         97
Duplicated Lines    32.99%

Coupling/Cohesion

Components          1
Dependencies        10

Importance

Changes             0

Metric   Value
wmc      17
c        0
b        0
f        0
lcom     1
cbo      10
dl       32
loc      97
rs       10

6 Methods

Rating   Name                   Duplication   Size   Complexity
A        __construct()          0             5      1
B        __invoke()             0             28     5
A        convertBodyToString()  0             8      1
A        handleCrawled()        0             10     2
A        mayIndex()             16            16     4
A        mayFollow()            16            16     4
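The per-method figures account for the class totals. Assuming wmc is the usual weighted-methods-per-class metric (the sum of each method's cyclomatic complexity), it equals the Complexity column summed; dl is the two fully duplicated 16-line methods, which gives the reported duplication ratio:

```latex
\begin{aligned}
\mathrm{wmc} &= 1 + 5 + 1 + 2 + 4 + 4 = 17 \\
\mathrm{dl}  &= 16 + 16 = 32 \\
\frac{\mathrm{dl}}{\mathrm{loc}} &= \frac{32}{97} \approx 0.3299 = 32.99\,\%
\end{aligned}
```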


Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

<?php

namespace Spatie\Crawler\Handlers;

use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\StreamInterface;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlSubdomains;
use Spatie\Crawler\CrawlUrl;
use Spatie\Crawler\LinkAdder;
use Spatie\Robots\RobotsHeaders;
use Spatie\Robots\RobotsMeta;

class CrawlRequestFulfilled
{
    /** @var \Spatie\Crawler\Crawler */
    protected $crawler;

    /** @var \Spatie\Crawler\LinkAdder */
    protected $linkAdder;

    public function __construct(Crawler $crawler)
    {
        $this->crawler = $crawler;

        $this->linkAdder = new LinkAdder($this->crawler);
    }

    public function __invoke(ResponseInterface $response, $index)
    {
        $crawlUrl = $this->crawler->getCrawlQueue()->getUrlById($index);

        $body = $this->convertBodyToString($response->getBody(), $this->crawler->getMaximumResponseSize());

        $robotsHeaders = RobotsHeaders::create($response->getHeaders());

        $robotsMeta = RobotsMeta::create($body);

        if (! $this->mayIndex($robotsHeaders, $robotsMeta)) {
            return;
        }

        $this->handleCrawled($response, $crawlUrl);

        if (! $this->crawler->getCrawlProfile() instanceof CrawlSubdomains) {
            if ($crawlUrl->url->getHost() !== $this->crawler->getBaseUrl()->getHost()) {
                return;
            }
        }

        if (! $this->mayFollow($robotsHeaders, $robotsMeta)) {
            return;
        }

        $this->linkAdder->addFromHtml($body, $crawlUrl->url);
    }

    protected function convertBodyToString(StreamInterface $bodyStream, $readMaximumBytes = 1024 * 1024 * 2): string
    {
        $bodyStream->rewind();

        $body = $bodyStream->read($readMaximumBytes);

        return $body;
    }

    protected function handleCrawled(ResponseInterface $response, CrawlUrl $crawlUrl)
    {
        foreach ($this->crawler->getCrawlObservers() as $crawlObserver) {
            $crawlObserver->crawled(
                $crawlUrl->url,
                $response,
                $crawlUrl->foundOnUrl
            );
        }
    }

    // Flagged by Scrutinizer: this method seems to be duplicated in the project.
    protected function mayIndex(RobotsHeaders $robotsHeaders, RobotsMeta $robotsMeta): bool
    {
        if (! $this->crawler->mustRespectRobots()) {
            return true;
        }

        if (! $robotsHeaders->mayIndex()) {
            return false;
        }

        if (! $robotsMeta->mayIndex()) {
            return false;
        }

        return true;
    }

    // Flagged by Scrutinizer: this method seems to be duplicated in the project.
    protected function mayFollow(RobotsHeaders $robotsHeaders, RobotsMeta $robotsMeta): bool
    {
        if (! $this->crawler->mustRespectRobots()) {
            return true;
        }

        if (! $robotsHeaders->mayFollow()) {
            return false;
        }

        if (! $robotsMeta->mayFollow()) {
            return false;
        }

        return true;
    }
}
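mayIndex() and mayFollow() differ only in which robots method they call, which is presumably why each is reported with all 16 of its lines duplicated. One way to remove the duplication is to extract the shared body into a single helper parameterised by the directive name. This is a standalone sketch, not the package's own fix: FakeRobotsSource, RobotsDirectiveCheck, robotsAllow() and the $mustRespectRobots flag are hypothetical names introduced so the example runs without the Spatie packages.

```php
<?php

// Hypothetical stand-in for Spatie\Robots\RobotsHeaders / RobotsMeta: only the
// mayIndex()/mayFollow() methods the handler calls, returning fixed answers.
final class FakeRobotsSource
{
    /** @var bool */
    private $index;

    /** @var bool */
    private $follow;

    public function __construct(bool $index, bool $follow)
    {
        $this->index = $index;
        $this->follow = $follow;
    }

    public function mayIndex(): bool
    {
        return $this->index;
    }

    public function mayFollow(): bool
    {
        return $this->follow;
    }
}

// Hypothetical class holding the two checks; $mustRespectRobots replaces the
// $this->crawler->mustRespectRobots() call so the sketch runs standalone.
final class RobotsDirectiveCheck
{
    /** @var bool */
    private $mustRespectRobots;

    public function __construct(bool $mustRespectRobots)
    {
        $this->mustRespectRobots = $mustRespectRobots;
    }

    public function mayIndex(FakeRobotsSource $robotsHeaders, FakeRobotsSource $robotsMeta): bool
    {
        return $this->robotsAllow('mayIndex', $robotsHeaders, $robotsMeta);
    }

    public function mayFollow(FakeRobotsSource $robotsHeaders, FakeRobotsSource $robotsMeta): bool
    {
        return $this->robotsAllow('mayFollow', $robotsHeaders, $robotsMeta);
    }

    // The body both original methods shared: allow everything when robots rules
    // are ignored, otherwise require the headers AND the meta tags to allow it.
    // && short-circuits, so the meta tags are only consulted if the headers allow.
    private function robotsAllow(string $directive, FakeRobotsSource $robotsHeaders, FakeRobotsSource $robotsMeta): bool
    {
        if (! $this->mustRespectRobots) {
            return true;
        }

        return $robotsHeaders->{$directive}() && $robotsMeta->{$directive}();
    }
}

$check = new RobotsDirectiveCheck(true);
$headers = new FakeRobotsSource(true, false); // e.g. "X-Robots-Tag: nofollow"
$meta = new FakeRobotsSource(true, true);

var_dump($check->mayIndex($headers, $meta));  // bool(true)
var_dump($check->mayFollow($headers, $meta)); // bool(false)
```

Dispatching on a method name keeps the class short but makes the calls harder for static analysers to check. An alternative that keeps explicit calls is to collapse each method to a single expression, such as `return ! $this->crawler->mustRespectRobots() || ($robotsHeaders->mayIndex() && $robotsMeta->mayIndex());`.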
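convertBodyToString() rewinds the body and performs a single read(). PSR-7's StreamInterface::read() only promises to return up to $length bytes, and rewind() throws a RuntimeException on a non-seekable stream, so one call can return a truncated body. A minimal sketch of a more defensive variant, assuming a chunked loop is acceptable here: BodyStream and ShortReadStream are hypothetical stand-ins (BodyStream mirrors the four PSR-7 methods used) so the example runs without any packages.

```php
<?php

// Hypothetical subset of Psr\Http\Message\StreamInterface, declared here so the
// sketch runs without the psr/http-message package. Method names mirror PSR-7.
interface BodyStream
{
    public function isSeekable(): bool;
    public function rewind(): void;
    public function eof(): bool;
    public function read(int $length): string;
}

// Test double: an in-memory stream whose read() returns at most $chunkLimit bytes
// per call, imitating a network stream that delivers short reads.
final class ShortReadStream implements BodyStream
{
    private $data;
    private $chunkLimit;
    private $position = 0;

    public function __construct(string $data, int $chunkLimit)
    {
        $this->data = $data;
        $this->chunkLimit = $chunkLimit;
    }

    public function isSeekable(): bool { return true; }

    public function rewind(): void { $this->position = 0; }

    public function eof(): bool { return $this->position >= strlen($this->data); }

    public function read(int $length): string
    {
        $chunk = substr($this->data, $this->position, min($length, $this->chunkLimit));
        $this->position += strlen($chunk);

        return $chunk;
    }
}

// Reads at most $readMaximumBytes, looping because a single read() may
// legitimately return fewer bytes than requested.
function convertBodyToString(BodyStream $bodyStream, int $readMaximumBytes = 1024 * 1024 * 2): string
{
    // PSR-7 rewind() throws on non-seekable streams, so only rewind when allowed.
    if ($bodyStream->isSeekable()) {
        $bodyStream->rewind();
    }

    $body = '';

    while (strlen($body) < $readMaximumBytes && ! $bodyStream->eof()) {
        $chunk = $bodyStream->read($readMaximumBytes - strlen($body));

        if ($chunk === '') {
            break; // no progress: stop instead of spinning on a stalled stream
        }

        $body .= $chunk;
    }

    return $body;
}

$stream = new ShortReadStream(str_repeat('a', 10), 3);
echo strlen(convertBodyToString($stream, 8)), PHP_EOL; // 8

$stream->rewind();
echo strlen($stream->read(8)), PHP_EOL; // 3: a single read() stops short
```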