Completed
Pull Request — master (#157)
by
unknown
01:37
created

SitemapGenerator::writeToFile()   B

Complexity

Conditions 2
Paths 2

Size

Total Lines 24
Code Lines 13

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 24
rs 8.9713
c 0
b 0
f 0
cc 2
eloc 13
nc 2
nop 1
<?php

namespace Spatie\Sitemap;

use GuzzleHttp\Psr7\Uri;
use Illuminate\Support\Collection;
use Spatie\Crawler\Crawler;
use Spatie\Sitemap\Tags\Url;
use Spatie\Crawler\CrawlProfile;
use Psr\Http\Message\UriInterface;
use Spatie\Sitemap\Crawler\Profile;
use Spatie\Sitemap\Crawler\Observer;
use Psr\Http\Message\ResponseInterface;

class SitemapGenerator
{
    /** @var \Illuminate\Support\Collection */
    protected $sitemaps;

    /** @var \GuzzleHttp\Psr7\Uri */
    protected $urlToBeCrawled = '';

    /** @var \Spatie\Crawler\Crawler */
    protected $crawler;

    /** @var callable */
    protected $shouldCrawl;

    /** @var callable */
    protected $hasCrawled;

    /** @var int */
    protected $concurrency = 10;

    /** @var bool $chunk */
    protected $chunk = false;

    /** @var int|null */
    protected $maximumCrawlCount = null;

    /**
     * @param string $urlToBeCrawled
     *
     * @return static
     */
    public static function create(string $urlToBeCrawled)
    {
        return app(static::class)->setUrl($urlToBeCrawled);
    }

    public function __construct(Crawler $crawler)
    {
        $this->crawler = $crawler;

        $this->sitemaps = new Collection([new Sitemap]);

        $this->hasCrawled = function (Url $url, ResponseInterface $response = null) {
Unused Code
The parameter $response is not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
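
As a minimal sketch of why removing the parameter would be safe here: PHP does not raise an error when a user-defined closure is called with more arguments than it declares, so the observer could keep passing the response. The closure below is a hypothetical stand-in that takes a plain string instead of a `Url` tag so that it runs on its own.

```php
<?php

// Hypothetical stand-in for the default hasCrawled callback, with the
// unused $response parameter removed.
$hasCrawled = function (string $url) {
    return $url;
};

// The caller still passes two arguments; PHP ignores the surplus one for
// user-defined functions and closures.
$result = $hasCrawled('https://example.com/', null);
```

Callbacks supplied through `hasCrawled()` can still declare the second parameter if they need the response.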

            return $url;
        };
    }

    public function setConcurrency(int $concurrency)
    {
        $this->concurrency = $concurrency;
    }

    public function setMaximumCrawlCount(int $maximumCrawlCount)
    {
        $this->maximumCrawlCount = $maximumCrawlCount;
    }

    /**
     * Enable chunk
     *
     * @param int $chunk
     * @return self
     */
    public function setChunck(int $chunk = 50000)
    {
        $this->chunk = $chunk;
Documentation Bug
The property $chunk was declared of type boolean, but $chunk is of type integer. Maybe add a type cast?

This check looks for assignments to scalar types that may be of the wrong type.

To ensure the code behaves as expected, it may be a good idea to add an explicit type cast.

$answer = 42;

$correct = false;

$correct = (bool) $answer;
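
For this particular property a `(bool)` cast would probably not be the right fix, because the observer later compares the tag count against `$this->chunk`, so the integer value has to survive. One possible alternative, sketched below under that assumption, is to document the property as `int|false`. The class and method names are hypothetical and reproduce only the chunk handling.

```php
<?php

// Hypothetical reduced class; only the chunk handling is reproduced.
class ChunkSettings
{
    /** @var int|false Maximum URLs per sitemap, or false when chunking is off. */
    protected $chunk = false;

    public function setChunk(int $chunk = 50000): self
    {
        $this->chunk = $chunk;

        return $this;
    }

    /** @return int|false */
    public function getChunk()
    {
        return $this->chunk;
    }
}

$settings = (new ChunkSettings)->setChunk(1000);
```

With the widened annotation the assignment matches the declared type and the chunk size stays available for the later comparison.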

        return $this;
    }

    public function setUrl(string $urlToBeCrawled)
    {
        $this->urlToBeCrawled = new Uri($urlToBeCrawled);

        if ($this->urlToBeCrawled->getPath() === '') {
            $this->urlToBeCrawled = $this->urlToBeCrawled->withPath('/');
        }

        return $this;
    }

    public function shouldCrawl(callable $shouldCrawl)
    {
        $this->shouldCrawl = $shouldCrawl;

        return $this;
    }

    public function hasCrawled(callable $hasCrawled)
    {
        $this->hasCrawled = $hasCrawled;

        return $this;
    }

    public function getSitemap(): Sitemap
    {
        if (config('sitemap.execute_javascript')) {
            $this->crawler->executeJavaScript(config('sitemap.chrome_binary_path'));
Unused Code
The call to Crawler::executeJavaScript() has too many arguments starting with config('sitemap.chrome_binary_path').

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress.

In this case you can add the @ignore PhpDoc annotation to the duplicate definition and it will be ignored.
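
To see what such a call does at runtime, the sketch below uses a hypothetical `FakeCrawler` whose `executeJavaScript()` declares no parameters. The signature in the installed `Spatie\Crawler\Crawler` version is an assumption here and should be checked. If it matches, the configured binary path would never reach the crawler.

```php
<?php

// Hypothetical stand-in for a crawler whose executeJavaScript() takes no
// parameters; the real Spatie\Crawler\Crawler signature is not shown here.
class FakeCrawler
{
    public $javaScriptEnabled = false;

    public function executeJavaScript()
    {
        $this->javaScriptEnabled = true;

        return $this;
    }
}

$crawler = new FakeCrawler;

// Number of parameters the method declares.
$declared = (new ReflectionMethod($crawler, 'executeJavaScript'))->getNumberOfParameters();

// PHP does not raise an error for surplus arguments to a user-defined
// method; the extra value is silently dropped.
$crawler->executeJavaScript('/usr/bin/chrome');
```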

        }

        if (! is_null($this->maximumCrawlCount)) {
            $this->crawler->setMaximumCrawlCount($this->maximumCrawlCount);
        }

        $this->crawler
            ->setCrawlProfile($this->getCrawlProfile())
            ->setCrawlObserver($this->getCrawlObserver())
            ->setConcurrency($this->concurrency)
            ->startCrawling($this->urlToBeCrawled);

        return $this->sitemaps->first();
    }

    /**
     * @param string $path
     *
     * @return $this
     */
    public function writeToFile(string $path)
    {
        $sitemap = $this->getSitemap();

        if ($this->chunk) {
            // Call the sitemap generation and process each created sitemap
            $index = SitemapIndex::create();
            $format = preg_replace('/\.xml/', '_%d.xml', $path);
            $this->sitemaps->each(function (Sitemap $sitemap, int $key) use ($index, $format) {
                $path = sprintf($format, $key);

                $sitemap->writeToFile(sprintf($format, $key));
                $index->add(last(explode('public', $path)));
            });

            $index->writeToFile($path);
        }

        else {
            $sitemap->writeToFile($path);
        }

        return $this;
    }

    protected function getCrawlProfile(): CrawlProfile
    {
        $shouldCrawl = function (UriInterface $url) {
            if ($url->getHost() !== $this->urlToBeCrawled->getHost()) {
                return false;
            }

            if (! is_callable($this->shouldCrawl)) {
                return true;
            }

            return ($this->shouldCrawl)($url);
        };

        $profileClass = config('sitemap.crawl_profile', Profile::class);
        $profile = new $profileClass($this->urlToBeCrawled);

        if (method_exists($profile, 'shouldCrawlCallback')) {
            $profile->shouldCrawlCallback($shouldCrawl);
        }

        return $profile;
    }

    protected function getCrawlObserver(): Observer
    {
        $performAfterUrlHasBeenCrawled = function (UriInterface $crawlerUrl, ResponseInterface $response = null) {
            $sitemapUrl = ($this->hasCrawled)(Url::create((string) $crawlerUrl), $response);

            if ($this->chunk and count($this->sitemaps->first()->getTags()) >= $this->chunk) {
Comprehensibility Best Practice
Using logical operators such as and instead of && is generally not recommended.

PHP has two types of connecting operators (logical operators, and boolean operators):

                Logical Operators   Boolean Operators
AND - meaning   and                 &&
OR - meaning    or                  ||

The difference between these is their precedence, that is, the order in which they are evaluated relative to other operators such as assignment. In most cases, you would want to use a boolean operator like && or ||.

Let’s take a look at a few examples:

// Logical operators have lower precedence:
$f = false or true;

// is executed like this:
($f = false) or true;


// Boolean operators have higher precedence:
$f = false || true;

// is executed like this:
$f = (false || true);

Logical Operators are used for Control-Flow

One case where you explicitly want to use logical operators is for control-flow such as this:

$x === 5
    or die('$x must be 5.');

// Instead of
if ($x !== 5) {
    die('$x must be 5.');
}

Since die introduces problems of its own (for example, it makes code hard to test and prevents any more sophisticated error handling), you probably do not want to use this in real-world code. Before PHP 8.0, logical operators also could not be combined with throw:

// Before PHP 8.0, the following is a parse error.
$x === 5
    or throw new RuntimeException('$x must be 5.');

These limitations lead to logical operators rarely being of use in current PHP code.
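
Applied to the flagged condition, the following sketch suggests that swapping `and` for `&&` would not change behaviour, because the comparison `>=` binds tighter than either operator and no assignment is involved. The helper names and sample values are hypothetical stand-ins for `$this->chunk` and the current tag count.

```php
<?php

// Hypothetical helpers mirroring the flagged condition with each operator.
function exceedsChunkWithAnd($chunk, int $tagCount): bool
{
    return (bool) ($chunk and $tagCount >= $chunk);
}

function exceedsChunkWithAmpersands($chunk, int $tagCount): bool
{
    return (bool) ($chunk && $tagCount >= $chunk);
}

// Sample inputs: chunking disabled, below the limit, at the limit, above it.
$cases = [[false, 5], [3, 2], [3, 3], [3, 5]];

$results = [];
foreach ($cases as [$chunk, $tagCount]) {
    $results[] = [
        exceedsChunkWithAnd($chunk, $tagCount),
        exceedsChunkWithAmpersands($chunk, $tagCount),
    ];
}
```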

                $this->sitemaps->prepend(new Sitemap);
            }

            if ($sitemapUrl) {
                $this->sitemaps->first()->add($sitemapUrl);
            }
        };

        return new Observer($performAfterUrlHasBeenCrawled);
    }
}
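
The file naming used by the chunked branch of writeToFile() can be traced in isolation. This is a minimal sketch in plain PHP: the target path is hypothetical, and `end()` stands in for Laravel's `last()` helper so that the snippet runs without the framework.

```php
<?php

// Hypothetical target path; any path containing "public" and ".xml" works.
$path = '/var/www/public/sitemap.xml';

// Same pattern as writeToFile(): turn "sitemap.xml" into "sitemap_%d.xml".
$format = preg_replace('/\.xml/', '_%d.xml', $path);

$chunkPaths = [];
$indexEntries = [];

// Two sitemaps are assumed here; the keys play the role of the collection keys.
foreach ([0, 1] as $key) {
    $chunkPath = sprintf($format, $key);
    $chunkPaths[] = $chunkPath;

    // Plain-PHP equivalent of last(explode('public', $chunkPath)).
    $parts = explode('public', $chunkPath);
    $indexEntries[] = end($parts);
}
```

Because the pattern is not anchored, every `.xml` occurrence in the path is rewritten, and the index entries depend on the path containing the literal string `public`.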