Completed
Pull Request — master (#19)
by Matthijs
03:25
created

CrawlerDiscoverer::getFilteredCrawler()

Size

Total Lines 1

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric  Value
c       1
b       0
f       0
dl      0
loc     1
nc      1
<?php
namespace VDB\Spider\Discoverer;

use VDB\Spider\Discoverer\DiscovererInterface;
use VDB\Spider\Discoverer\Discoverer;
use VDB\Spider\Resource;
use VDB\Uri\Exception\UriSyntaxException;
use VDB\Uri\Uri;
use VDB\Spider\Uri\DiscoveredUri;
use VDB\Uri\UriInterface;

/**
 * @author Matthijs van den Bos
 * @copyright 2013 Matthijs van den Bos
 */
abstract class CrawlerDiscoverer extends Discoverer implements DiscovererInterface
{
    /** @var string */
    protected $selector;

    /**
     * @param $selector
     */
    public function __construct($selector)
    {
        $this->selector = $selector;
    }

    abstract protected function getFilteredCrawler(Resource $resource);
Documentation introduced by
For interfaces and abstract methods it is generally a good practice to add a @return annotation even if it is just @return void or @return null, so that implementors know what to do in the overridden method.

For interface and abstract methods, it is impossible to infer the return type from the immediate code. In these cases, it is generally advisable to explicitly annotate these methods with a @return doc comment to communicate to implementors of these methods what they are expected to return.
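
As a rough illustration of this advice, the sketch below annotates an abstract method with @return and pairs it with one implementation that honours the annotation. It is not the project's code: AnnotatedDiscoverer, XPathLinkDiscoverer and getFilteredNodes() are hypothetical stand-ins, and the sketch takes a raw HTML string and uses PHP's bundled DOM extension instead of the Resource and crawler types from the listing, so that it runs on its own. What getFilteredCrawler() should actually return is not shown in the source and is for the author to decide.

```php
<?php
// Illustrative sketch only. AnnotatedDiscoverer and XPathLinkDiscoverer are
// hypothetical names, not classes from VDB\Spider.

abstract class AnnotatedDiscoverer
{
    /** @var string */
    protected $selector;

    /**
     * @param string $selector XPath expression used to select nodes
     */
    public function __construct($selector)
    {
        $this->selector = $selector;
    }

    /**
     * Selects the nodes that discover() iterates over.
     *
     * @param string $html Raw HTML (assumption: the reviewed method takes a Resource)
     * @return \DOMNodeList Matching nodes; each must expose getAttribute()
     */
    abstract protected function getFilteredNodes($html);

    /**
     * @param string $html
     * @return string[] The href value of every selected node
     */
    public function discover($html)
    {
        $hrefs = array();
        foreach ($this->getFilteredNodes($html) as $node) {
            $hrefs[] = $node->getAttribute('href');
        }
        return $hrefs;
    }
}

class XPathLinkDiscoverer extends AnnotatedDiscoverer
{
    protected function getFilteredNodes($html)
    {
        // Collect libxml parse warnings internally instead of emitting them.
        $previous = libxml_use_internal_errors(true);
        $document = new \DOMDocument();
        $document->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // DOMXPath::query() returns false for a malformed expression;
        // this sketch assumes the selector is valid.
        $xpath = new \DOMXPath($document);
        return $xpath->query($this->selector);
    }
}

$discoverer = new XPathLinkDiscoverer('//a[@href]');
$hrefs = $discoverer->discover(
    '<html><body><a href="/a">A</a><a href="/b">B</a><p>no link</p></body></html>'
);
// $hrefs is array('/a', '/b')
```

With the annotation in place, an implementor can see from the abstract declaration alone that discover() expects something it can iterate with foreach, and that each item must support getAttribute('href').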

    /**
     * @param Resource $resource
     * @return DiscoveredUri[]
     */
    public function discover(Resource $resource)
    {
        $crawler = $this->getFilteredCrawler($resource);

        $uris = array();
        foreach ($crawler as $node) {
            try {
                $uris[] = new DiscoveredUri(new Uri($node->getAttribute('href'), $resource->getUri()->toString()));
            } catch (UriSyntaxException $e) {
                // do nothing. We simply ignore invalid URI's
            }
        }
        return $uris;
    }
}