Passed
Pull Request — master (#2)
by Arthur
02:14
created

QueryRegexFinder::getName()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 3
Code Lines 1

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 1
eloc 1
nc 1
nop 0
dl 0
loc 3
rs 10
c 0
b 0
f 0
<?php

namespace WebThumbnailer\Finder;

use WebThumbnailer\Application\ConfigManager;
use WebThumbnailer\Application\WebAccess\WebAccess;
use WebThumbnailer\Application\WebAccess\WebAccessCUrl;
use WebThumbnailer\Application\WebAccess\WebAccessFactory;
use WebThumbnailer\Exception\BadRulesException;
use WebThumbnailer\Utils\FinderUtils;

/**
 * Class QueryRegexFinder
 *
 * Generic Finder using regex rules on remote web content.
 * It uses regex rules to resolve a thumbnail in a web page.
 *
 * Mandatory rules:
 *   - image_regex
 *   - thumbnail_url
 *
 * Example:
 *   1. `http://domain.tld/page` content will be downloaded.
 *   2. `image_regex` will be applied to the content.
 *   3. Matches will be used to generate `thumbnail_url`.
 *
 * @package WebThumbnailer\Finder
 */
class QueryRegexFinder extends FinderCommon
{
    /**
     * @var WebAccess instance.
     */
    protected $webAccess;

    /**
     * @var string thumbnail_url rule.
     */
    protected $thumbnailUrlFormat;

    /**
     * @var string Regex (image_regex rule) to apply to the content downloaded from the provided URL.
     */
    protected $urlRegex;

    /**
     * @inheritdoc
     *
     * @throws BadRulesException
     */
    public function __construct($domain, $url, $rules, $options)
    {
        $this->webAccess = WebAccessFactory::getWebAccess($url);
        $this->url = $url;
        $this->domains = $domain;
        $this->loadRules($rules);
        $this->finderOptions = $options;
    }

    /**
     * This finder downloads the target URL page and applies the regex given in rules to its content
     * to extract the thumbnail image.
     * The thumb URL must include ${number} placeholders, which are replaced with the regex matches.
     * URL options, if any, are replaced as well.
     *
     * @inheritdoc
     *
     * @throws BadRulesException
     */
    public function find()
    {
        $thumbnail = $content = null;
        $callback = $this->webAccess instanceof WebAccessCUrl
            ? $this->getCurlCallback($content, $thumbnail)
            : null;
        list($headers, $content) = $this->webAccess->getContent(
            $this->url,
            ConfigManager::get('settings.default.timeout', 30),
Bug: It seems like WebThumbnailer\Applicati...s.default.timeout', 30) can also be of type string; however, parameter $timeout of WebThumbnailer\Applicati...WebAccess::getContent() only seems to accept integer. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            /** @scrutinizer ignore-type */ ConfigManager::get('settings.default.timeout', 30),
            ConfigManager::get('settings.default.max_img_dl', 16777216),
Bug: It seems like WebThumbnailer\Applicati....max_img_dl', 16777216) can also be of type string; however, parameter $maxBytes of WebThumbnailer\Applicati...WebAccess::getContent() only seems to accept integer. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            /** @scrutinizer ignore-type */ ConfigManager::get('settings.default.max_img_dl', 16777216),
            $callback,
            $content
        );
        if (empty($content)
            || empty($headers)
            || (empty($thumbnail) && strpos($headers[0], '200') === false)
        ) {
            return false;
        }

        // With curl, the thumb is extracted during the download
        if ($this->webAccess instanceof WebAccessCUrl && ! empty($thumbnail)) {
            return $thumbnail;
        }

        return $this->extractThumbContent($content);
    }

    /**
     * Get a callback for curl write function.
     *
     * @param string $content   A variable reference in which the downloaded content should be stored.
     * @param string $thumbnail A variable reference in which extracted thumb URL should be stored.
     *
     * @return \Closure CURLOPT_WRITEFUNCTION callback
     */
    protected function getCurlCallback(&$content, &$thumbnail)
    {
        $url = $this->url;
        /**
         * cURL callback function for CURLOPT_WRITEFUNCTION (called during the download).
         *
         * While downloading the remote page, we check that the HTTP code is 200 and the content type is 'text/html'.
         * Then we extract the thumbnail URL and stop the download as soon as it is found.
         *
         * Note that when using CURLOPT_WRITEFUNCTION, we have to manually handle the content retrieved,
         * hence the $content reference variable.
         *
         * @param resource $ch   cURL resource
         * @param string   $data chunk of data being downloaded
         *
         * @return int|bool length of $data or false if we need to stop the download
         */
        return function(&$ch, $data) use ($url, &$content, &$thumbnail) {
Unused Code: The import $url is not used and could be removed.

This check looks for imports that have been defined but are not used in the scope.
            $content .= $data;
            $responseCode = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

            if (!empty($responseCode) && $responseCode != 200) {
                return false;
            }
            $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
            if (!empty($contentType) && strpos($contentType, 'text/html') === false) {
                return false;
            }
            if (empty($thumbnail)) {
                $thumbnail = $this->extractThumbContent($data);
            }
            // We got everything we want, stop the download.
            if (!empty($responseCode) && !empty($contentType) && !empty($thumbnail)) {
                return false;
            }

            return strlen($data);
        };
    }

    /**
     * Extract the thumbnail URL from the given content using the loaded rules.
     *
     * @param string $content Downloaded page content (or a chunk of it).
     *
     * @return bool|mixed|string Thumbnail URL, or false if the regex does not match.
     *
     * @throws BadRulesException
     */
    public function extractThumbContent($content)
    {
        $thumbnailUrl = $this->thumbnailUrlFormat;
        if (preg_match($this->urlRegex, $content, $matches) != false) {
Bug Best Practice: It seems like you are loosely comparing preg_match($this->urlRegex, $content, $matches) of type integer to the boolean false. If you are specifically checking for non-zero, consider using something more explicit like > 0 or !== 0 instead.
            for ($i = 1; $i < count($matches); $i++) {
Performance Best Practice: It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally good practice to compute it beforehand:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}
                $thumbnailUrl = str_replace('${'. $i . '}', $matches[$i], $thumbnailUrl);
            }

            // Match only options (not ${number})
            if (preg_match_all('/\${((?!\d)\w+?)}/', $thumbnailUrl, $optionsMatch, PREG_PATTERN_ORDER)) {
                foreach ($optionsMatch[1] as $value) {
                    $thumbnailUrl = $this->replaceOption($thumbnailUrl, $value);
                }
            }
            return $thumbnailUrl;
        }
        return false;
    }

    /**
     * @inheritdoc
     */
    public function checkRules($rules)
    {
        if (! FinderUtils::checkMandatoryRules($rules, [
            'image_regex',
            'thumbnail_url'
        ])) {
            throw new BadRulesException();
        }
    }

    /**
     * @inheritdoc
     *
     * @throws BadRulesException
     */
    public function loadRules($rules)
    {
        $this->checkRules($rules);
        $this->urlRegex = FinderUtils::buildRegex($rules['image_regex'], 'im');
        $this->thumbnailUrlFormat = $rules['thumbnail_url'];
    }

    /**
     * @inheritdoc
     */
    public function getName()
    {
        return 'Query Regex';
    }
}
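To illustrate the substitution performed by extractThumbContent(), here is a minimal standalone sketch. It is not part of the reviewed class: the function name buildThumbnailUrl(), the $options array, the sample HTML and the example.tld URLs are all hypothetical. The body of replaceOption() is not shown in this listing, so resolving named placeholders from a plain array is an assumption made only for the example. The sketch also uses the strict comparison and the precomputed count() suggested by the inspections.

```php
<?php

/**
 * Hypothetical helper mirroring the substitution steps of extractThumbContent().
 *
 * @param string $regex   Delimited regex with capture groups (the image_regex rule).
 * @param string $format  URL template with ${1}, ${2}, ... and ${name} placeholders.
 * @param string $content Page content to search.
 * @param array  $options Assumed source of values for named placeholders.
 *
 * @return string|bool Resolved URL, or false if the regex does not match.
 */
function buildThumbnailUrl(string $regex, string $format, string $content, array $options = [])
{
    // preg_match() returns 1 on match, 0 on no match, false on error.
    if (preg_match($regex, $content, $matches) !== 1) {
        return false;
    }

    // Replace numbered placeholders with capture groups; count() is computed once.
    for ($i = 1, $count = count($matches); $i < $count; $i++) {
        $format = str_replace('${' . $i . '}', $matches[$i], $format);
    }

    // Replace named placeholders (not ${number}); unknown names become empty strings.
    return preg_replace_callback(
        '/\$\{((?!\d)\w+?)\}/',
        function (array $m) use ($options) {
            return $options[$m[1]] ?? '';
        },
        $format
    );
}

$html = '<meta property="og:image" content="https://img.example.tld/abc123.jpg">';
$url = buildThumbnailUrl(
    '/og:image" content="https:\/\/img\.example\.tld\/(\w+)\.jpg"/im',
    'https://img.example.tld/${1}_${size}.jpg',
    $html,
    ['size' => 'medium']
);
echo $url, PHP_EOL; // https://img.example.tld/abc123_medium.jpg
```

Keeping the numbered and named substitutions as two separate passes follows the order used in the class: capture groups are filled in first, so the second pattern only ever sees option names.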