Passed
Pull Request — master (#2)
by Arthur
02:14
created

DefaultFinder::checkRules()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 2
Code Lines 0

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 1
eloc 0
nc 1
nop 1
dl 0
loc 2
rs 10
c 0
b 0
f 0
<?php

namespace WebThumbnailer\Finder;

use WebThumbnailer\Application\ConfigManager;
use WebThumbnailer\Application\WebAccess\WebAccess;
use WebThumbnailer\Application\WebAccess\WebAccessCUrl;
use WebThumbnailer\Application\WebAccess\WebAccessFactory;
use WebThumbnailer\Utils\ImageUtils;
use WebThumbnailer\Utils\UrlUtils;

/**
 * Class DefaultFinder
 *
 * This finder isn't linked to any domain.
 * It will return the resource if it is an image (by extension, or by content).
 * Otherwise, it'll try to retrieve an OpenGraph resource.
 *
 * @package WebThumbnailer\Finder
 */
class DefaultFinder extends FinderCommon
{
    /**
     * @var WebAccess instance.
     */
    protected $webAccess;

    /**
     * @inheritdoc
     */
    public function __construct($domain, $url, $rules, $options)
    {
        $this->webAccess = WebAccessFactory::getWebAccess($url);
        $this->url = $url;
        $this->domains = $domain;
    }

    /**
     * Generic finder.
     *
     * @inheritdoc
     */
    function find()
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
        if (ImageUtils::isImageExtension(UrlUtils::getUrlFileExtension($this->url))) {
Bug introduced by
It seems like WebThumbnailer\Utils\Url...leExtension($this->url) can also be of type false; however, parameter $ext of WebThumbnailer\Utils\Ima...ils::isImageExtension() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if (ImageUtils::isImageExtension(/** @scrutinizer ignore-type */ UrlUtils::getUrlFileExtension($this->url))) {

            return $this->url;
        }

        $content = $thumbnail = null;
        $callback = $this->webAccess instanceof WebAccessCUrl
            ? $this->getCurlCallback($content, $thumbnail)
            : null;
        list($headers, $content) = $this->webAccess->getContent(
            $this->url,
            ConfigManager::get('settings.default.timeout', 30),
Bug introduced by
It seems like WebThumbnailer\Applicati...s.default.timeout', 30) can also be of type string; however, parameter $timeout of WebThumbnailer\Applicati...WebAccess::getContent() only seems to accept integer. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            /** @scrutinizer ignore-type */ ConfigManager::get('settings.default.timeout', 30),

            ConfigManager::get('settings.default.max_img_dl', 16777216),
Bug introduced by
It seems like WebThumbnailer\Applicati....max_img_dl', 16777216) can also be of type string; however, parameter $maxBytes of WebThumbnailer\Applicati...WebAccess::getContent() only seems to accept integer. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

            /** @scrutinizer ignore-type */ ConfigManager::get('settings.default.max_img_dl', 16777216),

            $callback,
            $content
        );

        if (empty($thumbnail) && ImageUtils::isImageString($content)) {
            return $this->url;
        }

        if (empty($thumbnail) && ! empty($headers) && strpos($headers[0], '200') === false) {
            return false;
        }

        // With curl, the thumb is extracted during the download
        if ($this->webAccess instanceof WebAccessCUrl && ! empty($thumbnail)) {
            return $thumbnail;
        }

        return ! empty($content) ? self::extractMetaTag($content) : false;
    }

    /**
     * Get a callback for curl write function.
     *
     * @param string $content   A variable reference in which the downloaded content should be stored.
     * @param string $thumbnail A variable reference in which extracted thumb URL should be stored.
     *
     * @return \Closure CURLOPT_WRITEFUNCTION callback
     */
    protected function getCurlCallback(&$content, &$thumbnail)
    {
        $url = $this->url;
        /**
         * cURL callback function for CURLOPT_WRITEFUNCTION (called during the download).
         *
         * While downloading the remote page, we check that the HTTP code is 200 and content type is 'text/html'.
         * Then we extract the thumbnail URL and stop the download when it's done.
         *
         * Note that when using CURLOPT_WRITEFUNCTION, we have to manually handle the content retrieved,
         * hence the $content reference variable.
         *
         * @param resource $ch   cURL resource
         * @param string   $data chunk of data being downloaded
         *
         * @return int|bool length of $data or false if we need to stop the download
         */
        return function(&$ch, $data) use ($url, &$content, &$thumbnail) {
            $content .= $data;
            $responseCode = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
            if (!empty($responseCode) && $responseCode != 200) {
                return false;
            }
            $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
            // we look for image, and ignore application/octet-stream,
            // which is the default content type for any binary
            // @see https://developer.mozilla.org/fr/docs/Web/HTTP/Basics_of_HTTP/MIME_types
            if (!empty($contentType)
                && strpos($contentType, 'image/') !== false
                && strpos($contentType, 'application/octet-stream') === false
            ) {
                $thumbnail = $url;
                return false;
            } else if (!empty($contentType)
                && strpos($contentType, 'text/html') === false
                && strpos($contentType, 'application/octet-stream') === false
            ) {
                return false;
            }
            if (empty($thumbnail)) {
                $thumbnail = DefaultFinder::extractMetaTag($data);
            }
            // We got everything we want, stop the download.
            if (!empty($responseCode) && !empty($contentType) && !empty($thumbnail)) {
                return false;
            }

            return strlen($data);
        };
    }

    /**
     * Applies the regexp on the HTML $content to extract the thumb URL.
     *
     * @param string $content Downloaded HTML content
     *
     * @return string|bool Extracted thumb URL or false if not found.
     */
    public static function extractMetaTag($content)
    {
        $propertiesKey = ['property', 'name', 'itemprop'];
        // Try to retrieve OpenGraph image.
        $ogRegex = '#<meta[^>]+(?:'. implode('|', $propertiesKey) .')=["\']?og:image["\'\s][^>]*content=["\']?(.*?)["\'\s>]#';
        // If the attributes are not in the order property => content (e.g. Github)
        // New regex to keep this readable... more or less.
        $ogRegexReverse = '#<meta[^>]+content=["\']?([^"\'\s]+)[^>]+(?:'. implode('|', $propertiesKey) .')=["\']?og:image["\'\s/>]#';

        if (preg_match($ogRegex, $content, $matches) > 0
            || preg_match($ogRegexReverse, $content, $matches) > 0
        ) {
            return $matches[1];
        }

        return false;
    }

    /**
     * @inheritdoc
     */
    public function isHotlinkAllowed()
    {
        return true;
    }

    /**
     * @inheritdoc
     */
    function checkRules($rules)
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
    }

    /**
     * @inheritdoc
     */
    function loadRules($rules)
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
    }

    /**
     * @inheritdoc
     */
    function getName()
Best Practice introduced by
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
        return 'default';
    }
}