Passed
Pull Request — master (#31)
by Yoshiaki
01:53
created

Checker.php (4 issues)

1
<?php
Coding Style Compatibility introduced by
For compatibility and reusability of your code, PSR1 recommends that a file should introduce either new symbols (like classes, functions, etc.) or have side-effects (like outputting something, or including other files), but not both at the same time. The first symbol is defined on line 12 and the first side effect is on line 5.

The PSR-1: Basic Coding Standard recommends that a file should either introduce new symbols, that is classes, functions, constants or similar, or have side effects. Side effects are anything that executes logic, like for example printing output, changing ini settings or writing to a file.

The idea behind this recommendation is that merely auto-loading a class should not change the state of an application. It also promotes a cleaner style of programming and makes your code less prone to errors, because the logic is not spread out all over the place.

To learn more about the PSR-1, please see the PHP-FIG site on the PSR-1.
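
A minimal sketch of the split PSR-1 recommends, shown in one block for brevity. The class name `CheckerSketch` and the entry-point path `bin/check.php` are assumptions made for illustration and are not part of this pull request; in a real project the two commented sections would be separate files.

```php
<?php
// --- Checker.php: declares a symbol only, no side effects ---
final class CheckerSketch
{
    /** @var int */
    private $contentsSize;

    public function __construct(int $contentsSize = 500)
    {
        $this->contentsSize = $contentsSize;
    }

    public function contentsSize(): int
    {
        return $this->contentsSize;
    }
}

// --- bin/check.php (hypothetical entry point): side effects live here ---
// In a real project this file would start with something like:
//   require_once __DIR__ . '/../vendor/autoload.php';
$checker = new CheckerSketch(500);
echo $checker->contentsSize(), "\n";
```

With this layout, auto-loading the class file never runs `require_once` or prints anything; only the entry point does.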

2
3
namespace Error;
4
5
require_once (__DIR__ . '/vendor/autoload.php');
6
7
/**
8
 * Description of Checker Main
9
 *
10
 * @author bootjp
11
 */
12
class Checker
13
{
14
    protected $client;
15
16
    protected $contentsSize = 500;
17
18
    protected $doubleCheck = true;
19
20
    protected $recursion = false;
21
22
    protected $garbage = [];
23
24
    protected $isContentsFetch = true;
25
26
27
    /**
28
     * initialisation.
29
     * @param array $args
30
     */
31
    public function __construct(array $args)
32
    {
33
        $this->client = new \GuzzleHttp\Client([
34
                'defaults' => [
35
                    'exceptions' => false,
36
                    'headers' => [
37
                        'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) ' .
38
                        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36'
39
                    ]
40
                ]
41
            ]
42
        );
43
        if (array_key_exists('contentSize', $args)) {
44
            $this->contentsSize = (int) $args['contentSize'];
45
        }
46
47
        if (array_key_exists('doubleCheck', $args)) {
48
            $this->doubleCheck = (bool) $args['doubleCheck'];
49
        }
50
51
        if (array_key_exists('isContentsFetch', $args)) {
52
            $this->isContentsFetch = (bool) $args['isContentsFetch'];
53
        }
54
55
        if (array_key_exists('recursion', $args)) {
56
            $this->recursion = (bool) $args['recursion'];
57
        }
58
59
        if (array_key_exists('auth', $args)) {
60
            list($username, $password) = explode(':', $args['auth'], 2);
61
            $this->client->setDefaultOption('auth', [$username, $password]);
62
        }
63
64
    }
65
66
    /**
67
     * Wrapper
68
     * @param  mixed $url [require]
69
     * @return array
70
     * @throws \ErrorException
71
     * @throws \ReflectionException
72
     */
73
    public function start($url)
74
    {
75
        $urlList = [];
76
        $result = [];
77
        $result['white'] = [];
78
        $result['black'] = [];
79
        if (array_key_exists(0, $list)) {
80
            $getFlag = $list[0];
The variable $list does not exist. Did you forget to declare it?

This check marks access to variables or properties that have not been declared yet. While PHP has no explicit notion of declaring a variable, accessing it before a value is assigned to it is most likely a bug.
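
One possible fix, assuming the flags in `$list` were meant to be handed in by the caller (the code under review does not show where they should come from). The helper name `recursionFlag` and the meaning of index 1 are assumptions taken from how `start()` reads `$list[1]`.

```php
<?php
/**
 * Hypothetical helper: read the optional recursion flag from a list that is
 * passed in explicitly, instead of relying on an undeclared $list.
 *
 * @param array $list    positional options; index 1 is assumed to be the recursion flag
 * @param bool  $default value to keep when the flag is absent
 * @return bool
 */
function recursionFlag(array $list, $default = false)
{
    return array_key_exists(1, $list) ? (bool) $list[1] : (bool) $default;
}

var_dump(recursionFlag([]));           // flag absent, default kept
var_dump(recursionFlag(['GET', '1'])); // flag present
```

Passing the list as an argument makes the dependency visible in the signature, so the undeclared-variable warning can no longer occur.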

$getFlag is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment on line 1 and the $higher assignment on line 2 are dead. The first is dead because $myVar is never used, and the second because $higher is overwritten on every possible execution path.
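
For comparison, a sketch of the same example with the dead assignments removed. This is an illustrative rewrite of the analyzer's sample, not code from the pull request.

```php
<?php
// $myVar is dropped because nothing reads it, and $higher is assigned
// exactly once, directly from the comparison.
$higher = rand(1, 6) > 3;

var_dump($higher);
```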

81
        }
82
        if (array_key_exists(1, $list)) {
83
            $this->recursion = $list[1];
84
        }
85
86
        if ((bool) $this->isContentsFetch) {
87
            echo 'Contents fetching..';
88
            $url = $this->fetchByContents($url);
89
90
            if ((bool) $this->recursion) {
91
                $url = $this->urlFilter($url);
92
            }
93
        }
94
95
        if (is_null($url)) {
96
            throw new \RuntimeException('Start URL must not be null.');
97
        } else if (is_array($url)) {
98
            $urlList = $this->urlFilter($url);
99
        } else if (is_string($url)) {
100
            $urlList[] = $url;
101
        } else if (is_object($url)) {
102
            $urlList[] = (string) $url;
103
        }
104
105
        echo "\n";
106
        echo 'Checking..';
107
108
        foreach ($urlList as $key => $url) {
109
            try {
110
                $metaData = $this->client->get($url);
111
            } catch (\Exception $e) {
112
                echo "\n {$url}\t {$e->getMessage()}";
113
            }
114
            $hardCheck = (array) $this->hardCheckByHeader($metaData);
115
            $softCheck = (array) $this->softCheckByContents($metaData);
116
117
            if ($hardCheck['result'] && $softCheck['result']) {
118
                $result['white'][$key]['url'] = $url;
119
                $result['white'][$key]['status'] = 'OK';
120
            } else {
121
                $result['black'][$key]['url'] = $url;
122
                $result['black'][$key]['status'] = array_key_exists('status', $hardCheck) ? $hardCheck['status'] : $softCheck['status'];
123
            }
124
125
            usleep(500000);
126
            echo '.';
127
        }
128
        $result['UnknownLinks'] = $this->garbage;
129
130
        return $result;
131
    }
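
In the loop above, `$metaData` is only assigned inside the `try` block, so when `get()` throws, the two check calls appear to run on an undefined or stale value. A minimal sketch of one way to handle this, where `checkAll` and the `$fetch` callable are hypothetical stand-ins for `start()` and `$this->client->get()`:

```php
<?php
/**
 * Hypothetical loop skeleton: failed fetches are recorded and skipped.
 *
 * @param array    $urlList URLs to check
 * @param callable $fetch   stand-in for the HTTP client call; may throw
 * @return array
 */
function checkAll(array $urlList, callable $fetch)
{
    $result = ['white' => [], 'black' => []];

    foreach ($urlList as $key => $url) {
        try {
            $fetch($url);
        } catch (\Exception $e) {
            // Record the failure and move on; there is no response to inspect.
            $result['black'][$key] = ['url' => $url, 'status' => $e->getMessage()];
            continue;
        }

        // The header and content checks from start() would run here.
        $result['white'][$key] = ['url' => $url, 'status' => 'OK'];
    }

    return $result;
}

$result = checkAll(
    ['http://ok.example/', 'http://down.example/'],
    function ($url) {
        if (strpos($url, 'down') !== false) {
            throw new \RuntimeException('connection failed');
        }
        return 'response for ' . $url;
    }
);
print_r($result);
```

The `continue` in the `catch` block is the relevant part: it guarantees that the checks only ever see a response from the current iteration.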
132
133
    /**
134
     * Fetch Page Contents Links
135
     * @param  mixed $baseUrl
136
     * @return array URlList
137
     * @throws \ErrorException
138
     */
139
    private function fetchByContents($baseUrl)
140
    {
141
        $urlList = [];
142
        $matches = [];
143
        $urlList['baseUrl'] = (string) $baseUrl;
144
        try {
145
            $contents = $this->client->get($baseUrl)->getBody()->getContents();
146
        } catch (\Exception $e) {
147
            echo "\n {$baseUrl}\t {$e->getMessage()}";
148
        }
149
150
        preg_match_all('{<a.+?href=["\'](?<url>.+?)["\'].*?>}is', $contents, $matches);
151
152
        if (!array_key_exists('url', $matches)) {
153
            throw new \RuntimeException('Not match contents on url.');
154
        }
155
156
        foreach ($matches['url'] as $url) {
157
158
            if (preg_match('{https?://[\w/:%#\$&\?\(\)~\.=\+\-]+}i', $url)) {
159
                $urlList[] = $url;
160
            } else if (preg_match('{^https?:\/\/[\w/:%#\$&\?\(\)~\.=\+\-]+$}i', $baseUrl . $url)) {
161
                if (preg_match("{(^#[A-Z0-9].+?$)}i", $url)) {
162
                    $this->garbage[] = $url;
163
                } else if (preg_match("#javascript.*#i", $url)) {
164
                    $this->garbage[] = $url;
165
                } else {
166
                    // start slash ?
167
                    $startSlash = substr($url, 0, 1) === '/';
168
                    $secondSlash = substr($url, 1, 1) === '/';
169
                    if ($startSlash && $secondSlash) {
170
                        $parsedUrl = parse_url($baseUrl);
171
                        $urlList[] = $parsedUrl['scheme'] . ':' . $url;
172
                    } else if ($startSlash) {
173
                        // end is slash?
174
                        $parsedUrl = parse_url($baseUrl);
175
                        if (substr($baseUrl, -1, 1) === '/') {
176
                            // has slash
177
                            $root = $parsedUrl['scheme'] . '://' . $parsedUrl['host'];
178
                        } else {
179
                            // add slash
180
                            $root = $parsedUrl['scheme'] . '://' . $parsedUrl['host'] . '/';
181
                        }
182
                        $urlList[] = $root . $url;
183
                    } else {
184
                        $urlList[] = $baseUrl . $url;
185
                    }
186
                }
187
            } else {
188
                $this->garbage[] = $url;
189
            }
190
191
            usleep(500000);
192
            echo '.';
193
        }
194
195
        return array_unique($urlList);
196
    }
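
The slash handling in `fetchByContents()` appears to append a trailing slash to the root before a link that already starts with one, which would produce a double slash. A sketch of a simpler resolver follows; `resolveHref` is a hypothetical helper, it assumes `$baseUrl` is an absolute http(s) URL, and it ignores ports, query strings and `../` segments.

```php
<?php
/**
 * Hypothetical helper: resolve an href found on a page against the page URL.
 * Handles absolute, protocol-relative ("//host/x"), root-relative ("/x")
 * and simple relative ("x") links.
 *
 * @param string $baseUrl absolute URL of the page (assumed http or https)
 * @param string $href    link as found in the markup
 * @return string
 */
function resolveHref($baseUrl, $href)
{
    if (preg_match('{^https?://}i', $href)) {
        return $href; // already absolute
    }

    $base = parse_url($baseUrl);
    $root = $base['scheme'] . '://' . $base['host'];

    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href; // protocol-relative
    }
    if (strpos($href, '/') === 0) {
        return $root . $href; // root-relative
    }

    // Relative to the directory part of the base path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return $root . $dir . $href;
}

echo resolveHref('http://example.com/a/b.html', 'c.html'), "\n";
echo resolveHref('http://example.com/a/b.html', '/c.html'), "\n";
```

Building the root once without a trailing slash keeps every branch free of doubled separators.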
197
198
    /**
199
     * Error check by header
200
     * @param \GuzzleHttp\Message\Response $metaData
201
     * @return array
202
     */
203
    private function hardCheckByHeader($metaData)
This method is not used, and could be removed.
204
    {
205
        $headers = array_change_key_case($metaData->getHeaders());
206
        $statusCode = (int) $metaData->getStatusCode();
207
208
        $isErrorPageCode = [
209
            '40x' => [401, 403, 404],
210
            '50x' => [500, 502, 503],
211
            '30x' => [301, 302, 308]
212
        ];
213
214
        foreach($isErrorPageCode as $errorType => $statuses) {
215
            if (in_array($statusCode, $statuses)) {
216
                return [
217
                    'result' => false,
218
                    'status' => "NG : status code {$errorType}"
219
                ];
220
            }
221
        }
222
223
        if ($statusCode === 200 || $statusCode === 304) {
224
            return [
225
                'result' => true
226
            ];
227
        }
228
229
        if (array_key_exists('content-length', $headers) && $headers['content-length'][0] < $this->contentsSize) {
230
            return [
231
                'result' => false,
232
                'status' => 'NG : contentsSize'
233
            ];
234
        }
235
236
        return [
237
            'result' => true
238
        ];
239
    }
240
241
    /**
242
     * Soft404 check by contents Length
243
     * @param \GuzzleHttp\Message\Response $metaData
244
     * @return array
245
     */
246
    public function softCheckByContents($metaData)
247
    {
248
        if ($metaData->getBody()->getSize() <= $this->contentsSize) {
249
            return [
250
                'result' => false,
251
                'status' => 'NG : contentsSize'
252
            ];
253
        }
254
255
        if ($this->doubleCheck) {
256
            $result = $this->softCheckByContentsWords($metaData);
257
            if (!$result['result']) {
258
                return [
259
                    'result' => $result['result'],
260
                    'status' => $result['status']
261
                ];
262
            }
263
        }
264
265
        return [
266
            'result' => true
267
        ];
268
    }
269
270
    /**
271
     * Soft404 Error check by words
272
     * @param \GuzzleHttp\Message\Response $metaData
273
     * @return array Result
274
     */
275
    private function softCheckByContentsWords($metaData)
276
    {
277
        foreach (self::getSoftErrorWords() as $word) {
278
            if (mb_stripos($metaData->getBody()->getContents(), $word) !== false) {
279
                return [
280
                    'result' => false,
281
                    'status' => 'NG WORD : ' . $word
282
                ];
283
            }
284
        }
285
286
        return [
287
            'result' => true
288
        ];
289
290
    }
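
`softCheckByContentsWords()` calls `getContents()` once per word. On a stream, that call returns only what has not been read yet, so from the second word onward the check is likely comparing against an empty string. A sketch of the read-once pattern, using a plain PHP memory stream as a stand-in for the response body; `findSoftErrorWord` is a hypothetical helper, and `stripos` replaces `mb_stripos` here only to avoid the mbstring dependency.

```php
<?php
/**
 * Hypothetical variant of the word check that works on an already-read string.
 *
 * @param string $contents page contents, read once by the caller
 * @param array  $words    phrases that indicate a soft 404
 * @return string|null the first matching word, or null when none match
 */
function findSoftErrorWord($contents, array $words)
{
    foreach ($words as $word) {
        if (stripos($contents, $word) !== false) {
            return $word;
        }
    }

    return null;
}

// A plain PHP stream stands in for the response body.
$body = fopen('php://memory', 'r+');
fwrite($body, '<h1>Sorry, page not found</h1>');
rewind($body);

$contents = stream_get_contents($body); // read once and keep the string
$again = stream_get_contents($body);    // stream is at EOF, nothing left to read

$hit = findSoftErrorWord($contents, ['no such page', 'page not found']);
var_dump($again, $hit);
fclose($body);
```

Reading the body into a variable before the loop means every word is tested against the full contents.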
291
292
    /**
293
     * Return the words that identify a soft-404 page.
294
     * @param  none
295
     * @return array
296
     */
297
    private static function getSoftErrorWords()
298
    {
299
        return file(__DIR__ . '/ErrorPageWords.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
300
    }
301
302
    /**
303
     * Convert a multidimensional array to a flat array.
304
     * @param array $urlList
305
     * @return array URLLIST
306
     */
307
    private function urlFilter(array $urlList)
308
    {
309
        $result = [];
310
        array_walk_recursive($urlList, function($v) use (&$result) {
311
            $result[] = $v;
312
        });
313
314
        return array_values(array_unique($result));
315
    }
316
}
317