Snoopy::fetch() (rating: F)
last analyzed

Complexity

Conditions 33
Paths 3536

Size

Total Lines 113
Code Lines 76

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 76
dl 0
loc 113
rs 0
c 0
b 0
f 0
cc 33
nc 3536
nop 1

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
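As a minimal sketch of Extract Method applied to the method analyzed here, two commented steps in fetch() could become small named functions whose names come from the comments. The first step is "no proxy, send only the path". The second is "only follow redirect if it's on this site, or offsiteok is true". The names buildRequestPath() and shouldFollowRedirect() are hypothetical and do not exist in Snoopy or XOOPS. They are written as free functions only so the example runs on its own. The redirect check keeps the original's prefix match on "http://<host>" rather than doing a strict host comparison. It also adds an is_string() guard, which sidesteps the "type true" issues the analyzer reports below.

```php
<?php
// Hypothetical helpers extracted from Snoopy::fetch(); these names are
// illustrative only and do not exist in Snoopy or XOOPS.

// "no proxy, send only the path": path plus optional query string,
// using the same truthiness test on the query that fetch() uses.
function buildRequestPath(array $uriParts): string
{
    $path  = $uriParts['path'] ?? '';
    $query = $uriParts['query'] ?? '';
    return $path . ($query ? '?' . $query : '');
}

// "check if we've hit the max depth" and "only follow redirect if it's on
// this site, or offsiteok is true". Like the original, this is a prefix
// match on "http://<host>", not a strict host comparison.
function shouldFollowRedirect($redirectAddr, string $host, int $depth, int $maxRedirs, bool $offsiteOk): bool
{
    // $_redirectaddr defaults to false in Snoopy; guarding here also avoids
    // passing a non-string to preg_match().
    if (!is_string($redirectAddr) || $redirectAddr === '') {
        return false;
    }
    if ($depth >= $maxRedirs) {
        return false; // maximum redirect depth reached
    }
    $onSite = preg_match('|^http://' . preg_quote($host, '|') . '|i', $redirectAddr) === 1;
    return $onSite || $offsiteOk;
}

// Usage: the redirect block in fetch() could then shrink to
//   if (shouldFollowRedirect($this->_redirectaddr, $this->host,
//           $this->_redirectdepth, $this->maxredirs, $this->offsiteok)) { ... }
var_dump(buildRequestPath(['path' => '/docs', 'query' => 'q=1'])); // string(9) "/docs?q=1"
var_dump(shouldFollowRedirect('http://example.com/', 'www.php.net', 0, 5, false)); // bool(false)
```

Because both helpers are pure functions, each branch of the redirect rule can be tested without opening a socket. That is hard to do while the logic is inlined twice in fetch() and twice in submit().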

Commonly applied refactorings include:

<?php

/*************************************************
 *
 * Snoopy - the PHP net client
 * Author: Monte Ohrt <[email protected]>
 * Copyright (c): 1999-2014, all rights reserved
 * Version: 1.2.5
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 * You may contact the author of Snoopy by e-mail at:
 * [email protected]
 *
 * The latest version of Snoopy can be obtained from:
 * http://snoopy.sourceforge.net/
 *************************************************/
class Snoopy
{
    public function __construct()
    {
        $GLOBALS['xoopsLogger']->addDeprecated("Use of Snoopy in XOOPS is deprecated and has been replaced in core with XoopsHttpGet. Snoopy will be removed in future versions..");
    }

    /**** Public variables ****/

    /* user definable vars */

    var $host = "www.php.net"; // host name we are connecting to
    var $port = 80; // port we are connecting to
    var $proxy_host = ""; // proxy host to use
    var $proxy_port = ""; // proxy port to use
    var $proxy_user = ""; // proxy user to use
    var $proxy_pass = ""; // proxy password to use

    var $agent = "Snoopy v1.2.5"; // agent we masquerade as
    var $referer = ""; // referer info to pass
    var $cookies = array(); // array of cookies to pass
    // $cookies["username"]="joe";
    var $rawheaders = array(); // array of raw headers to send
    // $rawheaders["Content-type"]="text/html";

    var $maxredirs = 5; // http redirection depth maximum. 0 = disallow
    var $lastredirectaddr = ""; // contains address of last redirected address
    var $offsiteok = true; // allows redirection off-site
    var $maxframes = 0; // frame content depth maximum. 0 = disallow
    var $expandlinks = true; // expand links to fully qualified URLs.
    // this only applies to fetchlinks()
    // submitlinks(), and submittext()
    var $passcookies = true; // pass set cookies back through redirects
    // NOTE: this currently does not respect
    // dates, domains or paths.

    var $user = ""; // user for http authentication
    var $pass = ""; // password for http authentication

    // http accept types
    var $accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*";

    var $results = ""; // where the content is put

    var $error = ""; // error messages sent here
    var $response_code = ""; // response code returned from server
    var $headers = array(); // headers returned from server sent here
    var $maxlength = 500000; // max return data length (body)
    var $read_timeout = 0; // timeout on read operations, in seconds
    // supported only since PHP 4 Beta 4
    // set to 0 to disallow timeouts
    var $timed_out = false; // if a read operation timed out
    var $status = 0; // http request status

    var $temp_dir = "/tmp"; // temporary directory that the webserver
    // has permission to write to.
    // under Windows, this should be C:\temp

    var $curl_path = "/usr/bin/curl";
    // Snoopy will use cURL for fetching
    // SSL content if a full system path to
    // the cURL binary is supplied here.
    // set to false if you do not have
    // cURL installed. See http://curl.haxx.se
    // for details on installing cURL.
    // Snoopy does *not* use the cURL
    // library functions built into php,
    // as these functions are not stable
    // as of this Snoopy release.

    // send Accept-encoding: gzip?
    var $use_gzip = true;
    /**** Private variables ****/

    var $_maxlinelen = 4096; // max line length (headers)

    var $_httpmethod = "GET"; // default http request method
    var $_httpversion = "HTTP/1.0"; // default http request version
    var $_submit_method = "POST"; // default submit method
    var $_submit_type = "application/x-www-form-urlencoded"; // default submit type
    var $_mime_boundary = ""; // MIME boundary for multipart/form-data submit type
    var $_redirectaddr = false; // will be set if page fetched is a redirect
    var $_redirectdepth = 0; // increments on an http redirect
    var $_frameurls = array(); // frame src urls
    var $_framedepth = 0; // increments on frame depth

    var $_isproxy = false; // set if using a proxy server
    var $_fp_timeout = 30; // timeout for socket connection

    /*======================================================================*\
        Function:	fetch
        Purpose:	fetch the contents of a web page
                    (and possibly other protocols in the
                    future like ftp, nntp, gopher, etc.)
        Input:		$URI	the location of the page to fetch
        Output:		$this->results	the output text from the fetch
    \*======================================================================*/

    function fetch($URI)

Best Practice introduced by geekwright
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {

        //preg_match("|^([^:]+)://([^:/]+)(:[\d]+)*(.*)|",$URI,$URI_PARTS);
        $URI_PARTS = parse_url($URI);
        if (!empty($URI_PARTS["user"]))
            $this->user = $URI_PARTS["user"];
        if (!empty($URI_PARTS["pass"]))
            $this->pass = $URI_PARTS["pass"];
        if (empty($URI_PARTS["query"]))
            $URI_PARTS["query"] = '';
        if (empty($URI_PARTS["path"]))
            $URI_PARTS["path"] = '';

        switch (strtolower($URI_PARTS["scheme"])) {
            case "http":
                $this->host = $URI_PARTS["host"];
                if (!empty($URI_PARTS["port"]))
                    $this->port = $URI_PARTS["port"];
                if ($this->_connect($fp)) {
                    if ($this->_isproxy) {
                        // using proxy, send entire URI
                        $this->_httprequest($URI, $fp, $URI, $this->_httpmethod);
                    } else {
                        $path = $URI_PARTS["path"] . ($URI_PARTS["query"] ? "?" . $URI_PARTS["query"] : "");
                        // no proxy, send only the path
                        $this->_httprequest($path, $fp, $URI, $this->_httpmethod);
                    }

                    $this->_disconnect($fp);

                    if ($this->_redirectaddr) {
                        /* url was redirected, check if we've hit the max depth */
                        if ($this->maxredirs > $this->_redirectdepth) {
                            // only follow redirect if it's on this site, or offsiteok is true
                            if (preg_match("|^http://" . preg_quote($this->host) . "|i", $this->_redirectaddr) || $this->offsiteok) {
                                /* follow the redirect */
                                $this->_redirectdepth++;
                                $this->lastredirectaddr = $this->_redirectaddr;
                                $this->fetch($this->_redirectaddr);
                            }
                        }
                    }

                    if ($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0) {
                        $frameurls = $this->_frameurls;
                        $this->_frameurls = array();

                        foreach ($frameurls as $frameurl) { // each() was removed in PHP 8.0
                            if ($this->_framedepth < $this->maxframes) {
                                $this->fetch($frameurl);
                                $this->_framedepth++;
                            } else
                                break;
                        }
                    }
                } else {
                    return false;
                }
                return true;
                break;

Unused Code introduced by beckmi
break is not strictly necessary here and could be removed.

The break statement is not necessary if it is preceded for example by a return statement:

switch ($x) {
    case 1:
        return 'foo';
        break; // This break is not necessary and can be left off.
}

If you would like to keep this construct to be consistent with other case statements, you can safely mark this issue as a false-positive.

            case "https":
                if (!$this->curl_path)
                    return false;
                if (function_exists("is_executable"))
                    if (!is_executable($this->curl_path))
                        return false;
                $this->host = $URI_PARTS["host"];
                if (!empty($URI_PARTS["port"]))
                    $this->port = $URI_PARTS["port"];
                if ($this->_isproxy) {
                    // using proxy, send entire URI
                    $this->_httpsrequest($URI, $URI, $this->_httpmethod);
                } else {
                    $path = $URI_PARTS["path"] . ($URI_PARTS["query"] ? "?" . $URI_PARTS["query"] : "");
                    // no proxy, send only the path
                    $this->_httpsrequest($path, $URI, $this->_httpmethod);
                }

                if ($this->_redirectaddr) {
                    /* url was redirected, check if we've hit the max depth */
                    if ($this->maxredirs > $this->_redirectdepth) {
                        // only follow redirect if it's on this site, or offsiteok is true
                        if (preg_match("|^http://" . preg_quote($this->host) . "|i", $this->_redirectaddr) || $this->offsiteok) {

Bug introduced by mambax7
$this->_redirectaddr of type true is incompatible with the type string expected by parameter $subject of preg_match(). ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                        if (preg_match("|^http://" . preg_quote($this->host) . "|i", /** @scrutinizer ignore-type */ $this->_redirectaddr) || $this->offsiteok) {

                            /* follow the redirect */
                            $this->_redirectdepth++;
                            $this->lastredirectaddr = $this->_redirectaddr;
                            $this->fetch($this->_redirectaddr);
                        }
                    }
                }

                if ($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0) {
                    $frameurls = $this->_frameurls;
                    $this->_frameurls = array();

                    foreach ($frameurls as $frameurl) { // each() was removed in PHP 8.0
                        if ($this->_framedepth < $this->maxframes) {
                            $this->fetch($frameurl);
                            $this->_framedepth++;
                        } else
                            break;
                    }
                }
                return true;
                break;
            default:
                // not a valid protocol
                $this->error = 'Invalid protocol "' . $URI_PARTS["scheme"] . "\"\n"; // double quotes so \n is a newline
                return false;
                break;
        }
        return true;

Unused Code introduced by beckmi
return true is not reachable.

This check looks for unreachable code. It uses sophisticated control flow analysis techniques to find statements which will never be executed.

Unreachable code is most often the result of return, die or exit statements that have been added for debug purposes.

function fx() {
    try {
        doSomething();
        return true;
    }
    catch (\Exception $e) {
        return false;
    }

    return false;
}

In the above example, the last return false will never be executed, because a return statement has already been met in every possible execution path.

    }

    /*======================================================================*\
        Function:	submit
        Purpose:	submit an http form
        Input:		$URI	the location to post the data
                    $formvars	the formvars to use.
                        format: $formvars["var"] = "val";
                    $formfiles  an array of files to submit
                        format: $formfiles["var"] = "/dir/filename.ext";
        Output:		$this->results	the text output from the post
    \*======================================================================*/

    function submit($URI, $formvars = "", $formfiles = "")

Best Practice introduced by geekwright
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
        unset($postdata);

Comprehensibility Best Practice introduced by beckmi
The variable $postdata seems to be never defined.

        $postdata = $this->_prepare_post_body($formvars, $formfiles);

        $URI_PARTS = parse_url($URI);
        if (!empty($URI_PARTS["user"]))
            $this->user = $URI_PARTS["user"];
        if (!empty($URI_PARTS["pass"]))
            $this->pass = $URI_PARTS["pass"];
        if (empty($URI_PARTS["query"]))
            $URI_PARTS["query"] = '';
        if (empty($URI_PARTS["path"]))
            $URI_PARTS["path"] = '';

        switch (strtolower($URI_PARTS["scheme"])) {
            case "http":
                $this->host = $URI_PARTS["host"];
                if (!empty($URI_PARTS["port"]))
                    $this->port = $URI_PARTS["port"];
                if ($this->_connect($fp)) {
                    if ($this->_isproxy) {
                        // using proxy, send entire URI
                        $this->_httprequest($URI, $fp, $URI, $this->_submit_method, $this->_submit_type, $postdata);
                    } else {
                        $path = $URI_PARTS["path"] . ($URI_PARTS["query"] ? "?" . $URI_PARTS["query"] : "");
                        // no proxy, send only the path
                        $this->_httprequest($path, $fp, $URI, $this->_submit_method, $this->_submit_type, $postdata);
                    }

                    $this->_disconnect($fp);

                    if ($this->_redirectaddr) {
                        /* url was redirected, check if we've hit the max depth */
                        if ($this->maxredirs > $this->_redirectdepth) {
                            if (!preg_match("|^" . $URI_PARTS["scheme"] . "://|", $this->_redirectaddr))
                                $this->_redirectaddr = $this->_expandlinks($this->_redirectaddr, $URI_PARTS["scheme"] . "://" . $URI_PARTS["host"]);

                            // only follow redirect if it's on this site, or offsiteok is true
                            if (preg_match("|^http://" . preg_quote($this->host) . "|i", $this->_redirectaddr) || $this->offsiteok) {
                                /* follow the redirect */
                                $this->_redirectdepth++;
                                $this->lastredirectaddr = $this->_redirectaddr;
                                if (strpos($this->_redirectaddr, "?") > 0)
                                    $this->fetch($this->_redirectaddr); // the redirect has changed the request method from post to get
                                else
                                    $this->submit($this->_redirectaddr, $formvars, $formfiles);
                            }
                        }
                    }

                    if ($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0) {
                        $frameurls = $this->_frameurls;
                        $this->_frameurls = array();

                        foreach ($frameurls as $frameurl) { // each() was removed in PHP 8.0
                            if ($this->_framedepth < $this->maxframes) {
                                $this->fetch($frameurl);
                                $this->_framedepth++;
                            } else
                                break;
                        }
                    }

                } else {
                    return false;
                }
                return true;
                break;

Unused Code introduced by beckmi
break is not strictly necessary here and could be removed.

The break statement is not necessary if it is preceded for example by a return statement:

switch ($x) {
    case 1:
        return 'foo';
        break; // This break is not necessary and can be left off.
}

If you would like to keep this construct to be consistent with other case statements, you can safely mark this issue as a false-positive.

            case "https":
                if (!$this->curl_path)
                    return false;
                if (function_exists("is_executable"))
                    if (!is_executable($this->curl_path))
                        return false;
                $this->host = $URI_PARTS["host"];
                if (!empty($URI_PARTS["port"]))
                    $this->port = $URI_PARTS["port"];
                if ($this->_isproxy) {
                    // using proxy, send entire URI
                    $this->_httpsrequest($URI, $URI, $this->_submit_method, $this->_submit_type, $postdata);
                } else {
                    $path = $URI_PARTS["path"] . ($URI_PARTS["query"] ? "?" . $URI_PARTS["query"] : "");
                    // no proxy, send only the path
                    $this->_httpsrequest($path, $URI, $this->_submit_method, $this->_submit_type, $postdata);
                }

                if ($this->_redirectaddr) {
                    /* url was redirected, check if we've hit the max depth */
                    if ($this->maxredirs > $this->_redirectdepth) {
                        if (!preg_match("|^" . $URI_PARTS["scheme"] . "://|", $this->_redirectaddr))

Bug introduced by geekwright
$this->_redirectaddr of type true is incompatible with the type string expected by parameter $subject of preg_match(). ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                        if (!preg_match("|^" . $URI_PARTS["scheme"] . "://|", /** @scrutinizer ignore-type */ $this->_redirectaddr))

                            $this->_redirectaddr = $this->_expandlinks($this->_redirectaddr, $URI_PARTS["scheme"] . "://" . $URI_PARTS["host"]);

                        // only follow redirect if it's on this site, or offsiteok is true
                        if (preg_match("|^http://" . preg_quote($this->host) . "|i", $this->_redirectaddr) || $this->offsiteok) {
                            /* follow the redirect */
                            $this->_redirectdepth++;
                            $this->lastredirectaddr = $this->_redirectaddr;
                            if (strpos($this->_redirectaddr, "?") > 0)

Bug introduced by geekwright
It seems like $this->_redirectaddr can also be of type true; however, parameter $haystack of strpos() does only seem to accept string, maybe add an additional type check? ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                            if (strpos(/** @scrutinizer ignore-type */ $this->_redirectaddr, "?") > 0)

                                $this->fetch($this->_redirectaddr); // the redirect has changed the request method from post to get
                            else
                                $this->submit($this->_redirectaddr, $formvars, $formfiles);
                        }
                    }
                }

                if ($this->_framedepth < $this->maxframes && count($this->_frameurls) > 0) {
                    $frameurls = $this->_frameurls;
                    $this->_frameurls = array();

                    foreach ($frameurls as $frameurl) { // each() was removed in PHP 8.0
                        if ($this->_framedepth < $this->maxframes) {
                            $this->fetch($frameurl);
                            $this->_framedepth++;
                        } else
                            break;
                    }
                }
                return true;
                break;

            default:
                // not a valid protocol
                $this->error = 'Invalid protocol "' . $URI_PARTS["scheme"] . "\"\n"; // double quotes so \n is a newline
                return false;
                break;
        }
        return true;

Unused Code introduced by beckmi
return true is not reachable.

This check looks for unreachable code. It uses sophisticated control flow analysis techniques to find statements which will never be executed.

Unreachable code is most often the result of return, die or exit statements that have been added for debug purposes.

function fx() {
    try {
        doSomething();
        return true;
    }
    catch (\Exception $e) {
        return false;
    }

    return false;
}

In the above example, the last return false will never be executed, because a return statement has already been met in every possible execution path.

    }

    /*======================================================================*\
        Function:	fetchlinks
        Purpose:	fetch the links from a web page
        Input:		$URI	where you are fetching from
        Output:		$this->results	an array of the URLs
    \*======================================================================*/

    function fetchlinks($URI)

Best Practice introduced by geekwright
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
        if ($this->fetch($URI)) {
            if ($this->lastredirectaddr)
                $URI = $this->lastredirectaddr;
            if (is_array($this->results)) {

introduced by beckmi
The condition is_array($this->results) is always false.

                for ($x = 0; $x < count($this->results); $x++)

Performance Best Practice introduced by geekwright
It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally a good practice to compute it beforehand, and not on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}

                    $this->results[$x] = $this->_striplinks($this->results[$x]);
            } else
                $this->results = $this->_striplinks($this->results);

            if ($this->expandlinks)
                $this->results = $this->_expandlinks($this->results, $URI);
            return true;
        } else
            return false;
    }

    /*======================================================================*\
        Function:	fetchform
        Purpose:	fetch the form elements from a web page
        Input:		$URI	where you are fetching from
        Output:		$this->results	the resulting html form
    \*======================================================================*/

    function fetchform($URI)

Best Practice introduced by geekwright
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {

        if ($this->fetch($URI)) {

            if (is_array($this->results)) {

introduced by beckmi
The condition is_array($this->results) is always false.

                for ($x = 0; $x < count($this->results); $x++)

Performance Best Practice introduced by geekwright
It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally a good practice to compute it beforehand, and not on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}

                    $this->results[$x] = $this->_stripform($this->results[$x]);
            } else
                $this->results = $this->_stripform($this->results);

            return true;
        } else
            return false;
    }


    /*======================================================================*\
        Function:	fetchtext
        Purpose:	fetch the text from a web page, stripping the links
        Input:		$URI	where you are fetching from
        Output:		$this->results	the text from the web page
    \*======================================================================*/

    function fetchtext($URI)

Best Practice introduced by geekwright
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
        if ($this->fetch($URI)) {
            if (is_array($this->results)) {

introduced by beckmi
The condition is_array($this->results) is always false.

                for ($x = 0; $x < count($this->results); $x++)

Performance Best Practice introduced by geekwright
It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally a good practice to compute it beforehand, and not on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}

                    $this->results[$x] = $this->_striptext($this->results[$x]);
            } else
                $this->results = $this->_striptext($this->results);
            return true;
        } else
            return false;
    }

    /*======================================================================*\
        Function:	submitlinks
        Purpose:	grab links from a form submission
        Input:		$URI	where you are submitting from
        Output:		$this->results	an array of the links from the post
    \*======================================================================*/

    function submitlinks($URI, $formvars = "", $formfiles = "")

Best Practice introduced by geekwright
It is generally recommended to explicitly declare the visibility for methods.

Adding explicit visibility (private, protected, or public) is generally recommended to communicate to other developers how, and from where, this method is intended to be used.

    {
        if ($this->submit($URI, $formvars, $formfiles)) {
            if ($this->lastredirectaddr)
                $URI = $this->lastredirectaddr;
            if (is_array($this->results)) {

introduced by beckmi
The condition is_array($this->results) is always false.

                for ($x = 0; $x < count($this->results); $x++) {

Performance Best Practice introduced by geekwright
It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally a good practice to compute it beforehand, and not on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}
Loading history...
467
                    $this->results[$x] = $this->_striplinks($this->results[$x]);
468
                    if ($this->expandlinks)
469
                        $this->results[$x] = $this->_expandlinks($this->results[$x], $URI);
470
                }
471
            } else {
472
                $this->results = $this->_striplinks($this->results);
473
                if ($this->expandlinks)
474
                    $this->results = $this->_expandlinks($this->results, $URI);
475
            }
476
            return true;
477
        } else
478
            return false;
479
    }
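
    /*
     * Usage sketch for submitlinks() (illustrative only; the URL and the
     * form field below are hypothetical). After a successful call,
     * $this->results holds the links found in the response, expanded to
     * absolute URLs when $this->expandlinks is true:
     *
     *   $snoopy = new Snoopy;
     *   $snoopy->expandlinks = true;
     *   if ($snoopy->submitlinks("http://www.example.com/search.php",
     *                            array("q" => "snoopy"))) {
     *       print_r($snoopy->results);
     *   } else {
     *       echo "submit failed\n";
     *   }
     */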

    /*======================================================================*\
        Function:	submittext
        Purpose:	grab text from a form submission
        Input:		$URI		the URI to submit the form to
                    $formvars	form variables to post
                    $formfiles	files to upload with the form
        Output:		$this->results	the text from the web page
    \*======================================================================*/

    function submittext($URI, $formvars = "", $formfiles = "")
    {
        if ($this->submit($URI, $formvars, $formfiles)) {
            if ($this->lastredirectaddr)
                $URI = $this->lastredirectaddr;
            if (is_array($this->results)) {
                for ($x = 0, $count = count($this->results); $x < $count; $x++) {
                    $this->results[$x] = $this->_striptext($this->results[$x]);
                    if ($this->expandlinks)
                        $this->results[$x] = $this->_expandlinks($this->results[$x], $URI);
                }
            } else {
                $this->results = $this->_striptext($this->results);
                if ($this->expandlinks)
                    $this->results = $this->_expandlinks($this->results, $URI);
            }
            return true;
        } else
            return false;
    }


    /*======================================================================*\
        Function:	set_submit_multipart
        Purpose:	Set the form submission content type to
                    multipart/form-data
    \*======================================================================*/
    function set_submit_multipart()
    {
        $this->_submit_type = "multipart/form-data";
    }


    /*======================================================================*\
        Function:	set_submit_normal
        Purpose:	Set the form submission content type to
                    application/x-www-form-urlencoded
    \*======================================================================*/
    function set_submit_normal()
    {
        $this->_submit_type = "application/x-www-form-urlencoded";
    }
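
    /*
     * Usage sketch (assumptions: the URL, field names and file path are
     * hypothetical, and $formfiles maps a form field name to a local file
     * path). File uploads need the multipart content type, so switch to it
     * before calling submit() and switch back to the default afterwards:
     *
     *   $snoopy = new Snoopy;
     *   $snoopy->set_submit_multipart();
     *   $snoopy->submit("http://www.example.com/upload.php",
     *                   array("title" => "report"),
     *                   array("attachment" => "/tmp/report.txt"));
     *   $snoopy->set_submit_normal();
     */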




    /*======================================================================*\
        Private functions
    \*======================================================================*/


    /*======================================================================*\
        Function:	_striplinks
        Purpose:	strip the hyperlinks from an html document
        Input:		$document	document to strip.
        Output:		$match		an array of the links
    \*======================================================================*/

    function _striplinks($document)
    {
        preg_match_all("'<\s*a\s.*?href\s*=\s*			# find <a href=
						([\"\'])?					# find single or double quote
						(?(1) (.*?)\\1 | ([^\s\>]+))		# if quote found, match up to next matching
													# quote, otherwise match up to next space
						'isx", $document, $links);


        // catenate the non-empty matches from the conditional subpattern
        // ($match is initialised so an array is returned even when no links
        // are found; foreach replaces each(), which was removed in PHP 8)
        $match = array();

        foreach ($links[2] as $val) {
            if (!empty($val))
                $match[] = $val;
        }

        foreach ($links[3] as $val) {
            if (!empty($val))
                $match[] = $val;
        }

        // return the links
        return $match;
    }
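
    /*
     * Example (illustrative input): both quoted and unquoted href values
     * are collected. Quoted values (subpattern 2) are listed before
     * unquoted ones (subpattern 3), so the result is not necessarily in
     * document order:
     *
     *   $snoopy = new Snoopy;
     *   $links = $snoopy->_striplinks('<a href="/a.html">A</a> <a href=b.html>B</a>');
     *   // $links is array("/a.html", "b.html")
     */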

    /*======================================================================*\
        Function:	_stripform
        Purpose:	strip the form elements from an html document
        Input:		$document	document to strip.
        Output:		$match		the form elements as one string, one
                                element per line (joined with "\r\n")
    \*======================================================================*/

    function _stripform($document)
    {
        preg_match_all("'<\/?(FORM|INPUT|SELECT|TEXTAREA|(OPTION))[^<>]*>(?(2)(.*(?=<\/?(option|select)[^<>]*>[\r\n]*)|(?=[\r\n]*))|(?=[\r\n]*))'Usi", $document, $elements);

        // catenate the matches
        $match = implode("\r\n", $elements[0]);

        // return the form elements
        return $match;
    }
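
    /*
     * Example (illustrative input): only FORM, INPUT, SELECT, TEXTAREA and
     * OPTION tags survive, one per line; other markup such as <p> is
     * dropped. The text of an OPTION should be kept together with its
     * opening tag:
     *
     *   $snoopy = new Snoopy;
     *   $form = $snoopy->_stripform('<p>Search</p><form action="/s"><input name="q"></form>');
     *   // $form is "<form action=\"/s\">\r\n<input name=\"q\">\r\n</form>"
     */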


    /*======================================================================*\
        Function:	_striptext
        Purpose:	strip the text from an html document
        Input:		$document	document to strip.
        Output:		$text		the resulting text
    \*======================================================================*/

    function _striptext($document)
    {

        // I didn't use the preg /e (eval) modifier since it is only available
        // from PHP 4.0 on (it was later removed in PHP 7), so list your
        // entities one by one here. I included some of the more common ones.

        $search = array("'<script[^>]*?>.*?</script>'si", // strip out javascript
            "'<[\/\!]*?[^<>]*?>'si", // strip out html tags
            "'([\r\n])[\s]+'", // strip out white space
            "'&(quot|#34|#034|#x22);'i", // replace html entities
            "'&(amp|#38|#038|#x26);'i", // added hexadecimal values
            "'&(lt|#60|#060|#x3c);'i",
            "'&(gt|#62|#062|#x3e);'i",
            "'&(nbsp|#160|#xa0);'i",
            "'&(iexcl|#161);'i",
            "'&(cent|#162);'i",
            "'&(pound|#163);'i",
            "'&(copy|#169);'i",
            "'&(reg|#174);'i",
            "'&(deg|#176);'i",
            "'&(#39|#039|#x27);'",
            "'&(euro|#8364);'i", // europe
            "'&a(uml|UML);'", // german
            "'&o(uml|UML);'",
            "'&u(uml|UML);'",
            "'&A(uml|UML);'",
            "'&O(uml|UML);'",
            "'&U(uml|UML);'",
            "'&szlig;'i",
        );
        $replace = array("",
            "",
            "\\1",
            "\"",
            "&",
            "<",
            ">",
            " ",
            chr(161),
            chr(162),
            chr(163),
            chr(169),
            chr(174),
            chr(176),
            chr(39),
            chr(128),
            "ä",
            "ö",
            "ü",
            "Ä",
            "Ö",
            "Ü",
            "ß",
        );

        $text = preg_replace($search, $replace, $document);

        return $text;
    }
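
    /*
     * Example (illustrative input): tags are removed first, then the
     * listed entities are decoded:
     *
     *   $snoopy = new Snoopy;
     *   $text = $snoopy->_striptext("<p>Fish &amp; chips</p>");
     *   // $text is "Fish & chips"
     *
     * Note that chr(161) ... chr(176) are single ISO-8859-1 bytes and
     * chr(128) is the Windows-1252 euro sign, while the umlaut literals
     * take the encoding of this source file. For UTF-8 documents,
     * html_entity_decode($text, ENT_QUOTES, "UTF-8") may suit the entity
     * step better (a suggestion only, not what Snoopy does).
     */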

    /*======================================================================*\
        Function:	_expandlinks
        Purpose:	expand each link into a fully qualified URL
        Input:		$links			the links to qualify
                    $URI			the full URI to get the base from
        Output:		$expandedLinks	the expanded links
    \*======================================================================*/

    function _expandlinks($links, $URI)
    {

        preg_match("/^[^\?]+/", $URI, $match);

        $match = preg_replace("|/[^\/\.]+\.[^\/\.]+$|", "", $match[0]);
        $match = preg_replace("|/$|", "", $match);
        $match_part = parse_url($match);
        $match_root =
            $match_part["scheme"] . "://" . $match_part["host"];

        $search = array("|^http://" . preg_quote($this->host) . "|i",
            "|^(\/)|i",
            "|^(?!https?://)(?!mailto:)|i", // prefix relative links; leave http(s):// and mailto: alone
            "|/\./|",
            "|/[^\/]+/\.\./|"
        );

        $replace = array("",
            $match_root . "/",
            $match . "/",
            "/",
            "/"
        );

        $expandedLinks = preg_replace($search, $replace, $links);

        return $expandedLinks;
    }
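
    /*
     * Example (illustrative values): with the base URI
     * "http://www.example.com/dir/index.php?x=1", the query string and the
     * trailing file name are dropped, giving the base
     * "http://www.example.com/dir":
     *
     *   $snoopy = new Snoopy;
     *   $snoopy->host = "www.example.com";
     *   $base = "http://www.example.com/dir/index.php?x=1";
     *   $snoopy->_expandlinks("page2.html", $base);
     *   // returns "http://www.example.com/dir/page2.html"
     *   $snoopy->_expandlinks("/img/logo.png", $base);
     *   // returns "http://www.example.com/img/logo.png"
     */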

    /*======================================================================*\
        Function:	_httprequest
        Purpose:	go get the http data from the server
        Input:		$url			the url to fetch
                    $fp				the current open file pointer
                    $URI			the full URI
                    $http_method	the HTTP method (GET, POST, ...)
                    $content_type	Content-type of the body, if any
                    $body			body contents to send if any (POST)
        Output:		true on success, false if the read timed out
                    ($this->status is then set to -100)
    \*======================================================================*/

    function _httprequest($url, $fp, $URI, $http_method, $content_type = "", $body = "")
    {
        $cookie_headers = '';
        if ($this->passcookies && $this->_redirectaddr)
            $this->setcookies();

        $URI_PARTS = parse_url($URI);
        if (empty($url))
            $url = "/";
        $headers = $http_method . " " . $url . " " . $this->_httpversion . "\r\n";
        if (!empty($this->agent))
            $headers .= "User-Agent: " . $this->agent . "\r\n";
        if (!empty($this->host) && !isset($this->rawheaders['Host'])) {
            $headers .= "Host: " . $this->host;
            if (!empty($this->port) && $this->port != '80')
                $headers .= ":" . $this->port;
            $headers .= "\r\n";
        }
        if (!empty($this->accept))
            $headers .= "Accept: " . $this->accept . "\r\n";
        if ($this->use_gzip) {
            // make sure PHP was built with --with-zlib
            // and we can handle gzipped data
            if (function_exists('gzinflate')) {
                $headers .= "Accept-encoding: gzip\r\n";
            } else {
                trigger_error(
                    "use_gzip is on, but PHP was built without zlib support." .
                    "  Requesting file(s) without gzip encoding.",
                    E_USER_NOTICE);
            }
        }
        if (!empty($this->referer))
            $headers .= "Referer: " . $this->referer . "\r\n";
        if (!empty($this->cookies)) {
            if (!is_array($this->cookies))
                $this->cookies = (array)$this->cookies;

            reset($this->cookies);
            if (count($this->cookies) > 0) {
                $cookie_headers .= 'Cookie: ';
                foreach ($this->cookies as $cookieKey => $cookieVal) {
                    $cookie_headers .= $cookieKey . "=" . urlencode($cookieVal) . "; ";
                }
                $headers .= substr($cookie_headers, 0, -2) . "\r\n";
            }
        }
        if (!empty($this->rawheaders)) {
            if (!is_array($this->rawheaders))
                $this->rawheaders = (array)$this->rawheaders;
            // foreach instead of each(): each() was removed in PHP 8 and
            // depended on the array's internal pointer
            foreach ($this->rawheaders as $headerKey => $headerVal)
                $headers .= $headerKey . ": " . $headerVal . "\r\n";
        }
        if (!empty($content_type)) {
            $headers .= "Content-type: $content_type";
            if ($content_type == "multipart/form-data")
                $headers .= "; boundary=" . $this->_mime_boundary;
            $headers .= "\r\n";
        }
        if (!empty($body))
            $headers .= "Content-length: " . strlen($body) . "\r\n";
        if (!empty($this->user) || !empty($this->pass))
            $headers .= "Authorization: Basic " . base64_encode($this->user . ":" . $this->pass) . "\r\n";

        //add proxy auth headers
        if (!empty($this->proxy_user))
            $headers .= 'Proxy-Authorization: ' . 'Basic ' . base64_encode($this->proxy_user . ':' . $this->proxy_pass) . "\r\n";


        $headers .= "\r\n";

        // set the read timeout if needed
        if ($this->read_timeout > 0)
            socket_set_timeout($fp, $this->read_timeout);
        $this->timed_out = false;

        fwrite($fp, $headers . $body, strlen($headers . $body));

        $this->_redirectaddr = false;
        unset($this->headers);

        // content was returned gzip encoded?
        $is_gzipped = false;

        while ($currentHeader = fgets($fp, $this->_maxlinelen)) {
            if ($this->read_timeout > 0 && $this->_check_timeout($fp)) {
                $this->status = -100;
                return false;
            }

            if ($currentHeader == "\r\n")
                break;

            // if a header begins with Location: or URI:, set the redirect
            if (preg_match("/^(Location:|URI:)/i", $currentHeader)) {
                // get URL portion of the redirect
                preg_match("/^(Location:|URI:)[ ]+(.*)/i", chop($currentHeader), $matches);
                // look for :// in the Location header to see if hostname is included
                if (!preg_match("|\:\/\/|", $matches[2])) {
                    // no host in the path, so prepend
                    $this->_redirectaddr = $URI_PARTS["scheme"] . "://" . $this->host . ":" . $this->port;
                    // eliminate double slash
                    if (!preg_match("|^/|", $matches[2]))
                        $this->_redirectaddr .= "/" . $matches[2];
                    else
                        $this->_redirectaddr .= $matches[2];
                } else
                    $this->_redirectaddr = $matches[2];
            }

            if (preg_match("|^HTTP/|", $currentHeader)) {
                if (preg_match("|^HTTP/[^\s]*\s(.*?)\s|", $currentHeader, $status)) {
                    $this->status = $status[1];
                }
                $this->response_code = $currentHeader;
            }

            // header names are case-insensitive, so match without regard to case
            if (preg_match("/Content-Encoding: gzip/i", $currentHeader)) {
                $is_gzipped = true;
            }

            $this->headers[] = $currentHeader;
        }

        $results = '';
        do {
            $_data = fread($fp, $this->maxlength);
            if (strlen($_data) == 0) {
                break;
            }
            $results .= $_data;
        } while (true);

        // gunzip
        if ($is_gzipped) {
            // per http://www.php.net/manual/en/function.gzencode.php
            // skip the fixed 10-byte gzip header and inflate the raw deflate
            // data; this assumes no optional header fields (FEXTRA, FNAME,
            // FCOMMENT, FHCRC) are present. gzdecode() (PHP 5.4+) handles
            // the full gzip format.
            $results = substr($results, 10);
            $results = gzinflate($results);
        }

        if ($this->read_timeout > 0 && $this->_check_timeout($fp)) {
            $this->status = -100;
            return false;
        }

        // check if there is a redirect meta tag

        if (preg_match("'<meta[\s]*http-equiv[^>]*?content[\s]*=[\s]*[\"\']?\d+;[\s]*URL[\s]*=[\s]*([^\"\']*?)[\"\']?>'i", $results, $match)) {
            $this->_redirectaddr = $this->_expandlinks($match[1], $URI);
        }

        // have we hit our frame depth and is there frame src to fetch?
        if (($this->_framedepth < $this->maxframes) && preg_match_all("'<frame\s+.*src[\s]*=[\'\"]?([^\'\"\>]+)'i", $results, $match)) {
            $this->results[] = $results;
            for ($x = 0, $count = count($match[1]); $x < $count; $x++)
                $this->_frameurls[] = $this->_expandlinks($match[1][$x], $URI_PARTS["scheme"] . "://" . $this->host);
        } // have we already fetched framed content?
        elseif (is_array($this->results))
            $this->results[] = $results;
        // no framed content
        else
            $this->results = $results;

        return true;
    }
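
    /*
     * Illustration (assumed property values, not Snoopy's defaults): with
     * $http_method = "GET", $url = "/index.html", _httpversion "HTTP/1.0",
     * agent "Snoopy", host "www.example.com", port 80, accept "text/html",
     * use_gzip off and no referer, cookies, raw headers, body or
     * credentials, the request written to $fp is:
     *
     *   GET /index.html HTTP/1.0\r\n
     *   User-Agent: Snoopy\r\n
     *   Host: www.example.com\r\n
     *   Accept: text/html\r\n
     *   \r\n
     *
     * A port other than 80 is appended to the Host header, for example
     * "Host: www.example.com:8080".
     */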

    /*======================================================================*\
        Function:	_httpsrequest
        Purpose:	go get the https data from the server using curl
        Input:		$url			the url to fetch
                    $URI			the full URI
                    $http_method	the HTTP method (currently unused here)
                    $content_type	Content-type of the body, if any
                    $body			body contents to send if any (POST)
        Output:
    \*======================================================================*/

    function _httpsrequest($url, $URI, $http_method, $content_type = "", $body = "")
    {
        if ($this->passcookies && $this->_redirectaddr)
            $this->setcookies();

        $headers = array();

        $URI_PARTS = parse_url($URI);