Completed
Push to master (17759d...0bd945) by cam at 08:07: issue created

distant.php ➔ need_proxy() (rating C)

Complexity

Conditions 12
Paths 48

Size

Total Lines 41

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
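Metric   Value    Meaning
cc       12       cyclomatic complexity
nc       48       NPath complexity
nop      3        number of parameters
dl       0        duplicated lines
loc      41       lines of code
rs       6.9666
c        0        changes
b        0
f        0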

How to fix: Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
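Here is how that advice applies to the code listed below. The private-address test inside valider_url_distante() is a multi-line inline condition, and it can be pulled out into its own well-named function. This is a minimal sketch: the helper name is_private_ipv4() is hypothetical and is not part of SPIP.

```php
<?php

// Hypothetical helper extracted from the inline test in valider_url_distante():
// true when a dotted-quad IPv4 address lies in 0.0.0.0/8, 10.0.0.0/8, 127.0.0.0/8,
// 172.16.0.0/12 or 192.168.0.0/16, i.e. the ranges that function rejects.
function is_private_ipv4(string $ip): bool {
	$parts = array_map('intval', explode('.', $ip));
	return $parts[0] === 0
		|| $parts[0] === 10
		|| $parts[0] === 127
		|| ($parts[0] === 172 && $parts[1] >= 16 && $parts[1] <= 31)
		|| ($parts[0] === 192 && $parts[1] === 168);
}

// The inline condition in the caller then shrinks to one named check:
// if ($ip and is_private_ipv4($ip)) { return false; }
var_dump(is_private_ipv4('172.20.0.5'));
```

The function name now carries the intent that the five-line condition previously left implicit.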

Commonly applied refactorings include: Extract Method.

<?php

/***************************************************************************\
 *  SPIP, a publishing system for the internet                             *
 *                                                                         *
 *  Copyright (c) 2001-2018                                                *
 *  Arnaud Martin, Antoine Pitrou, Philippe Riviere, Emmanuel Saint-James  *
 *                                                                         *
 *  This program is free software distributed under the GNU/GPL license.   *
 *  For details see the COPYING.txt file or the online help.               *
\***************************************************************************/

/**
 * This file handles fetching remote data
 *
 * @package SPIP\Core\Distant
 **/
if (!defined('_ECRIRE_INC_VERSION')) {
	return;
}

if (!defined('_INC_DISTANT_VERSION_HTTP')) {
	define('_INC_DISTANT_VERSION_HTTP', 'HTTP/1.0');
}
if (!defined('_INC_DISTANT_CONTENT_ENCODING')) {
	define('_INC_DISTANT_CONTENT_ENCODING', 'gzip');
}
if (!defined('_INC_DISTANT_USER_AGENT')) {
	define('_INC_DISTANT_USER_AGENT', 'SPIP-' . $GLOBALS['spip_version_affichee'] . ' (' . $GLOBALS['home_server'] . ')');
}
if (!defined('_INC_DISTANT_MAX_SIZE')) {
	define('_INC_DISTANT_MAX_SIZE', 2097152);
}
if (!defined('_INC_DISTANT_CONNECT_TIMEOUT')) {
	define('_INC_DISTANT_CONNECT_TIMEOUT', 10);
}

define('_REGEXP_COPIE_LOCALE', ',' .
	preg_replace(
		'@^https?:@',
		'https?:',
		(isset($GLOBALS['meta']['adresse_site']) ? $GLOBALS['meta']['adresse_site'] : '')
	)
	. '/?spip.php[?]action=acceder_document.*file=(.*)$,');
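The _REGEXP_COPIE_LOCALE constant defined just above recognises links to the site's own protected-document action (`spip.php?action=acceder_document...&file=...`) and captures the file path. The sketch below reproduces the same construction on its own. It uses a hypothetical site address in place of `$GLOBALS['meta']['adresse_site']`.

```php
<?php

// Hypothetical site address standing in for $GLOBALS['meta']['adresse_site'].
$adresse_site = 'https://example.org';

// Same construction as _REGEXP_COPIE_LOCALE: the scheme is relaxed to "https?:"
// so that both http:// and https:// links to the site match; the captured
// group is everything after the last "file=".
$regexp = ','
	. preg_replace('@^https?:@', 'https?:', $adresse_site)
	. '/?spip.php[?]action=acceder_document.*file=(.*)$,';

$url = 'http://example.org/spip.php?action=acceder_document&arg=12&file=pdf/doc.pdf';
if (preg_match($regexp, $url, $match)) {
	echo $match[1], "\n";
}
```

The site address is inserted without preg_quote(), so the dots in it match any character. As a result, the pattern is slightly more permissive than the literal address.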

//@define('_COPIE_LOCALE_MAX_SIZE',2097152); // size limit (already defined by inc/utils)

/**
 * Creates the local copy of a remote file, if needed
 *
 * Takes as argument a path relative to the root directory, or a URL
 * Returns a path relative to the root directory, or false
 *
 * @link http://www.spip.net/4155
 * @pipeline_appel post_edition
 *
 * @param string $source
 * @param string $mode
 *   - 'test' - only test for existence
 *   - 'auto' - download if needed
 *   - 'modif' - if already present, download only if modified (If-Modified-Since)
 *   - 'force' - always download (update)
 * @param string $local
 *   name of the local file (for example to store a cache, rather than an IMG document)
 *   [Scrutinizer, Documentation] Should the type for parameter $local not be string|null?
 * @param int $taille_max
 *   maximum size of the local copy, _COPIE_LOCALE_MAX_SIZE by default
 *   [Scrutinizer, Documentation] Should the type for parameter $taille_max not be integer|null?
 * @return bool|string
 *   [Scrutinizer, Documentation] Consider making the return type a bit more specific; maybe use string|false.
 */
function copie_locale($source, $mode = 'auto', $local = null, $taille_max = null) {

	// if this is our own protected-document URL, return the path
	if ($mode !== 'force' and preg_match(_REGEXP_COPIE_LOCALE, $source, $match)) {
		$source = substr(_DIR_IMG, strlen(_DIR_RACINE)) . urldecode($match[1]);

		return @file_exists($source) ? $source : false;
	}

	if (is_null($local)) {
		$local = fichier_copie_locale($source);
	} else {
		// [Scrutinizer, Duplication] This code seems to be duplicated across your project.
		if (_DIR_RACINE and strncmp(_DIR_RACINE, $local, strlen(_DIR_RACINE)) == 0) {
			$local = substr($local, strlen(_DIR_RACINE));
		}
	}

	// if $local = '', the file was refused by fichier_copie_locale(),
	// for example a file that is not among our documents;
	// in that case do not try to download it only to fail afterwards
	// [Scrutinizer, Bug Best Practice] The expression $local of type string|null is loosely compared
	// to false; this is ambiguous if the string can be empty ('' == false and '' == null are both
	// true, while '' === false and '' === null are both false). You might want to explicitly use
	// === null instead.
	if (!$local) {
		return false;
	}

	$localrac = _DIR_RACINE . $local;
	$t = ($mode == 'force') ? false : @file_exists($localrac);

	// check whether the file exists
	if ($mode == 'test') {
		return $t ? $local : '';
	}

	// otherwise check whether we must/can download it
	if ($local == $source or !tester_url_absolue($source)) {
		return $local;
	}

	if ($mode == 'modif' or !$t) {
		// go through a unique temporary file to handle failures during the download
		// and possible concurrent downloads
		include_spip('inc/acces');
		// [Scrutinizer, Bug Best Practice] The expression $taille_max of type integer|null is loosely
		// compared to false; this is ambiguous if the integer can be zero (0 == false and 0 == null
		// are both true). You might want to explicitly use === null instead.
		if (!$taille_max) {
			$taille_max = _COPIE_LOCALE_MAX_SIZE;
		}
		$res = recuperer_url(
			$source,
			array('file' => $localrac, 'taille_max' => $taille_max, 'if_modified_since' => $t ? filemtime($localrac) : '')
		);
		if (!$res or (!$res['length'] and $res['status'] != 304)) {
			spip_log("copie_locale : Echec recuperation $source sur $localrac status : " . $res['status'], _LOG_INFO_IMPORTANTE);
		}
		if (!$res['length']) {
			// if $t, this is probably just a "not modified since" response
			return $t ? $local : false;
		}
		spip_log("copie_locale : recuperation $source sur $localrac taille " . $res['length'] . ' OK');

		// for possible indexing
		pipeline(
			'post_edition',
			array(
				'args' => array(
					'operation' => 'copie_locale',
					'source' => $source,
					'fichier' => $local,
					'http_res' => $res['length'],
				),
				'data' => null
			)
		);
	}

	return $local;
}
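A comment inside copie_locale() above mentions going through a unique temporary file to cope with failures during the transfer and with concurrent downloads. The usual pattern for that is to write into a temporary file in the target directory and then rename() it into place. This is a minimal sketch: write_file_atomically() is a hypothetical helper, not a SPIP function.

```php
<?php

/**
 * Hypothetical helper: write $content to $path through a uniquely named
 * temporary file in the same directory, then rename() it into place, so
 * readers never see a half-written file and a failed write leaves any
 * previous copy untouched.
 */
function write_file_atomically(string $path, string $content): bool {
	// tempnam() creates a new, uniquely named, empty file in the given directory
	$tmp = tempnam(dirname($path), 'dl_');
	if ($tmp === false) {
		return false;
	}
	if (file_put_contents($tmp, $content) !== strlen($content)) {
		@unlink($tmp);
		return false;
	}
	// within one filesystem, rename() replaces the target in a single step on POSIX systems
	if (!rename($tmp, $path)) {
		@unlink($tmp);
		return false;
	}
	return true;
}
```

tempnam() creates the file with 0600 permissions, so a chmod() may be needed if the web server has to serve the copy. If the directory does not exist, tempnam() falls back to the system temporary directory, and the rename is then no longer guaranteed to be atomic.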

/**
 * Validate that the URL of a remote document really is remote,
 * and not a localhost URL that would expose information about the server
 * inspired by https://core.trac.wordpress.org/browser/trunk/src/wp-includes/http.php?rev=36435#L500
 *
 * @param string $url
 * @param array $known_hosts
 *   known and accepted external URLs/hosts
 * @return false|string
 *   the URL, or false on failure
 */
function valider_url_distante($url, $known_hosts = array()) {
	if (!function_exists('protocole_verifier')) {
		include_spip('inc/filtres_mini');
	}

	if (!protocole_verifier($url, array('http', 'https'))) {
		return false;
	}

	$parsed_url = parse_url($url);
	if (!$parsed_url or empty($parsed_url['host'])) {
		return false;
	}

	if (isset($parsed_url['user']) or isset($parsed_url['pass'])) {
		return false;
	}

	if (false !== strpbrk($parsed_url['host'], ':#?[]')) {
		return false;
	}

	if (!is_array($known_hosts)) {
		$known_hosts = array($known_hosts);
	}
	$known_hosts[] = $GLOBALS['meta']['adresse_site'];
	$known_hosts[] = url_de_base();
	$known_hosts = pipeline('declarer_hosts_distants', $known_hosts);

	$is_known_host = false;
	foreach ($known_hosts as $known_host) {
		$parse_known = parse_url($known_host);
		if ($parse_known
			and strtolower($parse_known['host']) === strtolower($parsed_url['host'])) {
			$is_known_host = true;
			break;
		}
	}

	if (!$is_known_host) {
		$host = trim($parsed_url['host'], '.');
		if (preg_match('#^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$#', $host)) {
			$ip = $host;
		} else {
			$ip = gethostbyname($host);
			if ($ip === $host) {
				// Error condition for gethostbyname()
				$ip = false;
			}
		}
		// [Scrutinizer, Bug Best Practice] The expression $ip of type false|string is loosely compared
		// to true; this is ambiguous if the string can be empty. You might want to explicitly use
		// !== false instead.
		if ($ip) {
			$parts = array_map('intval', explode('.', $ip));
			if (127 === $parts[0] or 10 === $parts[0] or 0 === $parts[0]
			  or ( 172 === $parts[0] and 16 <= $parts[1] and 31 >= $parts[1] )
			  or ( 192 === $parts[0] && 168 === $parts[1] )
			) {
				return false;
			}
		}
	}

	if (empty($parsed_url['port'])) {
		return $url;
	}

	$port = $parsed_url['port'];
	// [Scrutinizer, Unused Code Bug] The strict comparison === seems to always evaluate to false as
	// the types of $port (string) and 80 (integer) can never be identical. Maybe you want to use a
	// loose comparison == instead?
	// Note: parse_url() returns the 'port' component as an int, so these strict comparisons do match.
	if ($port === 80 or $port === 443 or $port === 8080) {
		return $url;
	}

	if ($is_known_host) {
		foreach ($known_hosts as $known_host) {
			$parse_known = parse_url($known_host);
			if ($parse_known
				and !empty($parse_known['port'])
				and strtolower($parse_known['host']) === strtolower($parsed_url['host'])
				and $parse_known['port'] == $port) {
				return $url;
			}
		}
	}

	return false;
}
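valider_url_distante() hand-codes the private IPv4 ranges. PHP's filter extension offers a comparable check through filter_var() with FILTER_FLAG_NO_PRIV_RANGE and FILTER_FLAG_NO_RES_RANGE. Its reserved set also covers 169.254.0.0/16 (link-local) and 240.0.0.0/4, both of which the hand-written test lets through. This is a minimal sketch, and the helper name is hypothetical.

```php
<?php

/**
 * Hypothetical alternative to the hand-written range test in
 * valider_url_distante(): true only for a syntactically valid IPv4 address
 * outside PHP's private (10/8, 172.16/12, 192.168/16) and reserved
 * (0/8, 127/8, 169.254/16, 240/4) ranges.
 */
function is_public_ipv4(string $ip): bool {
	$flags = FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
	return filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false;
}

foreach (['8.8.8.8', '10.0.0.1', '127.0.0.1', '172.20.1.1', '169.254.169.254'] as $ip) {
	printf("%-16s %s\n", $ip, is_public_ipv4($ip) ? 'public' : 'rejected');
}
```

The link-local address 169.254.169.254 is commonly used for cloud instance metadata endpoints, which are a classic target of server-side request forgery. That is where the wider reserved set can matter.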

/**
 * Prepare the data for a POST
 * if $donnees is a string
 *  - the sender is responsible for the boundary, the Content-Type, etc.
 *  - line breaks are converted to the correct format
 *  - the string is split into header/body (separated by an empty line)
 * if $donnees is an array
 *  - it is serialised into a string, with a boundary if needed or supplied, and the right Content-Type
 *
 * @param string|array $donnees
 * @param string $boundary
 * @return array
 *   header, body
 */
function prepare_donnees_post($donnees, $boundary = '') {

	// let the function that requested the POST format its own data,
	// for a SOAP call for example;
	// the header is separated from the data by a double line break;
	// here all line breaks (\r\n, \r or \n) are converted to \r\n
	if (is_string($donnees) && strlen($donnees)) {
		$entete = '';
		// first turn every \r\n and \r into a plain \n
		$donnees = str_replace("\r\n", "\n", $donnees);
		$donnees = str_replace("\r", "\n", $donnees);
		// a double line break marks the end of the header and the start of the data
		$p = strpos($donnees, "\n\n");
		if ($p !== false) {
			$entete = str_replace("\n", "\r\n", substr($donnees, 0, $p + 1));
			$donnees = substr($donnees, $p + 2);
		}
		$chaine = str_replace("\n", "\r\n", $donnees);
	} else {
		/* automatic boundary */
		// with more than 500 bytes of data, switch to multipart with a boundary
		if ($boundary === '') {
			$taille = 0;
			// [Scrutinizer, Bug] The expression $donnees of type array|string is not guaranteed to be
			// traversable. How about adding an additional type check (an is_array() test), a
			// "@var array" doc comment cast, or marking the issue as a false positive?
			foreach ($donnees as $cle => $valeur) {
				if (is_array($valeur)) {
					foreach ($valeur as $val2) {
						$taille += strlen($val2);
					}
				} else {
					// should spip_strlen() from inc/charsets be used instead?
					$taille += strlen($valeur);
				}
			}
			if ($taille > 500) {
				$boundary = substr(md5(rand() . 'spip'), 0, 8);
			}
		}

		if (is_string($boundary) and strlen($boundary)) {
			// build an HTTP string for a POST with a boundary
			$entete = "Content-Type: multipart/form-data; boundary=$boundary\r\n";
			$chaine = '';
			if (is_array($donnees)) {
				foreach ($donnees as $cle => $valeur) {
					if (is_array($valeur)) {
						foreach ($valeur as $val2) {
							$chaine .= "\r\n--$boundary\r\n";
							$chaine .= "Content-Disposition: form-data; name=\"{$cle}[]\"\r\n";
							$chaine .= "\r\n";
							$chaine .= $val2;
						}
					} else {
						$chaine .= "\r\n--$boundary\r\n";
						$chaine .= "Content-Disposition: form-data; name=\"$cle\"\r\n";
						$chaine .= "\r\n";
						$chaine .= $valeur;
					}
				}
				$chaine .= "\r\n--$boundary\r\n";
			}
		} else {
			// build a simple HTTP string for a POST
			$entete = 'Content-Type: application/x-www-form-urlencoded' . "\r\n";
			$chaine = array();
			if (is_array($donnees)) {
				foreach ($donnees as $cle => $valeur) {
					if (is_array($valeur)) {
						foreach ($valeur as $val2) {
							$chaine[] = rawurlencode($cle) . '[]=' . rawurlencode($val2);
						}
					} else {
						$chaine[] = rawurlencode($cle) . '=' . rawurlencode($valeur);
					}
				}
				$chaine = implode('&', $chaine);
			} else {
				$chaine = $donnees;
			}
		}
	}

	return array($entete, $chaine);
}
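When `$donnees` is a string, prepare_donnees_post() leaves the formatting to the caller. It only normalises line endings and splits the header from the body at the first blank line. That branch can be isolated and checked on its own. In this sketch, split_raw_post() is a hypothetical name.

```php
<?php

/**
 * Hypothetical standalone version of the string branch of
 * prepare_donnees_post(): normalise line endings, split the caller-supplied
 * text at the first blank line into header and body, and emit both with
 * CRLF line endings as HTTP expects.
 */
function split_raw_post(string $donnees): array {
	$entete = '';
	// \r\n and lone \r both become \n first (replacements are applied in order)
	$donnees = str_replace(["\r\n", "\r"], "\n", $donnees);
	$p = strpos($donnees, "\n\n");
	if ($p !== false) {
		// keep the header's own final line break, drop the blank separator line
		$entete = str_replace("\n", "\r\n", substr($donnees, 0, $p + 1));
		$donnees = substr($donnees, $p + 2);
	}
	return [$entete, str_replace("\n", "\r\n", $donnees)];
}

list($entete, $corps) = split_raw_post("Content-Type: text/xml\r\n\r\n<a>\n<b/>\n</a>");
```

Input with no blank line yields an empty header, and the whole text is treated as the body.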

/**
 * Convert a URL whose host is in UTF-8 to ASCII
 * Uses the library https://github.com/phlylabs/idna-convert/tree/v0.9.1
 * in its latest version compatible with every PHP 5 version
 * The PHP function idn_to_ascii depends on the php5-intl package and is rarely available
 *
 * @param string $url_idn
 * @return array|string
 */
function url_to_ascii($url_idn) {

	if ($parts = parse_url($url_idn)) {
		$host = $parts['host'];
		if (!preg_match(',^[a-z0-9_\.\-]+$,i', $host)) {
			include_spip('inc/idna_convert.class');
			$IDN = new idna_convert();
			$host_ascii = $IDN->encode($host);
			$url_idn = explode($host, $url_idn, 2);
			$url_idn = implode($host_ascii, $url_idn);
		}
	}

	return $url_idn;
}
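The docblock of url_to_ascii() mentions idn_to_ascii(), which requires the intl extension. Where intl is installed, the host could be converted as shown below. This is a minimal sketch that assumes intl's UTS #46 variant, and it uses a hypothetical function name. Without intl, it leaves the URL unchanged.

```php
<?php

/**
 * Hypothetical variant of url_to_ascii() based on PHP's intl extension.
 * Returns the URL with its host converted to ASCII (punycode), or the URL
 * unchanged when there is no host, the host is already ASCII, or intl is missing.
 */
function url_to_ascii_intl(string $url): string {
	$host = parse_url($url, PHP_URL_HOST);
	if (!is_string($host) || preg_match(',^[a-z0-9_.\-]+$,i', $host)) {
		return $url; // no host, or already ASCII
	}
	if (!function_exists('idn_to_ascii')) {
		return $url; // intl not available: another converter is needed
	}
	$ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
	if ($ascii === false) {
		return $url;
	}
	// replace only the first occurrence, i.e. the host part
	return implode($ascii, explode($host, $url, 2));
}
```

Like the original, this version swaps only the first occurrence of the host string in the URL.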

/**
 * Fetches the content of a URL,
 * converting it to the local charset if needed
 *
 * @uses init_http()
 * @uses recuperer_entetes()
 * @uses recuperer_body()
 * @uses transcoder_page()
 *
 * @param string $url
 * @param array $options
 *   bool transcoder : true to transcode the page into the site's charset
 *   string methode : type of HTTP request to make (HEAD, GET or POST)
 *   int taille_max : stop reading content beyond this size (0 = headers only ==> HEAD request). Defaults to _INC_DISTANT_MAX_SIZE (2097152 bytes unless overridden), or _COPIE_LOCALE_MAX_SIZE (16 MB) when copying to a file
 *   string|array datas : data (array) and/or headers (string) to send (forces the POST method if the data is not empty)
 *   string boundary : boundary used to format array datas
 *   bool refuser_gz : to force refusing compression (case of spell-checking servers)
 *   int if_modified_since : a Unix timestamp; stop the download if the remote page has not been modified since that date
 *   string uri_referer : to specify a different referer
 *   string file : name of the file to copy the content into
 *   int follow_location : number of redirects to follow (0 to follow none)
 *   string version_http : HTTP protocol version to use (defaults to the _INC_DISTANT_VERSION_HTTP constant)
 * @return array|bool
 *   false on failure
 *   otherwise an array:
 *     int status : the HTTP status of the page
 *     string headers : the page headers
 *     string page : the page content (empty if copied to a file)
 *     int last_modified : last modification timestamp
 *     string location : redirect URL sent by the page
 *     string url : actual URL of the fetched page
 *     int length : size of the content or of the file
 *
 *     string file : file name, if saved to a file
 */
function recuperer_url($url, $options = array()) {
	$default = array(
		'transcoder' => false,
		'methode' => 'GET',
		'taille_max' => null,
		'datas' => '',
		'boundary' => '',
		'refuser_gz' => false,
		'if_modified_since' => '',
		'uri_referer' => '',
		'file' => '',
		'follow_location' => 10,
		'version_http' => _INC_DISTANT_VERSION_HTTP,
	);
	$options = array_merge($default, $options);
	// copy directly into a file?
	$copy = $options['file'];

	if ($options['methode'] == 'HEAD') {
		$options['taille_max'] = 0;
	}
	if (is_null($options['taille_max'])) {
		$options['taille_max'] = $copy ? _COPIE_LOCALE_MAX_SIZE : _INC_DISTANT_MAX_SIZE;
	}

	if (!empty($options['datas'])) {
		list($head, $postdata) = prepare_donnees_post($options['datas'], $options['boundary']);
		if (stripos($head, 'Content-Length:') === false) {
			$head .= 'Content-Length: ' . strlen($postdata);
		}
		$options['datas'] = $head . "\r\n\r\n" . $postdata;
		if (strlen($postdata)) {
			$options['methode'] = 'POST';
		}
	}

	// Accept feed:// URLs, URLs missing the http:// prefix, and protocol-relative URLs
	$url = preg_replace(',^feed://,i', 'http://', $url);
	if (!tester_url_absolue($url)) {
		$url = 'http://' . $url;
	} elseif (strncmp($url, '//', 2) == 0) {
		$url = 'http:' . $url;
	}

	$url = url_to_ascii($url);

	$result = array(
		'status' => 0,
		'headers' => '',
		'page' => '',
		'length' => 0,
		'last_modified' => '',
		'location' => '',
		'url' => $url
	);

	// when writing directly to a file, refuse gzip so the content is not handled in memory
	$refuser_gz = (($options['refuser_gz'] or $copy) ? true : false);

	// open the connection and send the request with its headers
	// [Scrutinizer, Bug] It seems like $url can also be of type array; however, init_http() does only
	// seem to accept string, maybe add an additional type check (for example with is_array())?
	list($handle, $fopen) = init_http(
		$options['methode'],
		$url,
		$refuser_gz,
		$options['uri_referer'],
		$options['datas'],
		$options['version_http'],
		$options['if_modified_since']
	);
	if (!$handle) {
		spip_log("ECHEC init_http $url");

		return false;
	}

	// Except in fopen mode, send the input stream
	// and read the response headers
	if (!$fopen) {
		$res = recuperer_entetes_complets($handle, $options['if_modified_since']);
		if (!$res) {
			fclose($handle);
			$t = @parse_url($url);
			$host = $t['host'];
			// Inexplicable quirk to work around
			// the freedom-restricting actions of the Middle Kingdom (China)
			if (!need_proxy($host)
				and $res = @file_get_contents($url)
			) {
				$result['length'] = strlen($res);
				if ($copy) {
					ecrire_fichier($copy, $res);
					$result['file'] = $copy;
				} else {
					$result['page'] = $res;
				}
				$res = array(
					'status' => 200,
				);
			} else {
				return false;
			}
		} elseif ($res['location'] and $options['follow_location']) {
			$options['follow_location']--;
			fclose($handle);
			include_spip('inc/filtres');
			// [Scrutinizer, Bug] It seems like $url can also be of type array; however, suivre_lien()
			// does only seem to accept string, maybe add an additional type check?
			$url = suivre_lien($url, $res['location']);
			spip_log("recuperer_url recommence sur $url");

			return recuperer_url($url, $options);
		} elseif ($res['status'] !== 200) {
			spip_log('HTTP status ' . $res['status'] . " pour $url");
		}
		$result['status'] = $res['status'];
		if (isset($res['headers'])) {
			$result['headers'] = $res['headers'];
		}
		if (isset($res['last_modified'])) {
			$result['last_modified'] = $res['last_modified'];
		}
		if (isset($res['location'])) {
			$result['location'] = $res['location'];
		}
	}

	// only the headers are wanted
	if (!$options['taille_max'] or $options['methode'] == 'HEAD' or $result['status'] == '304') {
		return $result;
	}

	// if the content must be decompressed, do it via a temporary file,
	// otherwise memory blows up on large streams
	$gz = false;
	if (preg_match(",\bContent-Encoding: .*gzip,is", $result['headers'])) {
		$gz = (_DIR_TMP . md5(uniqid(mt_rand())) . '.tmp.gz');
	}

	// unless the content was already fetched by the fallback method above
	if (!$result['length']) {
		$res = recuperer_body($handle, $options['taille_max'], $gz ? $gz : $copy);
		fclose($handle);
		if ($copy) {
			$result['length'] = $res;
			$result['file'] = $copy;
		} elseif ($res) {
			$result['page'] = &$res;
			$result['length'] = strlen($result['page']);
		}
	}
	if (!$result['page']) {
		return $result;
	}

	// Decompress if needed
	// [Scrutinizer, Bug Best Practice] The expression $gz of type string|false is loosely compared to
	// true; this is ambiguous if the string can be empty. You might want to explicitly use !== false
	// instead.
	if ($gz) {
		$result['page'] = implode('', gzfile($gz));
		supprimer_fichier($gz);
	}

	// Should it be converted to our local charset?
	if ($options['transcoder']) {
		include_spip('inc/charsets');
		$result['page'] = transcoder_page($result['page'], $result['headers']);
	}

	return $result;
}
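recuperer_url() first writes a gzip-encoded body to a temporary file to keep memory under control. It then decompresses that file with `implode('', gzfile($gz))`, which reads every decompressed line into an array at once. When the decompressed result is meant for a file rather than for `$result['page']`, it can be streamed in fixed-size chunks instead. This is a minimal sketch: gunzip_file() is a hypothetical helper, and it assumes the zlib extension is available.

```php
<?php

/**
 * Hypothetical helper: decompress the gzip file $gz_path into $out_path in
 * fixed-size chunks with gzopen()/gzread(), so memory use is bounded by
 * $chunk rather than by the size of the decompressed data.
 * Returns the number of bytes written, or false on error.
 */
function gunzip_file($gz_path, $out_path, $chunk = 8192) {
	$in = gzopen($gz_path, 'rb');
	if ($in === false) {
		return false;
	}
	$out = fopen($out_path, 'wb');
	if ($out === false) {
		gzclose($in);
		return false;
	}
	$written = 0;
	while (!gzeof($in)) {
		$data = gzread($in, $chunk);
		if ($data === false || fwrite($out, $data) !== strlen($data)) {
			$written = false; // read or write error
			break;
		}
		$written += strlen($data);
	}
	gzclose($in);
	fclose($out);
	return $written;
}
```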

/**
 * Fetches a URL unless it is already in a file cache
 *
 * The cache lifetime is given by the `delai_cache` option
 * The other options and the return format are the same as for the `recuperer_url` function
 * @uses recuperer_url()
 *
 * @param string $url
 * @param array $options
 *   int delai_cache : acceptable age of the content (in seconds)
 * @return array|bool|mixed
 */
function recuperer_url_cache($url, $options = array()) {
	if (!defined('_DELAI_RECUPERER_URL_CACHE')) {
		define('_DELAI_RECUPERER_URL_CACHE', 3600);
	}
	$default = array(
		'transcoder' => false,
		'methode' => 'GET',
		'taille_max' => null,
		'datas' => '',
		'boundary' => '',
		'refuser_gz' => false,
		'if_modified_since' => '',
		'uri_referer' => '',
		'file' => '',
		'follow_location' => 10,
		'version_http' => _INC_DISTANT_VERSION_HTTP,
		'delai_cache' => _DELAI_RECUPERER_URL_CACHE,
	);
	$options = array_merge($default, $options);

	// cases where caching is not possible (posted data, or an explicit POST)
	if (!empty($options['datas']) or $options['methode'] == 'POST') {
		return recuperer_url($url, $options);
	}

	// do not retry, within one hit, a URL that failed (and was therefore not cached)
	static $errors = array();
	if (isset($errors[$url])) {
		return $errors[$url];
	}

	$sig = $options;
	unset($sig['if_modified_since']);
	unset($sig['delai_cache']);
	$sig['url'] = $url;

	$dir = sous_repertoire(_DIR_CACHE, 'curl');
	$cache = md5(serialize($sig)) . '-' . substr(preg_replace(',\W+,', '_', $url), 0, 80);
	$sub = sous_repertoire($dir, substr($cache, 0, 2));
	$cache = "$sub$cache";

	$res = false;
	$is_cached = file_exists($cache);
	if ($is_cached
		and (filemtime($cache) > $_SERVER['REQUEST_TIME'] - $options['delai_cache'])
	) {
		lire_fichier($cache, $res);
		if ($res = unserialize($res)) {
			// set last_modified and status=304?
		}
	}
	if (!$res) {
		$res = recuperer_url($url, $options);
		// do not reload this uncached URL within the same hit, since it is unavailable
		if (!$res) {
			if ($is_cached) {
				// the fetch failed but a cached copy exists: use it
				lire_fichier($cache, $res);
				$res = unserialize($res);
			}

			return $errors[$url] = $res;
		}
		ecrire_fichier($cache, serialize($res));
	}

	return $res;
}
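recuperer_url_cache() names its cache file in three steps. It hashes the request options together with the URL, leaving out if_modified_since and delai_cache. It appends a sanitised slice of the URL. Finally, it shards files into two-character sub-directories. The naming step alone can be shown as a standalone sketch. The function below is hypothetical and only computes the path; the original also creates the directories with sous_repertoire().

```php
<?php

/**
 * Hypothetical standalone version of the cache-file naming used by
 * recuperer_url_cache(): options minus the volatile ones, plus the URL, are
 * hashed; a readable slug of the URL is appended; the first two hex
 * characters of the hash select a sub-directory.
 */
function cache_path_for($dir, $url, array $options) {
	$sig = $options;
	unset($sig['if_modified_since'], $sig['delai_cache']);
	$sig['url'] = $url;
	$name = md5(serialize($sig)) . '-' . substr(preg_replace(',\W+,', '_', $url), 0, 80);
	return rtrim($dir, '/') . '/' . substr($name, 0, 2) . '/' . $name;
}

$path = cache_path_for('/tmp/cache/curl', 'https://example.org/feed?x=1', ['methode' => 'GET', 'delai_cache' => 60]);
```

Because delai_cache is excluded from the signature, callers that accept different cache ages for the same request share one cache file.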

/**
 * Deprecated: fetches a page from the web and, if needed, converts it to the local charset
 *
 * Follows page redirects (301) on the requested URL (at most 10 redirects)
 *
 * @deprecated
 * @uses recuperer_url()
 *
 * @param string $url
 *     URL of the page to fetch
 * @param bool|string $trans
 *     - long string: a file name (name of its local copy)
 *     - true: request charset conversion
 *     - null: return only the headers
 * @param bool $get_headers
 *     Whether to also return the headers
 * @param int|null $taille_max
 *     Stop reading content beyond this size (0 = headers only ==> HEAD request).
 *     Defaults to recuperer_url()'s default (_INC_DISTANT_MAX_SIZE, or _COPIE_LOCALE_MAX_SIZE when copying to a file).
 * @param string|array $datas
 *     Data to POST
 * @param string $boundary
 *     To force multipart sending with this boundary
 * @param bool $refuser_gz
 *     To force refusing compression (case of spell-checking servers)
 * @param string $date_verif
 *     A Unix timestamp: stop the download if the remote page
 *     has not been modified since that date
 * @param string $uri_referer
 *     To specify a different referer
 * @return string|bool
 *     - Code of the fetched page (with or without headers)
 *     - false if the page could not be fetched (status other than 200)
 **/
function recuperer_page(
	$url,
	$trans = false,
	$get_headers = false,
	$taille_max = null,
	$datas = '',
	$boundary = '',
	$refuser_gz = false,
	$date_verif = '',
	$uri_referer = ''
) {
	// $copy = copy to a file?
	$copy = (is_string($trans) and strlen($trans) > 5); // avoid "false" :-)

	if (!is_null($taille_max) and ($taille_max == 0)) {
		$get = 'HEAD';
	} else {
		$get = 'GET';
	}

	$options = array(
		'transcoder' => $trans === true,
		'methode' => $get,
		'datas' => $datas,
		'boundary' => $boundary,
		'refuser_gz' => $refuser_gz,
		'if_modified_since' => $date_verif,
		'uri_referer' => $uri_referer,
		'file' => $copy ? $trans : '',
		'follow_location' => 10,
	);
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}
	// at most ten attempts in case of 301 headers...
	$res = recuperer_url($url, $options);
	if (!$res) {
		return false;
	}
	if ($res['status'] !== 200) {
		return false;
	}
	if ($get_headers) {
		return $res['headers'] . "\n" . $res['page'];
	}

	return $res['page'];
}
730
731
732
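You can check in isolation how recuperer_page() maps its positional arguments onto the options array it passes to recuperer_url(). The function below is a hypothetical sketch; sketch_legacy_options is not part of SPIP. It reproduces only the decisions visible above:

- a string longer than 5 characters in $trans names a target file;
- a $taille_max of 0 selects a HEAD request;
- taille_max is passed on only when the caller gives one.

```php
<?php
// Hypothetical helper, not part of SPIP: reproduces how recuperer_page()
// maps its positional arguments onto the options given to recuperer_url().
function sketch_legacy_options($trans = false, $taille_max = null) {
	// A string longer than 5 chars is a target file name; this excludes "false"
	$copy = is_string($trans) && strlen($trans) > 5;

	$options = [
		'transcoder' => $trans === true,
		// taille_max = 0 means "headers only", hence a HEAD request
		'methode' => (!is_null($taille_max) && $taille_max == 0) ? 'HEAD' : 'GET',
		'file' => $copy ? $trans : '',
		'follow_location' => 10,
	];
	// Only pass a size limit when the caller gave one
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}
	return $options;
}

$o = sketch_legacy_options('/tmp/copy.html', 0);
echo $o['methode'], ' ', $o['file'], PHP_EOL; // prints "HEAD /tmp/copy.html"
```
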
/**
 * Obsolete: fetches a page from the web and, if needed, converts it to the local charset
 *
 * @deprecated
 *
 * @uses recuperer_url()
 *
 * @param string $url
 *     URL of the page to fetch
 * @param bool|null|string $trans
 *     - long string: a file name (name of its local copy)
 *     - true: request encoding/charset conversion
 *     - null: return only the headers
 * @param string $get
 *     Type of HTTP request to make (HEAD, GET or POST)
 * @param int|null $taille_max
 *     Stop reading the content beyond this size (0 = headers only ==> HEAD request).
 *     Default taille_max = 1 MB.
 * @param string|array $datas
 *     Data to send with a POST
 * @param bool $refuser_gz
 *     Force refusal of compression (for problematic servers)
 * @param string $date_verif
 *     A Unix timestamp: stop fetching if the remote page
 *     has not been modified since that date
 * @param string $uri_referer
 *     To specify a different referer
 * @return array|bool
 *     - an array (headers, body) on success,
 *     - false otherwise
 **/
function recuperer_lapage(
	$url,
	$trans = false,
	$get = 'GET',
	$taille_max = 1048576,
	$datas = '',
	$refuser_gz = false,
	$date_verif = '',
	$uri_referer = ''
) {
	// $copy = should the file be copied?
	$copy = (is_string($trans) and strlen($trans) > 5); // excludes the string "false" (5 chars) :-)

	// when writing straight to a file, refuse gzip
	// so that the content is not handled in memory
	if ($copy) {
		$refuser_gz = true;
	}

	$options = array(
		'transcoder' => $trans === true,
		'methode' => $get,
		'datas' => $datas,
		'refuser_gz' => $refuser_gz,
		'if_modified_since' => $date_verif,
		'uri_referer' => $uri_referer,
		'file' => $copy ? $trans : '',
		'follow_location' => false,
	);
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}
	// redirections are not followed here ('follow_location' => false)
	$res = recuperer_url($url, $options);

	if (!$res) {
		return false;
	}
	if ($res['status'] !== 200) {
		return false;
	}

	return array($res['headers'], $res['page']);
}

/**
 * Fetch the body of a URL whose headers have already been read,
 * from the resource passed as argument
 *
 * $taille_max allows truncating it. Data is read in 16 KB chunks and the
 * limit is checked between reads, so up to one extra chunk may be read.
 *
 * @param resource $handle
 * @param int $taille_max
 * @param string $fichier
 *   file into which the content of the resource is copied
 * @return int|string|false
 *   false on failure
 *   int: size of the file, if the fichier argument is given
 *   string: content of the resource
 */
function recuperer_body($handle, $taille_max = _INC_DISTANT_MAX_SIZE, $fichier = '') {
	$taille = 0;
	$result = '';
	$fp = false;
	if ($fichier) {
		include_spip('inc/acces');
		$tmpfile = "$fichier." . creer_uniqid() . '.tmp';
		$fp = spip_fopen_lock($tmpfile, 'w', LOCK_EX);
		if (!$fp and file_exists($fichier)) {
			return filesize($fichier);
		}
		if (!$fp) {
			return false;
		}
		$result = 0; // the file size will be returned
	}
	while (!feof($handle) and $taille < $taille_max) {
		$res = fread($handle, 16384);
		$taille += strlen($res);
		if ($fp) {
			fwrite($fp, $res);
			$result = $taille;
		} else {
			$result .= $res;
		}
	}
	if ($fp) {
		spip_fclose_unlock($fp);
		spip_unlink($fichier);
		@rename($tmpfile, $fichier);
		if (!file_exists($fichier)) {
			return false;
		}
	}

	return $result;
}

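The read loop in recuperer_body() checks the size limit only between 16 KB reads, so the returned content can exceed $taille_max by up to one chunk. Below is a minimal sketch of the in-memory branch. It uses a php://memory stream as a stand-in for the HTTP socket. sketch_read_body is a hypothetical name. The explicit check on fread() returning false is an addition that the original does not have.

```php
<?php
// Hypothetical stand-alone version of the in-memory branch of recuperer_body():
// read $handle in 16 KB chunks until EOF or until $taille_max bytes were read.
function sketch_read_body($handle, $taille_max) {
	$taille = 0;
	$result = '';
	while (!feof($handle) and $taille < $taille_max) {
		$chunk = fread($handle, 16384);
		if ($chunk === false) {
			break; // read error: stop instead of looping (not in the original)
		}
		$taille += strlen($chunk);
		$result .= $chunk;
	}
	return $result;
}

// A memory stream stands in for the socket
$stream = fopen('php://memory', 'w+b');
fwrite($stream, str_repeat('x', 40000));
rewind($stream);

// At least 20000 bytes, possibly up to one 16 KB chunk more,
// because the limit is only tested before each read
echo strlen(sketch_read_body($stream, 20000)), PHP_EOL;
```
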
/**
 * Read the HTTP response headers from the socket $handle
 *
 * Returns false on failure, or on success an associative array containing:
 * - the status
 * - the full headers
 * - the last modification date, if known
 * - the redirection URL, if given
 *
 * When $if_modified_since is later than the Last-Modified date of a 200
 * response, the status is rewritten to 304.
 *
 * @param resource $handle
 * @param int|bool $if_modified_since
 * @return array|false
 *   int status
 *   string headers
 *   int last_modified
 *   string location
 */
function recuperer_entetes_complets($handle, $if_modified_since = false) {
	$result = array('status' => 0, 'headers' => array(), 'last_modified' => 0, 'location' => '');

	$s = @trim(fgets($handle, 16384));
	if (!preg_match(',^HTTP/[0-9]+\.[0-9]+ ([0-9]+),', $s, $r)) {
		return false;
	}
	$result['status'] = intval($r[1]);
	while ($s = trim(fgets($handle, 16384))) {
		$result['headers'][] = $s . "\n";
		preg_match(',^([^:]*): *(.*)$,i', $s, $r);
		list(, $d, $v) = $r;
		if (strtolower(trim($d)) == 'location' and $result['status'] >= 300 and $result['status'] < 400) {
			$result['location'] = $v;
		} elseif ($d == 'Last-Modified') {
			$result['last_modified'] = strtotime($v);
		}
	}
	if ($if_modified_since
		and $result['last_modified']
		and $if_modified_since > $result['last_modified']
		and $result['status'] == 200
	) {
		$result['status'] = 304;
	}

	$result['headers'] = implode('', $result['headers']);

	return $result;
}

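You can exercise the parsing of the status line and headers on an in-memory stream. This includes rewriting a 200 to 304 when the caller's date is later than Last-Modified. The function below is a simplified, hypothetical re-implementation; sketch_read_headers is not a SPIP function. It differs from the original in three ways:

- it matches Last-Modified case-insensitively;
- it skips header lines without a colon;
- it builds the headers as one string directly.

```php
<?php
// Hypothetical, simplified re-implementation of recuperer_entetes_complets()
// for illustration: parse the status line and headers from a stream.
function sketch_read_headers($handle, $if_modified_since = false) {
	$result = ['status' => 0, 'headers' => '', 'last_modified' => 0, 'location' => ''];
	$line = fgets($handle, 16384);
	if ($line === false or !preg_match(',^HTTP/[0-9]+\.[0-9]+ ([0-9]+),', trim($line), $r)) {
		return false;
	}
	$result['status'] = intval($r[1]);
	// Headers end at the first empty line (or at the end of the stream)
	while (($line = fgets($handle, 16384)) !== false and ($line = trim($line)) !== '') {
		$result['headers'] .= $line . "\n";
		if (!preg_match(',^([^:]*): *(.*)$,', $line, $r)) {
			continue; // not a "Name: value" line
		}
		$name = strtolower(trim($r[1]));
		if ($name === 'location' and $result['status'] >= 300 and $result['status'] < 400) {
			$result['location'] = $r[2];
		} elseif ($name === 'last-modified') {
			$result['last_modified'] = strtotime($r[2]);
		}
	}
	// A 200 whose Last-Modified predates the caller's date is reported as 304
	if ($if_modified_since and $result['last_modified']
		and $if_modified_since > $result['last_modified'] and $result['status'] == 200) {
		$result['status'] = 304;
	}
	return $result;
}

$raw = "HTTP/1.1 200 OK\r\nLast-Modified: Mon, 01 Jan 2018 00:00:00 GMT\r\n"
	. "Content-Type: text/html\r\n\r\n<html>";
$h = fopen('php://memory', 'w+b');
fwrite($h, $raw);
rewind($h);
$res = sketch_read_headers($h, strtotime('2019-01-01 00:00:00 GMT'));
echo $res['status'], PHP_EOL; // prints 304
```
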
/**
 * Obsolete: simplified version of recuperer_entetes_complets()
 * Returns the HTTP header information of a socket
 *
 * Reads the HTTP response headers from the socket $f
 *
 * @uses recuperer_entetes_complets()
 * @deprecated
 *
 * @param resource $f
 *     Socket of a file (as returned by fopen)
 * @param int|string $date_verif
 *     To check against a last-modification date
 * @return string|int|array|bool
 *     - the (string) value of the Location header, if found
 *     - the (numeric) status if it is not 200, e.g. 304 Not Modified
 *     - the array of headers in all other cases
 *     - false if the headers could not be read
 **/
function recuperer_entetes($f, $date_verif = '') {
	// Case where the remote page has not changed since
	// the last visit
	$res = recuperer_entetes_complets($f, $date_verif);
	if (!$res) {
		return false;
	}
	if ($res['location']) {
		return $res['location'];
	}
	if ($res['status'] != 200) {
		return $res['status'];
	}

	return explode("\n", $res['headers']);
}

/**
 * Compute the canonical name of a local copy of a remote file
 *
 * If local copies of remote files must be kept, they might as well
 * be stored in a canonical location
 *
 * @note
 *   Making it bijective would be even better,
 *   but right now I cannot see how, given the limitations
 *   of filesystems
 *
 * @param string $source
 *     URL of the source
 * @param string $extension
 *     File extension
 * @return string
 *     File name for the local copy
 **/
function nom_fichier_copie_locale($source, $extension) {
	include_spip('inc/documents');

	$d = creer_repertoire_documents('distant'); # IMG/distant/
	$d = sous_repertoire($d, $extension); # IMG/distant/pdf/

	// always work as if we were at the site root
	if (_DIR_RACINE) {
		$d = preg_replace(',^' . preg_quote(_DIR_RACINE) . ',', '', $d);
	}

	$m = md5($source);

	return $d
	. substr(preg_replace(',[^\w-],', '', basename($source)) . '-' . $m, 0, 12)
	. substr($m, 0, 4)
	. ".$extension";
}

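The local copy name combines a sanitized basename with the md5 of the URL. The hypothetical helper below isolates only the file-name part and leaves out the IMG/distant/<extension>/ directory. It builds the name in three steps:

1. The sanitized basename, plus "-" and the md5, is cut to 12 characters.
2. The first 4 hex digits of the md5 are appended.
3. The extension is appended.

```php
<?php
// Hypothetical helper reproducing only the file-name part of
// nom_fichier_copie_locale(); the IMG/distant/<ext>/ directory is left out.
function sketch_copy_name($source, $extension) {
	$m = md5($source);
	// keep [A-Za-z0-9_-] from the basename, append "-<md5>", cut to 12 chars
	$prefix = substr(preg_replace(',[^\w-],', '', basename($source)) . '-' . $m, 0, 12);
	// then 4 hex chars of the md5 and the extension
	return $prefix . substr($m, 0, 4) . ".$extension";
}

// "reportpdf-" followed by 6 hex chars taken from the md5, then ".pdf"
echo sketch_copy_name('http://example.org/docs/report.pdf', 'pdf'), PHP_EOL;
```

Two URLs that share a basename still differ through the md5 digits. Because the name is truncated, the mapping is not guaranteed to be collision-free, as the @note above admits.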
/**
 * Give the name of the local copy of the source
 *
 * Either gets the file extension directly from the source URL,
 * or tries to work it out.
 *
 * @uses nom_fichier_copie_locale()
 * @uses recuperer_infos_distantes()
 *
 * @param string $source
 *      URL of the remote source
 * @return string|null
 *      Computed file name, or null if no local copy name could be determined
 **/
function fichier_copie_locale($source) {
	// If it is already local, nothing to do
	if (!tester_url_absolue($source)) {
		if (_DIR_RACINE) {
			$source = preg_replace(',^' . preg_quote(_DIR_RACINE) . ',', '', $source);
		}

		return $source;
	}

	// optimization: check whether the extension can be guessed from the URL and whether the file
	// has already been copied locally with that extension;
	// in that case it is reliable, no need to query the database
	$path_parts = pathinfo($source);
	if (!isset($path_parts['extension'])) {
		$path_parts['extension'] = '';
	}
	$ext = $path_parts ? $path_parts['extension'] : '';
	if ($ext
		and preg_match(',^\w+$,', $ext) // no php?truc=1&...
		and $f = nom_fichier_copie_locale($source, $ext)
		and file_exists(_DIR_RACINE . $f)
	) {
		return $f;
	}


	// If it is already in the documents table,
	// return the name of its potential copy
	$ext = sql_getfetsel('extension', 'spip_documents', 'fichier=' . sql_quote($source) . " AND distant='oui' AND extension <> ''");

	if ($ext) {
		return nom_fichier_copie_locale($source, $ext);
	}

	// check whether the extension given in the file name is valid
	// and whether the file has not already been fetched

	$ext = $path_parts ? $path_parts['extension'] : '';

	if ($ext and sql_getfetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($ext))) {
		$f = nom_fichier_copie_locale($source, $ext);
		if (file_exists(_DIR_RACINE . $f)) {
			return $f;
		}
	}

	// Ping to see whether its extension is known and allowed,
	// caching the result of the ping

	$cache = sous_repertoire(_DIR_CACHE, 'rid') . md5($source);
	if (!@file_exists($cache)
		or !$path_parts = @unserialize(spip_file_get_contents($cache))
		or _request('var_mode') == 'recalcul'
	) {
		$path_parts = recuperer_infos_distantes($source, 0, false);
		ecrire_fichier($cache, serialize($path_parts));
	}
	$ext = !empty($path_parts['extension']) ? $path_parts['extension'] : '';
	if ($ext and sql_getfetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($ext))) {
		return nom_fichier_copie_locale($source, $ext);
	}
	spip_log("pas de copie locale pour $source");
}


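The first and cheapest step of fichier_copie_locale() guesses the extension from the URL with pathinfo(). pathinfo() does not understand query strings, so the ^\w+$ check is what rejects URLs such as page.php?truc=1. Below is a hypothetical stand-alone sketch of that step; sketch_guess_extension is not part of SPIP.

```php
<?php
// Hypothetical helper: the "cheap" first step of fichier_copie_locale(),
// guessing an extension from the URL alone with pathinfo().
function sketch_guess_extension($source) {
	$path_parts = pathinfo($source);
	$ext = isset($path_parts['extension']) ? $path_parts['extension'] : '';
	// pathinfo() knows nothing about query strings, so "page.php?truc=1"
	// yields "php?truc=1"; only plain word characters are accepted
	return ($ext !== '' and preg_match(',^\w+$,', $ext)) ? $ext : '';
}

echo sketch_guess_extension('http://example.org/img/photo.jpg'), PHP_EOL; // prints jpg
```

When this step yields nothing usable, fichier_copie_locale() next looks the URL up in spip_documents. Failing that, it queries the remote server and caches the answer.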
/**
 * Fetch information about a remote document without downloading too much of it
 *
 * @param string $source
 *     URL of the source
 * @param int $max
 *     Maximum size of the file to download (0: headers only, with a HEAD request)
 * @param bool $charger_si_petite_image
 *     Whether to download the document if it is a small image
 * @return array|false
 *     Key/value pairs of the information obtained, among:
 *
 *     - 'body' = string
 *     - 'type_image' = bool
 *     - 'titre' = string
 *     - 'largeur' = int
 *     - 'hauteur' = int
 *     - 'taille' = int
 *     - 'extension' = string
 *     - 'fichier' = string
 *     - 'mime_type' = string
 *
 *     or false if nothing could be found
 **/
function recuperer_infos_distantes($source, $max = 0, $charger_si_petite_image = true) {

	// no point wasting time
	if (!tester_url_absolue($source)) {
		return false;
	}

	# load the MIME type aliases
	include_spip('base/typedoc');

	$a = array();
	$mime_type = '';
	// Directly load the beginning of images and HTML files,
	// so as to catch as much information as possible (title, size, etc.).
	// If that fails, the user will have to enter it...
	if ($headers = recuperer_page($source, false, true, $max, '', '', true)) {
		list($headers, $a['body']) = preg_split(',\n\n,', $headers, 2);

		if (preg_match(",\nContent-Type: *([^[:space:];]*),i", "\n$headers", $regs)) {
			$mime_type = (trim($regs[1]));
		} else {
			$mime_type = '';
		} // unknown

		// Apply the aliases
		while (isset($GLOBALS['mime_alias'][$mime_type])) {
			$mime_type = $GLOBALS['mime_alias'][$mime_type];
		}

		// If the MIME type is meaningless
		// (text/plain, application/octet-stream or empty),
		// the server may not know what it is serving;
		// try to detect it from the URL extension
		// or from Content-Disposition: attachment; filename=...
		$t = null;
		if (in_array($mime_type, array('text/plain', '', 'application/octet-stream'))) {
			if (!$t
				and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)
			) {
				$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
			}
			if (!$t
				and preg_match(',^Content-Disposition:\s*attachment;\s*filename=(.*)$,Uims', $headers, $m)
				and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $m[1], $rext)
			) {
				$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
			}
		}

		// Other MIME type (or text/plain with an unknown file extension)
		if (!$t) {
			$t = sql_fetsel('extension', 'spip_types_documents', 'mime_type=' . sql_quote($mime_type));
		}

		// Still nothing? (e.g. audio/x-ogg instead of application/ogg)
		// Try again with the extension
		if (!$t
			and $mime_type != 'text/plain'
			and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)
		) {
			# avoid xxx.3 => 3gp (> SPIP 3)
			$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
		}

		if ($t) {
			spip_log("mime-type $mime_type ok, extension " . $t['extension']);
			$a['extension'] = $t['extension'];
		} else {
			# by default, fall back to '.bin' if it is allowed
			spip_log("mime-type $mime_type inconnu");
			$t = sql_fetsel('extension', 'spip_types_documents', "extension='bin'");
			if (!$t) {
				return false;
			}
			$a['extension'] = $t['extension'];
		}

		if (preg_match(",\nContent-Length: *([^[:space:]]*),i", "\n$headers", $regs)) {
			$a['taille'] = intval($regs[1]);
		}
	}

	// HEAD failed, try with GET
	if (!$a and !$max) {
		spip_log("tenter GET $source");
		$a = recuperer_infos_distantes($source, _INC_DISTANT_MAX_SIZE);
	}

	// if nothing was found, no point insisting
	if (!$a) {
		return false;
	}

	// If it is a not-too-large image or an HTML file, reload
	// the document with GET and fetch additional data...
	if (preg_match(',^image/(jpeg|gif|png|swf),', $mime_type)) {
		if ($max == 0
			and (empty($a['taille']) or $a['taille'] < _INC_DISTANT_MAX_SIZE)
			and isset($GLOBALS['meta']['formats_graphiques'])
			and (strpos($GLOBALS['meta']['formats_graphiques'], $a['extension']) !== false)
			and $charger_si_petite_image
		) {
			$a = recuperer_infos_distantes($source, _INC_DISTANT_MAX_SIZE);
		} else {
			if ($a['body']) {
				$a['fichier'] = _DIR_RACINE . nom_fichier_copie_locale($source, $a['extension']);
				ecrire_fichier($a['fichier'], $a['body']);
				$size_image = @getimagesize($a['fichier']);
				$a['largeur'] = intval($size_image[0]);
				$a['hauteur'] = intval($size_image[1]);
				$a['type_image'] = true;
			}
		}
	}

	// swf file: if the size is unknown, default to 425x350,
	// which is better than 0x0
	if ($a and isset($a['extension']) and $a['extension'] == 'swf'
		and empty($a['largeur'])
	) {
		$a['largeur'] = 425;
		$a['hauteur'] = 350;
	}

	if ($mime_type == 'text/html') {
		include_spip('inc/filtres');
		$page = recuperer_page($source, true, false, _INC_DISTANT_MAX_SIZE);
		if (preg_match(',<title>(.*?)</title>,ims', $page, $regs)) {
			$a['titre'] = corriger_caracteres(trim($regs[1]));
		}
		if (!isset($a['taille']) or !$a['taille']) {
			$a['taille'] = strlen($page); # roughly
		}
	}
	$a['mime_type'] = $mime_type;

	return $a;
}


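Without its SQL queries, the extension selection in recuperer_infos_distantes() is a short cascade. The sketch below is hypothetical, and the sample data in the usage line is invented:

- $aliases stands in for $GLOBALS['mime_alias'];
- $types (MIME type => extension) stands in for the spip_types_documents table;
- the Content-Disposition lookup is omitted;
- lower-casing the URL extension is an assumption about how the SQL comparison behaves.

```php
<?php
// Hypothetical, SQL-free sketch of how recuperer_infos_distantes() picks an
// extension. $aliases replaces $GLOBALS['mime_alias'], $types (mime => ext)
// replaces the spip_types_documents table.
function sketch_pick_extension($content_type, $url, array $aliases, array $types) {
	// Follow alias chains (assumes the alias map has no cycles)
	$mime = $content_type;
	while (isset($aliases[$mime])) {
		$mime = $aliases[$mime];
	}
	$known = array_values($types);
	$from_url = preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $url, $rext) ? strtolower($rext[1]) : '';
	// Meaningless MIME type: trust a known URL extension first
	if (in_array($mime, ['text/plain', '', 'application/octet-stream'], true)
		and in_array($from_url, $known, true)) {
		return $from_url;
	}
	// Otherwise look the MIME type up
	if (isset($types[$mime])) {
		return $types[$mime];
	}
	// Still nothing (and not text/plain): try the URL extension again
	if ($mime !== 'text/plain' and in_array($from_url, $known, true)) {
		return $from_url;
	}
	// Last resort: 'bin' if it is allowed, otherwise failure
	return in_array('bin', $known, true) ? 'bin' : false;
}

// Invented sample data for illustration
$types = ['image/png' => 'png', 'application/pdf' => 'pdf', 'application/octet-stream' => 'bin'];
echo sketch_pick_extension('application/octet-stream', 'http://example.org/doc.PDF?dl=1', [], $types), PHP_EOL; // prints pdf
```
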
/**
 * Test whether a host can be fetched directly or must go through a proxy
 *
 * The proxy and the list of excluded hosts can be passed as parameters,
 * for testing purposes during configuration
 *
 * @param string $host
 * @param string|null $http_proxy
 * @param string|null $http_noproxy
 * @return string
 *     The proxy to use, or an empty string for a direct connection
 */
function need_proxy($host, $http_proxy = null, $http_noproxy = null) {
	if (is_null($http_proxy)) {
		$http_proxy = isset($GLOBALS['meta']['http_proxy']) ? $GLOBALS['meta']['http_proxy'] : null;
	}
	// nothing to do if there is no proxy :)
	if (is_null($http_proxy) or !$http_proxy = trim($http_proxy)) {
		return '';
	}

	if (is_null($http_noproxy)) {
		$http_noproxy = isset($GLOBALS['meta']['http_noproxy']) ? $GLOBALS['meta']['http_noproxy'] : null;
	}
	// if there are no exceptions, return the proxy
	if (is_null($http_noproxy) or !$http_noproxy = trim($http_noproxy)) {
		return $http_proxy;
	}

	// if the host or one of its parent domains is in $http_noproxy, make an exception
	// $http_noproxy may contain several domains separated by spaces or line breaks
	$http_noproxy = str_replace("\n", " ", $http_noproxy);
	$http_noproxy = str_replace("\r", " ", $http_noproxy);
	$http_noproxy = " $http_noproxy ";
	$domain = $host;
	// if the exact domain www.example.org is in the exceptions
	if (strpos($http_noproxy, " $domain ") !== false) {
		return '';
	}

	while (strpos($domain, '.') !== false) {
		$domain = explode('.', $domain);
		array_shift($domain);
		$domain = implode('.', $domain);

		// or if a parent domain starting with a . is in the exceptions (meaning it covers all subdomains)
		if (strpos($http_noproxy, " .$domain ") !== false) {
			return '';
		}
	}

	// ok, this is not an exception
	return $http_proxy;
}


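Much of need_proxy()'s complexity comes from mixing configuration lookup with host matching. The sketch below shows one possible refactoring; neither function exists in SPIP. It moves the matching into its own function and takes the proxy settings as explicit arguments instead of reading $GLOBALS['meta']. It keeps the original rules: an exact host entry, or a parent domain written with a leading dot, disables the proxy. One difference: here any whitespace, including tabs, separates entries.

```php
<?php
// Hypothetical refactoring of need_proxy(): the matching is extracted into
// its own function; settings are explicit arguments (no $GLOBALS['meta']).
function sketch_host_is_excluded($host, $noproxy_list) {
	// Entries may be separated by spaces or line breaks
	$entries = preg_split('/\s+/', trim($noproxy_list), -1, PREG_SPLIT_NO_EMPTY);
	// Exact host match, e.g. "www.example.org"
	if (in_array($host, $entries, true)) {
		return true;
	}
	// Parent domains only match when listed with a leading dot, e.g. ".example.org"
	$labels = explode('.', $host);
	while (count($labels) > 1) {
		array_shift($labels);
		if (in_array('.' . implode('.', $labels), $entries, true)) {
			return true;
		}
	}
	return false;
}

function sketch_need_proxy($host, $http_proxy, $http_noproxy) {
	$http_proxy = trim((string) $http_proxy);
	if ($http_proxy === '') {
		return ''; // no proxy configured
	}
	if (trim((string) $http_noproxy) !== '' and sketch_host_is_excluded($host, $http_noproxy)) {
		return ''; // direct connection for excluded hosts
	}
	return $http_proxy;
}

echo sketch_need_proxy('www.example.org', 'http://proxy:3128', ".example.org\nlocalhost") === '' ? 'direct' : 'proxy', PHP_EOL; // prints direct
```

Keeping the matching in its own function makes it testable without touching the site configuration.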
/**
 * Initialize an HTTP request with headers
 *
 * Splits the URL into scheme+host+path+port and sends the request.
 * Returns the descriptor from which to read the response.
 *
 * @uses lance_requete()
 *
 * @param string $method
 *   HEAD, GET, POST
 * @param string $url
 * @param bool $refuse_gz
 * @param string $referer
 * @param string $datas
 * @param string $vers
 * @param string $date
 * @return array
 *   array($f, $fopen): the descriptor (false on total failure)
 *   and whether it was opened with a plain fopen
 */
function init_http($method, $url, $refuse_gz = false, $referer = '', $datas = '', $vers = 'HTTP/1.0', $date = '') {
	$user = '';
	$fopen = false;

	$t = @parse_url($url);
	$host = $t['host'];
	if ($t['scheme'] == 'http') {
		$scheme = 'http';
		$noproxy = '';
	} elseif ($t['scheme'] == 'https') {
		$scheme = 'ssl';
		$noproxy = 'ssl://';
		if (!isset($t['port']) || !($port = $t['port'])) {
			$t['port'] = 443;
		}
	} else {
		$scheme = $t['scheme'];
		$noproxy = $scheme . '://';
	}
	if (isset($t['user'])) {
		$user = array($t['user'], $t['pass']);
	}

	if (!isset($t['port']) || !($port = $t['port'])) {
		$port = 80;
	}
	if (!isset($t['path']) || !($path = $t['path'])) {
		$path = '/';
	}

	if (!empty($t['query'])) {
		$path .= '?' . $t['query'];
	}

	$f = lance_requete($method, $scheme, $user, $host, $path, $port, $noproxy, $refuse_gz, $referer, $datas, $vers, $date);
	if (!$f or !is_resource($f)) {
		// fallback: plain fopen, unless lance_requete() timed out,
		// which is signalled by $f === 110
		if ($f !== 110
			and !need_proxy($host)
			and !_request('tester_proxy')
			and (!isset($GLOBALS['inc_distant_allow_fopen']) or $GLOBALS['inc_distant_allow_fopen'])
		) {
			$f = @fopen($url, 'rb');
			spip_log("connexion vers $url par simple fopen");
			$fopen = true;
		} else {
			// total failure
			$f = false;
		}
	}

	return array($f, $fopen);
}

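The URL decomposition at the start of init_http() can be sketched as a pure function that returns what lance_requete() receives. This is a hypothetical helper; sketch_split_url is not part of SPIP. It adds one thing the original lacks: it returns false early for URLs without a scheme or host, since init_http() reads $t['host'] unconditionally.

```php
<?php
// Hypothetical helper mirroring how init_http() splits a URL before calling
// lance_requete(): scheme mapping, default ports and path.
function sketch_split_url($url) {
	$t = parse_url($url);
	if ($t === false or !isset($t['scheme'], $t['host'])) {
		return false; // not an absolute URL (guard added for this sketch)
	}
	if ($t['scheme'] === 'http') {
		$scheme = 'http';
		$noproxy = '';
	} elseif ($t['scheme'] === 'https') {
		$scheme = 'ssl'; // TLS socket
		$noproxy = 'ssl://';
	} else {
		$scheme = $t['scheme'];
		$noproxy = $scheme . '://';
	}
	// https defaults to 443, every other scheme to 80 (as in init_http())
	$port = !empty($t['port']) ? $t['port'] : ($t['scheme'] === 'https' ? 443 : 80);
	$path = !empty($t['path']) ? $t['path'] : '/';
	if (!empty($t['query'])) {
		$path .= '?' . $t['query'];
	}
	return ['scheme' => $scheme, 'noproxy' => $noproxy, 'host' => $t['host'], 'port' => $port, 'path' => $path];
}

$u = sketch_split_url('https://example.org/feed?lang=fr');
echo $u['scheme'], ' ', $u['port'], ' ', $u['path'], PHP_EOL; // prints "ssl 443 /feed?lang=fr"
```
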
/**
1352
 * Lancer la requete proprement dite
1353
 *
1354
 * @param string $method
1355
 *   type de la requete (GET, HEAD, POST...)
1356
 * @param string $scheme
1357
 *   protocole (http, tls, ftp...)
1358
 * @param array $user
1359
 *   couple (utilisateur, mot de passe) en cas d'authentification http
1360
 * @param string $host
1361
 *   nom de domaine
1362
 * @param string $path
1363
 *   chemin de la page cherchee
1364
 * @param string $port
1365
 *   port utilise pour la connexion
1366
 * @param bool $noproxy
1367
 *   protocole utilise si requete sans proxy
1368
 * @param bool $refuse_gz
1369
 *   refuser la compression GZ
1370
 * @param string $referer
1371
 *   referer
1372
 * @param string $datas
1373
 *   donnees postees
1374
 * @param string $vers
1375
 *   version HTTP
1376
 * @param int|string $date
1377
 *   timestamp pour entente If-Modified-Since
1378
 * @return bool|resource
1379
 *   false|int si echec
1380
 *   resource socket vers l'url demandee
1381
 */
function lance_requete(
	$method,
	$scheme,
	$user,
	$host,
	$path,
	$port,
	$noproxy,
	$refuse_gz = false,
	$referer = '',
	$datas = '',
	$vers = 'HTTP/1.0',
	$date = ''
) {

	$proxy_user = '';
	$http_proxy = need_proxy($host);
	if ($user) {
Best practice: The expression $user of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using !empty($expr) instead to make it clear that you intend to check for an array without elements.

This check flags implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(...) or !empty(...) instead.
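As an illustration of the explicit form, here is a minimal sketch that builds the optional authentication header lines of lance_requete() with !empty() checks. auth_headers() is a hypothetical helper, not part of SPIP; it mirrors lance_requete(), which urlencodes both parts of the (login, password) pair before Base64 encoding, and it also guards against the keys that parse_url() omits when the proxy URL has no user or password.

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): build the optional
 * Authorization and Proxy-Authorization request lines.
 *
 * @param array|string $user  array(login, password), or '' for none
 * @param string $proxy       proxy URL such as 'http://bob:pw@proxy.example:3128', or ''
 * @return string header lines, each terminated by CRLF ('' if none)
 */
function auth_headers($user, $proxy)
{
	$headers = '';
	// !empty() is false for both '' and array(): no credentials to send
	if (!empty($user)) {
		$pair = urlencode($user[0]) . ':' . urlencode($user[1]);
		$headers .= 'Authorization: Basic ' . base64_encode($pair) . "\r\n";
	}
	// parse_url() returns false on malformed URLs and omits absent components
	$t = ($proxy !== '') ? parse_url($proxy) : false;
	if (is_array($t) && !empty($t['user'])) {
		$pass = isset($t['pass']) ? $t['pass'] : '';
		$headers .= 'Proxy-Authorization: Basic ' . base64_encode($t['user'] . ':' . $pass) . "\r\n";
	}
	return $headers;
}
```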

		$user = urlencode($user[0]) . ':' . urlencode($user[1]);
	}

	$connect = '';
	if ($http_proxy) {
		if (defined('_PROXY_HTTPS_VIA_CONNECT') and in_array($scheme, array('tls', 'ssl'))) {
			$path_host = (!$user ? '' : "$user@") . $host . (($port != 80) ? ":$port" : '');
			$connect = 'CONNECT ' . $path_host . " $vers\r\n"
				. "Host: $path_host\r\n"
				. "Proxy-Connection: Keep-Alive\r\n";
		} else {
			$path = (in_array($scheme, array('tls', 'ssl')) ? 'https://' : "$scheme://")
				. (!$user ? '' : "$user@")
				. "$host" . (($port != 80) ? ":$port" : '') . $path;
		}
		$t2 = @parse_url($http_proxy);
		$first_host = $t2['host'];
		// parse_url() omits the keys of absent URL components
		$port = empty($t2['port']) ? 80 : $t2['port'];
		if (!empty($t2['user'])) {
			$proxy_user = base64_encode($t2['user'] . ':' . (isset($t2['pass']) ? $t2['pass'] : ''));
		}
	} else {
		$first_host = $noproxy . $host;
	}

	if ($connect) {
		$streamContext = stream_context_create(array(
			'ssl' => array(
				'verify_peer' => false,
				'allow_self_signed' => true,
				'SNI_enabled' => true,
				'peer_name' => $host,
			)
		));
		if (version_compare(phpversion(), '5.6', '<')) {
			stream_context_set_option($streamContext, 'ssl', 'SNI_server_name', $host);
		}
		$f = @stream_socket_client(
			"tcp://$first_host:$port",
			$errno,
			$errstr,
			_INC_DISTANT_CONNECT_TIMEOUT,
			STREAM_CLIENT_CONNECT,
			$streamContext
		);
		spip_log("Recuperer $path sur $first_host:$port par $f (via CONNECT)", 'connect');
		if (!$f) {
			spip_log("Erreur connexion $errno $errstr", _LOG_ERREUR);
			return $errno;
		}
		stream_set_timeout($f, _INC_DISTANT_CONNECT_TIMEOUT);

		fputs($f, $connect);
		fputs($f, "\r\n");
		$res = fread($f, 1024);
		// expect a status line such as "HTTP/1.1 200 Connection established";
		// explode() always returns at least one element, so check for two
		if (!$res
			or count($res = explode(' ', $res)) < 2
			or $res[1] !== '200'
		) {
			spip_log("Echec CONNECT sur $first_host:$port", 'connect' . _LOG_INFO_IMPORTANTE);
			fclose($f);

			return false;
		}
		// important: otherwise we read too fast and the data is not available yet
		stream_set_blocking($f, true);
		// perform the TLS handshake
		stream_socket_enable_crypto($f, true, STREAM_CRYPTO_METHOD_SSLv23_CLIENT);
		spip_log("OK CONNECT sur $first_host:$port", 'connect');
	} else {
		$ntry = 3;
		do {
			$f = @fsockopen($first_host, $port, $errno, $errstr, _INC_DISTANT_CONNECT_TIMEOUT);
			// sleep() returns 0 on success, hence !sleep(1) to keep retrying
		} while (!$f and $ntry-- and $errno !== 110 and !sleep(1));
		spip_log("Recuperer $path sur $first_host:$port par $f");
		if (!$f) {
			spip_log("Erreur connexion $errno $errstr", _LOG_ERREUR);

			return $errno;
		}
		stream_set_timeout($f, _INC_DISTANT_CONNECT_TIMEOUT);
	}

	$site = isset($GLOBALS['meta']['adresse_site']) ? $GLOBALS['meta']['adresse_site'] : '';

	$req = "$method $path $vers\r\n"
		. "Host: $host\r\n"
		. 'User-Agent: ' . _INC_DISTANT_USER_AGENT . "\r\n"
		. ($refuse_gz ? '' : ('Accept-Encoding: ' . _INC_DISTANT_CONTENT_ENCODING . "\r\n"))
		. (!$site ? '' : "Referer: $site/$referer\r\n")
		. (!$date ? '' : 'If-Modified-Since: ' . (gmdate('D, d M Y H:i:s', $date) . " GMT\r\n"))
		. (!$user ? '' : ('Authorization: Basic ' . base64_encode($user) . "\r\n"))
		. (!$proxy_user ? '' : "Proxy-Authorization: Basic $proxy_user\r\n")
		. (!strpos($vers, '1.1') ? '' : "Keep-Alive: 300\r\nConnection: keep-alive\r\n");

#	spip_log("Requete\n$req");
	fputs($f, $req);
	fputs($f, $datas ? $datas : "\r\n");

	return $f;
}
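The CONNECT branch of lance_requete() accepts the tunnel only when the second space-separated token of the proxy's reply is '200'. A minimal sketch of that check as a standalone function, assuming the reply starts with an HTTP status line; connect_accepted() is a hypothetical name, not part of SPIP. Unlike a bare $res[1] access, it never reads a missing array index.

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): did the proxy accept the
 * CONNECT request? $reply holds the first bytes read from the socket,
 * e.g. "HTTP/1.1 200 Connection established\r\n\r\n".
 *
 * @param string|false $reply
 * @return bool
 */
function connect_accepted($reply)
{
	if (!is_string($reply) || $reply === '') {
		return false; // nothing was read from the socket
	}
	// Status line = first line: "HTTP-version SP status-code SP reason-phrase"
	$lines = explode("\r\n", $reply, 2);
	$parts = explode(' ', $lines[0], 3);
	return count($parts) >= 2 && $parts[1] === '200';
}
```

HTTP allows any 2xx reply to a CONNECT request (RFC 7231, section 4.3.6); the sketch keeps the stricter 200 check used by lance_requete().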
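In the direct-connection branch, a loop condition ending in "and sleep(1)" stops after the first pause, because sleep() returns 0 on success. A minimal sketch of a bounded retry helper that keeps the intended behaviour (retry on failure, give up at once on a timeout); connect_with_retry() is hypothetical, not part of SPIP, and it takes the connection attempt as a callable so the delay and retry count can be tested without a network.

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): call $attempt up to 1 + $retries
 * times, pausing $delay seconds between failures, and stop early when the
 * attempt reports errno 110 (timeout).
 *
 * @param callable $attempt returns array($handle_or_false, $errno)
 * @param int $retries      extra attempts after the first one
 * @param int $delay        seconds to wait between attempts
 * @return array            array($handle_or_false, $errno) of the last attempt
 */
function connect_with_retry(callable $attempt, $retries = 3, $delay = 1)
{
	while (true) {
		list($f, $errno) = $attempt();
		// Stop on success, on a timeout, or when no retries are left
		if ($f || $errno === 110 || $retries <= 0) {
			return array($f, $errno);
		}
		$retries--;
		if ($delay > 0) {
			sleep($delay);
		}
	}
}

// Possible use inside lance_requete() (sketch only):
// list($f, $errno) = connect_with_retry(function () use ($first_host, $port) {
// 	$f = @fsockopen($first_host, $port, $errno, $errstr, _INC_DISTANT_CONNECT_TIMEOUT);
// 	return array($f, $errno);
// });
```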