Completed
Push — master ( 605ecb...157706 )
by cam
06:19
created

distant.php ➔ valider_url_distante()   F

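Complexity: Conditions 34, Paths 464
Size: Total Lines 85
Duplication: Lines 0, Ratio 0 %
Importance: Changes 0

Metric  Value
cc      34      (cyclomatic complexity)
nc      464     (NPath complexity)
nop     2       (number of parameters)
dl      0       (duplicated lines)
loc     85      (lines of code)
rs      0.7443
c       0
b       0
f       0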

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
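In `valider_url_distante()`, one obvious candidate for extraction is the octet-by-octet test that rejects local and private IPv4 addresses. The sketch below is only an illustration. `is_private_ipv4()` is a hypothetical helper name. It covers the same ranges the original code rejects (0/8, 10/8, 127/8, 172.16/12, 192.168/16), but it tests them with `ip2long()` and CIDR masks instead of comparing octets by hand.

```php
<?php

/**
 * Hypothetical helper extracted from valider_url_distante():
 * returns true if $ip is a dotted IPv4 address inside one of the
 * ranges the original code refuses.
 */
function is_private_ipv4($ip) {
	$long = ip2long($ip);
	if ($long === false) {
		// not a valid dotted IPv4 address: let the caller decide
		return false;
	}
	// network => prefix length, mirroring the octet tests of the original
	$ranges = array(
		'0.0.0.0' => 8,      // "this network"
		'10.0.0.0' => 8,     // private
		'127.0.0.0' => 8,    // loopback
		'172.16.0.0' => 12,  // private (172.16.x.x to 172.31.x.x)
		'192.168.0.0' => 16, // private
	);
	foreach ($ranges as $network => $bits) {
		$mask = -1 << (32 - $bits);
		if (($long & $mask) === (ip2long($network) & $mask)) {
			return true;
		}
	}
	return false;
}
```

With the helper in place, the private-address test inside `valider_url_distante()` would shrink to `if ($ip and is_private_ipv4($ip)) { return false; }`. The range list could then be unit-tested on its own.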

Commonly applied refactorings include:

<?php

/***************************************************************************\
 *  SPIP, a publishing system for the Internet                             *
 *                                                                         *
 *  Copyright (c) 2001-2018                                                *
 *  Arnaud Martin, Antoine Pitrou, Philippe Riviere, Emmanuel Saint-James  *
 *                                                                         *
 *  This program is free software distributed under the GNU/GPL license.   *
 *  For more details, see the COPYING.txt file or the online help.         *
\***************************************************************************/

/**
 * This file handles fetching remote data
 *
 * @package SPIP\Core\Distant
 **/
if (!defined('_ECRIRE_INC_VERSION')) {
	return;
}


if (!defined('_INC_DISTANT_VERSION_HTTP')) {
	define('_INC_DISTANT_VERSION_HTTP', 'HTTP/1.0');
}
if (!defined('_INC_DISTANT_CONTENT_ENCODING')) {
	define('_INC_DISTANT_CONTENT_ENCODING', 'gzip');
}
if (!defined('_INC_DISTANT_USER_AGENT')) {
	define('_INC_DISTANT_USER_AGENT', 'SPIP-' . $GLOBALS['spip_version_affichee'] . ' (' . $GLOBALS['home_server'] . ')');
}
if (!defined('_INC_DISTANT_MAX_SIZE')) {
	define('_INC_DISTANT_MAX_SIZE', 2097152);
}
if (!defined('_INC_DISTANT_CONNECT_TIMEOUT')) {
	define('_INC_DISTANT_CONNECT_TIMEOUT', 10);
}


define('_REGEXP_COPIE_LOCALE', ',' .
	preg_replace(
		'@^https?:@',
		'https?:',
		(isset($GLOBALS['meta']['adresse_site']) ? $GLOBALS['meta']['adresse_site'] : '')
	)
	. '/?spip.php[?]action=acceder_document.*file=(.*)$,');

//@define('_COPIE_LOCALE_MAX_SIZE',2097152); // size limit (already set by inc/utils)
48
/**
49
 * Crée au besoin la copie locale d'un fichier distant
50
 *
51
 * Prend en argument un chemin relatif au rep racine, ou une URL
52
 * Renvoie un chemin relatif au rep racine, ou false
53
 *
54
 * @link http://www.spip.net/4155
55
 * @pipeline_appel post_edition
56
 *
57
 * @param string $source
58
 * @param string $mode
59
 *   - 'test' - ne faire que tester
60
 *   - 'auto' - charger au besoin
61
 *   - 'modif' - Si deja present, ne charger que si If-Modified-Since
62
 *   - 'force' - charger toujours (mettre a jour)
63
 * @param string $local
0 ignored issues
show
Documentation introduced by
Should the type for parameter $local not be string|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.

Loading history...
64
 *   permet de specifier le nom du fichier local (stockage d'un cache par exemple, et non document IMG)
65
 * @param int $taille_max
0 ignored issues
show
Documentation introduced by
Should the type for parameter $taille_max not be integer|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.

Loading history...
66
 *   taille maxi de la copie local, par defaut _COPIE_LOCALE_MAX_SIZE
67
 * @return bool|string
0 ignored issues
show
Documentation introduced by
Consider making the return type a bit more specific; maybe use string|false.

This check looks for the generic type array as a return type and suggests a more specific type. This type is inferred from the actual code.

Loading history...
68
 */
69
function copie_locale($source, $mode = 'auto', $local = null, $taille_max = null) {
70
71
	// si c'est la protection de soi-meme, retourner le path
72
	if ($mode !== 'force' and preg_match(_REGEXP_COPIE_LOCALE, $source, $match)) {
73
		$source = substr(_DIR_IMG, strlen(_DIR_RACINE)) . urldecode($match[1]);
74
75
		return @file_exists($source) ? $source : false;
76
	}
77
78
	if (is_null($local)) {
79
		$local = fichier_copie_locale($source);
80 View Code Duplication
	} else {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
81
		if (_DIR_RACINE and strncmp(_DIR_RACINE, $local, strlen(_DIR_RACINE)) == 0) {
82
			$local = substr($local, strlen(_DIR_RACINE));
83
		}
84
	}
85
86
	// si $local = '' c'est un fichier refuse par fichier_copie_locale(),
87
	// par exemple un fichier qui ne figure pas dans nos documents ;
88
	// dans ce cas on n'essaie pas de le telecharger pour ensuite echouer
89
	if (!$local) {
0 ignored issues
show
Bug Best Practice introduced by
The expression $local of type string|null is loosely compared to false; this is ambiguous if the string can be empty. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For string values, the empty string '' is a special case, in particular the following results might be unexpected:

''   == false // true
''   == null  // true
'ab' == false // false
'ab' == null  // false

// It is often better to use strict comparison
'' === false // false
'' === null  // false
Loading history...
90
		return false;
91
	}
92
93
	$localrac = _DIR_RACINE . $local;
94
	$t = ($mode == 'force') ? false : @file_exists($localrac);
95
96
	// test d'existence du fichier
97
	if ($mode == 'test') {
98
		return $t ? $local : '';
99
	}
100
101
	// sinon voir si on doit/peut le telecharger
102
	if ($local == $source or !tester_url_absolue($source)) {
103
		return $local;
104
	}
105
106
	if ($mode == 'modif' or !$t) {
107
		// passer par un fichier temporaire unique pour gerer les echecs en cours de recuperation
108
		// et des eventuelles recuperations concurantes
109
		include_spip('inc/acces');
110
		if (!$taille_max) {
0 ignored issues
show
Bug Best Practice introduced by
The expression $taille_max of type integer|null is loosely compared to false; this is ambiguous if the integer can be zero. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case, in particular the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false
Loading history...
111
			$taille_max = _COPIE_LOCALE_MAX_SIZE;
112
		}
113
		$res = recuperer_url(
114
			$source,
115
			array('file' => $localrac, 'taille_max' => $taille_max, 'if_modified_since' => $t ? filemtime($localrac) : '')
116
		);
117
		if (!$res or (!$res['length'] and $res['status'] != 304)) {
118
			spip_log("copie_locale : Echec recuperation $source sur $localrac status : " . $res['status'], _LOG_INFO_IMPORTANTE);
119
		}
120
		if (!$res['length']) {
121
			// si $t c'est sans doute juste un not-modified-since
122
			return $t ? $local : false;
123
		}
124
		spip_log("copie_locale : recuperation $source sur $localrac taille " . $res['length'] . ' OK');
125
126
		// pour une eventuelle indexation
127
		pipeline(
128
			'post_edition',
129
			array(
130
				'args' => array(
131
					'operation' => 'copie_locale',
132
					'source' => $source,
133
					'fichier' => $local,
134
					'http_res' => $res['length'],
135
				),
136
				'data' => null
137
			)
138
		);
139
	}
140
141
	return $local;
142
}
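The self-protection branch of `copie_locale()` relies on `_REGEXP_COPIE_LOCALE`, which is built from the `adresse_site` meta. The sketch below rebuilds the same pattern for an assumed site address (`https://www.example.org`, hypothetical), so the capture can be checked on its own. `regexp_copie_locale()` is a hypothetical wrapper, not a SPIP function. The site address is inserted unescaped, so its dots match any character.

```php
<?php

/**
 * Hypothetical wrapper reproducing the _REGEXP_COPIE_LOCALE definition
 * for a given site address.
 */
function regexp_copie_locale($adresse_site) {
	return ','
		. preg_replace('@^https?:@', 'https?:', $adresse_site)
		. '/?spip.php[?]action=acceder_document.*file=(.*)$,';
}

$regexp = regexp_copie_locale('https://www.example.org');
$source = 'http://www.example.org/spip.php?action=acceder_document&arg=12&file=pdf%2Fnotice.pdf';

if (preg_match($regexp, $source, $match)) {
	// copie_locale() prefixes this path with the IMG directory
	echo urldecode($match[1]), "\n"; // pdf/notice.pdf
}
```

Because `https?:` replaces the scheme, links to the site are recognized whether they use `http` or `https`.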

/**
 * Checks that the URL of a remote document really is remote,
 * and not a localhost URL that could reveal information about the server
 * inspired by https://core.trac.wordpress.org/browser/trunk/src/wp-includes/http.php?rev=36435#L500
 *
 * @param string $url
 * @param array $known_hosts
 *   known and accepted external URLs/hosts
 * @return false|string
 *   the URL, or false on failure
 */
function valider_url_distante($url, $known_hosts = array()) {
	if (!function_exists('protocole_verifier')) {
		include_spip('inc/filtres_mini');
	}

	if (!protocole_verifier($url, array('http', 'https'))) {
		return false;
	}

	$parsed_url = parse_url($url);
	if (!$parsed_url or empty($parsed_url['host'])) {
		return false;
	}

	if (isset($parsed_url['user']) or isset($parsed_url['pass'])) {
		return false;
	}

	if (false !== strpbrk($parsed_url['host'], ':#?[]')) {
		return false;
	}

	if (!is_array($known_hosts)) {
		$known_hosts = array($known_hosts);
	}
	$known_hosts[] = $GLOBALS['meta']['adresse_site'];
	$known_hosts[] = self();
	$known_hosts = pipeline('declarer_hosts_distants', $known_hosts);


	$is_known_host = false;
	foreach ($known_hosts as $known_host) {
		$parse_known = parse_url($known_host);
		if ($parse_known
			and !empty($parse_known['host'])
			and strtolower($parse_known['host']) === strtolower($parsed_url['host'])) {
			$is_known_host = true;
			break;
		}
	}

	if (!$is_known_host) {
		$host = trim($parsed_url['host'], '.');
		if (preg_match('#^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$#', $host)) {
			$ip = $host;
		} else {
			$ip = gethostbyname($host);
			if ($ip === $host) {
				// Error condition for gethostbyname()
				$ip = false;
			}
		}
		if ($ip) {
			// refuse "this network" (0/8), loopback (127/8) and private ranges (10/8, 172.16/12, 192.168/16)
			$parts = array_map('intval', explode('.', $ip));
			if (127 === $parts[0] or 10 === $parts[0] or 0 === $parts[0]
				or (172 === $parts[0] and 16 <= $parts[1] and 31 >= $parts[1])
				or (192 === $parts[0] && 168 === $parts[1])
			) {
				return false;
			}
		}
	}

	if (empty($parsed_url['port'])) {
		return $url;
	}

	// parse_url() returns the port as an integer, so strict comparison is correct here
	$port = $parsed_url['port'];
	if ($port === 80 or $port === 443 or $port === 8080) {
		return $url;
	}

	if ($is_known_host) {
		foreach ($known_hosts as $known_host) {
			$parse_known = parse_url($known_host);
			if ($parse_known
				and !empty($parse_known['host'])
				and !empty($parse_known['port'])
				and strtolower($parse_known['host']) === strtolower($parsed_url['host'])
				and $parse_known['port'] == $port) {
				return $url;
			}
		}
	}

	return false;
}
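The port rules at the end of `valider_url_distante()` can be read in isolation:

- A URL with no explicit port is accepted.
- Ports 80, 443 and 8080 are accepted.
- Any other port is accepted only for a known host that declares the same host and port.

The sketch below is a minimal version of just that decision. `port_accepte()` is a hypothetical name. Known hosts are passed in directly instead of being collected through `self()` and the `declarer_hosts_distants` pipeline. The last assertion checks that `parse_url()` returns the port as an integer, which is what makes the strict comparisons valid.

```php
<?php

/**
 * Hypothetical extraction of the port check from valider_url_distante().
 * Returns true if the port of $url is acceptable.
 */
function port_accepte($url, array $known_hosts) {
	$parsed = parse_url($url);
	if (empty($parsed['port'])) {
		return true;
	}
	$port = $parsed['port']; // an int, as returned by parse_url()
	if ($port === 80 or $port === 443 or $port === 8080) {
		return true;
	}
	// other ports: only for a known host declared with the same host and port
	foreach ($known_hosts as $known_host) {
		$known = parse_url($known_host);
		if ($known
			and !empty($known['host'])
			and !empty($known['port'])
			and strtolower($known['host']) === strtolower($parsed['host'])
			and $known['port'] == $port) {
			return true;
		}
	}
	return false;
}
```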

/**
 * Prepares the data for a POST
 * if $donnees is a string
 *  - it is up to the sender to add the boundary, handle the Content-Type, etc.
 *  - line breaks are normalized to the right format
 *  - it is split into header/body (separated by an empty line)
 * if $donnees is an array
 *  - it is serialized into a string, with a boundary if needed or provided, and the right Content-Type
 *
 * @param string|array $donnees
 * @param string $boundary
 * @return array
 *   header, body
 */
function prepare_donnees_post($donnees, $boundary = '') {

	// let the function that requested the POST format its own data,
	// for a SOAP call for example;
	// the header is separated from the data by a double line break;
	// here all line breaks (\r\n, \r or \n) are converted to \r\n
	if (is_string($donnees) && strlen($donnees)) {
		$entete = '';
		// first turn every \r\n and \r into a plain \n
		$donnees = str_replace("\r\n", "\n", $donnees);
		$donnees = str_replace("\r", "\n", $donnees);
		// a double line break marks the end of the header and the start of the data
		$p = strpos($donnees, "\n\n");
		if ($p !== false) {
			$entete = str_replace("\n", "\r\n", substr($donnees, 0, $p + 1));
			$donnees = substr($donnees, $p + 2);
		}
		$chaine = str_replace("\n", "\r\n", $donnees);
	} else {
		/* automatic boundary */
		// with more than 500 bytes of data, switch to a multipart boundary
		if ($boundary === '' and is_array($donnees)) {
			$taille = 0;
			foreach ($donnees as $cle => $valeur) {
				if (is_array($valeur)) {
					foreach ($valeur as $val2) {
						$taille += strlen($val2);
					}
				} else {
					// should spip_strlen() from inc/charsets be used here?
					$taille += strlen($valeur);
				}
			}
			if ($taille > 500) {
				$boundary = substr(md5(rand() . 'spip'), 0, 8);
			}
		}

		if (is_string($boundary) and strlen($boundary)) {
			// build an HTTP string for a multipart POST with a boundary
			$entete = "Content-Type: multipart/form-data; boundary=$boundary\r\n";
			$chaine = '';
			if (is_array($donnees)) {
				foreach ($donnees as $cle => $valeur) {
					if (is_array($valeur)) {
						foreach ($valeur as $val2) {
							$chaine .= "\r\n--$boundary\r\n";
							$chaine .= "Content-Disposition: form-data; name=\"{$cle}[]\"\r\n";
							$chaine .= "\r\n";
							$chaine .= $val2;
						}
					} else {
						$chaine .= "\r\n--$boundary\r\n";
						$chaine .= "Content-Disposition: form-data; name=\"$cle\"\r\n";
						$chaine .= "\r\n";
						$chaine .= $valeur;
					}
				}
				$chaine .= "\r\n--$boundary\r\n";
			}
		} else {
			// build a simple URL-encoded HTTP string for a POST
			$entete = 'Content-Type: application/x-www-form-urlencoded' . "\r\n";
			$chaine = array();
			if (is_array($donnees)) {
				foreach ($donnees as $cle => $valeur) {
					if (is_array($valeur)) {
						foreach ($valeur as $val2) {
							$chaine[] = rawurlencode($cle) . '[]=' . rawurlencode($val2);
						}
					} else {
						$chaine[] = rawurlencode($cle) . '=' . rawurlencode($valeur);
					}
				}
				$chaine = implode('&', $chaine);
			} else {
				$chaine = $donnees;
			}
		}
	}

	return array($entete, $chaine);
}
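For array data above the 500-byte threshold, or when a boundary is given, `prepare_donnees_post()` emits a `multipart/form-data` body. The standalone sketch below builds the same parts for a fixed boundary so that the wire format is visible. `construire_multipart()` is a hypothetical name. There is one deliberate difference: the sketch ends the body with the close delimiter `--boundary--` required by RFC 2046, whereas `prepare_donnees_post()` ends with a plain `--boundary` delimiter line.

```php
<?php

/**
 * Hypothetical sketch: build a multipart/form-data body from an array,
 * using the same part layout as prepare_donnees_post().
 */
function construire_multipart(array $donnees, $boundary) {
	$chaine = '';
	foreach ($donnees as $cle => $valeur) {
		// array values are sent as repeated "name[]" fields
		$valeurs = is_array($valeur) ? $valeur : array($valeur);
		$nom = is_array($valeur) ? "{$cle}[]" : $cle;
		foreach ($valeurs as $val) {
			$chaine .= "\r\n--$boundary\r\n";
			$chaine .= "Content-Disposition: form-data; name=\"$nom\"\r\n";
			$chaine .= "\r\n";
			$chaine .= $val;
		}
	}
	// RFC 2046 close delimiter
	$chaine .= "\r\n--$boundary--\r\n";

	$entete = "Content-Type: multipart/form-data; boundary=$boundary\r\n";
	return array($entete, $chaine);
}
```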

/**
 * Converts a URL whose host is in UTF-8 to ASCII
 * Uses the library https://github.com/phlylabs/idna-convert/tree/v0.9.1
 * in its latest version compatible with all PHP 5 versions.
 * The PHP function idn_to_ascii depends on the php5-intl package and is rarely available.
 *
 * @param string $url_idn
 * @return string
 */
function url_to_ascii($url_idn) {

	if (($parts = parse_url($url_idn)) and !empty($parts['host'])) {
		$host = $parts['host'];
		if (!preg_match(',^[a-z0-9_\.\-]+$,i', $host)) {
			include_spip('inc/idna_convert.class');
			$IDN = new idna_convert();
			$host_ascii = $IDN->encode($host);
			$url_idn = explode($host, $url_idn, 2);
			$url_idn = implode($host_ascii, $url_idn);
		}
	}

	return $url_idn;
}
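`url_to_ascii()` converts only the host. It then splices the converted host back into the URL with `explode()`/`implode()`, on the first occurrence of the host. The sketch below performs the same splice but takes the converter as a callable, so the logic can be tested without the idna_convert class. By default it falls back to PHP's `idn_to_ascii()`, which requires the intl extension. The function name and the callable parameter are assumptions of this sketch.

```php
<?php

/**
 * Hypothetical variant of url_to_ascii() with an injectable host converter.
 */
function url_to_ascii_sketch($url, $convertir = null) {
	if ($convertir === null) {
		// default converter: requires the intl extension
		$convertir = function ($host) {
			return idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
		};
	}
	$parts = parse_url($url);
	if (!$parts or empty($parts['host'])) {
		return $url;
	}
	$host = $parts['host'];
	if (preg_match(',^[a-z0-9_\.\-]+$,i', $host)) {
		return $url; // host is already ASCII
	}
	$host_ascii = $convertir($host);
	if ($host_ascii === false) {
		return $url;
	}
	// replace only the first occurrence of the host, as the original does
	return implode($host_ascii, explode($host, $url, 2));
}
```

Splitting with a limit of 2 leaves any later occurrence of the host (in the query string, for instance) untouched.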

/**
 * Fetches the content of a URL,
 * transcoding it into the local charset if needed
 *
 * @uses init_http()
 * @uses recuperer_entetes_complets()
 * @uses recuperer_body()
 * @uses transcoder_page()
 *
 * @param string $url
 * @param array $options
 *   bool transcoder : true to transcode the page into the site's charset
 *   string methode : HTTP request method to use (HEAD, GET or POST)
 *   int taille_max : stop reading the content beyond this size (0 = headers only ==> HEAD request). Defaults to _INC_DISTANT_MAX_SIZE (2 MB), or _COPIE_LOCALE_MAX_SIZE (16 MB) when copying to a file
 *   string|array datas : data (array) and/or headers (string) to send (forces the POST method if the data is not empty)
 *   string boundary : boundary used to format array datas
 *   bool refuser_gz : force refusing compression (the case of spell-checking servers)
 *   int if_modified_since : a Unix timestamp; stop the retrieval if the remote page has not been modified since that date
 *   string uri_referer : to specify a different referer
 *   string file : name of the file to copy the content into
 *   int follow_location : number of redirects to follow (0 to follow none)
 *   string version_http : HTTP protocol version to use (defaults to the _INC_DISTANT_VERSION_HTTP constant)
 * @return array|false
 *   false on failure
 *   otherwise an array:
 *     int status : the HTTP status of the page
 *     string headers : the headers of the page
 *     string page : the content of the page (empty if copied to a file)
 *     int last_modified : timestamp of the last modification
 *     string location : redirect URL sent by the page
 *     string url : actual URL of the fetched page
 *     int length : size of the content or of the file
 *
 *     string file : file name, if saved to a file
 */
function recuperer_url($url, $options = array()) {
	$default = array(
		'transcoder' => false,
		'methode' => 'GET',
		'taille_max' => null,
		'datas' => '',
		'boundary' => '',
		'refuser_gz' => false,
		'if_modified_since' => '',
		'uri_referer' => '',
		'file' => '',
		'follow_location' => 10,
		'version_http' => _INC_DISTANT_VERSION_HTTP,
	);
	$options = array_merge($default, $options);
	// copy directly into a file?
	$copy = $options['file'];

	if ($options['methode'] == 'HEAD') {
		$options['taille_max'] = 0;
	}
	if (is_null($options['taille_max'])) {
		$options['taille_max'] = $copy ? _COPIE_LOCALE_MAX_SIZE : _INC_DISTANT_MAX_SIZE;
	}

	if (!empty($options['datas'])) {
		list($head, $postdata) = prepare_donnees_post($options['datas'], $options['boundary']);
		if (stripos($head, 'Content-Length:') === false) {
			$head .= 'Content-Length: ' . strlen($postdata);
		}
		$options['datas'] = $head . "\r\n\r\n" . $postdata;
		if (strlen($postdata)) {
			$options['methode'] = 'POST';
		}
	}

	// Accept feed:// URLs, URLs missing the http:// prefix, and protocol-relative URLs
	$url = preg_replace(',^feed://,i', 'http://', $url);
	if (!tester_url_absolue($url)) {
		$url = 'http://' . $url;
	} elseif (strncmp($url, '//', 2) == 0) {
		$url = 'http:' . $url;
	}

	$url = url_to_ascii($url);

	$result = array(
		'status' => 0,
		'headers' => '',
		'page' => '',
		'length' => 0,
		'last_modified' => '',
		'location' => '',
		'url' => $url
	);

	// when writing directly to a file, refuse gzip so as not to handle the content in memory
	$refuser_gz = (($options['refuser_gz'] or $copy) ? true : false);

	// open the connection and send the request with its headers
	list($handle, $fopen) = init_http(
		$options['methode'],
		$url,
		$refuser_gz,
		$options['uri_referer'],
		$options['datas'],
		$options['version_http'],
		$options['if_modified_since']
	);
	if (!$handle) {
		spip_log("ECHEC init_http $url");

		return false;
	}

	// Except with fopen, send the input stream
	// and read the response headers
	if (!$fopen) {
		$res = recuperer_entetes_complets($handle, $options['if_modified_since']);
		if (!$res) {
			fclose($handle);
			$t = @parse_url($url);
			$host = $t['host'];
			// Inexplicable quirk to work around
			// the freedom-restricting measures of the Middle Kingdom:
			// retry with a plain file_get_contents() when no proxy is needed
			if (!need_proxy($host)
				and $res = @file_get_contents($url)
			) {
				$result['length'] = strlen($res);
				if ($copy) {
					ecrire_fichier($copy, $res);
					$result['file'] = $copy;
				} else {
					$result['page'] = $res;
				}
				$res = array(
					'status' => 200,
				);
			} else {
				return false;
			}
		} elseif ($res['location'] and $options['follow_location']) {
			$options['follow_location']--;
			fclose($handle);
			include_spip('inc/filtres');
			$url = suivre_lien($url, $res['location']);
			spip_log("recuperer_url recommence sur $url");

			return recuperer_url($url, $options);
		} elseif ($res['status'] !== 200) {
			spip_log('HTTP status ' . $res['status'] . " pour $url");
		}
		$result['status'] = $res['status'];
		if (isset($res['headers'])) {
			$result['headers'] = $res['headers'];
		}
		if (isset($res['last_modified'])) {
			$result['last_modified'] = $res['last_modified'];
		}
		if (isset($res['location'])) {
			$result['location'] = $res['location'];
		}
	}

	// only the headers are wanted
	if (!$options['taille_max'] or $options['methode'] == 'HEAD' or $result['status'] == '304') {
		return $result;
	}


	// if the content must be decompressed, do it through a temporary file,
	// otherwise memory blows up on large streams

	$gz = false;
	if (preg_match(",\bContent-Encoding: .*gzip,is", $result['headers'])) {
		$gz = (_DIR_TMP . md5(uniqid(mt_rand())) . '.tmp.gz');
	}

	// unless the content was already fetched by the fallback method above
	if (!$result['length']) {
		$res = recuperer_body($handle, $options['taille_max'], $gz ? $gz : $copy);
		fclose($handle);
		if ($copy) {
			$result['length'] = $res;
			$result['file'] = $copy;
		} elseif ($res) {
			$result['page'] = &$res;
			$result['length'] = strlen($result['page']);
		}
	}
	if (!$result['page']) {
		return $result;
	}

	// Decompress if needed
	if ($gz) {
		$result['page'] = implode('', gzfile($gz));
		supprimer_fichier($gz);
	}

	// Should the page be converted to our local charset?
	if ($options['transcoder']) {
		include_spip('inc/charsets');
		$result['page'] = transcoder_page($result['page'], $result['headers']);
	}

	return $result;
}
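Before connecting, `recuperer_url()` normalizes the URL in three ways:

- `feed://` becomes `http://`.
- A URL without a scheme gets an `http://` prefix.
- A protocol-relative `//host/...` URL gets an `http:` prefix.

The sketch below applies those three rules on their own. `normaliser_url()` is a hypothetical name. The regular expression stands in for SPIP's `tester_url_absolue()`, on the assumption that this function treats protocol-relative URLs as absolute. The `elseif` branch in the original implies as much, since it would otherwise be unreachable for such URLs.

```php
<?php

/**
 * Hypothetical sketch of the URL normalization done by recuperer_url().
 */
function normaliser_url($url) {
	$url = preg_replace(',^feed://,i', 'http://', $url);
	// assumption: an optional scheme followed by "//" marks an absolute URL
	$absolue = (bool) preg_match(',^([a-z][a-z0-9+.-]*:)?//,i', $url);
	if (!$absolue) {
		$url = 'http://' . $url;
	} elseif (strncmp($url, '//', 2) == 0) {
		$url = 'http:' . $url;
	}
	return $url;
}
```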

/**
 * Fetches a URL unless it is already in a file cache
 *
 * The cache lifetime is given by the `delai_cache` option.
 * The other options and the return format are the same as for `recuperer_url`.
 * @uses recuperer_url()
 *
 * @param string $url
 * @param array $options
 *   int delai_cache : acceptable age of the content (in seconds)
 * @return array|bool
 */
function recuperer_url_cache($url, $options = array()) {
	if (!defined('_DELAI_RECUPERER_URL_CACHE')) {
		define('_DELAI_RECUPERER_URL_CACHE', 3600);
	}
	$default = array(
		'transcoder' => false,
		'methode' => 'GET',
		'taille_max' => null,
		'datas' => '',
		'boundary' => '',
		'refuser_gz' => false,
		'if_modified_since' => '',
		'uri_referer' => '',
		'file' => '',
		'follow_location' => 10,
		'version_http' => _INC_DISTANT_VERSION_HTTP,
		'delai_cache' => _DELAI_RECUPERER_URL_CACHE,
	);
	$options = array_merge($default, $options);

	// cases where caching is not possible
	if (!empty($options['datas']) or $options['methode'] == 'POST') {
		return recuperer_url($url, $options);
	}

	// do not retry the same failing URL several times (it is not cached)
	static $errors = array();
	if (isset($errors[$url])) {
		return $errors[$url];
	}

	$sig = $options;
	unset($sig['if_modified_since']);
	unset($sig['delai_cache']);
	$sig['url'] = $url;

	$dir = sous_repertoire(_DIR_CACHE, 'curl');
	$cache = md5(serialize($sig)) . '-' . substr(preg_replace(',\W+,', '_', $url), 0, 80);
	$sub = sous_repertoire($dir, substr($cache, 0, 2));
	$cache = "$sub$cache";

	$res = false;
	$is_cached = file_exists($cache);
	if ($is_cached
		and (filemtime($cache) > $_SERVER['REQUEST_TIME'] - $options['delai_cache'])
	) {
		lire_fichier($cache, $res);
		// should last_modified and status=304 be set here?
		$res = unserialize($res);
	}
	if (!$res) {
		$res = recuperer_url($url, $options);
		// do not reload this uncached URL within the same hit, since it is unavailable
		if (!$res) {
			if ($is_cached) {
				// the fetch failed but a cached copy exists: use it
				lire_fichier($cache, $res);
				$res = unserialize($res);
			}

			return $errors[$url] = $res;
		}
		ecrire_fichier($cache, serialize($res));
	}

	return $res;
}
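`recuperer_url_cache()` builds its cache key from the URL and every option except `if_modified_since` and `delai_cache`. It also shards the cache files into two-character subdirectories. The sketch below computes the same path without touching the filesystem, whereas the original creates the directories with `sous_repertoire()`. It also includes the freshness test. `chemin_cache_url()` and `cache_frais()` are hypothetical names, and `$dir` stands in for the `curl` subdirectory of `_DIR_CACHE` (assumed to end with a slash).

```php
<?php

/**
 * Hypothetical sketch: compute the cache file path used by recuperer_url_cache().
 */
function chemin_cache_url($url, array $options, $dir) {
	$sig = $options;
	// as in the original, these two options are left out of the key
	unset($sig['if_modified_since'], $sig['delai_cache']);
	$sig['url'] = $url;

	$cache = md5(serialize($sig)) . '-' . substr(preg_replace(',\W+,', '_', $url), 0, 80);
	// shard by the first two hex characters of the hash
	return $dir . substr($cache, 0, 2) . '/' . $cache;
}

/**
 * A cached entry is fresh if it was written less than $delai seconds before $now.
 */
function cache_frais($mtime, $now, $delai) {
	return $mtime > $now - $delai;
}
```

Since `delai_cache` is not part of the key, calls with different lifetimes share the same file and only disagree on whether it is still fresh.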

/**
 * Deprecated: fetches a page from the web and, if needed, encodes it in the local charset
 *
 * Follows page redirects (301) on the requested URL (at most 10 redirects)
 *
 * @deprecated
 * @uses recuperer_url()
 *
 * @param string $url
 *     URL of the page to fetch
 * @param bool|string $trans
 *     - long string: a file name (name of its local copy)
 *     - true: request encoding/charset conversion
 *     - null: return only the headers
 * @param bool $get_headers
 *     Whether to also return the headers
 * @param int|null $taille_max
 *     Stop reading the content beyond this size (0 = headers only ==> HEAD request).
 *     Defaults to _INC_DISTANT_MAX_SIZE (2 MB), or _COPIE_LOCALE_MAX_SIZE when copying to a file.
 * @param string|array $datas
 *     Data to POST
 * @param string $boundary
 *     To force sending with this method (multipart)
 * @param bool $refuser_gz
 *     Force refusing compression (the case of spell-checking servers)
 * @param string $date_verif
 *     A Unix timestamp; stop the retrieval if the remote page
 *     has not been modified since that date
 * @param string $uri_referer
 *     To specify a different referer
 * @return string|false
 *     - Content of the page obtained (with or without headers)
 *     - false if the page could not be fetched (status other than 200)
 **/
function recuperer_page(
	$url,
	$trans = false,
	$get_headers = false,
	$taille_max = null,
	$datas = '',
	$boundary = '',
	$refuser_gz = false,
	$date_verif = '',
	$uri_referer = ''
) {
	// $copy = copy to a file?
	$copy = (is_string($trans) and strlen($trans) > 5); // avoid "false" :-)

	if (!is_null($taille_max) and ($taille_max == 0)) {
		$get = 'HEAD';
	} else {
		$get = 'GET';
	}

	$options = array(
		'transcoder' => $trans === true,
		'methode' => $get,
		'datas' => $datas,
		'boundary' => $boundary,
		'refuser_gz' => $refuser_gz,
		'if_modified_since' => $date_verif,
		'uri_referer' => $uri_referer,
		'file' => $copy ? $trans : '',
		'follow_location' => 10,
	);
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}
	// at most ten attempts in case of 301 headers...
	$res = recuperer_url($url, $options);
	if (!$res) {
		return false;
	}
	if ($res['status'] !== 200) {
		return false;
	}
	if ($get_headers) {
		return $res['headers'] . "\n" . $res['page'];
	}

	return $res['page'];
}
731
732
733
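// Illustrative sketch (hypothetical helper, not part of the SPIP API): it
// isolates how recuperer_page() turns its legacy arguments into the options
// passed to recuperer_url(). A $taille_max of 0 (but not null) selects a HEAD
// request, a string $trans longer than 5 characters is treated as the path of
// a local copy, and only $trans === true enables transcoding. The name
// exemple_options_recuperer_page() is an assumption made for illustration.
function exemple_options_recuperer_page($trans = false, $taille_max = null) {
	// same test as recuperer_page(): a "long" string is a file name
	$copy = (is_string($trans) and strlen($trans) > 5);
	$options = array(
		'transcoder' => $trans === true,
		// 0 means "headers only"; null keeps recuperer_url()'s default size
		'methode' => (!is_null($taille_max) and $taille_max == 0) ? 'HEAD' : 'GET',
		'file' => $copy ? $trans : '',
		'follow_location' => 10,
	);
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}

	return $options;
}

// Usage: exemple_options_recuperer_page(false, 0) yields 'methode' => 'HEAD',
// while exemple_options_recuperer_page('tmp/copie.html') keeps GET and sets
// 'file' => 'tmp/copie.html'.
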
/**
 * Obsolete: fetch a page from the web and, if needed, convert it to the local charset
 *
 * @deprecated
 *
 * @uses recuperer_url()
 *
 * @param string $url
 *     URL of the page to fetch
 * @param bool|null|string $trans
 *     - long string: a file name (name of its local copy)
 *     - true: request encoding/charset conversion
 *     - null: return only the headers
 * @param string $get
 *     HTTP request type to send (HEAD, GET or POST)
 * @param int|null $taille_max
 *     Stop reading the content beyond this size (0 = headers only).
 *     Defaults to 1 MB.
 * @param string|array $datas
 *     Data to send as a POST
 * @param bool $refuser_gz
 *     To force refusing compression
 * @param string $date_verif
 *     A Unix timestamp: stop the retrieval if the remote page
 *     has not been modified since that date
 * @param string $uri_referer
 *     To specify a different referer
 * @return array|bool
 *     - An array (headers, body) if the status is 200,
 *     - false otherwise (redirects such as 301 are not followed)
 **/
function recuperer_lapage(
	$url,
	$trans = false,
	$get = 'GET',
	$taille_max = 1048576,
	$datas = '',
	$refuser_gz = false,
	$date_verif = '',
	$uri_referer = ''
) {
	// $copy = should the content be copied to a file?
	$copy = (is_string($trans) and strlen($trans) > 5); // avoid "false" :-)

	// when writing directly to a file, refuse gzip
	// so that the content does not have to be handled in memory
	if ($copy) {
		$refuser_gz = true;
	}

	$options = array(
		'transcoder' => $trans === true,
		'methode' => $get,
		'datas' => $datas,
		'refuser_gz' => $refuser_gz,
		'if_modified_since' => $date_verif,
		'uri_referer' => $uri_referer,
		'file' => $copy ? $trans : '',
		'follow_location' => false,
	);
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}
	// redirects are not followed here ('follow_location' => false)
	$res = recuperer_url($url, $options);

	if (!$res) {
		return false;
	}
	if ($res['status'] !== 200) {
		return false;
	}

	return array($res['headers'], $res['page']);
}

/**
 * Read the content pointed to by the resource passed as argument,
 * i.e. the body of a URL whose headers have already been read.
 *
 * $taille_max truncates the content; since the stream is read in 16 KB
 * chunks, up to one extra chunk may be read beyond the limit.
 *
 * @param resource $handle
 * @param int $taille_max
 * @param string $fichier
 *   file into which the content of the resource is copied
 * @return bool|int|string
 *   bool false on failure
 *   int size of the file if the $fichier argument is given
 *   string content of the resource otherwise
 */
function recuperer_body($handle, $taille_max = _INC_DISTANT_MAX_SIZE, $fichier = '') {
	$taille = 0;
	$result = '';
	$fp = false;
	$tmpfile = '';
	if ($fichier) {
		include_spip('inc/acces');
		$tmpfile = "$fichier." . creer_uniqid() . '.tmp';
		$fp = spip_fopen_lock($tmpfile, 'w', LOCK_EX);
		if (!$fp and file_exists($fichier)) {
			return filesize($fichier);
		}
		if (!$fp) {
			return false;
		}
		$result = 0; // return the size of the file
	}
	while (!feof($handle) and $taille < $taille_max) {
		$res = fread($handle, 16384);
		if ($res === false) {
			break; // read error: stop instead of looping forever
		}
		$taille += strlen($res);
		if ($fp) {
			fwrite($fp, $res);
			$result = $taille;
		} else {
			$result .= $res;
		}
	}
	if ($fp) {
		spip_fclose_unlock($fp);
		spip_unlink($fichier);
		// a failed rename is caught by the file_exists() check below
		@rename($tmpfile, $fichier);
		if (!file_exists($fichier)) {
			return false;
		}
	}

	return $result;
}

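// Illustrative sketch (hypothetical helper, standalone, no SPIP functions):
// the in-memory branch of the loop in recuperer_body(). Reading stops once at
// least $taille_max bytes have been read, so the result can exceed the limit
// by up to one chunk. The chunk size is a parameter here (an assumption, the
// original hard-codes 16384) so the overshoot is easy to observe; this sketch
// also stops on a read error.
function exemple_lire_corps($handle, $taille_max, $bloc = 16384) {
	$taille = 0;
	$result = '';
	while (!feof($handle) and $taille < $taille_max) {
		$res = fread($handle, $bloc);
		if ($res === false) {
			break; // read error
		}
		$taille += strlen($res);
		$result .= $res;
	}

	return $result;
}

// Usage with an in-memory stream standing in for the HTTP socket:
//   $h = fopen('php://memory', 'r+');
//   fwrite($h, str_repeat('x', 100));
//   rewind($h);
//   exemple_lire_corps($h, 30, 16); // two 16-byte chunks, i.e. 32 bytes
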
/**
 * Read the HTTP response headers from the socket $handle and return
 * false on failure, or on success an associative array containing:
 * - the status
 * - the full headers
 * - the last modification date, if known
 * - the redirect URL, if specified
 *
 * @param resource $handle
 * @param int|bool $if_modified_since
 * @return array|bool
 *   int status
 *   string headers
 *   int last_modified
 *   string location
 */
function recuperer_entetes_complets($handle, $if_modified_since = false) {
	$result = array('status' => 0, 'headers' => array(), 'last_modified' => 0, 'location' => '');

	$s = @trim(fgets($handle, 16384));
	if (!preg_match(',^HTTP/[0-9]+\.[0-9]+ ([0-9]+),', $s, $r)) {
		return false;
	}
	$result['status'] = intval($r[1]);
	while ($s = trim(fgets($handle, 16384))) {
		$result['headers'][] = $s . "\n";
		// ignore lines that are not of the form "Name: value"
		if (!preg_match(',^([^:]*): *(.*)$,i', $s, $r)) {
			continue;
		}
		list(, $d, $v) = $r;
		// HTTP header names are case-insensitive
		$d = strtolower(trim($d));
		if ($d == 'location' and $result['status'] >= 300 and $result['status'] < 400) {
			$result['location'] = $v;
		} elseif ($d == 'last-modified') {
			$result['last_modified'] = strtotime($v);
		}
	}
	// report 304 Not Modified when the remote page was last modified
	// before $if_modified_since
	if ($if_modified_since
		and $result['last_modified']
		and $if_modified_since > $result['last_modified']
		and $result['status'] == 200
	) {
		$result['status'] = 304;
	}

	$result['headers'] = implode('', $result['headers']);

	return $result;
}

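// Illustrative sketch (hypothetical helper, standalone): the parsing done by
// recuperer_entetes_complets(), applied to a raw header block given as a
// string instead of a socket. In this sketch header names are compared
// case-insensitively, as HTTP requires, and Location is only kept for 3xx
// statuses. The name exemple_analyser_entetes() is an assumption.
function exemple_analyser_entetes($brut) {
	$lignes = preg_split("/\r?\n/", $brut);
	if (!preg_match(',^HTTP/[0-9]+\.[0-9]+ ([0-9]+),', trim(array_shift($lignes)), $r)) {
		return false;
	}
	$result = array('status' => intval($r[1]), 'location' => '', 'last_modified' => 0);
	foreach ($lignes as $ligne) {
		$ligne = trim($ligne);
		if ($ligne === '') {
			break; // an empty line ends the headers
		}
		if (!preg_match(',^([^:]*): *(.*)$,', $ligne, $r)) {
			continue; // not a "Name: value" line
		}
		$nom = strtolower(trim($r[1]));
		if ($nom == 'location' and $result['status'] >= 300 and $result['status'] < 400) {
			$result['location'] = $r[2];
		} elseif ($nom == 'last-modified') {
			$result['last_modified'] = strtotime($r[2]);
		}
	}

	return $result;
}

// Usage: exemple_analyser_entetes("HTTP/1.1 301 Moved\r\nLocation: https://example.org/\r\n\r\n")
// gives status 301 and location 'https://example.org/'.
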
/**
 * Obsolete: simplified version of recuperer_entetes_complets()
 * Return the HTTP header information of a socket
 *
 * Read the HTTP response headers from the socket $f
 *
 * @uses recuperer_entetes_complets()
 * @deprecated
 *
 * @param resource $f
 *     Socket of a file (as returned by fopen)
 * @param int|string $date_verif
 *     Date to test the last modification against
 * @return string|int|array|bool
 *     - the (string) value of the Location header if one was found
 *     - the (numeric) status if it differs from 200, e.g. 304 Not Modified
 *     - the array of headers in all other cases
 *     - false if the headers could not be read
 **/
function recuperer_entetes($f, $date_verif = '') {
	// Handles the case where the remote page has not changed
	// since the last visit
	$res = recuperer_entetes_complets($f, $date_verif);
	if (!$res) {
		return false;
	}
	if ($res['location']) {
		return $res['location'];
	}
	if ($res['status'] != 200) {
		return $res['status'];
	}

	return explode("\n", $res['headers']);
}

/**
 * Compute the canonical name of the local copy of a remote file
 *
 * If local copies of remote files have to be kept, they might as well
 * be stored in a canonical location.
 *
 * @note
 *   A bijective mapping would be even better, but given the limitations
 *   of filesystems no suitable scheme comes to mind right now.
 *
 * @param string $source
 *     URL of the source
 * @param string $extension
 *     File extension
 * @return string
 *     File name for the local copy
 **/
function nom_fichier_copie_locale($source, $extension) {
	include_spip('inc/documents');

	$d = creer_repertoire_documents('distant'); # IMG/distant/
	$d = sous_repertoire($d, $extension); # IMG/distant/pdf/

	// always act as if we were at the site root
	if (_DIR_RACINE) {
		$d = preg_replace(',^' . preg_quote(_DIR_RACINE, ',') . ',', '', $d);
	}

	$m = md5($source);

	return $d
	. substr(preg_replace(',[^\w-],', '', basename($source)) . '-' . $m, 0, 12)
	. substr($m, 0, 4)
	. ".$extension";
}

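// Illustrative sketch (hypothetical helper, standalone): the file-name part
// computed by nom_fichier_copie_locale(), without the IMG/distant/<extension>/
// directory. The first 12 characters come from the basename stripped of
// non-word characters (hyphens kept), followed by "-" and the md5 of the URL;
// then the first 4 hex characters of the md5 are appended. URLs sharing a
// long basename are therefore told apart only by those 4 characters.
function exemple_nom_copie_locale($source, $extension) {
	$m = md5($source);

	return substr(preg_replace(',[^\w-],', '', basename($source)) . '-' . $m, 0, 12)
	. substr($m, 0, 4)
	. ".$extension";
}

// Usage: exemple_nom_copie_locale('https://example.org/docs/rapport-final.pdf', 'pdf')
// starts with 'rapport-fina' (the '.' is stripped and the prefix is cut at 12 characters).
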
/**
 * Give the name of the local copy of the source
 *
 * Either get the file extension directly from the source URL,
 * or try to compute it.
 *
 * @uses nom_fichier_copie_locale()
 * @uses recuperer_infos_distantes()
 *
 * @param string $source
 *      URL of the remote source
 * @return string|null
 *      Computed file name, or null if no local copy can be made
 **/
function fichier_copie_locale($source) {
	// If it is already local, nothing to worry about
	if (!tester_url_absolue($source)) {
		if (_DIR_RACINE) {
			$source = preg_replace(',^' . preg_quote(_DIR_RACINE, ',') . ',', '', $source);
		}

		return $source;
	}

	// optimization: see whether the extension can be guessed from the URL and
	// whether the file has already been copied locally with that extension;
	// in that case it is reliable and there is no need to query the database
	$path_parts = pathinfo($source);
	if (!isset($path_parts['extension'])) {
		$path_parts['extension'] = '';
	}
	$ext = $path_parts ? $path_parts['extension'] : '';
	if ($ext
		and preg_match(',^\w+$,', $ext) // no php?truc=1&...
		and $f = nom_fichier_copie_locale($source, $ext)
		and file_exists(_DIR_RACINE . $f)
	) {
		return $f;
	}

	// If it is already in the documents table,
	// return the name of its potential copy
	$ext = sql_getfetsel('extension', 'spip_documents', 'fichier=' . sql_quote($source) . " AND distant='oui' AND extension <> ''");

	if ($ext) {
		return nom_fichier_copie_locale($source, $ext);
	}

	// check whether the extension given in the file name is valid
	// and whether the file has not already been fetched
	$ext = $path_parts ? $path_parts['extension'] : '';

	if ($ext and sql_getfetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($ext))) {
		$f = nom_fichier_copie_locale($source, $ext);
		if (file_exists(_DIR_RACINE . $f)) {
			return $f;
		}
	}

	// Ping the source to see whether its extension is known and allowed,
	// caching the result of the ping
	$cache = sous_repertoire(_DIR_CACHE, 'rid') . md5($source);
	if (!@file_exists($cache)
		or !$path_parts = @unserialize(spip_file_get_contents($cache))
		or _request('var_mode') == 'recalcul'
	) {
		$path_parts = recuperer_infos_distantes($source, 0, false);
		ecrire_fichier($cache, serialize($path_parts));
	}
	$ext = !empty($path_parts['extension']) ? $path_parts['extension'] : '';
	if ($ext and sql_getfetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($ext))) {
		return nom_fichier_copie_locale($source, $ext);
	}
	spip_log("pas de copie locale pour $source");

	return null;
}

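// Illustrative sketch (hypothetical helper, standalone): the first,
// database-free step of fichier_copie_locale(). The extension returned by
// pathinfo() is only trusted when it is made of word characters, which
// rejects URLs such as "page.php?truc=1&..." whose "extension" swallows the
// query string. Beware that pathinfo() on a bare "https://example.org/"
// reports "org" as the extension, so this test alone is not sufficient.
function exemple_extension_fiable($source) {
	$path_parts = pathinfo($source);
	$ext = isset($path_parts['extension']) ? $path_parts['extension'] : '';

	return ($ext and preg_match(',^\w+$,', $ext)) ? $ext : '';
}

// Usage: exemple_extension_fiable('https://example.org/a/photo.jpg') returns 'jpg',
// exemple_extension_fiable('https://example.org/page.php?truc=1&b=2') returns ''.
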
/**
 * Fetch the information of a remote document without downloading too much of it
 *
 * @param string $source
 *     URL of the source
 * @param int $max
 *     Maximum size of the file to download
 * @param bool $charger_si_petite_image
 *     Whether to download the document if it is a small image
 * @return array|bool
 *     false on failure, otherwise an array whose keys are among:
 *
 *     - 'body' = string
 *     - 'type_image' = bool
 *     - 'titre' = string
 *     - 'largeur' = int
 *     - 'hauteur' = int
 *     - 'taille' = int
 *     - 'extension' = string
 *     - 'fichier' = string
 *     - 'mime_type' = string
 **/
function recuperer_infos_distantes($source, $max = 0, $charger_si_petite_image = true) {

	// no point wasting time
	if (!tester_url_absolue($source)) {
		return false;
	}

	# load the mime type aliases
	include_spip('base/typedoc');

	$a = array();
	$mime_type = '';
	// Directly load the beginning of images and HTML files, so as to
	// catch as much information as possible (title, size, etc.). If that
	// fails, the user will have to enter it...
	if ($headers = recuperer_page($source, false, true, $max, '', '', true)) {
		list($headers, $a['body']) = preg_split(',\n\n,', $headers, 2);

		$mime_type = ''; // unknown
		if (preg_match(",\nContent-Type: *([^[:space:];]*),i", "\n$headers", $regs)) {
			$mime_type = trim($regs[1]);
		}

		// Apply the aliases
		while (isset($GLOBALS['mime_alias'][$mime_type])) {
			$mime_type = $GLOBALS['mime_alias'][$mime_type];
		}

		// If the mime type is meaningless (text/plain,
		// application/octet-stream or empty), the server may not know
		// what it is serving; try to detect the type from the URL
		// extension or from the Content-Disposition: attachment; filename=...
		$t = null;
		if (in_array($mime_type, array('text/plain', '', 'application/octet-stream'))) {
			if (!$t
				and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)
			) {
				$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
			}
			if (!$t
				and preg_match(',^Content-Disposition:\s*attachment;\s*filename=(.*)$,Uims', $headers, $m)
				and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $m[1], $rext)
			) {
				$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
			}
		}

		// Other mime type (or text/plain with an unknown file extension)
		if (!$t) {
			$t = sql_fetsel('extension', 'spip_types_documents', 'mime_type=' . sql_quote($mime_type));
		}

		// Still nothing? (e.g. audio/x-ogg instead of application/ogg)
		// Try again with the extension
		if (!$t
			and $mime_type != 'text/plain'
			and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)
		) {
			# avoid xxx.3 => 3gp (> SPIP 3)
			$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
		}

		if (!empty($t)) {
			spip_log("mime-type $mime_type ok, extension " . $t['extension']);
			$a['extension'] = $t['extension'];
		} else {
			# fall back to '.bin' by default, if it is allowed
			spip_log("mime-type $mime_type inconnu");
			$t = sql_fetsel('extension', 'spip_types_documents', "extension='bin'");
			if (empty($t)) {
				return false;
			}
			$a['extension'] = $t['extension'];
		}

		if (preg_match(",\nContent-Length: *([^[:space:]]*),i", "\n$headers", $regs)) {
			$a['taille'] = intval($regs[1]);
		}
	}

	// HEAD failed, try with GET
	if (!$a and !$max) {
		spip_log("tenter GET $source");
		$a = recuperer_infos_distantes($source, _INC_DISTANT_MAX_SIZE);
	}

	// if nothing was found, no point insisting
	if (!$a) {
		return false;
	}

	// If it is a not-too-big image or an HTML file, fetch the document
	// again with GET to collect additional data...
	if (preg_match(',^image/(jpeg|gif|png|swf),', $mime_type)) {
		if ($max == 0
			and (empty($a['taille']) or $a['taille'] < _INC_DISTANT_MAX_SIZE)
			and isset($GLOBALS['meta']['formats_graphiques'])
			and (strpos($GLOBALS['meta']['formats_graphiques'], $a['extension']) !== false)
			and $charger_si_petite_image
		) {
			$a = recuperer_infos_distantes($source, _INC_DISTANT_MAX_SIZE);
		} else {
			if (!empty($a['body'])) {
				$a['fichier'] = _DIR_RACINE . nom_fichier_copie_locale($source, $a['extension']);
				ecrire_fichier($a['fichier'], $a['body']);
				// getimagesize() returns false for unreadable images
				$size_image = @getimagesize($a['fichier']);
				$a['largeur'] = $size_image ? intval($size_image[0]) : 0;
				$a['hauteur'] = $size_image ? intval($size_image[1]) : 0;
				$a['type_image'] = true;
			}
		}
	}

	// swf file: if the size is unknown, default to 425x350,
	// which is better than 0x0
	if ($a and isset($a['extension']) and $a['extension'] == 'swf'
		and empty($a['largeur'])
	) {
		$a['largeur'] = 425;
		$a['hauteur'] = 350;
	}

	if ($mime_type == 'text/html') {
		include_spip('inc/filtres');
		$page = recuperer_page($source, true, false, _INC_DISTANT_MAX_SIZE);
		if (preg_match(',<title>(.*?)</title>,ims', $page, $regs)) {
			$a['titre'] = corriger_caracteres(trim($regs[1]));
		}
		if (!isset($a['taille']) or !$a['taille']) {
			$a['taille'] = strlen($page); # approximately
		}
	}
	$a['mime_type'] = $mime_type;

	return $a;
}

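// Illustrative sketch (hypothetical helper, standalone): how
// recuperer_infos_distantes() reads the MIME type from a raw header block,
// follows the alias table and, when the type is uninformative (empty,
// text/plain or application/octet-stream), looks for an extension in the URL
// and then in a Content-Disposition filename. The spip_types_documents
// lookups are left out, so here the first extension found wins, whereas the
// original falls through to Content-Disposition when the URL extension is
// unknown to the database. $alias stands in for $GLOBALS['mime_alias'].
function exemple_type_et_extension($source, $headers, $alias = array()) {
	$mime_type = '';
	if (preg_match(",\nContent-Type: *([^[:space:];]*),i", "\n$headers", $regs)) {
		$mime_type = trim($regs[1]);
	}
	while (isset($alias[$mime_type])) {
		$mime_type = $alias[$mime_type];
	}
	$ext = '';
	if (in_array($mime_type, array('text/plain', '', 'application/octet-stream'))) {
		if (preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)) {
			$ext = $rext[1];
		} elseif (preg_match(',^Content-Disposition:\s*attachment;\s*filename=(.*)$,Uims', $headers, $m)
			and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $m[1], $rext)
		) {
			$ext = $rext[1];
		}
	}

	return array($mime_type, $ext);
}

// Usage: a download URL without an extension, served as octet-stream with
// "Content-Disposition: attachment; filename=rapport.pdf", yields 'pdf'.
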
/**
 * Test whether a host can be fetched directly or must go through a proxy
 *
 * The proxy and the list of excluded hosts can be passed as parameters,
 * for testing purposes during configuration.
 *
 * @param string $host
 * @param string|null $http_proxy
 *     Proxy URL; defaults to the 'http_proxy' configuration meta
 * @param string|null $http_noproxy
 *     Space-separated list of hosts or domains (with a leading dot,
 *     e.g. ".example.org") that bypass the proxy; defaults to the
 *     'http_noproxy' configuration meta
 * @return string
 *     The proxy to use, or an empty string if the host can be fetched directly
 */
function need_proxy($host, $http_proxy = null, $http_noproxy = null) {
	if (is_null($http_proxy)) {
		$http_proxy = isset($GLOBALS['meta']['http_proxy']) ? $GLOBALS['meta']['http_proxy'] : null;
	}
	if (is_null($http_noproxy)) {
		$http_noproxy = isset($GLOBALS['meta']['http_noproxy']) ? $GLOBALS['meta']['http_noproxy'] : null;
	}

	$domain = substr($host, strpos($host, '.'));

	return ($http_proxy
		and (strpos(" $http_noproxy ", " $host ") === false
			and (strpos(" $http_noproxy ", " $domain ") === false)))
		? $http_proxy : '';
}

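// Illustrative sketch (hypothetical helper, standalone): the decision made by
// need_proxy(), with the proxy settings passed explicitly instead of being
// read from $GLOBALS['meta']. An entry of the space-separated $http_noproxy
// list matches either the exact host or its domain taken from the first dot
// onwards, so "www.example.org" is excluded by ".example.org".
function exemple_proxy_pour_hote($host, $http_proxy, $http_noproxy) {
	$domain = substr($host, strpos($host, '.'));
	$exclu = (strpos(" $http_noproxy ", " $host ") !== false
		or strpos(" $http_noproxy ", " $domain ") !== false);

	return ($http_proxy and !$exclu) ? $http_proxy : '';
}

// Usage: exemple_proxy_pour_hote('www.example.org', 'http://proxy.local:3128', '.example.org')
// returns '' (direct connection), while an unrelated noproxy list returns the proxy URL.
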
/**
 * Initialize an HTTP request with headers
 *
 * Split the URL into scheme + host + path + port and send the request.
 * Return the descriptor from which the response can be read.
 *
 * @uses lance_requete()
 *
 * @param string $method
 *   HEAD, GET, POST
 * @param string $url
 * @param bool $refuse_gz
 * @param string $referer
 * @param string $datas
 * @param string $vers
 * @param string $date
 * @return array
 *   array($f, $fopen): the descriptor (false on total failure) and
 *   whether it was opened with a plain fopen()
 */
function init_http($method, $url, $refuse_gz = false, $referer = '', $datas = '', $vers = 'HTTP/1.0', $date = '') {
	$user = '';
	$fopen = false;

	$t = @parse_url($url);
	$host = $t['host'];
	if ($t['scheme'] == 'http') {
		$scheme = 'http';
		$noproxy = '';
	} elseif ($t['scheme'] == 'https') {
		$scheme = 'ssl';
		$noproxy = 'ssl://';
		// https defaults to port 443
		if (empty($t['port'])) {
			$t['port'] = 443;
		}
	} else {
		$scheme = $t['scheme'];
		$noproxy = $scheme . '://';
	}
	if (isset($t['user'])) {
		$user = array($t['user'], isset($t['pass']) ? $t['pass'] : '');
	}

	$port = !empty($t['port']) ? $t['port'] : 80;
	$path = !empty($t['path']) ? $t['path'] : '/';

	if (!empty($t['query'])) {
		$path .= '?' . $t['query'];
	}

	$f = lance_requete($method, $scheme, $user, $host, $path, $port, $noproxy, $refuse_gz, $referer, $datas, $vers, $date);
	if (!$f or !is_resource($f)) {
		// fallback: plain fopen(), unless lance_requete() timed out,
		// which is signalled by $f === 110
		if ($f !== 110
			and !need_proxy($host)
			and !_request('tester_proxy')
			and (!isset($GLOBALS['inc_distant_allow_fopen']) or $GLOBALS['inc_distant_allow_fopen'])
		) {
			$f = @fopen($url, 'rb');
			spip_log("connexion vers $url par simple fopen");
			$fopen = true;
		} else {
			// total failure
			$f = false;
		}
	}

	return array($f, $fopen);
}

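// Illustrative sketch (hypothetical helper, standalone): the URL
// decomposition performed by init_http() before it calls lance_requete().
// https is mapped to the 'ssl' transport with port 443 by default, other
// schemes default to port 80, and the query string is appended to the path.
// Returning false for a URL without scheme or host is an assumption of this
// sketch; init_http() itself does not check.
function exemple_decomposer_url($url) {
	$t = @parse_url($url);
	if (!$t or empty($t['host']) or empty($t['scheme'])) {
		return false;
	}
	$scheme = ($t['scheme'] == 'https') ? 'ssl' : $t['scheme'];
	$port = !empty($t['port']) ? $t['port'] : ($t['scheme'] == 'https' ? 443 : 80);
	$path = !empty($t['path']) ? $t['path'] : '/';
	if (!empty($t['query'])) {
		$path .= '?' . $t['query'];
	}

	return array('scheme' => $scheme, 'host' => $t['host'], 'port' => $port, 'path' => $path);
}

// Usage: exemple_decomposer_url('http://example.org:8080/a/b?x=1') gives
// scheme 'http', port 8080 and path '/a/b?x=1'.
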
/**
1327
 * Lancer la requete proprement dite
1328
 *
1329
 * @param string $method
1330
 *   type de la requete (GET, HEAD, POST...)
1331
 * @param string $scheme
1332
 *   protocole (http, tls, ftp...)
1333
 * @param array $user
1334
 *   couple (utilisateur, mot de passe) en cas d'authentification http
1335
 * @param string $host
1336
 *   nom de domaine
1337
 * @param string $path
1338
 *   chemin de la page cherchee
1339
 * @param string $port
1340
 *   port utilise pour la connexion
1341
 * @param bool $noproxy
1342
 *   protocole utilise si requete sans proxy
1343
 * @param bool $refuse_gz
1344
 *   refuser la compression GZ
1345
 * @param string $referer
1346
 *   referer
1347
 * @param string $datas
1348
 *   donnees postees
1349
 * @param string $vers
1350
 *   version HTTP
1351
 * @param int|string $date
1352
 *   timestamp pour entente If-Modified-Since
1353
 * @return bool|resource
1354
 *   false|int si echec
1355
 *   resource socket vers l'url demandee
1356
 */
function lance_requete(
	$method,
	$scheme,
	$user,
	$host,
	$path,
	$port,
	$noproxy,
	$refuse_gz = false,
	$referer = '',
	$datas = '',
	$vers = 'HTTP/1.0',
	$date = ''
) {

	$proxy_user = '';
	$http_proxy = need_proxy($host);
	if (!empty($user)) {
		// "user:password", both URL-encoded, reused below for the Authorization header
		$user = urlencode($user[0]) . ':' . urlencode($user[1]);
	}

	$connect = '';
	if ($http_proxy) {
		if (defined('_PROXY_HTTPS_VIA_CONNECT') and in_array($scheme, array('tls', 'ssl'))) {
			// HTTPS through the proxy: open a tunnel with a CONNECT request
			$path_host = (!$user ? '' : "$user@") . $host . (($port != 80) ? ":$port" : '');
			$connect = 'CONNECT ' . $path_host . " $vers\r\n"
				. "Host: $path_host\r\n"
				. "Proxy-Connection: Keep-Alive\r\n";
		} else {
			// plain proxying: the request line carries the absolute URL
			$path = (in_array($scheme, array('tls', 'ssl')) ? 'https://' : "$scheme://")
				. (!$user ? '' : "$user@")
				. "$host" . (($port != 80) ? ":$port" : '') . $path;
		}
		$t2 = @parse_url($http_proxy);
		$first_host = $t2['host'];
		// from here on, $port is the proxy port (80 if the proxy URL has none)
		$port = !empty($t2['port']) ? $t2['port'] : 80;
		if (!empty($t2['user'])) {
			$proxy_user = base64_encode($t2['user'] . ':' . (isset($t2['pass']) ? $t2['pass'] : ''));
		}
	} else {
		$first_host = $noproxy . $host;
	}

	if ($connect) {
		// TLS options for the tunnelled connection (certificate verification is disabled)
		$streamContext = stream_context_create(array(
			'ssl' => array(
				'verify_peer' => false,
				'allow_self_signed' => true,
				'SNI_enabled' => true,
				'peer_name' => $host,
			)
		));
		if (version_compare(phpversion(), '5.6', '<')) {
			// before PHP 5.6, the SNI host name is set with SNI_server_name instead of peer_name
			stream_context_set_option($streamContext, 'ssl', 'SNI_server_name', $host);
		}
		$f = @stream_socket_client(
			"tcp://$first_host:$port",
			$errno,
			$errstr,
			_INC_DISTANT_CONNECT_TIMEOUT,
			STREAM_CLIENT_CONNECT,
			$streamContext
		);
		spip_log("Recuperer $path sur $first_host:$port par $f (via CONNECT)", 'connect');
		if (!$f) {
			spip_log("Erreur connexion $errno $errstr", _LOG_ERREUR);
			return $errno;
		}
		stream_set_timeout($f, _INC_DISTANT_CONNECT_TIMEOUT);

		fputs($f, $connect);
		fputs($f, "\r\n");
		$res = fread($f, 1024);
		// expect a status line such as "HTTP/1.0 200 Connection established";
		// explode() always returns at least one element, so check that a status code is present
		if (!$res
			or count($res = explode(' ', $res)) < 2
			or $res[1] !== '200'
		) {
			spip_log("Echec CONNECT sur $first_host:$port", 'connect' . _LOG_INFO_IMPORTANTE);
			fclose($f);

			return false;
		}
		// important: otherwise we read too early and the data is not yet available
		stream_set_blocking($f, true);
		// perform the TLS handshake; give up if it fails instead of sending the request in clear
		if (!stream_socket_enable_crypto($f, true, STREAM_CRYPTO_METHOD_SSLv23_CLIENT)) {
			spip_log("Echec CONNECT sur $first_host:$port", 'connect' . _LOG_INFO_IMPORTANTE);
			fclose($f);

			return false;
		}
		spip_log("OK CONNECT sur $first_host:$port", 'connect');
	} else {
		// direct connection: retry up to 3 more times, 1 s apart, unless the failure is a timeout (errno 110).
		// sleep() returns 0 on success, so its result is compared to 0: a bare sleep(1)
		// in the condition would be falsy and end the loop after the first failure.
		$ntry = 3;
		do {
			$f = @fsockopen($first_host, $port, $errno, $errstr, _INC_DISTANT_CONNECT_TIMEOUT);
		} while (!$f and $ntry-- and $errno !== 110 and sleep(1) === 0);
		spip_log("Recuperer $path sur $first_host:$port par $f");
		if (!$f) {
			spip_log("Erreur connexion $errno $errstr", _LOG_ERREUR);

			return $errno;
		}
		stream_set_timeout($f, _INC_DISTANT_CONNECT_TIMEOUT);
	}

	$site = isset($GLOBALS['meta']['adresse_site']) ? $GLOBALS['meta']['adresse_site'] : '';

	// request line and headers; the header block is terminated below by $datas or a bare CRLF
	$req = "$method $path $vers\r\n"
		. "Host: $host\r\n"
		. 'User-Agent: ' . _INC_DISTANT_USER_AGENT . "\r\n"
		. ($refuse_gz ? '' : ('Accept-Encoding: ' . _INC_DISTANT_CONTENT_ENCODING . "\r\n"))
		. (!$site ? '' : "Referer: $site/$referer\r\n")
		. (!$date ? '' : 'If-Modified-Since: ' . (gmdate('D, d M Y H:i:s', $date) . " GMT\r\n"))
		. (!$user ? '' : ('Authorization: Basic ' . base64_encode($user) . "\r\n"))
		. (!$proxy_user ? '' : "Proxy-Authorization: Basic $proxy_user\r\n")
		. (!strpos($vers, '1.1') ? '' : "Keep-Alive: 300\r\nConnection: keep-alive\r\n");

#	spip_log("Requete\n$req");
	fputs($f, $req);
	fputs($f, $datas ? $datas : "\r\n");

	return $f;
}
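
/*
 * Illustrative sketch only (a comment, never executed): the bytes lance_requete()
 * writes when tunnelling HTTPS through a proxy. Assumptions: _PROXY_HTTPS_VIA_CONNECT
 * is defined, $scheme = 'ssl', $port = 443, $vers = 'HTTP/1.0', no credentials,
 * $refuse_gz = false, no $date and no $datas. "example.org" and "/page" are
 * placeholders, not values used by SPIP.
 *
 * 1. Tunnel request, sent in clear to the proxy:
 *      CONNECT example.org:443 HTTP/1.0\r\n
 *      Host: example.org:443\r\n
 *      Proxy-Connection: Keep-Alive\r\n
 *      \r\n
 * 2. Expected proxy answer; the second space-separated token must be '200':
 *      HTTP/1.0 200 Connection established\r\n
 * 3. TLS handshake on the same socket, via stream_socket_enable_crypto().
 * 4. The actual request, now encrypted:
 *      GET /page HTTP/1.0\r\n
 *      Host: example.org\r\n
 *      User-Agent: <value of _INC_DISTANT_USER_AGENT>\r\n
 *      Accept-Encoding: gzip\r\n            (default _INC_DISTANT_CONTENT_ENCODING)
 *      Referer: <adresse_site>/\r\n         (only if adresse_site is set)
 *      \r\n                                 (bare CRLF, since $datas is empty)
 */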