Push — master ( 92f140...7db589 )
by cam

distant.php ➔ recuperer_entetes()   A

Complexity

Conditions 4
Paths 4

Size

Total Lines 16

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 4
nc 4
nop 2
dl 0
loc 16
rs 9.7333
c 0
b 0
f 0
<?php

/***************************************************************************\
 *  SPIP, Système de publication pour l'internet                           *
 *                                                                         *
 *  Copyright © with tenderness since 2001                                 *
 *  Arnaud Martin, Antoine Pitrou, Philippe Rivière, Emmanuel Saint-James  *
 *                                                                         *
 *  This program is free software distributed under the GNU/GPL license.   *
 *  For more details, see the COPYING.txt file or the online help.         *
\***************************************************************************/

/**
 * This file handles fetching remote data
 *
 * @package SPIP\Core\Distant
 **/
if (!defined('_ECRIRE_INC_VERSION')) {
	return;
}

if (!defined('_INC_DISTANT_VERSION_HTTP')) {
	define('_INC_DISTANT_VERSION_HTTP', 'HTTP/1.0');
}
if (!defined('_INC_DISTANT_CONTENT_ENCODING')) {
	define('_INC_DISTANT_CONTENT_ENCODING', 'gzip');
}
if (!defined('_INC_DISTANT_USER_AGENT')) {
	define('_INC_DISTANT_USER_AGENT', 'SPIP-' . $GLOBALS['spip_version_affichee'] . ' (' . $GLOBALS['home_server'] . ')');
}
if (!defined('_INC_DISTANT_MAX_SIZE')) {
	define('_INC_DISTANT_MAX_SIZE', 2097152);
}
if (!defined('_INC_DISTANT_CONNECT_TIMEOUT')) {
	define('_INC_DISTANT_CONNECT_TIMEOUT', 10);
}

define('_REGEXP_COPIE_LOCALE', ',' .
	preg_replace(
		'@^https?:@',
		'https?:',
		(isset($GLOBALS['meta']['adresse_site']) ? $GLOBALS['meta']['adresse_site'] : '')
	)
	. '/?spip.php[?]action=acceder_document.*file=(.*)$,');

//@define('_COPIE_LOCALE_MAX_SIZE',2097152); // size in bytes (already defined by inc/utils)

/**
 * Creates, if needed, the local copy of a remote file
 *
 * Takes a path relative to the root directory, or a URL, as argument
 * Returns a path relative to the root directory, or false
 *
 * @link https://www.spip.net/4155
 * @pipeline_appel post_edition
 *
 * @param string $source
 * @param string $mode
 *   - 'test' - only test
 *   - 'auto' - download if needed
 *   - 'modif' - if already present, download only if If-Modified-Since
 *   - 'force' - always download (update)
 * @param string $local
Documentation: Should the type for parameter $local not be string|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type.

It makes a suggestion as to what type it considers more descriptive.

Most often this is a case of a parameter that can be null in addition to its declared types.

 *   lets the caller specify the local file name (for example to store a cache rather than an IMG document)
 * @param int $taille_max
Documentation: Should the type for parameter $taille_max not be integer|null?

 *   maximum size of the local copy, defaults to _COPIE_LOCALE_MAX_SIZE
 * @return bool|string
Documentation: Consider making the return type a bit more specific; maybe use string|false.

This check looks for the generic type array as a return type and suggests a more specific type. This type is inferred from the actual code.

 */
function copie_locale($source, $mode = 'auto', $local = null, $taille_max = null) {

	// if the URL is our own protected-document action, return the path directly
	if ($mode !== 'force' and preg_match(_REGEXP_COPIE_LOCALE, $source, $match)) {
		$source = substr(_DIR_IMG, strlen(_DIR_RACINE)) . urldecode($match[1]);

		return @file_exists($source) ? $source : false;
	}

	if (is_null($local)) {
		$local = fichier_copie_locale($source);
	} else {
Duplication: This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

		if (_DIR_RACINE and strncmp(_DIR_RACINE, $local, strlen(_DIR_RACINE)) == 0) {
			$local = substr($local, strlen(_DIR_RACINE));
		}
	}

	// if $local = '' the file was refused by fichier_copie_locale(),
	// for example a file that is not listed in our documents;
	// in that case do not try to download it only to fail afterwards
	if (!$local) {
Bug Best Practice: The expression $local of type string|null is loosely compared to false; this is ambiguous if the string can be empty. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For string values, the empty string '' is a special case, in particular the following results might be unexpected:

''   == false // true
''   == null  // true
'ab' == false // false
'ab' == null  // false

// It is often better to use strict comparison
'' === false // false
'' === null  // false

		return false;
	}

	$localrac = _DIR_RACINE . $local;
	$t = ($mode == 'force') ? false : @file_exists($localrac);

	// check whether the file exists
	if ($mode == 'test') {
		return $t ? $local : '';
	}

	// otherwise see whether we must/can download it
	if ($local == $source or !tester_url_absolue($source)) {
		return $local;
	}

	if ($mode == 'modif' or !$t) {
		// go through a unique temporary file to cope with failures during the download
		// and with possible concurrent downloads
		include_spip('inc/acces');
		if (!$taille_max) {
Bug Best Practice: The expression $taille_max of type integer|null is loosely compared to false; this is ambiguous if the integer can be zero. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For integer values, zero is a special case, in particular the following results might be unexpected:

0   == false // true
0   == null  // true
123 == false // false
123 == null  // false

// It is often better to use strict comparison
0 === false // false
0 === null  // false

			$taille_max = _COPIE_LOCALE_MAX_SIZE;
		}
		$res = recuperer_url(
			$source,
			array('file' => $localrac, 'taille_max' => $taille_max, 'if_modified_since' => $t ? filemtime($localrac) : '')
		);
		if (!$res or (!$res['length'] and $res['status'] != 304)) {
			spip_log("copie_locale : Echec recuperation $source sur $localrac status : " . $res['status'], 'distant' . _LOG_INFO_IMPORTANTE);
		}
		if (!$res['length']) {
			// if $t is set, this is most likely just a "not modified" response
			return $t ? $local : false;
		}
		spip_log("copie_locale : recuperation $source sur $localrac taille " . $res['length'] . ' OK', 'distant');

		// for a possible indexing step
		pipeline(
			'post_edition',
			array(
				'args' => array(
					'operation' => 'copie_locale',
					'source' => $source,
					'fichier' => $local,
					'http_res' => $res['length'],
				),
				'data' => null
			)
		);
	}

	return $local;
}
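
The two "Bug Best Practice" notes above concern `!$local` and `!$taille_max`. A minimal sketch of what a strict check would change, assuming a hypothetical helper `taille_max_effective()` that is not part of SPIP: with `=== null`, an explicit `0` is kept instead of being silently replaced by the default, which is a behaviour change compared with the loose test used in `copie_locale()`.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of SPIP.

// Loose variant, mirroring `if (!$taille_max)`: null AND 0 both fall back to the default.
function taille_max_laxiste($taille_max, $defaut) {
	return !$taille_max ? $defaut : (int) $taille_max;
}

// Strict variant: only null falls back to the default, an explicit 0 is preserved.
function taille_max_effective($taille_max, $defaut) {
	return ($taille_max === null) ? $defaut : (int) $taille_max;
}

var_dump(taille_max_laxiste(0, 2097152));   // the default replaces the explicit 0
var_dump(taille_max_effective(0, 2097152)); // the explicit 0 is kept
```

Whether the strict variant is appropriate depends on whether callers rely on `0` meaning "use the default", which the source does not say.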

/**
 * Checks that the URL of a remote document is really remote,
 * and not a localhost URL that would expose information about the server
 * inspired by https://core.trac.wordpress.org/browser/trunk/src/wp-includes/http.php?rev=36435#L500
 *
 * @param string $url
 * @param array $known_hosts
 *   known and accepted external URLs/hosts
 * @return false|string
 *   the URL, or false on failure
 */
function valider_url_distante($url, $known_hosts = array()) {
	if (!function_exists('protocole_verifier')){
		include_spip('inc/filtres_mini');
	}

	if (!protocole_verifier($url, array('http', 'https'))) {
		return false;
	}

	$parsed_url = parse_url($url);
	if (!$parsed_url or empty($parsed_url['host']) ) {
		return false;
	}

	if (isset($parsed_url['user']) or isset($parsed_url['pass'])) {
		return false;
	}

	if (false !== strpbrk($parsed_url['host'], ':#?[]')) {
		return false;
	}

	if (!is_array($known_hosts)) {
		$known_hosts = array($known_hosts);
	}
	$known_hosts[] = $GLOBALS['meta']['adresse_site'];
	$known_hosts[] = url_de_base();
	$known_hosts = pipeline('declarer_hosts_distants', $known_hosts);

	$is_known_host = false;
	foreach ($known_hosts as $known_host) {
		$parse_known = parse_url($known_host);
		if ($parse_known
		  and strtolower($parse_known['host']) === strtolower($parsed_url['host'])) {
			$is_known_host = true;
			break;
		}
	}

	if (!$is_known_host) {
		$host = trim($parsed_url['host'], '.');
		if (preg_match('#^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$#', $host)) {
			$ip = $host;
		} else {
			$ip = gethostbyname($host);
			if ($ip === $host) {
				// Error condition for gethostbyname()
				$ip = false;
			}
		}
		if ($ip) {
Bug Best Practice: The expression $ip of type false|string is loosely compared to true; this is ambiguous if the string can be empty. You might want to explicitly use !== false instead.

			$parts = array_map('intval', explode( '.', $ip ));
			if (127 === $parts[0] or 10 === $parts[0] or 0 === $parts[0]
			  or ( 172 === $parts[0] and 16 <= $parts[1] and 31 >= $parts[1] )
			  or ( 192 === $parts[0] && 168 === $parts[1] )
			) {
				return false;
			}
		}
	}

	if (empty($parsed_url['port'])) {
		return $url;
	}

	$port = $parsed_url['port'];
	if ($port === 80  or $port === 443  or $port === 8080) {
Unused Code, Bug: The strict comparison === seems to always evaluate to false as the types of $port (string) and 80 (integer) can never be identical. Maybe you want to use a loose comparison == instead?

		return $url;
	}

	if ($is_known_host) {
		foreach ($known_hosts as $known_host) {
			$parse_known = parse_url($known_host);
			if ($parse_known
				and !empty($parse_known['port'])
			  and strtolower($parse_known['host']) === strtolower($parsed_url['host'])
			  and $parse_known['port'] == $port) {
				return $url;
			}
		}
	}

	return false;
}
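
The address filtering in `valider_url_distante()` can be read in isolation. The sketch below reproduces the same ranges (loopback, `0.x`, `10.x`, `172.16` to `172.31`, `192.168`) in a hypothetical helper `est_ipv4_locale_ou_privee()`, which is not part of SPIP. It also checks the port type: the PHP manual documents the `port` component returned by `parse_url()` as an integer, so the strict comparisons flagged by the analyzer above may well be a false positive.

```php
<?php
// Hypothetical helper for illustration only; it is not part of SPIP.
// Returns true when a dotted-quad IPv4 address is loopback, in the 0.x block,
// or inside one of the private ranges tested by valider_url_distante().
function est_ipv4_locale_ou_privee($ip) {
	$parts = array_map('intval', explode('.', $ip));
	if (count($parts) !== 4) {
		return false;
	}

	return $parts[0] === 127
		|| $parts[0] === 10
		|| $parts[0] === 0
		|| ($parts[0] === 172 && $parts[1] >= 16 && $parts[1] <= 31)
		|| ($parts[0] === 192 && $parts[1] === 168);
}

// parse_url() yields the port as an integer, so a strict comparison with 8080 holds.
$port = parse_url('http://example.org:8080/doc.pdf', PHP_URL_PORT);
var_dump(is_int($port), $port === 8080);
```

Like the original, this sketch only covers IPv4; IPv6 literals are rejected earlier in the function by the `strpbrk()` test on `:` and `[]`.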

/**
 * Prepares the data for a POST
 * if $donnees is a string
 *  - the sender is responsible for adding the boundaries, handling the Content-Type,
 *    separating the headers from the data with an empty line, etc.
 *  - line breaks are converted to the proper format
 *  - the string is split into header/body (separated by an empty line)
 * if $donnees is an array
 *  - it is serialised into a string, with a boundary if needed or provided, and the right Content-Type
 *
 * @param string|array $donnees
 * @param string $boundary
 * @return array
 *   header, body
 */
function prepare_donnees_post($donnees, $boundary = '') {

	// let the function that requested the POST format its data itself,
	// for a SOAP call for example
	// the header is separated from the data by a double line break
	// here we convert every line break (\r\n, \r or \n) to \r\n
	if (is_string($donnees) && strlen($donnees)) {
		$entete = '';
		// first turn every \r\n and \r into a plain \n
		$donnees = str_replace("\r\n", "\n", $donnees);
		$donnees = str_replace("\r", "\n", $donnees);
		// a double line break marks the end of the header and the start of the data
		$p = strpos($donnees, "\n\n");
		if ($p !== false) {
			$entete = str_replace("\n", "\r\n", substr($donnees, 0, $p + 1));
			$donnees = substr($donnees, $p + 2);
		}
		$chaine = str_replace("\n", "\r\n", $donnees);
	} else {
		/* automatic boundary */
		// With more than 500 bytes of data, use a multipart boundary
		if ($boundary === '') {
			$taille = 0;
			foreach ($donnees as $cle => $valeur) {
Bug: The expression $donnees of type array|string is not guaranteed to be traversable. How about adding an additional type check?

There are different options of fixing this problem.

  1. If you want to be on the safe side, you can add an additional type-check:

    $collection = json_decode($data, true);
    if ( ! is_array($collection)) {
        throw new \RuntimeException('$collection must be an array.');
    }

    foreach ($collection as $item) { /** ... */ }

  2. If you are sure that the expression is traversable, you might want to add a doc comment cast to improve IDE auto-completion and static analysis:

    /** @var array $collection */
    $collection = json_decode($data, true);

    foreach ($collection as $item) { /** .. */ }

  3. Mark the issue as a false-positive: Just hover the remove button, in the top-right corner of this issue for more options.

				if (is_array($valeur)) {
					foreach ($valeur as $val2) {
						$taille += strlen($val2);
					}
				} else {
					// should spip_strlen() from inc/charsets be used instead?
					$taille += strlen($valeur);
				}
			}
			if ($taille > 500) {
				$boundary = substr(md5(rand() . 'spip'), 0, 8);
			}
		}

		if (is_string($boundary) and strlen($boundary)) {
			// build the HTTP string for a POST with a boundary
			$entete = "Content-Type: multipart/form-data; boundary=$boundary\r\n";
			$chaine = '';
			if (is_array($donnees)) {
				foreach ($donnees as $cle => $valeur) {
					if (is_array($valeur)) {
						foreach ($valeur as $val2) {
							$chaine .= "\r\n--$boundary\r\n";
							$chaine .= "Content-Disposition: form-data; name=\"{$cle}[]\"\r\n";
							$chaine .= "\r\n";
							$chaine .= $val2;
						}
					} else {
						$chaine .= "\r\n--$boundary\r\n";
						$chaine .= "Content-Disposition: form-data; name=\"$cle\"\r\n";
						$chaine .= "\r\n";
						$chaine .= $valeur;
					}
				}
				$chaine .= "\r\n--$boundary\r\n";
			}
		} else {
			// build a plain HTTP string for a POST
			$entete = 'Content-Type: application/x-www-form-urlencoded' . "\r\n";
			$chaine = array();
			if (is_array($donnees)) {
				foreach ($donnees as $cle => $valeur) {
					if (is_array($valeur)) {
						foreach ($valeur as $val2) {
							$chaine[] = rawurlencode($cle) . '[]=' . rawurlencode($val2);
						}
					} else {
						$chaine[] = rawurlencode($cle) . '=' . rawurlencode($valeur);
					}
				}
				$chaine = implode('&', $chaine);
			} else {
				$chaine = $donnees;
			}
		}
	}

	return array($entete, $chaine);
}
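
The string branch of `prepare_donnees_post()` is easy to misread, so here is a minimal standalone sketch of it: line endings are normalised to `\n`, the text is split at the first empty line, and both halves are converted to `\r\n`. The function name `separer_entete_corps()` is hypothetical and only mirrors the logic shown above.

```php
<?php
// Hypothetical helper for illustration only; it mirrors the string branch
// of prepare_donnees_post() and is not part of SPIP.
function separer_entete_corps($donnees) {
	$entete = '';
	// normalise every line ending to a plain \n
	$donnees = str_replace(array("\r\n", "\r"), "\n", $donnees);
	// the first empty line separates the headers from the body
	$p = strpos($donnees, "\n\n");
	if ($p !== false) {
		// keep the line break that ends the last header line
		$entete = str_replace("\n", "\r\n", substr($donnees, 0, $p + 1));
		$donnees = substr($donnees, $p + 2);
	}

	return array($entete, str_replace("\n", "\r\n", $donnees));
}

list($entete, $corps) = separer_entete_corps("A: b\r\nC: d\n\nbody\rline");
var_dump($entete, $corps);
```

When no empty line is present, the whole input is treated as the body and the header part stays empty, as in the original.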

/**
 * Converts a URL whose host is in UTF-8 to ASCII
 * Uses the library https://github.com/phlylabs/idna-convert/tree/v0.9.1
 * in its latest version compatible with all PHP 5 versions
 * The PHP function idn_to_ascii depends on the php5-intl package and is rarely available
 *
 * @param string $url_idn
 * @return array|string
 */
function url_to_ascii($url_idn) {

	if ($parts = parse_url($url_idn)) {
		$host = $parts['host'];
		if (!preg_match(',^[a-z0-9_\.\-]+$,i', $host)) {
			include_spip('inc/idna_convert.class');
			$IDN = new idna_convert();
			$host_ascii = $IDN->encode($host);
			$url_idn = explode($host, $url_idn, 2);
			$url_idn = implode($host_ascii, $url_idn);
		}
		// then urlencode any UTF-8 characters left in the path, if needed
		$url_idn = preg_replace_callback('/[^\x20-\x7f]/', function($match) { return urlencode($match[0]); }, $url_idn);
	}

	return $url_idn;
}
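
The last step of `url_to_ascii()` percent-encodes whatever non-ASCII bytes remain once the host has been converted. A minimal sketch of that step alone, with a hypothetical function name `encoder_octets_non_ascii()`: because the pattern has no `u` modifier, it matches byte by byte, so a two-byte UTF-8 character becomes two `%XX` sequences.

```php
<?php
// Hypothetical helper for illustration only; it isolates the final
// preg_replace_callback() step of url_to_ascii() and is not part of SPIP.
function encoder_octets_non_ascii($url) {
	return preg_replace_callback(
		'/[^\x20-\x7f]/',
		function ($match) {
			// $match[0] is a single byte here, since the pattern is not in UTF-8 mode
			return urlencode($match[0]);
		},
		$url
	);
}

// "é" is the byte pair C3 A9 in UTF-8
var_dump(encoder_octets_non_ascii("http://example.org/caf\xC3\xA9"));
```

Characters already in the printable ASCII range, including `%`, are left untouched, so an already-encoded URL is not encoded twice.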

/**
 * Fetches the content of a URL
 * and, if needed, transcodes its content into the local charset
 *
 * @uses init_http()
 * @uses recuperer_entetes_complets()
 * @uses recuperer_body()
 * @uses transcoder_page()
 * @uses prepare_donnees_post()
 *
 * @param string $url
 * @param array $options
 *   bool transcoder : true to transcode the page into the site charset
 *   string methode : type of HTTP request to perform (HEAD, GET or POST)
 *   int taille_max : stop reading the content beyond this size (0 = headers only ==> HEAD request). Defaults to _INC_DISTANT_MAX_SIZE (2097152 bytes unless overridden), or _COPIE_LOCALE_MAX_SIZE when copying into a file
 *   string|array datas : to send data (array) and/or the complete headers, with an empty line between headers and data (string, @see prepare_donnees_post()) (forces the POST method when the data is not empty)
 *   string boundary : boundary used to format datas given as an array
 *   bool refuser_gz : to force refusing compression (case of spell-checking servers)
 *   int if_modified_since : a Unix timestamp to stop the download if the remote page has not been modified since the given date
 *   string uri_referer : to specify a different referer
 *   string file : name of the file into which the content is copied
 *   int follow_location : number of redirections to follow (0 to follow none)
 *   string version_http : version of the HTTP protocol to use (defaults to the _INC_DISTANT_VERSION_HTTP constant)
 * @return array|bool
 *   false on failure
 *   array otherwise:
 *     int status : the status of the page
 *     string headers : the headers of the page
 *     string page : the content of the page (empty when copied into a file)
 *     int last_modified : timestamp of the last modification
 *     string location : redirection URL sent by the page
 *     string url : actual URL of the fetched page
 *     int length : size of the content or of the file
 *
 *     string file : name of the file, when saved into a file
 */
function recuperer_url($url, $options = array()) {
	$default = array(
		'transcoder' => false,
		'methode' => 'GET',
		'taille_max' => null,
		'datas' => '',
		'boundary' => '',
		'refuser_gz' => false,
		'if_modified_since' => '',
		'uri_referer' => '',
		'file' => '',
		'follow_location' => 10,
		'version_http' => _INC_DISTANT_VERSION_HTTP,
	);
	$options = array_merge($default, $options);
	// copy straight into a file?
	$copy = $options['file'];

	if ($options['methode'] == 'HEAD') {
		$options['taille_max'] = 0;
	}
	if (is_null($options['taille_max'])) {
		$options['taille_max'] = $copy ? _COPIE_LOCALE_MAX_SIZE : _INC_DISTANT_MAX_SIZE;
	}

	if (!empty($options['datas'])) {
		list($head, $postdata) = prepare_donnees_post($options['datas'], $options['boundary']);
		if (stripos($head, 'Content-Length:') === false) {
			$head .= 'Content-Length: ' . strlen($postdata);
		}
		$options['datas'] = $head . "\r\n\r\n" . $postdata;
		if (strlen($postdata)) {
			$options['methode'] = 'POST';
		}
	}

	// Accept feed:// URLs, URLs that forgot their http://, and protocol-relative URLs
	$url = preg_replace(',^feed://,i', 'http://', $url);
	if (!tester_url_absolue($url)) {
		$url = 'http://' . $url;
	} elseif (strncmp($url, '//', 2) == 0) {
		$url = 'http:' . $url;
	}

	$url = url_to_ascii($url);

	$result = array(
		'status' => 0,
		'headers' => '',
		'page' => '',
		'length' => 0,
		'last_modified' => '',
		'location' => '',
		'url' => $url
	);

	// when writing straight into a file, refuse gzip so that nothing has to be handled in memory
	$refuser_gz = (($options['refuser_gz'] or $copy) ? true : false);

	// open the connection and send the request and its headers
	list($handle, $fopen) = init_http(
		$options['methode'],
		$url,
Bug: It seems like $url can also be of type array; however, init_http() does only seem to accept string, maybe add an additional type check?

If a method or function can return multiple different values and unless you are sure that you only can receive a single value in this context, we recommend to add an additional type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

If this a common case that PHP Analyzer should handle natively, please let us know by opening an issue.

		$refuser_gz,
		$options['uri_referer'],
		$options['datas'],
		$options['version_http'],
		$options['if_modified_since']
	);
	if (!$handle) {
		spip_log("ECHEC init_http $url", 'distant' . _LOG_ERREUR);

		return false;
	}

	// Except with fopen, send the input stream
	// and fetch the response headers
	if (!$fopen) {
		$res = recuperer_entetes_complets($handle, $options['if_modified_since']);
		if (!$res) {
			fclose($handle);
			$t = @parse_url($url);
			$host = $t['host'];
			// Inexplicable workaround to counter
			// the freedom-restricting measures of the Middle Kingdom
			if (!need_proxy($host)
				and $res = @file_get_contents($url)
			) {
				$result['length'] = strlen($res);
				if ($copy) {
					ecrire_fichier($copy, $res);
					$result['file'] = $copy;
				} else {
					$result['page'] = $res;
				}
				$res = array(
					'status' => 200,
				);
			} else {
				spip_log("ECHEC chinoiserie $url", 'distant' . _LOG_ERREUR);
				return false;
			}
		} elseif ($res['location'] and $options['follow_location']) {
			$options['follow_location']--;
			fclose($handle);
			include_spip('inc/filtres');
			$url = suivre_lien($url, $res['location']);
Bug: It seems like $url can also be of type array; however, suivre_lien() does only seem to accept string, maybe add an additional type check?

			spip_log("recuperer_url recommence sur $url", 'distant');

			return recuperer_url($url, $options);
		} elseif ($res['status'] !== 200) {
			spip_log('HTTP status ' . $res['status'] . " pour $url", 'distant');
		}
		$result['status'] = $res['status'];
		if (isset($res['headers'])) {
			$result['headers'] = $res['headers'];
		}
		if (isset($res['last_modified'])) {
			$result['last_modified'] = $res['last_modified'];
		}
		if (isset($res['location'])) {
			$result['location'] = $res['location'];
		}
	}

	// only the headers are wanted
	if (!$options['taille_max'] or $options['methode'] == 'HEAD' or $result['status'] == '304') {
		return $result;
	}


	// if the body must be decompressed, do it through a temporary file,
	// otherwise memory blows up on large streams

	$gz = false;
	if (preg_match(",\bContent-Encoding: .*gzip,is", $result['headers'])) {
		$gz = (_DIR_TMP . md5(uniqid(mt_rand())) . '.tmp.gz');
	}

	// unless the content was already fetched through the fallback above
	if (!$result['length']) {
		$res = recuperer_body($handle, $options['taille_max'], $gz ? $gz : $copy);
		fclose($handle);
		if ($copy) {
			$result['length'] = $res;
			$result['file'] = $copy;
		} elseif ($res) {
			$result['page'] = &$res;
			$result['length'] = strlen($result['page']);
		}
		if (!$result['status']) {
			$result['status'] = 200; // it worked, so!
		}
	}
	if (!$result['page']) {
		return $result;
	}

	// Decompress if needed
	if ($gz) {
Bug Best Practice: The expression $gz of type string|false is loosely compared to true; this is ambiguous if the string can be empty. You might want to explicitly use !== false instead.

		$result['page'] = implode('', gzfile($gz));
		supprimer_fichier($gz);
	}

	// Should it be imported into our local charset?
	if ($options['transcoder']) {
		include_spip('inc/charsets');
		$result['page'] = transcoder_page($result['page'], $result['headers']);
	}

	return $result;
}
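
Before opening the connection, `recuperer_url()` normalises the URL: `feed://` becomes `http://`, a missing scheme is filled in, and protocol-relative URLs get `http:`. The sketch below approximates that step in a hypothetical function `normaliser_url_distante()`. SPIP's own `tester_url_absolue()` is not available outside SPIP, so it is replaced here by a simple scheme regex, which is an assumption and may differ from the real test.

```php
<?php
// Hypothetical helper for illustration only; it approximates the URL
// normalisation done at the top of recuperer_url() and is not part of SPIP.
function normaliser_url_distante($url) {
	// feed:// is treated as plain http://
	$url = preg_replace(',^feed://,i', 'http://', $url);

	// protocol-relative URL: prepend the scheme only
	if (strncmp($url, '//', 2) == 0) {
		return 'http:' . $url;
	}

	// assumption: "absolute" means "starts with scheme://"
	if (!preg_match(',^[a-z][a-z0-9+.-]*://,i', $url)) {
		return 'http://' . $url;
	}

	return $url;
}

foreach (array('feed://example.org/rss', '//example.org/a', 'example.org', 'https://example.org') as $u) {
	echo normaliser_url_distante($u), "\n";
}
```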

/**
 * Fetches a URL unless it is already available in a file cache
 *
 * The cache lifetime is given by the `delai_cache` option
 * The other options and the return format are identical to those of `recuperer_url`
 * @uses recuperer_url()
 *
 * @param string $url
 * @param array $options
 *   int delai_cache : acceptable age of the content (in seconds)
 * @return array|bool|mixed
 */
function recuperer_url_cache($url, $options = array()) {
	if (!defined('_DELAI_RECUPERER_URL_CACHE')) {
		define('_DELAI_RECUPERER_URL_CACHE', 3600);
	}
	$default = array(
		'transcoder' => false,
		'methode' => 'GET',
		'taille_max' => null,
		'datas' => '',
		'boundary' => '',
		'refuser_gz' => false,
		'if_modified_since' => '',
		'uri_referer' => '',
		'file' => '',
		'follow_location' => 10,
		'version_http' => _INC_DISTANT_VERSION_HTTP,
		'delai_cache' => in_array(_VAR_MODE, ['preview', 'recalcul']) ? 0 : _DELAI_RECUPERER_URL_CACHE,
	);
	$options = array_merge($default, $options);

	// cases where caching is not possible
	// (the option key is 'datas', as in the defaults above, not 'data')
	if (!empty($options['datas']) or $options['methode'] == 'POST') {
		return recuperer_url($url, $options);
	}

	// do not retry a failing URL several times (it is not cached, by definition)
	static $errors = array();
	if (isset($errors[$url])) {
		return $errors[$url];
	}

	$sig = $options;
	unset($sig['if_modified_since']);
	unset($sig['delai_cache']);
	$sig['url'] = $url;

	$dir = sous_repertoire(_DIR_CACHE, 'curl');
	$cache = md5(serialize($sig)) . '-' . substr(preg_replace(',\W+,', '_', $url), 0, 80);
	$sub = sous_repertoire($dir, substr($cache, 0, 2));
	$cache = "$sub$cache";

	$res = false;
	$is_cached = file_exists($cache);
	if ($is_cached
		and (filemtime($cache) > $_SERVER['REQUEST_TIME'] - $options['delai_cache'])
	) {
		lire_fichier($cache, $res);
		if ($res = unserialize($res)) {
			// set last_modified and status=304?
		}
	}
	if (!$res) {
		$res = recuperer_url($url, $options);
		// do not reload this uncached URL during the same hit, since it is unavailable
		if (!$res) {
			if ($is_cached) {
				// the download failed but a cache exists: use it
				lire_fichier($cache, $res);
				$res = unserialize($res);
			}

			return $errors[$url] = $res;
		}
		ecrire_fichier($cache, serialize($res));
	}

	return $res;
}
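
The cache file name in `recuperer_url_cache()` is derived from the options, minus the two that must not affect identity (`if_modified_since` and `delai_cache`), plus the URL. A minimal sketch of that derivation and of the freshness test, using hypothetical function names `cle_cache_url()` and `cache_est_frais()` and leaving out the SPIP directory helpers:

```php
<?php
// Hypothetical helpers for illustration only; they mirror the key derivation
// and freshness test of recuperer_url_cache() and are not part of SPIP.
function cle_cache_url($url, array $options) {
	$sig = $options;
	// these two options must not change which cache entry is used
	unset($sig['if_modified_since'], $sig['delai_cache']);
	$sig['url'] = $url;

	// 32 hex characters, a dash, then a readable slug of at most 80 characters
	return md5(serialize($sig)) . '-' . substr(preg_replace(',\W+,', '_', $url), 0, 80);
}

function cache_est_frais($mtime, $maintenant, $delai_cache) {
	// fresh when the file was written less than $delai_cache seconds ago
	return $mtime > $maintenant - $delai_cache;
}

$cle = cle_cache_url('http://example.org/a?b=1', array('methode' => 'GET', 'delai_cache' => 60));
echo $cle, "\n";
```

With `delai_cache` set to 0, as happens in `preview` and `recalcul` modes, a file written before the current request is never considered fresh, so the URL is fetched again.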
655
656
/**
657
 * Obsolète : Récupère une page sur le net et au besoin l'encode dans le charset local
658
 *
659
 * Gère les redirections de page (301) sur l'URL demandée (maximum 10 redirections)
660
 *
661
 * @deprecated 3.1
662
 * @see recuperer_url()
663
 * @uses recuperer_url()
664
 *
665
 * @param string $url
666
 *     URL de la page à récupérer
667
 * @param bool|string $trans
668
 *     - chaîne longue : c'est un nom de fichier (nom pour sa copie locale)
669
 *     - true : demande d'encodage/charset
670
 *     - null : ne retourner que les headers
671
 * @param bool $get_headers
672
 *     Si on veut récupérer les entêtes
673
 * @param int|null $taille_max
674
 *     Arrêter le contenu au-delà (0 = seulement les entetes ==> requête HEAD).
675
 *     Par defaut taille_max = 1Mo.
676
 * @param string|array $datas
677
 *     Pour faire un POST de données
678
 * @param string $boundary
679
 *     Pour forcer l'envoi par cette méthode
680
 * @param bool $refuser_gz
681
 *     Pour forcer le refus de la compression (cas des serveurs orthographiques)
682
 * @param string $date_verif
683
 *     Un timestamp unix pour arrêter la récuperation si la page distante
684
 *     n'a pas été modifiée depuis une date donnée
685
 * @param string $uri_referer
686
 *     Pour préciser un référer différent
687
 * @return string|bool
688
 *     - Code de la page obtenue (avec ou sans entête)
689
 *     - false si la page n'a pu être récupérée (status different de 200)
690
 **/
691
function recuperer_page(
692
	$url,
693
	$trans = false,
694
	$get_headers = false,
695
	$taille_max = null,
696
	$datas = '',
697
	$boundary = '',
698
	$refuser_gz = false,
699
	$date_verif = '',
700
	$uri_referer = ''
701
) {
702
	// $copy = copier le fichier ?
703
	$copy = (is_string($trans) and strlen($trans) > 5); // eviter "false" :-)
704
705
	if (!is_null($taille_max) and ($taille_max == 0)) {
706
		$get = 'HEAD';
707
	} else {
708
		$get = 'GET';
709
	}
710
711
	$options = array(
712
		'transcoder' => $trans === true,
713
		'methode' => $get,
714
		'datas' => $datas,
715
		'boundary' => $boundary,
716
		'refuser_gz' => $refuser_gz,
717
		'if_modified_since' => $date_verif,
718
		'uri_referer' => $uri_referer,
719
		'file' => $copy ? $trans : '',
720
		'follow_location' => 10,
721
	);
722
	if (!is_null($taille_max)) {
723
		$options['taille_max'] = $taille_max;
724
	}
725
	// at most ten attempts when 301 (redirect) headers are returned...
726
	$res = recuperer_url($url, $options);
727
	if (!$res) {
728
		return false;
729
	}
730
	if ($res['status'] !== 200) {
731
		return false;
732
	}
733
	if ($get_headers) {
734
		return $res['headers'] . "\n" . $res['page'];
735
	}
736
737
	return $res['page'];
738
}
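Review sketch: since recuperer_page() is marked deprecated in favour of recuperer_url(), the translation of its positional arguments into an options array can be isolated in a pure helper. The helper name `options_depuis_recuperer_page()` is hypothetical and not part of SPIP; it only reuses the option keys visible in the function above.

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): builds the options array that
 * recuperer_page() hands to recuperer_url(), from the legacy arguments.
 */
function options_depuis_recuperer_page(
	$trans = false,
	$taille_max = null,
	$datas = '',
	$boundary = '',
	$refuser_gz = false,
	$date_verif = '',
	$uri_referer = ''
) {
	// a string longer than 5 characters is treated as a local file name
	$copy = (is_string($trans) and strlen($trans) > 5);

	$options = array(
		'transcoder' => $trans === true,
		// a maximum size of 0 means "headers only", hence a HEAD request
		'methode' => (!is_null($taille_max) and $taille_max == 0) ? 'HEAD' : 'GET',
		'datas' => $datas,
		'boundary' => $boundary,
		'refuser_gz' => $refuser_gz,
		'if_modified_since' => $date_verif,
		'uri_referer' => $uri_referer,
		'file' => $copy ? $trans : '',
		'follow_location' => 10,
	);
	// 'taille_max' stays unset when null, so the callee can apply its own default
	if (!is_null($taille_max)) {
		$options['taille_max'] = $taille_max;
	}

	return $options;
}
```

A caller migrating away from the deprecated function could then pass the result as the second argument of recuperer_url() and read `status`, `headers` and `page` from the returned array, as the function above does.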
739
740
741
/**
742
 * Obsolete: fetches a page from the network and, if needed, transcodes it to the local charset
743
 *
744
 * @deprecated 3.1
745
 *
746
 * @uses recuperer_url()
747
 *
748
 * @param string $url
749
 *     URL of the page to fetch
750
 * @param bool|null|string $trans
751
 *     - long string: a file name (the name of the local copy)
752
 *     - true: request charset transcoding
753
 *     - null: return only the headers
754
 * @param string $get
755
 *     Type of HTTP request to send (HEAD, GET or POST)
756
 * @param int|bool $taille_max
Documentation issue: Consider making the type for parameter $taille_max a bit more specific; maybe use integer.
757
 *     Truncate the content beyond this size (0 = headers only, i.e. a HEAD request).
758
 *     By default taille_max = 1 MB.
759
 * @param string|array $datas
760
 *     Data to send as a POST
761
 * @param bool $refuser_gz
762
 *     To force refusal of compression (case of spell-checking servers)
763
 * @param string $date_verif
764
 *     A Unix timestamp: stop fetching if the remote page
765
 *     has not been modified since that date
766
 * @param string $uri_referer
767
 *     To specify a different referer
768
 * @return string|array|bool
Documentation issue: Consider making the return type a bit more specific; maybe use false|array.

This check looks for the generic type array as a return type and suggests a more specific type. This type is inferred from the actual code.
769
 *     - Returns the URL on a 301,
770
 *     - An array (headers, body) on success,
771
 *     - false otherwise
772
 **/
773
function recuperer_lapage(
774
	$url,
775
	$trans = false,
776
	$get = 'GET',
777
	$taille_max = 1048576,
778
	$datas = '',
779
	$refuser_gz = false,
780
	$date_verif = '',
781
	$uri_referer = ''
782
) {
783
	// $copy: should the file be copied locally?
784
	$copy = (is_string($trans) and strlen($trans) > 5); // avoid the string "false" :-)
785
786
	// when writing directly to a file, refuse gzip so that the content
787
	// does not have to be handled in memory
788
	if ($copy) {
789
		$refuser_gz = true;
790
	}
791
792
	$options = array(
793
		'transcoder' => $trans === true,
794
		'methode' => $get,
795
		'datas' => $datas,
796
		'refuser_gz' => $refuser_gz,
797
		'if_modified_since' => $date_verif,
798
		'uri_referer' => $uri_referer,
799
		'file' => $copy ? $trans : '',
800
		'follow_location' => false,
801
	);
802
	if (!is_null($taille_max)) {
803
		$options['taille_max'] = $taille_max;
804
	}
805
	// redirects are not followed here (follow_location is false)
806
	$res = recuperer_url($url, $options);
807
808
	if (!$res) {
809
		return false;
810
	}
811
	if ($res['status'] !== 200) {
812
		return false;
813
	}
814
815
	return array($res['headers'], $res['page']);
816
}
817
818
/**
819
 * Reads the content behind the resource passed as argument,
820
 * i.e. the body of the URL whose headers have already been read.
821
 * $taille_max limits how much is read.
822
 *
823
 * @param resource $handle
824
 * @param int $taille_max
825
 * @param string $fichier
826
 *   file into which the content of the resource is copied
827
 * @return bool|int|string
Documentation issue: Consider making the return type a bit more specific; maybe use integer|false|string.
828
 *   bool false on failure
829
 *   int size of the file if the $fichier argument is given
830
 *   string content of the resource
831
 */
832
function recuperer_body($handle, $taille_max = _INC_DISTANT_MAX_SIZE, $fichier = '') {
833
	$taille = 0;
834
	$result = '';
835
	$fp = false;
836
	if ($fichier) {
837
		include_spip('inc/acces');
838
		$tmpfile = "$fichier." . creer_uniqid() . '.tmp';
839
		$fp = spip_fopen_lock($tmpfile, 'w', LOCK_EX);
840
		if (!$fp and file_exists($fichier)) {
841
			return filesize($fichier);
842
		}
843
		if (!$fp) {
844
			return false;
845
		}
846
		$result = 0; // the file size is returned
847
	}
848
	while (!feof($handle) and $taille < $taille_max) {
849
		$res = fread($handle, 16384);
850
		$taille += strlen($res);
851
		if ($fp) {
852
			fwrite($fp, $res);
853
			$result = $taille;
854
		} else {
855
			$result .= $res;
856
		}
857
	}
858
	if ($fp) {
859
		spip_fclose_unlock($fp);
860
		spip_unlink($fichier);
861
		@rename($tmpfile, $fichier);
Security best practice: It seems like you do not handle an error condition here. This can introduce security issues, and is generally not recommended.

If you suppress an error, we recommend checking for the error condition explicitly:

// For example instead of
@mkdir($dir);

// Better use
if (@mkdir($dir) === false) {
    throw new \RuntimeException('The directory '.$dir.' could not be created.');
}
Bug: The variable $tmpfile does not seem to be defined for all execution paths leading up to this point.

If you define a variable conditionally, it can happen that it is not defined for all execution paths.

Let’s take a look at an example:

function myFunction($a) {
    switch ($a) {
        case 'foo':
            $x = 1;
            break;

        case 'bar':
            $x = 2;
            break;
    }

    // $x is potentially undefined here.
    echo $x;
}

In the above example, the variable $x is defined if you pass “foo” or “bar” as argument for $a. However, since the switch statement has no default case statement, if you pass any other value, the variable $x would be undefined.

Available Fixes

  1. Check for existence of the variable explicitly:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        if (isset($x)) { // Make sure it's always set.
            echo $x;
        }
    }
    
  2. Define a default value for the variable:

    function myFunction($a) {
        $x = ''; // Set a default which gets overridden for certain paths.
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        echo $x;
    }
    
  3. Add a value for the missing path:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
    
            // We add support for the missing case.
            default:
                $x = '';
                break;
        }
    
        echo $x;
    }
    
862
		if (!file_exists($fichier)) {
863
			return false;
864
		}
865
	}
866
867
	return $result;
868
}
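Review sketch for the two notes on `@rename($tmpfile, $fichier)`: the result of rename() can be tested explicitly, and the temporary file name can be validated before use. `remplacer_fichier()` is a hypothetical helper, not part of SPIP, and it relies only on PHP built-ins rather than on spip_unlink().

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): moves a finished temporary file
 * onto its final name and reports failure instead of hiding it.
 *
 * @param string $tmpfile temporary file that was just written
 * @param string $fichier final destination
 * @return bool true when $fichier now holds the content of $tmpfile
 */
function remplacer_fichier($tmpfile, $fichier) {
	// guard against an undefined or empty temporary file name
	if (!is_string($tmpfile) or $tmpfile === '' or !file_exists($tmpfile)) {
		return false;
	}
	// rename() overwrites an existing destination file;
	// on failure, remove the temporary file so that it does not linger
	if (@rename($tmpfile, $fichier) === false) {
		@unlink($tmpfile);
		return false;
	}

	return true;
}
```

With such a helper the caller can return false as soon as the move fails, instead of inferring the outcome from file_exists() afterwards.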
869
870
/**
871
 * Reads the HTTP response headers from the socket $handle
872
 * and returns
873
 * false on failure,
874
 * an associative array on success, containing:
875
 * - the status
876
 * - the complete set of headers
877
 * - the last modification date, if known
878
 * - the redirect URL, if specified
879
 *
880
 * @param resource $handle
881
 * @param int|bool $if_modified_since
882
 * @return bool|array
Documentation issue: Consider making the return type a bit more specific; maybe use false|array.
883
 *   int status
884
 *   string headers
885
 *   int last_modified
886
 *   string location
887
 */
888
function recuperer_entetes_complets($handle, $if_modified_since = false) {
889
	$result = array('status' => 0, 'headers' => array(), 'last_modified' => 0, 'location' => '');
890
891
	$s = @trim(fgets($handle, 16384));
892
	if (!preg_match(',^HTTP/[0-9]+\.[0-9]+ ([0-9]+),', $s, $r)) {
893
		return false;
894
	}
895
	$result['status'] = intval($r[1]);
896
	while ($s = trim(fgets($handle, 16384))) {
897
		$result['headers'][] = $s . "\n";
898
		preg_match(',^([^:]*): *(.*)$,i', $s, $r);
899
		list(, $d, $v) = $r;
900
		if (strtolower(trim($d)) == 'location' and $result['status'] >= 300 and $result['status'] < 400) {
901
			$result['location'] = $v;
902
		} elseif (strtolower(trim($d)) == 'last-modified') {
903
			$result['last_modified'] = strtotime($v);
904
		}
905
	}
906
	if ($if_modified_since
907
		and $result['last_modified']
908
		and $if_modified_since > $result['last_modified']
909
		and $result['status'] == 200
910
	) {
911
		$result['status'] = 304;
912
	}
913
914
	$result['headers'] = implode('', $result['headers']);
915
916
	return $result;
917
}
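Review sketch: HTTP header field names are case-insensitive, and a line without a colon leaves the match array empty. The standalone variant below, with the hypothetical name `lire_entetes()`, compares names in lower case and skips malformed lines; it omits the If-Modified-Since handling and is not the SPIP function.

```php
<?php
/**
 * Hypothetical standalone variant of the header loop (not the SPIP function).
 *
 * @param resource $handle stream positioned at the start of the HTTP response
 * @return array|false
 */
function lire_entetes($handle) {
	$result = array('status' => 0, 'headers' => '', 'last_modified' => 0, 'location' => '');

	// status line, e.g. "HTTP/1.1 200 OK"
	$s = fgets($handle, 16384);
	if ($s === false or !preg_match(',^HTTP/[0-9]+\.[0-9]+ ([0-9]+),', trim($s), $r)) {
		return false;
	}
	$result['status'] = intval($r[1]);

	// header lines, up to the first empty line
	while (($s = fgets($handle, 16384)) !== false and ($s = trim($s)) !== '') {
		$result['headers'] .= $s . "\n";
		// skip lines that are not of the form "Name: value"
		if (!preg_match(',^([^:]*): *(.*)$,', $s, $r)) {
			continue;
		}
		$nom = strtolower(trim($r[1]));
		if ($nom === 'location' and $result['status'] >= 300 and $result['status'] < 400) {
			$result['location'] = $r[2];
		} elseif ($nom === 'last-modified') {
			$result['last_modified'] = strtotime($r[2]);
		}
	}

	return $result;
}
```

The stream is left positioned at the start of the body, so a reader such as recuperer_body() could continue from the same handle.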
918
919
/**
920
 * Computes the canonical name of a local copy of a remote file
921
 *
922
 * If a local copy of remote files has to be kept, it might as well
923
 * live in a canonical place
924
 *
925
 * @note
926
 *   If the mapping could be bijective, that would be even better,
927
 *   but for now I cannot see how, given the limitations
928
 *   of filesystems
929
 *
930
 * @param string $source
931
 *     URL of the source
932
 * @param string $extension
933
 *     File extension
934
 * @return string
935
 *     File name for the local copy
936
 **/
937
function nom_fichier_copie_locale($source, $extension) {
938
	include_spip('inc/documents');
939
940
	$d = creer_repertoire_documents('distant'); # IMG/distant/
941
	$d = sous_repertoire($d, $extension); # IMG/distant/pdf/
942
943
	// always behave as if we were at the site root
944
	if (_DIR_RACINE) {
945
		$d = preg_replace(',^' . preg_quote(_DIR_RACINE) . ',', '', $d);
946
	}
947
948
	$m = md5($source);
949
950
	return $d
951
	. substr(preg_replace(',[^\w-],', '', basename($source)) . '-' . $m, 0, 12)
952
	. substr($m, 0, 4)
953
	. ".$extension";
954
}
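Review sketch: the naming rule above can be tried outside SPIP by leaving out the directory part. The function name `nom_copie_locale_sans_repertoire()` and the example URL are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical standalone version of the naming rule (no SPIP directories):
 * at most 12 characters taken from the cleaned base name followed by the
 * MD5 hash, then the first 4 characters of the hash, then the extension.
 */
function nom_copie_locale_sans_repertoire($source, $extension) {
	$m = md5($source);

	return substr(preg_replace(',[^\w-],', '', basename($source)) . '-' . $m, 0, 12)
		. substr($m, 0, 4)
		. ".$extension";
}

$source = 'https://example.org/docs/rapport-annuel.pdf';
$nom = nom_copie_locale_sans_repertoire($source, 'pdf');
// the readable part is cut to 12 characters ("rapport-annu"),
// followed by 4 hexadecimal characters of the hash and ".pdf"
echo $nom, "\n";
```

One consequence worth noting: when the cleaned base name has 12 characters or more, only four hexadecimal characters (16 bits) of the hash remain in the name, so two different URLs with the same base name can map to the same local file.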
955
956
/**
957
 * Gives the name of the local copy of the source
958
 *
959
 * Either takes the file extension directly from the URL of the source,
960
 * or tries to work it out.
961
 *
962
 * @uses nom_fichier_copie_locale()
963
 * @uses recuperer_infos_distantes()
964
 *
965
 * @param string $source
966
 *      URL of the remote source
967
 * @return string
Documentation issue: Should the return type not be string|null?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.
968
 *      Computed file name
969
 **/
970
function fichier_copie_locale($source) {
971
	// If the source is already local, nothing to do
972
	if (!tester_url_absolue($source)) {
973
		if (_DIR_RACINE) {
974
			$source = preg_replace(',^' . preg_quote(_DIR_RACINE) . ',', '', $source);
975
		}
976
977
		return $source;
978
	}
979
980
	// optimisation: check whether the extension can be guessed from the URL and whether the file
981
	// has already been copied locally with that extension;
982
	// in that case it is reliable, no need to query the database
983
	$path_parts = pathinfo($source);
984
	if (!isset($path_parts['extension'])) {
985
		$path_parts['extension'] = '';
986
	}
987
	$ext = $path_parts ? $path_parts['extension'] : '';
988
	if ($ext
989
		and preg_match(',^\w+$,', $ext) // no php?foo=1&...
990
		and $f = nom_fichier_copie_locale($source, $ext)
991
		and file_exists(_DIR_RACINE . $f)
992
	) {
993
		return $f;
994
	}
995
996
997
	// If it is already in the documents table,
998
	// return the name of its potential copy
999
	$ext = sql_getfetsel('extension', 'spip_documents', 'fichier=' . sql_quote($source) . " AND distant='oui' AND extension <> ''");
1000
1001
	if ($ext) {
1002
		return nom_fichier_copie_locale($source, $ext);
1003
	}
1004
1005
	// check whether the extension given in the file name is acceptable
1006
	// and whether the file has already been fetched
1007
1008
	$ext = $path_parts ? $path_parts['extension'] : '';
1009
1010
	if ($ext and sql_getfetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($ext))) {
1011
		$f = nom_fichier_copie_locale($source, $ext);
1012
		if (file_exists(_DIR_RACINE . $f)) {
1013
			return $f;
1014
		}
1015
	}
1016
1017
	// Ping to see whether its extension is known and allowed,
1018
	// caching the result of the ping
1019
1020
	$cache = sous_repertoire(_DIR_CACHE, 'rid') . md5($source);
1021
	if (!@file_exists($cache)
1022
		or !$path_parts = @unserialize(spip_file_get_contents($cache))
1023
		or _request('var_mode') == 'recalcul'
1024
	) {
1025
		$path_parts = recuperer_infos_distantes($source, 0, false);
1026
		ecrire_fichier($cache, serialize($path_parts));
1027
	}
1028
	$ext = !empty($path_parts['extension']) ? $path_parts['extension'] : '';
1029
	if ($ext and sql_getfetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($ext))) {
1030
		return nom_fichier_copie_locale($source, $ext);
1031
	}
1032
	spip_log("pas de copie locale pour $source", 'distant' . _LOG_ERREUR);
1033
}
1034
1035
1036
/**
1037
 * Fetches information about a remote document without downloading too much of it
1038
 *
1039
 * @param string $source
1040
 *     URL of the source
1041
 * @param int $max
1042
 *     Maximum size of the file to download
1043
 * @param bool $charger_si_petite_image
1044
 *     Whether to download the document if it is small
1045
 * @return array
Documentation issue: Should the return type not be false|array? Also, consider making the array more specific, something like array<String>, or String[].

If the return type contains the type array, this check recommends the use of a more specific type like String[] or array<String>.
1046
 *     Key/value pairs of the information obtained, among:
1047
 *
1048
 *     - 'body' = string
1049
 *     - 'type_image' = boolean
1050
 *     - 'titre' = string
1051
 *     - 'largeur' = int
1052
 *     - 'hauteur' = int
1053
 *     - 'taille' = int
1054
 *     - 'extension' = string
1055
 *     - 'fichier' = string
1056
 *     - 'mime_type' = string
1057
 **/
1058
function recuperer_infos_distantes($source, $max = 0, $charger_si_petite_image = true) {
1059
1060
	// no point wasting time on a URL that is not absolute
1061
	if (!tester_url_absolue($source)) {
1062
		return false;
1063
	}
1064
1065
	# load the MIME type aliases
1066
	include_spip('base/typedoc');
1067
1068
	$a = array();
1069
	$mime_type = '';
1070
	// Directly load the beginning of images and HTML files
1071
	// so as to gather as much information as possible (title, size, etc.). If
1072
	// this fails, the user will have to enter it...
1073
	if ($headers = recuperer_page($source, false, true, $max, '', '', true)) {
Deprecated code: The function recuperer_page() has been deprecated with message: 3.1

This function has been deprecated. The supplier of the file has supplied an explanatory message.

The explanatory message should give you some clue as to whether and when the function will be removed from the class and what other function to use instead.
1074
		list($headers, $a['body']) = preg_split(',\n\n,', $headers, 2);
1075
1076
		if (preg_match(",\nContent-Type: *([^[:space:];]*),i", "\n$headers", $regs)) {
1077
			$mime_type = (trim($regs[1]));
1078
		} else {
1079
			$mime_type = '';
1080
		} // unknown
1081
1082
		// Apply the aliases
1083
		while (isset($GLOBALS['mime_alias'][$mime_type])) {
1084
			$mime_type = $GLOBALS['mime_alias'][$mime_type];
1085
		}
1086
1087
		// If the MIME type is meaningless
1088
		// (text/plain, application/octet-stream or empty),
1089
		// the server may not know
1090
		// what it is serving; try to detect the type from the extension of the URL
1091
		// or from Content-Disposition: attachment; filename=...
1092
		$t = null;
1093
		if (in_array($mime_type, array('text/plain', '', 'application/octet-stream'))) {
1094
			if (!$t
1095
				and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)
1096
			) {
1097
				$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
1098
			}
1099
			if (!$t
Duplication: This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.
1100
				and preg_match(',^Content-Disposition:\s*attachment;\s*filename=(.*)$,Uims', $headers, $m)
1101
				and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $m[1], $rext)
1102
			) {
1103
				$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
1104
			}
1105
		}
1106
1107
		// Other MIME type (or text/plain with a file of unknown extension)
1108
		if (!$t) {
1109
			$t = sql_fetsel('extension', 'spip_types_documents', 'mime_type=' . sql_quote($mime_type));
1110
		}
1111
1112
		// Still nothing? (e.g. audio/x-ogg instead of application/ogg)
1113
		// Try again with the extension
1114
		if (!$t
Duplication: This code seems to be duplicated across your project.
1115
			and $mime_type != 'text/plain'
1116
			and preg_match(',\.([a-z0-9]+)(\?.*)?$,i', $source, $rext)
1117
		) {
1118
			# avoid xxx.3 => 3gp (> SPIP 3)
1119
			$t = sql_fetsel('extension', 'spip_types_documents', 'extension=' . sql_quote($rext[1], '', 'text'));
1120
		}
1121
1122
		if ($t) {
Bug best practice: The expression $t of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
1123
			spip_log("mime-type $mime_type ok, extension " . $t['extension'], 'distant');
1124
			$a['extension'] = $t['extension'];
1125
		} else {
1126
			# by default, fall back to '.bin' if it is allowed
1127
			spip_log("mime-type $mime_type inconnu", 'distant');
1128
			$t = sql_fetsel('extension', 'spip_types_documents', "extension='bin'");
1129
			if (!$t) {
Bug best practice: The expression $t of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.
1130
				return false;
1131
			}
1132
			$a['extension'] = $t['extension'];
1133
		}
1134
1135
		if (preg_match(",\nContent-Length: *([^[:space:]]*),i", "\n$headers", $regs)) {
1136
			$a['taille'] = intval($regs[1]);
1137
		}
1138
	}
1139
1140
	// HEAD failed, try again with GET
1141
	if (!$a and !$max) {
1142
		spip_log("tenter GET $source", 'distant');
1143
		$a = recuperer_infos_distantes($source, _INC_DISTANT_MAX_SIZE);
1144
	}
1145
1146
	// if nothing was found, there is no point insisting
1147
	if (!$a) {
1148
		return false;
1149
	}
1150
1151
	// If it is an image that is not too large or an HTML file,
1152
	// reload the document with GET and collect additional data...
1153
	include_spip('inc/filtres_images_lib_mini');
1154
	if (strpos($mime_type, "image/") === 0
1155
	  and $extension = _image_trouver_extension_depuis_mime($mime_type)) {
1156
		if ($max == 0
1157
			and (empty($a['taille']) or $a['taille'] < _INC_DISTANT_MAX_SIZE)
1158
			and in_array($extension, formats_image_acceptables())
1159
			and $charger_si_petite_image
1160
		) {
1161
			$a = recuperer_infos_distantes($source, _INC_DISTANT_MAX_SIZE);
1162
		} else {
1163
			if ($a['body']) {
1164
				$a['extension'] = $extension;
1165
				$a['fichier'] = _DIR_RACINE . nom_fichier_copie_locale($source, $extension);
1166
				ecrire_fichier($a['fichier'], $a['body']);
1167
				$size_image = @spip_getimagesize($a['fichier']);
1168
				$a['largeur'] = intval($size_image[0]);
1169
				$a['hauteur'] = intval($size_image[1]);
1170
				$a['type_image'] = true;
1171
			}
1172
		}
1173
	}
1174
1175
	// SWF file: if the size is unknown, default to 425x350,
1176
	// which is better than 0x0
1177
	// Flash is dead!
1178
	if ($a and isset($a['extension']) and $a['extension'] == 'swf'
1179
		and empty($a['largeur'])
1180
	) {
1181
		$a['largeur'] = 425;
1182
		$a['hauteur'] = 350;
1183
	}
1184
1185
	if ($mime_type == 'text/html') {
1186
		include_spip('inc/filtres');
1187
		$page = recuperer_page($source, true, false, _INC_DISTANT_MAX_SIZE);
Deprecated code: The function recuperer_page() has been deprecated with message: 3.1
1188
		if (preg_match(',<title>(.*?)</title>,ims', $page, $regs)) {
1189
			$a['titre'] = corriger_caracteres(trim($regs[1]));
1190
		}
1191
		if (!isset($a['taille']) or !$a['taille']) {
1192
			$a['taille'] = strlen($page); # approximately
1193
		}
1194
	}
1195
	$a['mime_type'] = $mime_type;
1196
1197
	return $a;
1198
}
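Review sketch: the two header lookups used above (Content-Type and Content-Length) can be exercised in isolation. `extraire_type_et_taille()` is a hypothetical helper, not part of SPIP; it reuses the same regular expressions on a header block whose lines are separated by "\n".

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): extracts the MIME type and the
 * declared size from a block of response headers.
 *
 * @param string $headers header lines separated by "\n"
 * @return array ('mime_type' => string, 'taille' => int|null)
 */
function extraire_type_et_taille($headers) {
	$infos = array('mime_type' => '', 'taille' => null);

	// prefixing "\n" lets the pattern match a header on the very first line;
	// the character class stops at whitespace or ";" so parameters such as
	// "charset=utf-8" are left out
	if (preg_match(",\nContent-Type: *([^[:space:];]*),i", "\n$headers", $regs)) {
		$infos['mime_type'] = trim($regs[1]);
	}
	if (preg_match(",\nContent-Length: *([^[:space:]]*),i", "\n$headers", $regs)) {
		$infos['taille'] = intval($regs[1]);
	}

	return $infos;
}
```

Keeping `taille` as null when the header is absent distinguishes "size unknown" from a declared size of 0, which the array-based code above expresses by leaving the key unset.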
1199
1200
1201
/**
1202
 * Tests whether a host can be reached directly or must go through a proxy
1203
 *
1204
 * The proxy and the list of excluded hosts can be passed as parameters,
1205
 * for testing purposes during configuration
1206
 *
1207
 * @param string $host
1208
 * @param string $http_proxy
Documentation issue: Should the type for parameter $http_proxy not be string|null?

This check looks for @param annotations where the type inferred by our type inference engine differs from the declared type. It makes a suggestion as to what type it considers more descriptive. Most often this is a case of a parameter that can be null in addition to its declared types.
1209
 * @param string $http_noproxy
Documentation issue: Should the type for parameter $http_noproxy not be string|null?
1210
 * @return string
1211
 */
1212
function need_proxy($host, $http_proxy = null, $http_noproxy = null) {
1213
	if (is_null($http_proxy)) {
1214
		$http_proxy = isset($GLOBALS['meta']['http_proxy']) ? $GLOBALS['meta']['http_proxy'] : null;
1215
	}
1216
	// nothing to do if there is no proxy :)
1217
	if (is_null($http_proxy) or !$http_proxy = trim($http_proxy)) {
1218
		return '';
1219
	}
1220
1221
	if (is_null($http_noproxy)) {
1222
		$http_noproxy = isset($GLOBALS['meta']['http_noproxy']) ? $GLOBALS['meta']['http_noproxy'] : null;
1223
	}
1224
	// if there are no exceptions, return the proxy
1225
	if (is_null($http_noproxy) or !$http_noproxy = trim($http_noproxy)) {
1226
		return $http_proxy;
1227
	}
1228
1229
	// if the host or one of its parent domains is in $http_noproxy, make an exception
1230
	// $http_noproxy may contain several domains separated by spaces or line breaks
1231
	$http_noproxy = str_replace("\n", " ", $http_noproxy);
1232
	$http_noproxy = str_replace("\r", " ", $http_noproxy);
1233
	$http_noproxy = " $http_noproxy ";
1234
	$domain = $host;
1235
	// if the exact domain www.example.org is among the exceptions
1236
	if (strpos($http_noproxy, " $domain ") !== false)
1237
		return '';
1238
1239
	while (strpos($domain, '.') !== false) {
1240
		$domain = explode('.', $domain);
1241
		array_shift($domain);
1242
		$domain = implode('.', $domain);
1243
1244
		// or if a parent domain starting with a . is among the exceptions (meaning that it covers all subdomains)
1245
		if (strpos($http_noproxy, " .$domain ") !== false) {
1246
			return '';
1247
		}
1248
	}
1249
1250
	// ok, this is not an exception
1251
	return $http_proxy;
1252
}
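Review sketch: the exception test in need_proxy() can be checked on its own with a few hosts. `hote_sans_proxy()` is a hypothetical standalone function, not part of SPIP; it takes the exception list as an argument instead of reading `$GLOBALS['meta']`.

```php
<?php
/**
 * Hypothetical standalone version of the exception test in need_proxy():
 * returns true when $host must bypass the proxy.
 *
 * @param string $host
 * @param string $http_noproxy hosts separated by spaces or line breaks
 * @return bool
 */
function hote_sans_proxy($host, $http_noproxy) {
	// normalise separators and pad with spaces so that every entry can be
	// searched as " entry "
	$liste = ' ' . str_replace(array("\r", "\n"), ' ', trim($http_noproxy)) . ' ';

	// exact host listed, e.g. "www.example.org"
	if (strpos($liste, " $host ") !== false) {
		return true;
	}
	// parent domain listed with a leading dot, e.g. ".example.org"
	$domain = $host;
	while (strpos($domain, '.') !== false) {
		// drop the leftmost label: www.example.org => example.org => org
		$domain = substr($domain, strpos($domain, '.') + 1);
		if (strpos($liste, " .$domain ") !== false) {
			return true;
		}
	}

	return false;
}
```

Under this rule an entry such as `.example.org` covers `www.example.org` but not `example.org` itself, which has to be listed separately if it should also bypass the proxy.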
1253
1254
1255
/**
1256
 * Initialises an HTTP request with headers
1257
 *
1258
 * Splits the URL into scheme + host + path + port and sends the request.
1259
 * Returns the descriptor from which the response can be read.
1260
 *
1261
 * @uses lance_requete()
1262
 *
1263
 * @param string $method
1264
 *   HEAD, GET, POST
1265
 * @param string $url
1266
 * @param bool $refuse_gz
1267
 * @param string $referer
1268
 * @param string $datas
1269
 * @param string $vers
1270
 * @param string $date
1271
 * @return array
1272
 */
1273
function init_http($method, $url, $refuse_gz = false, $referer = '', $datas = '', $vers = 'HTTP/1.0', $date = '') {
1274
	$user = $via_proxy = $proxy_user = '';
Unused code: $proxy_user is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten for every possible time line.

Unused code: $via_proxy is not used, you could remove the assignment.
1275
	$fopen = false;
1276
1277
	$t = @parse_url($url);
1278
	$host = $t['host'];
1279
	if ($t['scheme'] == 'http') {
1280
		$scheme = 'http';
1281
		$noproxy = '';
1282
	} elseif ($t['scheme'] == 'https') {
1283
		$scheme = 'ssl';
1284
		$noproxy = 'ssl://';
1285
		if (!isset($t['port']) || !($port = $t['port'])) {
Duplication: This code seems to be duplicated across your project.
1286
			$t['port'] = 443;
1287
		}
1288
	} else {
1289
		$scheme = $t['scheme'];
1290
		$noproxy = $scheme . '://';
1291
	}
1292
	if (isset($t['user'])) {
1293
		$user = array($t['user'], isset($t['pass']) ? $t['pass'] : '');
1294
	}
1295
1296
	if (!isset($t['port']) || !($port = $t['port'])) {
Duplication: This code seems to be duplicated across your project.
1297
		$port = 80;
1298
	}
1299
	if (!isset($t['path']) || !($path = $t['path'])) {
Duplication: This code seems to be duplicated across your project.
1300
		$path = '/';
1301
	}
1302
1303
	if (!empty($t['query'])) {
1304
		$path .= '?' . $t['query'];
1305
	}
1306
1307
	$f = lance_requete($method, $scheme, $user, $host, $path, $port, $noproxy, $refuse_gz, $referer, $datas, $vers, $date);
Bug: It seems like $user defined by $via_proxy = $proxy_user = '' on line 1274 can also be of type string; however, lance_requete() does only seem to accept array, maybe add an additional type check?

If a method or function can return multiple different values and unless you are sure that you only can receive a single value in this context, we recommend to add an additional type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

1308
	if (!$f or !is_resource($f)) {
1309
		// fallback: fopen, unless lance_requete timed out,
1310
		// which corresponds to $f === 110
1311
		if ($f !== 110
1312
			and !need_proxy($host)
1313
			and !_request('tester_proxy')
1314
			and (!isset($GLOBALS['inc_distant_allow_fopen']) or $GLOBALS['inc_distant_allow_fopen'])
1315
		) {
1316
			$f = @fopen($url, 'rb');
1317
			spip_log("connexion vers $url par simple fopen", 'distant');
1318
			$fopen = true;
1319
		} else {
1320
			// complete failure
1321
			$f = false;
1322
		}
1323
	}
1324
1325
	return array($f, $fopen);
1326
}
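Review sketch: the URL decomposition at the start of init_http() can be written as a small pure function, which also removes the repeated isset() tests flagged as duplication. `decomposer_url()` is a hypothetical name, not part of SPIP, and unlike the original it rejects URLs without a scheme or host instead of reading missing array keys.

```php
<?php
/**
 * Hypothetical standalone version of the URL decomposition in init_http().
 *
 * @param string $url
 * @return array|false (scheme, host, port, path), or false without scheme or host
 */
function decomposer_url($url) {
	$t = parse_url($url);
	if (!$t or empty($t['host']) or empty($t['scheme'])) {
		return false;
	}
	// https is carried over the "ssl" transport and defaults to port 443
	$scheme = ($t['scheme'] == 'https') ? 'ssl' : $t['scheme'];
	$port = !empty($t['port']) ? $t['port'] : ($t['scheme'] == 'https' ? 443 : 80);
	// an empty path means the root; the query string is appended to the path
	$path = !empty($t['path']) ? $t['path'] : '/';
	if (!empty($t['query'])) {
		$path .= '?' . $t['query'];
	}

	return array($scheme, $t['host'], $port, $path);
}
```

For example, `decomposer_url('https://example.org/a?b=1')` yields the transport `ssl`, the host `example.org`, port 443 and the path `/a?b=1`.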
1327
1328
/**
1329
 * Sends the request itself
1330
 *
1331
 * @param string $method
1332
 *   type of the request (GET, HEAD, POST...)
1333
 * @param string $scheme
1334
 *   protocol (http, tls, ftp...)
1335
 * @param array $user
1336
 *   pair (user, password) when HTTP authentication is used
1337
 * @param string $host
1338
 *   domain name
1339
 * @param string $path
1340
 *   path of the requested page
1341
 * @param string $port
1342
 *   port used for the connection
1343
 * @param string $noproxy
1344
 *   protocol prefix used when the request does not go through a proxy
1345
 * @param bool $refuse_gz
1346
 *   refuse gzip compression
1347
 * @param string $referer
1348
 *   referer
1349
 * @param string $datas
1350
 *   posted data
1351
 * @param string $vers
1352
 *   HTTP version
1353
 * @param int|string $date
1354
 *   timestamp for the If-Modified-Since header
1355
 * @return bool|int|resource
1356
 *   false|int on failure
1357
 *   resource: socket to the requested URL
1358
 */
function lance_requete(
	$method,
	$scheme,
	$user,
	$host,
	$path,
	$port,
	$noproxy,
	$refuse_gz = false,
	$referer = '',
	$datas = '',
	$vers = 'HTTP/1.0',
	$date = ''
) {

	$proxy_user = '';
	$http_proxy = need_proxy($host);
	if ($user) {
Bug / Best Practice: The expression $user of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
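
As an illustration of that advice, here is a minimal sketch of an explicit check around the credential encoding. encoder_identifiants() is a hypothetical helper, not a SPIP function; it assumes that a missing or malformed pair should simply yield an empty string.

```php
<?php
/**
 * Hypothetical helper (not part of SPIP): build the "login:password"
 * string from a credential pair, with explicit emptiness checks.
 *
 * @param mixed $user
 * @return string
 *   url-encoded "login:password", or '' when no usable pair was given
 */
function encoder_identifiants($user) {
	// Explicit checks instead of relying on the array-to-boolean conversion.
	if (!is_array($user) or empty($user) or !isset($user[0], $user[1])) {
		return '';
	}

	return urlencode($user[0]) . ':' . urlencode($user[1]);
}
```

With such a helper, the rest of the function could keep testing the resulting string, which is unambiguous: '' means no credentials.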

		$user = urlencode($user[0]) . ':' . urlencode($user[1]);
	}

	$connect = '';
	if ($http_proxy) {
		if (!defined('_PROXY_HTTPS_NOT_VIA_CONNECT') and in_array($scheme, array('tls', 'ssl'))) {
			$path_host = (!$user ? '' : "$user@") . $host . (($port != 80) ? ":$port" : '');
			$connect = 'CONNECT ' . $path_host . " $vers\r\n"
				. "Host: $path_host\r\n"
				. "Proxy-Connection: Keep-Alive\r\n";
		} else {
			$path = (in_array($scheme, array('tls', 'ssl')) ? 'https://' : "$scheme://")
				. (!$user ? '' : "$user@")
				. "$host" . (($port != 80) ? ":$port" : '') . $path;
		}
		$t2 = @parse_url($http_proxy);
		$first_host = $t2['host'];
		if (!($port = $t2['port'])) {
			$port = 80;
		}
		if ($t2['user']) {
			$proxy_user = base64_encode($t2['user'] . ':' . $t2['pass']);
		}
	} else {
		$first_host = $noproxy . $host;
	}

	if ($connect) {
		$streamContext = stream_context_create(array(
			'ssl' => array(
				'verify_peer' => false,
				'allow_self_signed' => true,
				'SNI_enabled' => true,
				'peer_name' => $host,
			)
		));
		$f = @stream_socket_client(
			"tcp://$first_host:$port",
			$errno,
			$errstr,
			_INC_DISTANT_CONNECT_TIMEOUT,
			STREAM_CLIENT_CONNECT,
			$streamContext
		);
		spip_log("Recuperer $path sur $first_host:$port par $f (via CONNECT)", 'connect');
		if (!$f) {
			spip_log("Erreur connexion $errno $errstr", 'distant' . _LOG_ERREUR);
			return $errno;
		}
		stream_set_timeout($f, _INC_DISTANT_CONNECT_TIMEOUT);

		fputs($f, $connect);
		fputs($f, "\r\n");
		$res = fread($f, 1024);
		if (!$res
			or !count($res = explode(' ', $res))
			or $res[1] !== '200'
		) {
			spip_log("Echec CONNECT sur $first_host:$port", 'connect' . _LOG_INFO_IMPORTANTE);
			fclose($f);

			return false;
		}
		// important: otherwise we read too early and the data is not available yet
		stream_set_blocking($f, true);
		// send the handshake
		stream_socket_enable_crypto($f, true, STREAM_CRYPTO_METHOD_SSLv23_CLIENT);
		spip_log("OK CONNECT sur $first_host:$port", 'connect');
	} else {
		$ntry = 3;
		do {
			$f = @fsockopen($first_host, $port, $errno, $errstr, _INC_DISTANT_CONNECT_TIMEOUT);
		} while (!$f and $ntry-- and $errno !== 110 and sleep(1));
		spip_log("Recuperer $path sur $first_host:$port par $f");
		if (!$f) {
			spip_log("Erreur connexion $errno $errstr", 'distant' . _LOG_ERREUR);

			return $errno;
		}
		stream_set_timeout($f, _INC_DISTANT_CONNECT_TIMEOUT);
	}

	$site = isset($GLOBALS['meta']['adresse_site']) ? $GLOBALS['meta']['adresse_site'] : '';

	$host_port = $host;
	if ($port != (in_array($scheme, array('tls', 'ssl')) ? 443 : 80)) {
		$host_port .= ":$port";
	}
	$req = "$method $path $vers\r\n"
		. "Host: $host_port\r\n"
		. 'User-Agent: ' . _INC_DISTANT_USER_AGENT . "\r\n"
		. ($refuse_gz ? '' : ('Accept-Encoding: ' . _INC_DISTANT_CONTENT_ENCODING . "\r\n"))
		. (!$site ? '' : "Referer: $site/$referer\r\n")
		. (!$date ? '' : 'If-Modified-Since: ' . (gmdate('D, d M Y H:i:s', $date) . " GMT\r\n"))
		. (!$user ? '' : ('Authorization: Basic ' . base64_encode($user) . "\r\n"))
		. (!$proxy_user ? '' : "Proxy-Authorization: Basic $proxy_user\r\n")
		. (!strpos($vers, '1.1') ? '' : "Keep-Alive: 300\r\nConnection: keep-alive\r\n");

#	spip_log("Requete\n$req", 'distant');
	fputs($f, $req);
	fputs($f, $datas ? $datas : "\r\n");

	return $f;
}