Completed
Push to master (80c6c9...40be87) by unknown
02:48, queued 16s

SparqlHelper   F

Complexity

Total Complexity 73

Size/Duplication

Total Lines 759
Duplicated Lines 0 %

Coupling/Cohesion

Components 1
Dependencies 20

Importance

Changes 0
Metric  Value
wmc     73
lcom    1
cbo     20
dl      0
loc     759
rs      2.4009
c       0
b       0
f       0

17 Methods

Rating  Name                                        Duplication  Size  Complexity
A       __construct()                               0            33    2
A       getQueryPrefixes()                          0            48    3
A       hasType()                                   0            40    4
A       findEntitiesWithSameStatement()             0            31    2
B       findEntitiesWithSameQualifierOrReference()  0            46    5
A       stringLiteral()                             0            3     1
A       getOtherEntities()                          0            24    4
C       getRdfLiteral()                             0            43    15
C       matchesRegularExpression()                  0            84    9
A       serializeConstraintParameterException()     0            6     1
A       deserializeConstraintParameterException()   0            6     1
A       matchesRegularExpressionWithSparql()        0            22    2
A       isTimeout()                                 0            9     1
A       getCacheMaxAge()                            0            18    5
B       getThrottling()                             0            24    6
A       getTimestampInFuture()                      0            4     1
C       runQuery()                                  0            99    11

How to fix   Complexity   

Complex Class

Complex classes like SparqlHelper often do many different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefix or suffix. The cohesion graph can also help to spot unconnected or weakly connected components.
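As a rough illustration of the prefix heuristic, the PHP sketch below groups the 17 method names from the table above by their leading lowercase word. The grouping rule is an assumption made for this example, not what the analyzer does, and groupByPrefix() is a hypothetical helper.

```php
<?php

// Method names of SparqlHelper, copied from the methods table above.
$methods = [
	'__construct', 'getQueryPrefixes', 'hasType', 'findEntitiesWithSameStatement',
	'findEntitiesWithSameQualifierOrReference', 'stringLiteral', 'getOtherEntities',
	'getRdfLiteral', 'matchesRegularExpression', 'serializeConstraintParameterException',
	'deserializeConstraintParameterException', 'matchesRegularExpressionWithSparql',
	'isTimeout', 'getCacheMaxAge', 'getThrottling', 'getTimestampInFuture', 'runQuery',
];

/**
 * Hypothetical helper: group method names by their leading run of lowercase
 * letters ("get", "find", "matches", ...). Magic methods such as __construct
 * have no such run and end up under the empty key.
 */
function groupByPrefix( array $names ): array {
	$groups = [];
	foreach ( $names as $name ) {
		preg_match( '/^[a-z]*/', $name, $m );
		$groups[$m[0]][] = $name;
	}
	ksort( $groups );
	return $groups;
}

foreach ( groupByPrefix( $methods ) as $prefix => $names ) {
	echo ( $prefix === '' ? '(none)' : $prefix ), ': ', implode( ', ', $names ), "\n";
}
```

Groups such as get* (six methods) or matches* (two) are only candidates: a shared verb does not guarantee cohesion. The serialize/deserialize pair is linked by a shared suffix, ConstraintParameterException, rather than a prefix.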

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
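One candidate component is the Retry-After handling: getThrottling(), getTimestampInFuture() and the three *_RETRY_AFTER constants share a single concern. A minimal sketch of what an extracted class could look like follows. The class name RetryAfterParser, the injected $now parameter and the use of DateTimeImmutable in place of MediaWiki's ConvertibleTimestamp are all assumptions, chosen so that the example runs on plain PHP.

```php
<?php

/**
 * Hypothetical class extracted from SparqlHelper::getThrottling() and
 * getTimestampInFuture(). DateTimeImmutable stands in for ConvertibleTimestamp.
 */
final class RetryAfterParser {
	public const NO_RETRY_AFTER = -1;
	public const EMPTY_RETRY_AFTER = -2;
	public const INVALID_RETRY_AFTER = -3;

	/**
	 * @param string|null $headerValue raw Retry-After value, or null if the header is absent
	 * @param DateTimeImmutable $now the current time, injected to keep the class testable
	 * @return int|DateTimeImmutable the time after which to retry, or one of the constants
	 */
	public function parse( ?string $headerValue, DateTimeImmutable $now ) {
		if ( $headerValue === null ) {
			return self::NO_RETRY_AFTER;
		}
		$value = trim( $headerValue );
		if ( $value === '' ) {
			return self::EMPTY_RETRY_AFTER;
		}
		// RFC 7231: Retry-After = HTTP-date / delay-seconds, delay-seconds = 1*DIGIT
		if ( preg_match( '/^[0-9]+$/D', $value ) === 1 ) {
			return $now->add( new DateInterval( 'PT' . (int)$value . 'S' ) );
		}
		$timestamp = strtotime( $value );
		if ( $timestamp !== false ) {
			// '@<unix time>' builds a UTC DateTimeImmutable from the parsed HTTP-date
			return new DateTimeImmutable( '@' . $timestamp );
		}
		return self::INVALID_RETRY_AFTER;
	}
}

$parser = new RetryAfterParser();
$now = new DateTimeImmutable( '@1000' );
echo $parser->parse( '120', $now )->getTimestamp(), "\n"; // 1120
```

Injecting $now lets the extracted class be tested without a real clock. The sketch deliberately differs from getThrottling() in two details: it accepts only digit strings as delay-seconds, and it compares against '' instead of calling empty(), which in PHP also treats the string "0" as empty (so the original reports EMPTY_RETRY_AFTER for "Retry-After: 0").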

While breaking up the class, it is a good idea to analyze how other classes use SparqlHelper, and based on these observations, apply Extract Interface, too.
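Most public methods (hasType(), findEntitiesWithSameStatement(), findEntitiesWithSameQualifierOrReference(), matchesRegularExpressionWithSparql()) reach the endpoint only through runQuery(), so query execution is a natural interface to extract. The sketch below assumes a hypothetical SparqlQueryRunner interface that returns the decoded JSON array, plus a simplified hasType() that leaves out caching metadata and the gearing hint. With such an interface, callers can be tested against a stub instead of a live endpoint. P279 is only an example value for the subclass-of property ID.

```php
<?php

/** Hypothetical interface extracted from SparqlHelper::runQuery(). */
interface SparqlQueryRunner {
	/** @return array the decoded JSON response of the endpoint */
	public function runQuery( string $query ): array;
}

/** Test double: records queries and replays canned ASK answers. */
final class StubQueryRunner implements SparqlQueryRunner {
	/** @var string[] */
	public $queries = [];
	/** @var bool[] */
	private $answers;

	public function __construct( array $answers ) {
		$this->answers = $answers;
	}

	public function runQuery( string $query ): array {
		$this->queries[] = $query;
		// Once the canned answers run out, array_shift() returns null (treated as false).
		return [ 'boolean' => array_shift( $this->answers ) ];
	}
}

/**
 * Simplified hasType(): one ASK query per chunk of $chunkSize classes,
 * stopping at the first chunk that matches.
 */
function hasType( SparqlQueryRunner $runner, string $id, array $classes, string $subclassOfId, int $chunkSize = 20 ): bool {
	foreach ( array_chunk( $classes, $chunkSize ) as $chunk ) {
		$values = implode( ' ', array_map( function ( $class ) {
			return 'wd:' . $class;
		}, $chunk ) );
		$query = "ASK {\n  BIND(wd:$id AS ?item)\n  VALUES ?class { $values }\n  ?item wdt:$subclassOfId* ?class.\n}";
		if ( $runner->runQuery( $query )['boolean'] ) {
			return true;
		}
	}
	return false;
}

// 45 classes are split into chunks of 20, 20 and 5; the second ASK matches.
$classes = array_map( function ( $i ) {
	return 'Q' . $i;
}, range( 1, 45 ) );
$stub = new StubQueryRunner( [ false, true ] );
var_dump( hasType( $stub, 'Q42', $classes, 'P279' ) ); // bool(true)
echo count( $stub->queries ), "\n"; // 2
```

In the real class, runQuery() returns CachedQueryResults rather than a bare array, so an extracted interface would more likely keep that return type.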

<?php

namespace WikibaseQuality\ConstraintReport\ConstraintCheck\Helper;

use Config;
use DataValues\DataValue;
use DataValues\MonolingualTextValue;
use DateInterval;
use IBufferingStatsdDataFactory;
use InvalidArgumentException;
use MapCacheLRU;
use MediaWiki\Http\HttpRequestFactory;
use MWException;
use MWHttpRequest;
use Psr\Log\LoggerInterface;
use WANObjectCache;
use Wikibase\DataModel\Entity\EntityId;
use Wikibase\DataModel\Entity\EntityIdParser;
use Wikibase\DataModel\Entity\EntityIdParsingException;
use Wikibase\DataModel\Entity\EntityIdValue;
use Wikibase\DataModel\Services\Lookup\PropertyDataTypeLookup;
use Wikibase\DataModel\Snak\PropertyValueSnak;
use Wikibase\DataModel\Statement\Statement;
use Wikibase\Rdf\RdfVocabulary;
use WikibaseQuality\ConstraintReport\Api\ExpiryLock;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Cache\CachedBool;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Cache\CachedEntityIds;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Cache\CachedQueryResults;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Cache\CachingMetadata;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Cache\Metadata;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Context\Context;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Message\ViolationMessage;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Message\ViolationMessageDeserializer;
use WikibaseQuality\ConstraintReport\ConstraintCheck\Message\ViolationMessageSerializer;
use WikibaseQuality\ConstraintReport\Role;
use Wikimedia\Timestamp\ConvertibleTimestamp;

/**
 * Class for running a SPARQL query on some endpoint and getting the results.
 *
 * @author Lucas Werkmeister
 * @license GPL-2.0-or-later
 */
class SparqlHelper {

	/**
	 * @var Config
	 */
	private $config;

	/**
	 * @var RdfVocabulary
	 */
	private $rdfVocabulary;

	/**
	 * @var string[]
	 */
	private $entityPrefixes;

	/**
	 * @var string
	 */
	private $prefixes;

	/**
	 * @var EntityIdParser
	 */
	private $entityIdParser;

	/**
	 * @var PropertyDataTypeLookup
	 */
	private $propertyDataTypeLookup;

	/**
	 * @var WANObjectCache
	 */
	private $cache;

	/**
	 * @var ViolationMessageSerializer
	 */
	private $violationMessageSerializer;

	/**
	 * @var ViolationMessageDeserializer
	 */
	private $violationMessageDeserializer;

	/**
	 * @var IBufferingStatsdDataFactory
	 */
	private $dataFactory;

	/**
	 * @var LoggerInterface
	 */
	private $loggingHelper;

	/**
	 * @var string
	 */
	private $defaultUserAgent;

	/**
	 * @var ExpiryLock
	 */
	private $throttlingLock;

	/**
	 * @var int stands for: No Retry-After header-field was sent back
	 */
	const NO_RETRY_AFTER = -1;
	/**
	 * @var int stands for: Empty Retry-After header-field was sent back
	 */
	const EMPTY_RETRY_AFTER = -2;
	/**
	 * @var int stands for: Invalid Retry-After header-field was sent back
	 * (neither a non-negative number of seconds nor a parseable date)
	 */
	const INVALID_RETRY_AFTER = -3;
	/**
	 * @var string ID on which the lock is applied
	 */
	const EXPIRY_LOCK_ID = 'SparqlHelper.runQuery';

	/**
	 * @var int HTTP response code for too many requests
	 */
	const HTTP_TOO_MANY_REQUESTS = 429;

	/**
	 * @var HttpRequestFactory
	 */
	private $requestFactory;

	public function __construct(
		Config $config,
		RdfVocabulary $rdfVocabulary,
		EntityIdParser $entityIdParser,
		PropertyDataTypeLookup $propertyDataTypeLookup,
		WANObjectCache $cache,
		ViolationMessageSerializer $violationMessageSerializer,
		ViolationMessageDeserializer $violationMessageDeserializer,
		IBufferingStatsdDataFactory $dataFactory,
		ExpiryLock $throttlingLock,
		LoggingHelper $loggingHelper,
		$defaultUserAgent,
		HttpRequestFactory $requestFactory
	) {
		$this->config = $config;
		$this->rdfVocabulary = $rdfVocabulary;
		$this->entityIdParser = $entityIdParser;
		$this->propertyDataTypeLookup = $propertyDataTypeLookup;
		$this->cache = $cache;
		$this->violationMessageSerializer = $violationMessageSerializer;
		$this->violationMessageDeserializer = $violationMessageDeserializer;
		$this->dataFactory = $dataFactory;
		$this->throttlingLock = $throttlingLock;
		$this->loggingHelper = $loggingHelper;
Documentation Bug introduced by
It seems like $loggingHelper of type object<WikibaseQuality\C...k\Helper\LoggingHelper> is incompatible with the declared type object<Psr\Log\LoggerInterface> of property $loggingHelper.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.

		$this->defaultUserAgent = $defaultUserAgent;
		$this->requestFactory = $requestFactory;
		$this->entityPrefixes = [];
		foreach ( $rdfVocabulary->entityNamespaceNames as $namespaceName ) {
			$this->entityPrefixes[] = $rdfVocabulary->getNamespaceURI( $namespaceName );
		}

		$this->prefixes = $this->getQueryPrefixes( $rdfVocabulary );
	}

	private function getQueryPrefixes( RdfVocabulary $rdfVocabulary ) {
		// TODO: it would probably be smarter if RdfVocabulary exposed these prefixes somehow
		$prefixes = '';
		foreach ( $rdfVocabulary->entityNamespaceNames as $sourceName => $namespaceName ) {
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
		}
		$prefixes .= <<<END
PREFIX wds: <{$rdfVocabulary->getNamespaceURI( RdfVocabulary::NS_STATEMENT )}>
PREFIX wdv: <{$rdfVocabulary->getNamespaceURI( RdfVocabulary::NS_VALUE )}>\n
END;

		foreach ( $rdfVocabulary->propertyNamespaceNames as $sourceName => $sourceNamespaces ) {
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_DIRECT_CLAIM];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_CLAIM];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_CLAIM_STATEMENT];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_QUALIFIER];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_QUALIFIER_VALUE];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_REFERENCE];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
			$namespaceName = $sourceNamespaces[RdfVocabulary::NSP_REFERENCE_VALUE];
			$prefixes .= <<<END
PREFIX {$namespaceName}: <{$rdfVocabulary->getNamespaceURI( $namespaceName )}>\n
END;
		}
		$prefixes .= <<<END
PREFIX wikibase: <{$rdfVocabulary->getNamespaceURI( RdfVocabulary::NS_ONTOLOGY )}>\n
END;
		return $prefixes;
	}

	/**
	 * @param string $id entity ID serialization of the entity to check
	 * @param string[] $classes entity ID serializations of the expected types
	 *
	 * @return CachedBool
	 * @throws SparqlHelperException if the query times out or some other error occurs
	 */
	public function hasType( $id, array $classes ) {
		$subclassOfId = $this->config->get( 'WBQualityConstraintsSubclassOfId' );
		// TODO hint:gearing is a workaround for T168973 and can hopefully be removed eventually
		$gearingHint = $this->config->get( 'WBQualityConstraintsSparqlHasWikibaseSupport' ) ?
			' hint:Prior hint:gearing "forward".' :
			'';

		$metadatas = [];

		foreach ( array_chunk( $classes, 20 ) as $classesChunk ) {
			$classesValues = implode( ' ', array_map(
				function( $class ) {
					return 'wd:' . $class;
				},
				$classesChunk
			) );

			$query = <<<EOF
ASK {
  BIND(wd:$id AS ?item)
  VALUES ?class { $classesValues }
  ?item wdt:$subclassOfId* ?class.$gearingHint
}
EOF;

			$result = $this->runQuery( $query );
			$metadatas[] = $result->getMetadata();
			if ( $result->getArray()['boolean'] ) {
				return new CachedBool(
					true,
					Metadata::merge( $metadatas )
				);
			}
		}

		return new CachedBool(
			false,
			Metadata::merge( $metadatas )
		);
	}

	/**
	 * @param Statement $statement
	 * @param boolean $ignoreDeprecatedStatements Whether to ignore deprecated statements or not.
	 *
	 * @return CachedEntityIds
	 * @throws SparqlHelperException if the query times out or some other error occurs
	 */
	public function findEntitiesWithSameStatement(
		Statement $statement,
		$ignoreDeprecatedStatements
	) {
		$pid = $statement->getPropertyId()->serialize();
		$guid = str_replace( '$', '-', $statement->getGuid() );

		$deprecatedFilter = '';
		if ( $ignoreDeprecatedStatements ) {
			$deprecatedFilter = 'MINUS { ?otherStatement wikibase:rank wikibase:DeprecatedRank. }';
		}

		$query = <<<EOF
SELECT DISTINCT ?otherEntity WHERE {
  BIND(wds:$guid AS ?statement)
  BIND(p:$pid AS ?p)
  BIND(ps:$pid AS ?ps)
  ?entity ?p ?statement.
  ?statement ?ps ?value.
  ?otherStatement ?ps ?value.
  ?otherEntity ?p ?otherStatement.
  FILTER(?otherEntity != ?entity)
  $deprecatedFilter
}
LIMIT 10
EOF;

		$result = $this->runQuery( $query );

		return $this->getOtherEntities( $result );
	}

	/**
	 * @param EntityId $entityId The entity ID of the containing entity
	 * @param PropertyValueSnak $snak
	 * @param string $type Context::TYPE_QUALIFIER or Context::TYPE_REFERENCE
	 * @param boolean $ignoreDeprecatedStatements Whether to ignore deprecated statements or not.
	 *
	 * @return CachedEntityIds
	 * @throws SparqlHelperException if the query times out or some other error occurs
	 */
	public function findEntitiesWithSameQualifierOrReference(
		EntityId $entityId,
		PropertyValueSnak $snak,
		$type,
		$ignoreDeprecatedStatements
	) {
		$eid = $entityId->getSerialization();
		$pid = $snak->getPropertyId()->getSerialization();
		$prefix = $type === Context::TYPE_QUALIFIER ? 'pq' : 'pr';
		$dataValue = $snak->getDataValue();
		$dataType = $this->propertyDataTypeLookup->getDataTypeIdForProperty(
			$snak->getPropertyId()
		);
		list( $value, $isFullValue ) = $this->getRdfLiteral( $dataType, $dataValue );
		if ( $isFullValue ) {
			$prefix .= 'v';
		}
		$path = $type === Context::TYPE_QUALIFIER ?
			"$prefix:$pid" :
			"prov:wasDerivedFrom/$prefix:$pid";

		$deprecatedFilter = '';
		if ( $ignoreDeprecatedStatements ) {
			$deprecatedFilter = <<< EOF
  MINUS { ?otherStatement wikibase:rank wikibase:DeprecatedRank. }
EOF;
		}

		$query = <<<EOF
SELECT DISTINCT ?otherEntity WHERE {
  BIND(wd:$eid AS ?entity)
  BIND($value AS ?value)
  ?entity ?p ?statement.
  ?statement $path ?value.
  ?otherStatement $path ?value.
  ?otherEntity ?otherP ?otherStatement.
  FILTER(?otherEntity != ?entity)
$deprecatedFilter
}
LIMIT 10
EOF;

		$result = $this->runQuery( $query );

		return $this->getOtherEntities( $result );
	}

	/**
	 * Return SPARQL code for a string literal with $text as content.
	 *
	 * @param string $text
	 *
	 * @return string
	 */
	private function stringLiteral( $text ) {
		return '"' . strtr( $text, [ '"' => '\\"', '\\' => '\\\\' ] ) . '"';
	}

	/**
	 * Extract and parse entity IDs from the ?otherEntity column of a SPARQL query result.
	 *
	 * @param CachedQueryResults $results
	 *
	 * @return CachedEntityIds
	 */
	private function getOtherEntities( CachedQueryResults $results ) {
		return new CachedEntityIds( array_map(
			function ( $resultBindings ) {
				$entityIRI = $resultBindings['otherEntity']['value'];
				foreach ( $this->entityPrefixes as $entityPrefix ) {
					$entityPrefixLength = strlen( $entityPrefix );
					if ( substr( $entityIRI, 0, $entityPrefixLength ) === $entityPrefix ) {
						try {
							return $this->entityIdParser->parse(
								substr( $entityIRI, $entityPrefixLength )
							);
						} catch ( EntityIdParsingException $e ) {
							// fall through
						}
					}
				}

				// no entity prefix matched, or parsing failed for every matching prefix
				return null;
			},
			$results->getArray()['results']['bindings']
		), $results->getMetadata() );
	}

	// @codingStandardsIgnoreStart cyclomatic complexity of this function is too high
	/**
	 * Get an RDF literal or IRI with which the given data value can be matched in a query.
	 *
	 * @param string $dataType
	 * @param DataValue $dataValue
	 *
	 * @return array the literal or IRI as a string in SPARQL syntax,
	 * and a boolean indicating whether it refers to a full value node or not
	 */
	private function getRdfLiteral( $dataType, DataValue $dataValue ) {
		switch ( $dataType ) {
			case 'string':
			case 'external-id':
				return [ $this->stringLiteral( $dataValue->getValue() ), false ];
			case 'commonsMedia':
				$url = $this->rdfVocabulary->getMediaFileURI( $dataValue->getValue() );
				return [ '<' . $url . '>', false ];
			case 'geo-shape':
				$url = $this->rdfVocabulary->getGeoShapeURI( $dataValue->getValue() );
				return [ '<' . $url . '>', false ];
			case 'tabular-data':
				$url = $this->rdfVocabulary->getTabularDataURI( $dataValue->getValue() );
				return [ '<' . $url . '>', false ];
			case 'url':
				$url = $dataValue->getValue();
				if ( !preg_match( '/^[^<>"{}\\\\|^`\\x00-\\x20]*$/D', $url ) ) {
					// not a valid URL for SPARQL (see SPARQL spec, production 139 IRIREF)
					// such a URL should never reach us, so just throw
					throw new InvalidArgumentException( 'invalid URL: ' . $url );
				}
				return [ '<' . $url . '>', false ];
			case 'wikibase-item':
			case 'wikibase-property':
				/** @var EntityIdValue $dataValue */
				return [ 'wd:' . $dataValue->getEntityId()->getSerialization(), false ];
			case 'monolingualtext':
				/** @var MonolingualTextValue $dataValue */
				$lang = $dataValue->getLanguageCode();
				if ( !preg_match( '/^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/D', $lang ) ) {
					// not a valid language tag for SPARQL (see SPARQL spec, production 145 LANGTAG)
					// such a language tag should never reach us, so just throw
					throw new InvalidArgumentException( 'invalid language tag: ' . $lang );
				}
				return [ $this->stringLiteral( $dataValue->getText() ) . '@' . $lang, false ];
			case 'globe-coordinate':
			case 'quantity':
			case 'time':
				return [ 'wdv:' . $dataValue->getHash(), true ];
			default:
				throw new InvalidArgumentException( 'unknown data type: ' . $dataType );
		}
	}
	// @codingStandardsIgnoreEnd

	/**
	 * @param string $text
	 * @param string $regex
	 *
	 * @return boolean
	 * @throws SparqlHelperException if the query times out or some other error occurs
	 * @throws ConstraintParameterException if the $regex is invalid
	 */
	public function matchesRegularExpression( $text, $regex ) {
		// caching wrapper around matchesRegularExpressionWithSparql

		$textHash = hash( 'sha256', $text );
		$cacheKey = $this->cache->makeKey(
			'WikibaseQualityConstraints', // extension
			'regex', // action
			'WDQS-Java', // regex flavor
			hash( 'sha256', $regex )
		);
		$cacheMapSize = $this->config->get( 'WBQualityConstraintsFormatCacheMapSize' );

		$cacheMapArray = $this->cache->getWithSetCallback(
			$cacheKey,
			WANObjectCache::TTL_DAY,
			function( $cacheMapArray ) use ( $text, $regex, $textHash, $cacheMapSize ) {
				// Initialize the cache map if not set
				if ( $cacheMapArray === false ) {
					$key = 'wikibase.quality.constraints.regex.cache.refresh.init';
					$this->dataFactory->increment( $key );
					return [];
				}

				$key = 'wikibase.quality.constraints.regex.cache.refresh';
				$this->dataFactory->increment( $key );
				$cacheMap = MapCacheLRU::newFromArray( $cacheMapArray, $cacheMapSize );
				if ( $cacheMap->has( $textHash ) ) {
					$key = 'wikibase.quality.constraints.regex.cache.refresh.hit';
					$this->dataFactory->increment( $key );
					$cacheMap->get( $textHash ); // ping cache
				} else {
					$key = 'wikibase.quality.constraints.regex.cache.refresh.miss';
					$this->dataFactory->increment( $key );
					try {
						$matches = $this->matchesRegularExpressionWithSparql( $text, $regex );
					} catch ( ConstraintParameterException $e ) {
						$matches = $this->serializeConstraintParameterException( $e );
					} catch ( SparqlHelperException $e ) {
						// don’t cache this
						return $cacheMap->toArray();
					}
					$cacheMap->set(
						$textHash,
						$matches,
						3 / 8
					);
				}

				return $cacheMap->toArray();
			},
			[
				// Once map is > 1 sec old, consider refreshing
				'ageNew' => 1,
				// Update 5 seconds after "ageNew" given a 1 query/sec cache check rate
				'hotTTR' => 5,
				// avoid querying cache servers multiple times in a request
				// (e. g. when checking format of a reference URL used multiple times on an entity)
				'pcTTL' => WANObjectCache::TTL_PROC_LONG,
			]
		);

		if ( isset( $cacheMapArray[$textHash] ) ) {
			$key = 'wikibase.quality.constraints.regex.cache.hit';
			$this->dataFactory->increment( $key );
			$matches = $cacheMapArray[$textHash];
			if ( is_bool( $matches ) ) {
				return $matches;
			} elseif ( is_array( $matches ) &&
				$matches['type'] == ConstraintParameterException::class ) {
				throw $this->deserializeConstraintParameterException( $matches );
			} else {
				throw new MWException(
					'Value of unknown type in object cache (' .
					'cache key: ' . $cacheKey . ', ' .
					'cache map key: ' . $textHash . ', ' .
					'value type: ' . gettype( $matches ) . ')'
				);
			}
		} else {
			$key = 'wikibase.quality.constraints.regex.cache.miss';
			$this->dataFactory->increment( $key );
			return $this->matchesRegularExpressionWithSparql( $text, $regex );
		}
	}

	private function serializeConstraintParameterException( ConstraintParameterException $cpe ) {
		return [
			'type' => ConstraintParameterException::class,
			'violationMessage' => $this->violationMessageSerializer->serialize( $cpe->getViolationMessage() ),
		];
	}

	private function deserializeConstraintParameterException( array $serialization ) {
		$message = $this->violationMessageDeserializer->deserialize(
			$serialization['violationMessage']
		);
		return new ConstraintParameterException( $message );
	}

	/**
	 * This function is only public for testing purposes;
	 * use matchesRegularExpression, which is equivalent but caches results.
	 *
	 * @param string $text
	 * @param string $regex
	 *
	 * @return boolean
	 * @throws SparqlHelperException if the query times out or some other error occurs
	 * @throws ConstraintParameterException if the $regex is invalid
	 */
	public function matchesRegularExpressionWithSparql( $text, $regex ) {
		$textStringLiteral = $this->stringLiteral( $text );
		$regexStringLiteral = $this->stringLiteral( '^(?:' . $regex . ')$' );

		$query = <<<EOF
SELECT (REGEX($textStringLiteral, $regexStringLiteral) AS ?matches) {}
EOF;

		$result = $this->runQuery( $query, false );

		$vars = $result->getArray()['results']['bindings'][0];
		if ( array_key_exists( 'matches', $vars ) ) {
			// true or false ⇒ regex okay, text matches or not
			return $vars['matches']['value'] === 'true';
		} else {
			// empty result: regex broken
			throw new ConstraintParameterException(
				( new ViolationMessage( 'wbqc-violation-message-parameter-regex' ) )
					->withInlineCode( $regex, Role::CONSTRAINT_PARAMETER_VALUE )
			);
		}
	}

	/**
	 * Check whether the text content of an error response indicates a query timeout.
	 *
	 * @param string $responseContent
	 *
	 * @return boolean
	 */
	public function isTimeout( $responseContent ) {
		$timeoutRegex = implode( '|', array_map(
			function ( $fqn ) {
				return preg_quote( $fqn, '/' );
			},
			$this->config->get( 'WBQualityConstraintsSparqlTimeoutExceptionClasses' )
		) );
		return (bool)preg_match( '/' . $timeoutRegex . '/', $responseContent );
	}

	/**
	 * Return the max-age of a cached response,
	 * or a boolean indicating whether the response was cached or not.
	 *
	 * @param array $responseHeaders see MWHttpRequest::getResponseHeaders()
	 *
	 * @return int|boolean the max-age (in seconds)
	 * or a plain boolean if no max-age can be determined
	 */
	public function getCacheMaxAge( $responseHeaders ) {
		if (
			array_key_exists( 'x-cache-status', $responseHeaders ) &&
			preg_match( '/^hit(?:-.*)?$/', $responseHeaders['x-cache-status'][0] )
		) {
			$maxage = [];
			if (
				array_key_exists( 'cache-control', $responseHeaders ) &&
				preg_match( '/\bmax-age=(\d+)\b/', $responseHeaders['cache-control'][0], $maxage )
			) {
				return intval( $maxage[1] );
			} else {
				return true;
			}
		} else {
			return false;
		}
	}

	/**
	 * Get the delay date of a 429 response, which is caused by
	 * throttling of too many SPARQL requests. The header format is defined
	 * in RFC 7231, see: https://tools.ietf.org/html/rfc7231#section-7.1.3
	 *
	 * @param MWHttpRequest $request
	 *
	 * @return int|ConvertibleTimestamp
	 * or SparqlHelper::NO_RETRY_AFTER if there is no Retry-After header
	 * or SparqlHelper::EMPTY_RETRY_AFTER if there is an empty Retry-After
	 * or SparqlHelper::INVALID_RETRY_AFTER if there is something wrong with the format
	 */
	public function getThrottling( MWHttpRequest $request ) {
		$retryAfterValue = $request->getResponseHeader( 'Retry-After' );
		if ( $retryAfterValue === null ) {
			return self::NO_RETRY_AFTER;
		}

		$trimmedRetryAfterValue = trim( $retryAfterValue );
		if ( empty( $trimmedRetryAfterValue ) ) {
			return self::EMPTY_RETRY_AFTER;
		}

		if ( is_numeric( $trimmedRetryAfterValue ) ) {
			$delaySeconds = (int)$trimmedRetryAfterValue;
			if ( $delaySeconds >= 0 ) {
				return $this->getTimestampInFuture( new DateInterval( 'PT' . $delaySeconds . 'S' ) );
			}
		} else {
			$return = strtotime( $trimmedRetryAfterValue );
			if ( !empty( $return ) ) {
				return new ConvertibleTimestamp( $return );
			}
		}
		return self::INVALID_RETRY_AFTER;
	}

	private function getTimestampInFuture( DateInterval $delta ) {
		$now = new ConvertibleTimestamp();
		return new ConvertibleTimestamp( $now->timestamp->add( $delta ) );
	}

	/**
	 * Runs a query against the configured endpoint and returns the results.
	 * TODO: See if Sparql Client in core can be used instead of rolling our own
	 *
	 * @param string $query The query, unencoded (plain string).
	 * @param bool $needsPrefixes Whether the query requires prefixes or they can be omitted.
	 *
	 * @return CachedQueryResults
	 *
	 * @throws SparqlHelperException if the query times out or some other error occurs
	 */
	public function runQuery( $query, $needsPrefixes = true ) {

		if ( $this->throttlingLock->isLocked( self::EXPIRY_LOCK_ID ) ) {
			$this->dataFactory->increment( 'wikibase.quality.constraints.sparql.throttling' );
			throw new TooManySparqlRequestsException();
		}

		$endpoint = $this->config->get( 'WBQualityConstraintsSparqlEndpoint' );
		$maxQueryTimeMillis = $this->config->get( 'WBQualityConstraintsSparqlMaxMillis' );
		$fallbackBlockDuration = (int)$this->config->get( 'WBQualityConstraintsSparqlThrottlingFallbackDuration' );

		if ( $fallbackBlockDuration < 0 ) {
			throw new InvalidArgumentException( 'Fallback duration must be positive int but is: '.
				$fallbackBlockDuration );
		}

		if ( $this->config->get( 'WBQualityConstraintsSparqlHasWikibaseSupport' ) ) {
			$needsPrefixes = false;
		}

		if ( $needsPrefixes ) {
			$query = $this->prefixes . $query;
		}
		$query = "#wbqc\n" . $query;

		$url = $endpoint . '?' . http_build_query(
			[
				'query' => $query,
				'format' => 'json',
				'maxQueryTimeMillis' => $maxQueryTimeMillis,
			],
			null, ini_get( 'arg_separator.output' ),
			// encode spaces with %20, not +
			PHP_QUERY_RFC3986
		);

		$options = [
			'method' => 'GET',
			'timeout' => (int)round( ( $maxQueryTimeMillis + 1000 ) / 1000 ),
			'connectTimeout' => 'default',
			'userAgent' => $this->defaultUserAgent,
		];
		$request = $this->requestFactory->create( $url, $options );
		$startTime = microtime( true );
		$status = $request->execute();
		$endTime = microtime( true );
		$this->dataFactory->timing(
			'wikibase.quality.constraints.sparql.timing',
			( $endTime - $startTime ) * 1000
		);

		if ( $request->getStatus() === self::HTTP_TOO_MANY_REQUESTS ) {
			$this->dataFactory->increment( 'wikibase.quality.constraints.sparql.throttling' );
			$throttlingUntil = $this->getThrottling( $request );
			if ( !( $throttlingUntil instanceof ConvertibleTimestamp ) ) {
Bug introduced by
The class Wikimedia\Timestamp\ConvertibleTimestamp does not exist. Did you forget a USE statement, or did you not list all dependencies?

This error could be the result of:

1. Missing dependencies

PHP Analyzer uses your composer.json file (if available) to determine the dependencies of your project and to determine all the available classes and functions. It expects the composer.json to be in the root folder of your repository.

Are you sure this class is defined by one of your dependencies, or did you maybe not list a dependency in either the require or require-dev section?

2. Missing use statement

PHP does not complain about undefined classes in instanceof checks. For example, the following PHP code will work perfectly fine:

if ($x instanceof DoesNotExist) {
    // Do something.
}

If you have not tested against this specific condition, such errors might go unnoticed.

				$this->loggingHelper->logSparqlHelperTooManyRequestsRetryAfterInvalid( $request );
Bug introduced by
The method logSparqlHelperTooManyRequestsRetryAfterInvalid() does not seem to exist on object<Psr\Log\LoggerInterface>.

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

				$this->throttlingLock->lock(
					self::EXPIRY_LOCK_ID,
					$this->getTimestampInFuture( new DateInterval( 'PT' . $fallbackBlockDuration . 'S' ) )
				);
			} else {
				$this->loggingHelper->logSparqlHelperTooManyRequestsRetryAfterPresent( $throttlingUntil, $request );
Bug introduced by
The method logSparqlHelperTooManyRequestsRetryAfterPresent() does not seem to exist on object<Psr\Log\LoggerInterface>.

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

				$this->throttlingLock->lock( self::EXPIRY_LOCK_ID, $throttlingUntil );
			}
			throw new TooManySparqlRequestsException();
		}

		$maxAge = $this->getCacheMaxAge( $request->getResponseHeaders() );
		if ( $maxAge ) {
			$this->dataFactory->increment( 'wikibase.quality.constraints.sparql.cached' );
		}

		if ( $status->isOK() ) {
			$json = $request->getContent();
			$arr = json_decode( $json, true );
			return new CachedQueryResults(
				$arr,
				Metadata::ofCachingMetadata(
					$maxAge ?
						CachingMetadata::ofMaximumAgeInSeconds( $maxAge ) :
Bug introduced by
It seems like $maxAge defined by $this->getCacheMaxAge($r...->getResponseHeaders()) on line 769 can also be of type boolean; however, WikibaseQuality\Constrai...ofMaximumAgeInSeconds() does only seem to accept integer, maybe add an additional type check?

If a method or function can return multiple different values, and unless you are sure that you can only receive a single value in this context, we recommend adding an additional type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

If this is a common case that PHP Analyzer should handle natively, please let us know by opening an issue.
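The type check suggested above could look like the following sketch, which narrows the int|bool value with is_int() before it reaches CachingMetadata::ofMaximumAgeInSeconds(). getCacheMaxAge() is copied from the class as a free function so the example runs without MediaWiki. maxAgeSeconds() is a hypothetical helper, and mapping true (a cache hit without a max-age) to null is an assumption, since the intended handling of that case is not stated.

```php
<?php

/**
 * Copy of SparqlHelper::getCacheMaxAge() as a free function.
 * @return int|bool max-age in seconds, true if cached without a max-age, false if not cached
 */
function getCacheMaxAge( array $responseHeaders ) {
	if (
		array_key_exists( 'x-cache-status', $responseHeaders ) &&
		preg_match( '/^hit(?:-.*)?$/', $responseHeaders['x-cache-status'][0] )
	) {
		$maxage = [];
		if (
			array_key_exists( 'cache-control', $responseHeaders ) &&
			preg_match( '/\bmax-age=(\d+)\b/', $responseHeaders['cache-control'][0], $maxage )
		) {
			return intval( $maxage[1] );
		}
		return true;
	}
	return false;
}

/**
 * Hypothetical type check: only a real integer is used as a maximum age;
 * true and false both become null (assumption: treat as a fresh response).
 */
function maxAgeSeconds( $maxAge ): ?int {
	return is_int( $maxAge ) ? $maxAge : null;
}

$headers = [ 'x-cache-status' => [ 'hit-front' ], 'cache-control' => [ 'public, max-age=300' ] ];
var_dump( maxAgeSeconds( getCacheMaxAge( $headers ) ) ); // int(300)
```

With such a helper, runQuery() would call ofMaximumAgeInSeconds() only when the narrowed value is not null and fall back to CachingMetadata::fresh() otherwise, while the existing if ( $maxAge ) check could keep counting every cached response.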

						CachingMetadata::fresh()
				)
			);
		} else {
			$this->dataFactory->increment( 'wikibase.quality.constraints.sparql.error' );

			$this->dataFactory->increment(
				"wikibase.quality.constraints.sparql.error.http.{$request->getStatus()}"
			);

			if ( $this->isTimeout( $request->getContent() ) ) {
				$this->dataFactory->increment(
					'wikibase.quality.constraints.sparql.error.timeout'
				);
			}

			throw new SparqlHelperException();
		}
	}

}
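getQueryPrefixes() repeats the same heredoc seven times per property source, which accounts for much of its 48-line size. A loop over the namespace kinds produces the same PREFIX lines in the same order. In this sketch, a plain prefix => URI array stands in for RdfVocabulary::getNamespaceURI(), the string keys in PROPERTY_KINDS stand in for the RdfVocabulary::NSP_* constants, and the example URIs are Wikidata's; all three are assumptions made so that the code runs on its own.

```php
<?php

// Hypothetical stand-ins for the RdfVocabulary::NSP_* constants, in the original order.
const PROPERTY_KINDS = [
	'direct', 'claim', 'statement', 'qualifier', 'qualifierValue', 'reference', 'referenceValue',
];

/**
 * @param string[] $entityNames entity namespace prefixes, e.g. [ 'wd' ]
 * @param array[] $propertySources one kind => prefix map per property source
 * @param string[] $uris prefix => namespace URI
 */
function buildQueryPrefixes( array $entityNames, array $propertySources, array $uris ): string {
	// Same order as getQueryPrefixes(): entities, wds/wdv, property namespaces, wikibase.
	$names = array_merge( $entityNames, [ 'wds', 'wdv' ] );
	foreach ( $propertySources as $kindToPrefix ) {
		foreach ( PROPERTY_KINDS as $kind ) {
			$names[] = $kindToPrefix[$kind];
		}
	}
	$names[] = 'wikibase';

	$prefixes = '';
	foreach ( $names as $name ) {
		$prefixes .= "PREFIX $name: <{$uris[$name]}>\n";
	}
	return $prefixes;
}

$uris = [
	'wd' => 'http://www.wikidata.org/entity/',
	'wds' => 'http://www.wikidata.org/entity/statement/',
	'wdv' => 'http://www.wikidata.org/value/',
	'wdt' => 'http://www.wikidata.org/prop/direct/',
	'p' => 'http://www.wikidata.org/prop/',
	'ps' => 'http://www.wikidata.org/prop/statement/',
	'pq' => 'http://www.wikidata.org/prop/qualifier/',
	'pqv' => 'http://www.wikidata.org/prop/qualifier/value/',
	'pr' => 'http://www.wikidata.org/prop/reference/',
	'prv' => 'http://www.wikidata.org/prop/reference/value/',
	'wikibase' => 'http://wikiba.se/ontology#',
];
$sources = [ [
	'direct' => 'wdt', 'claim' => 'p', 'statement' => 'ps', 'qualifier' => 'pq',
	'qualifierValue' => 'pqv', 'reference' => 'pr', 'referenceValue' => 'prv',
] ];
$prefixes = buildQueryPrefixes( [ 'wd' ], $sources, $uris );
echo $prefixes;
```

Adding a namespace kind then means extending one list rather than copying another heredoc.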
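stringLiteral() escapes only the double quote and the backslash. The SPARQL 1.1 grammar (production STRING_LITERAL2) also forbids unescaped line feeds and carriage returns inside a double-quoted literal, so a $text containing a line break would produce a malformed query. The sketch below, a hypothetical free function sparqlStringLiteral(), additionally escapes those two characters with the SPARQL escapes \n and \r; whether such text can actually reach SparqlHelper is not established here.

```php
<?php

/**
 * Hypothetical variant of SparqlHelper::stringLiteral() that also escapes
 * line breaks, which SPARQL does not allow raw inside a "..." literal.
 */
function sparqlStringLiteral( string $text ): string {
	// strtr() with an array never rescans replaced text, so the backslash rule
	// cannot double-escape the backslashes introduced by the other rules.
	return '"' . strtr( $text, [
		'\\' => '\\\\',
		'"' => '\\"',
		"\n" => '\\n',
		"\r" => '\\r',
	] ) . '"';
}

echo sparqlStringLiteral( "line one\nline \"two\"" ), "\n"; // prints: "line one\nline \"two\""
```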