Passed
Push — master ( 73c322...402528 )
by Darko
12:47
created

AbstractAdultProviderPipe   F

Complexity

Total Complexity 124

Size/Duplication

Total Lines 845
Duplicated Lines 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric Value
wmc 124
eloc 357
c 1
b 0
f 0
dl 0
loc 845
rs 2

31 Methods

Rating   Name   Duplication   Size   Complexity  
A getHttpClient() 0 22 2
A extractSynopsis() 0 3 1
A cleanTitleForComparison() 0 19 2
A isErrorPage() 0 19 3
A shouldSkip() 0 3 1
A extractCast() 0 3 1
A applyRateLimit() 0 13 3
A extractProductInfo() 0 3 1
A __construct() 0 2 1
F handleAgeVerification() 0 99 20
A setEchoOutput() 0 4 1
A getHtmlParser() 0 6 2
D extractAgeVerificationFormData() 0 56 18
A handle() 0 47 5
A calculateSimilarity() 0 20 2
A outputNotFound() 0 7 2
F fetchHtml() 0 114 23
A extractOpenGraph() 0 20 4
A getCachedSearch() 0 17 4
A cacheSearchResult() 0 8 2
A getAgeVerificationManager() 0 6 2
B requiresAgeVerification() 0 59 7
A getPriority() 0 3 1
A extractGenres() 0 3 1
A extractCovers() 0 3 1
A getDefaultHeaders() 0 14 1
B extractJsonLd() 0 17 7
A getRandomUserAgent() 0 12 1
A extractTrailers() 0 3 1
A outputMatch() 0 7 2
A getColorCli() 0 6 2

How to fix   Complexity   

Complex Class

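Complex classes like AbstractAdultProviderPipe often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes. In this class, for instance, handleAgeVerification(), requiresAgeVerification(), extractAgeVerificationFormData(), and getAgeVerificationManager() all revolve around age verification.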

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use AbstractAdultProviderPipe, and based on these observations, apply Extract Interface, too.
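
As a rough illustration of Extract Class, the age-gate detection could move into its own small class. The sketch below is an assumption, not the project's code: the class name AgeGateDetector, its constructor parameters, and the shortened pattern list are hypothetical, and the content-indicator and very-short-page checks of requiresAgeVerification() are left out for brevity.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extracted component: decides whether an HTML body looks like an age gate.
 * The class name and the trimmed pattern list are illustrative assumptions.
 */
final class AgeGateDetector
{
    /**
     * @param string[] $agePatterns    Case-insensitive substrings that hint at an age gate.
     * @param int      $shortPageBytes Pages shorter than this need only one matching pattern.
     */
    public function __construct(
        private array $agePatterns = ['age verification', 'are you 18', 'age-gate', 'verify your age'],
        private int $shortPageBytes = 10000,
    ) {
    }

    public function looksLikeAgeGate(string $html): bool
    {
        $matches = 0;
        foreach ($this->agePatterns as $pattern) {
            if (stripos($html, $pattern) !== false) {
                $matches++;
            }
        }

        // One hit on a short page, or two or more hits on any page, counts as an age gate.
        return ($matches >= 1 && strlen($html) < $this->shortPageBytes) || $matches >= 2;
    }
}

// Usage: the pipe would hold a detector and delegate to it.
$detector = new AgeGateDetector();
$isGate = $detector->looksLikeAgeGate('<html><body>Please verify your age</body></html>');
```

The pipe would then keep only a reference to the detector, which removes one group of responsibilities from the base class and makes the detection rules testable on their own.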

<?php

namespace App\Services\AdultProcessing\Pipes;

use App\Services\AdultProcessing\AdultProcessingPassable;
use App\Services\AdultProcessing\AdultProcessingResult;
use App\Services\AdultProcessing\AgeVerificationManager;
use Blacklight\ColorCLI;
use Closure;
use GuzzleHttp\Client;
use GuzzleHttp\Cookie\CookieJar;
use GuzzleHttp\Cookie\FileCookieJar;
use GuzzleHttp\Cookie\SetCookie;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\RequestException;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Log;
use voku\helper\HtmlDomParser;
Bug: The type voku\helper\HtmlDomParser was not found. Maybe you did not declare it correctly or list all dependencies?

The issue could also be caused by a filter entry in the build configuration. If the path has been excluded in your configuration, e.g. excluded_paths: ["lib/*"], you can move it to the dependency path list as follows:

filter:
    dependency_paths: ["lib/*"]

For further information see https://scrutinizer-ci.com/docs/tools/php/php-scrutinizer/#list-dependency-paths


/**
 * Base class for adult movie processing pipe handlers.
 *
 * Each pipe is responsible for processing releases through a specific adult site provider.
 *
 * Note: This class intentionally uses lazy loading for HtmlDomParser to avoid
 * serialization issues with DOMDocument when using Laravel's Concurrency facade.
 */
abstract class AbstractAdultProviderPipe
{
    protected int $priority = 50;
    protected bool $echoOutput = true;
    protected ?ColorCLI $colorCli = null;
    protected ?HtmlDomParser $html = null;
    protected ?string $cookie = null;

    /**
     * Minimum similarity threshold for matching (percentage).
     */
    protected float $minimumSimilarity = 90.0;

    /**
     * HTTP client for making requests.
     */
    protected ?Client $httpClient = null;

    /**
     * Cookie jar for maintaining session cookies.
     */
    protected CookieJar|FileCookieJar|null $cookieJar = null;

    /**
     * Age verification manager for handling site-specific cookies.
     */
    protected ?AgeVerificationManager $ageVerificationManager = null;

    /**
     * Maximum number of retry attempts for failed requests.
     */
    protected int $maxRetries = 3;

    /**
     * Delay between retries in milliseconds.
     */
    protected int $retryDelay = 1000;

    /**
     * Rate limit delay between requests in milliseconds.
     */
    protected int $rateLimitDelay = 500;

    /**
     * Last request timestamp for rate limiting.
     */
    protected static array $lastRequestTime = [];

    /**
     * Cache duration for search results in minutes.
     */
    protected int $cacheDuration = 60;

    /**
     * Whether to use caching for this provider.
     */
    protected bool $useCache = true;

    public function __construct()
    {
        // Lazy load ColorCLI and HtmlDomParser to avoid serialization issues
    }

    /**
     * Get the ColorCLI instance (lazy loaded).
     */
    protected function getColorCli(): ColorCLI
    {
        if ($this->colorCli === null) {
            $this->colorCli = new ColorCLI();
        }
        return $this->colorCli;
Bug / Best Practice: The expression return $this->colorCli could return the type null which is incompatible with the type-hinted return Blacklight\ColorCLI. Consider adding an additional type-check to rule them out.
    }
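
The note above concerns a nullable property behind a non-nullable return type. One possible way to make the non-null guarantee explicit is the null coalescing assignment operator (PHP 7.4+), sketched here with stand-in classes. ConsoleWriter and LazyHolder are hypothetical names, and whether this form satisfies the analyser is an assumption the report does not confirm.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a collaborator such as ColorCLI.
final class ConsoleWriter
{
    public function info(string $message): string
    {
        return '[info] ' . $message;
    }
}

final class LazyHolder
{
    private ?ConsoleWriter $writer = null;

    public function getWriter(): ConsoleWriter
    {
        // ??= assigns only when the property is null and evaluates to the
        // resulting non-null instance, so the returned value is never null.
        return $this->writer ??= new ConsoleWriter();
    }
}

$holder = new LazyHolder();
$first = $holder->getWriter();
$second = $holder->getWriter(); // same instance as $first; nothing is re-created
```

The same one-line pattern would apply to the other lazy getters in this class that carry the same note.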

    /**
     * Get the HtmlDomParser instance (lazy loaded).
     */
    protected function getHtmlParser(): HtmlDomParser
    {
        if ($this->html === null) {
            $this->html = new HtmlDomParser();
        }
        return $this->html;
Bug / Best Practice: The expression return $this->html could return the type null which is incompatible with the type-hinted return voku\helper\HtmlDomParser. Consider adding an additional type-check to rule them out.
    }

    /**
     * Handle the adult movie processing request.
     */
    public function handle(AdultProcessingPassable $passable, Closure $next): AdultProcessingPassable
    {
        // If we already have a match, skip processing
        if ($passable->shouldStopProcessing()) {
            return $next($passable);
        }

        // Set the cookie from passable
        $this->cookie = $passable->getCookie();

        // Skip if this provider shouldn't process
        if ($this->shouldSkip($passable)) {
            $passable->updateResult(
                AdultProcessingResult::skipped('Provider skipped', $this->getName()),
                $this->getName()
            );
            return $next($passable);
        }

        // Output processing message
        if ($this->echoOutput) {
            $this->getColorCli()->info('Checking '.$this->getDisplayName().' for movie info');
        }

        try {
            // Apply rate limiting
            $this->applyRateLimit();

            // Attempt to process with this provider
            $result = $this->process($passable);

            // Update the result
            $passable->updateResult($result, $this->getName());
        } catch (\Exception $e) {
            Log::error('Adult provider '.$this->getName().' failed: '.$e->getMessage(), [
                'provider' => $this->getName(),
                'title' => $passable->getCleanTitle(),
                'exception' => get_class($e),
            ]);

            $passable->updateResult(
                AdultProcessingResult::failed($e->getMessage(), $this->getName()),
                $this->getName()
            );
        }

        return $next($passable);
    }

    /**
     * Apply rate limiting between requests to the same provider.
     */
    protected function applyRateLimit(): void
    {
        $providerName = $this->getName();
        $now = microtime(true) * 1000;

        if (isset(self::$lastRequestTime[$providerName])) {
            $elapsed = $now - self::$lastRequestTime[$providerName];
            if ($elapsed < $this->rateLimitDelay) {
                usleep((int)(($this->rateLimitDelay - $elapsed) * 1000));
            }
        }

        self::$lastRequestTime[$providerName] = microtime(true) * 1000;
    }

    /**
     * Get the priority of this provider (lower = higher priority).
     */
    public function getPriority(): int
    {
        return $this->priority;
    }

    /**
     * Get the internal name of this provider.
     */
    abstract public function getName(): string;

    /**
     * Get the display name for user-facing output.
     */
    abstract public function getDisplayName(): string;

    /**
     * Get the base URL for the provider.
     */
    abstract protected function getBaseUrl(): string;

    /**
     * Attempt to process the movie through this provider.
     */
    abstract protected function process(AdultProcessingPassable $passable): AdultProcessingResult;

    /**
     * Search for a movie on this provider.
     *
     * @return array|false Returns array with 'title' and 'url' keys on success, false on failure
     */
    abstract protected function search(string $movie): array|false;

    /**
     * Get all movie information from the provider.
     */
    abstract protected function getMovieInfo(): array|false;

    /**
     * Check if this provider should be skipped for the given passable.
     */
    protected function shouldSkip(AdultProcessingPassable $passable): bool
    {
        return empty($passable->getCleanTitle());
    }

    /**
     * Set echo output flag.
     */
    public function setEchoOutput(bool $echo): self
    {
        $this->echoOutput = $echo;
        return $this;
    }

    /**
     * Get cached search result if available.
     */
    protected function getCachedSearch(string $movie): array|false|null
    {
        if (!$this->useCache) {
            return null;
        }

        $cacheKey = 'adult_search_' . $this->getName() . '_' . md5(strtolower($movie));
        $cached = Cache::get($cacheKey);

        if ($cached !== null) {
            if ($this->echoOutput) {
                $this->getColorCli()->info('Using cached result for: ' . $movie);
            }
            return $cached;
        }

        return null;
    }

    /**
     * Cache a search result.
     */
    protected function cacheSearchResult(string $movie, array|false $result): void
    {
        if (!$this->useCache) {
            return;
        }

        $cacheKey = 'adult_search_' . $this->getName() . '_' . md5(strtolower($movie));
        Cache::put($cacheKey, $result, now()->addMinutes($this->cacheDuration));
    }

    /**
     * Fetch raw HTML from a URL with retry support.
     */
    protected function fetchHtml(string $url, ?string $cookie = null, ?array $postData = null): string|false
    {
        $attempt = 0;
        $lastException = null;
        $ageVerificationAttempted = false;

        while ($attempt < $this->maxRetries) {
            try {
                $attempt++;
                $client = $this->getHttpClient();

                $options = [
                    'headers' => $this->getDefaultHeaders(),
                ];

                // Add custom cookie if provided
                if ($cookie) {
                    $options['headers']['Cookie'] = $cookie;
                }

                // Handle POST data
                if ($postData !== null) {
                    $options['form_params'] = $postData;
                    $response = $client->post($url, $options);
                } else {
                    $response = $client->get($url, $options);
                }

                $body = $response->getBody()->getContents();

                // Check if we were redirected to an age verification page
                $finalUrl = $response->getHeaderLine('X-Guzzle-Redirect-History');
                if (empty($finalUrl)) {
                    // Use the effective URI if available
                    $effectiveUri = $response->getHeader('X-Guzzle-Redirect-History');
                    if (!empty($effectiveUri)) {
                        $finalUrl = end($effectiveUri);
Unused Code: The assignment to $finalUrl is dead and can be removed.
                    }
                }

                // Check for common error pages
                if ($this->isErrorPage($body)) {
                    Log::warning('Received error page from ' . $this->getName() . ': ' . $url);
                    if ($attempt < $this->maxRetries) {
                        usleep($this->retryDelay * 1000);
                        continue;
                    }
                    return false;
                }

                // Check for age verification requirement
                if ($this->requiresAgeVerification($body)) {
                    // If we haven't tried age verification yet, refresh cookies and retry
                    if (!$ageVerificationAttempted) {
                        $ageVerificationAttempted = true;

                        // Refresh cookies using the manager
                        $this->getAgeVerificationManager()->refreshCookies($this->getBaseUrl());

                        // Reset HTTP client to pick up new cookies
                        $this->httpClient = null;
                        $this->cookieJar = null;

                        Log::info('Refreshed age verification cookies for ' . $this->getName() . ', retrying...');
                        continue;
                    }

                    $body = $this->handleAgeVerification($url, $body);
                    if ($body === false) {
                        return false;
                    }
                }

                return $body;

            } catch (ConnectException $e) {
                $lastException = $e;
                Log::warning('Connection failed for ' . $this->getName() . ' (attempt ' . $attempt . '): ' . $e->getMessage());

                if ($attempt < $this->maxRetries) {
                    usleep($this->retryDelay * 1000 * $attempt); // Linear backoff: the delay grows with the attempt number
                }
            } catch (RequestException $e) {
                $lastException = $e;
                $statusCode = $e->hasResponse() ? $e->getResponse()->getStatusCode() : 0;

                // Don't retry on 4xx client errors (except 429 rate limit)
                if ($statusCode >= 400 && $statusCode < 500 && $statusCode !== 429) {
                    Log::error('HTTP ' . $statusCode . ' for ' . $this->getName() . ': ' . $url);
                    return false;
                }

                Log::warning('Request failed for ' . $this->getName() . ' (attempt ' . $attempt . '): ' . $e->getMessage());

                if ($attempt < $this->maxRetries) {
                    // Longer delay for rate limit errors
                    $delay = $statusCode === 429 ? $this->retryDelay * 5 : $this->retryDelay * $attempt;
                    usleep($delay * 1000);
                }
            } catch (\Exception $e) {
                $lastException = $e;
                Log::error('Unexpected error for ' . $this->getName() . ': ' . $e->getMessage());

                if ($attempt < $this->maxRetries) {
                    usleep($this->retryDelay * 1000);
                }
            }
        }

        if ($lastException) {
            Log::error('All retry attempts failed for ' . $this->getName() . ': ' . $lastException->getMessage());
        }

        return false;
    }
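
In fetchHtml() the delay after a failed connection is retryDelay multiplied by the attempt number, so it grows linearly. If exponential backoff is what is wanted, a helper along the following lines could compute the delay. This is only a sketch: the function name backoffDelayMs, the 30-second cap, and the absence of jitter are assumptions, not part of the reviewed class.

```php
<?php

declare(strict_types=1);

/**
 * Delay in milliseconds before retry number $attempt (1-based):
 * base * 2^(attempt - 1), capped at $maxDelayMs.
 */
function backoffDelayMs(int $attempt, int $baseDelayMs = 1000, int $maxDelayMs = 30000): int
{
    if ($attempt < 1) {
        throw new InvalidArgumentException('Attempt numbers start at 1.');
    }

    // Clamp the exponent so the product stays well inside integer range.
    $exponent = min($attempt - 1, 20);

    return min($baseDelayMs * (2 ** $exponent), $maxDelayMs);
}

foreach ([1, 2, 3, 4] as $attempt) {
    printf("attempt %d: wait %d ms\n", $attempt, backoffDelayMs($attempt));
}
```

Inside the retry loop this would replace the multiplication, for example usleep(backoffDelayMs($attempt, $this->retryDelay) * 1000).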

    /**
     * Get default HTTP headers.
     */
    protected function getDefaultHeaders(): array
    {
        return [
            'User-Agent' => $this->getRandomUserAgent(),
            'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
            'Accept-Language' => 'en-US,en;q=0.9',
            'Accept-Encoding' => 'gzip, deflate, br',
            'Cache-Control' => 'no-cache',
            'Pragma' => 'no-cache',
            'Sec-Fetch-Dest' => 'document',
            'Sec-Fetch-Mode' => 'navigate',
            'Sec-Fetch-Site' => 'none',
            'Sec-Fetch-User' => '?1',
            'Upgrade-Insecure-Requests' => '1',
        ];
    }

    /**
     * Get a random user agent string.
     */
    protected function getRandomUserAgent(): string
    {
        $userAgents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0',
        ];

        return $userAgents[array_rand($userAgents)];
    }

    /**
     * Check if the response is an error page.
     */
    protected function isErrorPage(string $html): bool
    {
        $errorPatterns = [
            'Access Denied',
            'Service Unavailable',
            '503 Service',
            '502 Bad Gateway',
            'temporarily unavailable',
            'maintenance mode',
            'rate limit exceeded',
        ];

        foreach ($errorPatterns as $pattern) {
            if (stripos($html, $pattern) !== false) {
                return true;
            }
        }

        return false;
    }

    /**
     * Check if the page requires age verification.
     */
    protected function requiresAgeVerification(string $html): bool
    {
        // First check if this looks like a proper content page
        // Content pages have actual movie info, cast, etc.
        $contentIndicators = [
            '<title>.*?DVD.*?<\/title>', // "/" is escaped because it is the regex delimiter used below
            'product-info',
            'movie-details',
            'cast-list',
            'genre-list',
            '"@type":\s*"Movie"',
            '"@type":\s*"Product"',
        ];

        foreach ($contentIndicators as $pattern) {
            if (preg_match('/' . $pattern . '/is', $html)) {
                return false; // This is a content page, not an age verification page
            }
        }

        // Check for short page that might just be a redirect/age gate
        if (strlen($html) < 500) {
            return true; // Very short response likely means we got redirected
        }

        // Now check for explicit age verification indicators
        $agePatterns = [
            'age verification',
            'are you 18',
            'are you over 18',
            'confirm your age',
            'enter your age',
            'must be 18',
            'age-gate',
            'ageGate',
            'AgeConfirmation', // PopPorn specific
            'ageConfirmationButton', // ADE specific
            'age-confirmation', // Generic
            'verify your age',
            'adult content warning',
            'I am 18 or older',
            'I am over 18',
            'this site contains adult',
        ];

        // Count matching patterns: one match on a short page, or several on any page, indicates an age gate
        $matchCount = 0;
        foreach ($agePatterns as $pattern) {
            if (stripos($html, $pattern) !== false) {
                $matchCount++;
                // If the page is relatively short and has an age pattern, it's probably an age gate
                if (strlen($html) < 10000) {
                    return true;
                }
            }
        }

        // If multiple patterns match, it's likely an age verification page
        return $matchCount >= 2;
    }

    /**
     * Handle age verification requirement.
     */
    protected function handleAgeVerification(string $url, string $html): string|false
    {
        // First, try to use site-specific cookies from the AgeVerificationManager
        $manager = $this->getAgeVerificationManager();
Unused Code: The assignment to $manager is dead and can be removed.
        $domain = parse_url($this->getBaseUrl(), PHP_URL_HOST);
        $domain = preg_replace('/^www\./', '', $domain);

        // Re-initialize cookies from the manager and retry
        if ($this->cookieJar) {
            // The manager already handles setting cookies, but let's ensure they're fresh
            Log::info('Attempting to handle age verification for ' . $this->getName() . ' with domain: ' . $domain);
        }

        // Try to find and submit age verification form
        $this->getHtmlParser()->loadHtml($html);

        // Look for common age verification form patterns
        $forms = $this->getHtmlParser()->find('form');
        foreach ($forms as $form) {
            $action = $form->action ?? '';
            $method = strtoupper($form->method ?? 'GET');

            // Check if this looks like an age verification form
            $formHtml = $form->innerHtml ?? '';
            if (stripos($formHtml, 'age') !== false || stripos($formHtml, '18') !== false ||
                stripos($formHtml, 'adult') !== false || stripos($formHtml, 'enter') !== false ||
                stripos($formHtml, 'confirm') !== false) {
                // Try to submit the form with age confirmation
                $postData = $this->extractAgeVerificationFormData($form);

                if (!empty($postData)) {
                    $submitUrl = $action;
                    if (!str_starts_with($submitUrl, 'http')) {
                        $submitUrl = $this->getBaseUrl() . '/' . ltrim($submitUrl, '/');
                    }

                    // Submit the age verification
                    try {
                        $response = $this->getHttpClient()->request($method, $submitUrl, [
                            'form_params' => $postData,
                            'headers' => $this->getDefaultHeaders(),
                        ]);

                        $body = $response->getBody()->getContents();

                        // Check if we still get age verification after submit
                        if (!$this->requiresAgeVerification($body)) {
                            return $body;
                        }
                    } catch (\Exception $e) {
                        Log::warning('Age verification submission failed for ' . $this->getName() . ': ' . $e->getMessage());
                    }
                }
            }
        }

        // Look for JavaScript-based age verification (click to enter)
        if (preg_match('/onclick\s*=\s*["\'].*?(enter|agree|confirm|over18|adult).*?["\']/i', $html) ||
            preg_match('/<a[^>]*href\s*=\s*["\']([^"\']*)["\'][^>]*>(Enter|I am over 18|Agree|Enter Site|I Agree)/i', $html, $matches)) {
            // Try to follow the link or simulate the click
            if (!empty($matches[1])) {
                $enterUrl = $matches[1];
                if (!str_starts_with($enterUrl, 'http')) {
                    $enterUrl = $this->getBaseUrl() . '/' . ltrim($enterUrl, '/');
                }

                try {
                    $response = $this->getHttpClient()->get($enterUrl, [
                        'headers' => $this->getDefaultHeaders(),
                    ]);
                    $body = $response->getBody()->getContents();

                    if (!$this->requiresAgeVerification($body)) {
                        return $body;
                    }
                } catch (\Exception $e) {
                    Log::warning('Age verification link follow failed for ' . $this->getName() . ': ' . $e->getMessage());
                }
            }
        }

        // If all else fails, try to just refetch the original URL
        // (sometimes the cookies from previous attempts work)
        try {
            $response = $this->getHttpClient()->get($url, [
                'headers' => $this->getDefaultHeaders(),
            ]);
            $body = $response->getBody()->getContents();

            if (!$this->requiresAgeVerification($body)) {
                return $body;
            }
        } catch (\Exception $e) {
            Log::warning('Age verification retry failed for ' . $this->getName() . ': ' . $e->getMessage());
        }

        // If we couldn't handle age verification, log and return false
        Log::warning('Could not handle age verification for ' . $this->getName() . ': ' . $url);
        return false;
    }

    /**
     * Extract form data for age verification submission.
     */
    protected function extractAgeVerificationFormData($form): array
    {
        $data = [];

        // Get all input fields
        foreach ($form->find('input') as $input) {
            $name = $input->name ?? '';
            $type = strtolower($input->type ?? 'text');
            $value = $input->value ?? '';

            if (empty($name)) {
                continue;
            }

            // Handle different input types
            switch ($type) {
                case 'hidden':
                    $data[$name] = $value;
                    break;
                case 'checkbox':
                    // Usually age verification checkboxes need to be checked
                    if (stripos($name, 'age') !== false || stripos($name, 'agree') !== false || stripos($name, 'confirm') !== false) {
                        $data[$name] = $value ?: '1';
                    }
                    break;
                case 'submit':
                    // Include submit button value if it has a name
                    if (!empty($value)) {
                        $data[$name] = $value;
                    }
                    break;
                default:
                    // For text inputs that might be age/birthdate
                    if (stripos($name, 'age') !== false || stripos($name, 'year') !== false) {
                        $data[$name] = '1990'; // Default to a valid birth year
                    }
            }
        }

        // Handle select elements (for birthdate selection)
        foreach ($form->find('select') as $select) {
            $name = $select->name ?? '';
            if (empty($name)) {
                continue;
            }

            if (stripos($name, 'year') !== false) {
                $data[$name] = '1990';
            } elseif (stripos($name, 'month') !== false) {
                $data[$name] = '01';
            } elseif (stripos($name, 'day') !== false) {
                $data[$name] = '01';
            }
        }

        return $data;
    }

    /**
     * Get the age verification manager instance.
     */
    protected function getAgeVerificationManager(): AgeVerificationManager
    {
        if ($this->ageVerificationManager === null) {
            $this->ageVerificationManager = new AgeVerificationManager();
        }
        return $this->ageVerificationManager;
Bug / Best Practice: The expression return $this->ageVerificationManager could return the type null which is incompatible with the type-hinted return App\Services\AdultProces...\AgeVerificationManager. Consider adding an additional type-check to rule them out.
    }

    /**
     * Get or create the HTTP client.
     *
     * No retry middleware is attached here; retries are handled in fetchHtml().
     */
    protected function getHttpClient(): Client
    {
        if ($this->httpClient === null) {
            // Use the AgeVerificationManager to get proper cookie jar with age verification cookies
            $this->cookieJar = $this->getAgeVerificationManager()->getCookieJar($this->getBaseUrl());

            $this->httpClient = new Client([
                'timeout' => 30,
                'connect_timeout' => 15,
                'verify' => false, // Note: this disables TLS certificate verification
                'cookies' => $this->cookieJar,
                'allow_redirects' => [
                    'max' => 5,
                    'strict' => false,
                    'referer' => true,
                    'track_redirects' => true,
                ],
                'http_errors' => true,
            ]);
        }

        return $this->httpClient;
Bug / Best Practice: The expression return $this->httpClient could return the type null which is incompatible with the type-hinted return GuzzleHttp\Client. Consider adding an additional type-check to rule them out.
    }

    /**
     * Calculate similarity between two strings using multiple algorithms.
     */
    protected function calculateSimilarity(string $searchTerm, string $resultTitle): float
    {
        // Clean up both strings for comparison
        $cleanSearch = $this->cleanTitleForComparison($searchTerm);
        $cleanResult = $this->cleanTitleForComparison($resultTitle);

        // Calculate similarity using multiple methods
        similar_text($cleanSearch, $cleanResult, $similarTextPercent);

        // Also calculate Levenshtein distance based similarity
        $maxLen = max(strlen($cleanSearch), strlen($cleanResult));
        if ($maxLen > 0) {
            $levenshtein = levenshtein($cleanSearch, $cleanResult);
            $levenshteinPercent = (1 - ($levenshtein / $maxLen)) * 100;
        } else {
            $levenshteinPercent = 0;
        }

        // Use the higher of the two similarity scores
        return max($similarTextPercent, $levenshteinPercent);
    }
750
751
    /**
     * Clean a title for comparison purposes.
     */
    protected function cleanTitleForComparison(string $title): string
    {
        $title = strtolower($title);
        // The title is already lowercase here, so match the lowercase form of "/XXX/"
        $title = str_replace('/xxx/', '', $title);

        // Replace common release tags, bracketed text and separators with a space,
        // then collapse runs of whitespace
        $removePatterns = [
            '/\b(xxx|adult|porn|erotic|hd|4k|1080p|720p|dvdrip|webrip|bluray)\b/i',
            '/\(.*?\)/',
            '/\[.*?\]/',
            '/[._-]+/',
            '/\s+/',
        ];

        foreach ($removePatterns as $pattern) {
            // preg_replace() returns null on error; keep the previous value in that case
            $title = preg_replace($pattern, ' ', $title) ?? $title;
        }

        return trim($title);
    }

    /**
     * Extract movie information from the loaded HTML.
     *
     * The default implementations below return an empty array.
     */
    protected function extractCovers(): array
    {
        return [];
    }

    protected function extractSynopsis(): array
    {
        return [];
    }

    protected function extractCast(): array
    {
        return [];
    }

    protected function extractGenres(): array
    {
        return [];
    }

    protected function extractProductInfo(bool $extras = false): array
Unused Code: The parameter $extras is not used and could be removed. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-unused annotation:

    protected function extractProductInfo(/** @scrutinizer ignore-unused */ bool $extras = false): array

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
    {
        return [];
    }

    protected function extractTrailers(): array
    {
        return [];
    }

    /**
     * Output match success message.
     */
    protected function outputMatch(string $title): void
    {
        if (! $this->echoOutput) {
            return;
        }

        $this->getColorCli()->primary('Found match on '.$this->getDisplayName().': '.$title);
    }

    /**
     * Output failure message.
     */
    protected function outputNotFound(): void
    {
        if (! $this->echoOutput) {
            return;
        }

        $this->getColorCli()->notice('No match found on '.$this->getDisplayName());
    }

    /**
     * Parse JSON-LD structured data from HTML.
     */
    protected function extractJsonLd(string $html): ?array
    {
        if (preg_match_all('/<script[^>]*type=["\']application\/ld\+json["\'][^>]*>(.*?)<\/script>/si', $html, $matches)) {
            foreach ($matches[1] as $json) {
                $data = json_decode(trim($json), true);
                if (json_last_error() === JSON_ERROR_NONE && is_array($data)) {
                    // Handle both single object and array of objects
                    if (isset($data['@type'])) {
                        return $data;
                    } elseif (isset($data[0]['@type'])) {
                        return $data[0];
                    }
                }
            }
        }

        return null;
    }

    /**
     * Extract Open Graph meta data from HTML.
     */
    protected function extractOpenGraph(string $html): array
    {
        $og = [];
        $this->getHtmlParser()->loadHtml($html);

        $metaTags = [
            'og:title' => 'title',
            'og:description' => 'description',
            'og:image' => 'image',
            'og:url' => 'url',
        ];

        foreach ($metaTags as $property => $key) {
            $meta = $this->getHtmlParser()->findOne('meta[property="'.$property.'"]');
            if ($meta && isset($meta->content)) {
                $og[$key] = trim($meta->content);
            }
        }

        return $og;
    }
}
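
The title matching in calculateSimilarity() and cleanTitleForComparison() can be tried outside the class. The following is a minimal standalone sketch, not the class's actual code: the function names cleanTitle() and titleSimilarity() are hypothetical, and the tag list is shortened for illustration. It assumes PHP 8, where levenshtein() has no string length limit.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: normalise a title before comparing it.
 * Lowercases, then replaces release tags, bracketed text and
 * separators with a space and collapses whitespace.
 */
function cleanTitle(string $title): string
{
    $title = strtolower($title);

    $patterns = [
        '/\b(xxx|hd|4k|1080p|720p|dvdrip|webrip|bluray)\b/', // release tags (shortened list)
        '/\(.*?\)/',                                         // text in parentheses
        '/\[.*?\]/',                                         // text in square brackets
        '/[._-]+/',                                          // dot, underscore, dash separators
        '/\s+/',                                             // collapse whitespace last
    ];

    foreach ($patterns as $pattern) {
        // preg_replace() returns null on error; keep the previous value then
        $title = preg_replace($pattern, ' ', $title) ?? $title;
    }

    return trim($title);
}

/**
 * Hypothetical helper: similarity of two titles as a percentage (0-100),
 * taking the higher of similar_text() and a Levenshtein-based score.
 */
function titleSimilarity(string $a, string $b): float
{
    $a = cleanTitle($a);
    $b = cleanTitle($b);

    // similar_text() writes its percentage into the third argument
    similar_text($a, $b, $similarTextPercent);

    $maxLen = max(strlen($a), strlen($b));
    $levenshteinPercent = $maxLen > 0
        ? (1 - levenshtein($a, $b) / $maxLen) * 100
        : 0.0;

    return (float) max($similarTextPercent, $levenshteinPercent);
}

// Usage: both inputs clean down to "some movie title"
$score = titleSimilarity('Some.Movie.Title.1080p.WEBRip', 'Some Movie Title (2019)');
printf("%.1f\n", $score);
```

Taking the maximum of the two scores is lenient: a candidate passes if either measure rates it highly, so a caller would typically pair it with a minimum threshold.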