Passed
Push — master ( 80ad96...45165d )
by Alexey
03:00
created

ScraperUtil::convertDomNodeToText()   C

Complexity

Conditions 15
Paths 25

Size

Total Lines 34
Code Lines 26

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 26
dl 0
loc 34
rs 5.9166
c 0
b 0
f 0
cc 15
nc 25
nop 1

How to fix: Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments inside a method's body, that is usually a sign that the commented part should be extracted into a new method. The comment itself is a good starting point for naming that method.
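As a minimal sketch of that advice, the hypothetical functions below (`summarizeBefore`, `summarizeAfter`, and `collapseWhitespace` are illustrative names, not part of the reviewed project) show a commented step being pulled out into its own method, with the comment turned into the method name.

```php
<?php
declare(strict_types=1);

// Hypothetical example; none of these functions exist in the reviewed project.

// Before: an inline comment marks a step that deserves its own name.
function summarizeBefore(string $text): string
{
    // collapse runs of whitespace into single spaces
    $text = trim((string) preg_replace('/\s+/', ' ', $text));

    return strlen($text) > 20 ? substr($text, 0, 20) . '...' : $text;
}

// After: the commented step is extracted, and the comment became its name.
function collapseWhitespace(string $text): string
{
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

function summarizeAfter(string $text): string
{
    $text = collapseWhitespace($text);

    return strlen($text) > 20 ? substr($text, 0, 20) . '...' : $text;
}
```

Both variants should return the same result for the same input; the second simply gives the whitespace step a name that can be read and tested on its own.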

Commonly applied refactorings include:

<?php
declare(strict_types=1);

namespace Nelexa\GPlay\Util;

class ScraperUtil
{
    /**
     * @param string $html
     * @return array
     */
    public static function extractScriptData(string $html): array
    {
        $scripts = [];
        if (preg_match_all('/>AF_initDataCallback[\s\S]*?<\/script/', $html, $matches)) {
            $scripts = array_reduce($matches[0], static function ($carry, $item) {
                if (
                    preg_match("/(ds:.*?)'/", $item, $keyMatch) &&
                    preg_match('/return ([\s\S]*?)}}\);<\//', $item, $valueMatch)
                ) {
                    $carry[$keyMatch[1]] = \GuzzleHttp\json_decode($valueMatch[1], true);
                }
                return $carry;
            }, $scripts);
        }
        return $scripts;
    }

    /**
     * @param string $html
     * @return string
     */
    public static function html2text(string $html): string
    {
        $doc = new \DOMDocument();
        $internalErrors = libxml_use_internal_errors(true);
        if (!$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html)) {
            throw new \RuntimeException('error load html: ' . $html);
        }
        libxml_use_internal_errors($internalErrors);
        $text = self::convertDomNodeToText($doc);
        $text = preg_replace('/\n{3,}/', "\n\n", trim($text));
        return trim($text);
    }

    /**
     * @param \DOMNode $node
     * @return string
     */
    private static function convertDomNodeToText(\DOMNode $node): string
    {
        if ($node instanceof \DOMText) {
            $text = preg_replace('/\s+/', ' ', $node->wholeText);
        } else {
            $text = '';
            if ($node->childNodes !== null) {
                foreach ($node->childNodes as $childNode) {
                    $text .= self::convertDomNodeToText($childNode);
                }
            }

            switch ($node->nodeName) {
                case 'h1':
                case 'h2':
                case 'h3':
                case 'h4':
                case 'h5':
                case 'h6':
                case 'p':
                case 'ul':
                case 'div':
                    $text = "\n\n" . $text . "\n\n";
                    break;
                case 'li':
                    $text = '- ' . $text . "\n";
                    break;
                case 'br':
                    $text .= "\n";
                    break;
            }
        }

        return $text;
    }
}
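
One possible way to apply the Long Method advice to `convertDomNodeToText()` is sketched below. It is an illustration, not the project's code: the class name `DomTextSketch`, the helpers `childrenToText()` and `decorate()`, and the `BLOCK_TAGS` constant are all assumptions introduced here, and the main method is made public only so the sketch can be called directly. The intent is to keep the output of the listing above while splitting the recursion and the per-tag formatting into separately named steps.

```php
<?php
declare(strict_types=1);

// Hypothetical refactoring sketch; names below are not part of the reviewed project.
final class DomTextSketch
{
    /** Tags wrapped in blank lines (the same set as the switch in the original). */
    private const BLOCK_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'ul', 'div'];

    public static function convertDomNodeToText(\DOMNode $node): string
    {
        if ($node instanceof \DOMText) {
            // Text nodes: collapse whitespace runs, as the original does.
            return (string) preg_replace('/\s+/', ' ', $node->wholeText);
        }

        return self::decorate($node->nodeName, self::childrenToText($node));
    }

    /** Concatenates the text of all child nodes, recursing depth-first. */
    private static function childrenToText(\DOMNode $node): string
    {
        $text = '';
        // childNodes may be null for some node types, hence the fallback.
        foreach ($node->childNodes ?? [] as $childNode) {
            $text .= self::convertDomNodeToText($childNode);
        }

        return $text;
    }

    /** Applies the per-tag line-break and bullet formatting. */
    private static function decorate(string $nodeName, string $text): string
    {
        if (in_array($nodeName, self::BLOCK_TAGS, true)) {
            return "\n\n" . $text . "\n\n";
        }
        if ($nodeName === 'li') {
            return '- ' . $text . "\n";
        }
        if ($nodeName === 'br') {
            return $text . "\n";
        }

        return $text;
    }
}
```

Replacing the nine fall-through `case` labels with one `in_array()` lookup and using early returns should lower the condition count of each individual method, though the exact metrics a given analyzer reports for this sketch have not been measured here.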