Passed
Push — master ( bb114b...6dc021 )
by Alexey
03:23 queued 10s
created

ScraperUtil::convertDomNodeToText()   C

Complexity

Conditions 15
Paths 25

Size

Total Lines 37
Code Lines 26

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 23
CRAP Score 15.1152

Importance

Changes 1
Bugs 0 Features 0
Metric Value
eloc 26
dl 0
loc 37
ccs 23
cts 25
cp 0.92
rs 5.9166
c 1
b 0
f 0
cc 15
nc 25
nop 1
crap 15.1152

How to fix: Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
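
Extract Method, as described above, is one such refactoring. As a rough sketch of how it might apply to convertDomNodeToText(), the branches could be split into small named helpers. The class name DomTextConverter, the constant BLOCK_TAGS, and the helper names collapseWhitespace(), convertChildren(), and wrapByTag() are hypothetical and are not part of the library; the behavior is meant to mirror the method listed in the source below, not to replace the author's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical refactoring sketch (not part of Nelexa\GPlay).
 * Each helper carries one responsibility of the original method.
 */
final class DomTextConverter
{
    /** Tags rendered as blocks surrounded by blank lines. */
    private const BLOCK_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'ul', 'div'];

    public static function convert(\DOMNode $node): string
    {
        if ($node instanceof \DOMText) {
            return self::collapseWhitespace($node->wholeText);
        }

        return self::wrapByTag($node->nodeName, self::convertChildren($node));
    }

    /** Collapses every run of whitespace into a single space. */
    private static function collapseWhitespace(string $text): string
    {
        return (string) preg_replace('/\s+/', ' ', $text);
    }

    /** Concatenates the converted text of all child nodes. */
    private static function convertChildren(\DOMNode $node): string
    {
        $text = '';

        // childNodes may be null for some node types on older PHP versions.
        foreach ($node->childNodes ?? [] as $childNode) {
            $text .= self::convert($childNode);
        }

        return $text;
    }

    /** Applies the tag-specific line breaks and list markers. */
    private static function wrapByTag(string $tag, string $text): string
    {
        if (in_array($tag, self::BLOCK_TAGS, true)) {
            return "\n\n" . $text . "\n\n";
        }

        if ($tag === 'li') {
            return '- ' . $text . "\n";
        }

        if ($tag === 'br') {
            return $text . "\n";
        }

        return $text;
    }
}
```

With this split, each helper holds only a few conditions, so the per-method complexity should drop, and the list-item branch can be exercised directly in a unit test.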

<?php

declare(strict_types=1);

/**
 * @author   Ne-Lexa
 * @license  MIT
 *
 * @see      https://github.com/Ne-Lexa/google-play-scraper
 */

namespace Nelexa\GPlay\Util;

/**
 * @internal
 */
class ScraperUtil
{
    /**
     * @param string $html
     *
     * @return array
     */
    public static function extractScriptData(string $html): array
    {
        $scripts = [];

        if (preg_match_all('/>AF_initDataCallback\((.*?)\);<\/script/s', $html, $matches)) {
            $scripts = array_reduce(
                $matches[0],
                static function ($carry, $item) {
                    if (
                        preg_match("/(ds:.*?)'/", $item, $keyMatch) &&
                        preg_match('/data:([\s\S]*?)(, }\);<\/|, sideChannel:)/', $item, $valueMatch)
                    ) {
                        $carry[$keyMatch[1]] = \GuzzleHttp\json_decode($valueMatch[1], true);
                    }

                    return $carry;
                },
                $scripts
            );
        }

        return $scripts;
    }

    /**
     * @param string $html
     *
     * @return \DOMDocument
     */
    public static function createDomDocument(string $html): \DOMDocument
    {
        $doc = new \DOMDocument();
        $internalErrors = libxml_use_internal_errors(true);

        if (!$doc->loadHTML('<?xml encoding="utf-8"?>' . $html)) {
            throw new \RuntimeException(
                'error load html: ' . $html
            );
        }
        libxml_use_internal_errors($internalErrors);

        return $doc;
    }

    /**
     * @param string $html
     *
     * @return string
     */
    public static function html2text(string $html): string
    {
        $doc = self::createDomDocument($html);
        $text = self::convertDomNodeToText($doc);
        $text = preg_replace('/\n{3,}/', "\n\n", trim($text));

        return trim($text);
    }

    /**
     * @param \DOMNode $node
     *
     * @return string
     */
    private static function convertDomNodeToText(\DOMNode $node): string
    {
        if ($node instanceof \DOMText) {
            $text = preg_replace('/\s+/', ' ', $node->wholeText);
        } else {
            $text = '';

            if ($node->childNodes !== null) {
                foreach ($node->childNodes as $childNode) {
                    $text .= self::convertDomNodeToText($childNode);
                }
            }

            switch ($node->nodeName) {
                case 'h1':
                case 'h2':
                case 'h3':
                case 'h4':
                case 'h5':
                case 'h6':
                case 'p':
                case 'ul':
                case 'div':
                    $text = "\n\n" . $text . "\n\n";
                    break;

                case 'li':
                    $text = '- ' . $text . "\n";
                    break;

                case 'br':
                    $text .= "\n";
                    break;
            }
        }

        return $text;
    }
}