Passed
Push to master (137960...4d87d9) by Sys, created 11:11

SchemaExtractor   B

Complexity

Total Complexity 49

Size/Duplication

Total Lines 357
Duplicated Lines 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric   Value
wmc      49
eloc     176
c        1
b        0
f        0
dl       0
loc      357
rs       8.48

11 Methods

Rating   Name                 Duplication   Size   Complexity
A        parseVersion()       0             19     3
B        parseReturnTypes()   0             37     8
B        parseNode()          0             28     9
A        fromVersion()        0             15     4
B        parseFieldTypes()    0             29     6
A        parseFields()        0             31     4
A        generateElement()    0             23     4
A        __construct()        0             4      1
A        fromFile()           0             16     3
A        extract()            0             28     5
A        fromUrl()            0             11     2

How to fix   Complexity   

Complex Class

Complex classes like SchemaExtractor often do many different things. To break such a class down, first identify a cohesive component within it. A common way to find one is to look for fields or methods that share a prefix or suffix. In this class, the parse*() methods and the from*() factory methods form two such groups.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use SchemaExtractor, and based on these observations, apply Extract Interface, too.
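As a rough illustration of Extract Class, the sketch below moves the body of parseFieldTypes() into a class of its own. The class name FieldTypeParser and the method name parse() are hypothetical and not part of TgScraper; the logic is copied from the listing unchanged, so this only shows the shape of the refactoring, not a recommended final design.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extracted class: holds only the type-string parsing
 * that SchemaExtractor::parseFieldTypes() performs in the listing.
 */
final class FieldTypeParser
{
    /**
     * Turns a raw type description such as "Array of Integer or String"
     * into a list of normalized type names.
     *
     * @return string[]
     */
    public static function parse(string $rawType): array
    {
        $types = [];
        // Alternatives are separated by " or ".
        foreach (explode(' or ', $rawType) as $rawOrType) {
            // Array descriptions are kept whole; " and" becomes a comma.
            if (stripos($rawOrType, 'array') === 0) {
                $types[] = str_replace(' and', ',', $rawOrType);
                continue;
            }
            foreach (explode(' and ', $rawOrType) as $unparsedType) {
                $types[] = $unparsedType;
            }
        }

        $parsedTypes = [];
        foreach ($types as $type) {
            // Drop filler words, then count how deeply the type is nested.
            $type = trim(str_replace(['number', 'of'], '', $type));
            $multiplesCount = substr_count(strtolower($type), 'array');
            $parsedType = trim(
                str_replace(
                    ['Array', 'Integer', 'String', 'Boolean', 'Float', 'True'],
                    ['', 'int', 'string', 'bool', 'float', 'bool'],
                    $type
                )
            );
            // Re-wrap the base type once per "Array" found.
            for ($i = 0; $i < $multiplesCount; $i++) {
                $parsedType = sprintf('Array<%s>', $parsedType);
            }
            $parsedTypes[] = $parsedType;
        }
        return $parsedTypes;
    }
}

// Usage: prints "Array<int>, string"
echo implode(', ', FieldTypeParser::parse('Array of Integer or String')), PHP_EOL;
```

With a class like this in place, parseFields() would call FieldTypeParser::parse($parsedData['type']) instead of self::parseFieldTypes(...), and the type-parsing rules could be unit-tested without loading any HTML.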

<?php

namespace TgScraper\Common;

use Composer\InstalledVersions;
use InvalidArgumentException;
use JetBrains\PhpStorm\ArrayShape;
use OutOfBoundsException;
use PHPHtmlParser\Dom;
use PHPHtmlParser\Exceptions\ChildNotFoundException;
use PHPHtmlParser\Exceptions\CircularException;
use PHPHtmlParser\Exceptions\ContentLengthException;
use PHPHtmlParser\Exceptions\LogicalException;
use PHPHtmlParser\Exceptions\NotLoadedException;
use PHPHtmlParser\Exceptions\ParentNotFoundException;
use PHPHtmlParser\Exceptions\StrictException;
use Psr\Http\Client\ClientExceptionInterface;
use Psr\Log\LoggerInterface;
use TgScraper\Constants\Versions;
use Throwable;

/**
 * Class SchemaExtractor
 * @package TgScraper\Common
 */
class SchemaExtractor
{
    /**
     * Additional methods with boolean return value.
     */
    private const BOOL_RETURNS = [
        'answerShippingQuery',
        'answerPreCheckoutQuery'
    ];

    /**
     * @var string
     */
    private string $version;

    /**
     * SchemaExtractor constructor.
     * @param LoggerInterface $logger
     * @param Dom $dom
     * @throws ChildNotFoundException
     * @throws NotLoadedException
     */
    public function __construct(private LoggerInterface $logger, private Dom $dom)
    {
        $this->version = $this->parseVersion();
        $this->logger->info('Bot API version: ' . $this->version);
    }

    /**
     * @param LoggerInterface $logger
     * @param string $version
     * @return SchemaExtractor
     * @throws OutOfBoundsException
     * @throws Throwable
     */
    public static function fromVersion(LoggerInterface $logger, string $version = Versions::LATEST): SchemaExtractor
    {
        if (InstalledVersions::isInstalled('sysbot/tgscraper-cache') and class_exists('\TgScraper\Cache\CacheLoader')) {
            $logger->info('Cache package detected, searching for a cached version.');
            try {
                $path = \TgScraper\Cache\CacheLoader::getCachedVersion($version);
                // Scrutinizer issue (bug, ignored): The type TgScraper\Cache\CacheLoader was not found.
                // Maybe you did not declare it correctly or list all dependencies?
                //
                // The issue could also be caused by a filter entry in the build configuration.
                // If the path has been excluded in your configuration, e.g. excluded_paths: ["lib/*"],
                // you can move it to the dependency path list as follows:
                //
                //     filter:
                //         dependency_paths: ["lib/*"]
                //
                // For further information see
                // https://scrutinizer-ci.com/docs/tools/php/php-scrutinizer/#list-dependency-paths
                $logger->info('Cached version found.');
                return self::fromFile($logger, $path);
            } catch (OutOfBoundsException) {
                $logger->info('Cached version not found, continuing with URL.');
            }
        }
        $url = Versions::getUrlFromText($version);
        $logger->info(sprintf('Using URL: %s', $url));
        return self::fromUrl($logger, $url);
    }

    /**
     * @param LoggerInterface $logger
     * @param string $path
     * @return SchemaExtractor
     * @throws Throwable
     */
    public static function fromFile(LoggerInterface $logger, string $path): SchemaExtractor
    {
        $dom = new Dom;
        if (!file_exists($path)) {
            throw new InvalidArgumentException('File not found');
        }
        $path = realpath($path);
        try {
            $logger->info(sprintf('Loading data from file "%s".', $path));
            $dom->loadFromFile($path);
            $logger->info('Data loaded.');
        } catch (Throwable $e) {
            $logger->critical(sprintf('Unable to load data from "%s": %s', $path, $e->getMessage()));
            throw $e;
        }
        return new self($logger, $dom);
    }

    /**
     * @param LoggerInterface $logger
     * @param string $url
     * @return SchemaExtractor
     * @throws ChildNotFoundException
     * @throws CircularException
     * @throws ClientExceptionInterface
     * @throws ContentLengthException
     * @throws LogicalException
     * @throws StrictException
     * @throws NotLoadedException
     */
    public static function fromUrl(LoggerInterface $logger, string $url): SchemaExtractor
    {
        $dom = new Dom;
        try {
            $dom->loadFromURL($url);
        } catch (Throwable $e) {
            $logger->critical(sprintf('Unable to load data from URL "%s": %s', $url, $e->getMessage()));
            throw $e;
        }
        $logger->info(sprintf('Data loaded from "%s".', $url));
        return new self($logger, $dom);
    }

    /**
     * @throws ParentNotFoundException
     * @throws ChildNotFoundException
     */
    #[ArrayShape(['description' => "string", 'table' => "mixed", 'extended_by' => "array"])]
    private static function parseNode(Dom\Node\AbstractNode $node): ?array
    {
        $description = '';
        $table = null;
        $extendedBy = [];
        $tag = '';
        $sibling = $node;
        while (!str_starts_with($tag, 'h')) {
            $sibling = $sibling->nextSibling();
            $tag = $sibling?->tag?->name();
            if (empty($node->text()) or empty($tag) or $tag == 'text') {
                continue;
            } elseif ($tag == 'p') {
                $description .= PHP_EOL . $sibling->innerHtml();
            } elseif ($tag == 'ul') {
                $items = $sibling->find('li');
                /* @var Dom\Node\AbstractNode $item */
                foreach ($items as $item) {
                    $extendedBy[] = $item->innerText;
                }
                break;
            } elseif ($tag == 'table') {
                $table = $sibling->find('tbody')->find('tr');
                break;
            }
        }
        return ['description' => $description, 'table' => $table, 'extended_by' => $extendedBy];
    }

    /**
     * @throws ChildNotFoundException
     * @throws NotLoadedException
     */
    private function parseVersion(): string
    {
        /** @var Dom\Node\AbstractNode $element */
        $element = $this->dom->find('h3')[0];
        $tag = '';
        while ($tag != 'p') {
            try {
                $element = $element->nextSibling();
            } catch (ChildNotFoundException | ParentNotFoundException) {
                continue;
            }
            $tag = $element->tag->name();
        }
        $versionNumbers = explode('.', str_replace('Bot API ', '', $element->innerText));
        return sprintf(
            '%s.%s.%s',
            $versionNumbers[0] ?? '1',
            $versionNumbers[1] ?? '0',
            $versionNumbers[2] ?? '0'
        );
    }

    /**
     * @return array
     * @throws Throwable
     */
    #[ArrayShape(['version' => "string", 'methods' => "array", 'types' => "array"])]
    public function extract(): array
    {
        try {
            $elements = $this->dom->find('h4');
        } catch (Throwable $e) {
            $this->logger->critical(sprintf('Unable to parse data: %s', $e->getMessage()));
            throw $e;
        }
        $data = ['version' => $this->version];
        /* @var Dom\Node\AbstractNode $element */
        foreach ($elements as $element) {
            if (!str_contains($name = $element->text, ' ')) {
                $isMethod = lcfirst($name) == $name;
                $path = $isMethod ? 'methods' : 'types';
                ['description' => $description, 'table' => $table, 'extended_by' => $extendedBy] = self::parseNode(
                    $element
                );
                $data[$path][] = self::generateElement(
                    $name,
                    trim($description),
                    $table,
                    $extendedBy,
                    $isMethod
                );
            }
        }
        return $data;
    }

    /**
     * @param string $name
     * @param string $description
     * @param Dom\Node\Collection|null $unparsedFields
     * @param array $extendedBy
     * @param bool $isMethod
     * @return array
     * @throws ChildNotFoundException
     * @throws CircularException
     * @throws ContentLengthException
     * @throws LogicalException
     * @throws NotLoadedException
     * @throws StrictException
     */
    private static function generateElement(
        string $name,
        string $description,
        ?Dom\Node\Collection $unparsedFields,
        array $extendedBy,
        bool $isMethod
    ): array {
        $fields = self::parseFields($unparsedFields, $isMethod);
        $result = [
            'name' => $name,
            'description' => htmlspecialchars_decode(strip_tags($description), ENT_QUOTES),
            'fields' => $fields
        ];
        if ($isMethod) {
            $returnTypes = self::parseReturnTypes($description);
            if (empty($returnTypes) and in_array($name, self::BOOL_RETURNS)) {
                $returnTypes[] = 'bool';
            }
            $result['return_types'] = $returnTypes;
            return $result;
        }
        $result['extended_by'] = $extendedBy;
        return $result;
    }

    /**
     * @param Dom\Node\Collection|null $fields
     * @param bool $isMethod
     * @return array
     */
    private static function parseFields(?Dom\Node\Collection $fields, bool $isMethod): array
    {
        $parsedFields = [];
        $fields = $fields ?? [];
        foreach ($fields as $field) {
            /* @var Dom\Node\AbstractNode $fieldData */
            $fieldData = $field->find('td');
            $name = $fieldData[0]->text;
            if (empty($name)) {
                continue;
            }
            $parsedData = [
                'name' => $name,
                'type' => strip_tags($fieldData[1]->innerHtml)
            ];
            $parsedData['types'] = self::parseFieldTypes($parsedData['type']);
            unset($parsedData['type']);
            if ($isMethod) {
                $parsedData['optional'] = $fieldData[2]->text != 'Yes';
                $parsedData['description'] = htmlspecialchars_decode(
                    strip_tags($fieldData[3]->innerHtml ?? $fieldData[3]->text ?? ''),
                    ENT_QUOTES
                );
            } else {
                $description = htmlspecialchars_decode(strip_tags($fieldData[2]->innerHtml), ENT_QUOTES);
                $parsedData['optional'] = str_starts_with($description, 'Optional.');
                $parsedData['description'] = $description;
            }
            $parsedFields[] = $parsedData;
        }
        return $parsedFields;
    }

    /**
     * @param string $rawType
     * @return array
     */
    private static function parseFieldTypes(string $rawType): array
    {
        $types = [];
        foreach (explode(' or ', $rawType) as $rawOrType) {
            if (stripos($rawOrType, 'array') === 0) {
                $types[] = str_replace(' and', ',', $rawOrType);
                continue;
            }
            foreach (explode(' and ', $rawOrType) as $unparsedType) {
                $types[] = $unparsedType;
            }
        }
        $parsedTypes = [];
        foreach ($types as $type) {
            $type = trim(str_replace(['number', 'of'], '', $type));
            $multiplesCount = substr_count(strtolower($type), 'array');
            $parsedType = trim(
                str_replace(
                    ['Array', 'Integer', 'String', 'Boolean', 'Float', 'True'],
                    ['', 'int', 'string', 'bool', 'float', 'bool'],
                    $type
                )
            );
            for ($i = 0; $i < $multiplesCount; $i++) {
                $parsedType = sprintf('Array<%s>', $parsedType);
            }
            $parsedTypes[] = $parsedType;
        }
        return $parsedTypes;
    }

    /**
     * @param string $description
     * @return array
     * @throws ChildNotFoundException
     * @throws CircularException
     * @throws NotLoadedException
     * @throws StrictException
     * @throws ContentLengthException
     * @throws LogicalException
     * @noinspection PhpUndefinedFieldInspection
     */
    private static function parseReturnTypes(string $description): array
    {
        $returnTypes = [];
        $phrases = explode('.', $description);
        $phrases = array_filter(
            $phrases,
            function ($phrase) {
                return (false !== stripos($phrase, 'returns') or false !== stripos($phrase, 'is returned'));
            }
        );
        foreach ($phrases as $phrase) {
            $dom = new Dom;
            $dom->loadStr($phrase);
            $a = $dom->find('a');
            $em = $dom->find('em');
            foreach ($a as $element) {
                if ($element->text == 'Messages') {
                    $returnTypes[] = 'Array<Message>';
                    continue;
                }

                $multiplesCount = substr_count(strtolower($phrase), 'array');
                $returnType = $element->text;
                for ($i = 0; $i < $multiplesCount; $i++) {
                    $returnType = sprintf('Array<%s>', $returnType);
                }
                $returnTypes[] = $returnType;
            }
            foreach ($em as $element) {
                if (in_array($element->text, ['False', 'force', 'Array'])) {
                    continue;
                }
                $type = str_replace(['True', 'Int', 'String'], ['bool', 'int', 'string'], $element->text);
                $returnTypes[] = $type;
            }
        }
        return $returnTypes;
    }

}