Text   F

Complexity

Total Complexity 75

Size/Duplication

Total Lines 441
Duplicated Lines 0 %

Test Coverage

Coverage 11.9%

Importance

Changes 2
Bugs 1 Features 0
Metric Value
wmc 75
eloc 168
dl 0
loc 441
ccs 20
cts 168
cp 0.119
rs 2.4
c 2
b 1
f 0

13 Methods

Rating   Name   Duplication   Size   Complexity  
A truncateOnWord() 0 11 2
A extractTextFromTags() 0 16 4
A truncate() 0 11 2
A smartStripTags() 0 8 1
A stopWordsForLanguage() 0 15 3
A extractSummary() 0 25 5
C extractTextFromField() 0 29 16
A extractKeywords() 0 25 5
B extractTextFromNeo() 0 31 10
A cleanupText() 0 21 3
B extractTextFromSuperTable() 0 38 11
A sanitizeUserInput() 0 20 2
B extractTextFromMatrix() 0 35 11

How to fix   Complexity   

Complex Class

Complex classes like Text often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefix or suffix.
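As a rough illustration of the prefix heuristic, the sketch below groups the 13 method names from the table above by their leading lowercase word. The helper name groupMethodsByPrefix() is hypothetical and not part of SEOmatic; in a real project the name list could come from get_class_methods() instead of being typed out.

```php
<?php
declare(strict_types=1);

/**
 * Group method names by their leading lowercase word (e.g. "extract").
 * groupMethodsByPrefix() is a hypothetical helper used for illustration only.
 *
 * @param string[] $methodNames
 * @return array<string, string[]> Only prefixes shared by two or more methods
 */
function groupMethodsByPrefix(array $methodNames): array
{
    $groups = [];
    foreach ($methodNames as $name) {
        // The prefix is everything before the first uppercase letter
        preg_match('/^[a-z]+/', $name, $matches);
        $prefix = $matches[0] ?? $name;
        $groups[$prefix][] = $name;
    }

    // A prefix used by a single method does not hint at a component
    return array_filter($groups, static fn(array $names): bool => count($names) > 1);
}

// Method names copied from the report's method table
$methods = [
    'truncateOnWord', 'extractTextFromTags', 'truncate', 'smartStripTags',
    'stopWordsForLanguage', 'extractSummary', 'extractTextFromField',
    'extractKeywords', 'extractTextFromNeo', 'cleanupText',
    'extractTextFromSuperTable', 'sanitizeUserInput', 'extractTextFromMatrix',
];

$groups = groupMethodsByPrefix($methods);
print_r($groups);
```

For this class the heuristic surfaces two candidate groups, the "truncate" methods and the "extract" methods, which is only a starting point for deciding what to pull out.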

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use Text, and based on these observations, apply Extract Interface, too.
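A minimal sketch of what Extract Class plus Extract Interface could look like for the field-matching loop that extractTextFromMatrix(), extractTextFromNeo() and extractTextFromSuperTable() repeat. The names BlockTextExtractor and ArrayBlockTextExtractor are hypothetical, and blocks are modelled as plain arrays of handle => text so the example runs without Craft CMS installed; it is not the plugin's actual design.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical interface describing what callers need from an extractor.
 */
interface BlockTextExtractor
{
    /** @param iterable<array<string, string>> $blocks */
    public function extract(iterable $blocks, string $fieldHandle = ''): string;
}

/**
 * Hypothetical extracted class that owns the shared loop. Real Matrix, Neo
 * or Super Table variants would differ only in how they list a block's
 * text fields.
 */
final class ArrayBlockTextExtractor implements BlockTextExtractor
{
    public function extract(iterable $blocks, string $fieldHandle = ''): string
    {
        $result = '';
        foreach ($blocks as $block) {
            foreach ($block as $handle => $text) {
                // An empty handle means "take every text field"
                if ($fieldHandle === '' || $handle === $fieldHandle) {
                    $result .= $text . ' ';
                }
            }
        }

        return $result;
    }
}

$extractor = new ArrayBlockTextExtractor();
$blocks = [
    ['heading' => 'Hello', 'body' => 'world'],
    ['body' => 'again'],
];

$allText = $extractor->extract($blocks);          // every field of every block
$bodyText = $extractor->extract($blocks, 'body'); // only the "body" fields
echo $allText, PHP_EOL, $bodyText, PHP_EOL;
```

Callers that depend on the interface rather than on Text would no longer need to know which block type they are dealing with, which is the point of applying Extract Interface after the split.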

<?php
/**
 * SEOmatic plugin for Craft CMS 3.x
 *
 * A turnkey SEO implementation for Craft CMS that is comprehensive, powerful,
 * and flexible
 *
 * @link      https://nystudio107.com
 * @copyright Copyright (c) 2017 nystudio107
 */

namespace nystudio107\seomatic\helpers;

use benf\neo\elements\Block as NeoBlock;
use benf\neo\elements\db\BlockQuery as NeoBlockQuery;
use craft\elements\db\MatrixBlockQuery;
use craft\elements\db\TagQuery;
use craft\elements\MatrixBlock;
use craft\elements\Tag;
use craft\helpers\HtmlPurifier;
use nystudio107\seomatic\helpers\Field as FieldHelper;
use nystudio107\seomatic\Seomatic;
use PhpScience\TextRank\TextRankFacade;
use PhpScience\TextRank\Tool\StopWords\StopWordsAbstract;
use Stringy\Stringy;
use verbb\doxter\Doxter;
use verbb\doxter\fields\data\DoxterData;
use verbb\supertable\elements\db\SuperTableBlockQuery;
use verbb\supertable\elements\SuperTableBlockElement as SuperTableBlock;
use verbb\supertable\models\SuperTableBlockTypeModel;
use yii\base\InvalidConfigException;
use function array_slice;
use function function_exists;
use function is_array;

/**
 * @author    nystudio107
 * @package   Seomatic
 * @since     3.0.0
 */
class Text
{
    // Constants
    // =========================================================================

    const LANGUAGE_MAP = [
        'en' => 'English',
        'fr' => 'French',
        'de' => 'German',
        'it' => 'Italian',
        'no' => 'Norwegian',
        'es' => 'Spanish',
    ];

    // Public Static Methods
    // =========================================================================

    /**
     * Truncates the string to a given length. If $substring is provided, and
     * truncating occurs, the string is further truncated so that the substring
     * may be appended without exceeding the desired length.
     *
     * @param string $string The string to truncate
     * @param int $length Desired length of the truncated string
     * @param string $substring The substring to append if it can fit
     *
     * @return string The resulting string after truncating
     */
    public static function truncate($string, $length, $substring = '…'): string
    {
        $result = $string;

        if (!empty($string)) {
            $string = HtmlPurifier::process($string, ['HTML.Allowed' => '']);
            $string = html_entity_decode($string, ENT_NOQUOTES, 'UTF-8');
            $result = (string)Stringy::create($string)->truncate($length, $substring);
        }

        return $result;
    }

    /**
     * Truncates the string to a given length, while ensuring that it does not
     * split words. If $substring is provided, and truncating occurs, the
     * string is further truncated so that the substring may be appended without
     * exceeding the desired length.
     *
     * @param string $string The string to truncate
     * @param int $length Desired length of the truncated string
     * @param string $substring The substring to append if it can fit
     *
     * @return string The resulting string after truncating
     */
    public static function truncateOnWord($string, $length, $substring = '…'): string
    {
        $result = $string;

        if (!empty($string)) {
            $string = HtmlPurifier::process($string, ['HTML.Allowed' => '']);
            $string = html_entity_decode($string, ENT_NOQUOTES, 'UTF-8');
            $result = (string)Stringy::create($string)->safeTruncate($length, $substring);
        }

        return $result;
    }

    /**
     * Extract plain old text from a field
     *
     * @param $field
     *
     * @return string
     */
    public static function extractTextFromField($field): string
    {
        if (empty($field)) {
            return '';
        }
        if ($field instanceof MatrixBlockQuery
            || (is_array($field) && $field[0] instanceof MatrixBlock)) {
            $result = self::extractTextFromMatrix($field);
        } elseif ($field instanceof NeoBlockQuery
            || (is_array($field) && $field[0] instanceof NeoBlock)) {
            $result = self::extractTextFromNeo($field);
        } elseif ($field instanceof SuperTableBlockQuery
            || (is_array($field) && $field[0] instanceof SuperTableBlock)) {
            $result = self::extractTextFromSuperTable($field);
        } elseif ($field instanceof TagQuery
            || (is_array($field) && $field[0] instanceof Tag)) {
            $result = self::extractTextFromTags($field);
        } elseif ($field instanceof DoxterData) {
            $result = self::smartStripTags(Doxter::$plugin->getService()->parseMarkdown($field->getRaw()));
        } else {
            if (is_array($field)) {
                $result = self::smartStripTags((string)$field[0]);
            } else {
                $result = self::smartStripTags((string)$field);
            }
        }

        //return $result;
        return self::sanitizeUserInput($result);
    }

    /**
     * Extract concatenated text from all of the tags in $tags and
     * return as a comma-delimited string
     *
     * @param TagQuery|Tag[]|array $tags
     *
     * @return string
     */
    public static function extractTextFromTags($tags): string
    {
        if (empty($tags)) {
            return '';
        }
        $result = '';
        // Iterate through all of the tags
        if ($tags instanceof TagQuery) {
            $tags = $tags->all();
        }
        foreach ($tags as $tag) {
            $result .= $tag->title . ', ';
        }
        $result = rtrim($result, ', ');

        return $result;
    }

    /**
     * Extract text from all of the blocks in a matrix field, concatenating it
     * together.
     *
     * @param MatrixBlockQuery|MatrixBlock[]|array $blocks
     * @param string $fieldHandle
     *
     * @return string
     */
    public static function extractTextFromMatrix($blocks, $fieldHandle = ''): string
    {
        if (empty($blocks)) {
            return '';
        }
        $result = '';
        // Iterate through all of the matrix blocks
        if ($blocks instanceof MatrixBlockQuery) {
            $blocks = $blocks->all();
        }
        foreach ($blocks as $block) {
            try {
                $matrixBlockTypeModel = $block->getType();
            } catch (InvalidConfigException $e) {
                $matrixBlockTypeModel = null;
            }
            // Find any text fields inside of the matrix block
            if ($matrixBlockTypeModel) {
                $fieldClasses = FieldHelper::FIELD_CLASSES[FieldHelper::TEXT_FIELD_CLASS_KEY];
                $fields = $matrixBlockTypeModel->getFields();

                foreach ($fields as $field) {
                    /** @var array $fieldClasses */
                    foreach ($fieldClasses as $fieldClassKey) {
                        if ($field instanceof $fieldClassKey) {
                            if ($field->handle === $fieldHandle || empty($fieldHandle)) {
                                $result .= self::extractTextFromField($block[$field->handle]) . ' ';
                            }
                        }
                    }
                }
            }
        }

        return $result;
    }

    /**
     * Extract text from all of the blocks in a Neo field, concatenating it
     * together.
     *
     * @param NeoBlockQuery|NeoBlock[]|array $blocks
     * @param string $fieldHandle
     *
     * @return string
     */
    public static function extractTextFromNeo($blocks, $fieldHandle = ''): string
    {
        if (empty($blocks)) {
            return '';
        }
        $result = '';
        // Iterate through all of the Neo blocks
        if ($blocks instanceof NeoBlockQuery) {
            $blocks = $blocks->all();
        }
        foreach ($blocks as $block) {
            $layout = $block->getFieldLayout();
            // Find any text fields inside of the Neo block
            if ($layout) {
                $fieldClasses = FieldHelper::FIELD_CLASSES[FieldHelper::TEXT_FIELD_CLASS_KEY];
                $fields = $layout->getFields();

                foreach ($fields as $field) {
                    /** @var array $fieldClasses */
                    foreach ($fieldClasses as $fieldClassKey) {
                        if ($field instanceof $fieldClassKey) {
                            if ($field->handle === $fieldHandle || empty($fieldHandle)) {
                                $result .= self::extractTextFromField($block[$field->handle]) . ' ';
                            }
                        }
                    }
                }
            }
        }

        return $result;
    }

    /**
     * Extract text from all of the blocks in a Super Table field, concatenating
     * it together.
     *
     * @param SuperTableBlockQuery|SuperTableBlock[]|array $blocks
     * @param string $fieldHandle
     *
     * @return string
     */
    public static function extractTextFromSuperTable($blocks, $fieldHandle = ''): string
    {
        if (empty($blocks)) {
            return '';
        }
        $result = '';
        // Iterate through all of the Super Table blocks
        if ($blocks instanceof SuperTableBlockQuery) {
            $blocks = $blocks->all();
        }
        foreach ($blocks as $block) {
            try {
                /** @var SuperTableBlockTypeModel $superTableBlockTypeModel */
                $superTableBlockTypeModel = $block->getType();
            } catch (InvalidConfigException $e) {
                $superTableBlockTypeModel = null;
            }
            // Find any text fields inside of the Super Table block
            if ($superTableBlockTypeModel) {
                $fieldClasses = FieldHelper::FIELD_CLASSES[FieldHelper::TEXT_FIELD_CLASS_KEY];
                // The SuperTableBlockTypeModel class lacks @mixin FieldLayoutBehavior in its annotations
                /** @phpstan-ignore-next-line */
                $fields = $superTableBlockTypeModel->getFields();

                foreach ($fields as $field) {
                    /** @var array $fieldClasses */
                    foreach ($fieldClasses as $fieldClassKey) {
                        if ($field instanceof $fieldClassKey) {
                            if ($field->handle === $fieldHandle || empty($fieldHandle)) {
                                $result .= self::extractTextFromField($block[$field->handle]) . ' ';
                            }
                        }
                    }
                }
            }
        }

        return $result;
    }

    /**
     * Return the most important keywords extracted from the text as a comma-
     * delimited string
     *
     * @param string $text
     * @param int $limit
     * @param bool $useStopWords
     *
     * @return string
     */
    public static function extractKeywords($text, $limit = 15, $useStopWords = true): string
    {
        if (empty($text)) {
            return '';
        }
        $api = new TextRankFacade();
        // Set the stop words that should be ignored
        if ($useStopWords) {
            $language = strtolower(substr(Seomatic::$language, 0, 2));
            $stopWords = self::stopWordsForLanguage($language);
            if ($stopWords !== null) {
                $api->setStopWords($stopWords);
            }
        }
        // Array of the most important keywords:
        $keywords = $api->getOnlyKeyWords(self::cleanupText($text));

        // If it's empty, just return the text
        if (empty($keywords)) {
            return $text;
        }

        $result = implode(', ', array_slice(array_keys($keywords), 0, $limit));

        return self::sanitizeUserInput($result);
    }

    /**
     * Extract a summary consisting of the 3 most important sentences from the
     * text
     *
     * @param string $text
     * @param bool $useStopWords
     *
     * @return string
     */
    public static function extractSummary($text, $useStopWords = true): string
    {
        if (empty($text)) {
            return '';
        }
        $api = new TextRankFacade();
        // Set the stop words that should be ignored
        if ($useStopWords) {
            $language = strtolower(substr(Seomatic::$language, 0, 2));
            $stopWords = self::stopWordsForLanguage($language);
            if ($stopWords !== null) {
                $api->setStopWords($stopWords);
            }
        }
        // Array of the most important sentences:
        $sentences = $api->getHighlights(self::cleanupText($text));

        // If it's empty, just return the text
        if (empty($sentences)) {
            return $text;
        }

        $result = implode(' ', $sentences);

        return self::sanitizeUserInput($result);
    }

    /**
     * Sanitize user input by decoding any HTML entities, URL decoding the text,
     * then removing any newlines, stripping tags, stripping Twig tags, and changing
     * single {}'s into ()'s
     *
     * @param $str
     * @return string
     */
    public static function sanitizeUserInput($str): string
    {
        // Do some general cleanup
        $str = html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
        $str = rawurldecode($str);
        // Remove any linebreaks
        $str = (string)preg_replace("/\r|\n/", "", $str);
        $str = HtmlPurifier::process($str, ['HTML.Allowed' => '']);
        $str = html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
        // Remove any embedded Twig code
        $str = preg_replace('/{{.*?}}/', '', $str);
        $str = preg_replace('/{%.*?%}/', '', $str);
        // Change single braces to parentheses
        $str = preg_replace('/{/', '(', $str);
        $str = preg_replace('/}/', ')', $str);
        if (empty($str)) {
            $str = '';
        }

        return $str;
    }

    /**
     * Strip HTML tags, but replace them with a space rather than just eliminating them
     *
     * @param $str
     * @return string
     */
    public static function smartStripTags($str)
    {
        $str = str_replace('<', ' <', $str);
        $str = HtmlPurifier::process($str, ['HTML.Allowed' => '']);
        $str = html_entity_decode($str, ENT_NOQUOTES, 'UTF-8');
        $str = str_replace('  ', ' ', $str);

        return $str;
    }

    /**
     * Clean up the passed in text by converting it to UTF-8, stripping tags,
     * removing whitespace, and decoding HTML entities
     *
     * @param string $text
     *
     * @return string
     */
    public static function cleanupText($text): string
    {
        if (empty($text)) {
            return '';
        }
        // Convert to UTF-8
        if (function_exists('iconv')) {
            // Scrutinizer flags this call: mb_detect_order() can also be of type true, while the
            // $encodings parameter of mb_detect_encoding() only accepts array|null|string
            $text = iconv(mb_detect_encoding($text, mb_detect_order(), true), 'UTF-8//IGNORE', $text);
        } else {
            ini_set('mbstring.substitute_character', 'none');
            $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
        }
        // Strip HTML tags
        // Scrutinizer flags this call: $text can also be of type array, while the $content
        // parameter of yii\helpers\BaseHtmlPurifier::process() only accepts string
        $text = HtmlPurifier::process($text, ['HTML.Allowed' => '']);
        $text = html_entity_decode($text, ENT_NOQUOTES, 'UTF-8');
        // Remove excess whitespace
        $text = preg_replace('/\s{2,}/u', ' ', $text);
        // Decode any HTML entities
        $text = html_entity_decode($text);

        return $text;
    }

    // Protected Static Methods
    // =========================================================================

    /**
     * @param string $language
     *
     * @return null|StopWordsAbstract
     */
    protected static function stopWordsForLanguage(string $language)
    {
        $stopWords = null;
        if (!empty(self::LANGUAGE_MAP[$language])) {
            $language = self::LANGUAGE_MAP[$language];
        } else {
            $language = 'English';
        }

        $className = 'PhpScience\\TextRank\\Tool\\StopWords\\' . ucfirst($language);
        if (class_exists($className)) {
            $stopWords = new $className();
        }

        return $stopWords;
    }
}