Passed
Push — master ( 8430aa...11d349 )
by Josh
01:24
created

AbstractDiff   C

Complexity

Total Complexity 66

Size/Duplication

Total Lines 555
Duplicated Lines 0 %

Coupling/Cohesion

Components 2
Dependencies 4

Test Coverage

Coverage 73.08%

Importance

Changes 19
Bugs 6 Features 6
Metric Value
c 19
b 6
f 6
dl 0
loc 555
ccs 114
cts 156
cp 0.7308
rs 5.7474
wmc 66
lcom 2
cbo 4

36 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 18 3
build() 0 1 ?
A initPurifier() 0 19 3
A prepare() 0 7 1
A getDiffCache() 0 14 3
A hasDiffCache() 0 4 1
A getConfig() 0 4 1
A setConfig() 0 6 1
A getMatchThreshold() 0 4 1
A setMatchThreshold() 0 6 1
A setSpecialCaseChars() 0 4 1
A getSpecialCaseChars() 0 4 1
A addSpecialCaseChar() 0 4 1
A removeSpecialCaseChar() 0 4 1
A setSpecialCaseTags() 0 4 1
A addSpecialCaseTag() 0 4 1
A removeSpecialCaseTag() 0 4 1
A getSpecialCaseTags() 0 4 1
A getOldHtml() 0 4 1
A getNewHtml() 0 4 1
A getDifference() 0 4 1
A clearContent() 0 4 1
A setGroupDiffs() 0 6 1
A isGroupDiffs() 0 4 1
A setHTMLPurifierConfig() 0 4 1
A getOpeningTag() 0 4 1
A getClosingTag() 0 4 1
A getStringBetween() 0 14 3
A purifyHtml() 0 13 3
A splitInputsToWords() 0 5 1
A isPartOfWord() 0 4 1
C convertHtmlToListOfWords() 0 76 22
A isStartOfTag() 0 4 1
A isEndOfTag() 0 4 1
A isWhiteSpace() 0 4 1
A explode() 0 5 1

Complex Class

Complex classes like AbstractDiff often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use AbstractDiff, and based on these observations, apply Extract Interface, too.
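
As a rough illustration of Extract Class, the character-level helpers (convertHtmlToListOfWords(), isStartOfTag(), isEndOfTag(), isWhiteSpace(), explode()) look like one cohesive component that could move into a class of its own. The sketch below is an assumption, not part of the library: HtmlTokenizer and tokenize() are hypothetical names, and the splitting logic is deliberately simplified (it ignores special-case characters and does not collapse whitespace).

```php
<?php

/**
 * Hypothetical HtmlTokenizer: a simplified stand-in for the word-splitting
 * helpers of AbstractDiff, shown only to illustrate Extract Class.
 */
final class HtmlTokenizer
{
    /**
     * Splits HTML into tags, whitespace runs and runs of other characters.
     *
     * @param string $html
     *
     * @return string[]
     */
    public function tokenize($html)
    {
        $tokens = array();
        $current = '';
        $mode = 'character';

        foreach (preg_split('//u', $html, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            if ($mode === 'tag') {
                // Inside a tag: collect everything up to the closing '>'.
                $current .= $char;
                if ($char === '>') {
                    $tokens[] = $current;
                    $current = '';
                    $mode = 'character';
                }
                continue;
            }

            if ($char === '<') {
                $nextMode = 'tag';
            } elseif (preg_match('/\s/u', $char)) {
                $nextMode = 'whitespace';
            } else {
                $nextMode = 'character';
            }

            // Flush the pending token whenever the kind of token changes.
            if ($nextMode !== $mode && $current !== '') {
                $tokens[] = $current;
                $current = '';
            }

            $current .= $char;
            $mode = $nextMode;
        }

        if ($current !== '') {
            $tokens[] = $current;
        }

        return $tokens;
    }
}

$tokens = (new HtmlTokenizer())->tokenize('<p>Hello  world</p>');
```

AbstractDiff could then hold such an object and delegate splitInputsToWords() to it, which would move the method with complexity 22 out of the class.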

<?php

namespace Caxy\HtmlDiff;

/**
 * Class AbstractDiff.
 */
abstract class AbstractDiff
{
    /**
     * @var array
     *
     * @deprecated since 0.1.0
     */
    public static $defaultSpecialCaseTags = array('strong', 'b', 'i', 'big', 'small', 'u', 'sub', 'sup', 'strike', 's', 'p');
Coding Style: This line exceeds the maximum limit of 120 characters; it contains 125 characters.

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

    /**
     * @var array
     *
     * @deprecated since 0.1.0
     */
    public static $defaultSpecialCaseChars = array('.', ',', '(', ')', '\'');

    /**
     * @var bool
     *
     * @deprecated since 0.1.0
     */
    public static $defaultGroupDiffs = true;

    /**
     * @var HtmlDiffConfig
     */
    protected $config;

    /**
     * @var string
     */
    protected $content;

    /**
     * @var string
     */
    protected $oldText;

    /**
     * @var string
     */
    protected $newText;

    /**
     * @var array
     */
    protected $oldWords = array();

    /**
     * @var array
     */
    protected $newWords = array();

    /**
     * @var DiffCache[]
     */
    protected $diffCaches = array();

    /**
     * @var \HTMLPurifier
     */
    protected $purifier;

    /**
     * @var \HTMLPurifier_Config|null
     */
    protected $purifierConfig = null;

    /**
     * AbstractDiff constructor.
     *
     * @param string     $oldText
     * @param string     $newText
     * @param string     $encoding
     * @param null|array $specialCaseTags
     * @param null|bool  $groupDiffs
     */
    public function __construct($oldText, $newText, $encoding = 'UTF-8', $specialCaseTags = null, $groupDiffs = null)
    {
        mb_substitute_character(0x20);

        $this->setConfig(HtmlDiffConfig::create()->setEncoding($encoding));

        if ($specialCaseTags !== null) {
            $this->config->setSpecialCaseTags($specialCaseTags);
        }

        if ($groupDiffs !== null) {
            $this->config->setGroupDiffs($groupDiffs);
        }

        $this->oldText = $oldText;
        $this->newText = $newText;
        $this->content = '';
    }

    /**
     * @return bool|string
     */
    abstract public function build();

    /**
     * Initializes HTMLPurifier with cache location.
     *
     * @param null|string $defaultPurifierSerializerCache
     */
    public function initPurifier($defaultPurifierSerializerCache = null)
    {
        $HTMLPurifierConfig = null;
Unused Code: The initial null assigned to $HTMLPurifierConfig is never used; you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten on every possible code path.

        if (null !== $this->purifierConfig) {
            $HTMLPurifierConfig = $this->purifierConfig;
        } else {
            $HTMLPurifierConfig = \HTMLPurifier_Config::createDefault();
        }

        // Cache.SerializerPath defaults to Null and sets
        // the location to inside the vendor HTMLPurifier library
        // under the DefinitionCache/Serializer folder.
        if (!is_null($defaultPurifierSerializerCache)) {
            $HTMLPurifierConfig->set('Cache.SerializerPath', $defaultPurifierSerializerCache);
        }

        $this->purifier = new \HTMLPurifier($HTMLPurifierConfig);
    }

    /**
     * Prepare (purify) the HTML
     *
     * @return void
     */
    protected function prepare()
    {
        $this->initPurifier($this->config->getPurifierCacheLocation());

        $this->oldText = $this->purifyHtml($this->oldText);
Documentation Bug: It seems like $this->purifyHtml($this->oldText) can also be of type false. However, the property $oldText is declared as type string. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $accountId that can either hold an Id object or false (if there is no account id yet). Your code now assigns that value to the id property of an instance of the Account class. This class holds a proper account, so the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }

}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
if ($account_id instanceof Id)
{
    $account->id = $account_id;
}
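
Applied to prepare(), one possible type check is a small guard around the purifier result. This is only a sketch: requireString() is a hypothetical helper that does not exist in the library, and throwing an exception is an assumption about how a non-string result should be handled.

```php
<?php

/**
 * Hypothetical guard: returns $purified unchanged when it is a string and
 * throws otherwise (for example when purification yielded false or null).
 *
 * @param mixed $purified
 *
 * @return string
 */
function requireString($purified)
{
    if (!is_string($purified)) {
        throw new \UnexpectedValueException('Expected purified HTML to be a string.');
    }

    return $purified;
}

// Inside prepare() the assignments could then read (assumed usage):
//   $this->oldText = requireString($this->purifyHtml($this->oldText));
//   $this->newText = requireString($this->purifyHtml($this->newText));
$checked = requireString('<p>ok</p>');
```

With such a guard in place, the declared string type of $oldText and $newText would hold after prepare() returns.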
        $this->newText = $this->purifyHtml($this->newText);
Documentation Bug: It seems like $this->purifyHtml($this->newText) can also be of type false. However, the property $newText is declared as type string. Maybe add an additional type check?
    }

    /**
     * @return DiffCache|null
     */
    protected function getDiffCache()
    {
        if (!$this->hasDiffCache()) {
            return null;
        }

        $hash = spl_object_hash($this->getConfig()->getCacheProvider());

        if (!array_key_exists($hash, $this->diffCaches)) {
            $this->diffCaches[$hash] = new DiffCache($this->getConfig()->getCacheProvider());
Bug: It seems like $this->getConfig()->getCacheProvider() can be null; however, __construct() does not accept null. Maybe add an additional type check?

Unless you are absolutely sure that the expression can never be null because of other conditions, we strongly recommend adding an additional type check to your code:

/** @return stdClass|null */
function mayReturnNull() { }

function doesNotAcceptNull(stdClass $x) { }

// With potential error.
function withoutCheck() {
    $x = mayReturnNull();
    doesNotAcceptNull($x); // Potential error here.
}

// Safe - Alternative 1
function withCheck1() {
    $x = mayReturnNull();
    if ( ! $x instanceof stdClass) {
        throw new \LogicException('$x must be defined.');
    }
    doesNotAcceptNull($x);
}

// Safe - Alternative 2
function withCheck2() {
    $x = mayReturnNull();
    if ($x instanceof stdClass) {
        doesNotAcceptNull($x);
    }
}
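
For getDiffCache() specifically, the hasDiffCache() guard already excludes null at runtime, but the provider is fetched again afterwards, which the analyzer cannot follow. A sketch of one way to make the check local is shown below; FakeDiffCache and diffCacheFor() are hypothetical stand-ins (the real DiffCache constructor is not shown in this file), and $caches plays the role of $this->diffCaches.

```php
<?php

/** Hypothetical stand-in for DiffCache; it only records the provider. */
final class FakeDiffCache
{
    public $provider;

    public function __construct($provider)
    {
        $this->provider = $provider;
    }
}

/**
 * Returns the cache for $provider, creating it on first use, or null when
 * no provider is configured.
 *
 * @param object|null $provider
 * @param array       $caches   caches keyed by spl_object_hash()
 *
 * @return FakeDiffCache|null
 */
function diffCacheFor($provider, array &$caches)
{
    // Fetch the provider once and test that same value, so the null case
    // is visibly excluded before it reaches the constructor.
    if (!is_object($provider)) {
        return null;
    }

    $hash = spl_object_hash($provider);

    if (!array_key_exists($hash, $caches)) {
        $caches[$hash] = new FakeDiffCache($provider);
    }

    return $caches[$hash];
}

$caches = array();
$provider = new \stdClass();
$first = diffCacheFor($provider, $caches);
$second = diffCacheFor($provider, $caches);
```

Both calls return the same cache object, and a null provider yields null without constructing anything.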
        }

        return $this->diffCaches[$hash];
    }

    /**
     * @return bool
     */
    protected function hasDiffCache()
    {
        return null !== $this->getConfig()->getCacheProvider();
    }

    /**
     * @return HtmlDiffConfig
     */
    public function getConfig()
    {
        return $this->config;
    }

    /**
     * @param HtmlDiffConfig $config
     *
     * @return AbstractDiff
     */
    public function setConfig(HtmlDiffConfig $config)
    {
        $this->config = $config;

        return $this;
    }

    /**
     * @return int
     *
     * @deprecated since 0.1.0
     */
    public function getMatchThreshold()
    {
        return $this->config->getMatchThreshold();
    }

    /**
     * @param int $matchThreshold
     *
     * @return AbstractDiff
     *
     * @deprecated since 0.1.0
     */
    public function setMatchThreshold($matchThreshold)
    {
        $this->config->setMatchThreshold($matchThreshold);

        return $this;
    }

    /**
     * @param array $chars
     *
     * @deprecated since 0.1.0
     */
    public function setSpecialCaseChars(array $chars)
    {
        $this->config->setSpecialCaseChars($chars);
    }

    /**
     * @return array|null
     *
     * @deprecated since 0.1.0
     */
    public function getSpecialCaseChars()
    {
        return $this->config->getSpecialCaseChars();
    }

    /**
     * @param string $char
     *
     * @deprecated since 0.1.0
     */
    public function addSpecialCaseChar($char)
    {
        $this->config->addSpecialCaseChar($char);
    }

    /**
     * @param string $char
     *
     * @deprecated since 0.1.0
     */
    public function removeSpecialCaseChar($char)
    {
        $this->config->removeSpecialCaseChar($char);
    }

    /**
     * @param array $tags
     *
     * @deprecated since 0.1.0
     */
    public function setSpecialCaseTags(array $tags = array())
    {
        $this->config->setSpecialCaseTags($tags);
    }

    /**
     * @param string $tag
     *
     * @deprecated since 0.1.0
     */
    public function addSpecialCaseTag($tag)
    {
        $this->config->addSpecialCaseTag($tag);
    }

    /**
     * @param string $tag
     *
     * @deprecated since 0.1.0
     */
    public function removeSpecialCaseTag($tag)
    {
        $this->config->removeSpecialCaseTag($tag);
    }

    /**
     * @return array|null
     *
     * @deprecated since 0.1.0
     */
    public function getSpecialCaseTags()
    {
        return $this->config->getSpecialCaseTags();
    }

    /**
     * @return string
     */
    public function getOldHtml()
    {
        return $this->oldText;
    }

    /**
     * @return string
     */
    public function getNewHtml()
    {
        return $this->newText;
    }

    /**
     * @return string
     */
    public function getDifference()
    {
        return $this->content;
    }

    /**
     * Clears the diff content.
     *
     * @return void
     */
    public function clearContent()
    {
        $this->content = null;
    }

    /**
     * @param bool $boolean
     *
     * @return $this
     *
     * @deprecated since 0.1.0
     */
    public function setGroupDiffs($boolean)
    {
        $this->config->setGroupDiffs($boolean);

        return $this;
    }

    /**
     * @return bool
     *
     * @deprecated since 0.1.0
     */
    public function isGroupDiffs()
    {
        return $this->config->isGroupDiffs();
    }

    /**
     * @param \HTMLPurifier_Config $config
     */
    public function setHTMLPurifierConfig(\HTMLPurifier_Config $config)
    {
        $this->purifierConfig = $config;
    }

    /**
     * @param string $tag
     *
     * @return string
     */
    protected function getOpeningTag($tag)
    {
        return '/<'.$tag.'[^>]*/i';
    }

    /**
     * @param string $tag
     *
     * @return string
     */
    protected function getClosingTag($tag)
    {
        return '</'.$tag.'>';
    }

    /**
     * @param string $str
     * @param string $start
     * @param string $end
     *
     * @return string
     */
    protected function getStringBetween($str, $start, $end)
    {
        $expStr = explode($start, $str, 2);
        if (count($expStr) > 1) {
            $expStr = explode($end, $expStr[1]);
            if (count($expStr) > 1) {
                array_pop($expStr);

                return implode($end, $expStr);
            }
        }

        return '';
    }

    /**
     * @param string $html
     *
     * @return string
Documentation: Should the return type not be string|false|null?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.
     */
    protected function purifyHtml($html)
    {
        if (class_exists('Tidy') && false) {
            $config = array('output-xhtml' => true, 'indent' => false);
            $tidy = new tidy();
            $tidy->parseString($html, $config, 'utf8');
            $html = (string) $tidy;

            return $this->getStringBetween($html, '<body>');
Bug: The call to getStringBetween() misses a required argument $end.

This check looks for function calls that miss required arguments.
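
To show what the call would look like with all three arguments, the sketch below copies the body of getStringBetween() into a free function and passes '</body>' as $end. That closing-tag value is an assumption about the intent of this branch, which is unreachable in the class as written because of the && false condition.

```php
<?php

/**
 * Free-function copy of AbstractDiff::getStringBetween(), for illustration.
 * Returns the text between the first $start and the last $end, or '' when
 * either marker is missing.
 */
function getStringBetween($str, $start, $end)
{
    $expStr = explode($start, $str, 2);
    if (count($expStr) > 1) {
        $expStr = explode($end, $expStr[1]);
        if (count($expStr) > 1) {
            // Drop whatever follows the last $end marker.
            array_pop($expStr);

            return implode($end, $expStr);
        }
    }

    return '';
}

// All three arguments supplied; '</body>' is the assumed end marker.
$body = getStringBetween('<html><body><p>Hi</p></body></html>', '<body>', '</body>');
```

Here $body holds the markup between the body tags, and a document without a body element yields an empty string.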
        }

        return $this->purifier->purify($html);
    }

    protected function splitInputsToWords()
    {
        $this->oldWords = $this->convertHtmlToListOfWords($this->explode($this->oldText));
        $this->newWords = $this->convertHtmlToListOfWords($this->explode($this->newText));
    }

    /**
     * @param string $text
     *
     * @return bool
     */
    protected function isPartOfWord($text)
    {
        return ctype_alnum(str_replace($this->config->getSpecialCaseChars(), '', $text));
    }

    /**
     * @param array $characterString
     *
     * @return array
     */
    protected function convertHtmlToListOfWords($characterString)
    {
        $mode = 'character';
        $current_word = '';
        $words = array();
        foreach ($characterString as $i => $character) {
            switch ($mode) {
                case 'character':
                if ($this->isStartOfTag($character)) {
                    if ($current_word != '') {
                        $words[] = $current_word;
                    }

                    $current_word = '<';
                    $mode = 'tag';
                } elseif (preg_match("/\s/", $character)) {
                    if ($current_word !== '') {
                        $words[] = $current_word;
                    }
                    $current_word = preg_replace('/\s+/S', ' ', $character);
                    $mode = 'whitespace';
                } else {
                    if (
                        (ctype_alnum($character) && (strlen($current_word) == 0 || $this->isPartOfWord($current_word))) ||
Coding Style: This line exceeds the maximum limit of 120 characters; it contains 122 characters.
                        (in_array($character, $this->config->getSpecialCaseChars()) && isset($characterString[$i + 1]) && $this->isPartOfWord($characterString[$i + 1]))
Coding Style: This line exceeds the maximum limit of 120 characters; it contains 168 characters.
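
One way to bring both halves of this condition under the line limit is to give each half a name. The sketch below is an assumption rather than the library's code: continuesWord() is a hypothetical free function, $specialCaseChars stands in for $this->config->getSpecialCaseChars(), and a null $nextCharacter stands in for the isset() check.

```php
<?php

/**
 * Hypothetical helper: decides whether $character continues the current word.
 * It mirrors the two-part condition in convertHtmlToListOfWords(), with one
 * named boolean per part.
 *
 * @param string      $character
 * @param string      $currentWord
 * @param string|null $nextCharacter    null when there is no next character
 * @param array       $specialCaseChars
 *
 * @return bool
 */
function continuesWord($character, $currentWord, $nextCharacter, array $specialCaseChars)
{
    // Same test as isPartOfWord(): alphanumeric once special chars are removed.
    $isPartOfWord = function ($text) use ($specialCaseChars) {
        return ctype_alnum(str_replace($specialCaseChars, '', $text));
    };

    // An alphanumeric character extends an empty or word-like current word.
    $extendsWord = ctype_alnum($character)
        && ($currentWord === '' || $isPartOfWord($currentWord));

    // A special-case character joins the word when a word character follows.
    $joinsNextWord = in_array($character, $specialCaseChars)
        && $nextCharacter !== null
        && $isPartOfWord($nextCharacter);

    return $extendsWord || $joinsNextWord;
}

$startsWord = continuesWord('a', '', null, array('.'));
$dotInsideWord = continuesWord('.', 'e', 'g', array('.'));
$dotBeforeSpace = continuesWord('.', 'end', ' ', array('.'));
```

Each named boolean fits on short lines, and the if statement in the loop would shrink to a single call.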
                    ) {
                        $current_word .= $character;
                    } else {
                        $words[] = $current_word;
                        $current_word = $character;
                    }
                }
                break;
                case 'tag' :
Coding Style: There must be no space before the colon in a CASE statement.

As per the PSR-2 coding standard, there must not be a space in front of the colon in case statements.

switch ($selector) {
    case "A": //right
        doSomething();
        break;
    case "B" : //wrong
        doSomethingElse();
        break;
}

To learn more about the PSR-2 coding standard, please refer to the PHP-FIG.
                if ($this->isEndOfTag($character)) {
                    $current_word .= '>';
                    $words[] = $current_word;
                    $current_word = '';

                    if (!preg_match('[^\s]', $character)) {
                        $mode = 'whitespace';
                    } else {
                        $mode = 'character';
                    }
                } else {
                    $current_word .= $character;
                }
                break;
                case 'whitespace':
                if ($this->isStartOfTag($character)) {
                    if ($current_word !== '') {
                        $words[] = $current_word;
                    }
                    $current_word = '<';
                    $mode = 'tag';
                } elseif (preg_match("/\s/", $character)) {
                    $current_word .= $character;
                    $current_word = preg_replace('/\s+/S', ' ', $current_word);
                } else {
                    if ($current_word != '') {
                        $words[] = $current_word;
                    }
                    $current_word = $character;
                    $mode = 'character';
                }
                break;
                default:
                break;
            }
        }
        if ($current_word != '') {
            $words[] = $current_word;
        }

        return $words;
    }

    /**
     * @param string $val
     *
     * @return bool
     */
    protected function isStartOfTag($val)
    {
        return $val == '<';
    }

    /**
     * @param string $val
     *
     * @return bool
     */
    protected function isEndOfTag($val)
    {
        return $val == '>';
    }

    /**
     * @param string $value
     *
     * @return bool
     */
    protected function isWhiteSpace($value)
    {
        // '[^\s]' without delimiters is parsed as the pattern ^\s; the slashes make it a character class.
        return !preg_match('/[^\s]/', $value);
    }

    /**
     * @param string $value
     *
     * @return array
     */
    protected function explode($value)
    {
        // as suggested by @onassar
        return preg_split('//u', $value);
    }
}