Diff   B

Complexity

Total Complexity 45

Size/Duplication

Total Lines 188
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 104
c 0
b 0
f 0
dl 0
loc 188
rs 8.8
wmc 45

3 Methods

Rating   Name              Duplication   Size   Complexity
A        cleanHTML()       0             23     5
F        compareHTML()     0             100    26
C        getHTMLChunks()   0             35     14

How to fix   Complexity   

Complex Class

Complex classes like Diff often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use Diff, and based on these observations, apply Extract Interface, too.
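As a rough illustration of Extract Class, the chunking logic could move into a class of its own. The sketch below is an assumption, not part of the analyzed code: `HtmlChunker` and `chunk()` are hypothetical names, and the body is adapted from `getHTMLChunks()` for string input only. Unlike the original, it also drops empty chunks.

```php
<?php

/**
 * Hypothetical class extracted from Diff; the name and method are illustrative only.
 */
final class HtmlChunker
{
    /**
     * Split HTML into whitespace-separated words, keeping each tag as one chunk.
     *
     * @return string[]
     */
    public static function chunk(string $content): array
    {
        // Pad tags with spaces so they separate from adjacent words.
        $content = str_replace(['&nbsp;', '<', '>'], [' ', ' <', '> '], $content);
        $candidates = preg_split('/[\t\r\n ]+/', trim($content)) ?: [];

        $chunks = [];
        for ($i = 0, $count = count($candidates); $i < $count; $i++) {
            $chunk = $candidates[$i];
            if ($chunk !== '' && $chunk[0] === '<') {
                // Re-join a tag whose attributes were split on whitespace.
                while (substr($chunk, -1) !== '>' && ++$i < $count) {
                    $chunk .= ' ' . $candidates[$i];
                }
            }
            if ($chunk !== '') {
                $chunks[] = $chunk;
            }
        }

        return $chunks;
    }
}
```

With such a class in place, `Diff::getHTMLChunks()` could keep its input normalization (booleans, arrays) and delegate the splitting, which would move most of its complexity out of `Diff`.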

<?php

namespace SilverStripe\View\Parsers;

use InvalidArgumentException;
use SilverStripe\Core\Convert;
use SilverStripe\Core\Injector\Injector;

require_once 'difflib/difflib.php';

/**
 * Class representing a 'diff' between two sequences of strings.
 */
class Diff extends \Diff
{
    public static $html_cleaner_class = null;

    /**
     *  Attempt to clean invalid HTML, which messes up diffs.
     *  This cleans code if possible, using an instance of HTMLCleaner
     *
     *  NB: By default, only extremely simple tidying is performed,
     *  by passing through DomDocument::loadHTML and saveXML
     *
     * @param string $content HTML content
     * @param HTMLCleaner $cleaner Optional instance of a HTMLCleaner class to
     *    use, overriding self::$html_cleaner_class
     * @return mixed|string
     */
    public static function cleanHTML($content, $cleaner = null)
    {
        if (!$cleaner) {
            if (self::$html_cleaner_class && class_exists(self::$html_cleaner_class)) {
                $cleaner = Injector::inst()->create(self::$html_cleaner_class);
            } else {
                //load cleaner if the dependent class is available
                $cleaner = HTMLCleaner::inst();
            }
        }

        if ($cleaner) {
            $content = $cleaner->cleanHTML($content);
        } else {
            // At most basic level of cleaning, use DOMDocument to save valid XML.
            $doc = HTMLValue::create($content);
            $content = $doc->getContent();
        }

        // Remove empty <ins /> and <del /> tags because browsers hate them
        $content = preg_replace('/<(ins|del)[^>]*\/>/', '', $content);

        return $content;
    }

    /**
     * @param string $from
     * @param string $to
     * @param bool $escape
     * @return string
     */
    public static function compareHTML($from, $to, $escape = false)
    {
        // First split up the content into words and tags
        $set1 = self::getHTMLChunks($from);
        $set2 = self::getHTMLChunks($to);

        // Diff that
        $diff = new Diff($set1, $set2);

        $tagStack[1] = $tagStack[2] = 0;
Comprehensibility, Best Practice:
$tagStack was never initialized. Although not strictly required by PHP, it is generally good practice to add $tagStack = array(); beforehand.
        $rechunked[1] = $rechunked[2] = array();
Comprehensibility, Best Practice:
$rechunked was never initialized. Although not strictly required by PHP, it is generally good practice to add $rechunked = array(); beforehand.
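One way to address both initialization notes is to declare each array in full before it is used. This is a minimal sketch rather than the project's actual fix; the meaning of the keys (1 for the "from" list, 2 for the "to" list) is inferred from how compareHTML() uses them.

```php
<?php

// Declare both arrays up front, keyed by list number
// (assumed here: 1 = "from" list, 2 = "to" list).
$tagStack = [1 => 0, 2 => 0];
$rechunked = [1 => [], 2 => []];

// Both lists can now be read or appended to without relying on implicit array creation.
$rechunked[1][] = 'word';
```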

        // Go through everything, converting edited tags (and their content) into single chunks.  Otherwise
        // the generated HTML gets crusty
        foreach ($diff->edits as $edit) {
            $lookForTag = false;
            $stuffFor = [];
            switch ($edit->type) {
                case 'copy':
                    $lookForTag = false;
                    $stuffFor[1] = $edit->orig;
                    $stuffFor[2] = $edit->orig;
                    break;

                case 'change':
                    $lookForTag = true;
                    $stuffFor[1] = $edit->orig;
                    $stuffFor[2] = $edit->final;
                    break;

                case 'add':
                    $lookForTag = true;
                    $stuffFor[1] = null;
                    $stuffFor[2] = $edit->final;
                    break;

                case 'delete':
                    $lookForTag = true;
                    $stuffFor[1] = $edit->orig;
                    $stuffFor[2] = null;
                    break;
            }

            foreach ($stuffFor as $listName => $chunks) {
                if ($chunks) {
                    foreach ($chunks as $item) {
                        // $tagStack > 0 indicates that we should be tag-building
                        if ($tagStack[$listName]) {
                            $rechunked[$listName][sizeof($rechunked[$listName])-1] .= ' ' . $item;
                        } else {
                            $rechunked[$listName][] = $item;
                        }

                        if ($lookForTag
                            && !$tagStack[$listName]
                            && isset($item[0])
                            && $item[0] == "<"
                            && substr($item, 0, 2) != "</"
                        ) {
                            $tagStack[$listName] = 1;
                        } elseif ($tagStack[$listName]) {
                            if (substr($item, 0, 2) == "</") {
                                $tagStack[$listName]--;
                            } elseif (isset($item[0]) && $item[0] == "<") {
                                $tagStack[$listName]++;
                            }
                        }
                    }
                }
            }
        }

        // Diff the re-chunked data, turning it into marked up HTML
        $diff = new Diff($rechunked[1], $rechunked[2]);
        $content = '';
        foreach ($diff->edits as $edit) {
            $orig = ($escape) ? Convert::raw2xml($edit->orig) : $edit->orig;
            $final = ($escape) ? Convert::raw2xml($edit->final) : $edit->final;

            switch ($edit->type) {
                case 'copy':
                    $content .= " " . implode(" ", $orig) . " ";
Bug:
It seems that $orig can also be of type string; however, parameter $pieces of implode() only seems to accept array. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                    $content .= " " . implode(" ", /** @scrutinizer ignore-type */ $orig) . " ";
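One way to add the type check suggested above is to normalize the value before joining it. This is a sketch only: `joinChunks()` is a hypothetical helper, not part of `Diff`, and it assumes that a bare string should count as a single chunk and that `false` or `null` should count as no chunks.

```php
<?php

/**
 * Hypothetical helper: join chunks with spaces, tolerating non-array input.
 *
 * @param array|string|false|null $chunks
 */
function joinChunks($chunks): string
{
    if ($chunks === false || $chunks === null) {
        return '';
    }

    // A bare string is treated as a one-element list.
    return implode(' ', (array) $chunks);
}
```

Each `implode(" ", ...)` call in `compareHTML()` could then go through the helper, so the guard lives in one place.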
                    break;

                case 'change':
                    $content .= " <ins>" . implode(" ", $final) . "</ins> ";
                    $content .= " <del>" . implode(" ", $orig) . "</del> ";
                    break;

                case 'add':
                    $content .= " <ins>" . implode(" ", $final) . "</ins> ";
                    break;

                case 'delete':
                    $content .= " <del>" . implode(" ", $orig) . "</del> ";
                    break;
            }
        }

        return self::cleanHTML($content);
    }

    /**
     * @param string|bool|array $content If passed as an array, values will be concatenated with a comma.
     * @return array
     */
    public static function getHTMLChunks($content)
    {
        if ($content && !is_string($content) && !is_array($content) && !is_numeric($content) && !is_bool($content)) {
The condition is_bool($content) is always true.
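The analyzer presumably derives this from the documented parameter type `string|bool|array`: once strings and arrays are ruled out, only a boolean remains. If callers may still pass other values at run time, the guard can be written as an allow-list instead. The sketch below uses a hypothetical `assertChunkable()` function and is intended to be roughly equivalent to the original condition, not a drop-in replacement.

```php
<?php

/**
 * Hypothetical guard: accept null, scalars and arrays; reject everything else.
 *
 * @param mixed $content
 */
function assertChunkable($content): void
{
    if ($content !== null && !is_scalar($content) && !is_array($content)) {
        throw new InvalidArgumentException('$content parameter needs to be a string or array');
    }
}
```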
            throw new InvalidArgumentException('$content parameter needs to be a string or array');
        }
        if (is_bool($content)) {
            // Convert boolean to strings
            $content = $content ? "true" : "false";
        }
        if (is_array($content)) {
            $content = array_filter($content, 'is_scalar');
            // Convert array to CSV
            $content = implode(',', $content);
        }

        $content = str_replace(array("&nbsp;", "<", ">"), array(" "," <", "> "), $content);
        $candidateChunks = preg_split("/[\t\r\n ]+/", $content);
        $chunks = [];
        for ($i = 0; $i < count($candidateChunks); $i++) {
Bug:
It seems that $candidateChunks can also be of type false; however, parameter $var of count() only seems to accept Countable|array. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        for ($i = 0; $i < count(/** @scrutinizer ignore-type */ $candidateChunks); $i++) {

Performance, Best Practice:
It seems that you are calling the size function count() as part of the test condition. You might want to compute the size beforehand, and not on each iteration.

If the size of the collection does not change during the iteration, it is generally good practice to compute it beforehand, and not on each iteration:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}
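Both notes on this line can be handled together. The following is a minimal sketch under the assumption that a failed split should simply yield no chunks; the sample input and the `$total` variable are illustrative and not taken from the analyzed code.

```php
<?php

$content = "one two\tthree";

// preg_split() returns false on failure; fall back to an empty list.
$candidateChunks = preg_split('/[\t\r\n ]+/', $content);
if ($candidateChunks === false) {
    $candidateChunks = [];
}

$chunks = [];
// count() is evaluated once, in the loop initializer.
for ($i = 0, $total = count($candidateChunks); $i < $total; $i++) {
    $chunks[] = $candidateChunks[$i];
}
```

Caching the count is safe here because the loop never adds to or removes from `$candidateChunks`; the inner `while` loop in `getHTMLChunks()` only advances `$i`.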
            $item = $candidateChunks[$i];
            if (isset($item[0]) && $item[0] == "<") {
                $newChunk = $item;
                while ($item[strlen($item)-1] != ">") {
                    if (++$i >= count($candidateChunks)) {
                        break;
                    }
                    $item = $candidateChunks[$i];
                    $newChunk .= ' ' . $item;
                }
                $chunks[] = $newChunk;
            } else {
                $chunks[] = $item;
            }
        }
        return $chunks;
    }
}