Diff::compareHTML() (rating: F)
last analyzed

Complexity

Conditions 26
Paths 1386

Size

Total Lines 100
Code Lines 67

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric  Value
cc      26    (Conditions)
eloc    67    (Code Lines)
nc      1386  (Paths)
nop     3
dl      0
loc     100   (Total Lines)
rs      0
c       0
b       0
f       0

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
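To make that concrete: the tag-regrouping loop inside compareHTML() (the nested foreach that merges an edited tag and its content into one chunk) could become a small function whose name says what it does. The sketch below is a hypothetical extraction written as a free function so it runs without SilverStripe; the name regroupChunks() and its by-reference parameters are assumptions, not part of the framework. It also initializes $tagStack and $rechunked explicitly, as the inspection notes on those two variables recommend.

```php
<?php

// Hypothetical extraction of the tag-regrouping loop in compareHTML(); not
// SilverStripe code. Appends $chunks to $out, merging an opening tag and
// everything up to its matching closing tag into one chunk. $depth carries
// the nesting count between calls, like $tagStack[$listName] in the original.
function regroupChunks(array $chunks, bool $lookForTag, int &$depth, array &$out): void
{
    foreach ($chunks as $item) {
        if ($depth > 0) {
            // Inside an edited tag: glue this chunk onto the previous one
            $out[count($out) - 1] .= ' ' . $item;
        } else {
            $out[] = $item;
        }

        $isTag = isset($item[0]) && $item[0] === '<';
        $isClosing = substr($item, 0, 2) === '</';

        if ($lookForTag && $depth === 0 && $isTag && !$isClosing) {
            $depth = 1;            // an opening tag starts a new group
        } elseif ($depth > 0) {
            if ($isClosing) {
                $depth--;          // a closing tag ends one nesting level
            } elseif ($isTag) {
                $depth++;          // a nested opening tag adds one level
            }
        }
    }
}

// Explicit initialization, as recommended for $tagStack and $rechunked
$tagStack = [1 => 0, 2 => 0];
$rechunked = [1 => [], 2 => []];

// Side 2 (the new text) of an edit that added bold markup
regroupChunks(['Hello', '<b>', 'big', '</b>', 'world'], true, $tagStack[2], $rechunked[2]);
print_r($rechunked[2]);
```

Passing the depth counter and the output list by reference keeps the per-side state (index 1 for the old text, 2 for the new) in the caller, mirroring how the original indexes $tagStack and $rechunked by $listName.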

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
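compareHTML() already has a comment announcing that the re-chunked data is diffed and turned into marked-up HTML, and the switch that follows it is a natural Extract Method candidate. A minimal sketch, assuming the edit's type, original chunks and final chunks are passed in directly: renderEdit() and joinChunks() are hypothetical names, not SilverStripe API. joinChunks() also adds the type check suggested by the implode() inspection note, falling back to a string cast when it receives a non-array; whether $orig can actually be a string at runtime has not been verified here.

```php
<?php

// Hypothetical helper (not part of SilverStripe): join diff chunks with
// spaces, accepting a string or null where an array is expected, which is
// the case the implode() inspection note warns about.
function joinChunks($chunks): string
{
    if (is_array($chunks)) {
        return implode(' ', $chunks);
    }
    return (string) $chunks;
}

// Hypothetical extraction of the markup-building switch in compareHTML():
// turns one edit into HTML, wrapping insertions in <ins> and deletions in <del>.
function renderEdit(string $type, $orig, $final): string
{
    switch ($type) {
        case 'copy':
            return ' ' . joinChunks($orig) . ' ';
        case 'change':
            // Same order as the original: new text first, then the old text
            return ' <ins>' . joinChunks($final) . '</ins> '
                . ' <del>' . joinChunks($orig) . '</del> ';
        case 'add':
            return ' <ins>' . joinChunks($final) . '</ins> ';
        case 'delete':
            return ' <del>' . joinChunks($orig) . '</del> ';
        default:
            // Unknown edit types add nothing, as with the original switch
            return '';
    }
}

// Usage: the loop body in compareHTML() shrinks to one call per edit
$edits = [
    ['copy', ['Hello'], ['Hello']],
    ['change', ['world'], ['there']],
];
$content = '';
foreach ($edits as [$type, $orig, $final]) {
    $content .= renderEdit($type, $orig, $final);
}
echo $content, PHP_EOL;
```

The caller's loop then reads as a single call per edit, and the <ins>/<del> markup rules live in one place that can be tested on its own.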

Commonly applied refactorings include:

<?php

namespace SilverStripe\View\Parsers;

use InvalidArgumentException;
use SilverStripe\Core\Convert;
use SilverStripe\Core\Injector\Injector;

require_once 'difflib/difflib.php';

/**
 * Class representing a 'diff' between two sequences of strings.
 */
class Diff extends \Diff
{
    public static $html_cleaner_class = null;

    /**
     *  Attempt to clean invalid HTML, which messes up diffs.
     *  This cleans code if possible, using an instance of HTMLCleaner
     *
     *  NB: By default, only extremely simple tidying is performed,
     *  by passing through DomDocument::loadHTML and saveXML
     *
     * @param string $content HTML content
     * @param HTMLCleaner $cleaner Optional instance of a HTMLCleaner class to
     *    use, overriding self::$html_cleaner_class
     * @return mixed|string
     */
    public static function cleanHTML($content, $cleaner = null)
    {
        if (!$cleaner) {
            if (self::$html_cleaner_class && class_exists(self::$html_cleaner_class)) {
                $cleaner = Injector::inst()->create(self::$html_cleaner_class);
            } else {
                // Load cleaner if the dependent class is available
                $cleaner = HTMLCleaner::inst();
            }
        }

        if ($cleaner) {
            $content = $cleaner->cleanHTML($content);
        } else {
            // At the most basic level of cleaning, use DOMDocument to save valid XML.
            $doc = HTMLValue::create($content);
            $content = $doc->getContent();
        }

        // Remove empty <ins /> and <del /> tags because browsers hate them
        $content = preg_replace('/<(ins|del)[^>]*\/>/', '', $content);

        return $content;
    }

    /**
     * @param string $from
     * @param string $to
     * @param bool $escape
     * @return string
     */
    public static function compareHTML($from, $to, $escape = false)
    {
        // First split up the content into words and tags
        $set1 = self::getHTMLChunks($from);
        $set2 = self::getHTMLChunks($to);

        // Diff that
        $diff = new Diff($set1, $set2);

        $tagStack[1] = $tagStack[2] = 0;
Comprehensibility Best Practice: $tagStack was never initialized. Although not strictly required by PHP, it is generally good practice to add $tagStack = array(); before it regardless.
        $rechunked[1] = $rechunked[2] = array();
Comprehensibility Best Practice: $rechunked was never initialized. Although not strictly required by PHP, it is generally good practice to add $rechunked = array(); before it regardless.

        // Go through everything, converting edited tags (and their content) into single chunks. Otherwise
        // the generated HTML gets crusty
        foreach ($diff->edits as $edit) {
            $lookForTag = false;
            $stuffFor = [];
            switch ($edit->type) {
                case 'copy':
                    $lookForTag = false;
                    $stuffFor[1] = $edit->orig;
                    $stuffFor[2] = $edit->orig;
                    break;

                case 'change':
                    $lookForTag = true;
                    $stuffFor[1] = $edit->orig;
                    $stuffFor[2] = $edit->final;
                    break;

                case 'add':
                    $lookForTag = true;
                    $stuffFor[1] = null;
                    $stuffFor[2] = $edit->final;
                    break;

                case 'delete':
                    $lookForTag = true;
                    $stuffFor[1] = $edit->orig;
                    $stuffFor[2] = null;
                    break;
            }

            foreach ($stuffFor as $listName => $chunks) {
                if ($chunks) {
                    foreach ($chunks as $item) {
                        // $tagStack > 0 indicates that we should be tag-building
                        if ($tagStack[$listName]) {
                            $rechunked[$listName][sizeof($rechunked[$listName])-1] .= ' ' . $item;
                        } else {
                            $rechunked[$listName][] = $item;
                        }

                        if ($lookForTag
                            && !$tagStack[$listName]
                            && isset($item[0])
                            && $item[0] == "<"
                            && substr($item, 0, 2) != "</"
                        ) {
                            $tagStack[$listName] = 1;
                        } elseif ($tagStack[$listName]) {
                            if (substr($item, 0, 2) == "</") {
                                $tagStack[$listName]--;
                            } elseif (isset($item[0]) && $item[0] == "<") {
                                $tagStack[$listName]++;
                            }
                        }
                    }
                }
            }
        }

        // Diff the re-chunked data, turning it into marked-up HTML
        $diff = new Diff($rechunked[1], $rechunked[2]);
        $content = '';
        foreach ($diff->edits as $edit) {
            $orig = ($escape) ? Convert::raw2xml($edit->orig) : $edit->orig;
            $final = ($escape) ? Convert::raw2xml($edit->final) : $edit->final;

            switch ($edit->type) {
                case 'copy':
                    $content .= " " . implode(" ", $orig) . " ";
Bug: It seems like $orig can also be of type string; however, parameter $pieces of implode() only seems to accept an array. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                    $content .= " " . implode(" ", /** @scrutinizer ignore-type */ $orig) . " ";
                    break;

                case 'change':
                    $content .= " <ins>" . implode(" ", $final) . "</ins> ";
                    $content .= " <del>" . implode(" ", $orig) . "</del> ";
                    break;

                case 'add':
                    $content .= " <ins>" . implode(" ", $final) . "</ins> ";
                    break;

                case 'delete':
                    $content .= " <del>" . implode(" ", $orig) . "</del> ";
                    break;
            }
        }

        return self::cleanHTML($content);
    }

    /**
     * @param string|bool|array $content If passed as an array, values will be concatenated with a comma.
     * @return array
     */
    public static function getHTMLChunks($content)
    {
        if ($content && !is_string($content) && !is_array($content) && !is_numeric($content) && !is_bool($content)) {
The condition is_bool($content) is always true.
            throw new InvalidArgumentException('$content parameter needs to be a string or array');
        }
        if (is_bool($content)) {
            // Convert boolean to string
            $content = $content ? "true" : "false";
        }
        if (is_array($content)) {
            $content = array_filter($content, 'is_scalar');
            // Convert array to CSV
            $content = implode(',', $content);
        }

        $content = str_replace(array("&nbsp;", "<", ">"), array(" "," <", "> "), $content);
        $candidateChunks = preg_split("/[\t\r\n ]+/", $content);
        $chunks = [];
        for ($i = 0; $i < count($candidateChunks); $i++) {
Bug: It seems like $candidateChunks can also be of type false; however, parameter $var of count() only seems to accept Countable|array. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        for ($i = 0; $i < count(/** @scrutinizer ignore-type */ $candidateChunks); $i++) {

Performance Best Practice: It seems like you are calling the size function count() as part of the test condition. You might want to compute the size beforehand instead of on each iteration.

If the size of the collection does not change during the iteration, it is generally good practice to compute it once beforehand:

for ($i=0; $i<count($array); $i++) { // calls count() on each iteration
}

// Better
for ($i=0, $c=count($array); $i<$c; $i++) { // calls count() just once
}
            $item = $candidateChunks[$i];
            if (isset($item[0]) && $item[0] == "<") {
                $newChunk = $item;
                while ($item[strlen($item)-1] != ">") {
                    if (++$i >= count($candidateChunks)) {
                        break;
                    }
                    $item = $candidateChunks[$i];
                    $newChunk .= ' ' . $item;
                }
                $chunks[] = $newChunk;
            } else {
                $chunks[] = $item;
            }
        }
        return $chunks;
    }
}
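The inspection reports the is_bool() check in the guard at the top of getHTMLChunks() as always true, because the analysis trusts the documented string|bool|array type. A minimal sketch of the same normalization that branches on runtime types instead follows; normalizeContent() is a hypothetical helper, not SilverStripe API. It aims to reproduce the original's results for the inputs the original accepts: booleans become "true" or "false", arrays keep only their scalar values and are joined with commas, null ends up as an empty string, and objects or other non-scalar values raise InvalidArgumentException.

```php
<?php

// Hypothetical helper (not SilverStripe API): normalize getHTMLChunks()
// input to a string by checking runtime types instead of the docblock.
function normalizeContent($content): string
{
    if ($content === null) {
        return '';
    }
    if (is_bool($content)) {
        // Booleans become the words "true" / "false", as in the original
        return $content ? 'true' : 'false';
    }
    if (is_array($content)) {
        // Keep scalar values only, then join them as CSV
        return implode(',', array_filter($content, 'is_scalar'));
    }
    if (is_scalar($content)) {
        // Remaining scalars are strings, ints and floats
        return (string) $content;
    }
    // Objects, resources and other values are rejected
    throw new InvalidArgumentException('$content parameter needs to be a string or array');
}

var_dump(normalizeContent(['a', 1, ['nested'], 'b'])); // string(5) "a,1,b"
```

Checking null, bool, array and scalar in that order means each accepted type is handled by exactly one branch, so no condition depends on what the docblock promises.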
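The two inspection notes on the chunking loop in getHTMLChunks() (preg_split() possibly returning false, and count() being evaluated on every iteration) can be addressed together. The sketch below is a hypothetical standalone version of that loop; the name splitHtmlChunks() is an assumption, and it expects content already normalized to a string. It returns an empty list if preg_split() fails, computes count() once, and re-joins tags that whitespace split apart. It also reads the last character with substr($item, -1), which avoids the "Uninitialized string offset" diagnostic that $item[strlen($item)-1] raises when an unterminated tag runs into an empty trailing chunk.

```php
<?php

// Hypothetical standalone version of the chunking loop in getHTMLChunks();
// $content is assumed to be normalized to a string already.
function splitHtmlChunks(string $content): array
{
    // Turn &nbsp; into a space and pad tag boundaries so tags split from text
    $content = str_replace(['&nbsp;', '<', '>'], [' ', ' <', '> '], $content);

    $candidates = preg_split('/[\t\r\n ]+/', $content);
    if ($candidates === false) {
        // preg_split() returns false on failure, e.g. a PCRE error
        return [];
    }

    $chunks = [];
    for ($i = 0, $c = count($candidates); $i < $c; $i++) {   // count() once
        $item = $candidates[$i];
        if (isset($item[0]) && $item[0] === '<') {
            // Re-join a tag that whitespace split apart, e.g. '<p class="x">'
            $newChunk = $item;
            while (substr($item, -1) !== '>' && $i + 1 < $c) {
                $item = $candidates[++$i];
                $newChunk .= ' ' . $item;
            }
            $chunks[] = $newChunk;
        } else {
            $chunks[] = $item;
        }
    }
    return $chunks;
}

print_r(splitHtmlChunks('<p class="intro">Hello&nbsp;world</p>'));
```

Empty strings produced at the ends of the split are kept as chunks, as in the original; passing PREG_SPLIT_NO_EMPTY to preg_split() would drop them, but that would change the input handed to the diff.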