Test Failed
Pull Request — master (#96)
by Sven
03:16
created

ListDiffLines::getChildNodeByIndex()   A

Complexity

Conditions 4
Paths 4

Size

Total Lines 18

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 20

Importance

Changes 0
Metric Value
cc 4
nc 4
nop 2
dl 0
loc 18
ccs 0
cts 0
cp 0
crap 20
rs 9.6666
c 0
b 0
f 0
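
The CRAP score of 20 reported above is consistent with the commonly used CRAP formula, complexity² × (1 − coverage)³ + complexity, applied to a cyclomatic complexity of 4 and 0 % coverage. The sketch below reproduces that arithmetic; the function name `crapScore` is hypothetical, and it is an assumption that the analysis tool uses exactly this formula.

```php
<?php

/**
 * CRAP (Change Risk Anti-Patterns) score for a single method.
 * Hypothetical helper for illustration; not part of the reviewed library.
 *
 * @param int   $complexity cyclomatic complexity (the "cc" metric above)
 * @param float $coverage   line coverage as a percentage, 0 to 100
 */
function crapScore(int $complexity, float $coverage): float
{
    // Fraction of the method that is NOT exercised by tests.
    $uncovered = 1 - $coverage / 100;

    // complexity^2 * uncovered^3 + complexity
    return $complexity ** 2 * $uncovered ** 3 + $complexity;
}

echo crapScore(4, 0.0), PHP_EOL;   // 4^2 * 1^3 + 4 = 20, matching the report
echo crapScore(4, 100.0), PHP_EOL; // full coverage leaves only the complexity term: 4
```

Under this formula, covering the method with tests is the quickest way to lower the score, since the complexity term is cubed away as coverage approaches 100 %.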
<?php

namespace Caxy\HtmlDiff;

use Caxy\HtmlDiff\Strategy\ListItemMatchStrategy;
use DOMDocument;
use DOMElement;
use DOMNode;
use DOMText;
use DOMXPath;
use LogicException;

class ListDiffLines extends AbstractDiff
{
    private const CLASS_LIST_ITEM_ADDED   = 'normal new';
    private const CLASS_LIST_ITEM_DELETED = 'removed';
    private const CLASS_LIST_ITEM_CHANGED = 'replacement';
    private const CLASS_LIST_ITEM_NONE    = 'normal';

    protected const LIST_TAG_NAMES = ['ul', 'ol', 'dl'];

    /**
     * List of tags that should be included when retrieving
     * text from a single list item that will be used in
     * matching logic (and only in matching logic).
     *
     * @see getRelevantNodeText()
     *
     * @var array
     */
    protected static $listContentTags = [
        'h1', 'h2', 'h3', 'h4', 'h5', 'pre', 'div', 'br', 'hr', 'code',
        'input', 'form', 'img', 'span', 'a', 'i', 'b', 'strong', 'em',
        'font', 'big', 'del', 'tt', 'sub', 'sup', 'strike',
    ];

    /**
     * @var LcsService
     */
    protected $lcsService;

    /**
     * @var array<string, DOMElement>
     */
    private $nodeCache;

    /**
     * @param string              $oldText
     * @param string              $newText
     * @param HtmlDiffConfig|null $config
     *
     * @return ListDiffLines
     */
    public static function create($oldText, $newText, HtmlDiffConfig $config = null)
    {
        $diff = new self($oldText, $newText);

        if (null !== $config) {
            $diff->setConfig($config);
        }

        return $diff;
    }

    /**
     * {@inheritDoc}
     */
    public function build()
    {
        $this->prepare();

        if ($this->hasDiffCache() && $this->getDiffCache()->contains($this->oldText, $this->newText)) {
            $this->content = $this->getDiffCache()->fetch($this->oldText, $this->newText);

            return $this->content;
        }

        $this->lcsService = new LcsService(
            new ListItemMatchStrategy($this->stringUtil, $this->config->getMatchThreshold())
        );

        return $this->listByLines($this->oldText, $this->newText);
    }

    protected function listByLines(string $old, string $new) : string
    {
        $new = mb_convert_encoding($new, 'HTML-ENTITIES', "UTF-8");
        $old = mb_convert_encoding($old, 'HTML-ENTITIES', "UTF-8");

        $newDom = new DOMDocument();
        $newDom->loadHTML($new);

        $oldDom = new DOMDocument();
        $oldDom->loadHTML($old);

        $newListNode = $this->findListNode($newDom);
        $oldListNode = $this->findListNode($oldDom);

        $operations = $this->getListItemOperations($oldListNode, $newListNode);
Compatibility: $oldListNode of type object<DOMNode> is not a sub-type of object<DOMElement>. It seems like you assume a child class of the class DOMNode to be always present.

Compatibility: $newListNode of type object<DOMNode> is not a sub-type of object<DOMElement>. It seems like you assume a child class of the class DOMNode to be always present.

This check looks for parameters that are defined as one type in their type hint or doc comment but seem to be used as a narrower type, i.e., an implementation of an interface or a subclass.

Consider changing the type of the parameter or doing an instanceof check before assuming your parameter is of the expected type.

        return $this->processOperations($operations, $oldListNode, $newListNode);
Compatibility: $oldListNode of type object<DOMNode> is not a sub-type of object<DOMElement>. It seems like you assume a child class of the class DOMNode to be always present.

Compatibility: $newListNode of type object<DOMNode> is not a sub-type of object<DOMElement>. It seems like you assume a child class of the class DOMNode to be always present.
    }

    protected function findListNode(DOMDocument $dom) : DOMNode
    {
        $xPathQuery = '//' . implode('|//', self::LIST_TAG_NAMES);
        $xPath      = new DOMXPath($dom);
        $listNodes  = $xPath->query($xPathQuery);

        if ($listNodes->length > 0) {
            return $listNodes->item(0);
        }

        throw new LogicException('Unable to diff list; missing list node');
    }

    /**
     * @return Operation[]
     */
    protected function getListItemOperations(DOMElement $oldListNode, DOMElement $newListNode) : array
    {
        // Prepare arrays of list item content to use in LCS algorithm
        $oldListText = $this->getListTextArray($oldListNode);
        $newListText = $this->getListTextArray($newListNode);

        $lcsMatches = $this->lcsService->longestCommonSubsequence($oldListText, $newListText);

        $oldLength = count($oldListText);
        $newLength = count($newListText);

        $operations = array();
        $currentLineInOld = 0;
        $currentLineInNew = 0;
        $lcsMatches[$oldLength + 1] = $newLength + 1;
        foreach ($lcsMatches as $matchInOld => $matchInNew) {
            // No matching line in new list
            if ($matchInNew === 0) {
                continue;
            }

            $nextLineInOld = $currentLineInOld + 1;
            $nextLineInNew = $currentLineInNew + 1;

            if ($matchInNew > $nextLineInNew && $matchInOld > $nextLineInOld) {
                // Change
                $operations[] = new Operation(
                    Operation::CHANGED,
                    $nextLineInOld,
                    $matchInOld - 1,
                    $nextLineInNew,
                    $matchInNew - 1
                );
            } elseif ($matchInNew > $nextLineInNew && $matchInOld === $nextLineInOld) {
                // Add items before this
                $operations[] = new Operation(
                    Operation::ADDED,
                    $currentLineInOld,
                    $currentLineInOld,
                    $nextLineInNew,
                    $matchInNew - 1
                );
            } elseif ($matchInNew === $nextLineInNew && $matchInOld > $nextLineInOld) {
                // Delete items before this
                $operations[] = new Operation(
                    Operation::DELETED,
                    $nextLineInOld,
                    $matchInOld - 1,
                    $currentLineInNew,
                    $currentLineInNew
                );
            }

            $currentLineInNew = $matchInNew;
            $currentLineInOld = $matchInOld;
        }

        return $operations;
    }

    /**
     * @return string[]
     */
    protected function getListTextArray(DOMElement $listNode) : array
    {
        $output = [];

        foreach ($listNode->childNodes as $listItem) {
            if ($listItem instanceof DOMText) {
                continue;
            }

            $output[] = $this->getRelevantNodeText($listItem);
        }

        return $output;
    }

    protected function getRelevantNodeText(DOMNode $node) : string
    {
        if ($node->hasChildNodes() === false) {
            return $node->textContent;
        }

        $output = '';

        /** @var DOMElement $child */
        foreach ($node->childNodes as $child) {
            if ($child->hasChildNodes() === false) {
                $output .= $this->getOuterText($child);

                continue;
            }

            if (in_array($child->tagName, static::$listContentTags, true) === true) {
                $output .= sprintf(
                    '<%1$s>%2$s</%1$s>',
                    $child->tagName,
                    $this->getRelevantNodeText($child)
                );
            }
        }

        return $output;
    }

    protected function deleteListItem(DOMElement $li) : string
    {
        $this->wrapNodeContent($li, 'del');

        $this->appendClassToNode($li, self::CLASS_LIST_ITEM_DELETED);

        return $this->getOuterText($li);
    }

    protected function addListItem(DOMElement $li, bool $replacement = false) : string
    {
        $this->wrapNodeContent($li, 'ins');

        $this->appendClassToNode(
            $li,
            $replacement === true ? self::CLASS_LIST_ITEM_CHANGED : self::CLASS_LIST_ITEM_ADDED
        );

        return $this->getOuterText($li);
    }

    /**
     * @param Operation[] $operations
     */
    protected function processOperations(array $operations, DOMElement $oldListNode, DOMElement $newListNode) : string
    {
        $output = '';

        $indexInOld = 0;
        $indexInNew = 0;
        $lastOperation = null;

        foreach ($operations as $operation) {
            $replaced = false;
            while ($operation->startInOld > ($operation->action === Operation::ADDED ? $indexInOld : $indexInOld + 1)) {
                $li = $this->getChildNodeByIndex($oldListNode, $indexInOld);
                $matchingLi = null;
                if ($operation->startInNew > ($operation->action === Operation::DELETED ? $indexInNew
                        : $indexInNew + 1)
                ) {
                    $matchingLi = $this->getChildNodeByIndex($newListNode, $indexInNew);
                }

                if (null !== $matchingLi) {
                    $htmlDiff = HtmlDiff::create(
                        $this->getInnerHtml($li),
                        $this->getInnerHtml($matchingLi),
                        $this->config
                    );

                    $this->setInnerHtml($li, $htmlDiff->build());

                    $indexInNew++;
                }

                $class = self::CLASS_LIST_ITEM_NONE;

                if ($lastOperation === Operation::DELETED && !$replaced) {
                    $class = self::CLASS_LIST_ITEM_CHANGED;
                    $replaced = true;
                }

                $this->appendClassToNode($li, $class);

                $output .= $this->getOuterText($li);
                $indexInOld++;
            }

            switch ($operation->action) {
                case Operation::ADDED:
                    for ($i = $operation->startInNew; $i <= $operation->endInNew; $i++) {
                        $output .= $this->addListItem(
                            $this->getChildNodeByIndex($newListNode, $i - 1)
                        );
                    }
                    $indexInNew = $operation->endInNew;
                    break;

                case Operation::DELETED:
                    for ($i = $operation->startInOld; $i <= $operation->endInOld; $i++) {
                        $output .= $this->deleteListItem(
                            $this->getChildNodeByIndex($oldListNode, $i - 1)
                        );
                    }
                    $indexInOld = $operation->endInOld;
                    break;

                case Operation::CHANGED:
                    $changeDelta = 0;
                    for ($i = $operation->startInOld; $i <= $operation->endInOld; $i++) {
                        $output .= $this->deleteListItem(
                            $this->getChildNodeByIndex($oldListNode, $i - 1)
                        );
                        $changeDelta--;
                    }
                    for ($i = $operation->startInNew; $i <= $operation->endInNew; $i++) {
                        $output .= $this->addListItem(
                            $this->getChildNodeByIndex($newListNode, $i - 1),
                            ($changeDelta < 0)
                        );
                        $changeDelta++;
                    }
                    $indexInOld = $operation->endInOld;
                    $indexInNew = $operation->endInNew;
                    break;
            }

            $lastOperation = $operation->action;
        }

        $oldCount = $this->childCountWithoutTextNode($oldListNode);
        $newCount = $this->childCountWithoutTextNode($newListNode);

        while ($indexInOld < $oldCount) {
            $li = $this->getChildNodeByIndex($oldListNode, $indexInOld);
            $matchingLi = null;
            if ($indexInNew < $newCount) {
                $matchingLi = $this->getChildNodeByIndex($newListNode, $indexInNew);
            }

            if (null !== $matchingLi) {
                $htmlDiff = HtmlDiff::create(
                    $this->getInnerHtml($li),
                    $this->getInnerHtml($matchingLi),
                    $this->config
                );

                $this->setInnerHtml($li, $htmlDiff->build());

                $indexInNew++;
            }

            $class = self::CLASS_LIST_ITEM_NONE;

            if ($lastOperation === Operation::DELETED) {
                $class = self::CLASS_LIST_ITEM_CHANGED;
            }

            $this->appendClassToNode($li, $class);

            $output .= $this->getOuterText($li);
            $indexInOld++;
        }

        $this->setInnerHtml($newListNode, $output);
        $this->appendClassToNode($newListNode, 'diff-list');

        return $newListNode->ownerDocument->saveHTML($newListNode);
    }

    protected function appendClassToNode(DOMElement $node, string $class)
    {
        $node->setAttribute(
            'class',
            trim(sprintf('%s %s', $node->getAttribute('class'), $class))
        );
    }

    private function getOuterText(DOMNode $node) : string
    {
        return $node->ownerDocument->saveHTML($node);
    }

    private function getInnerHtml(DOMNode $node) : string
    {
        $bufferDom = new DOMDocument('1.0', 'UTF-8');

        foreach($node->childNodes as $childNode)
Coding Style: Expected 1 space(s) after FOREACH keyword; 0 found
        {
            $bufferDom->appendChild($bufferDom->importNode($childNode, true));
        }

        return trim($bufferDom->saveHTML());
    }

    private function setInnerHtml(DOMNode $node, string $html) : void
    {
        $html = sprintf('<%s>%s</%s>', 'body', $html, 'body');
        $html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');

        $node->nodeValue = '';

        $bufferDom = new DOMDocument('1.0', 'UTF-8');
        $bufferDom->loadHTML($html);

        $bodyNode = $bufferDom->getElementsByTagName('body')->item(0);

        foreach ($bodyNode->childNodes as $childNode) {
            $node->appendChild($node->ownerDocument->importNode($childNode, true));
        }

        $this->nodeCache = [];
    }

    private function wrapNodeContent(DOMElement $node, string $tagName) : void
    {
        $childNodes = [];

        foreach ($node->childNodes as $childNode) {
            $childNodes[] = $childNode;
        }

        $wrapNode = $node->ownerDocument->createElement($tagName);

        $node->appendChild($wrapNode);

        foreach ($childNodes as $childNode) {
            $wrapNode->appendChild($childNode);
        }
    }

    private function childCountWithoutTextNode(DOMNode $node) : int
    {
        $counter = 0;

        foreach ($node->childNodes as $childNode) {
            if ($childNode instanceof DOMText) {
                continue;
            }

            $counter++;
        }

        return $counter;
    }

    private function getChildNodeByIndex(DOMNode $node, int $index) : DOMElement
    {
        $nodeHash = spl_object_hash($node);

        if (isset($this->nodeCache[$nodeHash]) === true) {
            return $this->nodeCache[$nodeHash][$index];
        }

        $listCache[$nodeHash] = [];
Coding Style, Comprehensibility: $listCache was never initialized. Although not strictly required by PHP, it is generally a good practice to add $listCache = array(); before regardless.

Adding an explicit array definition is generally preferable to implicit array definition as it guarantees a stable state of the code.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is initialized the first time the foreach loop is entered. You can also see that the value of the bar key is only written conditionally; thus, its value might result from a previous iteration.

This might or might not be intended. To make your intention clear and your code more readable, and to avoid accidental bugs, we recommend adding an explicit initialization $myArray = array() either outside or inside the foreach loop.

        foreach ($node->childNodes as $childNode) {
            if ($childNode instanceof DOMText === false) {
                $this->nodeCache[$nodeHash][] = $childNode;
            }
        }

        return $this->nodeCache[$nodeHash][$index];
    }
}
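
The Compatibility and Comprehensibility findings above can be illustrated with a small standalone sketch. It is not the library's code: `findListElement` and `elementChildren` are hypothetical free functions that mirror `findListNode()` and the collection loop in `getChildNodeByIndex()`. The first narrows `DOMNode` to `DOMElement` with an explicit `instanceof` check, as the analysis suggests; the second starts from an explicitly initialized array. In the reviewed method, the flagged `$listCache[$nodeHash] = [];` line appears to have been intended as an initialization of `$this->nodeCache[$nodeHash]`, but that reading is an assumption.

```php
<?php

/**
 * Returns the first ul/ol/dl element in the document as a DOMElement.
 * Hypothetical standalone variant of findListNode(), for illustration only.
 */
function findListElement(DOMDocument $dom): DOMElement
{
    $xPath = new DOMXPath($dom);
    $nodes = $xPath->query('//ul|//ol|//dl');

    // query() returns false on a malformed expression; item() may return null.
    $first = $nodes !== false ? $nodes->item(0) : null;

    // Narrow DOMNode to DOMElement explicitly instead of assuming it.
    if ($first instanceof DOMElement) {
        return $first;
    }

    throw new LogicException('Unable to diff list; missing list node');
}

/**
 * Collects the non-text children of $node. The result array is initialized
 * up front, so a node without element children yields an empty array.
 *
 * @return DOMNode[]
 */
function elementChildren(DOMNode $node): array
{
    $children = [];

    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText === false) {
            $children[] = $child;
        }
    }

    return $children;
}

$dom = new DOMDocument();
$dom->loadHTML('<ul><li>one</li> <li>two</li></ul>');

$list = findListElement($dom);
echo $list->tagName, ' has ', count(elementChildren($list)), ' items', PHP_EOL;
```

With the return type declared as `DOMElement`, callers such as `getListItemOperations()` and `processOperations()` would receive the type their signatures already require, which is what the Compatibility findings ask for.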