Passed
Pull Request — master (#31)
by Josh
04:14
created

TableDiff::normalizeFormat()   D

Complexity

Conditions 19
Paths 433

Size

Total Lines 63
Code Lines 38

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 380

Importance

Changes 5
Bugs 0 Features 2

Metric  Value
c       5
b       0
f       2
dl      0
loc     63
ccs     0
cts     48
cp      0
rs      4.3489
cc      19
eloc    38
nc      433
nop     0
crap    380

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
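Extract Method is one refactoring that fits this description. The sketch below is illustrative only: `computeRowOffsets()` is a hypothetical helper, not part of the library, and it assumes the rowspan of each cell has already been read into plain integer arrays. It mirrors the inner loop of `normalizeFormat()` that works out how many placeholder rows each table needs.

```php
<?php

/**
 * Hypothetical helper extracted from the inner foreach of normalizeFormat().
 *
 * @param int[] $oldRowspans rowspan of each cell in the old row, by cell index
 * @param int[] $newRowspans rowspan of each cell in the new row, by cell index
 *
 * @return int[] array($newRowOffset, $oldRowOffset)
 */
function computeRowOffsets(array $oldRowspans, array $newRowspans)
{
    $newRowOffset = 0;
    $oldRowOffset = 0;

    foreach ($newRowspans as $cellIndex => $newRowspan) {
        if (!isset($oldRowspans[$cellIndex])) {
            continue; // no counterpart cell in the old row
        }

        $oldRowspan = $oldRowspans[$cellIndex];

        if ($oldRowspan > $newRowspan) {
            // the new table would need placeholder rows
            $newRowOffset = max($newRowOffset, $oldRowspan - $newRowspan);
        } elseif ($newRowspan > $oldRowspan) {
            // the old table would need placeholder rows
            $oldRowOffset = max($oldRowOffset, $newRowspan - $oldRowspan);
        }
    }

    return array($newRowOffset, $oldRowOffset);
}

// Old row has rowspans 3 and 1; new row has rowspans 1 and 2.
list($newRowOffset, $oldRowOffset) = computeRowOffsets(array(3, 1), array(1, 2));
```

With a helper like this, the outer loop would only decide which table receives the blank rows, which should remove several of the 19 conditions counted above.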

<?php

namespace Caxy\HtmlDiff\Table;

use Caxy\HtmlDiff\AbstractDiff;
use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\Operation;

/**
 * @todo Add getters to TableMatch entity
 * @todo Move applicable functions to new table classes
 * @todo find matches of row/cells in order to handle row/cell additions/deletions
 * @todo clean up way to iterate between new and old cells
 * @todo Make sure diffed table keeps <tbody> or other table structure elements
 * @todo Encoding
 */
class TableDiff extends AbstractDiff
{
    /**
     * @var null|Table
     */
    protected $oldTable = null;

    /**
     * @var null|Table
     */
    protected $newTable = null;

    /**
     * @var null|Table
     */
    protected $diffTable = null;

    /**
     * @var null|\DOMDocument
     */
    protected $diffDom = null;

    /**
     * @var int
     */
    protected $newRowOffsets = 0;

    /**
     * @var int
     */
    protected $oldRowOffsets = 0;

    /**
     * @var array
     */
    protected $cellValues = array();

    /**
     * @var \HTMLPurifier
     */
    protected $purifier;

    /**
     * @var string
     */
    protected $strategy = self::STRATEGY_MATCHING;

    public function __construct($oldText, $newText, $encoding, $specialCaseTags, $groupDiffs)
    {
        parent::__construct($oldText, $newText, $encoding, $specialCaseTags, $groupDiffs);

        $config = \HTMLPurifier_Config::createDefault();
        $this->purifier = new \HTMLPurifier($config);
    }

    public function build()
    {
        $this->buildTableDoms();

        $this->diffDom = new \DOMDocument();

        $this->normalizeFormat();

        $this->indexCellValues($this->newTable);
Bug introduced by
It seems like $this->newTable can be null; however, indexCellValues() does not accept null, maybe add an additional type check?

Unless you are absolutely sure that the expression can never be null because of other conditions, we strongly recommend adding an additional type check to your code:

/** @return stdClass|null */
function mayReturnNull() { }

function doesNotAcceptNull(stdClass $x) { }

// With potential error.
function withoutCheck() {
    $x = mayReturnNull();
    doesNotAcceptNull($x); // Potential error here.
}

// Safe - Alternative 1
function withCheck1() {
    $x = mayReturnNull();
    if ( ! $x instanceof stdClass) {
        throw new \LogicException('$x must be defined.');
    }
    doesNotAcceptNull($x);
}

// Safe - Alternative 2
function withCheck2() {
    $x = mayReturnNull();
    if ($x instanceof stdClass) {
        doesNotAcceptNull($x);
    }
}

        $this->diffTableContent();

        return $this->content;
    }

    protected function normalizeFormat()
    {
        $oldRows = $this->oldTable->getRows();
        $newRows = $this->newTable->getRows();

        foreach ($newRows as $rowIndex => $newRow) {
            $oldRow = isset($oldRows[$rowIndex]) ? $oldRows[$rowIndex] : null;

            if (!$oldRow) {
                continue;
            }

            $newRowOffset = 0;
            $oldRowOffset = 0;

            $newCells = $newRow->getCells();
            $oldCells = $oldRow->getCells();

            foreach ($newCells as $cellIndex => $newCell) {
                $oldCell = isset($oldCells[$cellIndex]) ? $oldCells[$cellIndex] : null;

                if ($oldCell) {
                    $oldNode = $oldCell->getDomNode();
                    $newNode = $newCell->getDomNode();

                    $oldRowspan = $oldNode->getAttribute('rowspan') ?: 1;
                    $newRowspan = $newNode->getAttribute('rowspan') ?: 1;

                    if ($oldRowspan > $newRowspan) {
                        // add placeholders in next row of new rows
                        $offset = $oldRowspan - $newRowspan;
                        if ($offset > $newRowOffset) {
                            $newRowOffset = $offset;
                        }
                    } elseif ($newRowspan > $oldRowspan) {
                        $offset = $newRowspan - $oldRowspan;
                        if ($offset > $oldRowOffset) {
                            $oldRowOffset = $offset;
                        }
                    }
                }
            }

            if ($oldRowOffset > 0 && isset($newRows[$rowIndex + 1])) {
                $blankRow = $this->diffDom->createElement('tr');

                $insertArray = array();
                for ($i = 0; $i < $oldRowOffset; $i++) {
                    $insertArray[] = new TableRow($blankRow);
                }

                $this->oldTable->insertRows($insertArray, $rowIndex + 1);
            } elseif ($newRowOffset > 0 && isset($newRows[$rowIndex + 1])) {
                $blankRow = $this->diffDom->createElement('tr');

                $insertArray = array();
                for ($i = 0; $i < $newRowOffset; $i++) {
                    $insertArray[] = new TableRow($blankRow);
                }
                $this->newTable->insertRows($insertArray, $rowIndex + 1);
            }
        }
    }

    protected function diffTableContent()
    {
        $this->diffDom = new \DOMDocument();
        $this->diffTable = $this->diffDom->importNode($this->newTable->getDomNode()->cloneNode(false), false);
Documentation Bug introduced by
It seems like $this->diffDom->importNo...loneNode(false), false) of type object<DOMNode> is incompatible with the declared type null|object<Caxy\HtmlDiff\Table\Table> of property $diffTable.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.

        $this->diffDom->appendChild($this->diffTable);

        $oldRows = $this->oldTable->getRows();
        $newRows = $this->newTable->getRows();

        $oldMatchData = array();
        $newMatchData = array();

        /* @var $oldRow TableRow */
        foreach ($oldRows as $oldIndex => $oldRow) {
            $oldMatchData[$oldIndex] = array();

            // Get match percentages
            /* @var $newRow TableRow */
            foreach ($newRows as $newIndex => $newRow) {
                if (!array_key_exists($newIndex, $newMatchData)) {
                    $newMatchData[$newIndex] = array();
                }

                // similar_text
                $percentage = $this->getMatchPercentage($oldRow, $newRow);

                $oldMatchData[$oldIndex][$newIndex] = $percentage;
                $newMatchData[$newIndex][$oldIndex] = $percentage;
            }
        }

        // new solution for diffing rows
        switch ($this->strategy) {
            case self::STRATEGY_MATCHING:
                $matches = $this->getRowMatches($oldMatchData, $newMatchData);
                $this->diffTableRowsWithMatches($oldRows, $newRows, $matches);
                break;

            case self::STRATEGY_RELATIVE:
                $this->diffTableRows($oldRows, $newRows, $oldMatchData);
                break;

            default:
                $matches = $this->getRowMatches($oldMatchData, $newMatchData);
                $this->diffTableRowsWithMatches($oldRows, $newRows, $matches);
                break;
        }

        $this->content = $this->htmlFromNode($this->diffTable);
    }

    /**
     * @param TableRow[] $oldRows
     * @param TableRow[] $newRows
     * @param RowMatch[] $matches
     */
    protected function diffTableRowsWithMatches($oldRows, $newRows, $matches)
    {
        $operations = array();

        $indexInOld = 0;
        $indexInNew = 0;

        $oldRowCount = count($oldRows);
        $newRowCount = count($newRows);

        $matches[] = new RowMatch($newRowCount, $oldRowCount, $newRowCount, $oldRowCount);

        // build operations
        foreach ($matches as $match) {
            $matchAtIndexInOld = $indexInOld === $match->getStartInOld();
            $matchAtIndexInNew = $indexInNew === $match->getStartInNew();

            $action = 'equal';

            if (!$matchAtIndexInOld && !$matchAtIndexInNew) {
                $action = 'replace';
            } elseif ($matchAtIndexInOld && !$matchAtIndexInNew) {
                $action = 'insert';
            } elseif (!$matchAtIndexInOld && $matchAtIndexInNew) {
                $action = 'delete';
            }

            if ($action !== 'equal') {
                $operations[] = new Operation($action, $indexInOld, $match->getStartInOld(), $indexInNew, $match->getStartInNew());
Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 131 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

            }

            $operations[] = new Operation('equal', $match->getStartInOld(), $match->getEndInOld(), $match->getStartInNew(), $match->getEndInNew());
Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 147 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

            $indexInOld = $match->getEndInOld();
            $indexInNew = $match->getEndInNew();
        }

        $appliedRowSpans = array();

        // process operations
        foreach ($operations as $operation) {
            switch ($operation->action) {
                case 'equal':
                    $this->processEqualOperation($operation, $oldRows, $newRows, $appliedRowSpans);
                    break;

                case 'delete':
                    $this->processDeleteOperation($operation, $oldRows, $appliedRowSpans);
                    break;

                case 'insert':
                    $this->processInsertOperation($operation, $newRows, $appliedRowSpans);
                    break;

                case 'replace':
                    $this->processReplaceOperation($operation, $oldRows, $newRows, $appliedRowSpans);
                    break;
            }
        }
    }

    protected function processInsertOperation(Operation $operation, $newRows, &$appliedRowSpans, $forceExpansion = false)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
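
A rough sketch of how the shared part of `processInsertOperation()` and `processDeleteOperation()` might be extracted. It assumes the two methods differ only in which rows they slice and in which argument of `diffAndAppendRows()` receives the row; `forEachRowInRange()` is a hypothetical name, not an existing method.

```php
<?php

/**
 * Hypothetical shared helper: calls $callback for every row in the
 * half-open range [$start, $end), the same range both methods slice.
 */
function forEachRowInRange(array $rows, $start, $end, callable $callback)
{
    foreach (array_slice($rows, $start, $end - $start) as $row) {
        $callback($row);
    }
}

// Stand-in data: strings instead of TableRow objects.
$seen = array();
forEachRowInRange(array('r0', 'r1', 'r2', 'r3'), 1, 3, function ($row) use (&$seen) {
    $seen[] = $row;
});
// $seen now holds the rows at index 1 and 2.
```

Each method would then pass its own closure, for example one that calls `diffAndAppendRows(null, $row, ...)` for inserts and one that calls `diffAndAppendRows($row, null, ...)` for deletes.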

Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 121 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

    {
        $targetRows = array_slice($newRows, $operation->startInNew, $operation->endInNew - $operation->startInNew);
        foreach ($targetRows as $row) {
            $this->diffAndAppendRows(null, $row, $appliedRowSpans, $forceExpansion);
        }
    }

    protected function processDeleteOperation(Operation $operation, $oldRows, &$appliedRowSpans, $forceExpansion = false)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 121 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

    {
        $targetRows = array_slice($oldRows, $operation->startInOld, $operation->endInOld - $operation->startInOld);
        foreach ($targetRows as $row) {
            $this->diffAndAppendRows($row, null, $appliedRowSpans, $forceExpansion);
        }
    }

    protected function processEqualOperation(Operation $operation, $oldRows, $newRows, &$appliedRowSpans)
    {
        $targetOldRows = array_values(array_slice($oldRows, $operation->startInOld, $operation->endInOld - $operation->startInOld));
Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 132 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

        $targetNewRows = array_values(array_slice($newRows, $operation->startInNew, $operation->endInNew - $operation->startInNew));
Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 132 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

        foreach ($targetNewRows as $index => $newRow) {
            if (!isset($targetOldRows[$index])) {
                continue;
            }

            $this->diffAndAppendRows($targetOldRows[$index], $newRow, $appliedRowSpans);
        }
    }

    protected function processReplaceOperation(Operation $operation, $oldRows, $newRows, &$appliedRowSpans)
    {
        $this->processDeleteOperation($operation, $oldRows, $appliedRowSpans, true);
        $this->processInsertOperation($operation, $newRows, $appliedRowSpans, true);
    }

    protected function getRowMatches($oldMatchData, $newMatchData)
Documentation introduced by
The return type could not be reliably inferred; please add a @return annotation.

Our type inference engine is quite powerful, but sometimes the code does not provide enough clues to go by. In these cases we request that you add a @return annotation.
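
A minimal sketch of such an annotation, using stand-ins rather than the real classes. `RowMatchStub` and `buildMatches()` are hypothetical. For `getRowMatches()` itself, `@return RowMatch[]` looks like the matching annotation, since its result is passed to `diffTableRowsWithMatches()`, which documents `RowMatch[] $matches`.

```php
<?php

/** Stand-in for the RowMatch class, reduced to two public fields. */
class RowMatchStub
{
    public $startInNew;
    public $startInOld;

    public function __construct($startInNew, $startInOld)
    {
        $this->startInNew = $startInNew;
        $this->startInOld = $startInOld;
    }
}

/**
 * Builds one match object per (new index, old index) pair.
 *
 * @param array $pairs list of array($newIndex, $oldIndex)
 *
 * @return RowMatchStub[]
 */
function buildMatches(array $pairs)
{
    $matches = array();
    foreach ($pairs as $pair) {
        $matches[] = new RowMatchStub($pair[0], $pair[1]);
    }

    return $matches;
}

$matches = buildMatches(array(array(0, 0), array(1, 2)));
```

The `Type[]` notation tells both readers and static analysis tools that the method returns a list of match objects rather than a bare array.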

    {
        $matches = array();

        $startInOld = 0;
        $startInNew = 0;
        $endInOld = count($oldMatchData);
        $endInNew = count($newMatchData);

        $this->findRowMatches($newMatchData, $startInOld, $endInOld, $startInNew, $endInNew, $matches);

        return $matches;
    }

    protected function findRowMatches($newMatchData, $startInOld, $endInOld, $startInNew, $endInNew, &$matches)
    {
        $match = $this->findRowMatch($newMatchData, $startInOld, $endInOld, $startInNew, $endInNew);
        if ($match !== null) {
            if ($startInOld < $match->getStartInOld() &&
                $startInNew < $match->getStartInNew()
            ) {
                $this->findRowMatches(
                    $newMatchData,
                    $startInOld,
                    $match->getStartInOld(),
                    $startInNew,
                    $match->getStartInNew(),
                    $matches
                );
            }

            $matches[] = $match;

            if ($match->getEndInOld() < $endInOld &&
                $match->getEndInNew() < $endInNew
            ) {
                $this->findRowMatches($newMatchData, $match->getEndInOld(), $endInOld, $match->getEndInNew(), $endInNew, $matches);
Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 131 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

            }
        }
    }

    protected function findRowMatch($newMatchData, $startInOld, $endInOld, $startInNew, $endInNew)
    {
        $bestMatch = null;
        $bestPercentage = 0;

        foreach ($newMatchData as $newIndex => $oldMatches) {
            if ($newIndex < $startInNew) {
                continue;
            }

            if ($newIndex >= $endInNew) {
                break;
            }
            foreach ($oldMatches as $oldIndex => $percentage) {
                if ($oldIndex < $startInOld) {
                    continue;
                }

                if ($oldIndex >= $endInOld) {
                    break;
                }

                if ($percentage > $bestPercentage) {
                    $bestPercentage = $percentage;
                    $bestMatch = array(
                        'oldIndex' => $oldIndex,
                        'newIndex' => $newIndex,
                        'percentage' => $percentage,
                    );
                }
            }
        }

        if ($bestMatch !== null) {
            return new RowMatch($bestMatch['newIndex'], $bestMatch['oldIndex'], $bestMatch['newIndex'] + 1, $bestMatch['oldIndex'] + 1, $bestMatch['percentage']);
Coding Style introduced by
This line exceeds maximum limit of 120 characters; contains 162 characters

Overly long lines are hard to read on any screen. Most code styles therefore impose a maximum limit on the number of characters in a line.

        }

        return null;
    }

    /**
     * @param $oldRows
     * @param $newRows
     * @param $oldMatchData
     */
    protected function diffTableRows($oldRows, $newRows, $oldMatchData)
    {
        $appliedRowSpans = array();
        $currentIndexInOld = 0;
        $oldCount = count($oldRows);
        $newCount = count($newRows);
        $difference = max($oldCount, $newCount) - min($oldCount, $newCount);

        foreach ($newRows as $newIndex => $row) {
            $oldRow = $this->oldTable->getRow($currentIndexInOld);

            if ($oldRow) {
                $matchPercentage = $oldMatchData[$currentIndexInOld][$newIndex];

                // does the old row match better?
                $otherMatchBetter = false;
                foreach ($oldMatchData[$currentIndexInOld] as $index => $percentage) {
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

                    if ($index > $newIndex && $percentage > $matchPercentage) {
                        $otherMatchBetter = $index;
                    }
                }

                if (false !== $otherMatchBetter && $newCount > $oldCount && $difference > 0) {
                    // insert row as new
                    $this->diffAndAppendRows(null, $row, $appliedRowSpans);
                    $difference--;

                    continue;
                }

                $nextOldIndex = array_key_exists($currentIndexInOld + 1, $oldRows) ? $currentIndexInOld + 1 : null;

                $replacement = false;

                if ($nextOldIndex !== null &&
                    $oldMatchData[$nextOldIndex][$newIndex] > $matchPercentage &&
                    $oldMatchData[$nextOldIndex][$newIndex] > $this->matchThreshold
                ) {
                    // Following row in old is better match, use that.
                    $this->diffAndAppendRows($oldRows[$currentIndexInOld], null, $appliedRowSpans, true);

                    $currentIndexInOld++;
                    $matchPercentage = $oldMatchData[$currentIndexInOld];
Unused Code introduced by
$matchPercentage is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten for every possible time line.

                    $replacement = true;
                }

                $this->diffAndAppendRows($oldRows[$currentIndexInOld], $row, $appliedRowSpans, $replacement);
                $currentIndexInOld++;
            } else {
                $this->diffAndAppendRows(null, $row, $appliedRowSpans);
            }
        }

        if (count($oldRows) > count($newRows)) {
            foreach (array_slice($oldRows, count($newRows)) as $row) {
                $this->diffAndAppendRows($row, null, $appliedRowSpans);
            }
        }
    }

    /**
     * @param TableRow|null $oldRow
     * @param TableRow|null $newRow
     * @param array         $appliedRowSpans
     * @param bool          $forceExpansion
     *
     * @return \DOMNode
Documentation introduced by
Should the return type not be array?

This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.

     */
    protected function diffRows($oldRow, $newRow, array &$appliedRowSpans, $forceExpansion = false)
    {
        // create tr dom element
        $rowToClone = $newRow ?: $oldRow;
        $diffRow = $this->diffDom->importNode($rowToClone->getDomNode()->cloneNode(false), false);

        $oldCells = $oldRow ? $oldRow->getCells() : array();
        $newCells = $newRow ? $newRow->getCells() : array();

        $position = new DiffRowPosition();

        $extraRow = null;

        $expandCells = array();
        $cellsWithMultipleRows = array();

        // @todo: Do cell matching

        $newCellCount = count($newCells);
        while ($position->getIndexInNew() < $newCellCount) {
            if (!$position->areColumnsEqual()) {
                $type = $position->getLesserColumnType();
                if ($type === 'new') {
                    $row = $newRow;
                    $targetRow = $extraRow;
                } else {
                    $row = $oldRow;
                    $targetRow = $diffRow;
                }
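                // NOTE: `!$type === 'old'` parses as `(!$type) === 'old'`, which is always false,
                // so only the isset() check takes effect; `$type !== 'old'` may have been intended.
                if ($row && (!$type === 'old' || isset($oldCells[$position->getIndexInOld()]))) {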
                    $this->syncVirtualColumns($row, $position, $cellsWithMultipleRows, $targetRow, $type, true);

                    continue;
                }
            }

            /* @var $newCell TableCell */
            $newCell = $newCells[$position->getIndexInNew()];
            /* @var $oldCell TableCell */
            $oldCell = isset($oldCells[$position->getIndexInOld()]) ? $oldCells[$position->getIndexInOld()] : null;

            if ($oldCell && $newCell->getColspan() != $oldCell->getColspan()) {
                if (null === $extraRow) {
                    $extraRow = $this->diffDom->importNode($rowToClone->getDomNode()->cloneNode(false), false);
                }

                // @todo: How do we handle cells that have both rowspan and colspan?

                if ($oldCell->getColspan() > $newCell->getColspan()) {
                    $this->diffCellsAndIncrementCounters(
                        $oldCell,
                        null,
                        $cellsWithMultipleRows,
                        $diffRow,
                        $position,
                        true
                    );
                    $this->syncVirtualColumns($newRow, $position, $cellsWithMultipleRows, $extraRow, 'new', true);
                } else {
                    $this->diffCellsAndIncrementCounters(
                        null,
                        $newCell,
                        $cellsWithMultipleRows,
                        $extraRow,
                        $position,
                        true
                    );
                    $this->syncVirtualColumns($oldRow, $position, $cellsWithMultipleRows, $diffRow, 'old', true);
                }
            } else {
                $diffCell = $this->diffCellsAndIncrementCounters(
                    $oldCell,
                    $newCell,
                    $cellsWithMultipleRows,
                    $diffRow,
                    $position
                );
                $expandCells[] = $diffCell;
            }
        }

        $oldCellCount = count($oldCells);
        while ($position->getIndexInOld() < $oldCellCount) {
            $diffCell = $this->diffCellsAndIncrementCounters(
                $oldCells[$position->getIndexInOld()],
                null,
                $cellsWithMultipleRows,
                $diffRow,
                $position
            );
            $expandCells[] = $diffCell;
        }

        if ($extraRow) {
            foreach ($expandCells as $expandCell) {
                $expandCell->setAttribute('rowspan', $expandCell->getAttribute('rowspan') + 1);
            }
        }

        if ($extraRow || $forceExpansion) {
            foreach ($appliedRowSpans as $rowSpanCells) {
                foreach ($rowSpanCells as $extendCell) {
                    $extendCell->setAttribute('rowspan', $extendCell->getAttribute('rowspan') + 1);
                }
            }
        }

        if (!$forceExpansion) {
            array_shift($appliedRowSpans);
            $appliedRowSpans = array_values($appliedRowSpans);
        }
        $appliedRowSpans = array_merge($appliedRowSpans, array_values($cellsWithMultipleRows));

        return array($diffRow, $extraRow);
    }

    /**
     * @param TableCell|null $oldCell
     * @param TableCell|null $newCell
     *
     * @return \DOMElement
     */
    protected function getNewCellNode(TableCell $oldCell = null, TableCell $newCell = null)
    {
        // If only one cell exists, use it
        if (!$oldCell || !$newCell) {
            $clone = $newCell
                ? $newCell->getDomNode()->cloneNode(false)
                : $oldCell->getDomNode()->cloneNode(false);
Bug introduced by
It seems like $oldCell is not always an object, but can also be of type null. Maybe add an additional type check?

If a variable is not always an object, we recommend adding an additional type check to ensure your method call is safe:

function someFunction(A $objectMaybe = null)
{
    if ($objectMaybe instanceof A) {
        $objectMaybe->doSomething();
    }
}
        } else {
            $oldNode = $oldCell->getDomNode();
            $newNode = $newCell->getDomNode();

            $clone = $newNode->cloneNode(false);

            $oldRowspan = $oldNode->getAttribute('rowspan') ?: 1;
            $oldColspan = $oldNode->getAttribute('colspan') ?: 1;
            $newRowspan = $newNode->getAttribute('rowspan') ?: 1;
            $newColspan = $newNode->getAttribute('colspan') ?: 1;

            $clone->setAttribute('rowspan', max($oldRowspan, $newRowspan));
            $clone->setAttribute('colspan', max($oldColspan, $newColspan));
        }

        return $this->diffDom->importNode($clone);
    }

    protected function diffCells($oldCell, $newCell, $usingExtraRow = false)
    {
        $diffCell = $this->getNewCellNode($oldCell, $newCell);

        $oldContent = $oldCell ? $this->getInnerHtml($oldCell->getDomNode()) : '';
        $newContent = $newCell ? $this->getInnerHtml($newCell->getDomNode()) : '';

        $htmlDiff = new HtmlDiff(
            mb_convert_encoding($oldContent, 'UTF-8', 'HTML-ENTITIES'),
            mb_convert_encoding($newContent, 'UTF-8', 'HTML-ENTITIES'),
            $this->encoding,
            $this->specialCaseTags,
            $this->groupDiffs
        );
        $htmlDiff->setMatchThreshold($this->matchThreshold);
        $diff = $htmlDiff->build();

        $this->setInnerHtml($diffCell, $diff);

        if (null === $newCell) {
            $diffCell->setAttribute('class', trim($diffCell->getAttribute('class').' del'));
        }

        if (null === $oldCell) {
            $diffCell->setAttribute('class', trim($diffCell->getAttribute('class').' ins'));
        }

        if ($usingExtraRow) {
            $diffCell->setAttribute('class', trim($diffCell->getAttribute('class').' extra-row'));
        }

        return $diffCell;
    }

639
    protected function buildTableDoms()
    {
        $this->oldTable = $this->parseTableStructure(mb_convert_encoding($this->oldText, 'HTML-ENTITIES', 'UTF-8'));
        $this->newTable = $this->parseTableStructure(mb_convert_encoding($this->newText, 'HTML-ENTITIES', 'UTF-8'));
    }

    protected function parseTableStructure($text)
    {
        $dom = new \DOMDocument();
        $dom->loadHTML($text);

        $tableNode = $dom->getElementsByTagName('table')->item(0);

        $table = new Table($tableNode);

        $this->parseTable($table);

        return $table;
    }

    protected function parseTable(Table $table, \DOMNode $node = null)
    {
        if ($node === null) {
            $node = $table->getDomNode();
        }

        foreach ($node->childNodes as $child) {
            if ($child->nodeName === 'tr') {
                $row = new TableRow($child);
                $table->addRow($row);

                $this->parseTableRow($row);
            } else {
                $this->parseTable($table, $child);
            }
        }
    }

    protected function parseTableRow(TableRow $row)
    {
        $node = $row->getDomNode();

        foreach ($node->childNodes as $child) {
            if (in_array($child->nodeName, array('td', 'th'))) {
                $cell = new TableCell($child);
                $row->addCell($cell);
            }
        }
    }

    protected function getInnerHtml($node)
    {
        $innerHtml = '';
        $children = $node->childNodes;

        foreach ($children as $child) {
            $innerHtml .= $this->htmlFromNode($child);
        }

        return $innerHtml;
    }

    protected function htmlFromNode($node)
Duplication introduced by
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
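
One way to act on this suggestion is to move the shared helper into a single reusable unit. The sketch below uses a trait and assumes PHP 5.4 or later; the names `NodeHtmlTrait` and `NodeHtmlDemo` are hypothetical and do not exist in the project. Only the body of `htmlFromNode()` is taken from the listing.

```php
<?php

// Hypothetical trait: a single home for the duplicated helper.
trait NodeHtmlTrait
{
    protected function htmlFromNode(\DOMNode $node)
    {
        // Import the node into a fresh document so that saveHTML()
        // serializes only this node and its descendants.
        $domDocument = new \DOMDocument();
        $newNode = $domDocument->importNode($node, true);
        $domDocument->appendChild($newNode);

        return trim($domDocument->saveHTML());
    }
}

// Hypothetical consumer; each class that needs the helper would "use" the trait.
final class NodeHtmlDemo
{
    use NodeHtmlTrait;

    public function render(\DOMNode $node)
    {
        return $this->htmlFromNode($node);
    }
}

$doc = new \DOMDocument();
$doc->loadHTML('<table><tr><td><b>x</b></td></tr></table>');
$cell = $doc->getElementsByTagName('td')->item(0);

$demo = new NodeHtmlDemo();
echo $demo->render($cell->firstChild), "\n"; // prints "<b>x</b>"
```

If the duplicating classes already share a parent class, moving the method there would achieve the same result without a trait.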

    {
        $domDocument = new \DOMDocument();
        $newNode = $domDocument->importNode($node, true);
        $domDocument->appendChild($newNode);

        return trim($domDocument->saveHTML());
    }

    protected function setInnerHtml($node, $html)
    {
        // DOMDocument::loadHTML does not allow empty strings.
        if (strlen($html) === 0) {
            $html = '<span class="empty"></span>';
        }

        $doc = new \DOMDocument();
        $doc->loadHTML(mb_convert_encoding($this->purifier->purify($html), 'HTML-ENTITIES', 'UTF-8'));
        $fragment = $node->ownerDocument->createDocumentFragment();
        $root = $doc->getElementsByTagName('body')->item(0);
        foreach ($root->childNodes as $child) {
            $fragment->appendChild($node->ownerDocument->importNode($child, true));
        }

        $node->appendChild($fragment);
    }

    protected function indexCellValues(Table $table)
    {
        foreach ($table->getRows() as $rowIndex => $row) {
            foreach ($row->getCells() as $cellIndex => $cell) {
                $value = trim($cell->getDomNode()->textContent);

                if (!isset($this->cellValues[$value])) {
                    $this->cellValues[$value] = array();
                }

                $this->cellValues[$value][] = new TablePosition($rowIndex, $cellIndex);
            }
        }
    }

    /**
     * @param TableRow        $tableRow
     * @param DiffRowPosition $position
     * @param array           $cellsWithMultipleRows
     * @param \DOMElement     $diffRow
     * @param string          $diffType              'new' emits inserted cells; any other value emits deleted cells
     * @param bool            $usingExtraRow
     */
    protected function syncVirtualColumns(
        $tableRow,
        DiffRowPosition $position,
        &$cellsWithMultipleRows,
        $diffRow,
        $diffType,
        $usingExtraRow = false
    ) {
        $currentCell = $tableRow->getCell($position->getIndex($diffType));
        while ($position->isColumnLessThanOther($diffType) && $currentCell) {
            $diffCell = $diffType === 'new'
                ? $this->diffCells(null, $currentCell, $usingExtraRow)
                : $this->diffCells($currentCell, null, $usingExtraRow);
            // Track the cell in $cellsWithMultipleRows if it spans multiple rows
            if ($diffCell->getAttribute('rowspan') > 1) {
                $cellsWithMultipleRows[$diffCell->getAttribute('rowspan')][] = $diffCell;
            }
            $diffRow->appendChild($diffCell);
            $position->incrementColumn($diffType, $currentCell->getColspan());
            $currentCell = $tableRow->getCell($position->incrementIndex($diffType));
        }
    }

    /**
     * @param null|TableCell  $oldCell
     * @param null|TableCell  $newCell
     * @param array           $cellsWithMultipleRows
     * @param \DOMElement     $diffRow
     * @param DiffRowPosition $position
     * @param bool            $usingExtraRow
     *
     * @return \DOMElement
     */
    protected function diffCellsAndIncrementCounters(
        $oldCell,
        $newCell,
        &$cellsWithMultipleRows,
        $diffRow,
        DiffRowPosition $position,
        $usingExtraRow = false
    ) {
        $diffCell = $this->diffCells($oldCell, $newCell, $usingExtraRow);
        // Track the cell in $cellsWithMultipleRows if it spans multiple rows
        if ($diffCell->getAttribute('rowspan') > 1) {
            $cellsWithMultipleRows[$diffCell->getAttribute('rowspan')][] = $diffCell;
        }
        $diffRow->appendChild($diffCell);

        if ($newCell !== null) {
            $position->incrementIndexInNew();
            $position->incrementColumnInNew($newCell->getColspan());
        }

        if ($oldCell !== null) {
            $position->incrementIndexInOld();
            $position->incrementColumnInOld($oldCell->getColspan());
        }

        return $diffCell;
    }

    /**
     * @param      $oldRow
     * @param      $newRow
     * @param      $appliedRowSpans
     * @param bool $forceExpansion
     */
    protected function diffAndAppendRows($oldRow, $newRow, &$appliedRowSpans, $forceExpansion = false)
    {
        list($rowDom, $extraRow) = $this->diffRows(
            $oldRow,
            $newRow,
            $appliedRowSpans,
            $forceExpansion
        );

        $this->diffTable->appendChild($rowDom);
Bug introduced by
The method appendChild() does not seem to exist on object<Caxy\HtmlDiff\Table\Table>.

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
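
A third possibility is that the property is simply annotated with the wrong type. The sketch below assumes, without confirmation from this listing, that `$diffTable` holds a DOM element at runtime rather than a `Table`; `DiffTableHolder` is a hypothetical stand-in class used only to illustrate the corrected `@var` annotation.

```php
<?php

// Hypothetical stand-in for the class that owns $diffTable.
final class DiffTableHolder
{
    /**
     * Annotated with the type that is actually assigned (assumption).
     *
     * @var null|\DOMElement
     */
    private $diffTable = null;

    public function __construct(\DOMDocument $dom)
    {
        $this->diffTable = $dom->createElement('table');
        $dom->appendChild($this->diffTable);
    }

    public function appendRow(\DOMElement $row)
    {
        // appendChild() is defined on DOMNode, so a static analyzer
        // can now resolve the call.
        $this->diffTable->appendChild($row);
    }

    public function rowCount()
    {
        return $this->diffTable->getElementsByTagName('tr')->length;
    }
}

$dom = new \DOMDocument();
$holder = new DiffTableHolder($dom);
$holder->appendRow($dom->createElement('tr'));
$holder->appendRow($dom->createElement('tr'));
echo $holder->rowCount(), "\n"; // prints 2
```

If the property really does hold a `Table`, the call would instead need to go through whatever accessor exposes the underlying DOM node.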

        if ($extraRow) {
            $this->diffTable->appendChild($extraRow);
Bug introduced by
The method appendChild() does not seem to exist on object<Caxy\HtmlDiff\Table\Table>.

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

        }
    }

    protected function getMatchPercentage(TableRow $oldRow, TableRow $newRow)
    {
        $firstCellWeight = 3;
        $thresholdCount = 0;
        $totalCount = (min(count($newRow->getCells()), count($oldRow->getCells())) + $firstCellWeight) * 100;
        foreach ($newRow->getCells() as $newIndex => $newCell) {
            $oldCell = $oldRow->getCell($newIndex);
Bug introduced by
Are you sure the assignment to $oldCell is correct as $oldRow->getCell($newIndex) (which targets Caxy\HtmlDiff\Table\TableRow::getCell()) seems to always return null.

This check looks for function or method calls that always return null and whose return value is assigned to a variable.

class A
{
    function getObject()
    {
        return null;
    }

}

$a = new A();
$object = $a->getObject();

The method getObject() can return nothing but null, so it makes no sense to assign that value to a variable.

The reason is most likely that a function or method is incomplete or has been reduced for debug purposes.
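
For contrast, an accessor with a reachable non-null return path would not trigger this check. The sketch below is a hypothetical stand-in (`RowSketch`), not the project's `TableRow`; it only illustrates a guarded lookup that returns the stored cell when the index exists and null otherwise.

```php
<?php

// Hypothetical stand-in for a row that stores its cells in an array.
final class RowSketch
{
    private $cells = array();

    public function addCell($cell)
    {
        $this->cells[] = $cell;
    }

    public function getCell($index)
    {
        // Return the stored cell when the index exists, otherwise null.
        return isset($this->cells[$index]) ? $this->cells[$index] : null;
    }
}

$row = new RowSketch();
$row->addCell('first');

var_dump($row->getCell(0) === 'first'); // bool(true)
var_dump($row->getCell(5) === null);    // bool(true)
```

With a lookup of this shape, the `if ($oldCell)` guard in the calling code does real work: it skips indexes that exist only in the new row.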

            if ($oldCell) {
                $percentage = null;
                similar_text($oldCell->getInnerHtml(), $newCell->getInnerHtml(), $percentage);

                // Only cells above half the match threshold contribute to the score.
                if ($percentage > ($this->matchThreshold * 0.50)) {
                    $increment = $percentage;
                    // A near-identical first cell (> 95% similar) counts $firstCellWeight times.
                    if ($newIndex === 0 && $percentage > 95) {
                        $increment = $increment * $firstCellWeight;
                    }
                    $thresholdCount += $increment;
                }
            }
        }

        $matchPercentage = ($totalCount > 0) ? ($thresholdCount / $totalCount) : 0;

        return $matchPercentage;
    }
}