Passed
Push — master ( 2c7f9b...98b05a )
by Vitaly
07:38
created

StringConditionTree::innerProcessor()   B

Complexity

Conditions 2
Paths 2

Size

Total Lines 34
Code Lines 8

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 34
rs 8.8571
c 0
b 0
f 0
cc 2
eloc 8
nc 2
nop 4
1
<?php declare(strict_types = 1);
2
/**
3
 * Created by Vitaly Iegorov <[email protected]>.
4
 * on 02.03.17 at 13:25
5
 */
6
namespace samsonframework\stringconditiontree;
7
use phpDocumentor\Reflection\Types\Integer;
8
9
/**
10
 * Class StringConditionTree
11
 *
12
 * @author Vitaly Egorov <[email protected]>
13
 */
14
class StringConditionTree
15
{
16
    /** Tree node root element identifier, needed for recursion */
17
    const ROOT_NAME = '';
18
19
    /** Final tree node branch identifier */
20
    const SELF_NAME = '@self';
21
22
    /** String parameter start marker */
23
    const PARAMETER_START = '{';
24
25
    /** String parameter end marker */
26
    const PARAMETER_END = '}';
27
28
    /** Parameter sorting length value for counting */
29
    const PARAMETER_COF = 2000;
30
31
    /** @var TreeNode Resulting collection for debugging */
32
    protected $debug;
33
34
    /** @var array Collection of input string => identifier */
35
    protected $source;
36
37
    /** @var string Parametrized string start marker */
38
    protected $parameterStartMarker = self::PARAMETER_START;
39
40
    /** @var string Parametrized string end marker */
41
    protected $parameterEndMarker = self::PARAMETER_END;
42
43
    /**
44
     * StringConditionTree constructor.
45
     *
46
     * @param string $parameterStartMarker Parametrized string start marker
47
     * @param string $parameterEndMarker Parametrized string end marker
48
     */
49
    public function __construct(string $parameterStartMarker = self::PARAMETER_START, string $parameterEndMarker = self::PARAMETER_END)
50
    {
51
        $this->parameterStartMarker = $parameterStartMarker;
52
        $this->parameterEndMarker = $parameterEndMarker;
53
    }
54
55
    /**
56
     * Build similarity strings tree.
57
     *
58
     * @param array $input Collection of strings
59
     *
60
     * @return TreeNode Resulting similarity strings tree
61
     */
62
    public function process(array $input): TreeNode
63
    {
64
        $this->source = $input;
65
66
        /**
67
         * We need to find first matching character that present at least at one two string
68
         * to start building tree. Otherwise there is no reason to build tree.
69
         */
70
        $this->innerProcessor(self::ROOT_NAME, array_keys($input), $this->debug = new TreeNode());
71
72
        $this->debug = $this->debug->children[self::ROOT_NAME];
0 ignored issues
show
Documentation Bug introduced by
$this->debug->children[self::ROOT_NAME] is of type object<samsonframework\s...ntree\IterableTreeNode>, but the property $debug was declared to be of type object<samsonframework\s...conditiontree\TreeNode>. Are you sure that you always receive this specific sub-class here, or does it make sense to add an instanceof check?

Our type inference engine has found a suspicous assignment of a value to a property. This check raises an issue when a value that can be of a given class or a super-class is assigned to a property that is type hinted more strictly.

Either this assignment is in error or an instanceof check should be added for that assignment.

class Alien {}

class Dalek extends Alien {}

class Plot
{
    /** @var  Dalek */
    public $villain;
}

$alien = new Alien();
$plot = new Plot();
if ($alien instanceof Dalek) {
    $plot->villain = $alien;
}
Loading history...
73
74
        return $this->debug;
75
    }
76
77
    /**
78
     * Build string character group structure considering parametrized
79
     * and not parametrized character groups and their length(PCG, NPCG).
80
     *
81
     * @param string $prefix Prefix string
82
     *
83
     * @return array String character groups structure
84
     */
85
    protected function getPrefixStructure(string $prefix): array
86
    {
87
        /** @var array $structureMatrix String PCG(0)/NPCG(1) structure matrix for comparison */
88
        $structureMatrix = [];
89
90
        // Flags for showing current string character group
91
        /** @var bool $isPCG Flags showing PCG started */
92
        $isPCG = false;
93
        /** @var bool $isNPCG Flags showing NPCG started */
94
        $isNPCG = true;
95
96
        // Pointer to current CG to count string NPCG length
97
        $currentCG = 0;
98
99
        /**
100
         * TODO: Try to find PCG filter :... pattern and process it also as
101
         * PCG with filters should be prioritized over PSG without filter
102
         * even if filter is .*
103
         */
104
105
        // Iterate string by characters
106
        for ($i = 0, $length = strlen($prefix); $i < $length; $i++) {
107
            if (!$isPCG && $prefix{$i} === $this->parameterStartMarker) {
108
                $isPCG = true;
109
                $isNPCG = false;
110
                $structureMatrix[] = [0,0,$prefix];
111
                $currentCG = &$structureMatrix[count($structureMatrix)-1][1];
112
            } elseif ($isPCG && $prefix{$i} === $this->parameterEndMarker) {
113
                $isPCG = false;
114
                $isNPCG = true;
115
            } elseif ($isNPCG) {
116
                $isNPCG = false;
117
                $structureMatrix[] = [1,0,$prefix];
118
                $currentCG = &$structureMatrix[count($structureMatrix)-1][1];
119
            }
120
121
            // Store current character group length
122
            $currentCG++;
123
        }
124
125
        return $structureMatrix;
126
    }
127
128
    /**
129
     * Compare string structures.
130
     *
131
     * @param array $initial Initial string structure
132
     * @param array $compared Compared string structure
133
     *
134
     * @return int Result of array elements comparison
135
     */
136
    protected function compareStringStructure(array $initial, array $compared): int
137
    {
138
        $maxStructureSize = max(count($initial), count($compared));
139
140
        // Make structures same size preserving previous existing structure value
141
        for ($i = 1; $i < $maxStructureSize; $i++) {
142
            if (!array_key_exists($i, $initial)) {
143
                $initial[$i] = $initial[$i-1];
144
            }
145
            if (!array_key_exists($i, $compared)) {
146
                $compared[$i] = $compared[$i-1];
147
            }
148
        }
149
150
        // Iterate every structure group
151
        for ($i = 0; $i < $maxStructureSize; $i++) {
152
            // If initial structure has NPCG than it has higher priority
153
            if ($initial[$i][0] > $compared[$i][0]) {
154
                return -1;
155
            }
156
157
            // If compared structure has NPCG than it has higher priority
158
            if ($initial[$i][0] < $compared[$i][0]) {
159
                return 1;
160
            }
161
162
            // Compare NOT starting NPCG length
163
            if ($i > 0 && $initial[$i][0] === 1 ) {
0 ignored issues
show
Coding Style introduced by
Expected 0 spaces before closing bracket; 1 found
Loading history...
164
                if ($initial[$i][1] > $compared[$i][1]) {
165
                    return -1;
166
                }
167
168
                if ($initial[$i][1] < $compared[$i][1]) {
169
                    return 1;
170
                }
171
            }
172
173
            // They are equal continue to next structure group comparison
174
        }
175
176
        // If both structures are equal compare lengths of NPCG
177 View Code Duplication
        for ($i = 0; $i < $maxStructureSize; $i++) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
178
            // If current CG is NPCG
179
            if ($initial[$i][0] === 1) {
180
                if ($initial[$i][1] > $compared[$i][1]) {
181
                    return 1;
182
                }
183
184
                if ($initial[$i][1] < $compared[$i][1]) {
185
                    return -1;
186
                }
187
            }
188
189
            // Current NPCG character groups have equal length - continue
190
        }
191
192
        // If both structures are equal and NPCG length are equal - compare lengths of PCG
193 View Code Duplication
        for ($i = 0; $i < $maxStructureSize; $i++) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
194
            // If current CG is PCG
195
            if ($initial[$i][0] === 0) {
196
                if ($initial[$i][1] > $compared[$i][1]) {
197
                    return -1;
198
                }
199
200
                if ($initial[$i][1] < $compared[$i][1]) {
201
                    return 1;
202
                }
203
            }
204
205
            // Current PCG character groups have equal length - continue
206
        }
207
208
        // Structures are absolutely equal
209
        return 0;
210
    }
211
212
    /**
213
     * Sort strings array considering PCG and NPCG string structure.
214
     *
215
     * @param array $input Input array for sorting
216
     * @return array Sorted array
217
     */
218
    protected function sortArrayByKeys(array &$input)
219
    {
220
        $prefixes = array_map([$this, 'getPrefixStructure'], array_keys($input));
221
222
        usort($prefixes, [$this, 'compareStringStructure']);
223
224
        $result = [];
225
        foreach ($prefixes as $sortingData) {
226
            $result[$sortingData[0][2]] = $input[$sortingData[0][2]];
227
        }
228
229
        return $result;
230
    }
231
232
    /**
233
     * Add only unique value to array.
234
     *
235
     * @param mixed $value  Unique value
236
     * @param array $array  Array for adding unique value
237
     * @param bool  $strict Strict uniqueness check
238
     *
239
     * @return bool True if unique value was added
240
     * @see in_array();
241
     *
242
     */
243
    protected function addUniqueToArray($value, &$array, bool $strict = true)
244
    {
245
        // Create array if not array is passed
246
        if (!is_array($array)) {
247
            $array = [];
248
        }
249
250
        // Add value to array if it is unique
251
        if (!in_array($value, $array, $strict)) {
252
            $array[] = $value;
253
254
            return true;
255
        } else { // Value is already in array
256
            return false;
257
        }
258
    }
259
260
    /**
261
     * Find longest matching prefix between two strings.
262
     *
263
     * @param string $initialString  Initial string
264
     * @param string $comparedString Compared string
265
     *
266
     * @return string Longest matching prefix
267
     */
268
    protected function getLongestMatchingPrefix(string $initialString, string $comparedString): string
269
    {
270
        // Iterate and compare how string matches
271
        $longestPrefix = '';
272
273
        // Define shortest & longest strings to avoid errors
274
        $initialLength = strlen($initialString);
275
        $comparedLength = strlen($comparedString);
276
277
        // Define shortest and longest strings to avoid character matching errors
278
        $shortestString = $initialLength < $comparedLength ? $initialString : $comparedString;
279
        $longestString = $initialLength >= $comparedLength ? $initialString : $comparedString;
280
281
        // Iterate initial string characters
282
        $isPattern = false;
283
        $parametrizedPrefix = '';
284
        for ($z = 0, $length = strlen($shortestString); $z < $length; $z++) {
285
            // Pattern support
286
            // TODO: Probably can be optimized
287
            if ($isPattern || $shortestString{$z} === $this->parameterStartMarker) {
288
                $isPattern = true;
289
290
                // Concatenate longest matching prefix
291
                $parametrizedPrefix .= $shortestString{$z};
292
293
                // Compare characters with compared string
294
                if ($shortestString{$z} !== $longestString{$z}) {
295
                    // Clear pattern as pattern characters mismatch
296
                    $parametrizedPrefix = '';
0 ignored issues
show
Unused Code introduced by
$parametrizedPrefix is not used, you could remove the assignment.

This check looks for variable assignements that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten for every possible time line.

Loading history...
297
                    break;
298
                }
299
300
                // If pattern id closed unset flag fro special behaviour
301
                if ($shortestString{$z} === $this->parameterEndMarker) {
302
                    // If parametrized part ended append to longest matching prefix
303
                    $longestPrefix .= $parametrizedPrefix;
304
                    // Clear parametrized prefix as we can have other parametrized parts
305
                    $parametrizedPrefix = '';
306
                    // Reset parametrized flag
307
                    $isPattern = false;
308
                }
309
            } else {
310
                // Compare characters with compared string
311
                if ($shortestString{$z} !== $longestString{$z}) {
312
                    break; // Exit on first mismatching character
313
                }
314
315
                // Concatenate matching part of two strings
316
                $longestPrefix .= $initialString{$z};
317
            }
318
        }
319
320
        return $longestPrefix;
321
    }
322
323
    /**
324
     * Remove key string from the beginning of all sub-array strings.
325
     *
326
     * @param array  $array Input array of key => [keyStrings...]
327
     *
328
     * @return array Processed array with removed keys from beginning of sub arrays
329
     */
330
    protected function removeKeyFromArrayStrings(array $array): array
331
    {
332
        $processed = [];
333
        /** @var string[] $stringsCollection */
334
        foreach ($array as $keyString => $stringsCollection) {
335
            $lmpLength = strlen((string)$keyString);
336
            foreach ($stringsCollection as $stringValue) {
337
                // Remove LMP from string
338
                $newString = substr($stringValue, $lmpLength);
339
340
                // This string has something left besides lmp
341
                if ($newString !== false && $newString !== '') {
342
                    $processed[$keyString][] = $newString;
343
                } elseif (array_key_exists($keyString, $processed) === false) {
344
                    // Add empty array to LMP if missing
345
                    $processed[$keyString] = [];
346
                }
347
            }
348
        }
349
350
        return $processed;
351
    }
352
353
    /**
354
     * Find all duplication of source array values in compared array and remove them.
355
     *
356
     * @param array $source Source array
357
     * @param array $compared Compared array for filtering duplicates
358
     */
359
    protected function removeDuplicatesInSubArray(array $source, array &$compared)
360
    {
361
        foreach ($source as $value) {
362
            foreach ($compared as $key => &$subValue) {
363
                if ($subValue === $value) {
364
                    unset($compared[$key]);
365
                }
366
            }
367
        }
368
    }
369
370
    /**
371
     * Iterate LMP and remove duplicate strings in other LMPs.
372
     *
373
     * @param array $prefixes LMP collection, returning value
374
     */
375
    protected function filterLMPStrings(array &$prefixes)
376
    {
377
        $keys = array_keys($prefixes);
378
        for ($i = 0, $length = count($keys); $i < $length; $i++) {
379
            for ($j = $i + 1; $j < $length; $j++) {
380
                $this->removeDuplicatesInSubArray($prefixes[$keys[$i]], $prefixes[$keys[$j]]);
381
            }
382
        }
383
    }
384
385
    /**
386
     * Get collection of grouped longest matching prefixes with strings sub-array.
387
     *
388
     * @param array $input Input strings array
389
     *
390
     * @return array Longest matching prefixes array
391
     */
392
    protected function getLMPCollection(array $input): array
393
    {
394
        $longestPrefixes = [];
395
        foreach ($input as $initial) {
396
            $foundLMP = false;
397
            foreach ($input as $compared) {
398
                if ($initial !== $compared) {
399
                    $longestMatchedPrefix = $this->getLongestMatchingPrefix($initial, $compared);
400
401
                    // We have found at least one matching character between strings
402
                    if ($longestMatchedPrefix !== '') {
403
                        $foundLMP = true;
404
                        $this->addUniqueToArray($initial, $longestPrefixes[$longestMatchedPrefix]);
405
                    }
406
                }
407
            }
408
409
            // Add initial string as LMP
410
            if ($foundLMP === false) {
411
                $this->addUniqueToArray($initial, $longestPrefixes[$initial]);
412
            }
413
        }
414
415
        return $longestPrefixes;
416
    }
417
418
    /**
419
     * Recursive string similarity tree builder.
420
     *
421
     * @param string   $prefix
422
     * @param array    $input
423
     * @param TreeNode $result
424
     * @param string   $selfMarker
425
     */
426
    protected function innerProcessor(string $prefix, array $input, TreeNode $result, $selfMarker = self::SELF_NAME)
0 ignored issues
show
Unused Code introduced by
The parameter $selfMarker is not used and could be removed.

This check looks from parameters that have been defined for a function or method, but which are not used in the method body.

Loading history...
427
    {
428
        // Create tree node. Pass string identifier if present
429
        $newChild = $result->append($prefix, $this->source[$result->fullValue . $prefix] ?? '');
430
431
        /**
432
         * Iterate all combinations of strings and group by LMP, also if no LMP is
433
         * found consider strings as LMP itself
434
         */
435
        $longestPrefixes = $this->getLMPCollection($input);
436
437
        /**
438
         * Sort LMPs(array keys) ascending by key length
439
         */
440
        $longestPrefixes = $this->sortArrayByKeys($longestPrefixes);
441
442
        /**
443
         * Iterate all sorted LMP strings and remove duplicates from LMP string ordered lower
444
         */
445
        $this->filterLMPStrings($longestPrefixes);
446
447
448
        /**
449
         * After filtering LMPs remove LMP from matched string arrays
450
         */
451
        $longestPrefixes = $this->removeKeyFromArrayStrings($longestPrefixes);
452
453
        /**
454
         * Recursively iterate current level LMPs
455
         */
456
        foreach ($longestPrefixes as $longestPrefix => $strings) {
457
            $this->innerProcessor((string)$longestPrefix, $strings, $newChild);
458
        }
459
    }
460
}
461