Passed
Push — master ( 98b05a...c181e2 )
by Vitaly
14:04 queued 04:18
created

StringConditionTree::getLongestMatchingPrefix()   C

Complexity

Conditions 9
Paths 24

Size

Total Lines 51
Code Lines 23

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 51
rs 6.2727
c 0
b 0
f 0
cc 9
eloc 23
nc 24
nop 2

How to fix   Long Method   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

1
<?php declare(strict_types = 1);
2
/**
3
 * Created by Vitaly Iegorov <[email protected]>.
4
 * on 02.03.17 at 13:25
5
 */
6
namespace samsonframework\stringconditiontree;
7
8
/**
9
 * Class StringConditionTree
10
 *
11
 * @author Vitaly Egorov <[email protected]>
12
 */
13
class StringConditionTree
14
{
15
    /** Tree node root element identifier, needed for recursion */
16
    const ROOT_NAME = '';
17
18
    /** Final tree node branch identifier */
19
    const SELF_NAME = '@self';
20
21
    /** String parameter start marker */
22
    const PARAMETER_START = '{';
23
24
    /** String parameter end marker */
25
    const PARAMETER_END = '}';
26
27
    /** Parameter sorting length value for counting */
28
    const PARAMETER_COF = 2000;
29
30
    /** @var TreeNode Resulting collection for debugging */
31
    protected $debug;
32
33
    /** @var array Collection of input string => identifier */
34
    protected $source;
35
36
    /** @var string Parametrized string start marker */
37
    protected $parameterStartMarker = self::PARAMETER_START;
38
39
    /** @var string Parametrized string end marker */
40
    protected $parameterEndMarker = self::PARAMETER_END;
41
42
    /**
43
     * StringConditionTree constructor.
44
     *
45
     * @param string $parameterStartMarker Parametrized string start marker
46
     * @param string $parameterEndMarker   Parametrized string end marker
47
     */
48
    public function __construct(
49
        string $parameterStartMarker = self::PARAMETER_START,
50
        string $parameterEndMarker = self::PARAMETER_END
51
    )
52
    {
0 ignored issues
show
Coding Style introduced by
The closing parenthesis and the opening brace of a multi-line function declaration must be on the same line
Loading history...
53
        $this->parameterStartMarker = $parameterStartMarker;
54
        $this->parameterEndMarker = $parameterEndMarker;
55
    }
56
57
    /**
58
     * Build similarity strings tree.
59
     *
60
     * @param array $input Collection of strings
61
     *
62
     * @return TreeNode Resulting similarity strings tree
63
     */
64
    public function process(array $input): TreeNode
65
    {
66
        $this->source = $input;
67
68
        /**
69
         * We need to find first matching character that present at least at one two string
70
         * to start building tree. Otherwise there is no reason to build tree.
71
         */
72
        $this->innerProcessor(self::ROOT_NAME, array_keys($input), $this->debug = new TreeNode());
73
74
        $this->debug = $this->debug->children[self::ROOT_NAME];
0 ignored issues
show
Documentation Bug introduced by
$this->debug->children[self::ROOT_NAME] is of type object<samsonframework\s...ntree\IterableTreeNode>, but the property $debug was declared to be of type object<samsonframework\s...conditiontree\TreeNode>. Are you sure that you always receive this specific sub-class here, or does it make sense to add an instanceof check?

Our type inference engine has found a suspicous assignment of a value to a property. This check raises an issue when a value that can be of a given class or a super-class is assigned to a property that is type hinted more strictly.

Either this assignment is in error or an instanceof check should be added for that assignment.

class Alien {}

class Dalek extends Alien {}

class Plot
{
    /** @var  Dalek */
    public $villain;
}

$alien = new Alien();
$plot = new Plot();
if ($alien instanceof Dalek) {
    $plot->villain = $alien;
}
Loading history...
75
76
        return $this->debug;
77
    }
78
79
    /**
80
     * Recursive string similarity tree builder.
81
     *
82
     * @param string   $prefix
83
     * @param array    $input
84
     * @param TreeNode $result
85
     */
86
    protected function innerProcessor(string $prefix, array $input, TreeNode $result)
87
    {
88
        // Create tree node. Pass string identifier if present
89
        $newChild = $result->append($prefix, $this->source[$result->fullValue . $prefix] ?? '');
90
91
        /**
92
         * Iterate all combinations of strings and group by LMP, also if no LMP is
93
         * found consider strings as LMP itself
94
         */
95
        $longestPrefixes = $this->getLMPCollection($input);
96
97
        /**
98
         * Sort LMPs(array keys) ascending by key length
99
         */
100
        $longestPrefixes = $this->sortArrayByKeys($longestPrefixes);
101
102
        /**
103
         * Iterate all sorted LMP strings and remove duplicates from LMP string ordered lower
104
         */
105
        $this->filterLMPStrings($longestPrefixes);
106
107
108
        /**
109
         * After filtering LMPs remove LMP from matched string arrays
110
         */
111
        $longestPrefixes = $this->removeKeyFromArrayStrings($longestPrefixes);
112
113
        /**
114
         * Recursively iterate current level LMPs
115
         */
116
        foreach ($longestPrefixes as $longestPrefix => $strings) {
117
            $this->innerProcessor((string)$longestPrefix, $strings, $newChild);
118
        }
119
    }
120
121
    /**
122
     * Get collection of grouped longest matching prefixes with strings sub-array.
123
     *
124
     * @param array $input Input strings array
125
     *
126
     * @return array Longest matching prefixes array
127
     */
128
    protected function getLMPCollection(array $input): array
129
    {
130
        $longestPrefixes = [];
131
        foreach ($input as $initial) {
132
            $foundLMP = false;
133
            foreach ($input as $compared) {
134
                if ($initial !== $compared) {
135
                    $longestMatchedPrefix = $this->getLongestMatchingPrefix($initial, $compared);
136
137
                    // We have found at least one matching character between strings
138
                    if ($longestMatchedPrefix !== '') {
139
                        $foundLMP = true;
140
                        $this->addUniqueToArray($initial, $longestPrefixes[$longestMatchedPrefix]);
141
                    }
142
                }
143
            }
144
145
            // Add initial string as LMP
146
            if ($foundLMP === false) {
147
                $this->addUniqueToArray($initial, $longestPrefixes[$initial]);
148
            }
149
        }
150
151
        return $longestPrefixes;
152
    }
153
154
    /**
155
     * Find longest matching prefix between two strings.
156
     *
157
     * @param string $initialString  Initial string
158
     * @param string $comparedString Compared string
159
     *
160
     * @return string Longest matching prefix
161
     */
162
    protected function getLongestMatchingPrefix(string $initialString, string $comparedString): string
163
    {
164
        // Iterate and compare how string matches
165
        $longestPrefix = '';
166
167
        // Define shortest & longest strings to avoid errors
168
        $initialLength = strlen($initialString);
169
        $comparedLength = strlen($comparedString);
170
171
        // Define shortest and longest strings to avoid character matching errors
172
        $shortestString = $initialLength < $comparedLength ? $initialString : $comparedString;
173
        $longestString = $initialLength >= $comparedLength ? $initialString : $comparedString;
174
175
        // Iterate initial string characters
176
        $isPattern = false;
177
        $parametrizedPrefix = '';
178
        for ($z = 0, $length = strlen($shortestString); $z < $length; $z++) {
179
            // Pattern support
180
            if ($isPattern || $shortestString{$z} === $this->parameterStartMarker) {
181
                $isPattern = true;
182
183
                // Concatenate longest matching prefix
184
                $parametrizedPrefix .= $shortestString{$z};
185
186
                // Compare characters with compared string
187
                if ($shortestString{$z} !== $longestString{$z}) {
188
                    break;
189
                }
190
191
                // If pattern id closed unset flag fro special behaviour
192
                if ($shortestString{$z} === $this->parameterEndMarker) {
193
                    // If parametrized part ended append to longest matching prefix
194
                    $longestPrefix .= $parametrizedPrefix;
195
                    // Clear parametrized prefix as we can have other parametrized parts
196
                    $parametrizedPrefix = '';
197
                    // Reset parametrized flag
198
                    $isPattern = false;
199
                }
200
            } else {
201
                // Compare characters with compared string
202
                if ($shortestString{$z} !== $longestString{$z}) {
203
                    break; // Exit on first mismatching character
204
                }
205
206
                // Concatenate matching part of two strings
207
                $longestPrefix .= $initialString{$z};
208
            }
209
        }
210
211
        return $longestPrefix;
212
    }
213
214
    /**
215
     * Add only unique value to array.
216
     *
217
     * @param mixed $value  Unique value
218
     * @param array $array  Array for adding unique value
219
     * @param bool  $strict Strict uniqueness check
220
     *
221
     * @return bool True if unique value was added
222
     * @see in_array();
223
     *
224
     */
225
    protected function addUniqueToArray($value, &$array, bool $strict = true)
226
    {
227
        // Create array if not array is passed
228
        if (!is_array($array)) {
229
            $array = [];
230
        }
231
232
        // Add value to array if it is unique
233
        if (!in_array($value, $array, $strict)) {
234
            $array[] = $value;
235
236
            return true;
237
        }
238
239
        // Value is already in array
240
        return false;
241
    }
242
243
    /**
244
     * Sort strings array considering PCG and NPCG string structure.
245
     *
246
     * @param array $input Input array for sorting
247
     *
248
     * @return array Sorted array
249
     */
250
    protected function sortArrayByKeys(array &$input)
251
    {
252
        $prefixes = array_map([$this, 'getPrefixStructure'], array_keys($input));
253
254
        usort($prefixes, [$this, 'compareStringStructure']);
255
256
        $result = [];
257
        foreach ($prefixes as $sortingData) {
258
            $result[$sortingData[0][2]] = $input[$sortingData[0][2]];
259
        }
260
261
        return $result;
262
    }
263
264
    /**
265
     * Iterate LMP and remove duplicate strings in other LMPs.
266
     *
267
     * @param array $prefixes LMP collection, returning value
268
     */
269
    protected function filterLMPStrings(array &$prefixes)
270
    {
271
        $keys = array_keys($prefixes);
272
        for ($i = 0, $length = count($keys); $i < $length; $i++) {
273
            for ($j = $i + 1; $j < $length; $j++) {
274
                $this->removeDuplicatesInSubArray($prefixes[$keys[$i]], $prefixes[$keys[$j]]);
275
            }
276
        }
277
    }
278
279
    /**
280
     * Find all duplication of source array values in compared array and remove them.
281
     *
282
     * @param array $source   Source array
283
     * @param array $compared Compared array for filtering duplicates
284
     */
285
    protected function removeDuplicatesInSubArray(array $source, array &$compared)
286
    {
287
        foreach ($source as $value) {
288
            foreach ($compared as $key => &$subValue) {
289
                if ($subValue === $value) {
290
                    unset($compared[$key]);
291
                }
292
            }
293
        }
294
    }
295
296
    /**
297
     * Remove key string from the beginning of all sub-array strings.
298
     *
299
     * @param array $array Input array of key => [keyStrings...]
300
     *
301
     * @return array Processed array with removed keys from beginning of sub arrays
302
     */
303
    protected function removeKeyFromArrayStrings(array $array): array
304
    {
305
        $processed = [];
306
        /** @var string[] $stringsCollection */
307
        foreach ($array as $keyString => $stringsCollection) {
308
            $lmpLength = strlen((string)$keyString);
309
            foreach ($stringsCollection as $stringValue) {
310
                // Remove LMP from string
311
                $newString = substr($stringValue, $lmpLength);
312
313
                // This string has something left besides lmp
314
                if ($newString !== false && $newString !== '') {
315
                    $processed[$keyString][] = $newString;
316
                } elseif (array_key_exists($keyString, $processed) === false) {
317
                    // Add empty array to LMP if missing
318
                    $processed[$keyString] = [];
319
                }
320
            }
321
        }
322
323
        return $processed;
324
    }
325
326
    /**
327
     * Build string character group structure considering parametrized
328
     * and not parametrized character groups and their length(PCG, NPCG).
329
     *
330
     * @param string $prefix Prefix string
331
     *
332
     * @return array String character groups structure
333
     */
334
    protected function getPrefixStructure(string $prefix): array
335
    {
336
        /** @var array $structureMatrix String PCG(0)/NPCG(1) structure matrix for comparison */
337
        $structureMatrix = [];
338
339
        // Flags for showing current string character group
340
        /** @var bool $isPCG Flags showing PCG started */
341
        $isPCG = false;
342
        /** @var bool $isNPCG Flags showing NPCG started */
343
        $isNPCG = true;
344
345
        // Pointer to current CG to count string NPCG length
346
        $currentCG = 0;
347
348
        /**
349
         * TODO: Try to find PCG filter :... pattern and process it also as
350
         * PCG with filters should be prioritized over PSG without filter
351
         * even if filter is .*
352
         */
353
354
        // Iterate string by characters
355
        for ($i = 0, $length = strlen($prefix); $i < $length; $i++) {
356
            if (!$isPCG && $prefix{$i} === $this->parameterStartMarker) {
357
                $isPCG = true;
358
                $isNPCG = false;
359
                $structureMatrix[] = [0, 0, $prefix];
360
                $currentCG = &$structureMatrix[count($structureMatrix) - 1][1];
361
            } elseif ($isPCG && $prefix{$i} === $this->parameterEndMarker) {
362
                $isPCG = false;
363
                $isNPCG = true;
364
            } elseif ($isNPCG) {
365
                $isNPCG = false;
366
                $structureMatrix[] = [1, 0, $prefix];
367
                $currentCG = &$structureMatrix[count($structureMatrix) - 1][1];
368
            }
369
370
            // Store current character group length
371
            $currentCG++;
372
        }
373
374
        return $structureMatrix;
375
    }
376
377
    /**
378
     * Compare string structures.
379
     *
380
     * @param array $initial  Initial string structure
381
     * @param array $compared Compared string structure
382
     *
383
     * @return int Result of array elements comparison
384
     */
385
    protected function compareStringStructure(array $initial, array $compared): int
386
    {
387
        $maxStructureSize = max(count($initial), count($compared));
388
389
        // Make structures same size preserving previous existing structure value
390
        for ($i = 1; $i < $maxStructureSize; $i++) {
391
            if (!array_key_exists($i, $initial)) {
392
                $initial[$i] = $initial[$i - 1];
393
            }
394
            if (!array_key_exists($i, $compared)) {
395
                $compared[$i] = $compared[$i - 1];
396
            }
397
        }
398
399
        // Iterate every structure group
400
        for ($i = 0; $i < $maxStructureSize; $i++) {
401
            // If initial structure has NPCG than it has higher priority
402
            if ($initial[$i][0] > $compared[$i][0]) {
403
                return -1;
404
            }
405
406
            // If compared structure has NPCG than it has higher priority
407
            if ($initial[$i][0] < $compared[$i][0]) {
408
                return 1;
409
            }
410
411
            // Compare NOT starting NPCG length
412
            if ($i > 0 && $initial[$i][0] === 1) {
413
                if ($initial[$i][1] > $compared[$i][1]) {
414
                    return -1;
415
                }
416
417
                if ($initial[$i][1] < $compared[$i][1]) {
418
                    return 1;
419
                }
420
            }
421
422
            // They are equal continue to next structure group comparison
423
        }
424
425
        // If both structures are equal compare lengths of NPCG
426 View Code Duplication
        for ($i = 0; $i < $maxStructureSize; $i++) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
427
            // If current CG is NPCG
428
            if ($initial[$i][0] === 1) {
429
                if ($initial[$i][1] > $compared[$i][1]) {
430
                    return 1;
431
                }
432
433
                if ($initial[$i][1] < $compared[$i][1]) {
434
                    return -1;
435
                }
436
            }
437
438
            // Current NPCG character groups have equal length - continue
439
        }
440
441
        // If both structures are equal and NPCG length are equal - compare lengths of PCG
442 View Code Duplication
        for ($i = 0; $i < $maxStructureSize; $i++) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
443
            // If current CG is PCG
444
            if ($initial[$i][0] === 0) {
445
                if ($initial[$i][1] > $compared[$i][1]) {
446
                    return -1;
447
                }
448
449
                if ($initial[$i][1] < $compared[$i][1]) {
450
                    return 1;
451
                }
452
            }
453
454
            // Current PCG character groups have equal length - continue
455
        }
456
457
        // Structures are absolutely equal
458
        return 0;
459
    }
460
}
461