Completed
Push — master ( 3bb451...3f9448 )
by Siro Díaz
02:11
created

TrieTree   B

Complexity

Total Complexity 48

Size/Duplication

Total Lines 275
Duplicated Lines 2.18 %

Coupling/Cohesion

Components 1
Dependencies 1

Importance

Changes 1
Bugs 1 Features 0
Metric Value
wmc 48
c 1
b 1
f 0
lcom 1
cbo 1
dl 6
loc 275
rs 8.4864

15 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 5 1
A empty() 0 3 1
A size() 0 3 1
C add() 0 29 7
C contains() 0 26 7
C delete() 0 43 11
A clear() 0 9 2
A _clear() 0 8 2
A getWords() 0 3 1
A startsWith() 0 3 1
A withPrefix() 3 14 4
A _traverseWithPrefix() 3 15 4
A getNodeFromPrefix() 0 19 4
A wordCount() 0 3 1
A count() 0 3 1

How to fix   Duplicated Code    Complexity   

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems, and corresponding solutions are:

Complex Class

 Tip:   Before tackling complexity, make sure that you eliminate any duplication first. This often can reduce the size of classes significantly.

Complex classes like TrieTree often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes. You can also have a look at the cohesion graph to spot any un-connected, or weakly-connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use TrieTree, and based on these observations, apply Extract Interface, too.

1
<?php
2
/**
3
 * DataStructures for PHP
4
 *
5
 * @link      https://github.com/SiroDiaz/DataStructures
6
 * @copyright Copyright (c) 2017 Siro Díaz Palazón
7
 * @license   https://github.com/SiroDiaz/DataStructures/blob/master/README.md (MIT License)
8
 */
9
namespace DataStructures\Trees;
10
11
use DataStructures\Trees\Nodes\TrieNode;
12
use Countable;
13
14
/**
15
 * TrieTree
16
 *
17
 * The TrieTree class is a trie (also called digital tree and sometimes radix tree or prefix tree)
18
 * that is used to get in O(m) being m the word length.
19
 * It is used in software like word corrector and word suggest.
20
 *
21
 * @author Siro Diaz Palazon <[email protected]>
22
 */
23
class TrieTree implements Countable {
24
    private $root;
25
    private $numWords;
26
    private $size;
27
    
28
    public function __construct() {
29
        $this->root = new TrieNode();
30
        $this->numWords = 0;
31
        $this->size = 0;
32
    }
33
34
    /**
35
     * Returns true if the tree is empty. Number of prefixes is 0.
36
     *
37
     * @return bool true if empty.
38
     */
39
    public function empty() : bool {
40
        return $this->size === 0;
41
    }
42
43
    /**
44
     * Returns the number of prefixes that are contained in the trie.
45
     *
46
     * @return int the num of prefixes.
47
     */
48
    public function size() : int {
49
        return $this->size;
50
    }
51
52
    /**
53
     * Inserts a new word in the tree. If the word exists it doesn't do nothing.
54
     *
55
     * @param string $word the word to be added.
56
     */
57
    public function add($word) {
58
        $word = trim($word);
59
        if(mb_strlen($word) === 0) {
60
            return;
61
        }
62
63
        $current = &$this->root;
64
        for($i = 0; $i < mb_strlen($word); $i++) {
65
            $char = mb_substr($word, $i, 1, 'UTF-8');
66
            if(!isset($current->children[$char])) {
67
                if($i === mb_strlen($word) - 1) {
68
                    $current->children[$char] = new TrieNode($char, $current, true);
69
                    $this->numWords++;
70
                } else {
71
                    $current->children[$char] = new TrieNode($char, $current, false);
72
                }
73
                $this->size++;
74
            } else {
75
                if($i === mb_strlen($word) - 1) {
76
                    if($current->children[$char]->isWord === false) {
77
                        $current->children[$char]->isWord = true;
78
                        $this->numWords++;
79
                    }
80
                }
81
            }
82
83
            $current = &$current->children[$char];
84
        }
85
    }
86
87
    /**
88
     * Returns true if the word is stored in the tree.
89
     *
90
     * @param string $word The word to check if exists.
91
     * @param bool true if is contained.
92
     */
93
    public function contains($word) : bool {
94
        $word = trim($word);
95
        $wordLength = mb_strlen($word);
96
        if($wordLength === 0) {
97
            return true;
98
        }
99
100
        $i = 0;
101
        $found = true;
102
        $current = $this->root;
103
        while($found && $i < $wordLength) {
104
            $char = mb_substr($word, $i, 1);
105
            if(isset($current->children[$char])) {
106
                $current = $current->children[$char];
107
            } else {
108
                $found = false;
109
            }
110
            $i++;
111
        }
112
113
        if($found && $current->isWord) {
114
            return true;
115
        }
116
117
        return false;
118
    }
119
120
    /**
121
     * Removes a word from the tree.
122
     *
123
     * @param string $word the word to delete if it is contained in the trie.
124
     */
125
    public function delete($word) {
126
        $wordLength = mb_strlen($word);
127
        $i = 0;
128
        $current = $this->root;
129
        
130
        while($i < $wordLength) {
131
            if($current->hasChildren()) {
132
                $char = mb_substr($word, $i, 1, 'UTF-8');
133
                if(isset($current->children[$char])) {
134
                    $node = &$current->children[$char];
135
                    if(($i === $wordLength - 1) && $node->isWord) {
136
                        if($node->hasChildren()) {
137
                            $node->isWord = false;
138
                        } else {
139
                            $parentNode = &$node->parent;
140
                            unset($parentNode->children[$char]);
141
                            $this->size--;
142
                            $j = $i - 1;
143
144
                            while($j >= 0 && $parentNode->isLeaf() && !$parentNode->isWord) {
145
                                $char = mb_substr($word, $j, 1, 'UTF-8');
146
                                $parentNode = &$parentNode->parent;
147
                                unset($parentNode->children[$char]);
148
                                $this->size--;
149
                                $j--;
150
                            }
151
                        }
152
153
                        $this->numWords--;
154
                    }
155
                }
156
            } else {
157
                return;
158
            }
159
160
            if(isset($current->children[$char])) {
161
                $current = $current->children[$char];
162
            } else {
163
                $current = $current->parent;    //TODO need to delete all leaf nodes
164
            }
165
            $i++;
166
        }
167
    }
168
169
    /**
170
     * Removes all nodes (and words) in the tree and resets the size
171
     * and word count.
172
     */
173
    public function clear() {
174
        foreach($this->root->children as $key => &$node) {
175
            $this->_clear($node);
176
            unset($this->root->children[$key]);
177
        }
178
179
        $this->size = 0;
180
        $this->numWords = 0;
181
    }
182
183
    /**
184
     * Recursive clear that removes all nodes, included leaf, except
185
     * the references to the first children.
186
     *
187
     * @param DataStructures\Trees\Nodes\TrieNode|null the node to traverse.
188
     * @return DataStructures\Trees\Nodes\TrieNode|null the next node to delete.
189
     */
190
    private function _clear(TrieNode &$node = null) {
191
        foreach($node->children as $char => &$n) {
192
            $aux = $node->children[$char];
193
            unset($node->children[$char]);
194
            $node = null;
195
            return $this->_clear($aux);
196
        }
197
    }
198
199
    public function getWords() : array {
200
        return [];
201
    }
202
203
    /**
204
     * Checks if there is any word that starts with a concrete prefix.
205
     *
206
     * @param string $prefix The prefix to look up.
207
     * @return true if there are words that start with the especified prefix.
208
     */
209
    public function startsWith($prefix) : bool {
210
        return $this->getNodeFromPrefix($prefix) !== null;
211
    }
212
213
    /**
214
     * Returns an array with all words that has the especified prefix.
215
     * For example, with prefix 'he' it will retrieve: hell, hello, ....
216
     *
217
     * @param string $prefix The prefix that must have all words.
218
     * @return array All words that contains the prefix.
219
     */
220
    public function withPrefix($prefix) : array {
221
        $node = $this->getNodeFromPrefix($prefix);
222
        $words = [];
223
        if($node !== null) {
224
            if($node->isWord) {
225
                $words[] = $prefix;
226
            }
227 View Code Duplication
            foreach($node->children as $char => $n) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
228
                $words = $this->_traverseWithPrefix($node->children[$char], $words, $prefix . $char);
229
            }
230
        }
231
232
        return $words;
233
    }
234
235
    /**
236
     *
237
     */
238
    private function _traverseWithPrefix(TrieNode $node = null, $words = [], $word) {
239
        if($node->isWord) {
240
            $words[] = $word;
241
        }
242
243
        if(empty($node->children)) {
244
            return $words;
245
        }
246
247 View Code Duplication
        foreach($node->children as $char => &$n) {
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

Loading history...
248
            $words = $this->_traverseWithPrefix($node->children[$char], $words, $word . $char);
249
        }
250
251
        return $words;
252
    }
253
254
    /**
255
     * Retrieves the node where ends the prefix especified.
256
     *
257
     * @param string $prefix The prefix to look for.
258
     * @return DataStructures\Trees\Nodes\TrieNode|null null if not found.
259
     */
260
    private function getNodeFromPrefix($prefix) {
261
        if($this->size === 0) {
262
            return null;
263
        }
264
        
265
        $i = 0;
266
        $current = $this->root;
267
        $prefixLength = mb_strlen(trim($prefix));
268
        while($i < $prefixLength) {
269
            $char = mb_substr($prefix, $i, 1, 'UTF-8');
270
            if(!isset($current->children[$char])) {
271
                return null;
272
            }
273
            $current = $current->children[$char];
274
            $i++;
275
        }
276
277
        return $current;
278
    }
279
280
    /**
281
     * Gets the number of words stored in the tree.
282
     *
283
     * @return int The word count.
284
     */
285
    public function wordCount() : int {
286
        return $this->numWords;
287
    }
288
289
    /**
290
     * Gets the number of prefixes stored in the tree.
291
     *
292
     * @return int The word count.
293
     */
294
    public function count() : int {
295
        return $this->size();
296
    }
297
}