Duplicated code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
Common duplication problems and their corresponding solutions are:
Complex classes like MetaExtractor often do many different things. To break such a class down, first identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefix or suffix. You can also inspect the cohesion graph to spot unconnected or weakly connected components.
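The prefix heuristic can be tried on the method names visible in the MetaExtractor listing on this page. The sketch below is only an illustration: `groupByPrefix` is a hypothetical helper, and using the first two camelCase words as the prefix is an assumption rather than a fixed rule.

```php
<?php
// Method names taken from the MetaExtractor listing on this page.
$methods = [
    'run', 'getOpenGraph', 'cleanTitle', 'getTitle',
    'getNodesByLowercasePropertyValue', 'getMetaContent',
    'getMetaLanguage', 'getMetaDescription', 'getMetaKeywords',
    'getCanonicalLink', 'getTags', 'getVideos', 'getLinks', 'getPopularWords',
];

/**
 * Group method names by their first two camelCase words (e.g. "getMeta").
 * Hypothetical helper, not part of any library.
 */
function groupByPrefix(array $names): array {
    $groups = [];
    foreach ($names as $name) {
        // Split before each uppercase letter: "getMetaContent" -> get, Meta, Content.
        $words = preg_split('/(?=[A-Z])/', $name, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) < 3) {
            continue; // too short to carry a two-word prefix plus a suffix
        }
        $groups[$words[0] . $words[1]][] = $name;
    }
    // Keep only prefixes shared by at least two methods.
    return array_filter($groups, function (array $group) {
        return count($group) >= 2;
    });
}

$groups = groupByPrefix($methods);
```

For this listing, the only shared prefix is `getMeta`, which covers `getMetaContent`, `getMetaLanguage`, `getMetaDescription` and `getMetaKeywords`. That makes the meta-tag lookups a plausible candidate for a component of their own.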
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
While breaking up the class, it is a good idea to analyze how other classes use MetaExtractor and, based on those observations, apply Extract Interface as well.
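As a rough sketch of what Extract Class combined with Extract Interface could look like for the meta-tag lookups, consider the following. Everything here is hypothetical: the names `MetaTagReaderInterface` and `MetaTagReader`, the method names, and the lookup logic itself, since the listing does not show the method bodies. It also uses PHP's built-in DOM extension rather than the DOMWrap classes that MetaExtractor references, purely to stay self-contained.

```php
<?php
// Hypothetical interface: what callers need from the extracted component.
interface MetaTagReaderInterface {
    public function content(string $property, string $value, string $attr = 'content'): string;
}

// Hypothetical extracted class that owns the meta-tag lookups.
final class MetaTagReader implements MetaTagReaderInterface {
    /** @var DOMXPath */
    private $xpath;

    public function __construct(DOMDocument $doc) {
        $this->xpath = new DOMXPath($doc);
    }

    public function content(string $property, string $value, string $attr = 'content'): string {
        foreach ($this->xpath->query('//meta') as $meta) {
            // Compare case-insensitively, so name="Description" also matches.
            if (strtolower($meta->getAttribute($property)) === strtolower($value)) {
                return trim($meta->getAttribute($attr));
            }
        }
        return ''; // no matching meta tag found
    }

    public function description(): string {
        return $this->content('name', 'description');
    }

    public function keywords(): string {
        return $this->content('name', 'keywords');
    }
}

$doc = new DOMDocument();
// Suppress warnings that libxml may raise on imperfect HTML.
@$doc->loadHTML('<html><head><meta name="Description" content=" A test page ">'
    . '<meta name="keywords" content="a, b"></head><body></body></html>');
$reader = new MetaTagReader($doc);
```

With this split, the original class would depend only on the interface and delegate to it, for example by calling `$reader->description()` where it previously called its own private method.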
<?php
// ...

class MetaExtractor extends AbstractModule implements ModuleInterface {
    use ArticleMutatorTrait;

    /** @var string[] */
    protected static $SPLITTER_CHARS = [
        '|', '-', '»', ':',
    ];

    /** @var string */
    protected static $A_REL_TAG_SELECTOR = "a[rel='tag'], a[href*='/tag/']";

    /** @var string[] */
    protected static $VIDEO_PROVIDERS = [
        'youtube\.com',
        'youtu\.be',
        'vimeo\.com',
        'blip\.tv',
        'dailymotion\.com',
        'dai\.ly',
        'flickr\.com',
        'flic\.kr',
    ];

    /**
     * @param Article $article
     */
    public function run(Article $article) { /* ... */ }

    /**
     * Retrieve all OpenGraph meta data
     *
     * Ported from python-goose https://github.com/grangier/python-goose/ by Xavier Grangier
     *
     * @return string[]
     */
    private function getOpenGraph() { /* ... */ }

    /**
     * Clean title text
     *
     * Ported from python-goose https://github.com/grangier/python-goose/ by Xavier Grangier
     *
     * @param string $title
     *
     * @return string
     */
    private function cleanTitle($title) { /* ... */ }

    /**
     * Get article title
     *
     * Ported from python-goose https://github.com/grangier/python-goose/ by Xavier Grangier
     *
     * @return string
     */
    private function getTitle() { /* ... */ }

    /**
     * @param Document $doc
     * @param string $tag
     * @param string $property
     * @param string $value
     *
     * @return \DOMWrap\NodeList
     */
    private function getNodesByLowercasePropertyValue(Document $doc, $tag, $property, $value) { /* ... */ }

    /**
     * @param Document $doc
     * @param string $property
     * @param string $value
     * @param string $attr
     *
     * @return string
     */
    private function getMetaContent(Document $doc, $property, $value, $attr = 'content') { /* ... */ }

    /**
     * If the article has meta language set in the source, use that
     *
     * @return string
     */
    private function getMetaLanguage() { /* ... */ }

    /**
     * If the article has meta description set in the source, use that
     *
     * @return string
     */
    private function getMetaDescription() { /* ... */ }

    /**
     * If the article has meta keywords set in the source, use that
     *
     * @return string
     */
    private function getMetaKeywords() { /* ... */ }

    /**
     * If the article has meta canonical link set in the url
     *
     * @return string
     */
    private function getCanonicalLink() { /* ... */ }

    /**
     * @return string[]
     */
    private function getTags() { /* ... */ }

    /**
     * Pulls out videos we like
     *
     * @return string[]
     */
    private function getVideos() { /* ... */ }

    /**
     * Pulls out links we like
     *
     * @return string[]
     */
    private function getLinks() { /* ... */ }

    /**
     * @return string[]
     */
    private function getPopularWords() { /* ... */ }
}
If you need to duplicate the same code in three or more places, we strongly encourage you to extract it into a single class or operation.
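A minimal sketch of extracting duplicated code into a single operation follows. All function names are hypothetical and are not taken from MetaExtractor. The assumption is that three call sites previously each repeated the same whitespace-collapsing, trimming and lowercasing expression.

```php
<?php
/**
 * Single home for the formerly duplicated logic (hypothetical helper).
 */
function normalizeText(string $text): string {
    // Collapse runs of whitespace, trim both ends, then lowercase.
    return strtolower(trim(preg_replace('/\s+/', ' ', $text)));
}

// The three former copies now delegate to the shared operation.
function cleanTag(string $tag): string {
    return normalizeText($tag);
}

function cleanKeyword(string $keyword): string {
    return normalizeText($keyword);
}

function cleanLinkText(string $text): string {
    // Call-site specific behavior stays local: drop a trailing period.
    return rtrim(normalizeText($text), '.');
}
```

Any future change to the normalization rule then happens in one place, while behavior that genuinely differs per call site stays with its caller.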