Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
Common duplication problems and their corresponding solutions are:
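One typical case, sketched here with hypothetical names that are not taken from the class under review: the same few statements are repeated in three functions, and the fix is to extract them into a single shared helper (Extract Method).

```php
<?php
// Hypothetical example: all names and logic are illustrative only.

// Before: the same normalization is written out three times.
function username_key_before($name)
{
    return strtolower(trim($name));
}

function email_key_before($email)
{
    return strtolower(trim($email));
}

function tag_key_before($tag)
{
    return strtolower(trim($tag));
}

// After: the shared logic lives in one helper, so a change is made once.
function normalize_key($value)
{
    return strtolower(trim($value));
}

function username_key($name)
{
    return normalize_key($name);
}

function email_key($email)
{
    return normalize_key($email);
}

function tag_key($tag)
{
    return normalize_key($tag);
}
```

If the normalization rule changes later, for example to also collapse inner whitespace, only `normalize_key` needs to be touched.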
Complex classes like utf8 often do many different things. To break such a class down, first identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefix or suffix. You can also inspect the cohesion graph to spot unconnected or weakly connected components.
Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster to apply.
While breaking up the class, it is a good idea to analyze how other classes use utf8 and, based on these observations, apply Extract Interface as well.
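As a rough sketch of what this could look like, the methods of utf8 whose names contain `ascii` (`is_ascii`, `strip_ascii_ctrl`, `strip_non_ascii`) might be moved into their own class behind a small interface. The names `AsciiChecker` and `AsciiTools` are hypothetical, and the method bodies are assumptions, since the listing does not show the original implementations.

```php
<?php
// Hypothetical sketch: AsciiChecker and AsciiTools are illustrative names.
// The method bodies are assumptions, not the original implementations.

// Extract Interface: callers that only test for ASCII depend on this.
interface AsciiChecker
{
    public function is_ascii($str);
}

// Extract Class: the ASCII-related methods grouped by their shared name part.
final class AsciiTools implements AsciiChecker
{
    public function is_ascii($str)
    {
        // True when no byte falls outside the 7-bit range 0x00-0x7F.
        return ! preg_match('/[^\x00-\x7F]/', $str);
    }

    public function strip_ascii_ctrl($str)
    {
        // Remove control codes, keeping tab, line feed and carriage return.
        return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/', '', $str);
    }

    public function strip_non_ascii($str)
    {
        // Remove every byte above 0x7F.
        return preg_replace('/[^\x00-\x7F]+/', '', $str);
    }
}
```

A caller that only needs the check can then type-hint against `AsciiChecker` rather than the whole utf8 class.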
```php
<?php defined('SYSPATH') or die('No direct access allowed.');

final class utf8
{

    // Called methods
    public static $called = array();

    /**
     * Recursively cleans arrays, objects, and strings. Removes ASCII control
     * codes and converts to UTF-8 while silently discarding incompatible
     * UTF-8 characters.
     *
     * @param string string to clean
     * @return string
     */
    public static function clean($str)

    /**
     * Tests whether a string contains only 7bit ASCII bytes. This is used to
     * determine when to use native functions or UTF-8 functions.
     *
     * @param string string to check
     * @return bool
     */
    public static function is_ascii($str)

    /**
     * Strips out device control codes in the ASCII range.
     *
     * @param string string to clean
     * @param string $str
     * @return string
     */
    public static function strip_ascii_ctrl($str)

    /**
     * Strips out all non-7bit ASCII bytes.
     *
     * @param string string to clean
     * @return string
     */
    public static function strip_non_ascii($str)

    /**
     * Replaces special/accented UTF-8 characters by ASCII-7 'equivalents'.
     *
     * @author Andreas Gohr
     *
     * @param string string to transliterate
     * @param integer -1 lowercase only, +1 uppercase only, 0 both cases
     * @return string
     */
    public static function transliterate_to_ascii($str, $case = 0)

    /**
     * Returns the length of the given string.
     * @see http://php.net/strlen
     *
     * @param string string being measured for length
     * @return integer
     */
    public static function strlen($str)

    /**
     * Finds position of first occurrence of a UTF-8 string.
     * @see http://php.net/strlen
     *
     * @author Harry Fuecks
     *
     * @param string haystack
     * @param string needle
     * @param integer offset from which character in haystack to start searching
     * @param string $str
     * @return integer position of needle
     * @return boolean FALSE if the needle is not found
     */
    public static function strpos($str, $search, $offset = 0)

    /**
     * Finds position of last occurrence of a char in a UTF-8 string.
     * @see http://php.net/strrpos
     *
     * @author Harry Fuecks
     *
     * @param string haystack
     * @param string needle
     * @param integer offset from which character in haystack to start searching
     * @param string $str
     * @return integer position of needle
     * @return boolean FALSE if the needle is not found
     */
    public static function strrpos($str, $search, $offset = 0)

    /**
     * Returns part of a UTF-8 string.
     * @see http://php.net/substr
     *
     * @author Chris Smith
     *
     * @param string input string
     * @param integer offset
     * @param integer length limit
     * @return string
     */
    public static function substr($str, $offset, $length = null)

    /**
     * Replaces text within a portion of a UTF-8 string.
     * @see http://php.net/substr_replace
     *
     * @author Harry Fuecks
     *
     * @param string input string
     * @param string replacement string
     * @param integer offset
     * @return string
     */
    public static function substr_replace($str, $replacement, $offset, $length = null)

    /**
     * Makes a UTF-8 string lowercase.
     * @see http://php.net/strtolower
     *
     * @author Andreas Gohr
     *
     * @param string mixed case string
     * @return string
     */
    public static function strtolower($str)

    /**
     * Makes a UTF-8 string uppercase.
     * @see http://php.net/strtoupper
     *
     * @author Andreas Gohr
     *
     * @param string mixed case string
     * @param string $str
     * @return string
     */
    public static function strtoupper($str)

    /**
     * Makes a UTF-8 string's first character uppercase.
     * @see http://php.net/ucfirst
     *
     * @author Harry Fuecks
     *
     * @param string mixed case string
     * @return string
     */
    public static function ucfirst($str)

    /**
     * Makes the first character of every word in a UTF-8 string uppercase.
     * @see http://php.net/ucwords
     *
     * @author Harry Fuecks
     *
     * @param string mixed case string
     * @return string
     */
    public static function ucwords($str)

    /**
     * Case-insensitive UTF-8 string comparison.
     * @see http://php.net/strcasecmp
     *
     * @author Harry Fuecks
     *
     * @param string string to compare
     * @param string string to compare
     * @return integer less than 0 if str1 is less than str2
     * @return integer greater than 0 if str1 is greater than str2
     * @return integer 0 if they are equal
     */
    public static function strcasecmp($str1, $str2)

    /**
     * Returns a string or an array with all occurrences of search in subject (ignoring case).
     * replaced with the given replace value.
     * @see http://php.net/str_ireplace
     *
     * @note It's not fast and gets slower if $search and/or $replace are arrays.
     * @author Harry Fuecks
     *
     * @param string|array text to replace
     * @param string|array replacement text
     * @param string|array subject text
     * @param integer number of matched and replaced needles will be returned via this parameter which is passed by reference
     * @return string if the input was a string
     * @return array if the input was an array
     */
    public static function str_ireplace($search, $replace, $str, & $count = null)

    /**
     * Case-insenstive UTF-8 version of strstr. Returns all of input string
     * from the first occurrence of needle to the end.
     * @see http://php.net/stristr
     *
     * @author Harry Fuecks
     *
     * @param string input string
     * @param string needle
     * @return string matched substring if found
     * @return boolean FALSE if the substring was not found
     */
    public static function stristr($str, $search)

    /**
     * Finds the length of the initial segment matching mask.
     * @see http://php.net/strspn
     *
     * @author Harry Fuecks
     *
     * @param string input string
     * @param string mask for search
     * @param integer start position of the string to examine
     * @param integer length of the string to examine
     * @return integer length of the initial segment that contains characters in the mask
     */
    public static function strspn($str, $mask, $offset = null, $length = null)

    /**
     * Finds the length of the initial segment not matching mask.
     * @see http://php.net/strcspn
     *
     * @author Harry Fuecks
     *
     * @param string input string
     * @param string mask for search
     * @param integer start position of the string to examine
     * @param integer length of the string to examine
     * @return integer length of the initial segment that contains characters not in the mask
     */
    public static function strcspn($str, $mask, $offset = null, $length = null)

    /**
     * Pads a UTF-8 string to a certain length with another string.
     * @see http://php.net/str_pad
     *
     * @author Harry Fuecks
     *
     * @param string input string
     * @param integer desired string length after padding
     * @param string string to use as padding
     * @param string padding type: STR_PAD_RIGHT, STR_PAD_LEFT, or STR_PAD_BOTH
     * @return string
     */
    public static function str_pad($str, $final_str_length, $pad_str = ' ', $pad_type = STR_PAD_RIGHT)

    /**
     * Converts a UTF-8 string to an array.
     * @see http://php.net/str_split
     *
     * @author Harry Fuecks
     *
     * @param string input string
     * @param integer maximum length of each chunk
     * @param string $str
     * @return array
     */
    public static function str_split($str, $split_length = 1)

    /**
     * Reverses a UTF-8 string.
     * @see http://php.net/strrev
     *
     * @author Harry Fuecks
     *
     * @param string string to be reversed
     * @return string
     */
    public static function strrev($str)

    /**
     * Strips whitespace (or other UTF-8 characters) from the beginning and
     * end of a string.
     * @see http://php.net/trim
     *
     * @author Andreas Gohr
     *
     * @param string input string
     * @param string string of characters to remove
     * @return string
     */
    public static function trim($str, $charlist = null)

    /**
     * Strips whitespace (or other UTF-8 characters) from the beginning of a string.
     * @see http://php.net/ltrim
     *
     * @author Andreas Gohr
     *
     * @param string input string
     * @param string string of characters to remove
     * @param string $str
     * @return string
     */
    public static function ltrim($str, $charlist = null)

    /**
     * Strips whitespace (or other UTF-8 characters) from the end of a string.
     * @see http://php.net/rtrim
     *
     * @author Andreas Gohr
     *
     * @param string input string
     * @param string string of characters to remove
     * @return string
     */
    public static function rtrim($str, $charlist = null)

    /**
     * Returns the unicode ordinal for a character.
     * @see http://php.net/ord
     *
     * @author Harry Fuecks
     *
     * @param string UTF-8 encoded character
     * @return integer
     */
    public static function ord($chr)

    /**
     * Takes an UTF-8 string and returns an array of ints representing the Unicode characters.
     * Astral planes are supported i.e. the ints in the output can be > 0xFFFF.
     * Occurrances of the BOM are ignored. Surrogates are not allowed.
     *
     * The Original Code is Mozilla Communicator client code.
     * The Initial Developer of the Original Code is Netscape Communications Corporation.
     * Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer.
     * Ported to PHP by Henri Sivonen, see http://hsivonen.iki.fi/php-utf8/.
     * Slight modifications to fit with phputf8 library by Harry Fuecks.
     *
     * @param string UTF-8 encoded string
     * @return array unicode code points
     * @return boolean FALSE if the string is invalid
     */
    public static function to_unicode($str)

    /**
     * Takes an array of ints representing the Unicode characters and returns a UTF-8 string.
     * Astral planes are supported i.e. the ints in the input can be > 0xFFFF.
     * Occurrances of the BOM are ignored. Surrogates are not allowed.
     *
     * The Original Code is Mozilla Communicator client code.
     * The Initial Developer of the Original Code is Netscape Communications Corporation.
     * Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer.
     * Ported to PHP by Henri Sivonen, see http://hsivonen.iki.fi/php-utf8/.
     * Slight modifications to fit with phputf8 library by Harry Fuecks.
     *
     * @param array unicode code points representing a string
     * @return string utf8 string of characters
     * @return boolean FALSE if a code point cannot be found
     */
    public static function from_unicode($arr)

} // End utf8
```
Short variable names may make your code harder to understand. Variable names should be self-descriptive. This check looks for variable names that are shorter than a configured minimum length.
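A small before-and-after comparison may make the point concrete. Both functions below are hypothetical and behave identically; only the names differ.

```php
<?php
// Hypothetical example: function and variable names are illustrative only.

// Before: $s, $n and $r force the reader to infer what each value means.
function rep($s, $n)
{
    $r = '';
    for ($i = 0; $i < $n; $i++) {
        $r .= $s;
    }
    return $r;
}

// After: the names state their purpose, so the loop reads without guessing.
function repeat_text($text, $times)
{
    $result = '';
    for ($count = 0; $count < $times; $count++) {
        $result .= $text;
    }
    return $result;
}
```

Conventional short names such as a loop counter `$i` are usually understood, which is why checks of this kind typically let you configure the minimum length.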