Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
Common duplication problems and their corresponding solutions are:
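One frequent case is the same few statements copy-pasted into several functions. The sketch below shows one possible fix, extracting the shared statements into a single helper. All names in it are hypothetical and chosen only for illustration; none of them come from the class analyzed here.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: the single home for logic that was previously
// repeated in each of the three functions below.
function normalizeName(string $name): string
{
    $name = trim($name);
    if ($name === '') {
        throw new InvalidArgumentException('Name must not be empty.');
    }
    return $name;
}

// Each caller now contains only what is specific to it.
function userLabel(string $name): string
{
    return 'User: ' . normalizeName($name);
}

function groupLabel(string $name): string
{
    return 'Group: ' . normalizeName($name);
}

function projectLabel(string $name): string
{
    return 'Project: ' . normalizeName($name);
}

echo userLabel('  Ada '), PHP_EOL; // prints "User: Ada"
```

With the helper in place, a change to the validation rule only has to be made once.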
Complex classes like utf8 often do a lot of different things. To break such a class down, you need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
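As a rough illustration of the prefix heuristic, the sketch below groups method names by the text before their first underscore. The function name groupMethodsByPrefix is hypothetical, and the method names passed in are a sample taken from the listing on this page. For a loaded class, PHP's built-in get_class_methods() could supply the list instead.

```php
<?php
declare(strict_types=1);

/**
 * Groups method names by the text before the first underscore and
 * keeps only prefixes that are shared by at least two methods.
 */
function groupMethodsByPrefix(array $methodNames): array
{
    $groups = [];
    foreach ($methodNames as $name) {
        // A name without an underscore becomes its own prefix.
        $prefix = explode('_', $name, 2)[0];
        $groups[$prefix][] = $name;
    }

    return array_filter(
        $groups,
        fn (array $names): bool => count($names) > 1
    );
}

$methods = [
    'strip_ascii_ctrl', 'strip_non_ascii',
    'str_pad', 'str_split', 'str_ireplace',
    'clean', 'to_unicode', 'from_unicode',
];

// Leaves two groups: "strip" (2 methods) and "str" (3 methods).
print_r(groupMethodsByPrefix($methods));
```

A shared prefix is only a hint. Whether a group really forms a cohesive component still needs a look at what the methods do.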
Once you have determined the fields and methods that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
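A minimal sketch of Extract Class could look like the following. It assumes the ASCII-related helpers were picked as the cohesive component. The class names AsciiFilter and TextTools are hypothetical, and the method bodies are illustrative stand-ins, not the implementation of the class analyzed here.

```php
<?php
declare(strict_types=1);

// Hypothetical extracted class: holds only the ASCII-related helpers.
final class AsciiFilter
{
    public static function isAscii(string $str): bool
    {
        // No byte outside the 7-bit range means the string is pure ASCII.
        return preg_match('/[^\x00-\x7F]/', $str) === 0;
    }

    public static function stripNonAscii(string $str): string
    {
        // Without the /u modifier the pattern works on single bytes.
        return (string) preg_replace('/[^\x00-\x7F]+/', '', $str);
    }
}

// Hypothetical original class: keeps its public API but delegates,
// so existing callers do not have to change right away.
final class TextTools
{
    public static function is_ascii(string $str): bool
    {
        return AsciiFilter::isAscii($str);
    }

    public static function strip_non_ascii(string $str): string
    {
        return AsciiFilter::stripNonAscii($str);
    }
}

var_dump(TextTools::is_ascii('hello'));                   // bool(true)
echo TextTools::strip_non_ascii("caf\u{00E9}"), PHP_EOL;  // prints "caf"
```

Keeping thin delegating methods on the original class lets you migrate callers gradually and remove the delegates later.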
While breaking up the class, it is also a good idea to analyze how other classes use utf8 and, based on those observations, apply Extract Interface.
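Extract Interface can be sketched as follows, assuming callers were observed to use only case conversion. The names CaseConverter, AsciiCaseConverter and shout are hypothetical. The sketch also assumes instance methods rather than the static methods used in the listing, because client code can only be handed an alternative implementation when it receives an object.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: only the operations callers were seen to use.
interface CaseConverter
{
    public function lower(string $str): string;

    public function upper(string $str): string;
}

// One possible implementation; ASCII-only to keep the sketch
// free of extension dependencies.
final class AsciiCaseConverter implements CaseConverter
{
    public function lower(string $str): string
    {
        return strtolower($str);
    }

    public function upper(string $str): string
    {
        return strtoupper($str);
    }
}

// Client code depends on the interface, not on a concrete class.
function shout(CaseConverter $converter, string $message): string
{
    return $converter->upper($message) . '!';
}

echo shout(new AsciiCaseConverter(), 'done'), PHP_EOL; // prints "DONE!"
```

A narrow interface like this makes it possible to swap in a UTF-8-aware implementation later without touching the callers.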
| 1 | <?php defined('SYSPATH') or die('No direct access allowed.'); |
||
| 81 | final class utf8 |
||
| 82 | { |
||
| 83 | |||
| 84 | // Called methods |
||
| 85 | public static $called = array(); |
||
| 86 | |||
| 87 | /** |
||
| 88 | * Recursively cleans arrays, objects, and strings. Removes ASCII control |
||
| 89 | * codes and converts to UTF-8 while silently discarding incompatible |
||
| 90 | * UTF-8 characters. |
||
| 91 | * |
||
| 92 | * @param string string to clean |
||
| 93 | * @return string |
||
| 94 | */ |
||
| 95 | public static function clean($str) |
||
| 120 | |||
| 121 | /** |
||
| 122 | * Tests whether a string contains only 7bit ASCII bytes. This is used to |
||
| 123 | * determine when to use native functions or UTF-8 functions. |
||
| 124 | * |
||
| 125 | * @param string string to check |
||
| 126 | * @return bool |
||
| 127 | */ |
||
| 128 | public static function is_ascii($str) |
||
| 132 | |||
| 133 | /** |
||
| 134 | * Strips out device control codes in the ASCII range. |
||
| 135 | * |
||
| 136 | * @param string string to clean |
||
| 137 | * @param string $str |
||
| 138 | * @return string |
||
| 139 | */ |
||
| 140 | public static function strip_ascii_ctrl($str) |
||
| 144 | |||
| 145 | /** |
||
| 146 | * Strips out all non-7bit ASCII bytes. |
||
| 147 | * |
||
| 148 | * @param string string to clean |
||
| 149 | * @return string |
||
| 150 | */ |
||
| 151 | public static function strip_non_ascii($str) |
||
| 155 | |||
| 156 | /** |
||
| 157 | * Replaces special/accented UTF-8 characters by ASCII-7 'equivalents'. |
||
| 158 | * |
||
| 159 | * @author Andreas Gohr <[email protected]> |
||
| 160 | * |
||
| 161 | * @param string string to transliterate |
||
| 162 | * @param integer -1 lowercase only, +1 uppercase only, 0 both cases |
||
| 163 | * @return string |
||
| 164 | */ |
||
| 165 | public static function transliterate_to_ascii($str, $case = 0) |
||
| 176 | |||
| 177 | /** |
||
| 178 | * Returns the length of the given string. |
||
| 179 | * @see http://php.net/strlen |
||
| 180 | * |
||
| 181 | * @param string string being measured for length |
||
| 182 | * @return integer |
||
| 183 | */ |
||
| 184 | public static function strlen($str) |
||
| 195 | |||
| 196 | /** |
||
| 197 | * Finds position of first occurrence of a UTF-8 string. |
||
| 198 | * @see http://php.net/strpos |
||
| 199 | * |
||
| 200 | * @author Harry Fuecks <[email protected]> |
||
| 201 | * |
||
| 202 | * @param string haystack |
||
| 203 | * @param string needle |
||
| 204 | * @param integer offset from which character in haystack to start searching |
||
| 205 | * @param string $str |
||
| 206 | * @return integer position of needle |
||
| 207 | * @return boolean FALSE if the needle is not found |
||
| 208 | */ |
||
| 209 | public static function strpos($str, $search, $offset = 0) |
||
| 220 | |||
| 221 | /** |
||
| 222 | * Finds position of last occurrence of a char in a UTF-8 string. |
||
| 223 | * @see http://php.net/strrpos |
||
| 224 | * |
||
| 225 | * @author Harry Fuecks <[email protected]> |
||
| 226 | * |
||
| 227 | * @param string haystack |
||
| 228 | * @param string needle |
||
| 229 | * @param integer offset from which character in haystack to start searching |
||
| 230 | * @param string $str |
||
| 231 | * @return integer position of needle |
||
| 232 | * @return boolean FALSE if the needle is not found |
||
| 233 | */ |
||
| 234 | public static function strrpos($str, $search, $offset = 0) |
||
| 245 | |||
| 246 | /** |
||
| 247 | * Returns part of a UTF-8 string. |
||
| 248 | * @see http://php.net/substr |
||
| 249 | * |
||
| 250 | * @author Chris Smith <[email protected]> |
||
| 251 | * |
||
| 252 | * @param string input string |
||
| 253 | * @param integer offset |
||
| 254 | * @param integer length limit |
||
| 255 | * @return string |
||
| 256 | */ |
||
| 257 | public static function substr($str, $offset, $length = null) |
||
| 268 | |||
| 269 | /** |
||
| 270 | * Replaces text within a portion of a UTF-8 string. |
||
| 271 | * @see http://php.net/substr_replace |
||
| 272 | * |
||
| 273 | * @author Harry Fuecks <[email protected]> |
||
| 274 | * |
||
| 275 | * @param string input string |
||
| 276 | * @param string replacement string |
||
| 277 | * @param integer offset |
||
| 278 | * @return string |
||
| 279 | */ |
||
| 280 | public static function substr_replace($str, $replacement, $offset, $length = null) |
||
| 291 | |||
| 292 | /** |
||
| 293 | * Makes a UTF-8 string lowercase. |
||
| 294 | * @see http://php.net/strtolower |
||
| 295 | * |
||
| 296 | * @author Andreas Gohr <[email protected]> |
||
| 297 | * |
||
| 298 | * @param string mixed case string |
||
| 299 | * @return string |
||
| 300 | */ |
||
| 301 | public static function strtolower($str) |
||
| 312 | |||
| 313 | /** |
||
| 314 | * Makes a UTF-8 string uppercase. |
||
| 315 | * @see http://php.net/strtoupper |
||
| 316 | * |
||
| 317 | * @author Andreas Gohr <[email protected]> |
||
| 318 | * |
||
| 319 | * @param string mixed case string |
||
| 320 | * @param string $str |
||
| 321 | * @return string |
||
| 322 | */ |
||
| 323 | public static function strtoupper($str) |
||
| 334 | |||
| 335 | /** |
||
| 336 | * Makes a UTF-8 string's first character uppercase. |
||
| 337 | * @see http://php.net/ucfirst |
||
| 338 | * |
||
| 339 | * @author Harry Fuecks <[email protected]> |
||
| 340 | * |
||
| 341 | * @param string mixed case string |
||
| 342 | * @return string |
||
| 343 | */ |
||
| 344 | public static function ucfirst($str) |
||
| 355 | |||
| 356 | /** |
||
| 357 | * Makes the first character of every word in a UTF-8 string uppercase. |
||
| 358 | * @see http://php.net/ucwords |
||
| 359 | * |
||
| 360 | * @author Harry Fuecks <[email protected]> |
||
| 361 | * |
||
| 362 | * @param string mixed case string |
||
| 363 | * @return string |
||
| 364 | */ |
||
| 365 | public static function ucwords($str) |
||
| 376 | |||
| 377 | /** |
||
| 378 | * Case-insensitive UTF-8 string comparison. |
||
| 379 | * @see http://php.net/strcasecmp |
||
| 380 | * |
||
| 381 | * @author Harry Fuecks <[email protected]> |
||
| 382 | * |
||
| 383 | * @param string string to compare |
||
| 384 | * @param string string to compare |
||
| 385 | * @return integer less than 0 if str1 is less than str2 |
||
| 386 | * @return integer greater than 0 if str1 is greater than str2 |
||
| 387 | * @return integer 0 if they are equal |
||
| 388 | */ |
||
| 389 | public static function strcasecmp($str1, $str2) |
||
| 400 | |||
| 401 | /** |
||
| 402 | * Returns a string or an array with all occurrences of search in subject (ignoring case) |
||
| 403 | * replaced with the given replace value. |
||
| 404 | * @see http://php.net/str_ireplace |
||
| 405 | * |
||
| 406 | * @note It's not fast and gets slower if $search and/or $replace are arrays. |
||
| 407 | * @author Harry Fuecks <[email protected]> |
||
| 408 | * |
||
| 409 | * @param string|array text to replace |
||
| 410 | * @param string|array replacement text |
||
| 411 | * @param string|array subject text |
||
| 412 | * @param integer number of matched and replaced needles will be returned via this parameter which is passed by reference |
||
| 413 | * @return string if the input was a string |
||
| 414 | * @return array if the input was an array |
||
| 415 | */ |
||
| 416 | public static function str_ireplace($search, $replace, $str, & $count = null) |
||
| 427 | |||
| 428 | /** |
||
| 429 | * Case-insensitive UTF-8 version of strstr. Returns all of input string |
||
| 430 | * from the first occurrence of needle to the end. |
||
| 431 | * @see http://php.net/stristr |
||
| 432 | * |
||
| 433 | * @author Harry Fuecks <[email protected]> |
||
| 434 | * |
||
| 435 | * @param string input string |
||
| 436 | * @param string needle |
||
| 437 | * @return string matched substring if found |
||
| 438 | * @return boolean FALSE if the substring was not found |
||
| 439 | */ |
||
| 440 | public static function stristr($str, $search) |
||
| 451 | |||
| 452 | /** |
||
| 453 | * Finds the length of the initial segment matching mask. |
||
| 454 | * @see http://php.net/strspn |
||
| 455 | * |
||
| 456 | * @author Harry Fuecks <[email protected]> |
||
| 457 | * |
||
| 458 | * @param string input string |
||
| 459 | * @param string mask for search |
||
| 460 | * @param integer start position of the string to examine |
||
| 461 | * @param integer length of the string to examine |
||
| 462 | * @return integer length of the initial segment that contains characters in the mask |
||
| 463 | */ |
||
| 464 | public static function strspn($str, $mask, $offset = null, $length = null) |
||
| 475 | |||
| 476 | /** |
||
| 477 | * Finds the length of the initial segment not matching mask. |
||
| 478 | * @see http://php.net/strcspn |
||
| 479 | * |
||
| 480 | * @author Harry Fuecks <[email protected]> |
||
| 481 | * |
||
| 482 | * @param string input string |
||
| 483 | * @param string mask for search |
||
| 484 | * @param integer start position of the string to examine |
||
| 485 | * @param integer length of the string to examine |
||
| 486 | * @return integer length of the initial segment that contains characters not in the mask |
||
| 487 | */ |
||
| 488 | public static function strcspn($str, $mask, $offset = null, $length = null) |
||
| 499 | |||
| 500 | /** |
||
| 501 | * Pads a UTF-8 string to a certain length with another string. |
||
| 502 | * @see http://php.net/str_pad |
||
| 503 | * |
||
| 504 | * @author Harry Fuecks <[email protected]> |
||
| 505 | * |
||
| 506 | * @param string input string |
||
| 507 | * @param integer desired string length after padding |
||
| 508 | * @param string string to use as padding |
||
| 509 | * @param string padding type: STR_PAD_RIGHT, STR_PAD_LEFT, or STR_PAD_BOTH |
||
| 510 | * @return string |
||
| 511 | */ |
||
| 512 | public static function str_pad($str, $final_str_length, $pad_str = ' ', $pad_type = STR_PAD_RIGHT) |
||
| 523 | |||
| 524 | /** |
||
| 525 | * Converts a UTF-8 string to an array. |
||
| 526 | * @see http://php.net/str_split |
||
| 527 | * |
||
| 528 | * @author Harry Fuecks <[email protected]> |
||
| 529 | * |
||
| 530 | * @param string input string |
||
| 531 | * @param integer maximum length of each chunk |
||
| 532 | * @param string $str |
||
| 533 | * @return array |
||
| 534 | */ |
||
| 535 | public static function str_split($str, $split_length = 1) |
||
| 546 | |||
| 547 | /** |
||
| 548 | * Reverses a UTF-8 string. |
||
| 549 | * @see http://php.net/strrev |
||
| 550 | * |
||
| 551 | * @author Harry Fuecks <[email protected]> |
||
| 552 | * |
||
| 553 | * @param string string to be reversed |
||
| 554 | * @return string |
||
| 555 | */ |
||
| 556 | public static function strrev($str) |
||
| 567 | |||
| 568 | /** |
||
| 569 | * Strips whitespace (or other UTF-8 characters) from the beginning and |
||
| 570 | * end of a string. |
||
| 571 | * @see http://php.net/trim |
||
| 572 | * |
||
| 573 | * @author Andreas Gohr <[email protected]> |
||
| 574 | * |
||
| 575 | * @param string input string |
||
| 576 | * @param string string of characters to remove |
||
| 577 | * @return string |
||
| 578 | */ |
||
| 579 | public static function trim($str, $charlist = null) |
||
| 590 | |||
| 591 | /** |
||
| 592 | * Strips whitespace (or other UTF-8 characters) from the beginning of a string. |
||
| 593 | * @see http://php.net/ltrim |
||
| 594 | * |
||
| 595 | * @author Andreas Gohr <[email protected]> |
||
| 596 | * |
||
| 597 | * @param string input string |
||
| 598 | * @param string string of characters to remove |
||
| 599 | * @param string $str |
||
| 600 | * @return string |
||
| 601 | */ |
||
| 602 | public static function ltrim($str, $charlist = null) |
||
| 613 | |||
| 614 | /** |
||
| 615 | * Strips whitespace (or other UTF-8 characters) from the end of a string. |
||
| 616 | * @see http://php.net/rtrim |
||
| 617 | * |
||
| 618 | * @author Andreas Gohr <[email protected]> |
||
| 619 | * |
||
| 620 | * @param string input string |
||
| 621 | * @param string string of characters to remove |
||
| 622 | * @return string |
||
| 623 | */ |
||
| 624 | public static function rtrim($str, $charlist = null) |
||
| 635 | |||
| 636 | /** |
||
| 637 | * Returns the unicode ordinal for a character. |
||
| 638 | * @see http://php.net/ord |
||
| 639 | * |
||
| 640 | * @author Harry Fuecks <[email protected]> |
||
| 641 | * |
||
| 642 | * @param string UTF-8 encoded character |
||
| 643 | * @return integer |
||
| 644 | */ |
||
| 645 | public static function ord($chr) |
||
| 656 | |||
| 657 | /** |
||
| 658 | * Takes a UTF-8 string and returns an array of ints representing the Unicode characters. |
||
| 659 | * Astral planes are supported i.e. the ints in the output can be > 0xFFFF. |
||
| 660 | * Occurrences of the BOM are ignored. Surrogates are not allowed. |
||
| 661 | * |
||
| 662 | * The Original Code is Mozilla Communicator client code. |
||
| 663 | * The Initial Developer of the Original Code is Netscape Communications Corporation. |
||
| 664 | * Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. |
||
| 665 | * Ported to PHP by Henri Sivonen <[email protected]>, see http://hsivonen.iki.fi/php-utf8/. |
||
| 666 | * Slight modifications to fit with phputf8 library by Harry Fuecks <[email protected]>. |
||
| 667 | * |
||
| 668 | * @param string UTF-8 encoded string |
||
| 669 | * @return array unicode code points |
||
| 670 | * @return boolean FALSE if the string is invalid |
||
| 671 | */ |
||
| 672 | public static function to_unicode($str) |
||
| 683 | |||
| 684 | /** |
||
| 685 | * Takes an array of ints representing the Unicode characters and returns a UTF-8 string. |
||
| 686 | * Astral planes are supported i.e. the ints in the input can be > 0xFFFF. |
||
| 687 | * Occurrences of the BOM are ignored. Surrogates are not allowed. |
||
| 688 | * |
||
| 689 | * The Original Code is Mozilla Communicator client code. |
||
| 690 | * The Initial Developer of the Original Code is Netscape Communications Corporation. |
||
| 691 | * Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. |
||
| 692 | * Ported to PHP by Henri Sivonen <[email protected]>, see http://hsivonen.iki.fi/php-utf8/. |
||
| 693 | * Slight modifications to fit with phputf8 library by Harry Fuecks <[email protected]>. |
||
| 694 | * |
||
| 695 | * @param array unicode code points representing a string |
||
| 696 | * @return string utf8 string of characters |
||
| 697 | * @return boolean FALSE if a code point cannot be found |
||
| 698 | */ |
||
| 699 | public static function from_unicode($arr) |
||
| 710 | } // End utf8 |
||
| 711 |
Short variable names may make your code harder to understand. Variable names should be self-descriptive. This check looks for variable names that are shorter than a configured minimum length.
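The difference is easiest to see side by side. The sketch below shows the same logic twice, first with very short names and then with descriptive ones. Both functions and all their variable names are hypothetical examples, not code from the class above.

```php
<?php
declare(strict_types=1);

// Harder to follow: the reader has to work out what $s, $n and $r mean.
function pad_a(string $s, int $n): string
{
    $r = $n - strlen($s);
    return $r > 0 ? $s . str_repeat(' ', $r) : $s;
}

// Same logic with self-descriptive names.
function padRight(string $text, int $targetLength): string
{
    $missing = $targetLength - strlen($text);
    return $missing > 0 ? $text . str_repeat(' ', $missing) : $text;
}

var_dump(padRight('ab', 4) === 'ab  '); // bool(true)
```

Very short names can still be reasonable in tiny scopes, such as a loop counter, which is one reason the minimum length is configurable.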