Duplicate code is one of the most pungent code smells. A commonly used rule of thumb is to restructure code once it is duplicated in three or more places.
Common duplication problems and their corresponding solutions are:
Complex classes like MimeAnalyzer often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
While breaking up the class, it is a good idea to analyze how other classes use MimeAnalyzer, and based on these observations, apply Extract Interface, too.
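As a rough illustration of Extract Class combined with Extract Interface, the sketch below pulls the extension and MIME type maps out into a collaborator of their own. The names `MimeMapInterface` and `MimeMap` are hypothetical and are not part of MediaWiki. The sketch assumes PHP 7.1 or later and deliberately ignores MIME type aliases, which the real class resolves.

```php
<?php
// Hypothetical interface: the narrow contract other classes would depend on.
interface MimeMapInterface {
	public function getExtensionsForType( string $mime ): ?string;

	public function getTypesForExtension( string $ext ): ?string;
}

// Hypothetical extracted class: it owns only the two lookup maps.
class MimeMap implements MimeMapInterface {
	/** @var array MIME type => set of extensions (stored as array keys) */
	private $mimeToExt = [];

	/** @var array Extension => set of MIME types (stored as array keys) */
	private $extToMime = [];

	/**
	 * @param string $types Lines of the form "mime/type ext1 ext2"
	 */
	public function __construct( string $types ) {
		foreach ( explode( "\n", $types ) as $line ) {
			$parts = preg_split( '/\s+/', trim( $line ), -1, PREG_SPLIT_NO_EMPTY );
			if ( count( $parts ) < 2 ) {
				continue; // blank line, or a type without any extension
			}
			$mime = strtolower( array_shift( $parts ) );
			foreach ( $parts as $ext ) {
				$ext = strtolower( $ext );
				// Insertion order is kept, so the first entry stays first.
				$this->mimeToExt[$mime][$ext] = true;
				$this->extToMime[$ext][$mime] = true;
			}
		}
	}

	public function getExtensionsForType( string $mime ): ?string {
		$mime = strtolower( $mime );
		return isset( $this->mimeToExt[$mime] )
			? implode( ' ', array_keys( $this->mimeToExt[$mime] ) )
			: null;
	}

	public function getTypesForExtension( string $ext ): ?string {
		$ext = strtolower( $ext );
		return isset( $this->extToMime[$ext] )
			? implode( ' ', array_keys( $this->extToMime[$ext] ) )
			: null;
	}
}
```

With something along these lines, MimeAnalyzer could hold a `MimeMapInterface` instance and delegate its lookup methods to it, so callers that only need lookups no longer depend on the detection logic.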
<?php
class MimeAnalyzer implements LoggerAwareInterface {
	/** @var string */
	protected $typeFile;
	/** @var string */
	protected $infoFile;
	/** @var string */
	protected $xmlTypes;
	/** @var callable */
	protected $initCallback;
	/** @var callable */
	protected $detectCallback;
	/** @var callable */
	protected $guessCallback;
	/** @var callable */
	protected $extCallback;
	/** @var array Mapping of media types to arrays of MIME types */
	protected $mediaTypes = null;
	/** @var array Map of MIME type aliases */
	protected $mimeTypeAliases = null;
	/** @var array Map of MIME types to file extensions (as a space separated list) */
	protected $mimetoExt = null;

	/** @var array Map of file extensions types to MIME types (as a space separated list) */
	public $mExtToMime = null; // legacy name; field accessed by hooks

	/** @var IEContentAnalyzer */
	protected $IEAnalyzer;

	/** @var string Extra MIME types, set for example by media handling extensions */
	private $extraTypes = '';
	/** @var string Extra MIME info, set for example by media handling extensions */
	private $extraInfo = '';

	/** @var LoggerInterface */
	private $logger;

	/**
	 * Defines a set of well known MIME types
	 * This is used as a fallback to mime.types files.
	 * An extensive list of well known MIME types is provided by
	 * the file mime.types in the includes directory.
	 *
	 * This list concatenated with mime.types is used to create a MIME <-> ext
	 * map. Each line contains a MIME type followed by a space separated list of
	 * extensions. If multiple extensions for a single MIME type exist or if
	 * multiple MIME types exist for a single extension then in most cases
	 * MediaWiki assumes that the first extension following the MIME type is the
	 * canonical extension, and the first time a MIME type appears for a certain
	 * extension is considered the canonical MIME type.
	 *
	 * (Note that appending the type file list to the end of self::$wellKnownTypes
	 * sucks because you can't redefine canonical types. This could be fixed by
	 * appending self::$wellKnownTypes behind type file list, but who knows
	 * what will break? In practice this probably isn't a problem anyway -- Bryan)
	 */
	protected static $wellKnownTypes = <<<EOT
application/ogg ogx ogg ogm ogv oga spx
application/pdf pdf
application/vnd.oasis.opendocument.chart odc
application/vnd.oasis.opendocument.chart-template otc
application/vnd.oasis.opendocument.database odb
application/vnd.oasis.opendocument.formula odf
application/vnd.oasis.opendocument.formula-template otf
application/vnd.oasis.opendocument.graphics odg
application/vnd.oasis.opendocument.graphics-template otg
application/vnd.oasis.opendocument.image odi
application/vnd.oasis.opendocument.image-template oti
application/vnd.oasis.opendocument.presentation odp
application/vnd.oasis.opendocument.presentation-template otp
application/vnd.oasis.opendocument.spreadsheet ods
application/vnd.oasis.opendocument.spreadsheet-template ots
application/vnd.oasis.opendocument.text odt
application/vnd.oasis.opendocument.text-master otm
application/vnd.oasis.opendocument.text-template ott
application/vnd.oasis.opendocument.text-web oth
application/javascript js
application/x-shockwave-flash swf
audio/midi mid midi kar
audio/mpeg mpga mpa mp2 mp3
audio/x-aiff aif aiff aifc
audio/x-wav wav
audio/ogg oga spx ogg
image/x-bmp bmp
image/gif gif
image/jpeg jpeg jpg jpe
image/png png
image/svg+xml svg
image/svg svg
image/tiff tiff tif
image/vnd.djvu djvu
image/x.djvu djvu
image/x-djvu djvu
image/x-portable-pixmap ppm
image/x-xcf xcf
text/plain txt
text/html html htm
video/ogg ogv ogm ogg
video/mpeg mpg mpeg
EOT;

	/**
	 * Defines a set of well known MIME info entries
	 * This is used as a fallback to mime.info files.
	 * An extensive list of well known MIME types is provided by
	 * the file mime.info in the includes directory.
	 */
	protected static $wellKnownInfo = <<<EOT
application/pdf [OFFICE]
application/vnd.oasis.opendocument.chart [OFFICE]
application/vnd.oasis.opendocument.chart-template [OFFICE]
application/vnd.oasis.opendocument.database [OFFICE]
application/vnd.oasis.opendocument.formula [OFFICE]
application/vnd.oasis.opendocument.formula-template [OFFICE]
application/vnd.oasis.opendocument.graphics [OFFICE]
application/vnd.oasis.opendocument.graphics-template [OFFICE]
application/vnd.oasis.opendocument.image [OFFICE]
application/vnd.oasis.opendocument.image-template [OFFICE]
application/vnd.oasis.opendocument.presentation [OFFICE]
application/vnd.oasis.opendocument.presentation-template [OFFICE]
application/vnd.oasis.opendocument.spreadsheet [OFFICE]
application/vnd.oasis.opendocument.spreadsheet-template [OFFICE]
application/vnd.oasis.opendocument.text [OFFICE]
application/vnd.oasis.opendocument.text-template [OFFICE]
application/vnd.oasis.opendocument.text-master [OFFICE]
application/vnd.oasis.opendocument.text-web [OFFICE]
application/javascript text/javascript application/x-javascript [EXECUTABLE]
application/x-shockwave-flash [MULTIMEDIA]
audio/midi [AUDIO]
audio/x-aiff [AUDIO]
audio/x-wav [AUDIO]
audio/mp3 audio/mpeg [AUDIO]
application/ogg audio/ogg video/ogg [MULTIMEDIA]
image/x-bmp image/x-ms-bmp image/bmp [BITMAP]
image/gif [BITMAP]
image/jpeg [BITMAP]
image/png [BITMAP]
image/svg+xml [DRAWING]
image/tiff [BITMAP]
image/vnd.djvu [BITMAP]
image/x-xcf [BITMAP]
image/x-portable-pixmap [BITMAP]
text/plain [TEXT]
text/html [TEXT]
video/ogg [VIDEO]
video/mpeg [VIDEO]
unknown/unknown application/octet-stream application/x-empty [UNKNOWN]
EOT;

	/**
	 * @param array $params Configuration map, includes:
	 *   - typeFile: path to file with the list of known MIME types
	 *   - infoFile: path to file with the MIME type info
	 *   - xmlTypes: map of root element names to XML MIME types
	 *   - initCallback: initialization callback that is passed this object [optional]
	 *   - detectCallback: alternative to finfo that returns the mime type for a file.
	 *     For example, the callback can return the output of "file -bi". [optional]
	 *   - guessCallback: callback to improve the guessed MIME type using the file data.
	 *     This is intended for fixing mistakes in fileinfo or "detectCallback". [optional]
	 *   - extCallback: callback to improve the guessed MIME type using the extension. [optional]
	 *   - logger: PSR-3 logger [optional]
	 * @note Constructing these instances is expensive due to file reads.
	 *  A service or singleton pattern should be used to avoid creating instances again and again.
	 */
	public function __construct( array $params ) {

	protected function loadFiles() {

	public function setLogger( LoggerInterface $logger ) {

	/**
	 * Adds to the list mapping MIME to file extensions.
	 * As an extension author, you are encouraged to submit patches to
	 * MediaWiki's core to add new MIME types to mime.types.
	 * @param string $types
	 */
	public function addExtraTypes( $types ) {

	/**
	 * Adds to the list mapping MIME to media type.
	 * As an extension author, you are encouraged to submit patches to
	 * MediaWiki's core to add new MIME info to mime.info.
	 * @param string $info
	 */
	public function addExtraInfo( $info ) {

	/**
	 * Returns a list of file extensions for a given MIME type as a space
	 * separated string or null if the MIME type was unrecognized. Resolves
	 * MIME type aliases.
	 *
	 * @param string $mime
	 * @return string|null
	 */
	public function getExtensionsForType( $mime ) {

	/**
	 * Returns a list of MIME types for a given file extension as a space
	 * separated string or null if the extension was unrecognized.
	 *
	 * @param string $ext
	 * @return string|null
	 */
	public function getTypesForExtension( $ext ) {

	/**
	 * Returns a single MIME type for a given file extension or null if unknown.
	 * This is always the first type from the list returned by getTypesForExtension($ext).
	 *
	 * @param string $ext
	 * @return string|null
	 */
	public function guessTypesForExtension( $ext ) {

	/**
	 * Tests if the extension matches the given MIME type. Returns true if a
	 * match was found, null if the MIME type is unknown, and false if the
	 * MIME type is known but no matches where found.
	 *
	 * @param string $extension
	 * @param string $mime
	 * @return bool|null
	 */
	public function isMatchingExtension( $extension, $mime ) {

	/**
	 * Returns true if the MIME type is known to represent an image format
	 * supported by the PHP GD library.
	 *
	 * @param string $mime
	 *
	 * @return bool
	 */
	public function isPHPImageType( $mime ) {

	/**
	 * Returns true if the extension represents a type which can
	 * be reliably detected from its content. Use this to determine
	 * whether strict content checks should be applied to reject
	 * invalid uploads; if we can't identify the type we won't
	 * be able to say if it's invalid.
	 *
	 * @todo Be more accurate when using fancy MIME detector plugins;
	 *   right now this is the bare minimum getimagesize() list.
	 * @param string $extension
	 * @return bool
	 */
	function isRecognizableExtension( $extension ) {

	/**
	 * Improves a MIME type using the file extension. Some file formats are very generic,
	 * so their MIME type is not very meaningful. A more useful MIME type can be derived
	 * by looking at the file extension. Typically, this method would be called on the
	 * result of guessMimeType().
	 *
	 * @param string $mime The MIME type, typically guessed from a file's content.
	 * @param string $ext The file extension, as taken from the file name
	 *
	 * @return string The MIME type
	 */
	public function improveTypeFromExtension( $mime, $ext ) {

	/**
	 * MIME type detection. This uses detectMimeType to detect the MIME type
	 * of the file, but applies additional checks to determine some well known
	 * file formats that may be missed or misinterpreted by the default MIME
	 * detection (namely XML based formats like XHTML or SVG, as well as ZIP
	 * based formats like OPC/ODF files).
	 *
	 * @param string $file The file to check
	 * @param string|bool $ext The file extension, or true (default) to extract
	 *   it from the filename. Set it to false to ignore the extension. DEPRECATED!
	 *   Set to false, use improveTypeFromExtension($mime, $ext) later to improve MIME type.
	 *
	 * @return string The MIME type of $file
	 */
	public function guessMimeType( $file, $ext = true ) {

	/**
	 * Guess the MIME type from the file contents.
	 *
	 * @todo Remove $ext param
	 *
	 * @param string $file
	 * @param mixed $ext
	 * @return bool|string
	 * @throws UnexpectedValueException
	 */
	private function doGuessMimeType( $file, $ext ) {

	/**
	 * Detect application-specific file type of a given ZIP file from its
	 * header data. Currently works for OpenDocument and OpenXML types...
	 * If can't tell, returns 'application/zip'.
	 *
	 * @param string $header Some reasonably-sized chunk of file header
	 * @param string|null $tail The tail of the file
	 * @param string|bool $ext The file extension, or true to extract it from the filename.
	 *   Set it to false (default) to ignore the extension. DEPRECATED! Set to false,
	 *   use improveTypeFromExtension($mime, $ext) later to improve MIME type.
	 *
	 * @return string
	 */
	function detectZipType( $header, $tail = null, $ext = false ) {

	/**
	 * Internal MIME type detection. Detection is done using the fileinfo
	 * extension if it is available. It can be overriden by callback, which could
	 * use an external program, for example. If detection fails and $ext is not false,
	 * the MIME type is guessed from the file extension, using guessTypesForExtension.
	 *
	 * If the MIME type is still unknown, getimagesize is used to detect the
	 * MIME type if the file is an image. If no MIME type can be determined,
	 * this function returns 'unknown/unknown'.
	 *
	 * @param string $file The file to check
	 * @param string|bool $ext The file extension, or true (default) to extract it from the filename.
	 *   Set it to false to ignore the extension. DEPRECATED! Set to false, use
	 *   improveTypeFromExtension($mime, $ext) later to improve MIME type.
	 *
	 * @return string The MIME type of $file
	 */
	private function detectMimeType( $file, $ext = true ) {

	/**
	 * Determine the media type code for a file, using its MIME type, name and
	 * possibly its contents.
	 *
	 * This function relies on the findMediaType(), mapping extensions and MIME
	 * types to media types.
	 *
	 * @todo analyse file if need be
	 * @todo look at multiple extension, separately and together.
	 *
	 * @param string $path Full path to the image file, in case we have to look at the contents
	 *   (if null, only the MIME type is used to determine the media type code).
	 * @param string $mime MIME type. If null it will be guessed using guessMimeType.
	 *
	 * @return string A value to be used with the MEDIATYPE_xxx constants.
	 */
	function getMediaType( $path = null, $mime = null ) {

	/**
	 * Returns a media code matching the given MIME type or file extension.
	 * File extensions are represented by a string starting with a dot (.) to
	 * distinguish them from MIME types.
	 *
	 * This function relies on the mapping defined by $this->mMediaTypes
	 * @access private
	 * @param string $extMime
	 * @return int|string
	 */
	function findMediaType( $extMime ) {

	/**
	 * Get the MIME types that various versions of Internet Explorer would
	 * detect from a chunk of the content.
	 *
	 * @param string $fileName The file name (unused at present)
	 * @param string $chunk The first 256 bytes of the file
	 * @param string $proposed The MIME type proposed by the server
	 * @return array
	 */
	public function getIEMimeTypes( $fileName, $chunk, $proposed ) {

	/**
	 * Get a cached instance of IEContentAnalyzer
	 *
	 * @return IEContentAnalyzer
	 */
	protected function getIEContentAnalyzer() {
}

In PHP, under loose comparison (like ==, !=, or switch conditions), values of different types might be equal. For string values, the empty string '' is a special case; in particular, the following results might be unexpected:
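A small sketch of this behaviour follows. The helper `isEmptyString()` is a hypothetical name used for illustration only; it shows how a strict comparison avoids the surprise. The sketch assumes PHP 7.0 or later.

```php
<?php
// Loose comparison: the empty string is considered equal to false and null.
var_dump( '' == false );  // bool(true)
var_dump( '' == null );   // bool(true)

// Note: '' == 0 is true before PHP 8.0 and false from PHP 8.0 onwards.

// Strict comparison also checks the type, so none of these match.
var_dump( '' === false ); // bool(false)
var_dump( '' === null );  // bool(false)

/**
 * Hypothetical helper: true only for the empty string itself,
 * not for null, false, 0 or '0'.
 */
function isEmptyString( $value ): bool {
	return $value === '';
}
```

Using `===` (or `!==`) whenever a value may be a string, null, or false keeps these cases apart.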