March 26 Johannes Schmitt schmittjoh

Finding Duplicated Code in PHP reloaded

Code duplication is generally considered bad programming style. Duplicating code fragments can be appealing and, in some situations, faster. However, it often comes at the cost of increased maintenance effort: for example, the code becomes harder to change because every change has to be made in each copy. Duplicated code is usually introduced by copy/pasting. In large code bases, it can also happen that two developers write similar code without knowing about each other's work.

So far, PHP Copy/Paste Detector has been the tool for detecting code clones in PHP, and on Scrutinizer it was enabled by default for all PHP projects. PHP Copy/Paste Detector operates on the token stream generated by PHP's token_get_all function and uses string hashing to find duplicates. This makes it quite good at detecting large, literal code duplications. However, developers often modify copy/pasted code slightly, and such modified copies can then no longer be detected.
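To make the token-and-hash idea more concrete, here is a minimal Python sketch of the general technique. It is not PHP Copy/Paste Detector's actual implementation. The regex-based `tokenize` function is a hypothetical, heavily simplified stand-in for `token_get_all`, and `window_hashes`, `find_clones` and the window size of 12 tokens are illustrative choices for this tiny example. Every run of consecutive tokens is hashed, and runs whose hashes appear in both sources are reported as clones.

```python
import hashlib
import re

# Hypothetical stand-in for PHP's token_get_all(): splits source into
# variables ($name), words/numbers and single punctuation characters,
# discarding whitespace. Real PHP tokenization is far richer.
TOKEN_RE = re.compile(r"\$\w+|\w+|\S")


def tokenize(source):
    return TOKEN_RE.findall(source)


def window_hashes(tokens, size):
    """Map the hash of each run of `size` consecutive tokens to its first start index."""
    hashes = {}
    for start in range(len(tokens) - size + 1):
        # Tokens never contain spaces, so joining with " " is unambiguous.
        window = " ".join(tokens[start:start + size])
        digest = hashlib.md5(window.encode("utf-8")).hexdigest()
        hashes.setdefault(digest, start)
    return hashes


def find_clones(source_a, source_b, size):
    """Return sorted (start_a, start_b) token offsets of windows found verbatim in both."""
    hashes_a = window_hashes(tokenize(source_a), size)
    hashes_b = window_hashes(tokenize(source_b), size)
    shared = hashes_a.keys() & hashes_b.keys()
    return sorted((hashes_a[h], hashes_b[h]) for h in shared)


original = "$total = 0; foreach ($items as $item) { $total += $item->price; }"
renamed = "$sum = 0; foreach ($items as $item) { $sum += $item->price; }"

print(len(tokenize(original)))                   # 20 tokens
print(len(find_clones(original, original, 12)))  # 9 matching windows
print(find_clones(original, renamed, 12))        # []
```

Because each window has to match token for token, renaming `$total` to `$sum` changes every 12-token window of this small snippet, so in this sketch the slightly modified copy is no longer reported at all.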

To overcome these shortcomings, we are happy to introduce a new tool for detecting duplicated code: PHP Similarity Analyzer. PHP Similarity Analyzer is based on the latest academic research combined with our in-depth practical experience. We have evaluated it against a range of open-source and closed-source projects, and its results are already very promising. In contrast to PHP Copy/Paste Detector, it is robust against code modifications and also finds smaller code fragments, which make very good targets for refactoring. Moreover, it detects not only copy/pasted code but also semantically similar code.

You can enable PHP Similarity Analyzer on your repository with the following configuration:

tools:
    php_sim: true

    # PHP Similarity Analyzer and PHP Copy/Paste Detector cannot currently be
    # used at the same time. Make sure to either remove or disable one of them.
    php_cpd: false

Check it out, and let us know what you think.

Happy inspecting :)


Have feedback? Tweet to @scrutinizerci

If you experienced a bug or have any questions, please send them to [email protected].