Find duplicate code with CPD

A copy/paste detector

I’m sure that you know PMD, the famous static code analyzer. But have you ever tried CPD?

Installation

CPD ships as part of the PMD distribution, so installing PMD gives you CPD as well.

 $ mkdir -p $HOME/tools && cd $HOME/tools
 $ wget https://github.com/pmd/pmd/releases/download/pmd_releases%2F5.8.1/pmd-bin-5.8.1.zip
 $ unzip pmd-bin-5.8.1.zip
 $ alias pmd="$HOME/tools/pmd-bin-5.8.1/bin/run.sh pmd"
 $ alias cpd="$HOME/tools/pmd-bin-5.8.1/bin/run.sh cpd"

Command line usage

Here are two examples.

 $ cpd --minimum-tokens 20 --language java --files src > cpd-20.txt
 $ cpd --minimum-tokens 20 --format csv --files src > cpd-20.csv

--minimum-tokens (the minimum size of a duplicate, in tokens) and --files (the source directory) are the only required options. The language and the report format are optional; if you omit --language, CPD assumes Java. What I really like is the CSV format: I sort the results by token count in Excel and attack the top 5 duplicates.
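
If you prefer to stay on the command line, the same sorting can be sketched in a few lines of shell. This is only a rough sketch: top_duplicates is a name I made up, and it assumes that the CSV report starts with a header row and that the token count sits in the second column, so check the first line of your own report before relying on it.

```shell
# Print the largest duplicates from a CPD CSV report.
# Assumption: the first row is a header and column 2 holds the token count.
# top_duplicates is a hypothetical helper, not part of PMD/CPD.
top_duplicates() {
    report="$1"        # path to the CSV report
    count="${2:-5}"    # number of rows to print, 5 by default
    # Skip the header, sort numerically on column 2 (largest first), keep the top rows.
    tail -n +2 "$report" | sort -t, -k2,2nr | head -n "$count"
}
```

Calling top_duplicates cpd-20.csv prints the five largest duplicates; a second argument changes the number of rows.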

Conclusion

CPD helps you find copied and pasted code. The duplicates it reports show you where the code structure can be improved.

Links

https://pmd.github.io/pmd-5.8.1/usage/cpd-usage.html