DESOSA 2022

PMD - Quality and Evolution

PMD is a powerful code quality analysis tool, but what about the quality of PMD itself? In this essay, we discuss how PMD fulfills its own quality attributes, as outlined in the following sections.

Software Quality Processes

The software quality processes of PMD are built around a standardized continuous integration (CI) routine established with a clear guideline [1]. Developers can easily find directions on how to contribute bug reports, documentation, and questions in line with the code standard set by PMD. In general, the development process of a contribution to PMD includes building locally, checking code style with Checkstyle, creating a pull request (PR), waiting for review, and finally getting the PR merged. Specifically, each PR is reviewed by one of the core developers and must pass five CI checks. The processes are well-ordered and efficient.

Continuous Integration

Since PMD 6.30.0, the project has been built and deployed using GitHub Actions [2], a continuous integration and continuous delivery (CI/CD) platform that automates building and testing in the deployment pipeline. GitHub Actions builds each PR on Windows, Linux, and macOS [3]. Furthermore, git-repo-sync prevents unrelated repositories from accidentally being joined and branches with the same name from being deleted. An overview of the checks run on PRs can be found in the figure below.

Figure: Pull Request checks

On top of GitHub Actions, PMD uses Coveralls as a service to keep track of test coverage of the whole project (47% currently) and individual folders.

Figure: Using Coveralls to check coverage changes

Moreover, Danger generates an additional check, pmd_test, which is not directly visible among the PR checks. Danger automates rote code-review tasks by generating a report of violations and errors. The report [4] lists the PMD violations, errors, and configuration errors found in openjdk-11, checkstyle, and spring-framework. An example of such a report can be found in #3842:

Figure: Additional checks generated by Danger
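Conceptually, a regression report like the one above compares the violations PMD reports on a set of reference projects before and after a change. The following minimal sketch illustrates that comparison with plain Java sets. It is not pmdtester's implementation or report format: the class name `RegressionDiff` and the "file:line:rule" encoding of a violation are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: not pmdtester's implementation or report format.
class RegressionDiff {

    /** Violations reported only after the change (new findings, possibly false positives). */
    static Set<String> added(Set<String> baseline, Set<String> patched) {
        Set<String> result = new LinkedHashSet<>(patched);
        result.removeAll(baseline);
        return result;
    }

    /** Violations reported only before the change (lost findings, possibly false negatives). */
    static Set<String> removed(Set<String> baseline, Set<String> patched) {
        Set<String> result = new LinkedHashSet<>(baseline);
        result.removeAll(patched);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical encoding of a violation as "file:line:rule".
        Set<String> baseline = new LinkedHashSet<>(List.of(
                "Foo.java:10:UnusedLocalVariable",
                "Bar.java:42:EmptyCatchBlock"));
        Set<String> patched = new LinkedHashSet<>(List.of(
                "Foo.java:10:UnusedLocalVariable",
                "Baz.java:7:UnusedPrivateField"));
        System.out.println("added:   " + added(baseline, patched));   // [Baz.java:7:UnusedPrivateField]
        System.out.println("removed: " + removed(baseline, patched)); // [Bar.java:42:EmptyCatchBlock]
    }
}
```

A real regression tester also has to account for configuration errors and processing errors, which is why the report above lists them separately from violations.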

Test Processes

After analyzing the repository, we found that PMD has unit tests for the components in pmd-core and for each specific language in the Language modules. Additionally, there are basic command line interface tests and regression tests (using Pmdtester [5]) to ensure there is no unexpected behavior. To ensure the quality of new rules, a guideline for testing rules is provided in the documentation [6]. The test coverage of the repository, as calculated by Coveralls, can be found in the figure below.

Figure: Test coverage provided by Coveralls

The low test coverage of pmd-core is mainly caused by PMD.java, which has a coverage of 38.26%. However, most of the untested parts are related to deprecated methods. The test coverage of the two major Language modules of PMD, pmd-apex and pmd-java, is relatively high with 86.41% and 82.31% respectively. For most of the other Language modules, there is room for improvement regarding test coverage.
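Module coverage is a pooled figure: covered lines divided by coverable lines across all files in the module, so one large, weakly tested file can dominate the result. The sketch below shows that arithmetic under the assumption that coverage is measured per line; the class name and the file sizes in `main` are invented for illustration and are not Coveralls data for PMD.

```java
// Illustrative sketch: the numbers in main are invented, not Coveralls data for PMD.
class Coverage {

    /** Line coverage in percent: covered lines divided by coverable lines. */
    static double percent(long covered, long coverable) {
        if (coverable <= 0) {
            throw new IllegalArgumentException("no coverable lines");
        }
        return 100.0 * covered / coverable;
    }

    /** Module coverage pools the line counts of all files, so large files weigh more. */
    static double modulePercent(long[][] files) {
        long covered = 0;
        long coverable = 0;
        for (long[] file : files) { // each entry: {coveredLines, coverableLines}
            covered += file[0];
            coverable += file[1];
        }
        return percent(covered, coverable);
    }

    public static void main(String[] args) {
        long[][] module = {
            {400, 1000}, // one large, weakly tested file: 40%
            {95, 100},   // small, well-tested file: 95%
            {90, 100},   // small, well-tested file: 90%
        };
        // Pooled: 585 of 1200 lines, although the unweighted per-file average is 75%.
        System.out.println(modulePercent(module)); // 48.75
    }
}
```

This is why improving the tests of a single large file such as PMD.java would move pmd-core's overall percentage far more than polishing several small files.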

Hotspot Components

With continuous integration and test processes standardized, one might expect the software to be developed and improved at a steady pace. However, this is usually not the case: some packages need careful planning, implementation, and strict testing, and some files need frequent changes because of poor performance or bugs. These components are called hotspots. We will walk through the file-level and package-level hotspot components of PMD with the help of two powerful tools, PyDriller [7] and CodeScene, and discuss how modularity has become the core of PMD’s development process. We will also shed light on serviceability in PMD’s growth.

File-Level Hotspots

From 2002, when the initial commit of PMD was pushed to SourceForge [8], until ten days before the time of writing, when the most recent pull request was merged [9], PMD has always centered on PMD.java and added functionality in other files when necessary. The entry point of the program naturally has to adapt to new user needs, and CodeScene shows that this central file indeed receives 26 commits per year on average. However, the details of the bar charts below reveal where PMD’s growth has concentrated: parsing Java syntax, building an understanding of Java classes and objects, refining other modules, and strengthening its support for Java and Apex. What is the next step for PMD? Within the past two months, a new Java file, PmdAnalysis.java, was created and has already received 21 commits [10][11]. It will be interesting to observe how PMD developers use it to make their tool more intelligent.

Figure: File-Level Hotspots: Most Frequently Modified Files of PMD
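A change-frequency ranking like the one in the figure above can be approximated by counting how many commits touch each file. The sketch below assumes the commit history has already been extracted (for example with a repository-mining tool such as PyDriller) into one list of modified paths per commit; the class name `FileHotspots` and the sample paths are hypothetical. CodeScene's own hotspot analysis is more elaborate than this simple count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: input is assumed to be pre-extracted commit data,
// one list of modified paths per commit.
class FileHotspots {

    /** Returns up to {@code limit} paths, ordered by how many commits modified them. */
    static List<String> top(List<List<String>> commits, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> modifiedFiles : commits) {
            for (String path : modifiedFiles) {
                counts.merge(path, 1, Integer::sum); // one more commit touched this file
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Most frequently modified first; ties broken alphabetically for a stable result.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical history: four commits and the files each one modified.
        List<List<String>> commits = List.of(
                List.of("pmd-core/PMD.java", "pmd-java/JavaRule.java"),
                List.of("pmd-core/PMD.java"),
                List.of("pmd-apex/ApexRule.java", "pmd-java/JavaRule.java"),
                List.of("pmd-core/PMD.java"));
        System.out.println(top(commits, 2)); // [pmd-core/PMD.java, pmd-java/JavaRule.java]
    }
}
```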

Package-Level Hotspots

At the higher, package level, the rise of pmd-apex is hard to overlook since PMD announced its support for Salesforce Apex in 2016 [12][13]. This may indicate that the pre-Apex era, in which PMD focused on plugin adaptation to IDEs and Java build tools (e.g. Maven), has ended. Now we are gradually seeing more modules, many of them language-specific (pmd-ui, pmd-visualforce, pmd-apex, etc.). The transition from plugins to more languages is not explicitly stated, but the trend is clear.

Figure: Package-Level Hotspots: Most Frequently Modified Modules of PMD
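Package-level hotspots can be derived from the same commit data by mapping every modified path to its module and counting each commit at most once per module. The sketch below assumes that every top-level directory of the repository (pmd-core, pmd-java, pmd-apex, and so on) is one module; the class name `ModuleHotspots` and the sample paths are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch: assumes each top-level directory of the repository is one module.
class ModuleHotspots {

    /** Maps a path such as "pmd-apex/src/main/Foo.java" to its module, "pmd-apex". */
    static String moduleOf(String path) {
        int slash = path.indexOf('/');
        return slash < 0 ? "(root)" : path.substring(0, slash);
    }

    /** Counts, per module, the commits that touched at least one file in that module. */
    static Map<String, Integer> commitsPerModule(List<List<String>> commits) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by module name
        for (List<String> modifiedFiles : commits) {
            // A commit touching several files of the same module counts once for it.
            Set<String> touched = new HashSet<>();
            for (String path : modifiedFiles) {
                touched.add(moduleOf(path));
            }
            for (String module : touched) {
                counts.merge(module, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> commits = List.of(
                List.of("pmd-apex/src/A.java", "pmd-apex/src/B.java"),
                List.of("pmd-java/src/C.java", "pmd-apex/src/A.java"),
                List.of("pom.xml"));
        System.out.println(commitsPerModule(commits)); // {(root)=1, pmd-apex=2, pmd-java=1}
    }
}
```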

Modularity and Serviceability in Hotspots

If modularity has become the norm in software development, then PMD was ahead of the curve. Notably, PMD.java appears twice in the list of most frequently modified files for 2011-2015. This is because PMD restructured its files and created a new module for each of its supported languages [14], and PMD.java was moved to its new location accordingly [15]. In addition to package-level modularity, PMD’s file-level modularity plays a core role in welcoming new rules and metrics.
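The one-module-per-language layout can be pictured as a small core that depends only on an abstract language contract, with each module contributing one implementation. The sketch below is purely illustrative: `LanguageModule`, `LanguageRegistry`, `JavaModule`, and `ApexModule` are hypothetical names, not PMD’s actual classes, and real language modules also provide parsers, rules, and metrics. A production plugin system would typically discover modules at runtime (for example via `java.util.ServiceLoader`); here they are registered by hand.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical names throughout; this is not PMD's actual language API.

/** Contract that each language module implements; the core depends only on this. */
interface LanguageModule {
    String name();
    String[] fileExtensions();
}

/** Core-side registry that routes each file to the module responsible for it. */
class LanguageRegistry {
    private final Map<String, LanguageModule> byExtension = new TreeMap<>();

    void register(LanguageModule module) {
        for (String extension : module.fileExtensions()) {
            byExtension.put(extension, module);
        }
    }

    Optional<LanguageModule> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty(); // no extension, so no module can claim the file
        }
        return Optional.ofNullable(byExtension.get(fileName.substring(dot + 1)));
    }

    public static void main(String[] args) {
        LanguageRegistry registry = new LanguageRegistry();
        // Adding a language means registering one more module; the core is unchanged.
        registry.register(new JavaModule());
        registry.register(new ApexModule());
        System.out.println(registry.forFile("Account.cls").map(LanguageModule::name).orElse("none")); // apex
        System.out.println(registry.forFile("notes.txt").map(LanguageModule::name).orElse("none"));   // none
    }
}

/** Toy stand-in for a Java language module. */
class JavaModule implements LanguageModule {
    public String name() { return "java"; }
    public String[] fileExtensions() { return new String[] {"java"}; }
}

/** Toy stand-in for an Apex language module. */
class ApexModule implements LanguageModule {
    public String name() { return "apex"; }
    public String[] fileExtensions() { return new String[] {"cls", "trigger"}; }
}
```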

Another observation is that PMD has received increasing press coverage since it added support for Apex. PMD’s serviceability clearly played a role in this: it made the press coverage possible and, furthermore, enabled a tighter collaboration between Apex and PMD, which turned pmd-apex into a new hotspot.

Code Quality Analysis

To further investigate the quality of the code, we also analyzed it using SonarQube, an automatic code review tool that detects bugs, vulnerabilities, and code smells [16].

Figure: Summary of an analysis on the PMD repository by SonarQube

As seen in the figure above, SonarQube found quite a few bugs and vulnerabilities, which result in low reliability and security scores. However, if we look at the individual modules, we find that most Language modules have an A-score for reliability and security. The overall low score is caused by the cpd (copy-paste detector) and util modules in pmd-core. These modules contain at least one ‘blocker’ issue, that is, an issue with a high probability of impacting the behavior of the application in production [17]. An example of a blocker bug can be found in the figure below, where an input stream is not closed. While this is not good practice, PMD is a tool that users run locally for their own benefit, so this issue might not be as severe as it normally would be.

Figure: A blocker bug in CPD.
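The standard fix for this class of blocker is try-with-resources, which closes the stream on every exit path, including when an exception is thrown. The sketch below is a generic illustration, not the actual CPD code shown in the figure; the class `SafeRead`, its `readAll` method, and the `TrackingStream` helper are hypothetical names introduced to make the closing behavior observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Generic illustration, not the CPD code from the figure.
class SafeRead {

    // Leaky pattern of the kind SonarQube flags (shown for contrast only):
    //   InputStream in = new FileInputStream(file);
    //   return new String(in.readAllBytes(), UTF_8); // stream is never closed

    /** Reads a whole stream as UTF-8; try-with-resources closes it on every exit path. */
    static String readAll(InputStream source) throws IOException {
        try (InputStream in = source) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Test helper: a stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream stream = new TrackingStream("int x = 1;".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(stream)); // int x = 1;
        System.out.println(stream.closed);   // true
    }
}
```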

The figure below shows the ratings for the hotspots mentioned in File-Level Hotspots and Package-Level Hotspots. The B-score for PMD.java in maintainability indicates that its technical debt ratio is between 6% and 10% [18].

Figure: Reliability, Security, and Maintainability ratings for hotspots.
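SonarQube’s metric documentation defines the technical debt ratio as the estimated remediation cost divided by the estimated development cost (cost per line times lines of code), and maps it to a maintainability rating: A up to 5%, B from 6% to 10%, C from 11% to 20%, D from 21% to 50%, and E above 50%. The sketch below reproduces that arithmetic. The development cost of 30 minutes per line is an assumption (SonarQube makes it configurable), the class name is hypothetical, and the sample numbers are invented rather than taken from PMD’s analysis.

```java
// Illustrative sketch of SonarQube-style debt arithmetic; sample numbers are invented.
class DebtRatio {

    /** Assumed cost to develop one line of code, in minutes (configurable in SonarQube). */
    static final double DEV_COST_PER_LINE_MINUTES = 30.0;

    /** Technical debt ratio = remediation cost / (cost per line * lines of code). */
    static double ratio(double remediationMinutes, long linesOfCode) {
        return remediationMinutes / (DEV_COST_PER_LINE_MINUTES * linesOfCode);
    }

    /** Maps a debt ratio to a maintainability rating using the documented thresholds. */
    static char rating(double ratio) {
        if (ratio <= 0.05) return 'A';
        if (ratio <= 0.10) return 'B';
        if (ratio <= 0.20) return 'C';
        if (ratio <= 0.50) return 'D';
        return 'E';
    }

    public static void main(String[] args) {
        // Hypothetical file: 1,000 lines with 2,400 minutes (40 hours) of estimated remediation.
        double r = ratio(2400, 1000); // 2400 / 30000 = 0.08
        System.out.println(rating(r)); // B
    }
}
```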

The C-score for pmd-apex and pmd-java in reliability indicates that each of them contains at least one major bug. The two figures below show that these bugs are related to missing null checks, which are relatively easy to fix.
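SonarQube derives the reliability rating from the most severe open bug: A when there are no bugs, B for at least one minor bug, C for at least one major bug, D for at least one critical bug, and E for at least one blocker. This is also why a single blocker in cpd or util is enough to pull the overall score down. A minimal sketch of that mapping, using a hypothetical `Severity` enum and class name:

```java
import java.util.List;

// Illustrative sketch of a worst-severity reliability rating; names are hypothetical.
class ReliabilityRating {

    /** Bug severities in increasing order of impact (declaration order matters). */
    enum Severity { MINOR, MAJOR, CRITICAL, BLOCKER }

    /** The rating is determined solely by the worst open bug. */
    static char rating(List<Severity> openBugs) {
        Severity worst = null;
        for (Severity s : openBugs) {
            if (worst == null || s.compareTo(worst) > 0) {
                worst = s;
            }
        }
        if (worst == null) {
            return 'A'; // no bugs at all
        }
        return "BCDE".charAt(worst.ordinal()); // MINOR->B, MAJOR->C, CRITICAL->D, BLOCKER->E
    }

    public static void main(String[] args) {
        System.out.println(rating(List.of()));                               // A
        System.out.println(rating(List.of(Severity.MINOR, Severity.MAJOR))); // C
        System.out.println(rating(List.of(Severity.BLOCKER)));               // E
    }
}
```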

Figure: Reliability bug in pmd-apex.

Figure: Reliability bug in pmd-java.
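The figures above do not show a complete program, but this category of bug typically consists of a value that may be `null` being dereferenced on some path. The sketch below is a generic illustration of the problem and the usual fix, not the PMD code from the figures; the class `NullChecks` and its methods are hypothetical.

```java
import java.util.Map;

// Generic illustration of a missing null check, not the PMD code from the figures.
class NullChecks {

    /** Unsafe: Map.get returns null for an unknown key, so trim() may throw NullPointerException. */
    static String displayNameUnsafe(Map<String, String> names, String key) {
        return names.get(key).trim();
    }

    /** Safe: the missing case is handled before the value is dereferenced. */
    static String displayName(Map<String, String> names, String key) {
        String name = names.get(key);
        if (name == null) {
            return "<unknown>";
        }
        return name.trim();
    }

    public static void main(String[] args) {
        Map<String, String> names = Map.of("apex", " Apex ");
        System.out.println(displayName(names, "apex"));  // Apex
        System.out.println(displayName(names, "plsql")); // <unknown>
    }
}
```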

Quality Culture

In the previous section we looked into the code quality of PMD. We will now discuss how PMD maintains its code quality by looking at the standard review procedure for pull requests that contain significant contributions. PMD has guidelines for introducing new rules, languages, and metrics to the tool. These guidelines help prevent low-quality contributions and reduce reviewing time.

First, for introducing new rules to PMD, there are some ‘rule rules’ (see PMD - System Architecture). These concern rule naming and rule messages, and keep naming consistent throughout the whole system. Furthermore, for a rule to be included in PMD, it needs to have no overlap with other rules, broad applicability, solid documentation, and solid benchmarks [19]. PR #1647 [20] is an example of the introduction of a new rule. The reviewer of this PR points out that the contribution should contain sufficient test scenarios before it can be merged. The reviewer also mentions that the rule should be defined in the category XML file, following the standard format.

Second, the introduction of a new language should follow strict guidelines [21], which emphasize the importance of testing and debugging the support for the new language in order to stay consistent with the quality of the other supported languages. PR #2041 [22] introduces support for the Modelica language. In this PR there are many suggestions that could improve the converted grammar. Besides these small suggestions to the code, a more general review is also provided, mainly regarding PMD’s API design principles.

Third, PMD has an extensive framework for calculating metrics, which makes it convenient for developers to implement new or updated metrics [23]. PR #482 [24] introduces a test framework for metrics and improved capabilities for metrics. The first thing the reviewer noticed was that the contribution caused test failures, because a bug had been introduced during refactoring. Furthermore, there is a discussion with some suggestions to further improve the quality of the code.

Finally, we also looked into some general PRs that involve enhancements and bugfixes [25][26][27][28][29][30][31]. Overall, the reviewers are helpful and make sure that the quality of each contribution is on par with the rest of the code. The reviewers actively discuss the changes with the contributor to clarify uncertainties and suggest improvements. Most of the time, two maintainers review a pull request, which reduces the chance of missing a mistake or inconsistency.

Technical Debt

Technical debt is defined as the amount of “cruft” in a system: the aspects of the system that make it more difficult than necessary to extend and adapt. Technical debt generally builds up over a project’s lifecycle, being introduced alongside new features. Accordingly, no project exists entirely without cruft, and how it is dealt with is a crucial part of a project’s development culture. PMD is no exception and has accumulated its fair share of technical debt over its long life.

However, the main selling point of PMD as a static analysis tool is avoiding the buildup of technical debt through well-defined software development practices (and of course, PMD is run on itself [32]). It is no surprise, then, that PMD’s development culture emphasizes measuring and paying off technical debt, giving equal importance to creating new features and to refactoring existing ones to reduce cruft. The PMD developers keep track of technical debt and factor it into development decisions. PMD’s task lists for new features even split refactorings into pre- and post-implementation steps, recognizing the importance of reducing the amount of technical debt each new feature introduces. For example, shown here is the post-feature refactoring checklist of a pull request introducing new Java-Grammar functionality [33]:

Figure: Post-feature refactoring checklist for new Java-Grammar functionality.

As a result of considering the technical debt and refactoring costs of every feature, the PMD project has avoided most of the potential buildup of cruft. There is, however, one aspect of PMD’s system architecture that has not avoided technical debt: API design. As mentioned in previous essays and in Hotspot Components, PMD has not evolved in a vacuum: it has needed to add support for new languages, plugins, integration tools, parsers, and more. All of these modifications have required additions to PMD’s API (PMD.java and PmdAnalysis.java), causing a buildup of complexity that could not easily be refactored without breaking user configurations. To put it succinctly [34]:

Figure: The reason for PMD 7

Even if the technical debt caused by the API cannot easily be removed, the PMD developers have put considerable effort into keeping track of it. Aside from dedicated issues tracking such changes [35], it is immediately visible in the PMD code base in the form of walls of deprecated functions and “@internal” tags:

Figure: A wall of deprecated functions in PMD.java, the entry point for the command line API.
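Keeping an old API alive while steering users to its replacement usually means leaving the old method in place, marking it `@Deprecated` with a Javadoc pointer to the new API, and delegating to the new code path so there is only one implementation to maintain. The sketch below uses hypothetical class and method names (`Analysis`, `LegacyEntryPoint`, `analyze`); it is not PMD’s actual `PMD.java` or `PmdAnalysis.java` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; this is not PMD's actual API.

/** New-style entry point: configure first, then run. */
class Analysis {
    private final List<String> files = new ArrayList<>();

    Analysis addFile(String path) {
        files.add(path);
        return this;
    }

    /** Returns the number of files analyzed (a stand-in for real results). */
    int run() {
        return files.size();
    }
}

/** Old-style entry point, kept only so existing callers do not break. */
class LegacyEntryPoint {

    /**
     * @deprecated Use {@link Analysis} instead. Scheduled for removal in the next
     *             major version.
     */
    @Deprecated
    static int analyze(List<String> paths) {
        // Delegate to the new API so there is a single implementation to maintain.
        Analysis analysis = new Analysis();
        for (String path : paths) {
            analysis.addFile(path);
        }
        return analysis.run();
    }

    public static void main(String[] args) {
        System.out.println(analyze(List.of("A.java", "B.java"))); // 2
    }
}
```

Each such shim is cheap on its own, but dozens of them accumulate into the kind of wall shown above, which a major release can finally remove.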

Overall, despite a strong cultural emphasis on mitigating cruft, the PMD project is currently undergoing a major release (PMD 7) specifically to pay back a decade’s worth of technical debt.

References


  1. PMD Developers. (2022). How to contribute to PMD. Retrieved March 21, 2022, from https://github.com/pmd/pmd/blob/master/CONTRIBUTING.md

  2. GitHub. (2022). Understanding GitHub Actions. Retrieved March 21, 2022, from https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions

  3. PMD Developers. (2022). System Build Checks - GitHub Actions. Retrieved March 21, 2022, from https://github.com/pmd/pmd/runs/5409822890?check_suite_focus=true

  4. (2022). PMD Regression Tester Report. Retrieved March 21, 2022, from https://chunk.io/pmd/5839b187a7914cd1b92f809fd0fce6a6/diff1/index.html

  5. PMD. (2019). PMD 7.0.0 development. Retrieved March 21, 2022, from https://pmd.github.io/pmd-6.43.0/pmd_devdocs_pmdtester.html

  6. PMD. (2022). Testing your rules. Retrieved March 21, 2022, from https://pmd.github.io/latest/pmd_userdocs_extending_testing.html

  7. Davide Spadini, Maurício Aniche and Alberto Bacchelli. (2018). PyDriller: Python framework for mining software repositories. Retrieved March 21, 2022, from http://dl.acm.org/citation.cfm?doid=3236024.3264598

  8. PMD. (2002). Release Notes Version 0.1. Retrieved March 21, 2022, from https://pmd.github.io/latest/pmd_release_notes_old.html#june-25-2002---01

  9. PMD Developers. (2022). Pull Request #3836 - Make TOC scrollable when too many subheadings. Retrieved March 21, 2022, from https://github.com/pmd/pmd/pull/3836

  10. PMD Developers. (2022). PmdAnalysis.java - File History. Retrieved March 21, 2022, from https://github.com/pmd/pmd/commits/master/pmd-core/src/main/java/net/sourceforge/pmd/PmdAnalysis.java

  11. PMD Developers. (2022). PmdAnalysisBuilder.java - File History. Retrieved March 21, 2022, from https://github.com/pmd/pmd/commits/f943fa85ad8c4d2152da11f85b9ef9072672e6fa/pmd-core/src/main/java/net/sourceforge/pmd/PmdAnalysisBuilder.java

  12. PMD Developers. (2016). Pull Request #86 - Added language module for Salesforce.com Apex incl. rules ported from Java and new ones. Retrieved March 21, 2022, from https://github.com/pmd/pmd/pull/86

  13. PMD. (2016). Release Notes Version 5.5.0. Retrieved March 21, 2022, from https://pmd.github.io/latest/pmd_release_notes_old.html#25-june-2016---550

  14. PMD. (2014). Release Notes Version 5.2.0. Retrieved March 21, 2022, from https://pmd.github.io/latest/pmd_release_notes_old.html#october-17-2014---520

  15. PMD Developers. (2014). Commit “Rename pmd -> pmd-core and pmd-aggregate -> pmd”. Retrieved March 21, 2022, from https://github.com/pmd/pmd/commit/f748539c45aa4a042c022ca52c75d0e2d947534e

  16. SonarQube. (2021). SonarQube Documentation. Retrieved March 20, 2022, from https://docs.sonarqube.org/latest/

  17. SonarQube. (2021). SonarQube Documentation: Issues. Retrieved March 20, 2022, from https://docs.sonarqube.org/latest/user-guide/issues/

  18. SonarQube. (2021). SonarQube Documentation: Metric Definitions. Retrieved March 21, 2022, from https://docs.sonarqube.org/latest/user-guide/metric-definitions/

  19. PMD. (2021). Guidelines for standard rules. Retrieved March 18, 2022, from https://pmd.github.io/pmd-6.43.0/pmd_devdocs_major_rule_guidelines.html

  20. PMD Developers. (2019). Pull Request #1647 - Rule to detect overly verbose array initialization. Retrieved March 18, 2022, from https://github.com/pmd/pmd/pull/1647

  21. PMD. (2020). Adding PMD support for a new language. Retrieved March 18, 2022, from https://pmd.github.io/pmd-6.43.0/pmd_devdocs_major_adding_new_language.html

  22. PMD Developers. (2019). Pull Request #2041 - Initial implementation for PMD. Retrieved March 18, 2022, from https://github.com/pmd/pmd/pull/2041

  23. PMD. (2020). Adding support for metrics to a language. Retrieved March 18, 2022, from https://pmd.github.io/pmd-6.43.0/pmd_devdocs_major_adding_new_metrics_framework.html

  24. PMD Developers. (2019). Pull Request #482 - Metrics testing framework + improved capabilities for metrics. Retrieved March 18, 2022, from https://github.com/pmd/pmd/pull/482

  25. PMD Developers. (2018). Pull Request #828 - Add operations to manipulate a document. Retrieved March 20, 2022, from https://github.com/pmd/pmd/pull/828

  26. PMD Developers. (2018). Pull Request #1075 - Rework benchmarking code. Retrieved March 20, 2022, from https://github.com/pmd/pmd/pull/1075

  27. PMD Developers. (2017). Pull Request #479 - Typesafe and immutable properties. Retrieved March 20, 2022, from https://github.com/pmd/pmd/pull/479

  28. PMD Developers. (2017). Pull Request #679 - Token scheme generalization. Retrieved March 20, 2022, from https://github.com/pmd/pmd/pull/679

  29. PMD Developers. (2018). Pull Request #1182 - XPath AutoComplete. Retrieved March 20, 2022, from https://github.com/pmd/pmd/pull/1182

  30. PMD Developers. (2018). Pull Request #958 - Refactor how we ignore annotated elements in rules. Retrieved March 20, 2022, from https://github.com/pmd/pmd/pull/958

  31. PMD Developers. (2022). Pull Request #3693 - ApexDoc: Add reportProperty property. Retrieved March 21, 2022, from https://github.com/pmd/pmd/pull/3693

  32. PMD Developers. (2022). Pull Request #3657 - Dogfood for PMD 7. Retrieved March 18, 2022, from https://github.com/pmd/pmd/pull/3657

  33. PMD Developers. (2020). Pull Request #2701 - Java-grammar progress document. Retrieved March 18, 2022, from https://github.com/pmd/pmd/issues/2701

  34. PMD. (2022). PMD 7.0.0 development. Retrieved March 18, 2022, from https://pmd.github.io/pmd/pmd_next_major_development.html#new-api-support-guidelines

  35. PMD Developers. (2020). Pull Request #881 - Breaking API changes for 7.0.0. Retrieved March 18, 2022, from https://github.com/pmd/pmd/issues/881

PMD
Authors
Jerrit Eickhoff
Jason Qiu
Bailey Tjiong
Liang Zhang