DESOSA 2022

Checkstyle: Quality and Evolution

For a project like Checkstyle, being able to assure some quality standards is both beneficial for developers and users, as it has the advantage of likely making the software more reliable, as well as creating a project that is easier to maintain. Checkstyle makes an attempt to ensure quality by going through several measures. In this essay, we will be evaluating the actions the Checkstyle team has chosen to ensure quality, analyse them together with some discussion, and find possible reasons for their choices.

Current State of Key Quality Attributes

As discussed in our previous essay, Checkstyle: Architecture for Extensibility, there are numerous key quality attributes that have the main focus of the Checkstyle project. These mainly include reliability, extensibility, and maintainability. To start with reliability, it is trivial to understand what the implications of unreliable software can be, hence the importance. In Checkstyle, unreliable software could show unexpected and unwanted behaviour in some specific cases. Something that Checkstyle does to measure this is regression testing. For every change, Checkstyle is run on a given open-source project, both before and after it. More on this will be discussed later in this essay, but the main point here is that by going through the effort of incorporating such time-consuming and intensive tasks into your pipeline, one is at least doing their best for ensuring reliable software.

Extensibility is one of the other main key attributes of Checkstyle. Since this is about the ease of extending the existing codebase, it goes hand-in-hand with maintainability. While this is difficult to measure in an exact sense, a possible way to measure this would be to look at how many issues people have with writing their own tests. A measurable indicator hereof could be the number of GitHub issues regarding custom test creation. If many such issues exist and remain unsolved, one could argue that the extensibility might not be as well-designed as intended. However, while many such issues do exist #4869, #3523, #619 (these are just examples, a lot more were found), only a minor amount of them remain unsolved. Therefore, we argue that the extensibility and maintainability are quite well satisfied.

Overall Software Quality Processes, Continuous Integration, and Test Processes

As one would expect, a tool built for ensuring code quality also has a high standard for its own code. Different methods are used to ensure this standard is attained to.

For all potential code changes, in total 74 automatic checks are run. One of these checks is Checkstyle itself. Doing this sounds logical, but it is also vulnerable to false negatives. To mitigate this risk, various other tools are run to double-check the code is up to standard:

Tool Purpose
Checkstyle Static analysis
SonarQube Static analysis
PMD Static analysis
Spotbugs Static analysis
XML plugin Verify XML file format
Modernizer Disallow legacy API calls
Forbidden API checker Disallow specific API calls
JUnit Testing
Failsafe Integration testing
Pitest Mutation testing
Jacoco Code coverage by tests

All of these plugins are executed automatically using Continuous Integration (CI). To ensure compatibility, this CI pipeline is run on Ubuntu, MacOS, and Windows. Furthermore, some of these checks are executed on multiple different JDK versions. What is interesting to note is that multiple different external platforms are used to run the pipelines. Examples include Azure, Cirrus, Drone, and Travis.

What stands out is the strictness of the ruleset. For example, the Jacoco configuration requires 100% line coverage on most files. Exceptions to all rules are clearly documented. If some files are excluded from a certain rule, there is a brief comment explaining why this is the case (e.g. the files are generated and shouldn’t be touched, or there is a specific reason why the default rules do not work on this specific file).

Once all external tools pass, there are two more steps to go through. Firstly, one or more of the core maintainers of the tool will review the pull request thoroughly. In our experience with working on the Checkstyle repository these reviews go very deep, discussing the tiniest of potential improvements that could be made to the suggested change(s). Once the reviewers agree with the change set, one last automatic check is run. Checkstyle will be run on some open source project, once with the current code and once with the suggested changes. Any differences between these reports should be the intended result of the pull request. The Checkstyle Tester tool1 is used to facilitate this process.

Hotspot Components

We have analysed hotspots using the online tool CodeScene2. We have run one analysis for commits made over the past six months, and one for commits made since the start of the project. The images can be seen below. One of the main things that is noticeable is the fact that the entire checks package, where the independent checks reside, is mostly stable recently, compared to how it was in the beginning. The core parts of the project, which are not in any subpackage, and the API package have also mostly settled down, but there are a couple of classes that are still hotspots in the past six months.

Figure: Hotspot overview over the past 6 months.

Figure: Hotspot overview over all time.

The biggest class hotspot recently is the file api.TokenTypes. This file is practically a wrapper for an enum. The enum itself could not be used directly due to issues in circular dependencies. The reason this was identified as a hotspot is the numerous long comments with examples that have been updated recently. The actual lines of code have not been edited in the past six months.

One recent hotspot that does have actual code is the JavaAstVisitor class. This is a large and important class of more than 2000 lines, that is responsible for abstract syntax tree visiting nodes and running checks. Even though this is a hotspot, it only has six commits over the past six months; this shows us that most of Checkstyle’s codebase is quite stable. Within the checks package, the hotpots seem to be approximately spread between different types of test. No single type of test has gotten significantly more attention than others.

Looking at the current open issues, we do not believe there will be any significant new hotspots any time soon; the open issues seem to cover a wide range of files.

Code Quality, Focused on Hotspot Components

Let’s get the obvious thing out of the way: Checkstyle’s code style is excellent. Of course, this is only a small part of code quality. We have also looked at coupling, complexity, cohesion, and file length using CodeMR3. The figure below shows an overview of the whole Checkstyle project.

Figure: Code quality.

Focusing on the recent JavaAstVisitor hotspot identified earlier, codeMR identifies a low complexity, low-medium level of coupling, medium-high issues in cohesion, and (as you would have guessed from a 2000 line file) high issues in size. The complexity and coupling are not issues, but a high and medium-high file size and lack of cohesion can tell us that this class might be trying to do too many things at once. The Checkstyle developers could consider splitting off parts of the code into different classes. This would reduce the file size and improve cohesion, though it would increase coupling, as the new class(es) would be highly coupled with the current class.

This hotspot class is not the worst offender however. Two problematic classes are Main and Checker, both with more than medium-high problems in all metrics. These two classes form the interface of the Checkstyle project; they take in the files and return a list of check violations. There is also a large number of Javadoc-related check files with problems. 5/10 files in the top ten most problematic classes are related to this. The issues for these files are mostly in the complexity and lack of cohesion metrics. Checkstyle could consider an overhaul for Javadoc checks, though this would be a major undertaking, as Javadoc checks form a large part of Checkstyle.

The Checkstyle Quality Culture

According to Meyer and Reniers 4, we can evaluate the quality culture of Checkstyle in three different domains: people, procedures and technology. While this model is used by Meyer and Reniers4 to discuss safety culture, the model can be easily and accurately adapted to quality culture. In all three domains, we evaluate what actions have been taken to influence the quality culture, to avoid and prevent unwanted events, and mitigate their consequences.

Technology

We have already discussed how Checkstyle uses a large number of tools to ensure that the codebase remains functional, clean, and well tested. In addition, this approach plays an indirect but significant role in influencing quality culture for two reasons. Firstly, it clearly and objectively signals that this is a project that cares about quality. Secondly, it forces contributors to think about quality. If they don’t, their contributions will not even be considered.

There is one potential downside to this approach however: contributors might assume that if the CI passes, their code must be of high quality. This is not necessarily true - it is important to always think critically about quality. For this, Checkstyle uses strict PR review procedures.

Procedures

While this procedure is not written down in its entirety, evaluating a number of architecturally significant issues and pull requests shows that it is, and has been for quite some time, strictly adhered to.

The Checkstyle review procedure is as follows: when a PR is made, a maintainer first makes sure that it adheres to the strict contributing guidelines[^contributing]. Then, changes are requested based on the content of the PR. Interestingly, these comments are most often about code style, tests, and documentation: the actual content has often already been decided on in the discussion in the issue. When the reviewer eventually accepts the changes, they assign the next reviewer. Pull requests that make significant changes are always reviewed by at least 3 maintainers, though the procedure is so thorough that often the second and third reviewers have nothing to add.

People

It is clear that the maintainers value the quality of the code highly, as well as the quality culture. Contributors are also encouraged (or forced) to consider this. This is not only done by the quality culture, but the entire pipeline from issue to merged PR is designed in such a way that it forces contributors to think about code quality. All in all, this results in a quality culture that values solid code quality, automated checking, and thorough evaluations.

Assessment of Technical Debt

While in general there is not much evidence for large technical debt, there are some interesting historical design choices that have led to difficulties at current time. An interesting example of this is issue #11166. In this issue, numerous tests are stated that make use of the getFileContents() method. While this method allows the creators of tests to traverse the contents of the file directly, this is in conflict with the intended idea of the design of Checkstyle; it should be traversing the AST instead. While this is an understandable decision, as this made the creation of these tests a lot easier, it has brought Checkstyle to the point where it now has to provide both support for the AST-based method, as well as the file traversal based method. The getFileContents() method has been made deprecated, but in the past 3 months, only one of the twenty-one stated tests was rewritten. This is partly caused by the fact that the developers who seem to have the required knowledge, are all working on high priority issues instead, as well as maintaining Checkstyle in general. This is the logical consequence of thinking; “we will do this correctly later when we have time.” However, it is highly unlikely to ever find this time. Other than this and some other minor issues we found to be unsolved as a result of poor historical design choices, there does not seem to be any evidence of large technical debt. Therefore, we believe the Checkstyle project is doing quite a good job here. An advantage that Checkstyle has is its maturity and thoroughness; the years of active development have likely caused a lot of the originally poor design choices to be rethought and fixed.

Conclusion

Overall, the Checkstyle pipeline seems to be designed with high quality assurance in mind. The various different tools that this pipeline includes, together with its strict quality culture, seems to have led to a project where quality control is central for all developers and contributors. As a result of this, we have discussed possible technical debt, but found this to be quite minimal. While there exist some minor issues, this is to be expected from a large-scale project like Checkstyle. We therefore do not believe this to be a direct implication of insufficient quality control.


  1. https://github.com/checkstyle/contribution/tree/master/checkstyle-tester/ ↩︎

  2. https://codescene.io/ ↩︎

  3. https://www.codemr.co.uk/ ↩︎

  4. Meyer, T., & Reniers, G. (2016). Engineering risk management. De Gruyter. ↩︎

Checkstyle
Authors
Sander Gielisse
Otto Kaaij
Ricardo Vogel
Yoshi van den Akker