Checkstyle: Architecture for Extensibility

As discussed in our previous essay, The Checkstyle Vision: Keeping Your Code in Check, Checkstyle is a static analysis tool that helps a Java development team keep a coherent code style across their project, which keeps the code readable, maintainable, and consistent. This essay covers the core architecture of Checkstyle. First, we look at the main design pattern used in Checkstyle: the visitor pattern. Then, we go over four different architectural views: the container view, the component and connector view, the development view, and the runtime view. Together, these four views give a top-to-bottom overview of the project's architecture. Due to the nature of the project, some are discussed in more detail than others. After that, we go over the key quality attributes of the system and how they are visible in the architecture. Finally, we discuss how Checkstyle applies API design principles in its architecture.

The Visitor Pattern

The main design pattern that Checkstyle uses to organize its codebase is the Visitor Pattern [1]. The vast majority of checks in Checkstyle operate on the level of the Abstract Syntax Tree (AST) [2]. Instead of writing checks on plain Java files, Checkstyle first parses the files into an AST. This AST provides a structured overview of the Java file.

A naive design would define a method, check(), on every node in the AST. When this method is called, it performs all the checks that are relevant for that node and raises any violations it finds. However, this approach is limited. Adding or changing a check means changing the AST node classes themselves, so functionality cannot easily be extended or adapted, and all checks become dependent on each other. With many different checks, this quickly becomes messy.

What we need is a way to separate the algorithm from the object structure it operates on: the visitor pattern. Checkstyle defines visitor classes that implement the code style checks. The AST is then traversed depth-first, and the visitors 'visit' the AST nodes and perform their checks. This way, adding a check only requires adding a visitor class, and the checks stay independent of each other.
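The separation described above can be illustrated with a minimal sketch. The classes below (Node, Visitor, walk) are hypothetical stand-ins written for this essay, not Checkstyle's real classes; the point is only that the node structure stays untouched while checks are added as separate visitors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Checkstyle's real classes.
class VisitorSketch {
    /** A bare-bones AST node: a type label, some text, and children. */
    static final class Node {
        final String type;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Each check is a visitor; the Node class knows nothing about checks. */
    interface Visitor {
        void visit(Node node, List<String> violations);
    }

    /** Depth-first traversal: show the node to every visitor, then recurse. */
    static void walk(Node node, List<Visitor> visitors, List<String> violations) {
        for (Visitor visitor : visitors) {
            visitor.visit(node, violations);
        }
        for (Node child : node.children) {
            walk(child, visitors, violations);
        }
    }

    static List<String> runDemo() {
        Node tree = new Node("CLASS", "Foo")
            .add(new Node("METHOD", "DoWork"))
            .add(new Node("METHOD", "run")
                .add(new Node("VARIABLE", "x")));

        // Two independent checks; adding a third would not touch Node or walk().
        Visitor methodNameCheck = (n, out) -> {
            if (n.type.equals("METHOD") && Character.isUpperCase(n.text.charAt(0))) {
                out.add("method name '" + n.text + "' should start in lower case");
            }
        };
        Visitor shortVariableCheck = (n, out) -> {
            if (n.type.equals("VARIABLE") && n.text.length() < 2) {
                out.add("variable name '" + n.text + "' is too short");
            }
        };

        List<String> violations = new ArrayList<>();
        walk(tree, List.of(methodNameCheck, shortVariableCheck), violations);
        return violations;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

Neither visitor knows the other exists, which is the independence the pattern is meant to provide.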

In the simplest version of the Visitor pattern, every visitor class defines one method: visit(). However, Checkstyle implements a slightly more complex version. For every visitor, the following methods are called during the traversal of the AST:

  • beginTree(): This method is called before any part of the tree is visited, to allow the Visitor to run some initialization code.
  • visitToken(): The normal visit method. This is where the main check is likely performed.
  • leaveToken(): When the entire subtree below a node has been visited and the traversal is backtracking through the tree, the leaveToken() method is called for that node.
  • finishTree(): This method is called when the entire tree has been traversed.
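To make the order of these four callbacks concrete, here is a small sketch. It reuses the method names from the list above, but the TreeVisitor interface, the Node class and the traversal code are assumptions made for illustration; Checkstyle's actual signatures differ. The example check uses visitToken() and leaveToken() as a push/pop pair to track how deeply IF nodes are nested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical miniature of the four callbacks; not Checkstyle's real API.
class LifecycleSketch {
    static final class Node {
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String type, Node... kids) {
            this.type = type;
            Collections.addAll(children, kids);
        }
    }

    /** Callbacks named after the ones listed above; all default to no-ops. */
    interface TreeVisitor {
        default void beginTree(Node root) { }
        default void visitToken(Node node) { }
        default void leaveToken(Node node) { }
        default void finishTree(Node root) { }
    }

    /** Tracks the maximum nesting depth of IF nodes. */
    static final class NestedIfDepth implements TreeVisitor {
        private int depth;
        private int maxDepth;
        final List<String> log = new ArrayList<>();

        @Override public void beginTree(Node root) {
            depth = 0;       // reset state so the visitor can be reused per tree
            maxDepth = 0;
            log.add("begin");
        }

        @Override public void visitToken(Node node) {
            if (node.type.equals("IF")) {
                depth++;
                maxDepth = Math.max(maxDepth, depth);
            }
        }

        @Override public void leaveToken(Node node) {
            if (node.type.equals("IF")) {
                depth--;     // subtree is done, the traversal is backtracking
            }
        }

        @Override public void finishTree(Node root) {
            log.add("max if depth = " + maxDepth);
        }
    }

    static void traverse(Node node, TreeVisitor visitor) {
        visitor.visitToken(node);
        for (Node child : node.children) {
            traverse(child, visitor);
        }
        visitor.leaveToken(node);
    }

    static void run(Node root, TreeVisitor visitor) {
        visitor.beginTree(root);
        traverse(root, visitor);
        visitor.finishTree(root);
    }

    static List<String> runDemo() {
        Node root = new Node("METHOD",
            new Node("IF", new Node("IF")),
            new Node("IF"));
        NestedIfDepth check = new NestedIfDepth();
        run(root, check);
        return check.log;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

Without leaveToken(), the check could not tell a sibling IF from a nested one, which is why a single visit() method is not enough for checks like this.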

Using the Visitor pattern allows Checkstyle to remain configurable, maintainable and adaptable, while keeping different check modules well encapsulated and independent.

Container View

Checkstyle does not truly have containers to speak of. It consists of a single monolith that can only be deployed on its own. However, it is connected to external components. Checkstyle takes in Java files from other projects to check, and it can be called by the outside world, either directly or through a plugin. Because of its monolithic architecture, we have not created a diagram for the container view.

Component and Connector View

Internally, the project consists of two separate components. There is of course the main application that checks the code style. This main application consists of multiple parts, which are discussed in the Development View below. However, there is one more separate component. It's possible that even a developer who has used Checkstyle for years is not aware of its existence. We're talking about the Checkstyle GUI. The GUI can display the abstract syntax tree generated by Checkstyle. The application does not support any actual code checks yet; it is mostly useful for people contributing to Checkstyle, or developers building their own checks. To generate the abstract syntax tree, both the GUI and the main application call the JavaParser class, which turns given Java code into an AST.

So how are these main components connected? The connections between the components discussed above are pretty straightforward. The Java files to check are read by Checkstyle itself: both the GUI and the main process read them, as do the external plugins. All other connections are calls. The main process and the GUI both call the JavaParser, and external plugins call the main process.

An overview of the components and connectors can be seen in the figure below.

Figure: The Checkstyle Component and Connector view.

Development View

While the GUI can be a useful tool, the real bread and butter of Checkstyle is its main component. This component contains the main functionality of the Checkstyle project, and is designed with extensibility in mind: developers should be able to easily write their own code style checks. For the development view, we focus on the parts of the codebase that are useful for this process.

The primary class is the Checker class. This class holds a list of FileSetCheck instances. FileSetCheck is the common basis for all file-level code style checks (strictly speaking it is an interface, and concrete checks usually extend its abstract implementation, AbstractFileSetCheck). It defines a method called process, which processes a code file using the textual content of that file. Any check on code files can be implemented as a FileSetCheck to facilitate easy integration. Examples of built-in checks that use this pattern are the check for maximum line lengths and the check for indentation using tabs or spaces.
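The relationship between the Checker and its text-level checks can be sketched as follows. This is a simplified model under our own assumptions: the FileCheck interface, its process signature and the two example checks are hypothetical and only mirror the roles described above, not Checkstyle's real types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the Checker / file-level check relationship.
class FileCheckSketch {
    /** A text-level check: it only sees the file name and the raw lines. */
    interface FileCheck {
        void process(String fileName, List<String> lines, List<String> violations);
    }

    static final class LineLengthCheck implements FileCheck {
        private final int max;

        LineLengthCheck(int max) {
            this.max = max;
        }

        @Override
        public void process(String fileName, List<String> lines, List<String> violations) {
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).length() > max) {
                    violations.add(fileName + ":" + (i + 1) + ": line longer than " + max);
                }
            }
        }
    }

    static final class NoTabCheck implements FileCheck {
        @Override
        public void process(String fileName, List<String> lines, List<String> violations) {
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).indexOf('\t') >= 0) {
                    violations.add(fileName + ":" + (i + 1) + ": tab character found");
                }
            }
        }
    }

    /** Holds the configured checks and hands every file to each of them in turn. */
    static final class Checker {
        private final List<FileCheck> checks = new ArrayList<>();

        Checker add(FileCheck check) {
            checks.add(check);
            return this;
        }

        List<String> process(String fileName, List<String> lines) {
            List<String> violations = new ArrayList<>();
            for (FileCheck check : checks) {
                check.process(fileName, lines, violations);
            }
            return violations;
        }
    }

    static List<String> runDemo() {
        Checker checker = new Checker()
            .add(new LineLengthCheck(20))
            .add(new NoTabCheck());
        List<String> lines = List.of(
            "class A {",
            "\tint x;",
            "  int aVeryLongFieldNameIndeed;",
            "}");
        return checker.process("A.java", lines);
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

Because the Checker only depends on the FileCheck contract, a new text-level check is one more class and one more add() call.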

Some checks require more information about the code; they have to operate on the syntax tree of the program instead of the textual representation. Examples of these are checks on the cyclomatic complexity or unused imports in the code. To ease the creation of such checks, Checkstyle provides a TreeWalker class, a special kind of FileSetCheck. As the name suggests, this class takes care of "walking" over the abstract syntax tree, which is created with the JavaParser class. This is also where the visitor pattern is applied. For every node in the syntax tree, the TreeWalker can execute some checks. These checks all inherit from the AbstractCheck class, which also serves as the abstract class for the visitor pattern described before.
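The layering described here, a tree walker that is itself a file-level check and that owns a set of AST checks, can be sketched in miniature. Everything below is a hypothetical model: the toy parse method simply creates one node per non-blank line, and the FileCheck and AstCheck types are our own stand-ins rather than Checkstyle's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature: a tree walker that is itself a file-level check.
class TreeWalkerSketch {
    static final class Node {
        final String type;
        final String text;
        final int line;
        final List<Node> children = new ArrayList<>();

        Node(String type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }
    }

    /** File-level check contract (stand-in for the role FileSetCheck plays). */
    interface FileCheck {
        void process(List<String> lines, List<String> out);
    }

    /** Base class for AST checks; subclasses override only what they need. */
    abstract static class AstCheck {
        void beginTree(Node root) { }
        void visitToken(Node node, List<String> out) { }
        void finishTree(Node root, List<String> out) { }
    }

    /** Toy parser (an assumption): one child node per non-blank line. */
    static Node parse(List<String> lines) {
        Node root = new Node("FILE", "", 0);
        for (int i = 0; i < lines.size(); i++) {
            String s = lines.get(i).trim();
            if (s.isEmpty()) {
                continue;
            }
            String type = s.startsWith("import ") ? "IMPORT" : "STATEMENT";
            root.children.add(new Node(type, s, i + 1));
        }
        return root;
    }

    static final class TreeWalker implements FileCheck {
        private final List<AstCheck> checks = new ArrayList<>();

        TreeWalker add(AstCheck check) {
            checks.add(check);
            return this;
        }

        @Override
        public void process(List<String> lines, List<String> out) {
            Node root = parse(lines);   // parsed once, shared by every AST check
            for (AstCheck c : checks) c.beginTree(root);
            walk(root, out);
            for (AstCheck c : checks) c.finishTree(root, out);
        }

        private void walk(Node node, List<String> out) {
            for (AstCheck c : checks) c.visitToken(node, out);
            for (Node child : node.children) walk(child, out);
        }
    }

    static final class StarImportCheck extends AstCheck {
        @Override void visitToken(Node node, List<String> out) {
            if (node.type.equals("IMPORT") && node.text.endsWith(".*;")) {
                out.add(node.line + ": avoid star import");
            }
        }
    }

    /** Naive substring matching; a real check would inspect identifiers. */
    static final class UnusedImportCheck extends AstCheck {
        private final Map<String, Integer> unused = new LinkedHashMap<>();

        @Override void beginTree(Node root) {
            unused.clear();
        }

        @Override void visitToken(Node node, List<String> out) {
            if (node.type.equals("IMPORT")) {
                String path = node.text.substring("import ".length()).replace(";", "").trim();
                String simple = path.substring(path.lastIndexOf('.') + 1);
                if (!simple.equals("*")) unused.put(simple, node.line);
            } else if (node.type.equals("STATEMENT")) {
                unused.keySet().removeIf(node.text::contains);
            }
        }

        @Override void finishTree(Node root, List<String> out) {
            unused.forEach((name, line) -> out.add(line + ": unused import " + name));
        }
    }

    static List<String> runDemo() {
        List<String> out = new ArrayList<>();
        FileCheck walker = new TreeWalker().add(new UnusedImportCheck()).add(new StarImportCheck());
        walker.process(List.of("import java.util.List;", "import java.util.Map;",
            "import java.io.*;", "List<String> names;"), out);
        return out;
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

From the outside the walker looks like any other file-level check, while the AST checks inside it share a single parse and a single traversal.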

By building on one of the two extension points, FileSetCheck for file-based checks and AbstractCheck for AST checks, one can quite easily implement new code style checks according to their own requirements. This architectural design greatly increases the extensibility of Checkstyle, making the tool much more versatile.

Checkstyle also offers several utility packages that may be used in checks or are used in (parts of) the visitor pattern. An example of this is the TokenUtil class, which provides tools that can be used by both the GUI and checks to get information out of an AST node. These utility classes generally have simple functions that are used in all parts of the system.

An overview of the development view can be seen in the figure below.

Figure: The Checkstyle Development view.

Runtime View

Checkstyle’s primary software architecture is based on the visitor pattern. This pattern is at the core of the runtime process of Checkstyle. The visitor pattern allows checks to be run independently, without getting in each other’s way. It also makes checks on the abstract syntax tree efficient, as the source code does not have to be parsed again for every check. More details about the visitor pattern were discussed earlier.

Key Quality Attributes

Extensibility is a core attribute that shines through in the software architecture of Checkstyle. Developers of Checkstyle and even developers of other systems that use it should be allowed to extend Checkstyle, and add their own checks. Through the visitor pattern, they can be assured that any extension they make will not create issues in other checks. This is also noticeable in the development view. Since checks are in a separate package and all extend from the same abstract class, it is easy to add more of them.

A lot of Checkstyle’s decisions revolve around the reliability of its software. As discussed in our previous blog post, the implications of both false positives and false negatives are large. Reliability is ensured in many ways. Not only are tests written for every line of code that Checkstyle has, but Checkstyle also runs a vast collection of different tools to statically analyze its own software.

While this is all great, the cost of it is not negligible. Setting up tools, writing tests, and maintaining everything consume a lot of time, so the affordability of this approach takes a hit. To achieve its goals, Checkstyle also uses a large collection of dependencies, which in turn have dependencies of their own.

Relying on open-source libraries to a certain degree seems logical with regard to affordability, as writing and testing everything from scratch is not only difficult, but also cost-inefficient. Unfortunately, using third-party dependencies can lead to decreased efficiency, predictability, and security. By doing some time analysis on the Checkstyle code, we found that the processor spends a significant amount of time in methods from other libraries. This is not a problem on its own, but the implementation provided by a library might not be as efficient as one you write yourself, as the library’s code may not have been written with (only) your use case in mind. If you choose to write your own, you gain back more control over the system, increasing efficiency and predictability. This does however come at a cost in maintainability, as you need to maintain this new piece of code. Lastly, security is something the Checkstyle team must have considered; depending on third-party libraries adds additional ways for malicious code to end up in your software. Potential security risks in dependencies are probably never seen or reviewed by any Checkstyle developers.

Overall, it is clear that Checkstyle is making some important trade-offs, but they seem to strike a healthy balance between maintainability, affordability, and reliability. For this project, reliability is of the utmost importance, and it is acceptable to take hits in affordability and efficiency. These hits are however minimized through intelligent software design and thorough reviews of the system.

API Principles

Checkstyle stresses the importance of extensibility for the developer. The types of checks developers need for their system can be diverse. It is important that even starting developers would easily be able to make sense of the API. To ensure this, Checkstyle documents the check creation logic quite well on a designated page [3].

Throughout Checkstyle’s software architecture there is one thing that jumps out the most: extensibility. In the development view we have seen that Checkstyle allows for easy creation of new checks by making sure that checks are not connected to each other. This is also visible in runtime through use of the visitor pattern, which assures that checks are run on each node of the abstract syntax tree, and makes sure that these checks do not interfere with each other.

References


  1. https://en.wikipedia.org/wiki/Visitor_pattern

  2. https://en.wikipedia.org/wiki/Abstract_syntax_tree

  3. https://checkstyle.org/writingchecks.html