DESOSA 2022

Essay 3

The degree to which key quality attributes are satisfied

The key quality attributes that make Ghidra unique are its extensions (such as plugins), the Ghidra Server for multi-user collaboration, and its GUI.

  • Extensions: Ghidra is highly extensible, and a large portion of it is composed of plugins. A large share of recent changes relates to developing and fixing extensions. Because Ghidra can be extended according to one’s personal needs, it becomes a multifunctional platform that can be used in many different contexts.

  • Ghidra Server: Ghidra started as a highly collaborative Software Reverse Engineering (SRE) tool for use inside the NSA. Multi-user collaboration through the Ghidra Server is a mature feature that has had no major changes in recent versions. It allows larger teams to cooperate when using reverse engineering to inspect malware.

  • GUI: The Ghidra GUI is the default way of using Ghidra. It is easy to use, even for beginners, which allows software engineers new to the field to use the tool to get accustomed to reverse engineering. There are also various extensive guides that help one get to know the system, ranging from ones aimed at beginners to ones aimed at more expert users.

The most traditional way of using Ghidra: via the GUI

Overall software quality process

Ghidra follows a conventional code quality control process, described in CONTRIBUTING.md in the Ghidra repository. Its principles include making minimal reasonable changes; not accepting repackaging, renaming, and other refactorings; ensuring complete testing; avoiding global replacements for code style; and focusing on fixing bugs found in actual use and testing.

Before a Pull Request (PR) is merged, reviewers (who can be anyone) look for potential problems such as threading issues, performance impact, API design, duplication of existing features, readability and code style, and bloat (scope creep). The reviewers can ask the contributor to make changes if the PR does not meet the quality control process. Once the changes are made and the reviewers are satisfied, someone with merge privileges (not everyone has them) merges the PR into main.

Testing, and its role in contributing

Part of the motivation behind making a tool such as Ghidra open-source is that it allows anyone to attempt to make a meaningful contribution. Ghidra encourages this and provides contributors with some tips to streamline the process of making a contribution.

Testing is an important part of making sure that your contribution is worthwhile. Ghidra uses Gradle to streamline the testing process. Its tests fall mainly into two types: unit tests and integration tests. Together they show both how individual modules work and how the different modules function when working together.

To run both unit and integration tests and to generate a report, one can use the following command:

gradle combinedTestReport

To make a useful contribution, Ghidra suggests focusing mainly on bug fixes discovered through real-world usage, i.e. problems one has run into while using the tool. Moreover, new code should be well tested and must not break any of the existing tests run by Gradle. Since the project is hosted on GitHub, once someone submits a pull request with a suggested change, addition, or bug fix, reviewers can ask questions and propose changes. Only authorized users (people working on Ghidra at the NSA) can then accept the request and merge it into the main branch.

Continuous Integration

Ghidra is a large and complex system with many dependencies. Its build process is not straightforward and requires dependencies that must be downloaded manually. As of right now, Ghidra’s GitHub repository does not include any continuous integration pipeline.

However, there have been initiatives from contributors to automate the build process. The idea is that by publishing development builds, developers would be able to improve the linting of their code, and end users could try out the system without much hassle. The Ghidra team, however, has stated that automating the build process is not something they are currently working towards.

This does not, however, mean that it could not be added in the future. Platforms such as Qodana monitor code quality, integrate into CI/CD pipelines on GitHub, and could help keep track of the quality of a large system such as Ghidra.

One possible way to streamline the development process would be to adopt an external Continuous Integration and Continuous Delivery (CI/CD) platform such as CircleCI. Such a platform automates building, testing, and deployment, and can release new versions of the code continuously.

CI/CD process

Hot Components

To get an idea of where the main activity currently lies in the Ghidra project, we analyzed the number of commits in different directories of the code base over the past month.

Directory | # of Commits (past month)
Debug | 19
Extensions | 0
Features | 67
Processors | 16
RuntimeScripts | 4
Test | 5
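To quantify how dominant the Features directory is, the share of each directory in the 111 commits counted above can be computed directly. This is a minimal sketch; the class and method names are hypothetical, and the counts are simply the ones from the table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Percentage of last month's commits per directory (hypothetical helper, table counts). */
public class CommitShares {

    /** Returns each directory's share of all commits, in percent, keeping input order. */
    static Map<String, Double> sharesPercent(Map<String, Integer> commits) {
        int total = commits.values().stream().mapToInt(Integer::intValue).sum();
        if (total == 0) {
            throw new IllegalArgumentException("no commits to compare");
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        commits.forEach((dir, n) -> shares.put(dir, 100.0 * n / total));
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Integer> commits = new LinkedHashMap<>();
        commits.put("Debug", 19);
        commits.put("Extensions", 0);
        commits.put("Features", 67);
        commits.put("Processors", 16);
        commits.put("RuntimeScripts", 4);
        commits.put("Test", 5);
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        sharesPercent(commits).forEach((dir, pct) ->
                System.out.printf(Locale.ROOT, "%-15s %5.1f%%%n", dir, pct));
        // Features: 67 of 111 commits, printed as 60.4%
    }
}
```

With these counts, Features accounts for roughly 60% of the commits and Debug for roughly 17%.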

The table clearly shows that most current activity lies in the Features directory. An important component in this directory, accounting for roughly 30 percent of its commits, is Ghidra’s decompiler together with Sleigh. Sleigh is a processor specification language designed for Ghidra, and it provides support for many different processors. Sleigh specifications describe how each processor’s machine code translates into p-code, a universal, processor-independent intermediate representation. This allows developers to write algorithms once for all processors without having to worry about the specifics of each individual model.
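The benefit of a shared intermediate representation can be shown with a toy sketch in plain Java. It is not Ghidra’s API: the `Op` record, the two lifter methods, and their instruction syntax are hypothetical, and the only thing borrowed from real p-code is the opcode name `INT_ADD`. Two differently shaped "add" instructions are lifted into the same IR, and an analysis written once against the IR works for both.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration of a shared IR (NOT Ghidra's API). */
public class PcodeSketch {

    /** One IR operation: output = opcode(inputs). INT_ADD is borrowed from p-code naming. */
    record Op(String opcode, String output, List<String> inputs) {}

    private static String[] tokens(String insn) {
        return insn.replace(",", " ").trim().split("\\s+");
    }

    /** Hypothetical lifter for a two-operand "add dst, src" (dst = dst + src), x86 style. */
    static List<Op> liftTwoOperand(String insn) {
        String[] t = tokens(insn);
        if (t.length != 3 || !t[0].equalsIgnoreCase("add")) {
            throw new IllegalArgumentException("unsupported: " + insn);
        }
        return List.of(new Op("INT_ADD", t[1], List.of(t[1], t[2])));
    }

    /** Hypothetical lifter for a three-operand "add rd, rs1, rs2", RISC style. */
    static List<Op> liftThreeOperand(String insn) {
        String[] t = tokens(insn);
        if (t.length != 4 || !t[0].equalsIgnoreCase("add")) {
            throw new IllegalArgumentException("unsupported: " + insn);
        }
        return List.of(new Op("INT_ADD", t[1], List.of(t[2], t[3])));
    }

    /** Processor-independent analysis, written once: which locations does the code write? */
    static Set<String> writtenLocations(List<Op> ops) {
        Set<String> written = new LinkedHashSet<>();
        for (Op op : ops) {
            written.add(op.output());
        }
        return written;
    }

    public static void main(String[] args) {
        List<Op> program = new ArrayList<>();
        program.addAll(liftTwoOperand("add eax, ebx"));
        program.addAll(liftThreeOperand("add a0, a1, a2"));
        program.forEach(System.out::println);
        System.out.println(writtenLocations(program)); // prints [eax, a0]
    }
}
```

Adding a third processor in this model only requires a new lifter; `writtenLocations` stays unchanged, which is the property the paragraph attributes to p-code.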

Another directory with a moderate amount of activity is Debug. Ghidra’s Debugger is a relatively new feature, released within the past year, that brings the debugging of running programs into Ghidra. It can use the GNU Debugger (GDB) as a back end, and GDB can also be used on Ghidra’s own native components; for example, the decompiler can be debugged with a single command:

gdb path_to_repo/Ghidra/Features/Decompiler/build/os/linux64/decompile

The code quality, with a focus on hotspot components

Code quality matters for understandability, ease of maintenance, and low-cost extension of software products. To analyze the code quality of Ghidra, several aspects should be taken into consideration:

  • Readability: In general, Ghidra’s code is highly understandable, with clear naming conventions and proper modularity that keep it simple.

  • Extensibility: Ghidra is highly extensible, since it is plugin-based by design. Users can extend Ghidra’s functionality in various ways, starting from the Ghidra plugin skeleton.

  • Maintainability: Ghidra is a well-structured project, with a clear folder and package structure, a good naming scheme, and use of inheritance.

  • Documentation: Ghidra has rather detailed documentation and a user manual that align well with the source code, making life easier for both users and developers.
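The plugin-based design mentioned under Extensibility can be illustrated with a generic sketch. It is deliberately not Ghidra’s plugin API (Ghidra’s actual plugin skeleton builds on its own framework classes, which are not shown here); the `Plugin` interface, the `Tool` class, and both example plugins are hypothetical. The point is that the host depends only on an interface, so new functionality arrives as new classes rather than as edits to the host.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic, hypothetical plugin host; not Ghidra's extension API. */
public class PluginHostSketch {

    /** Contract every plugin implements; the host depends only on this. */
    interface Plugin {
        String name();
        String handle(String programName);
    }

    /** Minimal host that registers plugins and runs each one over a program. */
    static final class Tool {
        private final List<Plugin> plugins = new ArrayList<>();

        void addPlugin(Plugin plugin) {
            plugins.add(plugin);
        }

        List<String> runAll(String programName) {
            List<String> results = new ArrayList<>();
            for (Plugin p : plugins) {
                results.add(p.name() + ": " + p.handle(programName));
            }
            return results;
        }
    }

    /** Example third-party plugin (hypothetical). */
    static final class StringCounterPlugin implements Plugin {
        public String name() { return "StringCounter"; }
        public String handle(String programName) { return "scanned " + programName; }
    }

    public static void main(String[] args) {
        Tool tool = new Tool();
        tool.addPlugin(new StringCounterPlugin());
        // Another plugin added without modifying Tool at all.
        tool.addPlugin(new Plugin() {
            public String name() { return "Banner"; }
            public String handle(String programName) { return "hello " + programName; }
        });
        tool.runAll("malware.bin").forEach(System.out::println);
    }
}
```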

Nevertheless, code quality issues do exist in the source code; the remainder of this section focuses on them.

Code quality can be divided into static code quality and runtime code quality; here we focus on the components related to the key quality attributes (plugins, GUI, and server). As for runtime quality, Ghidra’s developers have worked hard over the past two decades to make sure it works with full functionality and good reliability. Therefore, this analysis focuses on static code quality. We used Codacy to find code quality issues. Some typical issues, their locations, and Codacy’s comments are listed below:

Source File | Line(s) | Comment
Ghidra/Debug/…/gui/memview/MemviewMap.java | 21 | Avoid unused private fields.
Ditto | 41 | Avoid reassigning parameters such as ‘offset’. Reassigning parameters makes the code more difficult to understand: code is often read with the assumption that parameter values don’t change, so an assignment violates the principle of least astonishment.
Ghidra/Debug/…/gui/target/DebuggerTargetsPluginScreenShots.java | 41 | Document empty method body. By explicitly commenting empty method bodies, it is easier to distinguish between intentional (commented) and unintentional empty methods.
Ditto | 45 | Fields should be declared at the top of the class, before any method declarations, constructors, initializers or inner classes.
Ditto | 97 | JUnit tests should include at least one assertion. This makes the tests more robust, and using assertions with messages gives the developer a clearer idea of what the test does.
Ghidra/Features/…/gui/provider/matchtable/NumberRangeSubFilterChecker.java | 74 | Avoid unnecessary if-then-else statements when returning a boolean; the result of the conditional test can be returned instead.
Ghidra/Debug/…/plugin/core/debug/gui/time/DebuggerTimeProvider.java | 136 | Avoid unused method parameters.
Ghidra/Features/…/gui/provider/VTFunctionReferenceCorrelator_ELF_Test.java | 27 | The class name doesn’t match ‘[A-Z][a-zA-Z0-9]*’.
Ghidra/Processors/…/plugin/core/analysis/MipsPreAnalyzerTest.java | 87 | assertTrue(!expr) can be replaced by assertFalse(expr); avoid negation in an assertTrue or assertFalse test.
Ghidra/Test/…/screenshot/ComputeChecksumsPluginScreenShots.java | 26 | Avoid unnecessary constructors: when there is only one constructor and it is identical to the default constructor, the compiler generates it for you.
Ghidra/RuntimeScripts/Common/server/jaas_external_program.example.sh | 31 | Variable is unused.
Ghidra/Features/…/server/remote/GhidraServer.java | 208 | Possible unsafe assignment to a non-final static field in a constructor.
Ditto | 262-381 | getRepositoryServer() method is too long.
Ditto | 514-833 | main() method is too long.
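Several of the Codacy findings above have mechanical fixes. The sketch below shows before/after versions in plain Java. The method names `toAbsolute`, `inRange`, and `onDispose` are hypothetical and not taken from the Ghidra files listed, and `assertFalse` is a small stand-in for JUnit 4’s `assertFalse(String, boolean)` so that the example runs without JUnit on the classpath.

```java
/** Hypothetical before/after fixes for several Codacy findings; not Ghidra code. */
public class CodacyFixesSketch {

    // Finding: "Avoid reassigning parameters such as 'offset'".
    // Before: offset = offset + base; return offset;
    // After: leave the parameter unchanged and compute into a new local.
    static long toAbsolute(long offset, long base) {
        long absolute = offset + base;
        return absolute;
    }

    // Finding: "Avoid unnecessary if-then-else statements when returning a boolean".
    // Before: if (value >= min && value <= max) { return true; } else { return false; }
    // After: return the condition itself.
    static boolean inRange(long value, long min, long max) {
        return value >= min && value <= max;
    }

    // Finding: "Document empty method body": the comment marks the emptiness as intentional.
    void onDispose() {
        // Intentionally empty: this sketch holds no resources to release.
    }

    // Stand-in for JUnit 4's assertFalse(String message, boolean condition).
    static void assertFalse(String message, boolean condition) {
        if (condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute(0x10, 0x1000)); // prints 4112
        new CodacyFixesSketch().onDispose();
        // Instead of assertTrue(!inRange(...)), state the expectation directly, with a message.
        assertFalse("11 should be outside [1, 10]", inRange(11, 1, 10));
        System.out.println("all checks passed");
    }
}
```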

The quality culture, as evidenced in actual discussions and tests in architecturally significant feature requests and pull requests

  • Quick response: the large majority of pull requests get a response from Ghidra contributors within 2 days.
  • Cautious acceptance: although minor or relatively trivial pull requests are approved and merged quickly, Ghidra contributors are cautious about major or low-level changes. It is common for PRs to remain unmerged for months because the work is not yet complete.
  • Open to long-term changes and a long development cycle: many pull requests take months to be finalized. However, Ghidra contributors are willing to keep track of the progress and discuss any recent improvements.

We analyzed 10 thoroughly discussed pull requests that were eventually merged by Ghidra contributors, looking at the scale of the work, the time until merge, and the number of comments. The results are as follows:

Pull Request | Lines Changed | Time Before Merge (days) | Comments
RISC-V processor | 9,735 | 111 | 19
Align virtualSize while importing PE files | 8 | 52 | 22
Simplified types.h | 211 | 291 | 14
Javadoc Fixes | 5,658 | 6 | 68
TriCore processor module | 17,907 | 89 | 55
fix many memory leaks/string handling issues. Still a few leaks left | 1,473 | 36 | 100
Add V850E2M Architecture | 2,734 | 105 | 167
CP1600-series processor support | 726 | 204 | 75
Improves TypeDescriptorModel performance | 169 | 78 | 29
Build PDB.exe using MSBuild instead of devenv | 4 | 41 | 23

The comparatively long time to merge, together with the sometimes very large changes, shows that Ghidra is not developed in a highly agile way.

An assessment of the technical debt present in the system

Technical debt is the cost that arises from making poor decisions in order to move quickly in the short term, instead of decisions that need careful consideration and, as a result, more time.

Ghidra is a system that has existed for more than 20 years. If feature development and code maintenance are not handled properly at an early stage, “debt” easily accumulates during later development. Later development then pays for the wrong decisions made earlier, for example by writing more bad code to contain the negative effects of existing bad code. At the same time, building new features on old code makes things worse, until the system becomes unmaintainable and requires a complete refactoring. Fortunately, Ghidra has not undergone such a refactoring since it was officially released as an open-source project.

To analyze the existence and impact of Ghidra’s technical debt, we measure its Technical Debt Ratio, calculated as (Remediation Cost / Development Cost) x 100%. Since the time spent developing code cannot be measured, Remediation Cost and Development Cost are both approximated by the number of lines of code changed. This section analyzes the latest version of Ghidra, v10.1, and divides the pull requests that contributed to it into two categories: Remediation and Development. The results are as follows:

Type | Number of PRs | Cost (lines changed)
Development | 4 | 958
Remediation | 22 | 2,810
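As a worked check, the sketch below applies the formula exactly as defined above, (Remediation Cost / Development Cost) x 100%, to the v10.1 figures from the table, with costs approximated by lines changed as in the text. The class and method names are illustrative only.

```java
import java.util.Locale;

/** Worked Technical Debt Ratio calculation for the v10.1 figures (illustrative names). */
public class TechnicalDebtRatio {

    /** TDR = (remediation cost / development cost) x 100%. */
    static double ratioPercent(double remediationCost, double developmentCost) {
        if (developmentCost <= 0) {
            throw new IllegalArgumentException("development cost must be positive");
        }
        return remediationCost / developmentCost * 100.0;
    }

    public static void main(String[] args) {
        double remediation = 2_810; // lines changed in remediation PRs (v10.1)
        double development = 958;   // lines changed in development PRs (v10.1)
        // Locale.ROOT keeps '.' as the decimal separator.
        System.out.printf(Locale.ROOT, "TDR = %.1f%%%n",
                ratioPercent(remediation, development)); // prints "TDR = 293.3%"
    }
}
```

Because remediation changed almost three times as many lines as development, the ratio is well above 100%.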

It can be seen from the data that the Technical Debt Ratio of Ghidra v10.1 is (2,810 / 958) x 100% ≈ 293.3%, far above the recommended 5% threshold for software systems. This is understandable for a mature SRE tool such as Ghidra: after more than 20 years of development its functionality has matured, and at this late stage of the software development life cycle the cost of maintaining the code exceeds the cost of developing new functionality.

Ghidra
Authors
Hakan Ilbas
Yingkai Song
Lola Dekhuijzen
Johannes Ijpma