Wireshark - Quality and Evolution

Introduction

Having seen Wireshark’s product vision and architecture, this essay will focus on the quality of the application and its evolution.

Key quality attributes

When it comes to quality and evolution, the qualities of Wireshark that stand out are compatibility, consistency, modularity, and robustness. Cross-platform compatibility and portability are ensured in the continuous integration/continuous deployment (CI/CD) pipeline, as it runs workflows on the three most popular OS architectures: Windows, macOs and Linux (Ubuntu). Furthermore, to achieve consistency and clarity, the developer README1 provides guidelines on syntax (C11 standard is required), naming and whitespace conventions, as well as how to handle compiler warnings. In addition, modularity and scalability are maintained by providing an API implementing the core functionality of Wireshark that is used across multiple components. Lastly, Wireshark also puts emphasis on robustness and provides corresponding guidelines, due to the fact that communication protocols could be buggy, especially in their early stages of development or when new versions of existing protocols are released.

Overall software quality processes

In order to keep quality consistent and to avoid as many bugs as possible, Wireshark’s developer guides has a chapter dedicated to adding contributions2. Within this chapter, there are some guidelines which can be viewed as having the role of maintaining software quality.

For instance, section 3.10.3 highlights how to make a good patch, including suggestions such as: use the latest sources, inspect the patch carefully, put changes regarding different components in different merge requests, and preview the changes by running the CI pipelines.

On the other hand, section 3.10.5 focuses on code requirements. Together with the developer README, which can be found on Gitlab1 and which describes the coding style used, this section ensures that code written by different developers adheres to the same style and that it is “high-quality, portable, and maintainable”3. Additionally, this section mentions fuzz testing as a valuable testing technique, and points to further resources which can help facilitate testing the added changes in this manner.

Testing and code quality ensure that, regardless of the user’s operating system or configuration, the Wireshark executable is correctly generated and ready to be used.

Continuous integration processes

Wireshark is a large single application that users directly install onto their devices. Therefore, integration processes mainly take place in one location. Since the users directly interact with the application and network data is accessed and stored, is it important for these integration processes to happen properly.

When a developer wants to add new features to the software repository, a forking workflow structure from Gitlab is used by Wireshark 4. Any developer can work on a locally forked repository and pull from the main repository. The fork must be kept public at any time 4.

When code changes are made, CMake is used to build the application, and GCC is used to compile Wireshark. Unit tests are written in Python and the Pytest framework is used to conduct these tests 5. Before the changes can be integrated into the Wireshark codebase, the following actions must be performed by the developer. Wireshark provides a Wireshark-specific pre-commit hook that automatically checks for errors in the added code 4. It mainly checks for whitespace errors and performs formatting checks. A compatible license must be used by the incoming code. Wireshark is released under a GPL license of version 2 or above 4. It is highly recommended to fuzzy test the changes made to Wireshark. This is done by feeding randomly changed capture files to TShark, the command-line instance of Wireshark. This process is described in detail on the Wireshark wiki 6.

After code changes are pushed, they are thoroughly reviewed by trusted Wireshark developers and if approved, the code is merged into the codebase. Developers can immediately pull these changes from the codebase. Most users will likely wait until a new major release of Wireshark becomes available, here multiple changes are combined 7. New changes must continuously be tested, for this, test processes are in place.

Test processes

In order to be able to include newly added code in the Wireshark application, testing is a very important and necessary step. Testing suite is mainly written in Python, making use of Pytest. The choice is argumented in the developer’s guide5, with the main points being Pytest’s support for fixtures (the context in which the test executes, including the environment and the data8).

Wireshark mainly operates on packets directly captured from the network or on packets found in imported capture files. These could include packets that are not formatted properly, which could be due to errors caused by transmission or an attacker. The goal of fuzz testing is to ensure that Wireshark is robust enough to withstand these kinds of packets without crashing. This is accomplished by randomly changing the data inside the packets and running the tests.

Hotspot components

We have used CodeScene9 to identify hotspots in Wireshark’s codebase during the past 4 years of development and found several files changing regularly. At component level, Wireshark is composed of a core, capture engine, wiretap (for storing packets), GUI, and the Epan, which contains all the dissectors used to analyze packets for a wide range of protocols. In terms of hotspots, they are present at each component level in the project.

The largest hotspots can be found in the Epan: packet-ieee80211.c which is the dissector for the IEEE 802.1110 standard frames at the physical and data-link layer and has 42339 lines of code (Loc) and ~97 commits / 1 year, packet-nvme.c (NVMe11 packet dissector) has 8061 LoC and ~59 commits / 1 year, while packet-woww.c (World of Warcraft World12 packet dissector) has 4932 LoC and ~61 commits / 1 year. Furthermore, proto.c, containing the functionality for supporting multiple protocols is another significant hotspot in the Epan component with 9248 LoC and 54 commits / 1 year. In wiretap, pcapng.c (responsible for supporting pcap files to store packets) is the main hotspot with 4124 LoC and ~60 commits/year. Furthermore, the hotspots in the GUI component mostly entail changes in TypeScript files that offer translations to a variety of languages. Aside from the GUI files, most of the aforementioned hotspots triggered CodeScene to issue a low code health score due to their increase in complexity over time.

Figure: Wireshark hotspots identified by CodeScene

In the future, it is expected that proto.c will remain a hotspot, constantly undergoing changes as new dissectors and protocols are added to Wireshark. Similarly, packet-ieee80211.c will change every time the Wifi Standard is updated, currently IEEE 802.11be (Wifi 7)10 is expected to be released in 2024. In addition, the dissectors will undergo changes as their corresponding protocols are updated over time to support new features and performance improvements.

Figure: File metrics computed by CodeScene for packet-ieee80211.c

Code quality

It was unexpected to see files of several thousands lines of code, the longest one (packet-lte-rrc.c), having 136380 LoC at the time of writing. While packet-lte-rrc.c is the largest file in the project, it was not marked with low code health or as a hotspot. To assess the code quality of the hotspots, we used the SonarLint13 plugin for CLion14, and found several warnings concerning the Cyclomatic Complexity (CC), the highest being 100, C pointer casts, and some minor warnings such as redundant use of parentheses, sections of commented code, and the use of ellipsis notation to declare functions with a variable number of parameters.

The large CC is difficult to mitigate, as it is caused by extensive switch statements that are necessary when parsing and dissecting packets, and the benefits of refactoring will likely not outweigh the costs. A cause of concern could be the pointer casts that change the length of a variable, potentially losing precision in numeric values, as well as casts that drop the const qualifier making a variable mutable. Furthermore, SonarLint issued bug warnings on Out of bound memory access in proto.c, within a function computing pointer arithmetic. Considering the significant size of files, and of the project in general, the static analysis warnings are not so numerous, since Wireshark provides a clear and thorough list of style rules in the developer guide. However, further investigating the pointer casts and the out of bounds memory access could lead to bug discoveries.

Quality culture

The Wireshark project is an open source project with hundreds of contributors over the years 15. To preserve code quality, every contributor must make use of the contribution workflow described by the Wireshark developers team 2. Before a contributors’ merge request (MR) is approved, Wireshark’s developers team reviews the code.

Name convention is one of the topics mentioned in the coding style documentation. MRs using wrong naming convention 16 are only approved after the naming convention is solved 17. Consistency is also an important topic for the development team of Wireshark. This can be seen from MRs having inconsistent coding style 18 19. In the two examples referenced, the inconsistency is about the placement of the curly brackets in an if-statement. This sounds like a minor thing, however, the Wireshark developers will not approve these ways of coding, to keep the code quality as high as possible. Variable type usage is also well documented in the coding style document. MRs using wrong variable types, for example: a global variable where being global is not necessary for the variable 20, will also not be merged without altering the variable type.

Not only the coding style is checked by Wireshark’s developers team, but also the commit messages have to follow the Wireshark standards 21 before being approved. Commit messages not following these standards are not being merged 22 23 24 25. The reasoning behind the strict commit messaging rules is to keep a clean commit history.

Even though an MR might be approved by one of the members of Wireshark’s development team, the Wireshark GitLab Utility also has to run successfully. The Wireshark GitLab Utility can fail for various reasons, examples are: potential zero division 26 or portability between different platforms 27.

Technical debt present

Technical debt can often occur in large software applications when a piece of software gains more and more dependencies over time. It then becomes increasingly more difficult to alter or make changes to this piece as it requires more refactoring and changes to other parts of the codebase 28. Uncompleted changes give rise to technical debt. It can be a choice to take on technical debt to increase development time 28. Like most forms of debt, the negative consequences usually take effect over a longer period of time. This is especially evident when bugs occur in the system.

In general Wireshark is a well-designed application that has a finely defined testing structure 5 which helps in dealing with technical debt. Even though the application is extensive, Wireshark has an extensive network of developers that are involved in good software practices such as code review. On Gitlab the CI/CD pipeline analytics are provided 29, where it becomes clear that with over 30000 pipelines and a success ratio of 87%, most developers adhere to the coding style and best practices as defined by Wireshark.

However, there are still instances where technical debt can occur. One example is the CMake file that is used to build the application. Making changes to the CMake file was discouraged by other Wireshark developers during the selection of our contributions to Wireshark, due to the likelihood of contributing to technical debt in the system.

Extensive testing and adding tests to the CI/CD pipeline can further reduce the technical debt. Even though good testing practices are in place 5, a software piece can never have enough testing.

Conclusion

It is, thus, clear that many structures are in place to deal with the continuous development of Wireshark. These ensure that Wireshark is able to evolve effectively and with a high standard of quality.

References


  1. Wireshark Gitlab. Retrieved March 13, 2022, from https://gitlab.com/wireshark/wireshark/-/blob/master/doc/README.developer ↩︎

  2. Wireshark developer guide. Chapter 3.10. Contribute Your Changes. Retrieved 13 March, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/ChSrcContribute.html ↩︎

  3. Wireshark developer guide. Chapter 3.10.5. Code requirements. Retrieved 13 March, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/ChSrcContribute.html#:~:text=high%2Dquality%2C%20portable%2C%20and%20maintainable ↩︎

  4. Wireshark developers guide. Chapter 3. Work with the Wireshark sources. Retrieved 19 March, 2022, from https://www.wireshark.org/docs/wsdg_html/#ChSrcGitRepository ↩︎

  5. Wireshark developer guide. Chapter 13.2. Test suite structure. Retrieved 14 March, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/ChTestsStructure.html ↩︎

  6. Wireshark Gitlab. Fuzzytest documentation. Retrieved 19 March, 2022, from https://gitlab.com/wireshark/wireshark/wikis/FuzzTesting ↩︎

  7. Wireshark developers guide. Chapter 1.5. Releases and distributions. Retrieved 19 March, 2022, from https://www.wireshark.org/docs/wsdg_html/#ChIntroReleases ↩︎

  8. Pytest documentation. About fixtures. Retrieved 14 March, 2022, from https://docs.pytest.org/en/latest/explanation/fixtures.html ↩︎

  9. CodeScene. Retrieved March 18, 2022, https://codescene.com/ ↩︎

  10. Wikipedia. IEEE 802.11. Retrieved March 18, 2022, https://en.wikipedia.org/wiki/IEEE_802.11 ↩︎

  11. Wikipedia. NVM Express. Retrieved March 18, 2022, https://en.wikipedia.org/wiki/NVM_Express ↩︎

  12. Phoronix. Wireshark 3.6 Released With Support For World of Warcraft “WOWW” Protocol, Many Others. Retrieved March 18, 2022, https://www.phoronix.com/scan.php?page=news_item&px=Wireshark-3.6-Released ↩︎

  13. SonarLint. Retrieved March 18, 2022, from https://www.sonarlint.org/ ↩︎

  14. JetBrains. CLion. Retrieved March 18, 2022, https://www.jetbrains.com/clion/promo/?source=google&medium=cpc&campaign=11960748875 ↩︎

  15. Wireshark. Developers. https://wiki.wireshark.org/Developers ↩︎

  16. Wireshark GitLab. Issue #468: extcap for etw. https://gitlab.com/wireshark/wireshark/-/merge_requests/468#note_440340547 ↩︎

  17. Wireshark GitLab. Issue #697: DRAFT: etw dissector .https://gitlab.com/wireshark/wireshark/-/merge_requests/697 ↩︎

  18. Wireshark GitLab. Issue #51: MBIM: dissect new UICC commands of MBIM extended version 1.0. https://gitlab.com/wireshark/wireshark/-/merge_requests/51#note_403508662 ↩︎

  19. Wireshark GitLab. Issue #3936: Proto: Enable fields to be forcefully marked as referenced. https://gitlab.com/wireshark/wireshark/-/merge_requests/3936#note_657752825 ↩︎

  20. Wireshark GitLab. Issue #6000: PROFINET: TimeAware Dissection and RSI FREQ block fix. https://gitlab.com/wireshark/wireshark/-/merge_requests/6000#note_819126594 ↩︎

  21. Wireshark GitLab. Writing A Good Commit Message. https://gitlab.com/wireshark/wireshark/-/wikis/Development/SubmittingPatches#writing-a-good-commit-message ↩︎

  22. Wireshark GitLab. Issue #6195: Add newly assigned option classes. https://gitlab.com/wireshark/wireshark/-/merge_requests/6195#note_838661030 ↩︎

  23. Wireshark GitLab. Issue #5165: Fix a couple of filters. https://gitlab.com/wireshark/wireshark/-/merge_requests/5165#note_739467210 ↩︎

  24. Wireshark GitLab. Issue #2863: Issue #17217 - extending support for RFC 8926. https://gitlab.com/wireshark/wireshark/-/merge_requests/2863#note_564015929 ↩︎

  25. Wireshark GitLab. Issue #1037: SFlow Improve dissection of Lag counter. https://gitlab.com/wireshark/wireshark/-/merge_requests/1037#note_454958577 ↩︎

  26. Wireshark GitLab. Issue #1222: O-RAN fronthaul UC-plane dissector. https://gitlab.com/wireshark/wireshark/-/merge_requests/1222#note_465110841 ↩︎

  27. Wireshark GitLab. Issue #4081: Separate in/out buf sizes, fix memory leak. https://gitlab.com/wireshark/wireshark/-/merge_requests/4081#note_668044670 ↩︎

  28. Atlassian. D. Radigan. Software development. Escaping the black hole of technical debt. Retrieved 19 March, 2022, from https://www.atlassian.com/agile/software-development/technical-debt ↩︎

  29. Wireshark Gitlab. CI/CD pipelines. Charts. Retrieved 19 March, 2022, from https://gitlab.com/wireshark/wireshark/-/pipelines/charts ↩︎