Wireshark - From vision to architecture

While the previous post focused on the general, top-level view of Wireshark as a product and its goals and ethical implications, this time we will be taking a deep dive into its architecture. Everything from architectural style, to the C4 model¹, and even the API will be discussed and analyzed.

Main architectural style

The Wireshark project consists of both a graphical user interface (GUI) and a command-line interface named TShark. Both interfaces present the data captured from the network. The user can define which data is shown using filters.

Wireshark runs on many operating systems and can be built with several different compilers. To keep the code portable, the project enforces a strict coding style for the Wireshark core, which is written in C11². The coding style documentation covers the use of the APIs within Wireshark (to prevent duplication of code), data type usage (to protect portability), and naming and whitespace conventions (for readability). Following the coding style as described by Wireshark helps maintain code quality. Only a select group of contributors is allowed to accept contributions to the Wireshark project. Contributions follow a forking workflow on GitLab, shown in the figure below³:

Wireshark stands out for the variety of packet dissectors it offers. Inside the Wireshark project, the packet dissectors are separated from the rest of the Wireshark core. Because of this architectural decision, developers can write their own dissector using just the dissector documentation⁴, without having to learn the rest of the Wireshark code.

Containers view

The Wireshark desktop application can be considered the main container of Wireshark. Its execution environment is the local device of the user. A second container is the command-line operation of Wireshark through TShark. From the Wireshark GitLab⁵ and the Wireshark Developer's Guide⁶, it becomes clear that Wireshark can roughly be split into the following components: Capture, Wiretap, Epan, Core, and the GUI.

Components view

The main container used in Wireshark is the application itself. The official Wireshark developer guide provides an image describing the “Wireshark function blocks”⁷ (see image below). While the developers call them function blocks, we can consider them the components view of the application.

Based on this image and the code structure, we have created the following components diagram (in standard UML 2 format):

Users mainly interact with the application through its graphical user interface. It is written using Qt, a C++ framework for building GUIs.

The core acts as the intermediary between the user interface and the other components of the application. The core’s functionality can also be accessed without the GUI, through TShark.

The Wiretap component consists of a custom library (named libwiretap) that reads and writes capture files. The library was originally developed with the intent of replacing libpcap, which was the standard library for capturing packets on Unix systems before libwiretap’s creation. However, it seems that its adoption as a capture standard was unsuccessful, and capturing is still done through libpcap.
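To make the file-handling side of such a library concrete, here is a minimal sketch that writes and reads the classic pcap file layout: a 24-byte global header followed by records that each start with a 16-byte header. It does not use libwiretap or libpcap. The helper names `write_pcap` and `read_pcap` are hypothetical, and the sketch assumes little-endian files with microsecond timestamps only.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps
# magic, version major, version minor, thiszone, sigfigs, snaplen, link type
GLOBAL_HDR = struct.Struct("<IHHiIII")
# ts_sec, ts_usec, captured length, original length
RECORD_HDR = struct.Struct("<IIII")


def write_pcap(packets, linktype=1):
    """Serialize (ts_sec, data) pairs; link type 1 is Ethernet."""
    buf = io.BytesIO()
    buf.write(GLOBAL_HDR.pack(PCAP_MAGIC, 2, 4, 0, 0, 65535, linktype))
    for ts_sec, data in packets:
        buf.write(RECORD_HDR.pack(ts_sec, 0, len(data), len(data)))
        buf.write(data)
    return buf.getvalue()


def read_pcap(raw):
    """Parse bytes produced by write_pcap back into (linktype, packets)."""
    stream = io.BytesIO(raw)
    header = GLOBAL_HDR.unpack(stream.read(GLOBAL_HDR.size))
    magic, linktype = header[0], header[6]
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian classic pcap file")
    packets = []
    while True:
        record = stream.read(RECORD_HDR.size)
        if len(record) < RECORD_HDR.size:
            break  # end of file
        ts_sec, _ts_usec, incl_len, _orig_len = RECORD_HDR.unpack(record)
        packets.append((ts_sec, stream.read(incl_len)))
    return linktype, packets


raw = write_pcap([(1, b"abc"), (2, b"de")])
linktype, packets = read_pcap(raw)
```

A real capture-file library has to handle many more formats (pcapng, byte-swapped files, nanosecond timestamps), which this sketch deliberately leaves out.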

“Capture” is the component that acts as an interface between the core functionality and Dumpcap, which is a component external to the Wireshark application that uses the libpcap library to capture the packets from the network interface.

The “Enhanced Packet ANalyzer” (Epan) is the component that provides access to two important sub-components: the dissectors which decompose the packet to allow the users to inspect its contents, and the filters which help select specific packets from among the captured ones.
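The relationship between the two sub-components can be sketched as follows: dissectors turn raw bytes into named fields, and a filter is then just a predicate over those fields. This is a toy model, not Epan's filter engine. The function `compile_filter` and its three-token grammar (`field operator integer`) are assumptions made for this example, and the packets are hand-written dictionaries rather than real dissection output.

```python
import operator
from typing import Callable, Dict

# Comparison operators supported by this toy grammar.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def compile_filter(expr: str) -> Callable[[Dict[str, int]], bool]:
    """Turn 'field op value' into a predicate over dissected fields."""
    field, op, value = expr.split()
    compare = OPS[op]
    wanted = int(value)

    def matches(fields: Dict[str, int]) -> bool:
        # A packet that lacks the field never matches.
        return field in fields and compare(fields[field], wanted)

    return matches


# Stand-ins for the output of the dissectors.
packets = [
    {"ip.ttl": 64, "tcp.port": 443},
    {"ip.ttl": 128, "udp.port": 53},
    {"ip.ttl": 64, "tcp.port": 80},
]

is_https = compile_filter("tcp.port == 443")
selected = [p for p in packets if is_https(p)]
low_ttl = [p for p in packets if compile_filter("ip.ttl < 100")(p)]
```

Because the filter only sees field names and values, it works for any protocol whose dissector produces such fields.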

Lastly, there are also several utility files grouped together under the “Util” component. These generally help with colors, cyclic redundancy checks, base 32/64 conversions, file system management, logging, and other similar utilities.

Connectors view

The components are connected in different ways, but all of them are connected through the core of the application. The core is mainly responsible for collecting the data from the Capture, Wiretap, and Epan components and displaying it in the GUI. It also allows for user interaction with this data. These components share declarations (types and functions) with the core of the application, and vice versa, by including their respective header (.h) files.

The Capture component mostly interacts with the external Dumpcap capture engine, which collects network data. It is also connected to the Wiretap component. The Wiretap component mostly interacts with the local hard drive of the user to access captured and stored network data. Epan and the GUI are connected only to the core of the application.

Since the application is installed locally on the device of the user, there is no connection to a database or any other online storage medium. The application only interacts with the user and the Dumpcap engine. The Dumpcap engine connects to the libpcap library, which in turn connects to the network, but since Dumpcap is external to Wireshark, it is not analyzed further in this part.

Development view

The structure of the code closely follows the components view, as individual components are (usually) organized within the same folder⁵. The diagram below provides an overview of the code directories and their contents.

The root directory contains most of the core functionality. This includes TShark, helper files for interacting with libpcap, helper files for capturing packets, several files for file and I/O management, the test directory (along with the fuzzer directory), CMake and configuration files, and documentation. It also contains several files and directories whose function was difficult to derive accurately, but which probably help with testing, debugging, and generally making the other components fit together. Aside from these, there are directories that directly correspond to the components and have the same or similar names: image and ui/qt (for GUI), wiretap (for wiretap), epan (for epan), and wsutil (for util).

Run time view

After a user selects a network interface to capture on, optionally predefines filters, and starts the capture, Wireshark obtains network data from the network interface card through Dumpcap. To do so, the Dumpcap process needs root privileges, while Wireshark itself can run with normal user rights. Data packets received from Dumpcap are analyzed by Wireshark’s packet dissectors, the user-defined filters are applied, and the resulting data is presented to the user in the GUI, with support from the utility files. Capturing continues until the user tells Wireshark to stop. During the capture, the user can still change which data is shown by adjusting the filters.
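The flow described above can be sketched as a small pipeline in which a separate process produces frames and the main process dissects, filters, and collects them. This is only an illustration of the process split, under several assumptions: the child process is a stand-in that emits three made-up frames rather than real network data, the length-prefixed framing over a pipe is invented for the example and is not Dumpcap's actual protocol, and `run_capture`, `read_frames`, and `dissect` are hypothetical names.

```python
import struct
import subprocess
import sys

# Stand-in for the capture process: writes length-prefixed fake frames.
CHILD_CODE = r"""
import struct, sys
for frame in (b"\x01ping", b"\x02pong", b"\x01ping"):
    sys.stdout.buffer.write(struct.pack("!I", len(frame)) + frame)
sys.stdout.buffer.flush()
"""


def read_frames(pipe):
    """Yield frames from the pipe until the child closes it."""
    while True:
        header = pipe.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("!I", header)
        yield pipe.read(length)


def dissect(frame):
    """Toy dissector: first byte is a type, the rest is ASCII text."""
    return {"toy.type": frame[0], "toy.text": frame[1:].decode("ascii")}


def run_capture(display_filter):
    """Start the capture child, then dissect and filter what it sends."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE
    )
    shown = []
    for frame in read_frames(child.stdout):
        fields = dissect(frame)
        if display_filter(fields):
            shown.append(fields)  # a GUI would render this row
    child.stdout.close()
    child.wait()
    return shown


rows = run_capture(lambda fields: fields["toy.type"] == 1)
```

Keeping the capturing code in its own process means that only that small process would need elevated rights, while dissection and display run unprivileged.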

Key quality attributes

Previously, we mentioned Wireshark’s support for a wide range of dissectors as a key quality attribute, since end-users are very likely to be able to analyze packets from their protocol of interest. Furthermore, with the modular design approach, new dissectors can be plugged in by the development community using Wireshark’s Lua API⁸. As a result, the design achieves scalability, extensibility, and flexibility.

Another advantage of the modular architecture is that the components are swappable. For instance, the GUI, capture engine, wiretap, and epan could be replaced or changed without impacting the rest of the system. The core is the component that the others depend on most, as depicted in the overview figure⁷. This central dependency is the price paid for scalability, and it makes maintaining the core slightly more difficult than maintaining the other components. Moreover, by separating the capture engine from wiretap, which is used to dump the data frames, the network traffic can be viewed in real time without first having to be stored, improving usability and responsiveness.

API design principles applied

According to the developer README², Wireshark has two main APIs: an internal API consisting of libraries that need to be installed on the host system in order to run the Wireshark programs, and an external API used by plugins. The internal API, defined by wsutil and libwireshark, is publicly available to the Wireshark and TShark programs (among others), which could be considered the “client users of the internal API”². The plugin dissectors, in turn, are the “client users of the external API”, which “is a loosely defined subset of the internal API plus any infrastructure required to support a plugin system”². As a result, Wireshark can be scaled to support numerous plugins. One important aspect to note is that the internal API is not stable and is frequently changed as a result of maintenance. However, maintenance changes are not allowed to break plugins unless doing so is necessary or there is good cause.

The Lua API⁸ offers plugins the following functionality: saving capture files, obtaining dissection data, GUI support, functions for new protocols and dissectors, binary encode/decode support, GLib regular expressions, and other packet dissection related features. The API is made quite usable and accessible by the API Reference Manual⁸ provided by Wireshark. The manual is organized by functionality, with the related methods listed under each. Each method is documented with a description, arguments, return values, and, if applicable, the error messages it could produce. The descriptions and error messages are meaningful and steer developers towards fixing the problem. Furthermore, naming is consistent across the API: objects use upper camel case (Pascal case), while methods are written in snake case. In addition, “Wireshark developers have historically tried to keep the Lua API very stable and provide strong backward-compatibility guarantees”², which provides reliability.

References

1. C4 model. Retrieved March 14, 2022, from https://c4model.com/
2. Wireshark Gitlab. Developer README. Retrieved March 9, 2022, from https://gitlab.com/wireshark/wireshark/-/blob/master/doc/README.developer
3. Wireshark. Contribute Your Changes. Retrieved March 9, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/ChSrcContribute.html#ChSrcUpload
4. Wireshark. README.dissector. Retrieved March 7, 2022, from https://gitlab.com/wireshark/wireshark/-/blob/master/doc/README.dissector
5. Wireshark Gitlab. Retrieved March 8, 2022, from https://github.com/wireshark/wireshark
6. Wireshark. Wireshark Developer’s Guide. Retrieved February 27, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/
7. Wireshark. Chapter 6. How Wireshark Works. 6.2. Overview. Retrieved March 8, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/ChWorksOverview.html
8. Wireshark. Chapter 11. Wireshark’s Lua API Reference Manual. Retrieved March 9, 2022, from https://www.wireshark.org/docs/wsdg_html_chunked/wsluarm_modules.html