DESOSA 2022

Quality and Evolution

Netdata was created by the Netdata company, which is still primarily responsible for the development of the project. Because Netdata is an open-source project, many people with many different coding styles contribute to it concurrently. To maintain high code quality, the company has set up a set of rules and guidelines for contributors. This essay describes the safeguards for the quality and integrity of Netdata’s software.

Satisfying the key quality attributes

In Essay 1, we described the four key quality attributes of Netdata as functionality, cost, security, and scalability. The functionality attribute is satisfied by two rules/guidelines: 1) only metrics with real meaning may be collected1, preventing the collection of useless metrics, and 2) all committed code is checked for performance and only approved if the performance is sufficient2.

The cost attribute remains satisfied as long as enough money is invested in Netdata, making it possible for the software to remain free.

The security attribute is mostly satisfied by the architectural design, which was discussed in Essay 2. It states that no raw data is ever exposed outside the system and that Netdata runs as a normal system user, making it hard for contributors to weaken the security of the system. In addition, Netdata has an extensive security policy3: people who find a vulnerability in the system can report it by e-mail, and the developers at Netdata will then immediately look at the issue and fix it with the highest priority.

The scalability attribute is also mostly satisfied by the architectural design. The way in which collectors are set up allows the system to scale to virtually any number of metric sources. To ensure the quality of these collectors, developers are encouraged to write new collectors in Go4. In addition, all collectors need to implement the Module interface5, ensuring that each collector outputs its data in the same way.
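
To give a concrete impression of what this uniform interface implies, the sketch below is a minimal, hypothetical Go module in the spirit of the go.d.plugin documentation5. The type names and metric names are our own illustration, not actual Netdata code, and the real interface contains additional methods (for example, chart definitions).

```go
package main

import "fmt"

// Module is a local illustration of the interface shape that go.d collectors
// are expected to implement; the real interface lives in the go.d.plugin
// module package and may differ in detail.
type Module interface {
	Init() bool                // prepare the collector (parse config, create clients)
	Check() bool               // verify that the monitored target is reachable
	Collect() map[string]int64 // return the metrics gathered in this iteration
	Cleanup()                  // release resources before shutdown
}

// exampleCollector is a hypothetical collector reporting a single counter.
type exampleCollector struct {
	requests int64
}

func (c *exampleCollector) Init() bool  { return true }
func (c *exampleCollector) Check() bool { return true }
func (c *exampleCollector) Cleanup()    {}

// Collect returns its metrics in the uniform map form, which is what makes
// every collector output its data in the same way.
func (c *exampleCollector) Collect() map[string]int64 {
	c.requests++
	return map[string]int64{"requests_total": c.requests}
}

func main() {
	var m Module = &exampleCollector{}
	if m.Init() && m.Check() {
		fmt.Println(m.Collect()) // map[requests_total:1]
	}
	m.Cleanup()
}
```

Because every collector reduces to the same method set and the same metric map, the plugin can treat all collectors identically, which is what keeps the design scalable.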

Software quality processes

While satisfying the key quality attributes of the project, the Netdata company has also taken various measures to ensure the quality of the software, such as a contribution workflow, code styles, manual reviews, and continuous integration.

Contribution workflow

Every developer who wants to contribute to the Netdata project needs to follow a specific workflow6. The first thing a developer needs to do is fork the project (create a copy of the repository under their own account7), keeping the original repository free of contributor branches. The changes made by the developer can then be merged back into the main repository through a pull request. These pull requests have a few strict guidelines8, which, if not followed, will result in the rejection of the pull request. These guidelines are: the title must give a short and complete description of the change, and the description should include the following:

  • Description of the changes
  • Affected area/component
  • Test plan
  • Reference to an existing issue (if there exists any)

Code style

The coding style of Netdata follows one simple rule: check the surrounding code and try to imitate it9. With this rule, they want to make sure that the style of the code is roughly the same across the project. While they do not prescribe a specific code style for most parts of the project, for the C code they try to follow the Linux kernel style as much as possible.

Manual reviews

Besides the automated checks, the contributed code also needs to be reviewed by all the code owners of the altered files10. This means that the number of required reviews depends on the number of areas the developer altered. These code owners are automatically assigned to the pull request, giving them the final say over the quality of the parts of the project they own.

Continuous integration processes

When a developer makes a pull request, the code goes through a pipeline with several stages. From one of our own pull requests we observed that, at the time of writing, there are 99 stages11 in this pipeline. All of these stages must pass before a pull request can be accepted. They include checking whether the code builds on all supported platforms, as well as the quality checks and tests described below.

Quality checks and services

Netdata uses many static analysis tools to decrease the chance of bugs and to increase the code quality. As different languages adhere to different code styles and standards, multiple lint tools are used:

  • shellcheck - Statically analyses shell scripts to find bugs, syntax errors, and stylistic errors, and gives suggestions for improvement.
  • pylint - Helps Python code adhere to coding standards and provides error and bug detection. It also helps with refactoring the code and generating UML diagrams.
  • csslint - Checks CSS code, providing coding standards and refactoring suggestions.
  • jslint - Checks if the JavaScript code complies with the JavaScript coding standards and rules.
  • cppcheck - Checks the C and C++ code to detect bugs, undefined behavior, and dangerous coding constructs.

Netdata also uses online services that provide code quality metrics and improvements. The online services used are:

  • LGTM.com - Automatic code reviewer that provides granular views of errors and warnings as well as build logs12.
  • CodeClimate.com - Gives an overview of the code quality per language, shows the maintainability, and suggests how the code can be improved13.
  • Codacy - Looks at the technical debt of the project.
  • Coverity Scan - Multi-language static analysis tool.

Tests

Another key component is the feature tests. These check whether all functionality still works as intended, along with minor elements such as checking whether the documentation has any broken links.

While Netdata uses many static analysis and code quality tools, it does not adhere to a specific testing method. Nonetheless, contributors are required to test their features before merging into main; if this is not done, the pull request reviewers will comment that a test is missing.

Many of the tests are focused on checking whether the code can run on different platforms. Additionally, feature tests are prominent. These tests are written in many languages: for example, a Python fuzzer to test the API, shell code to test the directory paths, and C to test different profiles. Many of the tests are parameterized to efficiently test multiple scenarios. The feature tests are included in the continuous integration pipeline.
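
To illustrate the parameterized style mentioned above, the snippet below is a sketch of a table-driven Go test for a hypothetical parsing helper. Neither the helper nor the test exists in the Netdata code base; they only show the pattern of covering several scenarios with one test function.

```go
package collector

import (
	"fmt"
	"testing"
)

// parseStatusLine is a hypothetical helper that turns a "key value" status
// line into a metric value; it stands in for the kind of parsing a collector does.
func parseStatusLine(line string) (int64, bool) {
	var key string
	var value int64
	if _, err := fmt.Sscanf(line, "%s %d", &key, &value); err != nil {
		return 0, false
	}
	return value, true
}

// TestParseStatusLine is parameterized: each table entry is one scenario,
// so a single test function efficiently covers multiple cases.
func TestParseStatusLine(t *testing.T) {
	cases := []struct {
		name   string
		input  string
		want   int64
		wantOK bool
	}{
		{"valid line", "requests 42", 42, true},
		{"missing value", "requests", 0, false},
		{"non-numeric value", "requests abc", 0, false},
	}

	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			got, ok := parseStatusLine(c.input)
			if ok != c.wantOK || got != c.want {
				t.Errorf("parseStatusLine(%q) = (%d, %v), want (%d, %v)",
					c.input, got, ok, c.want, c.wantOK)
			}
		})
	}
}
```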

Hotspots

Two hotspot components from the past are the former Backends engine, which has since been deprecated and replaced by the new Exporting engine, and the Netdata.Cloud (online visualizer) architecture, which was completely overhauled in version 1.33.0 according to the development blogs14.

As a result, the agent code also has a restructured Agent-Cloud Link (ACLK) component. Although these are the two key components that have been altered as part of a refactoring, almost all the version-to-version patches and changes relate to either individual external collectors or the local dashboard. The Agent architecture is the robust core of the Netdata project, while the dashboard and collectors interface with its API.

A buggy collector may simply output bad data, but that does not affect the rest of the system. The dashboard may be changed extensively to improve the default user interface, or users may choose to use the Netdata.Cloud visualizer or their own databases through the exporting engine. These parts are likely to remain code hotspots as development continues, driven by demand for UI improvements and the inclusion of more collectors, but because these components are highly modular the core remains robust to software faults.

Code quality

Code quality on the Agent side of the project is consistently high, scoring A+ on the LGTM comparative database12 and similarly on other static code analysis tools13, for both its C/C++ and Python code.

Looking through previous patch and release notes, aside from the addition of new metric collectors or fixes to individual collectors, most new features and fixes pertain to the UX and UI of the local dashboard. The changes to the ACLK component were not necessarily a result of refactoring or poor code quality in that component, but of the refactoring of the Netdata.Cloud backend, which changed how the server communicates and establishes the link.

The refactoring and migration of the Backends engine to the current exporting engine was part of a quality upgrade. The new engine is a more consistent and unified whole that treats the various data formats and connections for different databases as wrappers and extensions of the same core buffering and streaming functionality.
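
A rough sketch of that design idea is shown below: a shared buffering and flushing core, with each target database supplying only a thin formatting wrapper. The types are hypothetical and simplified; they illustrate the structure described above rather than the actual exporting engine code.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is one buffered metric measurement.
type sample struct {
	name  string
	value float64
}

// formatter is the thin wrapper a target database supplies: it decides how a
// sample is rendered, not how samples are buffered or sent.
type formatter interface {
	format(s sample) string
}

// graphiteFormat and jsonFormat are hypothetical wrappers for two targets.
type graphiteFormat struct{}

func (graphiteFormat) format(s sample) string {
	return fmt.Sprintf("netdata.%s %g", s.name, s.value)
}

type jsonFormat struct{}

func (jsonFormat) format(s sample) string {
	return fmt.Sprintf(`{"name":%q,"value":%g}`, s.name, s.value)
}

// exporter is the shared core: it buffers samples and flushes them through
// whichever formatter the connection was configured with.
type exporter struct {
	buf []sample
	out formatter
}

func (e *exporter) collect(s sample) { e.buf = append(e.buf, s) }

func (e *exporter) flush() string {
	lines := make([]string, 0, len(e.buf))
	for _, s := range e.buf {
		lines = append(lines, e.out.format(s))
	}
	e.buf = e.buf[:0]
	return strings.Join(lines, "\n")
}

func main() {
	e := &exporter{out: graphiteFormat{}}
	e.collect(sample{"cpu.user", 12.5})
	e.collect(sample{"cpu.system", 3.2})
	fmt.Println(e.flush()) // same buffering core, Graphite-style output
}
```

Adding support for another database then only means adding another small formatter, while the buffering and streaming logic stays in one place.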

Quality culture

The quality culture is mostly managed by a team of core developers who review and fix most of the problems. To check how successful this approach is, we look at several recent issues and pull requests. In our opinion, however, it is more important to show the types of issues the contributors have solved.

Issues are labeled by category, area, and concern (e.g. area/ml). For each feature request, the user must state:

  1. The problem.
  2. How important they think the problem is and the value proposition of the feature.
  3. A proposed implementation for this feature.

Examples:

For bugs the process is different and stricter; here the user must:

  1. Describe the bug.
  2. State the expected behavior.
  3. Give the steps to reproduce the problem.
  4. State the installation method, system information, and Netdata build info.
  5. Add additional information that will help the developer fix the bug.

Bug examples:

  1. issue, pull request
  2. issue, pull request
  3. issue, pull request
  4. issue, pull request
  5. issue, pull request

Finally, the fixes for some issues or bugs have not been merged into the main branch yet, as there is doubt about the implemented solution.

  1. issue
  2. issue

Issues are opened frequently, and the Netdata team responds quickly and discusses each issue extensively, which indicates that Netdata has a good quality culture. This is also evidenced by BadgeApp, where most of the quality checklist items are green.

Technical debt

Technical debt is a metaphor coined by Ward Cunningham that draws a comparison between technical complexity and financial debt. For every shortcut you take today to deliver something faster, you will pay interest in the form of additional software development hours every time you work in that code area in the future15. As code becomes more complex and poorly documented, technical debt increases.

Figure: Technical debt image

For Netdata, tools are used to measure code quality, maintainability, documentation, test debt, and so on, ensuring that the technical debt of the project stays low and its maintainability stays high.

Documentation

According to BadgeApp, most of the user-facing documentation is very good. When it comes to code documentation, however, there are still some gaps that need to be filled.

Maintainability

According to CodeClimate, Netdata has good maintainability with an A rating, which means the project’s technical debt is below 5% of the code base.
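
As a rough sketch of how such a grade is usually derived (the exact remediation estimates and thresholds are CodeClimate’s own heuristics, so treat this only as the general idea), the maintainability letter follows from a technical debt ratio:

$$\text{technical debt ratio} = \frac{\text{estimated remediation effort}}{\text{estimated development effort}} \times 100\%$$

with an A corresponding to a ratio of roughly 5% or less.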

Code quality

LGTM measures the quality of a project’s code by assigning both a score and a grade to it. The score reflects the quality of the code itself, and the grade reflects how it compares to similar reference projects. Netdata scores an A+ for both C/C++ and Python and a C for JavaScript/TypeScript, which is very good compared to other similar projects on LGTM. This is also shown in the image below for C/C++ (Netdata is the green dot with the unfilled circle).

Figure: Quality code

Test debt

For the test debt, BadgeApp shows that the Netdata project is lacking. Under the “Quality” tab we can see that the “New functionality testing” checkpoint is not completed. This means that the project does not have a formal policy requiring tests to be added for major new functionality.

We can conclude that Netdata has done a good job in maintaining the quality of the code and the project.

References


  1. Meaningful metrics https://learn.netdata.cloud/contribute/handbook#meaningful-metrics ↩︎

  2. Performance and efficiency https://learn.netdata.cloud/contribute/handbook#performance-and-efficiency ↩︎

  3. Security policy https://learn.netdata.cloud/contribute/security ↩︎

  4. Contribute a new collector https://learn.netdata.cloud/contribute/handbook#contribute-a-new-collector ↩︎

  5. How to write a Netdata collector in Go https://learn.netdata.cloud/docs/agent/collectors/go.d.plugin/docs/how-to-write-a-module ↩︎

  6. Contribution workflow https://learn.netdata.cloud/contribute/handbook#contribution-workflow ↩︎

  7. Fork a repo https://docs.github.com/en/get-started/quickstart/fork-a-repo ↩︎

  8. Pull request guidelines https://learn.netdata.cloud/contribute/handbook#pr-guidelines ↩︎

  9. Code style https://learn.netdata.cloud/contribute/handbook#code-style ↩︎

  10. Pull request approval process https://learn.netdata.cloud/contribute/handbook#pr-approval-process ↩︎

  11. Pull request Netdata checks https://github.com/netdata/netdata/pull/12208/checks ↩︎

  12. Static analysis tool lgtm https://lgtm.com/projects/g/netdata/netdata ↩︎

  13. Static analysis tool codeclimate https://codeclimate.com/github/netdata/netdata ↩︎

  14. Netdata Cloud new Architecture release https://www.netdata.cloud/blog/netdata-clouds-new-architecture ↩︎

  15. Technical Debt https://www.sealights.io/blog/its-time-to-rethink-technical-debt-management/ ↩︎

Netdata
Authors
Wesley de Hek
Daan Offerhaus
Ratish Thakoersingh
Marios Marinos