Egeria: Quality and Evolution

Introduction

Egeria is an open-source project that provides open metadata governance, making it easy to share metadata across systems. In our previous essays, we discussed Egeria’s product vision and software architecture. In this essay, we discuss the key quality attributes implemented so far, the quality of the code and the development process, code change hotspots, and technical debt.

Implemented key quality attributes

The system’s key quality attributes, which you can find in the architectural design principles 1, are the following: autonomic, integratable, adaptable, extensible, fair ecosystem, trustworthy, inclusive, and educational. The figure below shows that most of the developer platform and the deployment resources are released (green). These two categories implement the quality attributes autonomic, adaptable, and extensible.

The Egeria developers still need to work on the Kubernetes operators for adaptability, which are not yet ready to be released (orange). Integration capabilities are also still under development because each vendor can have a very different workflow; however, there is already a solid foundation to build on. Even though a lot of work has been done on the governance solutions, which are important for the trustworthiness of the system, they need more work before they can be fully released. Last but not least, the Egeria team is working to increase the educational value of the system. The Dojo 2 is a good first step, but as the figure shows, the team wants to add more components in this area.

Figure 1 - Overview of the status of the functions in Egeria today (Source: https://egeria-project.org/release-notes/roadmap/#current-status)

Quality process

Egeria has multiple quality principles in place. These include automated testing, which we discuss later in this essay, contributor roles, and code change processes. Below we summarize Egeria’s guidelines for code contributions.

Egeria differentiates between its community members 3: contributors, who are the active community members, and maintainers, who have much experience with Egeria. Currently, there are 13 recognized maintainers 4 and 14 contributors 5 who take responsibility for the quality of the project.

The code change process is very similar to the common issue lifecycle in many GitHub projects. It starts with writing the code. Egeria provides detailed coding guidelines with rules the contributors should follow, emphasizing responsibility for wide applicability and longevity 6. More specifically, the guidelines instruct contributors to follow a consistent writing style 7, write documentation 8, and create tests. The guidelines also stress that build warnings and extra dependencies should be avoided.

Before code is written, however, an issue lifecycle 9 is started. An issue can be created by any GitHub user. A maintainer then evaluates the issue and assigns attributes such as a milestone, relevant tags, an owner, and more. The owner of the issue can then start working on the solution, create a Pull Request (PR), and keep it updated. If the issue is not being worked on, it is labelled as stale and eventually closed.

When an issue is being worked on, it is recommended to do this on a fork. Only when the code has been finished, tested, and reviewed should it be merged into the master branch. Egeria provides a few guidelines 10 for reviewing a PR. These mostly cover a code quality review, a check for obvious errors, no logging to the console, and documentation updates. Besides reviews by contributors and maintainers, there are also automated checks, which are elaborated on later in the essay.

If a change is large or multiple changes have to be applied at once, an experienced contributor can open a feature branch, so that the code is merged there first and into the master branch afterwards.

Quality culture

In the previous section, we discussed the intended way of working in Egeria. We sampled merged PRs from the past six months to assess to what extent this quality process is adhered to.

When creating a PR or opening an issue, the user can choose from different templates. Each template consists of a form with multiple questions that should be answered. When suggesting an enhancement, for example, you are also asked to provide possible alternative solutions. Templates force people to provide the necessary information 11 12 and to think about the problem in more detail, leading to higher-quality code and a more fruitful discussion.

A significant number of the sampled PRs were merged without any discussion on GitHub 13 14 15. Moreover, some of the PRs have no related issue assigned 16 14. It could be that the developers use a different communication channel to discuss open issues and PRs, as they actively promote their Slack and weekly developer meetings. However, from our investigation, it seems that there is little documented evaluation and that the prescribed work process is not always followed.

When a PR is created, code analysis is run by SonarCloud 17 and Dependabot 18. These tools check code quality and newly added dependencies, which helps the developers minimize code warnings and extra dependencies.

Testing

Egeria’s open-source and extensible nature drives the design of its testing framework. It is not good enough to test Egeria software on its own. Instead, all software connected to the large integrated networks built on Egeria needs to conform to the standards that consumers expect of Egeria.

Egeria’s testing framework 19 consists of three main categories: unit tests, functional verification tests (FVTs), and an open metadata conformance test suite. The unit tests exercise single components. The FVTs test multiple components to ensure they function correctly together. They typically load test data into the in-memory repository and use the APIs to run the tests. These tests verify whether those components support all functionality that consumers expect.

The open metadata conformance suite 20 provides a testing framework to help developers integrate a specific technology into the open metadata ecosystem. This framework currently has two implemented workbenches: the platform workbench and the repository workbench. The platform workbench tests the REST API of the OMAG server platform, and the repository workbench checks the REST and event-based APIs of an open metadata repository. Future workbenches will focus on testing other APIs and on performance-based tests.

Each workbench has a set of profiles with mandatory and optional requirements specific to a system. The mandatory requirements ensure that the software cannot harm other systems it shares metadata with. Software that can show that it passes at least the mandatory tests is allowed to carry an Egeria conformance mark. Egeria gives extra credit when the technology meets the optional requirements. This mark helps build trust between consumers and third-party integrations, and consumers can run the tests themselves to verify the correctness of the mark.
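The mandatory/optional rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the conformance suite's actual implementation: the `Requirement` class, the `evaluate_profile` function, and the requirement names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One requirement in a workbench profile (hypothetical model)."""
    name: str
    mandatory: bool


def evaluate_profile(requirements, passed_names):
    """Return (conformant, optional_passed) for one profile.

    conformant is True only if every mandatory requirement passed;
    optional_passed lists the optional requirements that also passed.
    """
    passed = set(passed_names)
    conformant = all(r.name in passed for r in requirements if r.mandatory)
    optional_passed = [r.name for r in requirements
                       if not r.mandatory and r.name in passed]
    return conformant, optional_passed


# Illustrative profile; the requirement names are made up for this sketch.
profile = [
    Requirement("read-metadata", mandatory=True),
    Requirement("reject-invalid-types", mandatory=True),
    Requirement("historical-queries", mandatory=False),
]

result = evaluate_profile(profile, ["read-metadata", "reject-invalid-types"])
```

In this sketch a technology that passes both mandatory requirements is conformant even though it skips the optional one, while failing a single mandatory requirement is enough to lose the mark.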

As can be seen in the coverage report 21 provided by Egeria, the code coverage is low, at only ~12% on the main branch. This coverage is calculated using only the unit tests, as the FVTs cannot be run without the correct environments. This is a known problem at Egeria 22, as it is currently extremely difficult to get a good overview of what has been tested and to what extent.

To help the developers find errors quickly, Egeria makes use of Continuous Integration 23, which it implements using the GitHub Actions framework 24.

Code hotspots

To better assess Egeria’s code quality, we investigated code hotspots in Egeria (i.e. components in the system that are under heavy development and frequently changed). We used CodeScene 25 to analyze Egeria’s codebase and find these hotspot components. This analysis, which covers the past year, can be seen in Figure 2.

Figure 2 - Overview of hotspots in Egeria, generated using CodeScene

There is one class that immediately stands out because it is both a hotspot and has low code health, namely the “OpenMetadataAPIGenericHandler”. This class had 47 commits in the past year, is over 9000 lines of code, and has been increasing in size over the past year 25. Many of its functions have a large number of arguments (10+).

Even more interesting is that the SonarCloud analysis of “OpenMetadataAPIGenericHandler” shows a total of 227 code smells, 3 potential bugs, and 0.0% unit test coverage 26. It is concerning that these issues are present despite all the code guidelines and the fact that Egeria is using SonarCloud.

The OpenMetadataAPIGenericHandler is a handler that acts as an intermediary between the metadata access service and the repository service. Its role as an intermediary component explains the frequent changes being made. Additionally, the readme states that this part of Egeria is still under active development 27, so it can be expected to remain a hotspot for the foreseeable future.

There are a couple of other interesting hotspots, most of them part of OMAS, dealing with the integration of services and open metadata 25. All of these are also stated to be under active development. While not optimal, their code health is much better than that of the OpenMetadataAPIGenericHandler, and all of these classes are of a much more manageable size 25. They do, however, still contain code smells and lack coverage 17.
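The hotspot idea used in this section, files that are both changed often and large, can be illustrated with a small Python sketch. It is not CodeScene's actual algorithm: the scoring rule (commits multiplied by lines of code), the `min_commits` threshold, and all file entries except the generic handler are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    commits_last_year: int  # change frequency
    lines_of_code: int      # size, used here as a crude complexity proxy


def rank_hotspots(files, min_commits=20):
    """Rank frequently changed files by commits x size, largest first."""
    candidates = [f for f in files if f.commits_last_year >= min_commits]
    return sorted(candidates,
                  key=lambda f: f.commits_last_year * f.lines_of_code,
                  reverse=True)


files = [
    # Approximate figures from the text: 47 commits, roughly 9000 lines.
    FileStats("OpenMetadataAPIGenericHandler.java", 47, 9000),
    # The remaining entries are invented for illustration.
    FileStats("SmallButBusy.java", 60, 300),
    FileStats("LargeButStable.java", 2, 12000),
]

hotspots = [f.path for f in rank_hotspots(files)]
```

Under these assumptions the generic handler ranks first, a small but busy file ranks second, and a large file that rarely changes is not treated as a hotspot at all.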

Technical Debt

To inspect the technical debt of the project, we used SonarQube 28. The tool is widely accepted and used across the IT community 29 30 31 and is also integrated with Egeria 32.

SonarQube gives the Egeria project a technical debt rating of A, meaning it is maintainable. By maintainable, SonarQube means that the technical debt ratio is less than 5%. More precisely, the debt ratio is 1.5% and the total debt is 665 days, where a day corresponds to 8 hours of work.

How does SonarQube calculate the technical debt? It is the estimated time required to fix all maintainability issues, or code smells 33. These include vulnerability and reliability issues as well as code that is hard to change. The technical debt ratio is calculated by dividing the technical debt by the development cost. The development cost is the cost to develop one line of code (0.06 days) multiplied by the number of lines of code 34.
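The calculation above can be written out as a short Python sketch. The function names are our own, and the line count it produces is only what the rounded figures quoted in this essay (665 days of debt, a 1.5% ratio, 0.06 days per line) imply; it is not a measured size of the codebase.

```python
COST_PER_LINE_DAYS = 0.06  # cost to develop one line of code, as stated above


def development_cost_days(lines_of_code):
    """Estimated cost of writing the code base from scratch, in days."""
    return COST_PER_LINE_DAYS * lines_of_code


def debt_ratio(debt_days, lines_of_code):
    """Technical debt ratio = technical debt / development cost."""
    return debt_days / development_cost_days(lines_of_code)


def implied_lines_of_code(debt_days, ratio):
    """Invert the ratio formula to see what code size the figures imply."""
    return debt_days / (ratio * COST_PER_LINE_DAYS)


# Figures quoted in this essay: 665 days of debt at a 1.5% debt ratio.
loc = implied_lines_of_code(665, 0.015)
print(f"implied size: about {loc:,.0f} lines of code")
print(f"check: ratio = {debt_ratio(665, loc):.3f}")
```

Because the 1.5% figure is rounded, the implied size should be read as an order of magnitude (several hundred thousand lines) rather than an exact count.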

In the figure below you can see the technical debt of each file plotted against its lines of code. The outlier in the top right of this figure is the ‘OpenMetadataTypesArchive1_2.java’ class. It is used for educational purposes such as open metadata demos, hands-on labs and samples 35.

Figure 3 - Technical Debt overview Egeria, generated with SonarQube

Perhaps more interesting are the parts of the code that are rated worst. Egeria has no classes with an E rating (50%-100%) but does have some with a D rating (21%-50%), as seen in the figure below. These classes are part of the FVTs. Even though these classes have a very low rating, the impact is minimal because they contain only a few lines.

Figure 4 - Technical Debt D and E rating Egeria, generated with SonarQube
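To see why a few poorly rated small classes barely move the project-wide rating, consider the following Python sketch. The A, D and E bands follow the percentages mentioned in this essay; the B and C boundaries are assumed from SonarQube's default rating grid, and the two example files are invented.

```python
COST_PER_LINE_DAYS = 0.06  # cost to develop one line of code, in days


def maintainability_rating(ratio):
    """Map a technical debt ratio (0.0 to 1.0) to a letter rating."""
    if ratio <= 0.05:
        return "A"
    if ratio <= 0.10:  # assumed default boundary
        return "B"
    if ratio <= 0.20:  # assumed default boundary
        return "C"
    if ratio <= 0.50:
        return "D"
    return "E"


def combined_ratio(files):
    """Debt ratio over several files, given (debt_days, lines_of_code) pairs."""
    total_debt = sum(debt for debt, _ in files)
    total_loc = sum(loc for _, loc in files)
    return total_debt / (COST_PER_LINE_DAYS * total_loc)


# Invented numbers: one tiny file with a poor ratio, one large healthy file.
small_file = (0.9, 50)        # ratio of about 0.30, rated D
large_file = (60.0, 100_000)  # ratio of about 0.01, rated A

overall = combined_ratio([small_file, large_file])
print(maintainability_rating(combined_ratio([small_file])))
print(maintainability_rating(overall))
```

Since the combined ratio is weighted by lines of code, the 50-line file contributes almost nothing, and the overall rating in this sketch stays at A.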

Conclusion

Egeria has firm standards in place with regard to creating issues, contributing to the project, and reviewing merge requests. Additionally, analysis tools and Continuous Integration are used to increase code quality. However, our empirical investigation of the PRs created on GitHub, our analysis of the code quality, and the lack of an overview of what is currently tested show that there are certainly improvements to be made to the overall workflow and code quality.

References


  1. https://www.youtube.com/watch?v=n-Xm8_WIyBM&ab_channel=MandyChessell
  2. https://egeria-project.org/education/egeria-dojo/
  3. https://egeria-project.org/guides/project-operations/#community-members
  4. https://github.com/odpi/egeria/blob/master/MAINTAINERS.md
  5. https://github.com/odpi/egeria/blob/master/CONTRIBUTORS.md
  6. https://egeria-project.org/guides/contributor/guidelines/
  7. https://egeria-project.org/guides/documentation/formatting/#link-within-docs-using-absolute-links
  8. https://egeria-project.org/guides/documentation/guide/
  9. https://egeria-project.org/guides/contributor/process/
  10. https://egeria-project.org/guides/contributor/guidelines/#review-code-changes
  11. https://blog.natterstefan.me/discover-the-benefits-you-get-with-github-templates
  12. https://madeintandem.com/blog/tandem-uses-pull-request-templates/
  13. https://github.com/odpi/egeria/pull/6233
  14. https://github.com/odpi/egeria/pull/6131
  15. https://github.com/odpi/egeria/pull/5957
  16. https://github.com/odpi/egeria/pull/6163
  17. https://sonarcloud.io/project/overview?id=odpi_egeria
  18. https://github.com/dependabot
  19. https://github.com/odpi/egeria/tree/master/open-metadata-test
  20. https://github.com/odpi/egeria/tree/master/open-metadata-conformance-suite
  21. https://sonarcloud.io/summary/overall?id=odpi_egeria
  22. https://github.com/odpi/egeria/issues/3969
  23. https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration
  24. https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions
  25. https://codescene.io/projects/24556/jobs/404511/results/code/biomarkers
  26. https://sonarcloud.io/project/issues?fileUuids=AXUwjjTq8Z59dRxV-Iua&types=CODE_SMELL&id=odpi_egeria
  27. https://github.com/odpi/egeria/tree/master/open-metadata-implementation/common-services/generic-handlers
  28. https://www.sonarqube.org/about/
  29. https://www.trustradius.com/products/sonarqube/reviews
  30. https://www.g2.com/products/sonarqube/reviews
  31. https://www.gartner.com/reviews/market/application-security-testing/vendor/sonarsource/product/sonarqube
  32. https://sonarcloud.io/project/overview?id=odpi_egeria
  33. https://docs.sonarqube.org/latest/user-guide/concepts/
  34. https://docs.sonarqube.org/latest/user-guide/metric-definitions/
  35. https://github.com/odpi/egeria/tree/master/open-metadata-resources