In this essay we explain the underlying issue that Egeria is trying to solve: managing metadata and making efficient use of the value it holds. We also outline what Egeria currently does, and plans to do, to tackle this challenge.
We start by introducing the Egeria project, explaining the key concepts and the underlying domain model. Then, we present the features, benefits, interested parties, and ethical implications of the system before talking about where the project currently stands and its future roadmap.
Introduction to Egeria
Egeria is an open-source project that provides open metadata governance. It enables metadata to be captured, managed, and exchanged automatically between tools and platforms, regardless of the user or the underlying technology [1]. By creating one shared metadata communication layer, Egeria attempts to ease data transfer between different tools. Egeria aims to reduce the opportunities that are missed with the available data, which currently arise because existing metadata cannot be shared efficiently and accurately between platforms [2].
Systems can be connected to Egeria through APIs. Egeria acts as “the man in the middle”: every service connects to Egeria once, rather than each pair of services having to be connected by hand.
Egeria is built in a plug-and-play way. Users can easily write new plugins to connect their existing tools to Egeria or, in many situations, use one of the existing connectors from the pre-built catalogue [3].
Key concepts
This section will briefly explain the meaning of metadata and governance as these terms are core to the Egeria system and the underlying domain.
Metadata, or the data about data, is the data that tells you what the numbers and letters you are looking at actually mean. Data’s usefulness scales with the quality of its associated metadata. If the metadata quality is high then you know exactly what the data represents, how trustworthy it is, and what you can do with it. As an example, a postcode is only useful when you know the system used to create that code.
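The postcode example can be made concrete with a minimal sketch. The `FieldMetadata` class, its field names, and the sample values are hypothetical and chosen only for illustration; they are not Egeria types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical description of a data value (not an Egeria type)."""
    meaning: str   # what the value represents
    scheme: str    # the system used to create the value
    source: str    # where the value came from


def describe(value: str, metadata: Optional[FieldMetadata]) -> str:
    """Return what a reader can conclude about a value, given its metadata."""
    if metadata is None:
        # Without metadata the value is just a string of characters.
        return f"{value}: meaning unknown"
    return f"{value}: {metadata.meaning} ({metadata.scheme}), from {metadata.source}"


# Illustrative values only.
postcode_metadata = FieldMetadata(
    meaning="postal code of the customer address",
    scheme="Dutch postcode format",
    source="customer registry",
)

print(describe("2628 CD", None))
print(describe("2628 CD", postcode_metadata))
```

The same value appears in both calls; only the second call tells the reader what it represents and where it came from.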
Governance, or in this case data governance, is a plan to ensure that data quality is maintained throughout the lifecycle of the system. It can contain rules and regulations that ensure certain quality aspects are maintained, but it also includes best practices.
A subsystem of Egeria, the Governance Program Open Metadata Access Service (OMAS) [4], provides support for defining a data strategy and regulations, and for developing a governance program for the system’s data.
What does Egeria offer?
The purpose of Egeria is to allow connected services to store, share, and edit metadata. It functions as a communication layer between all applications where the services communicate using Egeria as a metadata highway. This model simplifies the communication necessary between services, making the whole system easier to implement, update, and use.
Without Egeria (see Figure 1), every subsystem communicates directly with every other subsystem, and each connection needs an adapter when the services involved work with different data. In the new model (see Figure 2), all communication happens through Egeria. Each service then requires only one connector between itself and Egeria, which drastically reduces the complexity of the overall system while still allowing all nodes to communicate with one another [3].
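The reduction in connections can be made concrete with a little arithmetic. The sketch below assumes the worst case for the direct model, in which every pair of services needs its own two-way adapter, and counts one connector per service for the hub model. The function names are ours, not Egeria's.

```python
def point_to_point_adapters(n_services: int) -> int:
    """Adapters needed when every pair of services is linked directly.

    Assumes one two-way adapter per pair, i.e. n * (n - 1) / 2.
    """
    return n_services * (n_services - 1) // 2


def hub_connectors(n_services: int) -> int:
    """Connectors needed when every service links only to the central hub."""
    return n_services


for n in (3, 5, 10, 20):
    print(f"{n} services: {point_to_point_adapters(n)} direct adapters, "
          f"{hub_connectors(n)} hub connectors")
```

Under these assumptions, ten services need 45 pairwise adapters but only 10 hub connectors, and the gap widens quadratically as more services are added.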
Egeria implements several metadata standards out of the box, so services that use those standards can communicate immediately without any additional software having to be written. Additionally, Egeria allows external adapters to connect to the system to support other services. This modularity makes Egeria very flexible, as tools can easily be integrated and connected. Egeria automatically verifies that a connector upholds the existing conventions. This verification is necessary for communicating with other Egeria components and ensures the consistency of everything plugged into the Egeria system.
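The idea of a hub that verifies connectors before accepting them can be sketched as follows. This is an illustration under our own assumptions, not Egeria's connector API: `MetadataConnector`, `MetadataHub`, the two example connectors, and the "common" dictionary format are all hypothetical.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class MetadataConnector(Protocol):
    """Hypothetical connector contract (Egeria's real interfaces differ)."""

    def to_common(self, native: Dict[str, Any]) -> Dict[str, Any]: ...

    def from_common(self, common: Dict[str, Any]) -> Dict[str, Any]: ...


class MetadataHub:
    """Central exchange point: each tool registers exactly one connector."""

    def __init__(self) -> None:
        self._connectors: Dict[str, MetadataConnector] = {}

    def register(self, name: str, connector: object) -> None:
        # The isinstance check only confirms that the required methods exist;
        # a real conformance check would also test their behaviour.
        if not isinstance(connector, MetadataConnector):
            raise TypeError(f"{name} does not implement the connector contract")
        self._connectors[name] = connector

    def transfer(self, source: str, target: str,
                 native: Dict[str, Any]) -> Dict[str, Any]:
        # Source format -> common format -> target format.
        common = self._connectors[source].to_common(native)
        return self._connectors[target].from_common(common)


class CatalogConnector:
    """Hypothetical tool that stores metadata as col_name/desc."""

    def to_common(self, native: Dict[str, Any]) -> Dict[str, Any]:
        return {"name": native["col_name"], "description": native["desc"]}

    def from_common(self, common: Dict[str, Any]) -> Dict[str, Any]:
        return {"col_name": common["name"], "desc": common["description"]}


class ReportingConnector:
    """Hypothetical tool that stores metadata as field/label."""

    def to_common(self, native: Dict[str, Any]) -> Dict[str, Any]:
        return {"name": native["field"], "description": native["label"]}

    def from_common(self, common: Dict[str, Any]) -> Dict[str, Any]:
        return {"field": common["name"], "label": common["description"]}


hub = MetadataHub()
hub.register("catalog", CatalogConnector())
hub.register("reporting", ReportingConnector())
print(hub.transfer("catalog", "reporting",
                   {"col_name": "postcode", "desc": "Dutch postal code"}))
```

Neither tool knows the other's format. Replacing the reporting tool would mean rewriting only `ReportingConnector`, while the catalog side stays untouched.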
The new design, with Egeria acting as the middleman for all communications, has implications for the maintainability and usability of the system. If you, for example, replace or significantly update one part of the system, then only the component responsible for the communication between that part and Egeria needs to be changed. This model removes the need to replace all existing connections, which increases the maintainability and usability of the system by reducing complexity and coupling between tools [3].
Interested parties
Two of the largest identified stakeholders in Egeria are IBM and ING. These multinational companies use Egeria to connect their microservices. There are many other interested parties, such as Hadoop and SAS, but little information about their work with Egeria is publicly available. Therefore, we mostly focussed on ING and IBM, as they have a long history with the project and a lot of information about them is available.
One of the most important characteristics of Egeria is that it maintains the quality of the metadata stored in and transferred through the system. As noted by ING, one of the creators of and major stakeholders in Egeria, the accuracy with which they can interpret their data is of great importance for their internal decisions as well as for the help or advice they can give to customers [5]. Thus, being able to access, edit, and transfer their data and metadata to all their services without losing quality is vital for their operations.
IBM also recognizes the importance of Egeria. They compare sharing metadata between different companies to travelling between countries and the large number of power adapters this requires [6]. While travelling, you need to take an adapter with you wherever you go to have access to power. Making these adapters, and hotels having to accommodate guests who forgot theirs, add to the cost of getting access to power. The universal adapter solved this problem: it is a single interface between the electrical device and the power outlet of whichever country you are visiting [6]. Fundamentally, metadata works the same way: every vendor uses different standards, and making a new ‘adapter’ for every connection requires a huge amount of resources and leads to missed opportunities. Egeria intends to be the universal metadata adapter, allowing everyone to connect their services and exchange metadata.
Another important attribute of Egeria is its open-source nature. While this naturally plays a role in the development and structure of the system, it is also vital for giving different users the confidence they need to use Egeria. It is important to have an open platform and open standards that everyone can use and contribute to, without a single organization being responsible for the codebase, which would make others hesitant to join [7].
Ethical concerns
Whenever data is involved in a project, especially personal data, there are numerous ethical implications to consider. Privacy issues, possible data leakage or data loss are all justifiable concerns and should be taken seriously by Egeria to build trust with the userbase.
Egeria does not explicitly state how it incorporates these ethical concerns in the project. That said, Egeria is built on top of existing systems and is not the primary owner of the data, meaning that the data remains stored in the user’s original system.
To preserve the privacy and security of data, Egeria has a fine-grained permissions system in which both users and data can be assigned specific permissions. These permissions can even hide the existence of data if mere knowledge of that data could itself pose a security or privacy issue.
In addition, during a developer meeting we attended, a new feature suggestion was discussed, and the team spent most of the time investigating whether this feature could impact the security of the system. Furthermore, Egeria stresses the importance of keeping up with the latest releases, as they contain the latest security patches. Both observations show that the developer team takes (data) security seriously.
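How a permission check can hide the very existence of data is sketched below. This is a hypothetical model, not Egeria's security API: `Element`, `Catalog`, and the permission names are assumptions made for illustration. The key point is that a hidden element and a non-existent element produce the same answer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Element:
    """Hypothetical metadata element guarded by a single permission."""
    name: str
    required_permission: str


class Catalog:
    """Hypothetical catalogue that filters every answer by caller permissions."""

    def __init__(self, elements: Iterable[Element]) -> None:
        self._by_name: Dict[str, Element] = {e.name: e for e in elements}

    def list_names(self, permissions: Set[str]) -> List[str]:
        # Elements the caller may not see are omitted entirely.
        return sorted(
            e.name for e in self._by_name.values()
            if e.required_permission in permissions
        )

    def find(self, name: str, permissions: Set[str]) -> Optional[Element]:
        element = self._by_name.get(name)
        if element is None or element.required_permission not in permissions:
            # Same result for "hidden" and "does not exist", so a caller
            # cannot infer that a protected element is present.
            return None
        return element


catalog = Catalog([
    Element("customer_postcodes", "marketing"),
    Element("fraud_investigations", "security"),
])

analyst = {"marketing"}
print(catalog.list_names(analyst))
print(catalog.find("fraud_investigations", analyst))
print(catalog.find("no_such_element", analyst))
```

For the analyst, looking up the protected element and looking up a name that was never stored both return `None`.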
Roadmap
Egeria tries to make a new release every month, containing new features and fixes to existing functions. The team strives to make every update backwards compatible and will explicitly inform users when a release is not. In that case, a guide on how to upgrade is made available.
Egeria has grouped its directions for the coming years into five capability layers, as can be seen in Figures 3 to 7. These layers describe the whole system and the secondary visions the team has set for the project. In 2022 the focus of the project was extending the governance solutions. Egeria wants to fill the gaps in governance solutions to support individuals who are responsible for an open metadata ecosystem. With each release the team updates the roadmap and selects new elements in the capability layers to focus on.
The capability layers are Governance solution, Education, Integration platform, Developer platform, and Deployment resources. Below we go into more depth about these layers.
Governance solution
The governance solution focuses on delivering more functions to users along with a well-developed interface. This means it is mainly centered around extending the user interface of Egeria, additional content, and improvements around the integration platform.
Education
Egeria works not only on the functionality of the open-source project, but also on maintaining a learning environment so that people can adopt the software more easily. The team has created different practical assignments for professionals at different skill levels. There is already a dojo where you can get familiar with the project. The hands-on labs and the guidance on governance are still in progress.
Integration platform
The project tries to make third-party software and additional system UIs integrable with Egeria. It strives towards a system where minimal coding is required, eventually even for custom-made and less common software extensions.
Developer platform
This layer, being the most extensive, seems to have priority for Egeria. It provides the handles for distributed governance solutions. It contains the code base of Egeria and provides support for the open metadata ecosystem.
Deployment resources
In this layer the focus is on making the process of deploying Egeria into an operational environment as simple as possible.
So, why Egeria?
When talking about metadata exchange, we enter a complicated landscape of adapters, governance, security, and maintainability. For organizations, and even smaller businesses, that need to exchange large amounts of metadata, a central solution is crucial not only for capitalizing on the value metadata provides, but also for handling the increased complexity and security issues that come with it. Egeria provides a powerful open-source solution for organizations to handle these issues, while remaining flexible thanks to its modular nature.
References
[1] Egeria homepage. https://egeria-project.org/
[2] Egeria the challenge. https://egeria-project.org/introduction/challenge/
[3] Egeria solution. https://egeria-project.org/introduction/overview/
[4] Governance Program Open Metadata Access Service (OMAS). https://github.com/odpi/egeria/tree/master/open-metadata-implementation/access-services/governance-program
[5] How ING is becoming a metadata driven enterprise using Egeria. https://www.youtube.com/watch?v=LMmm74t-BBA&ab_channel=ODPi
[6] Egeria open source standard enhances hybrid cloud metadata and data governance initiatives. https://www.ibm.com/blogs/journey-to-ai/2020/09/egeria-open-source-standard-enhances-hybrid-cloud-metadata-and-data-governance-initiatives/
[7] Egeria project description github. https://github.com/odpi/egeria/blob/master/open-metadata-publication/website/README.md https://egeria-project.org/