Data Mesh: What Is It and Why Is It Important?

What this blog covers:

Architectural roadblocks for businesses in achieving data democratization.
What is Data Mesh?
Importance of Data Mesh for organizations.
Role of Kyvos in your Data Mesh architecture.

We live in the era of self-service business intelligence, where organizations build their data empires to meet the need for democratization and scalability. They work endlessly to consolidate massive datasets in one place and assume that the data is then available and accessible to users across the enterprise. But data that is technically available can still be buried behind a web of complex relationships among siloed data warehouses or data lakes with limited analytical capabilities.

Despite investing huge amounts of money in building a centralized architecture, organizations fail to scale it to accommodate massive volumes of data coming from diverse domains within and outside the organization, and to deliver timely insights. Over the last decade, a broad spectrum of technologies has been introduced in the market to make data accessible to users in the best possible way. Now the question is, are these technologies serving their purpose?

Even after trying several different technologies, organizations are far from achieving their goal of democratization and scalability with their current monolithic technical architecture. They need to identify the symptoms in the current state of their architecture. Let’s look into some of the well-known issues.

Challenges Faced with a Centralized Data Architecture

Issue #1: Centralized architecture doesn’t work well for large enterprises with rapidly changing data sources and use cases.

Centralizing all enterprise data in a single analytics data warehouse makes it hard to process massive volumes of data from varying sources and use cases. Data must first be imported from edge locations into a data lake and then queried for analytics, which is an expensive and time-consuming task.

Issue #2: The increase in data sources and changing business requirements make it difficult for centralized data platforms to stay agile and responsive.

Every new data source requires changes across the entire data pipeline. These changes are difficult to implement because of the patched-together, band-aided structure of the monolithic data warehouse, and this impacts business agility.

Issue #3: Centralized architecture results in a disconnect between data consumers and data producers.

There are three roles involved in a centralized architecture –

Data producers – The first role is that of data producers. They know the data best and can change its shape.

Data consumers – The second role is that of data consumers. They understand the enterprise’s business problems and, based on the insights gleaned from the analytics data, make intelligent decisions to resolve them.

Central data team – This team is responsible for delivering highly curated data. Yet it neither possesses the domain expertise of the producers nor deeply understands the business problems addressed by the consumers.

This division of responsibilities creates a bottleneck that leads to delayed insights for the consumers.

Organizations facing these challenges should consider what a decentralized architecture can offer. One such architecture, Data Mesh, is creating a buzz these days. It was introduced by Zhamak Dehghani in 2019. Let’s find out what Data Mesh is.

What is Data Mesh?

Data Mesh is an approach that focuses on decentralizing and distributing responsibility among the people who work most closely with the data. It instills a ‘data as a product’ paradigm that supplies highly curated data to consumers. Data Mesh drives organizations towards well-governed data usage and a self-service data infrastructure. It introduces four threads – domain-driven architectures, data as a product, self-serve infrastructure, and federated data governance.

Domain-driven architectures – Data Mesh supports domain-specific data ownership, where each domain team is responsible for its own data assets. Because each domain team understands its own data best, it builds and maintains its own data products. This way, data ownership resides with domain experts, who can evolve business process requirements rapidly and prioritize use cases from a domain perspective.

Data as a product – Data Mesh strongly emphasizes reusability, so a data product can be anything the business needs: raw data, derived data, algorithms, dashboards, and so on. Complete ownership of these data products lies with the respective domain teams, which are responsible for operating and administering them and for addressing consumers’ concerns. The data product owner within the domain is held accountable for the quality of the data products.

Self-serve infrastructure – The Data Mesh approach introduces standardization through self-service infrastructure provisioning, so domain teams can make the most of IT resources and independently build and maintain their data products. It is largely a matter of governing the number of technologies in use. Because Data Mesh supports scalability efficiently, domain teams are empowered to contribute their expertise.

Federated data governance – Data Mesh defines data governance standards centrally and gives local domain teams the independence and resources to apply these standards appropriately in their particular environment. The approach maintains persistent access controls and data protections while giving accountability for high-quality data products to the teams that are most familiar with them.

Data Mesh applies these four threads together and forms a new architectural paradigm that enables data analytics at scale. Using Data Mesh, organizations can connect distributed data sets and allow multiple domains to host, access, and share datasets in a user-friendly manner.
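To make the four threads a little more concrete, here is a minimal Python sketch of how a domain-owned data product and centrally defined governance checks might fit together. It is only an illustration: the `DataProduct` descriptor, its fields, the three policies, and the sample products are all hypothetical names chosen for this example, not part of any Data Mesh standard or of any vendor’s API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for one data product."""
    name: str
    domain: str                     # owning business domain
    owner: str                      # accountable data product owner
    schema: dict                    # column name -> type
    description: str = ""
    pii_columns: set = field(default_factory=set)


# Federated governance: policies are defined once, centrally.
# Each policy returns None on success or a short violation message.
Policy = Callable[[DataProduct], Optional[str]]


def has_owner(product: DataProduct) -> Optional[str]:
    return None if product.owner.strip() else "no accountable owner"


def is_documented(product: DataProduct) -> Optional[str]:
    return None if product.description.strip() else "no description"


def pii_is_declared(product: DataProduct) -> Optional[str]:
    # Every column flagged as PII must actually exist in the schema.
    unknown = product.pii_columns - set(product.schema)
    return f"PII columns not in schema: {sorted(unknown)}" if unknown else None


CENTRAL_POLICIES: List[Policy] = [has_owner, is_documented, pii_is_declared]


def validate(product: DataProduct) -> List[str]:
    """Run by each domain team, locally, before it publishes a product."""
    violations = []
    for policy in CENTRAL_POLICIES:
        message = policy(product)
        if message is not None:
            violations.append(message)
    return violations


orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": "string", "amount": "decimal", "email": "string"},
    description="Daily order totals per customer",
    pii_columns={"email"},
)
draft = DataProduct(name="leads_raw", domain="marketing", owner="",
                    schema={"lead_id": "string"})

print(validate(orders))  # no violations
print(validate(draft))   # missing owner and description
```

The design point is the split of responsibilities: the list of policies lives in one central place, while each domain team runs the checks itself and stays accountable for fixing what they report.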

Why is Data Mesh Important?

Enterprises need to change their current model of providing analytics data from a centralized data lake or data warehouse to a distributed ecosystem of data products. A centralized architecture works well for organizations whose business domains or data landscapes do not change frequently. But when new data sources are being introduced continuously, an organization’s monolithic architecture starts to fall apart. Additionally, centralized data architectures involve many hand-crafted steps, such as data ingestion, that are often not visible to the teams that depend on them. So, when the data is finally available, data warehouse teams face several challenges, such as understanding data across a wide array of domains, excessive friction resulting from dependencies, and long analytical lead times to translate data into insights.

In contrast, Data Mesh distributes responsibilities from a centralized data warehouse team to domain teams, who are the experts on their own data, to onboard and manage their data products. They are free to operationalize their data as they see fit and can act faster to gain meaningful insights. Data consumers can pull customized views and enterprise-wide views of their data across domains. Therefore, with Data Mesh, organizations can extract significant value from massive datasets and gain a competitive advantage in the market.
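As a rough illustration of a consumer pulling a view across domains, the sketch below combines two data products published by different domain teams into one customer-level view. Everything here is assumed for the example: the in-memory `catalog`, the `read_product` helper, and the `sales` and `support` products stand in for whatever catalog and storage a real platform would provide.

```python
from collections import defaultdict

# Hypothetical catalog: (domain, product name) -> rows published by that domain.
catalog = {
    ("sales", "orders"): [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 2, "amount": 80.0},
        {"customer_id": 1, "amount": 40.0},
    ],
    ("support", "tickets"): [
        {"customer_id": 1, "open_tickets": 2},
        {"customer_id": 3, "open_tickets": 1},
    ],
}


def read_product(domain, name):
    """Look up a published data product by its owning domain and name."""
    return catalog[(domain, name)]


def customer_view():
    """Build a cross-domain view without either domain giving up ownership."""
    view = defaultdict(lambda: {"total_spend": 0.0, "open_tickets": 0})
    for row in read_product("sales", "orders"):
        view[row["customer_id"]]["total_spend"] += row["amount"]
    for row in read_product("support", "tickets"):
        view[row["customer_id"]]["open_tickets"] += row["open_tickets"]
    return dict(view)


for customer_id, summary in sorted(customer_view().items()):
    print(customer_id, summary)
```

Each domain keeps publishing and maintaining its own product; the consumer only depends on the published rows, not on how either team produces them.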

The Future of Data Mesh

As stakeholders and data workers within organizations think about data from the business and use case perspectives, data mesh helps optimize digital investments. In this context, the value of this architecture is likely to keep growing over the next few years, and self-service analytics will become critical to the success of a data-driven enterprise. The most challenging part will be coordinating and managing data governance across decentralized departments. However, as businesses evolve their cloud solutions, they will continue optimizing data models with robust security mechanisms that adhere to industry regulations and policies.

How Does Data Mesh Architecture Work with Kyvos?

Kyvos’ Smart OLAP™ technology, built for the cloud, can play a significant role in the consumer-facing data product layer of your data mesh architecture. The distributed and independent efforts of different domains in creating data products can be consolidated in the Kyvos layer to provide the big picture. This is a cloud-scalable, secure layer where data products from different organizational domains can come together to deliver instant results on the most complex queries.

To learn more about data mesh and how Kyvos can complement your data mesh architecture, read our detailed technical blog “Data Mesh Architecture and Kyvos as the Data Product Layer”.

FAQs

How does Data Mesh differ from traditional data architectures?

In traditional data architectures, a central team stores data from multiple sources, in structured and unstructured formats, in a single data lake or data warehouse. A data mesh architecture instead distributes ownership to domain teams, who can still use data lakes as the storage layer for building data products and enabling self-serve analytics. The datasets in the resulting catalogs are kept up to date by the domains that own them, which encourages decentralization and cross-functional collaboration.

What are the key components of a Data Mesh?

The main components of a data mesh architecture are the four threads described above: domain-driven data ownership, data as a product, a self-serve data infrastructure, and federated data governance. In practice, these take the form of domain-owned data products, a shared self-serve platform that domain teams build on, and centrally defined governance standards that each domain applies locally.

What is the concept of treating data as a product in Data Mesh?

Teams need a product-first approach to data management in order to collaborate effectively. In a data mesh, each team treats its data assets as individual products, while other teams and departments become its customers. This helps decentralize data ownership by transferring it to the departments that produce and consume the data instead of entrusting it all to one centralized data team.

How is self-serve data infrastructure implemented in Data Mesh?

Without a shared platform, every domain in a distributed architecture would need to set up its own data pipelines for its data products. A self-serve platform minimizes that workload. Platform engineers create an ecosystem in which all business units can create and use their own datasets, with ownership properly distributed among them.

What is federated computational governance in the context of Data Mesh?

Federated computational governance is an approach where responsibilities for data security and governance are divided between one central unit and several domain-specific units. It aims to preserve the autonomy of each domain while ensuring compliance with governance policies at all levels.
