Quick summary: Exploring the core principles of data mesh and its practical applications within Microsoft Fabric
Enterprises are constantly looking for better ways to manage their vast data landscapes. Data mesh has emerged as a transformative answer, changing how data is handled by advocating a decentralized, domain-oriented data architecture. Unlike traditional centralized systems, data mesh gives domain-specific teams the ownership and tools they need to manage their own datasets effectively.
Thanks to Microsoft Fabric’s support for data mesh implementation, businesses can leverage self-service infrastructure, domain-centric governance, and enhanced data-sharing capabilities. This paradigm shift not only promotes accountability among data engineers but also streamlines the process of turning data into actionable insights, fostering a truly data-driven business model. In this article, we’ll unpack the core principles of data mesh and explore its practical applications within the Microsoft Fabric ecosystem.
What is data mesh?
Data mesh, a concept introduced by Zhamak Dehghani, is a decentralized, distributed approach to enterprise data management. It treats datasets as distributed products organized around business domains. Each domain's datasets have their own embedded engineers and product owners, who manage the data and its availability to other teams. This drives a level of data ownership and responsibility that is often missing from today's data platforms, which tend to be centralized, monolithic, and built around complex pipelines.
Data mesh is founded on four core principles:
Domain-driven data ownership and architecture
A domain is simply a group of people, typically organized around a common business purpose. Domains are responsible for the data they produce: ingesting it, transforming it, and serving it to end users. The people who know the data best are the ones preparing and providing it for analysis. The data becomes another product that the domain produces and is accountable for. The domain's data engineers focus on data within that single domain, working closely with the domain's other subject matter experts (SMEs) to produce valuable data products.
Data as a product
A data product is data that is served by a domain and consumed by downstream users to produce business value. The data product is the heart of the data mesh—it is created, analyzed, and combined with business knowledge to allow businesses to use data to answer questions. Without the data product, a business cannot reach its goal of being data-driven.
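To make “data as a product” concrete, here is a minimal Python sketch of a data product descriptor. The class, its fields, and the `is_publishable` rule are hypothetical illustrations. They stand for the attributes a consumer typically needs: an accountable owning domain, a contact, a description, and a schema. They are not a Fabric API or a formal data mesh specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product served by one domain."""

    name: str
    owner_domain: str  # the domain accountable for this data
    owner_contact: str  # where consumers go with questions
    description: str
    schema: dict = field(default_factory=dict)  # column name -> type
    certified: bool = False

    def missing_attributes(self) -> list:
        """List the consumer-facing attributes that are still empty."""
        required = {
            "owner_domain": self.owner_domain,
            "owner_contact": self.owner_contact,
            "description": self.description,
            "schema": self.schema,
        }
        return [name for name, value in required.items() if not value]

    def is_publishable(self) -> bool:
        """A product is ready to share only when nothing is missing."""
        return not self.missing_attributes()


orders = DataProduct(
    name="orders_daily",
    owner_domain="Sales",
    owner_contact="sales-data@contoso.com",
    description="One row per order, refreshed daily",
    schema={"order_id": "string", "amount": "decimal(18,2)"},
)
draft = DataProduct("leads", "Marketing", "", "")
print(orders.is_publishable())  # True
print(draft.missing_attributes())  # ['owner_contact', 'description', 'schema']
```

The point of the sketch is that ownership and documentation are part of the product itself, so a product missing them is not yet something other domains should consume.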
Self-serve infrastructure as a platform
Domain teams should not have to manage the underlying data infrastructure themselves. Instead, a central IT organization should give each domain a self-service data platform that provides functionality across the mesh, from storage and compute up through consumption. This development platform lets domain engineers focus on building high-quality data products that deliver business value through data analytics.
Federated computational governance
Data mesh proposes that the domains and the central IT organization share responsibility for governance, while the domains keep adequate autonomy. This is a federated model: there is a cross-domain agreement on which aspects of governance are handled at the mesh level and which are handled by the individual domains.
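“Computational” governance means policies are expressed as code and checked automatically. The Python sketch below assumes a simple split. Some policies are mesh-level policies that every domain agreed on. Others are added by each domain for its own products. The policy names, metadata keys, and classification values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A policy is a (name, check) pair; the check receives a product's metadata.
Policy = Tuple[str, Callable[[dict], bool]]

# Agreed across the mesh: applies to every domain's data products.
MESH_POLICIES: List[Policy] = [
    ("has_owner", lambda p: bool(p.get("owner"))),
    ("has_classification",
     lambda p: p.get("classification") in {"public", "internal", "confidential"}),
]

# Decided by each domain: applies only to that domain's products.
DOMAIN_POLICIES: Dict[str, List[Policy]] = {
    "Finance": [("confidential_only", lambda p: p.get("classification") == "confidential")],
    "Marketing": [],
}


def failed_policies(product: dict) -> List[str]:
    """Evaluate mesh-level policies, then the owning domain's own policies."""
    policies = MESH_POLICIES + DOMAIN_POLICIES.get(product.get("domain", ""), [])
    return [name for name, check in policies if not check(product)]


ledger = {"domain": "Finance", "owner": "fin-data", "classification": "internal"}
campaigns = {"domain": "Marketing", "owner": "", "classification": "public"}
print(failed_policies(ledger))  # ['confidential_only']
print(failed_policies(campaigns))  # ['has_owner']
```

Adding a rule to `MESH_POLICIES` changes governance for every domain. Each domain edits only its own entry in `DOMAIN_POLICIES`, which mirrors the cross-domain agreement in a federated model.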
Data mesh on Microsoft Fabric
Microsoft Fabric supports implementing a data mesh by:
- Providing a self-service infrastructure that various departments can use, with data in OneLake organized by domain
- Enabling data consumers to filter and find content by domain
- Enabling federated governance, meaning that some governance settings currently controlled at the tenant level can be delegated to the domain level
Domains
In Fabric, a domain is a way of logically grouping all the data in an organization that is relevant to a particular area or field. One of the most common uses for domains is to group data by business department, making it possible for departments to manage their data according to their specific regulations, restrictions, and needs.
Workspaces are associated with domains, and all items within a workspace receive domain attributes as part of their metadata. The association of workspaces and the items within them with domains enables a better consumption experience.
In addition to making the workspace items searchable, some tenant-level settings for managing and governing data can be delegated to the domain level, thus allowing domain-specific configuration for data governance. You can create subdomains under domains to refine the way data is structured. You organize your data into the appropriate domains and subdomains by assigning the workspaces where the data is located to the relevant domain or subdomain.
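The relationship between domains, subdomains, workspaces, and items can be modeled in a few lines of Python. This is a hypothetical in-memory model, not the Fabric API. Items inherit the domain of their workspace, and `find_by_domain` mimics filtering content by domain. Including subdomain items when filtering by a parent domain is a modeling choice made here. It does not describe how Fabric's own filter behaves.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Domain:
    name: str
    parent: Optional["Domain"] = None  # set only for subdomains

    def path(self) -> str:
        """Full path such as 'Sales/EMEA' for a subdomain."""
        return self.name if self.parent is None else f"{self.parent.path()}/{self.name}"


@dataclass
class Workspace:
    name: str
    items: List[str] = field(default_factory=list)
    domain: Optional[Domain] = None  # assigned by an admin or domain contributor


def item_metadata(ws: Workspace) -> List[dict]:
    """Each item carries its workspace's domain as a metadata attribute."""
    domain = ws.domain.path() if ws.domain else None
    return [{"item": item, "workspace": ws.name, "domain": domain} for item in ws.items]


def find_by_domain(workspaces: List[Workspace], domain: Domain) -> List[str]:
    """Return items in `domain` or in any of its subdomains."""
    prefix = domain.path()
    found = []
    for ws in workspaces:
        for meta in item_metadata(ws):
            d = meta["domain"]
            if d is not None and (d == prefix or d.startswith(prefix + "/")):
                found.append(meta["item"])
    return found


sales = Domain("Sales")
emea = Domain("EMEA", parent=sales)
hr = Domain("HR")
workspaces = [
    Workspace("Sales Reporting", ["orders_lakehouse", "revenue_report"], sales),
    Workspace("EMEA Analytics", ["emea_lakehouse"], emea),
    Workspace("People", ["headcount_model"], hr),
    Workspace("Sandbox", ["scratch_notebook"]),  # not yet assigned to a domain
]
print(find_by_domain(workspaces, sales))  # ['orders_lakehouse', 'revenue_report', 'emea_lakehouse']
print(find_by_domain(workspaces, emea))  # ['emea_lakehouse']
```

Because the domain lives on the workspace rather than on each item, reassigning one workspace moves all of its items at once.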
As a Fabric admin, you can create a domain by going to the admin portal, selecting the “Domains” tab, and clicking the “Create new domain” button. The domain name is mandatory. You can also assign domain admins, ideally subject matter experts who are familiar with the data and with the regulations and restrictions that apply to it.
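Domains can also be created programmatically. The sketch below builds a request but does not send it. It targets what is assumed here to be the Fabric admin REST endpoint for creating domains: `POST https://api.fabric.microsoft.com/v1/admin/domains`, with a `displayName`, an optional `description`, and an optional `parentDomainId` for subdomains. Verify the endpoint and body against the current Fabric REST API reference. Acquiring the bearer token is assumed and not shown, and assigning domain admins is a separate step not covered here.

```python
import json
import urllib.request
from typing import Optional

FABRIC_API = "https://api.fabric.microsoft.com/v1"  # assumed base URL


def build_create_domain_request(token: str, display_name: str,
                                description: str = "",
                                parent_domain_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a create-domain request for the Fabric admin API."""
    if not display_name:
        raise ValueError("display_name is mandatory, as in the admin portal")
    body = {"displayName": display_name}
    if description:
        body["description"] = description
    if parent_domain_id:
        body["parentDomainId"] = parent_domain_id  # makes the new domain a subdomain
    return urllib.request.Request(
        url=f"{FABRIC_API}/admin/domains",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_domain_request("<token>", "Sales", "Sales department data")
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to review before an admin runs it against a real tenant.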
Domain admins have access to the “Domains” tab in the admin portal, but they can see and edit only the domains they administer. They can update the domain description, define and update domain contributors, and associate workspaces with the domain. They can also define and update the domain image. Under “Delegated Settings,” they can override any tenant settings that the tenant admin has delegated to the domain level.
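The delegation rule can be summarized as a small resolution function. This Python sketch is hypothetical: the setting names are placeholders. It captures only one rule: a domain-level override counts when the tenant admin has delegated that setting, and is otherwise ignored.

```python
from typing import Dict, Set


def effective_setting(name: str,
                      tenant_settings: Dict[str, bool],
                      delegated: Set[str],
                      domain_overrides: Dict[str, bool]) -> bool:
    """Resolve one setting for the items in a single domain.

    The tenant-level value wins unless the tenant admin delegated this
    setting to the domain level and the domain admin set an override.
    """
    if name in delegated and name in domain_overrides:
        return domain_overrides[name]
    return tenant_settings[name]


# Hypothetical setting names and values.
tenant = {"certification_enabled": False, "export_enabled": True}
delegated = {"certification_enabled"}
finance_overrides = {"certification_enabled": True, "export_enabled": False}

for setting in tenant:
    print(setting, effective_setting(setting, tenant, delegated, finance_overrides))
# certification_enabled True
# export_enabled True
# export_enabled stays True because it was never delegated, so the override is ignored.
```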
Domain contributors are workspace admins whom a domain or Fabric admin has authorized to assign the workspaces they’re the admins of to a domain, or to change the current domain assignment. Domain contributors assign their workspaces in the settings of the workspace itself. Once a workspace is assigned to a domain, the domain icon is displayed alongside the workspace name.
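Who may assign a workspace to a domain follows from the roles described above. The Python function below is a hypothetical restatement of those rules, useful when reasoning about access reviews. It does not query Fabric. It also ignores edge cases such as moving a workspace that already belongs to another domain.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DomainRoles:
    """Hypothetical role membership for a single domain."""

    admins: Set[str] = field(default_factory=set)
    contributors: Set[str] = field(default_factory=set)


def can_assign_workspace(user: str, workspace_admins: Set[str],
                         roles: DomainRoles, fabric_admins: Set[str]) -> bool:
    """Return True if `user` may assign this workspace to this domain."""
    # Fabric admins and the domain's own admins can associate workspaces.
    if user in fabric_admins or user in roles.admins:
        return True
    # Domain contributors may assign only the workspaces they administer.
    return user in roles.contributors and user in workspace_admins


roles = DomainRoles(admins={"dana"}, contributors={"omar"})
print(can_assign_workspace("omar", {"omar"}, roles, {"fiona"}))  # True
print(can_assign_workspace("omar", {"lee"}, roles, {"fiona"}))  # False
```

Note that being a workspace admin alone is not enough. A domain or Fabric admin must first authorize that person as a contributor.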
The domain image set during configuration makes the domain easier to recognize. When a domain is selected in the OneLake data hub, its image becomes part of the data hub's theme, and the hub displays only the items that belong to that domain.
Data sharing
Microsoft Fabric supports sharing certified datasets with other domains. Admins can share a lakehouse by clicking its Share icon and granting users in other domains the appropriate access.
Once the data is shared, you can create a shortcut by clicking the “New shortcut” icon in the Tables section and selecting the shared lakehouse table from the other domain. That domain's data is then available locally, without copying it.
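The shortcut step can also be scripted. The sketch below builds a request for what is assumed here to be the OneLake shortcuts endpoint of the Fabric REST API: `POST .../v1/workspaces/{workspaceId}/items/{itemId}/shortcuts` with a `oneLake` target. Check the path and body against the current API reference before use. The workspace and lakehouse IDs are placeholders, and the token is assumed to be acquired elsewhere. The consuming user still needs the access granted through sharing.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"  # assumed base URL


def build_onelake_shortcut_request(token: str,
                                   consumer_workspace_id: str, consumer_lakehouse_id: str,
                                   source_workspace_id: str, source_lakehouse_id: str,
                                   table_name: str) -> urllib.request.Request:
    """Build a request for a Tables/ shortcut that points at another domain's table.

    No data is copied: the shortcut only references the source table in OneLake.
    """
    body = {
        "path": "Tables",  # where the shortcut appears in the consuming lakehouse
        "name": table_name,  # shortcut name (reusing the source table's name here)
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_lakehouse_id,
                "path": f"Tables/{table_name}",
            }
        },
    }
    url = (f"{FABRIC_API}/workspaces/{consumer_workspace_id}"
           f"/items/{consumer_lakehouse_id}/shortcuts")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_onelake_shortcut_request("<token>", "ws-mkt", "lh-mkt", "ws-sales", "lh-sales", "orders")
```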
Object-level permissions on selected objects are also supported through the T-SQL GRANT SELECT statement.
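`GRANT SELECT` is plain T-SQL, typically run against a lakehouse's SQL analytics endpoint or a warehouse. A Python helper therefore only needs to build the statement text. The sketch below is a hypothetical helper that bracket-quotes identifiers and can optionally restrict the grant to specific columns. The schema, table, and principal names are placeholders, and executing the statement (for example, through an ODBC connection) is not shown.

```python
from typing import Optional, Sequence


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any ']' by doubling it."""
    return "[" + name.replace("]", "]]") + "]"


def grant_select(schema: str, table: str, principal: str,
                 columns: Optional[Sequence[str]] = None) -> str:
    """Build a GRANT SELECT statement for one table, optionally limited to columns."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    if columns:
        target += " (" + ", ".join(quote_ident(c) for c in columns) + ")"
    return f"GRANT SELECT ON {target} TO {quote_ident(principal)};"


print(grant_select("dbo", "orders", "analyst@contoso.com"))
# GRANT SELECT ON [dbo].[orders] TO [analyst@contoso.com];
print(grant_select("dbo", "orders", "analyst@contoso.com", ["order_id", "amount"]))
# GRANT SELECT ON [dbo].[orders] ([order_id], [amount]) TO [analyst@contoso.com];
```

Quoting identifiers, rather than pasting raw names into the string, avoids malformed statements when table or principal names contain spaces or brackets.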
Data mesh represents a significant departure from traditional data management practices, offering a scalable and responsive framework that aligns with the dynamic nature of modern enterprises. By embracing the principles of domain-driven design, product thinking for data, self-serve data platforms, and federated computational governance, organizations can enhance their analytical capabilities and accelerate decision-making processes. Microsoft Fabric’s integration with data mesh provides the necessary infrastructure and tools to facilitate this architectural shift, enabling businesses to not only manage but also maximize the value of their data across all domains. As the landscape of enterprise data continues to evolve, data mesh stands out as a crucial strategy for companies aiming to stay competitive in a data-centric world.
Syed Zaidi is an Architect in Logic20/20’s Advanced Analytics practice.