Virtualizing for better links with the Logical Data Fabric

The data mesh, which we introduced in a previous article, sets out the organizational framework for managing and accessing large-scale enterprise analytical data. Some have pointed out the difficulties CDOs face in aligning the many technical solutions already in place with the organizational processes the data mesh implies. That is precisely because one operational pillar is missing: data virtualization.
The myth of Daedalus applied to data
Let's take stock of the situation: organizations are accumulating data, and business units are seeking to cross-reference it on a massive scale. Scattered sources keep multiplying, along with an ever-growing history. The ERPs, information centres and centralized systems of 20 years ago have been joined by data warehouses and data marts. Big data brought massive volumes, and with them data lakes and their cohort of new tools and data formats (XML, JSON, web services, Excel files, etc.). Today, AI and machine learning are giving rise to data platforms designed to supply the huge volumes of data these workloads consume.
What all these technologies have in common is that they form physical, impermeable data silos: they cannot be used together, lack compatibility, carry specific and complex management rules, and remain in the hands of IT departments. To get around this rigidity, inherent in the design of the solutions, business units adopt third-party analytics tools, whose main result is a dizzying loss of data governance. Without global governance, organizations cannot hope to achieve the semantic quality and accessible management rules needed to make data production easier and more reliable. In short, data analysis is going round in circles in a technological labyrinth built, year after year, almost too meticulously.
Merging without betraying specific management rules
The great strength of the data mesh concept is that it gives business users access to data wherever it resides and whatever form it takes, using the tools they want, with the ability to act on it without calling on IT departments. To do this, you need a unifying solution, one that goes beyond the problem of silos, without affecting the underlying systems.
This is the role of data virtualization, as proposed by the Logical Data Fabric, a logical framework for exposing, manipulating and modelling data. A sort of Ariadne's thread, in short, running through the enterprise's systems (which, of course, there is no question of discarding). Data virtualization breaks down the barriers encountered to date, one by one, thanks to the following six characteristics:
- Data abstraction: this is the ability to organize data virtually, according to a logical model.
- Zero replication, zero relocation: essential in all data-intensive models such as AI or data quality management, where pattern design relies on large quantities of information; zero replication also contributes to the drive for resource efficiency.
- Real-time/near-real-time access: the absence of replication, and therefore of lengthy batch processes, means that data can be queried in real time.
- Self-service data: because virtualization leaves source data untouched, business users can add information to their models on the fly, translate any change to a standard into a management rule without any physical change to the data warehouse, generate web services, access sandboxes, etc., all within an exposure framework open to the whole organization if required.
- Centralized metadata, security and governance: the ability to expose metadata enables maximum interoperability. Organizations can then freely define their management processes, validation workflows and availability to business users. Security and governance guarantee traceability, role definition, anonymization rules, encryption, network security, etc.
- Hybridization towards the cloud: the Logical Data Fabric enables you to rely on third-party systems while meeting specific confidentiality and security constraints.
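The first two characteristics above can be sketched in a few lines. In this minimal illustration (all table names, schemas and sources are invented for the example), a logical view joins a relational source with a JSON-style feed at query time: the business-facing result is computed on demand, and nothing is copied into a new physical store.

```python
import sqlite3

# Hypothetical sources: a relational "orders" system and a JSON-style
# customer feed (names and schemas invented for illustration).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?)",
                      [(1, 120.0), (2, 75.5), (1, 30.0)])

customer_feed = [  # e.g. a web-service response, left where it lives
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex"},
]

def logical_view():
    """Resolve the logical model at query time: join both sources
    on the fly, replicating nothing into a new physical store."""
    names = {c["id"]: c["name"] for c in customer_feed}
    totals = warehouse.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")
    return {names[cid]: total for cid, total in totals}

print(logical_view())  # {'Acme': 150.0, 'Globex': 75.5}
```

A real Logical Data Fabric generalizes this pattern across many heterogeneous sources behind a single logical model; the point here is only that the join is virtual and computed on demand.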
Common confusions
The organizational concept of the data mesh is all the rage: its promise to eliminate the traditional bottlenecks in data strategies without jeopardizing initial investments is both seductive and effective.
According to Gartner's definition, a data fabric is first and foremost a physical platform, and that is indeed how several major vendors see it. However, it seems to us that ignoring virtualization is a fundamental mistake: it is the organization's ability to virtualize data that opens up unlimited exploitation, management and transformation, above all without the constraints imposed by the management rules or application logic specific to a given data lake, ERP or data warehouse.
Nor should data virtualization be confused with data federation. A relatively old technology, federation enables multi-source connections without replication, but the processing is carried out in the federating tool itself. Data virtualization, by contrast, delegates operations to the source systems; the resulting performance supports all types of use, even the most resource-intensive.
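The contrast between the two approaches can be made concrete with a toy sketch (source schema and function names invented for the example): the federation style pulls every row across and aggregates locally, while the virtualization style pushes the filter and aggregation down to the source, so only the final result crosses the wire.

```python
import sqlite3

# A stand-in "source system" (schema invented for illustration).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (region TEXT, value REAL)")
source.executemany("INSERT INTO events VALUES (?, ?)",
                   [("eu", 10.0), ("us", 20.0), ("eu", 5.0)] * 1000)

def federation_style(region):
    # Federation: fetch every row, then filter and aggregate in the tool.
    rows = source.execute("SELECT region, value FROM events").fetchall()
    return sum(v for r, v in rows if r == region)

def pushdown_style(region):
    # Virtualization: delegate filtering and aggregation to the source,
    # so only a single aggregate value is transferred.
    (total,) = source.execute(
        "SELECT SUM(value) FROM events WHERE region = ?",
        (region,)).fetchone()
    return total

assert federation_style("eu") == pushdown_style("eu")
```

Both functions return the same answer; the difference is where the work happens and how much data moves, which is exactly why pushdown scales to resource-intensive workloads where tool-side federation does not.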
In conclusion, the Logical Data Fabric is first and foremost a framework. This means that it must be scalable and extensible, to respond to any technological problem encountered. This is the meaning and purpose of virtualization: it enables adaptation to new processes through the development of connectors and removes physical limitations. In other words, a Logical Data Fabric is the ultimate data accelerator.