DataWorks: Data Layer(II)
I really want to say thia article should be the final.
Well let’s see.
Previous
In my previous article, I talked about what I’ve been struggling with: data definition.
Complex business requirements always lead to complex definitions. Complex definitions make naming difficult and force generalization — and generalization is the beginning of chaos.
Just like profit or net-profit, it is a such easy concept, but subtle definition difference can lead to different results.
Some of my friends told me that the problems I was struggling with—such as how to define profit—are essentially financial questions.
They asked why I, as a data engineer, would be so concerned with them. That’s actually a good point. But in the company I currently work for, those kinds of tasks are still handled by data engineers anyway.
So what can I say ?!
Data Layer
So definitions are important, and find the way to avoid gerneralization is also important.
The key is Data Layer.
and here is some core concepts.
Operational Data Store (ODS)
This layer is used to receive and process raw data that needs to be stored in a data warehouse. The structure of a data table at the ODS layer is the same as the structure of a data table in which the raw data is stored. The ODS layer serves as the staging area for the data warehouse. The following operations are performed on the raw data at the ODS > layer:
Synchronize incremental or full structured raw data to the data warehouse.
Structure unstructured raw data, such as logs, and store the outputs in MaxCompute.
Record changes in raw data or cleanse raw data based on your business requirements.
Dimension (DIM)
At this layer, data models are built based on dimensions. You can define dimensions, determine the primary keys, add dimension attributes, and associate different dimensions. This > ensures data consistency in data analysis and mitigates the risks of inconsistent data calculation specifications and algorithms.
Data Warehouse Detail (DWD)
At this layer, data models are built based on the business activities of an enterprise. You can create a fact table that uses the highest granularity level based on the characteristics of a specific business activity. You can duplicate some key attribute fields of dimensions in fact tables and create wide tables based on the data usage habits of > the enterprise. You can also associate fact tables with dimension tables as little as possible to improve the usability of fact tables.
Data Warehouse Summary (DWS)
At this layer, data models are built based on specific subject objects that you want to analyze. You can create a general aggregate table based on the metric requirements of > upper-layer applications and products.
Some general dimensions can be abstracted at the ODS layer based on preliminary classification and summary of user behavior. For example, the dimensions are time, IP address, and ID. You can use these dimensions to obtain statistical data, such as the numbers of products purchased by users at different logon IP addresses in each time period. At the DWS layer, you can add multi-granularity aggregate tables on top of general aggregate tables to improve the calculation efficiency. For example, you can save a long period of time if > you evaluate user behavior based on the time interval of 7 days, 30 days, or 90 days.
Application Data Service (ADS)
This layer is used to store the metric data of products and generate various reports. For example, the ADS layer can be used by an e-commerce enterprise to store statistical > information about the sales volume and the ranking of each type of ball sports goods in Hangzhou from June 9 to June 19.
And here is the reference link:
- Alibaba Cloud:Create a data layer
- Doris:Building a data warehouse for traditional industry
- Flink Is Attempting to Build a Data Warehouse Simply by Using a Set of SQL Statements
I was curious to find that all the articles I’ve read about the data layer are from Chinese sources — maybe because only Chinese business operators are so noisy with their data engineers.
To be honest, it was also my first time learning the whole process.
A case

Here is some core ideas about the diagram:
- ODS & DIM are the foundation, naturally, and no necessary to design.
- I try my best to make DWD simple and detailed.
- DWS does all the aggregation, and implement all the business intentions.
- Self-explanatory, the two DWS tables are used to differentiate the business intentions of the two departments.
