This should be the final article in this series. I thought the second one would be the last, but a few more points occurred to me after writing it.

The data layer is a theoretical concept. The problem is that when the company is small, you can find almost no benefit in it, but by the time the company is big enough, you find it is too late to build data layers.

I’m now in the second situation. My team is currently rebuilding our data pipeline and trying to build data layers, and I’m the person in charge of the rebuild. I intend to build data layers, but there are two challenges: first, a messy data pipeline is already in place and refactoring it is a big job; second, there is no benefit in the short term.

It’s actually a management or architecture problem. Our company is big, but not so big that the benefit of data layers is obvious. At the same time, the risk of rebuilding in a different way is obviously high.

So this article is just a note for myself. Regardless, the theory part is already finished. In the future there should be a continuously updated article about how to build data layers in a real company.

Gradualism

Back to the theory.

In most data pipelines that already exist, the ODS and DIM layers are basically axiomatic; I can’t imagine a counterexample. Maybe part of the ODS layer is not so clean because it has already been enhanced, but it is still ODS, and it can easily be split into the next layer, DWD.

That is my situation now: I have ODS and DIM layers. The DWD layer is not clear enough; most of its tables are just duplicated from ODS. DWS is almost empty. So I plan to build DWD and DWS as much as possible while migrating the data pipeline.

As much as possible, in a gradual way. Because this is a rebuild and a migration, the ADS layer is naturally already there. We just need to find the data indicators defined in ADS that should be relocated so that they are calculated from DWS, and make it happen. And, of course, make a clean DWD layer.
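
To make the relocation idea concrete, here is a minimal Python sketch, not our real pipeline. Everything in it is hypothetical: the order table, the field names, the "daily revenue" indicator, and all function names are made up for illustration. It assumes one ADS indicator that is currently computed straight from ODS, builds a clean DWD and a small DWS, recomputes the same indicator from DWS, and only treats the relocation as safe when both results match.

```python
from collections import defaultdict

# Hypothetical ODS rows: raw strings as loaded from a source system,
# including a cancelled order and an accidental duplicate.
ODS_ORDERS = [
    {"order_id": "1", "day": "2025-10-01", "amount_cents": "1050", "status": "paid"},
    {"order_id": "2", "day": "2025-10-01", "amount_cents": "450", "status": "cancelled"},
    {"order_id": "3", "day": "2025-10-02", "amount_cents": "2000", "status": "paid"},
    {"order_id": "3", "day": "2025-10-02", "amount_cents": "2000", "status": "paid"},
]


def build_dwd(ods_rows):
    """DWD: one clean, typed row per order (deduplicated by order_id)."""
    by_id = {}
    for row in ods_rows:
        order_id = int(row["order_id"])
        by_id[order_id] = {
            "order_id": order_id,
            "day": row["day"],
            "amount_cents": int(row["amount_cents"]),
            "is_paid": row["status"] == "paid",
        }
    return list(by_id.values())


def build_dws(dwd_rows):
    """DWS: a light daily summary that several ADS indicators could share."""
    summary = defaultdict(lambda: {"paid_orders": 0, "paid_amount_cents": 0})
    for row in dwd_rows:
        if row["is_paid"]:
            summary[row["day"]]["paid_orders"] += 1
            summary[row["day"]]["paid_amount_cents"] += row["amount_cents"]
    return dict(summary)


def daily_revenue_legacy(ods_rows):
    """The existing ADS indicator, computed directly from ODS (the old path)."""
    seen = set()
    result = defaultdict(int)
    for row in ods_rows:
        if row["status"] == "paid" and row["order_id"] not in seen:
            seen.add(row["order_id"])
            result[row["day"]] += int(row["amount_cents"])
    return dict(result)


def daily_revenue_from_dws(dws):
    """The same ADS indicator, relocated so it only reads the DWS summary."""
    return {day: agg["paid_amount_cents"] for day, agg in dws.items()}


def check_relocation(ods_rows):
    """Return (results_match, relocated_result) for the parity check."""
    legacy = daily_revenue_legacy(ods_rows)
    relocated = daily_revenue_from_dws(build_dws(build_dwd(ods_rows)))
    return legacy == relocated, relocated


if __name__ == "__main__":
    matches, revenue = check_relocation(ODS_ORDERS)
    print("safe to switch:", matches)
    print(revenue)
```

The point of the parity check is the gradual part: each indicator can be moved on its own, and the old ODS-based calculation stays in place until the DWS-based one produces the same numbers.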

It is now 10/2025. I’ll keep updating this article.