Preface: what I have wanted to say for the past few years

I have been working on data-related projects for a long time, and because of that I have also worked on web and online-service projects at the same time. Along the way I have formed some personal opinions about how a whole data process is built and maintained.

This is also partly a complaint about my working life. I have wanted to say it several times, but I was too lazy to write it down. Now seems like the right time, since I need to write something to improve my English.

Data Definition

Actually, data has no definition by itself. Data is a kind of fact; the definition comes from the purpose, and the purpose comes from the business.

Here is a simple example. A typical e-commerce company usually has income data such as the following: order records, payments, and balances.

  • Order records are the data about orders: the details of every single order, including its status, price, discount, etc.
  • Payments are the data about money coming in and going out, across multiple payment channels, different payment methods, and account periods.
  • Balances are basically logs of changes to users’ balances, including the reason for each change, its amount, and its time.

So here is the question: which data should be used when we want to know the company’s profit?

A successfully delivered order can still be refunded. A declared payment can still fail. A balance change is recorded from the user’s side, so it reflects the company’s income only indirectly.

So how do we know the company’s profit over some period?

The answer is: it depends on the business purpose.

For example, if the business purpose from the operations department is to measure the effect of a campaign, we should obviously use the order records and count how many orders tagged with the campaign were created. If the final status of the order matters, we can define a concept like ‘effective order’ to cover the orders which were not refunded or canceled.
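The campaign example can be sketched in a few lines of Python. This is only an illustration: the Order fields, the status values, and the campaign tag "spring_sale" are all assumptions made up for the example, not the schema of any real system.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical status values; a real system will have its own list.
INEFFECTIVE_STATUSES = {"refunded", "canceled"}


@dataclass
class Order:
    order_id: str
    campaign: Optional[str]  # hypothetical campaign tag, None if untagged
    status: str              # final status of the order


def campaign_orders(orders: List[Order], campaign: str) -> List[Order]:
    """Orders created under the given campaign, whatever their final status."""
    return [o for o in orders if o.campaign == campaign]


def effective_orders(orders: List[Order]) -> List[Order]:
    """Orders that were neither refunded nor canceled."""
    return [o for o in orders if o.status not in INEFFECTIVE_STATUSES]


orders = [
    Order("o1", "spring_sale", "delivered"),
    Order("o2", "spring_sale", "refunded"),
    Order("o3", "spring_sale", "canceled"),
    Order("o4", None, "delivered"),
]

created = campaign_orders(orders, "spring_sale")
effective = effective_orders(created)
print(len(created), len(effective))
```

Both numbers answer a question about the same campaign, but they answer different questions. Which one is ‘the’ campaign result depends on the purpose, and the set of ineffective statuses is itself a business decision.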

By the same logic, the risk control department, the finance department, and other departments may have different business purposes, and of course they do not trust the definition from the operations department. That is natural, and it does not mean their data is incorrect. Different data definitions cover different data periods, and one department’s ‘successful order’ may not have the same business meaning as another’s.

As I said, I have worked on data-related projects for a long time, and this is not a data problem; it is a business problem. The basic data is varied, so concepts have to be defined, and everyone has to be on the same page.

My point is: don’t try to simplify this. Every attempt I have heard of to simplify the data complexity has only made the data more complex.

And don’t try too hard to connect different data definitions. It is neither necessary nor useful, and you will find nothing but chaos. For example, why is daily order income not equal to daily payment income? Because a payment always comes after its order, and if paying up to 15 minutes later is allowed, then there can be a gap of up to 15 minutes between the order data and the payment data, so an order placed just before midnight may be paid on the next day. Do we need to reconcile the two? No, we don’t. The gap is a fact; let it be.
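A small sketch of how that gap shows up in daily totals. The timestamps, the amounts, and the fixed 10-minute payment delay are invented for illustration; the only thing taken from the example above is that a payment may arrive some minutes after its order.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals(events):
    """Sum amounts per calendar day for a list of (timestamp, amount) pairs."""
    totals = defaultdict(int)
    for ts, amount in events:
        totals[ts.date()] += amount
    return dict(totals)


# Hypothetical orders: one in the morning, one five minutes before midnight.
orders = [
    (datetime(2024, 1, 1, 10, 0), 100),
    (datetime(2024, 1, 1, 23, 55), 50),
]

# Assume every order is paid 10 minutes later (within a 15-minute window).
pay_delay = timedelta(minutes=10)
payments = [(ts + pay_delay, amount) for ts, amount in orders]

order_income = daily_totals(orders)
payment_income = daily_totals(payments)

for day in sorted(set(order_income) | set(payment_income)):
    print(day, order_income.get(day, 0), payment_income.get(day, 0))
```

The overall totals match, but the per-day totals do not, because the late-night order is paid on the following day. Neither figure is wrong; each one simply follows its own definition.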