Data Quality Discipline
Intelligent Enterprise’s latest issue has an article on Data Quality Discipline.
It talked about how data quality and extract, transform and load (ETL) are tied together, describing the transformation explosion and analytics thrashing that occur with poorly understood source data.
There are many issues that the BI community encounters during the ETL of data, and these are the exact issues that the log analysis community encounters as well. They are:
- Multiple meanings for the same data element
- Multiple sources of data elements
- Differing levels of history
- Data cleanliness and accuracy
- Deintegration for audits and validation
I was talking to a PS guy from a SIM vendor the other day, and he mentioned that one of the biggest problems all SIM vendors have is understanding the logs.
Logs come in all different sizes, formats and syntaxes. Truly understanding them takes extensive research. A lot of the log formats are not published, which makes it even more difficult to find out what the logs contain.
Sometimes logs have multiple fields that are similar. For example, a Windows event can have up to 3 different user or account name fields. What does each of them mean? How do you distinguish them?
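One way to keep track of similar-looking fields is a small field dictionary that records what each raw field means for each log source. The sketch below is only an illustration: the source names, the field names (SubjectUserName, TargetUserName, usr) and the roles assigned to them are assumptions made for the example, not taken from any vendor's documentation.

```python
# Hypothetical field dictionary. Source names, field names and roles are
# illustrative assumptions; a real one would come from researching each log source.
FIELD_ROLES = {
    "windows_security": {
        "SubjectUserName": "actor",   # assumed: account that requested the action
        "TargetUserName": "target",   # assumed: account the action was applied to
    },
    "example_firewall": {
        "usr": "actor",
    },
}


def normalize(source, event):
    """Map raw fields to documented roles; keep anything unknown under 'unmapped'."""
    roles = FIELD_ROLES.get(source, {})
    normalized = {"unmapped": {}}
    for field, value in event.items():
        role = roles.get(field)
        if role is None:
            # Fields nobody has researched yet stay visible instead of being dropped.
            normalized["unmapped"][field] = value
        else:
            normalized[role] = value
    return normalized


raw = {"SubjectUserName": "SYSTEM", "TargetUserName": "alice", "LogonType": "3"}
print(normalize("windows_security", raw))
```

Keeping unmapped fields around makes the gaps in understanding visible, which is the research work a data architect would then pick up.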
Understanding of the logs has a direct correlation to the quality of the reports. Without knowing what the logs provide, the reports will be meaningless. Without truly understanding the meaning of the different log fields, it’s impossible to correlate the different log sources to identify anomalies.
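To make the correlation point concrete, here is a minimal sketch that groups events from different sources by user. It assumes every event has already been reduced to a common "user" field, which is exactly the step that requires understanding the logs. The source names and the sample events are made up for the example.

```python
from collections import defaultdict


def correlate_by_user(events):
    """Return, for each user, the sorted list of log sources that user appears in."""
    sources_by_user = defaultdict(set)
    for event in events:
        # Assumption: user names are compared case-insensitively across sources.
        sources_by_user[event["user"].lower()].add(event["source"])
    return {user: sorted(sources) for user, sources in sources_by_user.items()}


# Made-up sample events from two hypothetical sources.
events = [
    {"source": "vpn", "user": "Alice"},
    {"source": "windows", "user": "alice"},
    {"source": "windows", "user": "bob"},
]
result = correlate_by_user(events)
print(result)
```

If the "user" field were filled from the wrong raw field in one of the sources, the grouping would still run but would silently pair up the wrong accounts, and any report built on it would be misleading.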
Log analysis vendors need to have at least one data architect whose sole responsibility is to understand the various log sources, dissect the log fields and generate meaningful reports. If one can afford it, these tasks should be performed by an analytics team of 2 to 3 people.
When evaluating a log analysis product or vendor, this is definitely a question to ask. A vendor without a data architect or analytics team will not be able to provide the domain expertise for the implementation to succeed.