The story goes like this. A well-known tech company that had gotten into analytics conducted a fancy data mining study for an insurance client and then reported its finding: those who are above 65 years of age have a higher tendency to lodge a claim!
Regardless of the authenticity of the story, it drives home a point that is often forgotten in the big data conversation. Things can go horribly wrong if data-driven solutions are devised without adequate involvement of domain specialists. This is especially true of B2B solutions, which invariably have to fit within financial, operational and regulatory frameworks.
Let’s look at another example from the same industry. In parts of the US, regulation prohibits pricing motor insurance policies based on age. However, data indicates that young drivers, say below 21 years of age, are many times riskier than middle-aged drivers. The workaround? Price policies based on ‘driving experience’, which allows a clear differentiation between young and middle-aged drivers.
Sometimes I wonder if the conversation is too biased towards the most sophisticated neural net algorithm or the database that lets you execute thousands of queries in milliseconds. Of course, these are important elements in the equation. However, the significance of domain knowledge cannot be emphasized enough.
This also makes us ponder the role of the data scientist. Shouldn’t the data scientist be aware of some of these intricacies? Perhaps yes, but the problem is that a data scientist who is at once a modeling and algorithm specialist, a back-end and front-end technology specialist, and a domain specialist belongs to a very rare breed, one that can be as rare as unicorns!