I was recently chatting on Slack with a friend about ‘what is Data Science’. What prompted the conversation was what I feel is a mismatch between what people hear when someone says ‘data science’ and what it actually is in the real world.
My thinking on this has been influenced by countless chats with friends, on Slack, and my own jottings.
Data Science is marketing
Firstly, it’s worth pointing out that for a lot of companies, what they talk about publicly (‘our recommendation engine’ or whatever) is a kind of aspirational speech. It’s there to indicate to both investors (whether the company is public, VC backed or whatever) and future employees that the company is innovative.
I think one problem that those on the Data Science job market have is that they hear ‘Google is doing X’ and assume that ALL of the data science job market is about Machine Learning jobs. This isn’t always intentional, but the need to ‘be seen as innovative’ sometimes leads companies to overemphasise their ML needs. I joke that ‘knowing ML means you know when not to use ML’, and in reality a lot of my work over the last few years has been about the tactics of deploying ML and picking the right problems.
Data Science is a functional thing in a company
I’ll say something strong but true about the world: “data science and Machine Learning will be necessary functions in all companies at some point in the near future”, driven by things such as competitive advantage and the increasing amount of data being stored. If you want to read more, just read what McKinsey has to say about this!
Within the broad church of Data Science there are actually distinct roles.
1. Producing product insights (involves lots and lots of SQL)
2. Producing Machine Learning models in production systems
Also attached to 1. are things like advertising spend forecasting, analytics in general (that is, reporting and analysis), statistical models, survival analysis, A/B testing, improving the product conversion flow, visualisation, etc.
It’s really difficult to do 2. in a domain without understanding that domain. So you need to look at your data, and improve your understanding of the domain. How do you do that?
You do that by doing some of 1. Even if your job involves Machine Learning, or you are a Machine Learning engineer, you need to understand the data generating process of your domain.
There are other reasons you should do 1. first. Consider a pure cost-benefit analysis of 1. and 2.: unless you’ve exhausted all the low-hanging fruit, the cost of, say, a six-month recommendation engine project isn’t justified. There’s often a lot of value in just summarising the data or understanding insights and patterns.
Creating a data culture
One of the ideas Adam spoke to me about was ‘creating a data culture’, which he described as an advantage of doing data analysis. So let’s say you’re at an e-commerce company and you look at page views by category and discover that shoe categories take 5 of the top 10 spots! Then you can inform marketing and focus your efforts on shoes. This can drive a lot of value to the company. It’ll also help you understand the subtleties and nuances of the data, especially since good business managers will often know their numbers super well.
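To make the page-views example concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the data, the category names, and the helper functions `top_categories` and `count_matching`. In a real company this would more likely be a SQL query against your own tables, but the idea is the same: sum the views per category, rank them, and see what dominates the top of the list.

```python
from collections import Counter

# Hypothetical page-view log: one (category, views) record per row.
# The categories and numbers are invented purely for illustration.
page_views = [
    ("running shoes", 1200),
    ("t-shirts", 700),
    ("dress shoes", 950),
    ("running shoes", 400),
    ("hats", 150),
    ("kids shoes", 800),
    ("jeans", 650),
]


def top_categories(records, n=10):
    """Sum views per category and return the n most-viewed categories."""
    totals = Counter()
    for category, views in records:
        totals[category] += views
    return totals.most_common(n)


def count_matching(ranked, keyword):
    """Count how many of the ranked categories mention the keyword."""
    return sum(1 for category, _ in ranked if keyword in category)


top = top_categories(page_views, n=5)
shoe_count = count_matching(top, "shoes")

for category, views in top:
    print(f"{category:15s} {views}")
print(f"{shoe_count} of the top {len(top)} categories are shoes")
```

A simple summary like this is often enough to start a conversation with marketing, long before any model is involved.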
Don’t be sad if you’re not working on ML all the time
I guess the message here is to focus on business value and not super new techniques. Your job is to add business value.
- Sometimes that involves doing lots of Exploratory Data Analysis (something I need to spend more time on myself 🙂 )
- Sometimes A/B testing
- Sometimes building and developing analytics for BizDev deals
- Sometimes cohort analysis
It took me a while to grok this. It took me a while to realise that the reason we don’t necessarily value things like EDA (I personally rush to modelling sometimes) is some hangover from academia, plus the fact that we’re all victims of hype.
A data scientist is someone who adds value with data – so just focus on doing that.
Erik has written about this – and a lot of my thinking has been inspired by Erik.
Thanks to Mick Cooney, Eddie Bell and Bertil Hatt for discussions about this.