This is a fairly opinionated post. It doesn’t represent the views of anyone other than myself.
I recommend the post above, and I’ll give my take on it for non-technical managers managing or leading Data Science teams. There’s considerable overlap, since a lot of modern-day Data Science work is in fact software-focused.
Firstly, why is this a problem for your company or organisation? Well, anecdotally, in the world of Data Science there’s a feeling that projects and teams aren’t delivering the value that was expected. Some of this is a function of hype, some of it is a function of a fast-changing technology ecosystem, but as I’ve experienced first hand, one of the problems is poor leadership or poor management.
Like Deepak in the Software article, I think there is a considerable communication disconnect between technical specialists and management on Machine Learning projects. I feel this advice is also useful for data science leaders (I am a Senior Data Scientist myself), scrum masters and product managers. By no means do I want to discourage people from various backgrounds from managing Data Science teams, but I’d like to explain why it’s harder without leveling up your technical skills.
As Deepak says:
Firstly, it benefits you – the non-technical manager. There are two important benefits you get if you are a technically aware manager.
Better management skills – You will always know what you are doing.
Better communication – You will always know what you are talking about
As one anecdote, I was once in a room with non-technical managers, brainstorming for a project. Their responsibility was to ‘represent’ or ‘translate’ requirements from the business. However, they were frankly poor at this, because they didn’t understand what was possible: what various classes of algorithms could do, how interpretable those algorithms are (would a decision tree work well, or do you need high accuracy/precision/recall?), or how complex it is to connect to various data sources. It was an incredibly frustrating and not very fruitful conversation, especially because when I discussed the technical aspects they were unable to tell if I was ‘bullshitting’ or not.
Unfortunately that environment tended to reward the ‘talkers’ rather than the builders, and projects were often poorly managed. I even heard one manager say something like ‘we need to move past machine learning into deep learning’, which is absolute nonsense. Deep Learning is a subfield of Machine Learning 🙂
So here are three things that I think non-technical managers will get wrong in managing a technical project:
Believing process will fix everything.
It’s definitely been my experience that non-technical managers, when faced with a poorly performing team (or when they misunderstand the team’s performance), will implement ‘process’. Unfortunately this is a good way to signal a lack of trust in and appreciation for your team, and the extra overhead often leads to meetings, planning poker and various other things that don’t work well for R and D projects, since R and D projects are fundamentally creative and non-linear. I’ve been building machine learning pipelines and doing machine learning/technology work for a number of years now and I’m still astonished at how complicated building a product can be.
Process changes are the right solution all the time only for manual labour work, not for creative work. – Deepak Karanth
Not understanding the nuance of the work
Machine Learning is complicated. Statistics is complicated. There are all sorts of problems you can run into: have you violated a linearity assumption in your model, have you correctly implemented cross-validation on time-series, have you run into Simpson’s paradox, is there selection bias? If you’re doing Survival Analysis, are you violating the proportional hazards assumption? You need to understand this complexity. R and D is hard!
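To make one of these concrete, here is a minimal sketch of Simpson’s paradox in plain Python. The campaign names, segment names and counts are all hypothetical, made up purely for illustration. The point is that a comparison can flip direction depending on whether you look at the pooled numbers or at each segment.

```python
# Hypothetical (successes, trials) counts for two campaigns, split by segment.
# All names and numbers are invented for illustration only.
results = {
    "campaign_a": {"easy": (90, 100), "hard": (300, 1000)},
    "campaign_b": {"easy": (800, 1000), "hard": (20, 100)},
}


def segment_rates(results):
    """Success rate of each campaign within each segment."""
    return {
        name: {segment: wins / trials for segment, (wins, trials) in segments.items()}
        for name, segments in results.items()
    }


def overall_rates(results):
    """Success rate of each campaign with all segments pooled together."""
    pooled = {}
    for name, segments in results.items():
        wins = sum(w for w, _ in segments.values())
        trials = sum(t for _, t in segments.values())
        pooled[name] = wins / trials
    return pooled


by_segment = segment_rates(results)
pooled = overall_rates(results)

# Compare the two campaigns segment by segment, then on the pooled data.
for segment in ("easy", "hard"):
    a = by_segment["campaign_a"][segment]
    b = by_segment["campaign_b"][segment]
    print(f"{segment}: A={a:.2f} B={b:.2f}")

print(f"pooled: A={pooled['campaign_a']:.2f} B={pooled['campaign_b']:.2f}")
```

With these made-up counts, campaign A has the higher success rate in both the easy and the hard segment, yet campaign B looks better once everything is pooled, simply because B was mostly run on easy cases. A dashboard showing only the pooled number would point you at the wrong campaign.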
And as my friend Martin Goodson termed it to me.
This is not a Data Scientist or Data Science leader!
Job description: The director/manager/VP of BI has primary responsibility for setting the strategy and vision and for managing the day-to-day tactical operations of the BI teams. He/she will be responsible for all strategic, tactical, operational, financial, human, and technical resource managerial responsibilities associated with the following BI and BI-related functional areas:
- Data preparation (sourcing, acquisition, integration)
- Data warehousing (Forrester often recommend that the first two functional areas are managed separately by data management / data preparation team(s))
- BI governance (may be same or separate from Data Governance)
- Reporting, analytics, data exploration
This is a Chief Data Scientist job or head of Data Science job
We are looking for someone with:
- An advanced Degree (Master’s or PhD) in Computer Science, Statistics, Engineering, Mathematics, Physics, or a related quantitative field
- Experience working in the field of machine learning and data science
- Proven track record of working with large data sets to develop innovative data products and capabilities and extract actionable insights
- Expert knowledge of statistical modelling methods for supervised and unsupervised learning
- Commercial experience working with either Python or R (we use Python)
- Knowledge of databases and related languages/tools such as SQL and NoSQL
- Experience with cloud computing platforms (AWS is desirable)
- Strong knowledge of the mathematical foundations of statistical inference and forecasting such as time series analysis, multivariate analysis, cluster analysis, and optimization
- Ability to lead and manage a team of junior Data Scientists
- Effective communication skills and ability to explain complex data products in simple terms
That’s it, end of. Companies, please stop mixing up these two types of people. They’re two different jobs; both have value, and both are valuable at different stages of your company’s evolution.
Not understanding the messiness of data sources
One of the things that irritates me most as a Data Scientist is the pervasive ‘magic quickly’ idea. Some things are complicated, and often that’s a function of the data. One friend who’s a good data science consultant says that if no one has looked at the data before, you should add 6 months to the project. It’s often because the ‘data exhaust’ is a sludge, and for specific projects you need to understand the context of the problem you’re trying to solve.
Neil Lawrence of Amazon wrote a good framework for thinking about this. This can be a good set of principles to think about when you look at the data readiness of a certain area.
Unless you have hands-on experience building data products and extracting value from data, you run the risk of missing a ton of crucial nuance and detail. I’m not discounting the value that a data strategist can bring to the table, but there are trade-offs when you lack the nuance that comes from hands-on experience of working with data. Even if you’re prepared to learn as you go along, this might slow the project, and at worst it can derail it.
In a future post I’ll write about how to avoid these pitfalls, and what you as a non-technical manager can do to better manage the developers and Data Scientists you interact with and are charged with managing.