Avoiding being a ‘trophy’ data scientist

Recently I’ve been speaking to a number of data scientists about the challenges of adding value to companies. This isn’t an argument that data science doesn’t have positive ROI, but that organisations need to understand that data science is a ‘team sport’, and need a certain organisational maturity to take advantage of these skills.

The biggest anti-pattern I’ve experienced personally as an individual contributor has been a lack of ‘leadership’ for data science. I’ve seen organisations without the budgetary support, the right champions or clear alignment of data science with their organisational goals.

The following is an opinionated, non-exhaustive list of some of the anti-patterns I’ve seen.

  • I’ve written before about data strategy. I still think this is one of the things that’s most lacking in organisations. A useful distinction is that data collection needs to happen before data analysis, and that it needs to happen in accordance with the strategy of the company.

Solution: Organisations should map their data science projects to the key business concerns of the organisation. This will help shape how resources are allocated.

  • There needs to be an understanding of what kind of leadership you need for a data science team. This needs to be someone with hands-on experience of doing data science. This is not someone familiar with ‘analytics’ or ‘reporting systems’ and ‘delivery’. It is someone familiar with things like ‘probabilistic programming’, ‘neural networks’ and ‘A/B tests’. So don’t put an ‘analytics leader’ in charge of a team of data scientists.

Solution: Executives, feel free to reach out to me to discuss data strategy; I’ll gladly point you in the right direction.

  • You need business intelligence, not data science. There’s nothing wrong with reporting or building analytics systems, but it’s not data science. Be honest about what your organisation needs.

Solution: Ask clarifying questions when interviewing about why the organisation needs data science versus other things.

When your data isn’t ready yet, it makes data scientists sad. (Photo by Tom Pumford on Unsplash)

  • Your data isn’t ready yet. Lots of the ‘big data’ hype has gotten organisations to think it’s very important to collect data, and this is progress. However, it takes time to do a data audit, and ‘data readiness’ varies for different problems. This is very closely related to the first point, on data strategy.

Solution: Do a data audit before you begin a project. If the data hasn’t been explored yet, add 6 months to any estimates.

  • Returns on data science projects follow a power law-like distribution, much like venture capital. Despite having 8 years’ experience of doing data analysis and machine learning, 4 of them in industry, I have no idea beforehand which projects will have the best ROI. I’ve learned to do several projects in a 6-12 month period, and learned when to cut projects that aren’t working for whatever reason. An organisation needs to admit to failure, and be honest about reality, for this to happen.

Solution: Make sure stakeholders understand these returns.

  • Cultural mismatch. For data science to succeed in an organisation there needs to be an acceptance of some of the rather ‘non-commercial’ aspects. These include lab meetings and going to conferences. Organisations that consider these things alien will kill off the high-risk, high-reward innovation culture.

Solution: Frankly, I think cultural change is hard unless you’re a C-level exec (a CEO, COO, CFO, etc.).

  • Someone heard ‘data is the new oil’. With all due respect to Gartner, McKinsey, etc., please stop using this metaphor. Data is not a commodity and needs to be turned into a product to add value. Building successful products in data science has all the same challenges as building successful products elsewhere.

Solution: Focus on the business drivers for data science projects – focus especially on increasing revenue or decreasing costs.

  • You use data in a project before getting the right ethical or legal approval. I’ve heard of projects being killed because someone didn’t de-risk the legal or ethical aspects before starting the project. This is a huge waste of time, and amounts to poor planning by the organisation. With laws such as GDPR coming in, this will become more pertinent.

Solution: At the ‘data gathering’ stage, stakeholders should clarify what the data privacy, legality and accessibility constraints are, and ask for legal support.

  • Your organisation doesn’t have the right buy-in. Doing data science is a challenge in a lot of organisations and often needs at least one person on the board to support it.

Solution: This sort of dysfunction is tricky to fix. I think you need to make sure that you’ve a champion at the organisation you interview at. Otherwise go elsewhere.

  • Your organisation doesn’t allow those on the front line to choose their tools. This can be due to such factors as concerns over using ‘the cloud’, draconian IT policies, binding outsourcing agreements, and decisions about systems being made at the executive level rather than at the front line. Good data scientists will quit after they spend 3 months trying to get permission to install R/Python/Scala. I’m a professional data scientist and I know my tools. I wouldn’t want an organisation to tell my lawyer which legal textbooks he should be reading, so why should an organisation treat IT in the same way?

Solution: Let your front line professionals pick their own tools.

  • Your organisation has no clear product roadmap with data science aligned to it. I wish this wasn’t an anti-pattern I’ve seen. It often starts with an executive reading about machine learning and getting excited, and it never ends well.

Solution: Have a clear product roadmap that lasts longer than 9 months. Especially ask about this when interviewing as an individual contributor.

  • Your organisation has no version control, Continuous Integration, servers, etc. If your organisation doesn’t have all of these things and more please don’t expect a data scientist to be a powerful change agent. It’s near impossible to change such things as an individual contributor.

Solution: If you want this to be your job, make it clear that this is part of your job. It takes time to change an organisation and needs a certain amount of enablement and support to effect such a change.
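
To make the earlier point about power law-like returns a bit more concrete, here is a minimal Python sketch. It rests on assumptions that are mine for illustration only: project returns are drawn from a Pareto distribution, and the function names, the shape parameter `alpha=1.2` and the portfolio size of 10 are all hypothetical, not measurements from real projects.

```python
import random


def simulate_portfolio(n_projects=10, alpha=1.2, seed=0):
    """Draw hypothetical project returns from a Pareto distribution.

    Assumption: a Pareto distribution stands in for 'power law-like'
    returns. alpha is an illustrative shape parameter; smaller values
    give a heavier tail (a few projects dominate more strongly).
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    returns = [rng.paretovariate(alpha) for _ in range(n_projects)]
    return sorted(returns, reverse=True)  # best project first


def top_share(returns, k=1):
    """Fraction of the total return contributed by the k best projects."""
    ranked = sorted(returns, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    portfolio = simulate_portfolio()
    print("Share of total return from the best project:",
          round(top_share(portfolio, k=1), 2))
    print("Share of total return from the best three:",
          round(top_share(portfolio, k=3), 2))
```

Re-running with different seeds gives a feel for how much of a portfolio’s value can sit in one or two projects under this assumption, which is the argument for running several projects and cutting the ones that aren’t working.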

These anti-patterns will cause you to be a ‘trophy’ data scientist and that’s no fun. I care about adding value to organisations, not just being a smart nerd in the corner.

Remark: My friend Enda Ridge pointed out that I left out important things about how the team itself operates. These are important, but I felt they were out of scope for this article. Nevertheless, a great book on reproducibility, coordination and data science workflows is his Guerrilla Analytics: http://guerrilla-analytics.net/

Credits – my thinking on this has been influenced by conversations with or reading tweets from: Angela Bassa, Eddie Bell, Chris Harland, Trey Causey, Jon Sedar, Thomas Wiecki, Ian Wong, Ian Ozsvald, Martin Goodson, Ollie Glass, Gabriel Straub and Calvin Giles.
