An extension of the Data Science process – OSEMIC


One of the most famous taxonomies of data science is OSEMN, pronounced 'Awesome'.

It stands for Obtain, Scrub, Explore, Model, Interpret.

I was recently chatting to some data scientists on Twitter and they asked: shouldn't it be OSEMIC?

Obtain, Scrub, Explore, Model, Interpret and Communicate!!!

I hadn't thought of this, but I agree it is part of the process. Interpretation by a specialist like myself isn't the whole battle; it needs to be translated into something that business stakeholders can understand. And the challenge is not to lose them with 'this is the R^2 part'.

I think this 'last mile' problem of data science is a real challenge: how do you turn something as complicated as a machine learning model or a differential equation model into something that stakeholders can act on? I suspect this is even harder than learning the mathematics or the programming. I think data scientists can also learn a lot from storytellers such as journalists and designers.

Thanks to everyone who contributed ideas for this post.

The challenge of Data Science


I recently saw this – https://dartthrowingchimp.wordpress.com/2015/03/19/data-science-takes-work-too/ – which is essentially an article about how much work Data Science actually takes.

This is a personal and opinionated piece, and all my views are my own and do not reflect anyone else's. Yet I feel strongly, as a working Data Analyst, that one of the real unseen challenges is communicating just how much hard work is involved. So I welcome articles like this.

I have personally seen a situation where confusion about what a 'model' was led to a very difficult work environment for me. Miscalibrated expectations that it would just be 'magic', or delivered as easily as a routine feature, put an unrealistic load on me.

Now maybe one of the things that data scientists must do is explain the difficulty and the challenge. Today, for instance, it took me three hours to produce a relatively simple bar chart – partly because of the difficulty of finding the data, adjusting the axes and so on.

This was not an automated, scripted process; it was a bespoke data visualization I developed to help share with colleagues and stakeholders the story of my current department, its challenges and its key performance indicators.
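To give a flavour of what that kind of bespoke chart involves, here is a minimal sketch in Python with pandas and matplotlib. The data, column names ("team", "tickets_closed") and styling choices are all hypothetical, invented for illustration – the point is how much of the time goes into finding the numbers and then hand-tuning axes, labels and limits rather than into the plotting call itself.

```python
# A minimal, hypothetical sketch of a bespoke bar chart.
# The dataset and column names are made up for illustration.
import pandas as pd
import matplotlib.pyplot as plt

# Hand-assembled departmental data (in practice, finding and cleaning
# these numbers is where most of the three hours went).
df = pd.DataFrame({
    "team": ["Ops", "Support", "Analytics", "Engineering"],
    "tickets_closed": [120, 340, 85, 210],
})

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df["team"], df["tickets_closed"], color="steelblue")

# The fiddly, bespoke part: axis limits, labels and formatting chosen by
# hand so the chart tells the story stakeholders need, not the defaults.
ax.set_ylim(0, 400)
ax.set_ylabel("Tickets closed this quarter")
ax.set_title("Workload by team")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

fig.tight_layout()
fig.savefig("tickets_by_team.png", dpi=150)
```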

I think what is often not acknowledged is just how complicated software and data analysis are – they take a mixture of hard work, domain expertise, data visualization and modelling – and all of these things keep changing. I've built complicated models and reporting that needed to be changed after three months because an API or database changed!

So I think we should share more of our challenges, our frustrations and our success stories. Our success stories should also not be told as if we are geniuses – we are just humans with rare and valuable skills.

So this should be explained constantly to stakeholders, and perhaps one thing we can do is get our colleagues to sit with us through a data analysis project or mini-project, rather than just barking unrealistic expectations at us 🙂

I'm still thinking about this, but, as Jay says, I suspect the biggest problem is that 'most people who don't do this work simply have no idea.'

Perhaps the lesson here is this: never underestimate the skill and craft of the people you work with, and learn how valuable that is rather than making assumptions.