How do we deliver Data Science in the Enterprise

I’ve worked on Data Science projects and delivered Machine Learning models, both in production code and in more research-oriented work, at a few companies now. Some of these companies were around the Seed or Series A stage, and some are established companies listed on stock exchanges. The aim of this article is simply to share what I’ve learned; I don’t think I know everything. I think my audience consists of both managers and technical specialists who’ve just started working in the corporate world, perhaps after some years in academia or at a startup. I want to articulate some of the problems, propose some solutions, and highlight the importance of culture in enabling data science.

Over the years as a practitioner, I’ve been reflecting on why some of this ‘big data’ stuff is hard to do. The take I present in this article is similar to other commentary on the internet, so it won’t be unusual.

My views are inspired by http://mattturck.com/2016/02/01/big-data-landscape/. In this article Matt says:

Big Data success is not about implementing one piece of technology (like Hadoop or anything else), but instead requires putting together an assembly line of technologies, people and processes. You need to capture data, store data, clean data, query data, analyse data, visualise data. Some of this will be done by products, and some of it will be done by humans. Everything needs to be integrated seamlessly. Ultimately, for all of this to work, the entire company, starting from senior management, needs to commit to building a data-driven culture, where Big Data is not “a” thing, but “the” thing.

When I talk about our nascent profession with friends working at other companies, we often end up speaking about ‘change management’. Change is very hard, particularly for established companies that are not digital natives: companies that don’t produce e-commerce websites, social networks or search engines. These companies often have legacy infrastructure and don’t necessarily have technical product managers or technical cultures. For them, traditional Business Intelligence systems also work quite well. Reporting is done correctly, and it’s hard to make a case for machine learning in risk-averse environments like that.

One weird tip to improve the success of Data Science projects

I was recently speaking to some data science friends on Slack, and we were discussing projects and war stories. Something that came up was that ‘data science’ projects aren’t always successful.

[Image: light-311119_1280.png. Source: pixabay]

Somewhere in this discussion a lightbulb went on in my head about some of the problems we have when embarking on data science projects. There’s a certain amount of cargo cult data science, and so collectively we as a community of business people, technologists and executives don’t think deeply enough about the risks and opportunities of projects.

So I had my lightbulb moment and now I share it with everyone.

The one weird trick is to write down risks before embarking on a project.

Here are some questions you should ask before you start a project, preferably once you have gathered all the data.

  • What happens if we don’t do this project? What is the worst-case scenario?
  • What legal, ethical or reputational risks are involved if we successfully deliver results with this project?
  • What engineering risks are there in the project? Is it possible this could turn into a two-year engineering project as opposed to a quick win?
  • What data risks are there? What kinds of data do we have, and what are we not sure we have? What privacy, legal and ethical risks are there?

I’ve found that gathering stakeholders together helps a lot with this. You hear different perspectives, and that can help you figure out what the key risks in your project are. For instance, I’ve found in the past that ‘lack of data’ killed certain projects. It’s good to clarify that before you spend 3 months on a project.
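One way to make the habit concrete is to keep the answers in a small risk register. The sketch below is only an illustration under my own assumptions: the class names, the four category labels (one per question above) and the example ‘churn model’ project are all hypothetical, not a prescribed format.

```python
from dataclasses import dataclass, field

# Hypothetical category labels, one per question in the list above.
CATEGORIES = ("not_doing", "legal_ethical_reputational", "engineering", "data")


@dataclass
class Risk:
    category: str
    description: str
    raised_by: str          # which stakeholder raised the risk
    blocking: bool = False  # True if it should be resolved before work starts

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass
class RiskRegister:
    project: str
    risks: list = field(default_factory=list)

    def add(self, risk):
        self.risks.append(risk)

    def uncovered_categories(self):
        """Categories nobody has written a risk for yet."""
        seen = {r.category for r in self.risks}
        return [c for c in CATEGORIES if c not in seen]

    def blockers(self):
        return [r for r in self.risks if r.blocking]


# Hypothetical example project and risks.
register = RiskRegister("churn model")
register.add(Risk("data", "Not sure we have enough labelled history",
                  raised_by="analyst", blocking=True))
register.add(Risk("engineering", "Could grow into a two-year platform build",
                  raised_by="engineering lead"))

print(register.uncovered_categories())
print([r.description for r in register.blockers()])
```

Listing the uncovered categories is the useful part: it shows which of the questions nobody in the room has answered yet.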

Try this out and let me know how it works for you! Share your stories with me at myfullname[at]google[dot]com.

Data Science and Soft Skills

I once did an internship under Andrew Fogg at Import.io.

I learned a lot about data science during that period, but one of the hardest lessons I had to learn was the importance of soft skills and project management in any data science project.

John Foreman, another idol of mine, talked a bit about this in his book about data.

So although I am not a super experienced data scientist, I am going to talk about what I have learned so far from the data science projects I have been involved in.

Sometimes it is a development project

Sometimes you will encounter data science projects which actually need data engineering or software engineering. I think it is OK for data scientists to do some scripting and maybe hack together some web applications, but that is a bit different from what a software engineering team should do.

Data Science is not software engineering

For reasons I have not quite understood, some parts of software engineering project management work in data science projects and some do not. In my experience, treating the work as an agile project seems to work, yet daily scrum meetings can sometimes be too much. Too much interaction with business partners can also derail analytics projects.

Gantt charts or burndown charts work to some degree

I have successfully used these in data science projects. They communicate to non-technical stakeholders that progress is being made, which those stakeholders often lack the mental model to judge for themselves.
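For readers who have not built one, the numbers behind a burndown chart are simple. The sketch below is a minimal illustration under my own assumptions: the function names, the 20-point total and the daily figures are made up, and plotting is left out.

```python
def burndown(total_points, completed_per_day):
    """Remaining work after each day, starting from the total."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(max(remaining[-1] - done, 0))
    return remaining


def ideal_line(total_points, days):
    """Straight line from the total down to zero, for comparison."""
    return [total_points - total_points * d / days for d in range(days + 1)]


# Hypothetical week: 20 points of work, with one day where nothing closed.
actual = burndown(20, [3, 0, 5, 2, 4])
ideal = ideal_line(20, 5)

for day, (a, i) in enumerate(zip(actual, ideal)):
    print(f"day {day}: remaining {a}, ideal {i}")
```

The gap between the two series is what stakeholders actually read off the chart.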

Solving a problem as stated is not a good idea without further exploration

Sometimes you are given a data science project and a suggested technique, and you try as an analyst to solve that problem as stated. This generally backfires. Interacting with the business helps here, as does asking lots of questions until you sufficiently understand what their motivations are.

Deadlines are lies

I have never done an analytics project that worked out the way I expected. One reason is that some things are what I call ‘linear tasks’ and some things are ‘non-linear tasks’. Applying a basket analysis algorithm can be a linear task, for example, but only if one has the right data set prepared and is familiar with the programming language and tools that are used.

So be very firm and explicit with your stakeholders about what is linear and what is not.
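To illustrate why the basket analysis step itself can be a linear task, here is a minimal sketch of pair support and confidence in plain Python. It assumes the data already arrives as clean lists of items per basket, which is exactly the part that tends not to be linear. The function names and the toy baskets are my own illustration, not a reference implementation of any particular library.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Share of baskets that contain both items, for every item pair."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair one canonical ordering per basket
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


# Toy data, assumed to be already cleaned and grouped per basket.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

support = pair_support(baskets)
print(support[("bread", "butter")])
print(confidence(baskets, "bread", "butter"))
```

Once the baskets exist in this shape the computation is a few predictable lines; getting real transaction data into this shape is where estimates usually slip.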

Of course, if you are in an environment that does not allow you to control your own deadlines and has unrealistic expectations for good quality analytics work, then it is probably a sign the universe is telling you to clean up your LinkedIn profile.

I will explore more of these concepts in the future.