It’s good to discuss the social and operational cost of a new technology. I’ve learned for instance that although kubernetes can be a great tool – there is a cost to training developers in it. This cost is non-trivial. And we as technological leaders should be careful when we bring these things in.
I think if you don’t carefully manage technological choices you can end up with something like Zalando’s radar, and while it’s good to have that – you don’t want that to be your entire stack.
I’ve worked on Data Science projects and delivered Machine Learning models both in production code and more research type work at a few companies now. Some of these companies were around the Seed stage/ Series A stage and some are established companies listed on stock exchanges. The aim of this article is to simply share what I’ve learned — I don’t think I know everything. I think my audience consists of both managers and technical specialists who’ve just started working in the corporate world — perhaps after some years in Academia or in a Startup. My aim is to simply articulate some of the problems, and propose some solutions — and highlight the importance of culture in enabling data science.
I’ve been reflecting over the years as a practitioner why some of this ‘big data’ stuff is hard to do. I’ll present in this article a take that’s similar to some other commentary on the internet, so this won’t be unusual.
Big Data success is not about implementing one piece of technology (like Hadoop or anything else), but instead requires putting together an assembly line of technologies, people and processes. You need to capture data, store data, clean data, query data, analyse data, visualise data. Some of this will be done by products, and some of it will be done by humans. Everything needs to be integrated seamlessly. Ultimately, for all of this to work, the entire company, starting from senior management, needs to commit to building a data-driven culture, where Big Data is not “a” thing, but “the” thing.
Often while speaking about our nascent profession with friends working in other companies we speak about ‘change management’. Change is very hard — particularly for established and non-digital native companies, companies who don’t produce e-commerce websites, social networks or search engines. These companies often have legacy infrastructure and don’t necessarily have technical product managers nor technical cultures. Also for them traditional Business Intelligence systems work quite well — reporting is done correctly, and it’s hard to make a case for machine learning in risk-averse environments like that.
Somewhere around this discussion a lightbulb went off in my head about some of the problems we have with embarking on data science projects. There’s a certain amount of Cargo cult Data Science and so collectively we as a community – of business people, technologists and executives don’t think deeply enough about the risks and opportunities of projects.
So I had my lightbulb moment and now I share it with everyone.
The one weird trick is to write down risks before embarking on a project.
Here’s some questions you should ask you start a project – preferably gather all data .
What happens if we don’t do this project? What is the worse case scenario?
What legal, ethical or reputational risks are there involved if we successfully deliver results with this project?
What engineering risks are there in the project? Is it possible this could turn into a 2 year engineering project as opposed to a quick win?
What data risks are there? What kinds of data do we have, and what are we not sure we have? What risks are there in terms of privacy and legal/ ethics?
I’ve found that gathering stakeholders around helps a lot with this, you hear different perspectives and it can help you figure out what the key risks in your project are. I’ve found for instance in the past that ‘lack of data’ killed certain projects. It’s good to clarify that before you spend 3 months on a project.
Try this out and let me know how it works for you! Share your stories with me at myfullname[at]google[dot]com.