How do we deliver Data Science in the Enterprise


I’ve worked on Data Science projects and delivered Machine Learning models both in production code and more research type work at a few companies now. Some of these companies were around the Seed stage/ Series A stage and some are established companies listed on stock exchanges. The aim of this article is to simply share what I’ve learned — I don’t think I know everything. I think my audience consists of both managers and technical specialists who’ve just started working in the corporate world — perhaps after some years in Academia or in a Startup. My aim is to simply articulate some of the problems, and propose some solutions — and highlight the importance of culture in enabling data science.

I’ve been reflecting over the years as a practitioner why some of this ‘big data’ stuff is hard to do. I’ll present in this article a take that’s similar to some other commentary on the internet, so this won’t be unusual.

My views are inspired by in this article Matt says:

Big Data success is not about implementing one piece of technology (like Hadoop or anything else), but instead requires putting together an assembly line of technologies, people and processes. You need to capture data, store data, clean data, query data, analyse data, visualise data. Some of this will be done by products, and some of it will be done by humans. Everything needs to be integrated seamlessly. Ultimately, for all of this to work, the entire company, starting from senior management, needs to commit to building a data-driven culture, where Big Data is not “a” thing, but “the” thing.

Often while speaking about our nascent profession with friends working in other companies we speak about ‘change management’. Change is very hard — particularly for established and non-digital native companies, companies who don’t produce e-commerce websites, social networks or search engines. These companies often have legacy infrastructure and don’t necessarily have technical product managers nor technical cultures. Also for them traditional Business Intelligence systems work quite well — reporting is done correctly, and it’s hard to make a case for machine learning in risk-averse environments like that.

Continue reading

AI in the Enterprise (the problem)


I was recently chatting to a friend who works as a Data Science consultant in the London Area – and a topic dear to my heart came up. How to successfully do ‘AI’ (or Data Science) in the enterprise. Now I work for an Enterprise SaaS company in the recruitment space, so I’ve got a certain amount of professional interest in doing this successfully.

My aim in this essay is to outline what the problem is, and provide some solutions.

Firstly it’s worth reflecting on the changes we’ve seen in Consumer apps – Spotify, Google, Amazon, etc – all of these apps have personalised experiences which are enhanced by machine learning techniques depending on the labelled data that consumers provide.

I’ll quote what Daniel Tuckelang (formerly of Linkedin) said about the challenges of doing this in the enterprise.

First, most enterprise data still lives in silos, whereas the intelligence comes from joining across data sets. Second, the enterprise suffers from weak signals — there’s little in the way of the labels or behavioral data that consumer application developers take for granted. Third, there’s an incentive problem: everyone promotes data reuse and knowledge sharing, but most organizations don’t reward it

I’ve personally seen this when working with enterprises, and being a consultant. The data is often very noisy, and while there are techniques to overcome that such as ‘distant supervision‘ it does make things harder than say building Ad-Tech models in the consumer space or customer churn models. Where the problem is more explicitly solvable by supervised techniques.

In my experience and the experience of others. Enterprises are much more likely to try buy in off-the-shelf solutions, but (to be sweepingly general) they still don’t have the expertise to understand/validate/train the models.There are often individuals in small teams here & there who’ve self-taught or done some formal education, but they’re not supported. (My friend Martin Goodson highlights this here)  There needs to be a cultural shift. At a startup you might have a CTO who’s willing to trust a bunch of relatively young data science chaps to try figure out an ML-based solution that does something useful for the company without breaking anything. And it’s also worth highlighting that there’s a difference in risk aversion between enterprises (with established practices  etc) and the more exploratory or R and D mindset of a startup.

The somewhat more experienced of us these days tend to have a reasonable idea of what can be done, what’s feasible, and furthermore how to convince the CEO that it’s doing something useful for his valuation.

Startups are far more willing to give things a go, there’s an existential threat. And not to forget that often Venture Capitalists and the assorted machinery expect Artificial Intelligence, and this is encouraged.

Increasingly I speculate that established companies now outsource their R and D to startups, hence the recent acquisitions like the one by GE Digital.

So I see roughly speaking two solutions to this problem. Two ways to de-risk data science projects in the enterprise.

1) Build it as an internal consultancy with two goals: identifying problems which can be solved with data solutions, and exposing other departments to new cultural thinking & approaches. I know of one large retailer who implemented this by doing 13 week agile projects, they’d do a consultation, then choose one team to build a solution for.

2) Start putting staff through training schemes similar to what is offered by General Assembly (there are others), but do it whole teams at a time, the culture of code review and programmatic analysis has to come back and be implemented at work. Similarly, give the team managers additional training in agile project management etc.

The first can have varied success – you need the right problems, and the right internal customers – and the second I’ve never seen implemented.

I’d love to hear some of the solutions you have seen. I’d be glad to chat about this.

Acknowledgements: I’d like to thank the following people for their conversations: John Sandall, Martin Goodson, Eddie Bell, Ian Ozsvald, Mick Delaney and I’m sorry about anyone else I’ve forgotten.