I caught up with Shane, an Irish startup co-founder and former analytics manager, to discuss data science.
Shane described himself as follows:
I’m co-founder of KillBiller, a company that helps mobile operators gain new customers. We provide a mobile phone plan comparison service in Ireland that lets people use their own call, text, and data usage information to find the best-value mobile tariff for their individual needs. In this position, I’m finding my way as a tech-startup founder, learning the ropes of creating a profitable business, and stretching my tech muscles on a complex and scalable Python backend on the Amazon cloud. It’s a blast.
I’d add that his blog posts are excellent, and I’m glad to see contributions to the data science community coming from outside the West Coast of the USA.
1. What project that you’ve worked on do you wish you could go back to and do better?
Maybe every one?! I think that data science projects always have a bit of unfinished business. It’s a key part of the trade to be able to identify when enough is enough, and when extra time would actually lead to tangible results. Is 4 hours of tuning a model worth an extra 0.01% in accuracy? Maybe in some cases, but not most. Unfortunately, I think that a huge number of real-world data science business cases leave you with a little “ooh, I could have tried…” or “oh, we might have optimised…”.
2. What advice do you have for younger analytics professionals, and in particular PhD students in the sciences?
“The more I learn, the more I realise how much I don’t know.” There seems to be a never-ending list of new technologies and techniques to get your head around. I would say to budding professionals that if you start by getting a solid understanding of the basic key techniques in your repertoire, you’ll do better than by learning buzzwords about the latest trends. While headline-grabbing, bleeding-edge research will always seem to sparkle, the reality of data science in business is that people are still using proven techniques that work reliably and simply: think regression and k-means over deep learning and natural language processing. Get the basics right first.
3. What do you wish you knew earlier about being a data scientist?
Data preparation. I know you see it written down, but there is no exaggeration at all in the phrase “you’ll spend 80% of your time preparing the data”. I’m sure everyone says it, and everyone should know it, but it’s a key part of the work and a very important step in the information discovery process.
4. How do you respond when you hear the phrase ‘big data’?
That depends on where it comes from. At a business conference, from a salesman: sometimes with a roll of the eyes. At a tech meetup in Dublin: maybe with some interest. I think that Big Data has been hyped to death, and the reality is that, for now, there are very few companies that actually require a large-scale Hadoop deployment. I’ve worked with some of the largest companies on data science projects, and to date I have been able to process the data required on a single machine. However, I’m aware that this is an Irish-specific viewpoint: our population and market size naturally reduce the volume of data in many fields. That said, I do think that Big Data is ultimately a function of the IT department; data scientists will simply leverage the tools to extract meaningful excerpts or subsets for analysis.
5. What is the most exciting thing about your field?
It’s ever-changing, ever-growing, and moving quickly. While it’s sometimes daunting to think of the speed of progress, it’s also extremely exciting to be involved in a world where new ideas, tools, and techniques are being spread on a weekly basis. There’s a huge amount of enthusiasm out there in the community and a plethora of new opportunities to be explored.
6. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations etc. How do you know what is good enough?
I tend to start tackling each problem only after I’ve had a good look at the data behind it. Perhaps an extract, perhaps an MVP-type model, but just enough to grasp the state of the data and the amount of cleansing required, and to identify potential problems and benefits. It’s extremely difficult to accurately estimate the outcome of a data science problem before you start working, so a few hours of exploration are very worthwhile. The effort is usually capped naturally by deadlines and budget, and you can get relatively quickly to a point where additional time yields only negligible gains.
7. You spent some time as a consultant in data analytics. How did you manage cultural challenges and deal with stakeholders and executives? What advice do you have for new starters about this?
There’s a political landscape in every company you’ll join. Take the time to learn the ropes and learn how your company deals with these issues. I find that frequent and realistic updates on progress and expectations are key to managing the various parties. Don’t hide the dirty bits or the issues. And probably budget three times the time that you initially think for each task: there are always hidden issues!
8. You have a cool startup. Can you comment on how important it is, as a CEO, to make a company like that data-driven or data-informed?
I’m working on KillBiller, an Irish startup that makes difficult decisions easy. KillBiller automatically audits your mobile phone usage and works out exactly what you would spend on every mobile network and tariff. We’ve saved almost 20,000 people money on their phone bills!
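KillBiller’s actual implementation isn’t described in this interview, but the core idea of pricing one person’s usage against every tariff can be sketched. This is a minimal hypothetical illustration only: the `Tariff` and `Usage` classes, the simple “monthly fee plus out-of-allowance charges” pricing model, and all tariff names and prices are invented assumptions, not real operator data or KillBiller’s method.

```python
from dataclasses import dataclass


@dataclass
class Tariff:
    """Hypothetical flat tariff: monthly fee, allowances and overage prices."""
    name: str
    monthly_fee: float   # euro per month
    free_minutes: int
    free_texts: int
    free_data_mb: int
    per_minute: float    # euro per minute beyond the allowance
    per_text: float      # euro per text beyond the allowance
    per_mb: float        # euro per MB beyond the allowance


@dataclass
class Usage:
    """One month of a customer's observed usage."""
    minutes: int
    texts: int
    data_mb: int


def monthly_cost(tariff: Tariff, usage: Usage) -> float:
    """Price one month of usage: fee plus any out-of-allowance charges."""
    extra_minutes = max(0, usage.minutes - tariff.free_minutes)
    extra_texts = max(0, usage.texts - tariff.free_texts)
    extra_mb = max(0, usage.data_mb - tariff.free_data_mb)
    cost = (tariff.monthly_fee
            + extra_minutes * tariff.per_minute
            + extra_texts * tariff.per_text
            + extra_mb * tariff.per_mb)
    return round(cost, 2)


def rank_tariffs(tariffs: list[Tariff], usage: Usage) -> list[tuple[str, float]]:
    """Return (tariff name, cost) pairs, cheapest first."""
    priced = [(t.name, monthly_cost(t, usage)) for t in tariffs]
    return sorted(priced, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up tariffs and usage, purely for illustration.
    tariffs = [
        Tariff("Plan A", 20.0, 200, 500, 1000, 0.10, 0.05, 0.01),
        Tariff("Plan B", 35.0, 1000, 1000, 5000, 0.10, 0.05, 0.01),
        Tariff("Plan C", 10.0, 0, 0, 0, 0.05, 0.04, 0.005),
    ]
    usage = Usage(minutes=300, texts=100, data_mb=2000)
    for name, cost in rank_tariffs(tariffs, usage):
        print(f"{name}: €{cost:.2f}")
```

With these invented numbers the ranking comes out as Plan B (€35.00), Plan C (€39.00), then Plan A (€40.00). Even though Plan A has the lowest fee among the bundles, overage charges make it the most expensive for this usage profile, which is exactly why pricing against a person’s own usage beats comparing headline prices. Real tariffs are typically more intricate than this flat model.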
In our case, we’re all about data: processing people’s mobile usage securely, accurately, and quickly, and presenting the results in a meaningful way. In addition, a data-driven approach to the startup world has its advantages: having a solid understanding of our marketing effectiveness, website traffic, user retention, and route to revenue allows us to make decisions backed by science rather than intuition.
More information about Shane can be found at his blog