Eddie Bell is a Lead Data Scientist at Lyst (http://www.lyst.com), a fashion recommendation website.
Eddie has a PhD in Mathematics, and before he saw the light and joined Lyst he used to work in Finance!
1. What project that you have worked on do you wish you could go back to and do better?
At one point we moved all our data processing infrastructure to Storm.
We’re a Python shop with very little Java experience and it was an
absolute nightmare: dealing with Maven and dependency hell, trying to
deploy automatically, testing, VM parameters. The actual model worked
well in Storm but we just weren’t prepared for all the supporting
In the end we moved back to Python and used Celery instead, which
suited us perfectly. This Storm transition cost me 3 months of my
life. If I could go back in time then I’d just stick with Python.
I guess the take-home message is: when you start a new project, you
really have to think about the cost of using a new technology.
Although learning a new technology is fun, you should first try solving
the problem with a technology you are already familiar with.
2. What advice do you have for younger analytics professionals, and in particular PhD students in the Sciences?
I would say there are three important areas.
1) Absolutely learn to program. The better you can program, the more independent you can be as a data scientist. 2) Theoretical foundations, mostly stats and linear algebra. 3) Communication: you need to communicate well with your colleagues and the community.
3. What do you wish you knew earlier about being a data scientist?
How much of the job involves talking with the business. You have to be able
to translate business goals into machine learning solutions. You also
have to be able to tell people why some ideas are not possible to
implement. But you have to be very careful not to rule out their crazy
ideas, they might just teach you something! For example, last year
someone asked if I could generate descriptions from images. I laughed
and said it was impossible, but now people are actually doing it.
4. How do you respond when you hear the phrase ‘big data’?
Haha, I shudder, because it doesn’t really mean anything.
5. What is the most exciting thing about your field?
I’m all about building production machine learning systems, so for me
applications of deep learning are the most exciting. Deep models are
not magic bullets but they can achieve impressive results.
6. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations etc. How do you know what is good enough?
My hand-wavy answer is ‘intuition’ but more practically, agile
development has a concept called MVPs (minimum viable products). MVPs
let you iterate quickly and so failures cost you less. The same can be
applied to machine learning; first try to solve a problem on a simple
data set with a simple model. If that shows promise then you can
develop more complex models with bigger and better data.
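
As a rough illustration of the "simple model on a simple data set first" idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration rather than anything from Lyst: the toy data is made up, and the helper names (`majority_baseline`, `fit_threshold`, `accuracy`) are hypothetical. It compares a trivial majority-class baseline with a one-feature threshold rule, and treats "clearly beats the baseline" as the signal that the problem shows promise.

```python
from collections import Counter


def majority_baseline(labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda x: most_common


def fit_threshold(xs, labels):
    """Fit a one-feature rule: predict 1 when x >= t.

    t is chosen from the observed values to maximise training accuracy.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x >= best_t)


def accuracy(predict, xs, labels):
    """Fraction of examples where the predictor matches the label."""
    return sum(predict(x) == y for x, y in zip(xs, labels)) / len(xs)


# Toy data, made up purely for illustration.
train_x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.7, 2.0]
train_y = [0, 0, 0, 0, 0, 1, 1, 1]
test_x = [0.2, 0.8, 1.6, 1.9]
test_y = [0, 0, 1, 1]

baseline = majority_baseline(train_y)
simple = fit_threshold(train_x, train_y)

baseline_acc = accuracy(baseline, test_x, test_y)
simple_acc = accuracy(simple, test_x, test_y)

# Only invest in a more complex model if the simple one beats the baseline.
shows_promise = simple_acc > baseline_acc
print(baseline_acc, simple_acc, shows_promise)
```

If the simple rule cannot beat the baseline on the toy data, that failure costs minutes rather than months, which is the point of the MVP approach.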