I recently gave a keynote at www.pycon.co, the first PyCon conference in Colombia. I spoke about data science models in production: the lessons learned and the cultural aspects.
I also interviewed a Colombian data scientist, Juan Pablo Isaza Aristizábal.
1. What project have you worked on that you wish you could go back to and do better?
Back in 2015 I was working for Tappsi, a popular app for calling taxis. They had a huge problem fulfilling cab demand at peak hours, because Bogotá has a horrible traffic congestion problem that is only getting worse. So we were using algorithms to try to fulfil as much demand as we could. One of the projects was to predict demand over a 10-minute future window for each neighbourhood, so that drivers could head to the neighbourhoods with the highest expected demand. I did the real-time data ingestion and machine learning, and we published an MVP of the feature, but the algorithm was too slow. In the end the project was never completed, because the developers were doing other stuff and I ended up optimising other algorithms that improved our metrics with half the hassle and complexity. Afterwards I realised that my lack of experience had led me to write the program in the wrong language, and that I had made wrong assumptions that led to low performance.
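As a rough illustration of the idea of predicting demand per neighbourhood in 10-minute windows, here is a minimal Python sketch. It is not Tappsi's algorithm, which the interview does not describe: the function names, the sample requests, and the naive "next window looks like the current window" forecast are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_start(ts):
    """Round a timestamp down to the start of its 10-minute window."""
    return ts - timedelta(
        minutes=ts.minute % 10, seconds=ts.second, microseconds=ts.microsecond
    )


def count_requests(requests):
    """Count ride requests per (neighbourhood, window start) pair."""
    return Counter((hood, window_start(ts)) for hood, ts in requests)


def naive_forecast(counts, now):
    """Assumed baseline: next window's demand equals the current window's count."""
    current = window_start(now)
    return {hood: n for (hood, start), n in counts.items() if start == current}


def rank_neighbourhoods(forecast):
    """Order neighbourhoods from highest to lowest forecast demand."""
    return sorted(forecast, key=forecast.get, reverse=True)


# Made-up sample requests: (neighbourhood, request time).
requests = [
    ("Chapinero", datetime(2015, 6, 1, 18, 1)),
    ("Chapinero", datetime(2015, 6, 1, 18, 4)),
    ("Chapinero", datetime(2015, 6, 1, 18, 9, 59)),
    ("Usaquén", datetime(2015, 6, 1, 18, 3)),
    ("Suba", datetime(2015, 6, 1, 17, 58)),  # falls in the previous window
]

counts = count_requests(requests)
forecast = naive_forecast(counts, now=datetime(2015, 6, 1, 18, 9, 59))
ranking = rank_neighbourhoods(forecast)
print(ranking)
```

A real system would replace the naive baseline with a trained model and would need to update the counts incrementally as requests arrive, which is where performance starts to matter.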
2. What advice do you have for younger analytics professionals, and in particular PhD students in the sciences?
I think academia is a great place to learn abstract and complex subjects, while industry is a great place to learn practical and social skills. It's easier to succeed in a working environment if you find a balance: being academic enough without forgetting the practical aspects and the communication skills you need with non-tech people. At university there are brilliant people working on cutting-edge topics, although they might not know how to deal with more concrete aspects. In industry, meanwhile, you can see fast, practical developers who know the latest tools but fall short when trying to optimise a SQL query, because they don't know how an indexed query works behind the scenes in a database.
So I would advise you to study as much as you can, but always try to think about the possible applications of what you are learning. Also, communication skills with non-tech people are extremely important; I have seen a couple of guys in meetings with salespeople trying to explain statistical tests and p-values, with no success.
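To make the point about indexed queries concrete, here is a small sketch using Python's built-in sqlite3 module. The table, column, and index names are made up for the example. It asks SQLite for its query plan before and after creating an index, which typically changes from a full table scan to an index search (the exact wording of the plan varies between SQLite versions).

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rides (id INTEGER PRIMARY KEY, neighbourhood TEXT, fare REAL)"
)
# Hypothetical data: 10,000 rides spread evenly over 50 zones.
conn.executemany(
    "INSERT INTO rides (neighbourhood, fare) VALUES (?, ?)",
    [(f"zone-{i % 50}", float(i % 20)) for i in range(10_000)],
)

sql = "SELECT COUNT(*) FROM rides WHERE neighbourhood = ?"

# Without an index, the database has to read every row to find matches.
plan_before = query_plan(conn, sql, ("zone-7",))

# With an index, it can jump straight to the matching entries.
conn.execute("CREATE INDEX idx_rides_neighbourhood ON rides (neighbourhood)")
plan_after = query_plan(conn, sql, ("zone-7",))

print("before:", plan_before)
print("after: ", plan_after)
```

Other databases expose the same information through their own EXPLAIN variants; the habit of checking the plan before and after a change carries over.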
3. What do you wish you knew earlier about being a data scientist?
I wish I had had the chance to work at startups earlier. Being in Colombia made it difficult to get into the tech industry; before that, I had a couple of jobs unrelated to data science or software development.
4. How do you respond when you hear the phrase ‘big data’?
I find the term a little misleading and simplistic. It's popular because it's easy for the general public to understand, while algorithms are not. Technically, though, the term might just be a synonym for a couple of tools such as Amazon Redshift and Hadoop. Big data has enabled new and exciting applications, but it is by no means the only or the biggest factor contributing to current advances in the field; new algorithms such as deep neural networks and reinforcement learning, along with a strong open source community, have enabled a lot of improvement over the last few years.
5. What is the most exciting thing about your field?
For me it is the excitement of science, which I have embraced since I was a little kid, combined with the development speed and practicality of engineering. Being able to take an idea and transform it into a working prototype in a few days is an amazing feeling; machine learning applications especially are really exciting to work with.
6. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations, etc.? How do you know what is good enough?
Usually I take an iterative approach, starting with the most obvious relations and the easiest data to handle. I try to get to the answer in a series of incremental steps, refining each input and expanding the data set as I go. It's similar to how you would build an MVP: first it is simple, then it gets better with each version, and finally the customer or user says, “that’s good enough!”
7. Can you talk a bit about the state of the tech industry and data science in Colombia? What would you change? What gives you hope?
Data science is arriving as a byproduct of software development; there isn’t much of it yet, but we are improving in giant steps. In the last few years, startups such as Tappsi, Domicilios, Mercadoni, Rappi or Bunny have become more commonplace than in the past. What gives me hope is the peace deal with the FARC guerrilla group; this crucial event will bring many more foreigners here, as well as investment that can power new ideas.
I would shift the focus of many startups from a local one to a more global one. This is difficult because the market is small and the economy is far behind those of other nations, which makes our problems and solutions different from those of more advanced economies. Still, there are Colombian startups with a global focus, such as Bunny or VOIQ.
Thanks and best regards!