Ivana Balazevic is a Data Scientist at the Berkeley-based startup Wise.io, where she works in a small team of data scientists solving customer service problems for different clients. She completed her bachelor’s degree in Computer Science at the Faculty of Electrical Engineering and Computing in Zagreb and recently finished her master’s degree in Computer Science, with a focus on Machine Learning, at the Technical University Berlin.
1. What do you think about ‘big data’?
I try not to think about it that much, although nowadays that’s quite hard to avoid. 🙂 It’s definitely an overused term, a buzzword.
I think that adding more and more data can certainly be helpful up to a point, but the outcome of the majority of problems that people are trying to solve depends primarily on the feature engineering process, i.e. on extracting the necessary information from the data and deciding which features to create. I’m certain there are problems out there that require large amounts of data, but they are definitely not common enough for the whole world to obsess over.
2. What is the hardest thing for you to learn about data science?
I would say the hardest things are those which can’t be learned at school, but which you gain through experience. Coming out of school having worked mostly on toy datasets, you are rarely prepared for the messiness of real-world data. It takes time to learn how to deal with it, how to clean it up, select the important pieces of information, and transform this information into good features. Although that can be quite challenging, it is at the core of the creative process in data science and one of the things that make data science so interesting.
3. What advice do you have for graduate students in the sciences who wish to become Data Scientists?
I don’t know if I’m qualified enough to give such advice, being a recent graduate myself, but I’ll try to write down things that I learned from my own experience.
Invest time in your math and statistics courses, because you’re going to need them. Take on a side project, which might give you a chance to learn some new programming concepts and introduce you to interesting datasets. Do your homework and don’t be afraid to ask questions whenever you don’t understand something in a lecture: the best time to learn the basics is now, and it’s much harder to fill holes in your knowledge later than to learn everything the right way from the beginning.
4. What project would you go back to and change? How would you change it?
Most of them! I often catch myself looking back at a project I did a couple of years ago and wishing I had known then what I know now. The most recent one is my master’s thesis. I wish I had tried out some things I didn’t have time for, but I hope I’ll manage to find some time to work on it further in the next couple of months.
5. How do you go about scoping a data science project?
Usually when I’m faced with a new dataset, I get very excited about it and can’t wait to dig into it, which gets in the way of all the planning that should have been done beforehand. I hope I’ll manage to become more patient about it with time and learn to do it the “right” way.
One of the things that I find a bit limiting about the industry is that you often have to decide whether something is worth the effort of trying it out, since there are always deadlines you need to meet. Therefore, it is very important to have a clear final goal right from the beginning. However, you also need to be flexible: things on the end user’s side might change along the way, and you have to be prepared to adapt to the user’s needs accordingly.
6. What do you wish you knew earlier about being a data scientist?
That you don’t spend all of your time doing the fun stuff! A lot of the work data scientists do goes into getting the data, putting it into the right format, cleaning it up, battling different encoding issues, writing tests for the code you wrote, etc. When you sum everything up, you spend only a part of your time doing the actual “data science magic”.
7. What is the most exciting thing you’ve been working on lately?
We are a small team of data scientists at Wise working on many interesting projects. I am mostly involved with natural language processing tasks, since that is the field I’m planning to do my PhD in starting this fall. My most recent project is on expanding customer service support to multilingual datasets, which can be quite challenging considering the highly skewed language distribution (80% English, 20% all other languages) in the majority of the datasets we are dealing with.
8. How do you manage learning the ‘soft’ skills and the ‘hard’ skills? Any tips?
Learning the hard skills requires a lot of time, patience, and persistence, and I highly doubt there is a golden formula for it. You just have to read a lot of books and papers, talk to people who are smarter and/or have more experience than you, and be patient, because it will all pay off.
Soft skills, on the other hand, somehow come naturally to me. I’m quite an open person and I’ve never had problems talking to people. However, if you do have problems with it, I suggest you take a deep breath, try to relax, focus, and tell yourself that the people you are dealing with are just humans like you, with their good and bad days, their strengths and imperfections. I believe that picturing things this way takes a lot of pressure off your chest and gives you the opportunity to think much more clearly.