Interview with a Data Scientist: Brad Klingenberg

Bio

Brad Klingenberg is the Director of Styling Algorithms at Stitch Fix in San Francisco. His team uses data and algorithms to improve the selection of merchandise sent to clients. Prior to joining Stitch Fix, Brad worked with data and predictive analytics at financial and technology companies. He studied applied mathematics at the University of Colorado at Boulder and earned his PhD in Statistics at Stanford University in 2012.

1. What project that you have worked on do you wish you could go back to and do better?

Nearly everything! A common theme would be not taking the framing of a problem for granted. Even seemingly basic questions like how to measure success can have subtleties. As a concrete example, I work at Stitch Fix, an online personal styling service for women. One of the problems that we study is predicting the probability that a client will love an item that we select and send to her. I have definitely tricked myself in the past by trying to optimize a measure of prediction error like AUC.

This is trickier than it seems because there are some sources of variance that are not useful for making recommendations. For example, if I can predict the marginal probability that a given client will love any item then that model may give me a great AUC when making predictions over many clients, because some clients may be more likely to love things than others and the model will capture this. But if the model has no other information it will be useless for making recommendations because it doesn’t even depend on the item. Despite its AUC, such a model is therefore useless for ranking items for a given client. It is important to think carefully about what you are really measuring.
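To make the point about AUC concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration, not Stitch Fix's data or method: the number of clients and items, the uniformly drawn per-client "love rates", and a model that scores every item with the client's true marginal rate. The pooled AUC comes out well above 0.5, while the AUC computed within each client is exactly 0.5, because the scores are constant for a given client and every comparison is a tie.

```python
import random

def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
n_clients, n_items = 50, 20  # hypothetical sizes

labels, scores, client_ids = [], [], []
for c in range(n_clients):
    base_rate = random.uniform(0.05, 0.95)  # assumed per-client love rate
    for _ in range(n_items):
        labels.append(1 if random.random() < base_rate else 0)
        scores.append(base_rate)  # the "model" ignores the item entirely
        client_ids.append(c)

# AUC over all (client, item) pairs pooled together.
pooled_auc = auc(labels, scores)

# AUC computed separately for each client, then averaged.
per_client = []
for c in range(n_clients):
    idx = [i for i, cid in enumerate(client_ids) if cid == c]
    a = auc([labels[i] for i in idx], [scores[i] for i in idx])
    if a is not None:
        per_client.append(a)
mean_within_auc = sum(per_client) / len(per_client)

print("pooled AUC:", round(pooled_auc, 3))
print("mean within-client AUC:", mean_within_auc)
```

The within-client number is the one that matters for ranking items for a given client, which is why a pooled metric can be misleading here.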

2. What advice do you have to younger analytics professionals and in particular PhD students in the Sciences and Social Sciences?

Focus on learning the basic tools of applied statistics. It can be tempting to assume that more complicated means better, but you will be well-served by investing time in learning workhorse tools like basic inference, model selection and linear models with their modern extensions. It is very important to be practical. Start with simple things.

Learn enough computer science and software engineering to be able to get things done. Some tools and best practices from engineering, like careful version control, go a long way. Try to write clean, reusable code. Popular tools in R and Python are great for starting to work with data. Learn about convex optimization so you can fit your own models when you need to – it’s extremely useful to be able to cast statistical estimates as the solution to optimization problems.
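As a small illustration of casting an estimate as the solution to an optimization problem, the Python sketch below recovers the sample mean by minimizing a convex squared-error loss with plain gradient descent. The data, step size, and iteration count are hypothetical choices made for this example; it is meant only to show the pattern, not a recommended way to compute a mean.

```python
# Hypothetical data; any list of numbers would do.
data = [2.0, 3.5, 1.0, 4.5, 6.0, 3.0]
n = len(data)

def squared_loss_gradient(m):
    """Gradient of (1/n) * sum((x - m)^2) with respect to m."""
    return (2.0 / n) * sum(m - x for x in data)

m_hat = 0.0       # starting guess
step_size = 0.1   # assumed step size, small enough to converge for this loss
for _ in range(500):
    m_hat -= step_size * squared_loss_gradient(m_hat)

# The minimizer of the squared-error loss is the sample mean.
sample_mean = sum(data) / n
print(m_hat, sample_mean)
```

The same recipe (write down a convex loss, then minimize it) carries over to models without a closed-form answer, such as logistic regression.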

Finally, try to get experience framing problems. Talk with colleagues about problems they are solving. What tools did they choose? Why? How did they measure success? Being comfortable with ambiguity and successfully framing problems is a great way to differentiate yourself. You will get better with experience – try to seek out opportunities.

3. What do you wish you knew earlier about being a data scientist?

I have always had trouble identifying as a data scientist – almost everything I do with data can be considered applied statistics or (very) basic software engineering. When starting my career I was worried that there must be something more to it – surely, there had to be some magic that I was missing. There’s not. There is no magic. A great majority of what an effective data scientist does comes back to the basic elements of looking at data, framing problems, and designing experiments. Very often the most important part is framing problems and choosing a reasonable model so that you can estimate its parameters or make inferences about them.

4. How do you respond when you hear the phrase ‘big data’?

I tend to lose interest. It’s a very over-used phrase. Perhaps more importantly I find it to be a poor proxy for problems that are interesting. It can be true that big data brings engineering challenges, but data science is generally made more interesting by having data with high information content rather than by sheer scale. Having lots of data does not necessarily mean that there are interesting questions to answer or that those answers will be important to your business or application. That said, there are some applications like computer vision where it can be important to have a very large amount of data.

5. What is the most exciting thing about your field?

While “big data” is overhyped, a positive side effect has been an increased awareness of the benefits of learning from data, especially in tech companies. The range of opportunities for data scientists today is very exciting. The abundance of opportunities makes it easier to be picky and to find the problems you are most excited to work on. An important aspect of this is to look in places you might not expect. I work at Stitch Fix, an online personal styling service for women. I never imagined working in women’s apparel, but due to the many interesting problems I get to work on it has been the most exciting work of my career.

6. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations, etc.? How do you know what is good enough?

As I mentioned previously, it can be helpful to start framing a problem by thinking about how you would measure success. This will often help you figure out what to focus on. You will also seldom go wrong by starting simple. Even if you eventually find that another approach is more effective, a simple model can be a hugely helpful benchmark. This will also help you understand how well you can reasonably expect your ultimate approach to perform. In industry, it is not uncommon to find problems where (1) it is just not worth the effort to do more than something simple, or (2) no plausible method will do well enough to be considered successful. Of course, measuring these trade-offs depends on the context of your problem, but a quick pass with a simple model can often help you make an assessment.

7. How do you explain to C-level execs the importance of Data Science? How do you deal with the ‘educated selling’ parts of the job? In particular – how does this differ from sports and industry?

It is usually better if you are not the first to evangelize the use of data. That said, data scientists will be most successful if they put themselves in situations where they have value to offer a business. Not all problems that are statistically interesting are important to a business. If you can deliver insights, products or predictions that have the potential to help the business then people will usually listen. Of course this is most effective when the data scientist clearly articulates the problem they are solving and what its impact will be.

The perceived importance of data science is also a critical aspect of choosing where to work – you should ask yourself if the company values what you will be working on and whether data science can really make it better. If this is the case then things will be much easier.

8. What is the most exciting thing you’ve been working on lately? Tell us a bit about it.

I lead the styling algorithms team at Stitch Fix. Among the problems we work on is making recommendations to our stylists, human experts who curate our recommendations for our clients. Making recommendations with humans in the loop is a fascinating problem because it introduces an extra layer of feedback – the selections made by our stylists. Combining this feedback with direct feedback from our clients to make better recommendations is an interesting and challenging problem.

9. What is the biggest challenge of leading a data science team?

Hiring and growing a team are constant challenges, not least because there is not much consensus around what data science even is. In my experience a successful data science team needs people with a variety of skills. Hiring people with a command of applied statistics fundamentals is a key element, but having enough engineering experience and domain knowledge can also be important. At Stitch Fix we are fortunate to partner with a very strong data platform team, and this enables us to handle the engineering work that comes with taking on ever more ambitious problems.
