Interview with a Data Scientist – Jessica Graves


Jessica Graves is a Data Scientist who currently works on fashion problems in New York City. She’s worked with Hilary Mason at Fast Forward Labs and keeps in regular contact with the London startup scene. After many months of asking her for an interview she finally gave in – and she shares her unique perspective on the datafication of Fashion. She comes from a background in visual and performing arts, as well as fashion design. In her spare time you’ll find her reading a stack of papers or studying dance.

Cover image: CCO

Jessica Graves_02-1

  1. What project have you worked on do you wish you could go back to, and do better?

I worked with Dr. Laurens Mets on an iteration of the technology behind Electrochaea, a device where microbes convert waste electricity to clean natural gas. My job was to translate models from electrochemistry journals into code, to help simulate, measure and optimize the parameters of the device. We needed to facilitate electron transport and keep the microbes happy. Read papers, write code, and design alternative energy technology with math + data?! I would hand my past self How to Design Programs as a guide and learn to re-implement from scratch in an open source language. 

  1. What advice do you have to younger analytics professionals and in particular PhD students in the Sciences?

Listen! If you are a data scientist, listen carefully to the business problems of your industry, and see the problems for what they are, rather than putting the technical beauty of and personal interest in the solution first and foremost. You may find it’s more important to you to work with a certain type of problem than it is to work at a certain type of company, or vice versa. Watch very carefully when your team expresses frustration in general – articulate problems that no one knows they should be asking you to solve. At the same time, it can be tempting to work on a solution that has no problem. If you’re most interested in a specific machine learning technique, can you justify its use over another, or will high technical debt be a serious liability? Will a project be leveragable (legally, financially, technically, operationally)? Can you quantify the risk of not doing a project? 

  1. What do you wish you knew earlier about being a data scientist?

I wish I realized that data science is classical realist painting.

Classical realists train to accurately represent a 3D observation as a 2D image. In the strictest cases, you might not be allowed to use color for 1-3 years, working only with a stick of graphite, graduating to charcoal and pencils, eventually monotone paintings. Only after mastering the basics of form, line, value, shade, tone, are you allowed a more impactful weapon, color. With oil painting in particular, it matters immensely in what order at what layer you add which colors, which chemicals compose each color, of which quality pigment, at what thickness, with what ratio of which medium, with which shape of brush, at what angle, after what period of drying. Your primary objective is to continuously correct your mistakes of translating what you observe and suspending your preconception of what an object should look like.

There are many parallels with data science. At no point as a classical realist painter should you say, ‘well it’s a face, so I’m going to draw the same lines as last time’ just like as a data scientist, you should look carefully at the data before applying algorithm x, even if that’s what every blog post Google surfaces to the top of your results says to do in that situation. You have to be really true to what you observe and not what you know – sometimes a hand looks more like a potato than a hand, and obsessing over anatomical details because you know it’s a hand is a mistake. Does it produce desirable results in the domain of problems that you’re in? Are you assuming Gaussian distributions on skewed data? Did you go directly to deep learning when logistic regression would have sufficed? I wish I knew how often data science course offerings are paint by numbers. You won’t get very far once the lines are removed, the data is too big to extract on your laptop, and an out-of-memory error pops up running what you thought was a pretty standard algorithm on the subset you used instead. Let alone that you have to create or harvest the data set in the first place – or sweet talk someone into letting you have access to it.  

In addition, Nulla dies sine linea – it’s true for drawing, ballet, writing. It’s true for data science. No day without a line. It’s very difficult to achieve sophistication without crossing off days and days of working through code or theoretical examples (I think this is why Recurse Center is so special for programmers). Sets of bland but well-executed tiny piece of software. Unspectacular, careful work in high volumes raises the quality of all subsequent complex works. Bigger, slower projects benefit from myriads of partially explored pathways you already know not to take.

Also side notes to my past self: Linux. RAM. Thunderbolt ports. 

  1. How do you respond when you hear the phrase ‘big data’?

Big data? Like in the cloud? Or are we in the fog now? Honestly the first thing I see in my mind is PETABYTES. I think of petabytes of selfies raining from the sky and flowing into a data lake. Stagnant. Data-efficient AI is all the rage — less data, more primitives, smarter agents. In the meantime, optimizing hardware and code to work with large datasets is pretty fun. Fetishizing the size of the data works well …as long as you don’t care about robustness to diverse inputs. Can your algorithm do well with really niche patterns? What can you do with the bare minimum amount of data? 

  1. What is the most exciting thing about your field?

Fashion is visual. It’s inescapable. Every culture has garb or adornment, however minimal. A few trillion dollars of apparel, textiles, and accessories across the globe. The problems of the industry are very diverse and largely unsolved. A biologist might come to fashion to grow better silk. An AI researcher might turn to deep learning to sift through the massive semi-structured set of apparel images available online. So many problems that may have a tech solution are unsolved. Garment manufacturing is one of the most neglected areas of open source software development. LVMH and Richemont don’t fight over who provided the most sophisticated open-source tools to researchers the way that Amazon and Google do. You can start a deep learning company on a couple grand and use state-of-the-art software tools for cheap or free. You cannot start an apparel manufacturing vertical using state-of-the-art tools without serious investment, because the climate is still extremely unfavorable to support a true ecosystem of small-scale independent designers. The smartest software tools for the most innovative hardware are excessively expensive, closed-source, and barely marketed — or simply not talked about in publically accessible ways. Sewing has resisted automation for decades, although is finally now at a place now were the joining of fabrics into a seam is robot-automatable with computer vision used on a thread-by-thread basis to determine the location of the next stitch. 

High end, low end, or somewhere in between, the apparel side of fashion’s output is a physical object that has to be brought to life from scratch, or delivered seamlessly, to a human, who will put the object on their body. Many people participate in apparel by default, but the fashion crowd is largely self-selected and passionate, so it’s exciting (and difficult) to build for such an engaged group that don’t fit standard applications of standard machine learning algorithms.

  1. How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations etc. How do you know what is good enough?

Artists learn this eventually: volume of works produced trumps perfectionism. Even to match something in classical realism, you start with ridiculous abstractions. Cubes and cylinders to approximate heads and arms. Break it down into the smallest possible unit. Listen to Polya, “If you can’t solve a problem, then there is an easier problem you can solve: find it.”

As for when to finish? Nothing is never good enough. The thing that is implemented is better than the abstract, possibly better thing, for now, and will probably outlive its original intentions. But make sure that solution correlates thoroughly with the problem, as described in the words of the stakeholder. Otherwise, for a consumer-facing product or feature, your users will usually give you clues as to what’s working. 

  1. You spent sometime as a Consultant in Data Analytics. How did you manage cultural challenges, dealing with stakeholders and executives? What advice do you have for new starters about this?

Be open. Fashion has a lot of space for innovation if you understand and quantify your impact on problems that are actually occurring and costing money or time, and show that you can solve them fast enough. “We built this new thing” has absolutely nothing to do with “We built this useful thing” and certainly not “We built this backwards-compatible thing”. You might be tempted to recommend a “new thing” and then complain that fashion isn’t sophisticated enough or “data” enough for it. As an industry that in some cases has largely ignored data for gut feelings with a serious payoff, I think the attitude should be more of pure respect than of condescension, and of transitioning rather than scrapping. That or build your own fashion thing instead of updating existing ones.  

  1. You have worked in fashion. Can you talk about the biggest opportunities for data in the fashion industry. Are there cultural challenges with datafication in such a ‘creative industry’.

Fashion needs ‘datafication’ that clearly benefits fashion. If you apply off-the-shelf collaborative filtering to fashion items with a fixed seasonal shelf life to users that never really interact with, you’re going to get poor results. Algorithms that work badly in other domains might work really well in fashion with a few tweaks. NIPS had an ecommerce workshop last year, and KDD has a fashion-specific workshop this year, which is exciting to see, although I’ll point out that researchers have been trying to solve textile manufacturing problems with neural networks since the 90s.

A fashion creative might very well LOVE artificial intelligence, machine learning, and data science if you tailor your language into what makes their lives easier. Louis Vuitton uses an algorithm to arrange handbag pattern pieces advantageously on a piece of leather (not all surfaces of the leather are appropriate for all pattern pieces of the handbag) and marks the lines with lasers before artisans hand-cut the pieces. The artisans didn’t seem particularly upset about this. 

The two main problems I still see right now are the doorman problem and fit. Use data and software to make it simple for designers of all scales to adjust garments to fit their real markets instead of their imagined muses. And, use as little input as possible to help online shoppers know which existing items will fit. Once they buy, make sure they get their packages on time, securely, discreetly. 


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s