3 tips for successful Data Science Projects

I’ve been doing Data Science projects, delivering software and doing Mathematical modelling for nearly 7 years (if you include grad school).

I really don’t know everything, but these are a few things I’ve learned.

Consider this a sort of ‘Joel Test’ for Data Science.

  1. Use a reproducible framework like Cookiecutter Data Science. My workflow used to be: open an IPython notebook, forget to name things correctly, and later discover messy, badly written code 🙂 I’ve now turned to a project structure like Cookiecutter. This has helped me write better, more maintainable code, and it reminds me to document things and make my work reproducible.
  2. Have a spec for every data science project. All projects should start with a spec agreed between the business stakeholder and the project team. This forces people to clarify what they really want. The spec should state a ‘goal’. Just to clarify: I mean a well-defined goal that is Specific, Measurable, Achievable, Realistic and Time-bound (SMART).
  3. Make sure your stakeholders are realistic about the ‘failure’ aspect of R&D. One of the anti-patterns I’ve encountered in Data Science is immature stakeholders not realizing that a statement like ‘this Bayesian model doesn’t work for this kind of problem’ is not an admission of incompetence but a statement of fact about the world. If organizations can’t accept that, they deserve suboptimal Data Science. R&D work is not engineering, and failures teach us something too!
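To make the first tip a little more concrete, here is a minimal sketch of what ‘having a project structure’ can look like in practice. The `LAYOUT` list and the `scaffold` function are my own hypothetical names, and the folders are only loosely modelled on Cookiecutter Data Science; the real template creates more than this, so treat it as an illustration rather than a replacement for it.

```python
from pathlib import Path
import tempfile

# Hypothetical, simplified layout loosely inspired by Cookiecutter Data Science.
# The real template contains more directories and files than this.
LAYOUT = [
    "data/raw",         # original, immutable data
    "data/processed",   # cleaned data ready for modelling
    "notebooks",        # exploratory notebooks
    "src",              # reusable, importable code
    "reports/figures",  # generated figures for write-ups
]


def scaffold(root, name):
    """Create the folders in LAYOUT under root/name and add a stub README."""
    project = Path(root) / name
    for sub in LAYOUT:
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():
        # A stub that nudges you to write the project goal down early.
        readme.write_text(f"# {name}\n\nGoal: TODO\n")
    return project


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold(tmp, "churn-model")
        for path in sorted(p.relative_to(project).as_posix() for p in project.rglob("*")):
            print(path)
```

The ‘Goal: TODO’ line in the README stub is a small nod to the second tip: the agreed goal gets written down before any modelling starts.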

What are your views? I’d love to hear them 🙂
