I’ve been doing Data Science projects, delivering software and doing Mathematical modelling for nearly 7 years (if you include grad school).
I really don’t know everything, but these are a few things I’ve learned.
Consider this a ‘Joel Test’ for Data Science.
- Use a reproducible framework like Cookiecutter Data Science. My workflow used to be: open an IPython notebook, forget to name things properly, and later discover messy, badly written code 🙂 I’ve now turned to a project structure like Cookiecutter. It has helped me write better, more maintainable code, and it reminds me to document things and make my work reproducible.
- Have a spec for every Data Science project. All projects should start with a spec agreed between the business stakeholder and the project team. This forces people to clarify what they really want. The project should have a ‘goal’, and by that I mean a well-defined goal that is Specific, Measurable, Achievable, Realistic and Time-bound: SMART.
- Make sure your stakeholders are realistic about the ‘failure’ aspect of R and D. One of the anti-patterns I’ve encountered in Data Science is immature stakeholders who don’t realize that a statement like ‘this Bayesian model doesn’t work for this kind of problem’ isn’t an admission of incompetence but a statement of fact about the world. If organizations can’t accept that, they deserve suboptimal Data Science. R and D work is not engineering, and failures teach us something too!
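
To make the first point a bit more concrete, here is a minimal Python sketch of the kind of fixed project layout I mean. The folder names and the `make_skeleton` helper are my own illustrative assumptions, loosely inspired by Cookiecutter Data Science rather than copied from its actual template, so treat it as a sketch and not as what that tool generates.

```python
from pathlib import Path

# Hypothetical, simplified layout for illustration only;
# this is NOT the official Cookiecutter Data Science template.
LAYOUT = ["data/raw", "data/processed", "notebooks", "src", "reports"]


def make_skeleton(root, name):
    """Create an empty project skeleton under root/name and return its path."""
    project = Path(root) / name
    for sub in LAYOUT:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        (project / sub).mkdir(parents=True, exist_ok=True)

    readme = project / "README.md"
    if not readme.exists():
        # A stub README nudges you to write down the agreed goal up front
        readme.write_text(f"# {name}\n\nGoal: TODO (state the agreed, measurable goal)\n")
    return project
```

Calling `make_skeleton(".", "my-project")` creates the folders and a stub README in the current directory. Starting every project from the same skeleton means raw data, notebooks and reusable code always live in the same place, which is most of what makes the work easier to reproduce.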
What are your views? I’d love to hear them 🙂