Why Code review? Or why should I care as a data scientist.

Standard

The insightful Data Scientist Trey Causey talks about Software Development Skills for Data Scientists I’m going to write about my views on Code Review – as a Data Scientist with a few years experience, and experience delivering Data Products at organizations of varying sizes. I’m not perfect and I’m still maturing as an Engineer.

A good thorough introduction to Code Review comes from the excellent team at Lyst I suggest that as follow up reading!

The fundamental nugget is that ‘code reviews allow you to more effectively collaborate with your peers‘ and a lot of new Engineers and Data Scientists don’t know how to do that. This is one reason why I wrote ‘soft skills for data scientists‘. This article talks about a technical skill but I consider this a kind of ‘technical communication’.

Here are some views on ‘why code review’ – I share them here as reference, largely to remind myself. I steal a lot of these from this video series.

  • Peer to peer quality engineering and training 

As a Data Science community that is forming – and with us coming from various backgrounds there’s a lot of invaluable knowledge from others in the team. Don’t waste your chance at getting that 🙂

  • Catches bugs easily

There are many bugs that we all write when we write code.

Keeps team members on the same page

  • Domain knowledge 
    How do we share knowledge about our domain to others without sharing code?
  • Project style and architecture
    I’m a big believer in using structured projects like Cookiecutter Data Science and I’m sure there exist alternatives in other languages. Before hand I had a messy workflow like hacked together IPython notebooks and no idea what was what – refactoring code into modules is a good practice for a reason 🙂
  • Programming skills
    I learn a lot myself by reading other peoples code – a lot of the value of being part of an open source project like PyMC3 – is that I learn a lot from reading peoples code 🙂

Other good practices

  • PEP8 and Pylint (according to team standards)
  • Code review often, but by request of the author only

I think it’s a good idea (I think Roland Swingler mentioned this to me)

To not obsess too much about style – having a linter doing that is better, otherwise code reviews can become overly critical and pedantic. This can stop people sharing code and leads to criticism that can shake Junior Engineers in particular – who need psychological safety. As I mature as an Engineer and a Data Scientist I’m aware of this more and more 🙂

Keep code small

  • < 20 minutes, < 100 lines is best
  • Large code reviews make suggestions harder and can lead to bikeshedding

These are my own lessons so far and are based on experience writing code as a Data Scientist – I’d love to hear your views.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s