The insightful Data Scientist Trey Causey talks about Software Development Skills for Data Scientists. I’m going to write about my views on Code Review, as a Data Scientist with a few years’ experience delivering Data Products at organizations of varying sizes. I’m not perfect and I’m still maturing as an Engineer.
A good, thorough introduction to Code Review comes from the excellent team at Lyst. I suggest that as follow-up reading!
The fundamental nugget is that ‘code reviews allow you to more effectively collaborate with your peers’, and a lot of new Engineers and Data Scientists don’t know how to do that. This is one reason why I wrote ‘soft skills for data scientists’. This article talks about a technical skill, but I consider it a kind of ‘technical communication’.
Here are some views on ‘why code review’. I share them here as a reference, largely to remind myself. I steal a lot of these from this video series.
- Peer to peer quality engineering and training
Data Science is still forming as a community, and we come from various backgrounds, so there’s a lot of invaluable knowledge held by others in the team. Don’t waste your chance to pick it up 🙂
- Catches bugs easily
We all write bugs when we write code, and a second pair of eyes often spots the ones the author has stopped seeing.
- Keeps team members on the same page
- Domain knowledge
How do we share knowledge about our domain with others without sharing code?
- Project style and architecture
I’m a big believer in using structured projects like Cookiecutter Data Science, and I’m sure alternatives exist in other languages. Beforehand I had a messy workflow of hacked-together IPython notebooks and no idea what was what. Refactoring code into modules is a good practice for a reason 🙂
- Programming skills
I learn a lot myself by reading other people’s code. That is a lot of the value of being part of an open source project like PyMC3 🙂
Other good practices
I think it’s a good idea (I think Roland Swingler mentioned this to me) not to obsess too much about style. Having a linter do that is better; otherwise code reviews can become overly critical and pedantic. This can stop people sharing code and leads to criticism that can shake Junior Engineers in particular, who need psychological safety. As I mature as an Engineer and a Data Scientist I’m aware of this more and more 🙂
Keep code reviews small
- Under 20 minutes of review time and under 100 lines of code is best
- Large code reviews make it harder to give useful suggestions and can lead to bikeshedding
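One way to make the "under 100 lines" guideline concrete is a small check on the size of a diff before you ask for a review. This is only a minimal sketch under my own assumptions: the function names `count_changed_lines` and `is_reviewable`, the sample diff, and the idea of feeding it unified diff text (for example the output of `git diff`) are all hypothetical illustrations, not part of any tool mentioned above.

```python
def count_changed_lines(diff_text: str) -> int:
    """Count added and removed lines in a unified diff.

    Simplification (assumption): any line starting with '+++' or '---'
    is treated as a file header, so a changed line whose own content
    starts with '++' or '--' would be missed.
    """
    changed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def is_reviewable(diff_text: str, max_lines: int = 100) -> bool:
    """Return True when the diff stays under the line budget."""
    return count_changed_lines(diff_text) < max_lines


# Hypothetical sample diff, used only for illustration.
SAMPLE_DIFF = """\
--- a/model.py
+++ b/model.py
@@ -1,3 +1,4 @@
 import numpy as np
-def fit(x):
+def fit(x, y):
+    # hypothetical change
     return x
"""

if __name__ == "__main__":
    print(count_changed_lines(SAMPLE_DIFF), is_reviewable(SAMPLE_DIFF))
```

The threshold is a default argument rather than a hard rule, since the point is to prompt a conversation about splitting a change up, not to block anyone.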
These are my own lessons so far and are based on experience writing code as a Data Scientist – I’d love to hear your views.