Data Science tools and processes


I’ve recently been experimenting with some Data Science tools and methodologies.

The first link is "Data Products how do we get there", which discusses the methodologies people in the data science world use. I personally use one not covered there called OSEMN (Obtain data, Scrub data, Explore data, Model data, Interpret results). Still, the link is interesting. I've used CRISP-DM in a project as well. I found that CRISP-DM suited a more report-based and process-based culture, whereas OSEMN let me work in a more agile environment.

One of the challenges I find is choosing the right tools to disseminate your ideas. So recently I've been learning how to use Flask and Jinja2 (for emails and automated reports), but I also came across an easier solution: runipy, which can be used for report automation as well. It integrates well into my IPython reporting workflow, and together with a cron job it could be very powerful, say if you need to regularly produce a report for a metrics deck or something similar. An advantage of this sort of workflow is that it is reproducible and debuggable.
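As a rough illustration of the runipy-plus-cron idea, here is a minimal sketch of a wrapper script. It assumes runipy is installed and on the PATH, and that its `--html` option is used to write an HTML copy of the executed notebook. The notebook name, output directory and function names are hypothetical, made up for this example.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_runipy_command(notebook, output_dir, run_date):
    """Build the runipy command that runs `notebook` and writes a dated HTML report."""
    stem = Path(notebook).stem
    # Hypothetical naming scheme: <notebook name>-<ISO date>.html
    html_path = Path(output_dir) / f"{stem}-{run_date.isoformat()}.html"
    return ["runipy", str(notebook), "--html", str(html_path)]


def run_report(notebook, output_dir):
    """Run the notebook through runipy and return the path of the HTML report.

    Raises subprocess.CalledProcessError if runipy exits with an error,
    so a failing notebook shows up as a failed cron job.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_runipy_command(notebook, output_dir, date.today())
    subprocess.run(command, check=True)
    return command[-1]


if __name__ == "__main__":
    # Hypothetical notebook and output folder.
    print(run_report("metrics.ipynb", "reports"))
```

A crontab entry along the lines of `0 7 * * 1 python /path/to/run_report.py` (the path is a placeholder) would then produce the report every Monday at 07:00. Because each run re-executes the whole notebook, a broken report can be debugged by opening the same notebook interactively.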

Python is getting much better tooling for these reporting challenges, which is a sign that the Python stack keeps improving. Unfortunately we're not quite at Shiny or knitr level, but we're getting there.
