Interviews with Data Scientists: NLP for the win


Recently I decided to do some quick data analysis of my interviews with data scientists.

When you collect a lot of data, it seems natural to explore it and run some analysis on it.

You can access the code here.
The code isn’t very in-depth, but it is a simple example of how to use NLTK and a few other Python libraries to do some quick analysis of ‘unstructured’ data.

First question:

What does a word cloud of the data look like?

Word cloud of my corpus, based on interviews published on Dataconomy

Here we can see that words such as science, PhD and big all pop up a lot, which is not surprising given the subject matter.
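A word cloud is essentially a visual frequency table: the more often a word occurs, the larger it is drawn. As a rough sketch of that idea (not the code behind the figure above), the function below maps word counts to font sizes. The whitespace tokenization and the linear scale between the hypothetical `min_size` and `max_size` parameters are assumptions; libraries such as `wordcloud` handle the scaling and the layout themselves.

```python
from collections import Counter


def font_sizes(text: str, min_size: int = 10, max_size: int = 60) -> dict[str, int]:
    """Map each word to a font size proportional to its frequency.

    The linear scaling used here is a hypothetical choice for illustration.
    """
    # Naive tokenization: lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others scale linearly by count / top.
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in counts.items()
    }


# An invented sample string, not text from the interviews.
print(font_sizes("data science data phd big data science"))
# {'data': 60, 'science': 43, 'phd': 27, 'big': 27}
```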

Next, I used NLTK to do some word-frequency analysis. First, I removed stop words and punctuation.
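A minimal, standard-library-only sketch of this step might look like the following. It assumes simple whitespace tokenization and a small hand-written stop-word list, both stand-ins for what NLTK provides, and it is not the original code linked above. NLTK’s `FreqDist` is a subclass of `collections.Counter`, so the counting part would look much the same with it.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; NLTK ships a much fuller
# English list via nltk.corpus.stopwords.words("english").
STOP_WORDS = {"a", "about", "an", "and", "from", "i", "in", "into", "is",
              "it", "my", "of", "the", "to", "with"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tokens(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    tokens = text.lower().translate(PUNCT_TO_SPACE).split()
    return [t for t in tokens if t not in STOP_WORDS]


def top_words(documents: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Count cleaned tokens across all documents and return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(clean_tokens(doc))
    return counts.most_common(n)


# Two invented sentences standing in for interview text (not real quotes).
interviews = [
    "Data science is about asking good questions of data.",
    "I moved from a PhD into data science.",
]
print(clean_tokens(interviews[1]))  # ['moved', 'phd', 'data', 'science']
print(top_words(interviews, 2))     # [('data', 3), ('science', 2)]
```

Replacing punctuation with spaces is crude: “isn’t” written with an ASCII apostrophe becomes “isn” and “t”. A proper tokenizer such as NLTK’s `word_tokenize` handles cases like this more carefully.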

I got the following result. Unsurprisingly, the most common word was data, followed by science. However, the other words are of more interest, since they indicate what professional data scientists talk about when discussing their work.

Source: all interviews I published on Dataconomy up to the end of last week, which was the end of September 2015.

Bar chart of the most frequent words in the corpus (stop words and punctuation removed)
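For a quick look at a chart like this without a plotting library, here is a small text-based sketch that draws one bar per (word, count) pair, scaled to the largest count. The function name, the `width` parameter and the input pairs are illustrative assumptions, not the real counts from the corpus or the code used to draw the figure; for a graphical version, matplotlib’s `pyplot.barh` would be the usual choice.

```python
def ascii_bar_chart(pairs: list[tuple[str, int]], width: int = 40) -> list[str]:
    """Render (word, count) pairs as text bars scaled to the largest count."""
    if not pairs:
        return []
    top = max(count for _, count in pairs)
    # Pad every label to the longest word so the bars line up.
    label_width = max(len(word) for word, _ in pairs)
    lines = []
    for word, count in pairs:
        bar = "#" * round(width * count / top)
        lines.append(f"{word.ljust(label_width)} | {bar} {count}")
    return lines


# Made-up illustrative counts, not the real results.
for line in ascii_bar_chart([("data", 3), ("science", 2), ("phd", 1)], width=9):
    print(line)
# data    | ######### 3
# science | ###### 2
# phd     | ### 1
```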
