A map of the PyData Stack

A common question when you use Python is: what do I do with my data? How do I process and analyze it? This flow chart aims to provide a simple-to-use ‘map’ of the PyData stack.

At PyData Amsterdam I’ll present this and explain it in more detail, but I hope this helps.


Thanks to Thomas Wiecki, Matt Rocklin, Stephan Hoyer and Rob Story for their feedback and discussion over the last year about this kind of problem. There’ll be a few iterations based on their feedback.

CC-0 (Creative Commons-0) 2016 Peadar Coyle


(I’ll share the source file eventually).


One thought on “A map of the PyData Stack”

  1. This looks great! General comments:

    The “scientific data” category seems like it could overlap with tabular or array data. Indeed, the two projects listed, xarray and bcolz, fall into these two separate categories.

    Xarray has two benefits: labeled axes and out-of-core computation. There are lots of in-memory cases where you would want to use xarray, and lots of out-of-core cases where you would want to avoid it and just use the underlying dask.array library (e.g. machine learning).

    Castra was an experiment. It’s not a super-stable format and I don’t promise to maintain it. It’s neat and performant; I just like to provide disclaimers whenever possible.

    MRJob, PySpark, and Storm are all notably absent. Not sure if you wanted to bring those in or not.
