
SciPy 2013: Day Four/Five

Day Four: Talking, myself and so many others talking…

I’ve been to several poster sessions since going back to school two summers ago. However, this was the first session that I presented in. My poster was for Scholarly. Scholarly is a personal research project involving the network of scholarly citations that started off as a flop and has grown into a little monster. Feel free to read more about it on my current projects page, visit the GitHub repository, or download the poster. It was nice to see so many people interested in the project. Almost everyone instantly realized the importance this dataset will have, and many asked the question: when can we access the data? Unfortunately, I don’t have a solid answer to that question yet. But we are making progress, and our data collection servers should be on the web soon! Now, on to what other people spoke about.

Thursday was, well, more talks. By the end of the day I was quite exhausted. While many of them were incredibly interesting, I guess I’ve discovered my limit for sitting and listening to people while I clack away at a computer: three days.

One talk I found of particular interest was Analyzing IBM Watson experiments with IPython Notebook by Torsten Bittner from IBM. I won’t go into all of the details, but the Watson team was able to cut the ~8,350 SLOC of Java used for training and testing Watson down to ~220 SLOC in an IPython notebook. And to make this even more impressive, it ran faster in the notebook. And with that, I’ll move into day five.

Day Five: Let the sprints commence!

Sprints? At a software conference? If you’re picturing 100 sweaty programmers running down hallways for fresh coffee, you’re a bit off. There is coffee and, occasionally, sweat, but very, very little running.

The sprints are an opportunity for open source projects to get people involved regardless of their skill level. It’s a pretty cool experience. You’re surrounded by programmers from all backgrounds, some legendary, most you’ve never heard of before, but all of whom are very approachable and patient.

After the morning pre-sprint session, all of the projects spread out into various rooms and went to work. I decided to try something a bit different today.

With all of this ranting and raving I’ve been doing about IPython notebooks, I decided I needed to see what I could do with one. Here is the result. I was able to successfully use the notebook while benchmarking some search queries in a MongoDB instance populated with fake citation data. It wasn’t cumbersome to do so, and documenting as I went along was actually quite enjoyable.
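
For a rough idea of what that kind of benchmarking looks like, here is a minimal sketch using Python's standard timeit module. Everything in it is an assumption made for illustration: the in-memory list of fake records stands in for the MongoDB collection, and the names make_fake_citations, query_by_year, and benchmark are hypothetical, not taken from the actual notebook.

```python
import random
import timeit


def make_fake_citations(n, seed=0):
    """Build n fake citation records (a hypothetical stand-in for the real data)."""
    rng = random.Random(seed)
    return [
        {"id": i, "year": rng.randint(1950, 2013), "cites": rng.sample(range(n), k=min(3, n))}
        for i in range(n)
    ]


def query_by_year(records, year):
    """A simple linear-scan 'search query': every record published in a given year."""
    return [r for r in records if r["year"] == year]


def benchmark(func, *args, repeat=5, number=10):
    """Return the best observed per-call time in seconds."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # The minimum is usually the least noisy estimate of the achievable time.
    return min(timings) / number


records = make_fake_citations(10_000)
best = benchmark(query_by_year, records, 2001)
print(f"best per-call time: {best:.6f} s")
```

Swapping query_by_year for a function that calls a real database client would leave the timing harness unchanged, which is part of why this pattern sits comfortably in a notebook cell.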

Time for a beer.

-H.


SciPy 2013: Day Three

Opening Remarks:

With tutorials concluding yesterday, the talks began today. This is the largest SciPy to date. The registration increase of ~75% over last year was easily noticeable at the opening remarks as I sat in a packed room of fellow coders and scientists. Co-hosts Andy and John announced that the primary themes for this year’s conference are reproducibility and machine learning before introducing the keynote speaker, Fernando Perez.

Keynote: Fernando Perez of IPython — IPython: from the shell to a book with a single tool – The method behind the madness

As you would expect from Fernando, the talk was fast paced, informative, and enjoyable. The real icing on the cake, however, was the delivery: an IPython Notebook slide show. I’ve already gone into how excited I am about the IPython notebook; I think it will make an awesome medium for teaching CS students. The slide show only further enhances the usefulness of the IPython tool set. -steps down from soapbox- Anyway, I’ve tried to summarize the points from Fernando’s talk that I found to be of interest.

After a brief set of opening remarks about the amazing spirit of the community and the phases of the research life cycle, Fernando explained some of IPython’s major milestones:

  • 2001 – First version of IPython (it was only 259 lines of code!). Its primary goal was to provide a better interactive Python shell.
  • 2004 – Interactive plotting with matplotlib.
  • 2005 – Interactive parallel computing.
  • 2007 – IPython embedding in Wx apps.
  • 2010 – An improved shell and a protocol to go along with it.
  • 2010 – After 5 attempts, a sixth leads to what we now know as IPython notebook.
  • 2010 – Sharing notebooks with zero-install via nbviewer
  • 2012 – Reproducible research with IPython.parallel and StarCluster
  • 2012 – IPython notebook-based technical blogging
  • 2013 – The first White House hackathon (IPython and NetworkX go to DC)
  • 2013 – IPython notebook-based books: “Literate Computing” Probabilistic Programming and Bayesian Methods for Hackers.

Continuing, Fernando explained many of the lessons he’s learned since starting the project, highlighted alternative use cases written around IPython and IPython notebooks, thanked the community, and gave us some ideas about what lies ahead for IPython (1.0 in a few weeks!). Once the video becomes available, I will be sure to add it here, and I highly recommend you watch it!

“The purpose of computing is insight, not numbers” — Hamming ’62

-H.


SciPy 2013: Day Two

Tutorial Three: An Introduction to scikit-learn (I) – Gaël Varoquaux, Jake Vanderplas, Olivier Grisel

For a long time I’ve been very curious about machine learning. Up to this point it has seemed to me much like a mystical unicorn: really cool, but you never really know much about it. This tutorial provided me with an excellent chance to break that mysticism down.

After a brief introduction to scikit-learn and a refresher on numpy/matplotlib we used IPython notebooks to walk through basic examples of what the suite is capable of. We then moved into a quick overview of what machine learning is and some common tactics for tackling data analysis. Now that we were a bit more familiar with the suite itself and machine learning principles, we moved onto more complex examples.

Again using IPython notebooks, we walked through examples of supervised learning (classification and regression), unsupervised learning (clustering and dimensionality reduction), and using PCA for data visualization. We ended the morning session with a couple of more advanced supervised learning examples (recognizing handwritten digits and predicting Boston house prices based on various factors) and an advanced unsupervised learning example in which we analyzed over 20,000 text articles to determine from which of four categories they likely originated.

One note for further research: How much data should be used for train vs test data? What factors play a role in this and are there any common standards or practices which researchers follow?
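
As a starting point for that question, here is a minimal sketch of a shuffled holdout split in plain Python. The function name holdout_split, the 25% test fraction, and the fixed seed are assumptions chosen for illustration and were not part of the tutorial; scikit-learn ships its own train_test_split helper for the same job.

```python
import random


def holdout_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of samples and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))
train, test = holdout_split(data, test_fraction=0.25)
print(len(train), len(test))  # prints "75 25"
```

The fraction is the knob the question is really about: a larger test set gives a steadier estimate of performance but leaves less data to learn from, so the right value likely depends on how much data there is to begin with.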

Tutorial Four: Statistical Data Analysis in Python – Christopher Fonnesbeck

Statistics is an area of interest for me. Combine that interest with Python’s pandas and you’ve got an instant winner, right? Not exactly. While the talk was tagged for beginners, it proved to be otherwise.

The speaker clearly had a very strong background in statistics. However, that background didn’t translate into an easy-to-follow talk. The statistics language was very far above me and, if my observations were correct, most of the room. Additionally, the version of pandas he used wasn’t the same as the version in the required packages noted in the talk’s description. This resulted in the majority of us not being able to follow along in IPython notebooks and being forced to watch him on the projector.

Please don’t mistake this for a whine session. Chris knew his stuff: he was able to answer everyone’s questions and smashed some ‘stump the chump’ attempts without batting an eye. But the talk should have been refined and rehearsed, and the versions of the required packages should have been vetted earlier.

You can’t win them all, right?

-H.


SciPy 2013: Day One

Registration:

This year I am fortunate enough to be able to attend SciPy. SciPy is a Python conference focused on scientific programming. A big shout out to the Center for Open Science for making this trip a possibility. This will be the first of a (hopefully) daily blog series in which I will briefly cover how my day went and any lasting impressions it left me with.

The conference organizers made checking in quick and painless. We were served a breakfast buffet that was surprisingly good. Of particular interest to me were the scrambled egg mini-bagels and lemon poppyseed bread slices (yum). Breakfast was followed by a series of tutorials which registrants chose in advance.

Tutorial One: Guide to Symbolic computing with SymPy — Ondřej Certik, Mateusz Paprocki, Aaron Meurer

The SymPy tutorial was … interesting. We simultaneously listened to the lecturer discuss and demonstrate common SymPy functions while completing examples in various IPython notebooks that they provided us with. It was fast paced, too fast for me anyway, and we had to skip over a lot of material.

The tutorial docs recommended experience with IPython and ‘basic mathematics.’ However, I was quite surprised by how far my definition of basic mathematics was from theirs. Unfortunately, this left me struggling to keep up with the tutorial even from early on. After a mid-session break, we briefly covered calculus functionality before being introduced to some real world applications. This is where I became utterly lost.

SymPy’s original lead developer showed us several examples of his use of SymPy while preparing his dissertation in Chemical Physics. These included Poisson Equations, Fast Fourier Transforms (FFTs), and Variation with Lagrange multipliers. Don’t know what some (any) of those are? It’s okay, neither do I. On a side note, they did show an interesting ‘hack’ using JavaScript injection in an IPython notebook which allowed them to manipulate 3D figures.

While the tutorial itself felt a bit unpolished, the instructors knew their stuff. All in all, SymPy seems like a really interesting tool which I plan to use. When combined with IPython notebooks, I believe it could create very powerful, long-lasting notes for a variety of math-intensive classes. I’ll be testing this out next semester in Physics.

Tutorial Two: IPython in depth — Fernando Perez, Brian Granger

Anyone who has listened to either Fernando or Brian could have told you that this tutorial was going to be good. It was. They provided a solid tutorial environment with IPython notebooks that kept me feeling like I was actively working with them throughout the entire tutorial. Whenever anyone had a question, they knew how to answer it quickly and concisely.

A few things I found of particular interest:

  1. IPython Notebook: If you haven’t heard of this, click the link and check it out. I’m not kidding. This is a versatile web tool that is incredibly powerful. For some cool examples of what people have done with the notebook (including writing a book!) click here.
  2. Awesome help functionality: With IPython’s built-in help functionality [ ?, ??, %quickref, and %magic ] you can quickly get a syntax-highlighted help description, the source for a module, or even access a nifty quick reference guide, mostly eliminating the need to pop out of a notebook or console and visit online docs.
  3. Kickass debugger: IPython’s shell is amazing. But I’ve found myself using PyCharm to debug more advanced bits of code. After learning about IPython’s magic %debug and %run -d theprogram.py, that may have changed. They provide you with very powerful and easy-to-use debugging abilities I wasn’t even aware existed.
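
To make items 2 and 3 a bit more concrete, here is a rough plain-Python sketch of the kind of information those magics surface, built only on the standard library’s inspect and pdb modules. It is an approximation for illustration, not how IPython itself implements ?, ??, or %debug, and the helper names short_help, full_source, failing_frame, and divide are hypothetical.

```python
import inspect
import json
import pdb
import sys


def short_help(obj):
    """Roughly what `obj?` surfaces: the signature plus the first docstring line."""
    doc = inspect.getdoc(obj) or ""
    first_line = doc.splitlines()[0] if doc else "(no docstring)"
    return f"{obj.__name__}{inspect.signature(obj)}\n{first_line}"


def full_source(obj):
    """Roughly what `obj??` surfaces: the source code, when it is available."""
    return inspect.getsource(obj)


def failing_frame(func, *args, interactive=False):
    """Call func; if it raises, return the name of the function where it failed."""
    try:
        func(*args)
    except Exception:
        tb = sys.exc_info()[2]
        if interactive:
            # Opens a debugger at the point of failure, similar in spirit to %debug.
            pdb.post_mortem(tb)
        # Walk to the innermost traceback entry, where the exception was raised.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return tb.tb_frame.f_code.co_name
    return None


def divide(a, b):
    return a / b


print(short_help(json.dumps))
print(full_source(json.dumps)[:60])
print(failing_frame(divide, 1, 0))
```

Inside IPython the same lookups are a couple of keystrokes (json.dumps? or json.dumps??), and typing %debug right after a traceback drops you into the failing frame without any of this scaffolding.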

Day one down. Time for sleep.

-H.
