Thursday, June 26, 2014

MATLAB or Python?

If you’re new to research or looking to expand your skill set, you might be asking yourself “Should I learn Python?” I think the answer to this question is usually a resounding “Yes.”
Here are 6 reasons you may want to consider learning this up-and-coming language:
0) Python is open source
This means it’s freely available and can be modified and built upon by anyone. Open source software works like this: programmers from lots of different backgrounds contribute code, which is then reviewed and edited by other programmers and eventually approved and packaged into the Python language.
Using open source software for research has many benefits. In the same way that open access journals make published scientific papers available to everyone, open source software makes it easier for people to benefit from and contribute to scientific code. The fact that open source software is free eliminates the need for pesky and expensive institutional licenses and creates the potential for contributors who are not affiliated with a university to make tools that researchers can use. 
1) Python libraries adapt quickly
In addition to the core of the language there are also a large number of extensions called “libraries”. Libraries are similar to MATLAB’s Toolboxes except there are many more of them, they tend to be much smaller and more specific, and they are usually distributed independently. Python libraries develop very organically: someone realizes the need for a certain feature for their project, writes the code, and makes it available for anyone to download. Users then submit bug reports, suggest features, and sometimes contribute to the project directly. 
As more Python tools have become available for scientific research, more researchers have started using Python, and the quality and diversity of the tools have increased. This decentralized model allows researchers to start using new features in a matter of weeks (instead of years) and it encourages specialized research communities to work together on developing tools for new scientific endeavors.
2) Numpy, Scipy, and Matplotlib have most of what you need
These three libraries form the backbone of Python for researchers. 
  • Numpy allows you to do all sorts of mathematical operations and data restructuring. 
  • Scipy does pretty much everything else: importing and exporting data (including to and from MATLAB), statistical functions, FFTs and other signal processing, linear algebra, image processing, and a large number of other specialized mathematical and scientific features. 
  • Matplotlib is a flexible library for creating research-style plots similar to those in MATLAB. The library has a very wide range in terms of functionality (i.e. types of plots), and is sometimes easier to customize than MATLAB. (However, I am not a big fan of either MATLAB or Matplotlib in terms of style. I’m still waiting for a plotting library that doesn’t require me to use Adobe Illustrator for my final posters and presentations.)
That being said, pretty much anything you would want to do with basic MATLAB you can do with Numpy, Scipy, and Matplotlib.
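To give a feel for how the three libraries fit together, here is a minimal sketch. The data are made up (two randomly generated groups and a 5 Hz sine wave), and the variable names are just for illustration.

```python
import numpy as np
from scipy import stats

import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: two groups of 50 measurements from a seeded generator.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=1.0, scale=1.0, size=50)

# Numpy: stack the groups into a 2 x 50 array and average each row.
combined = np.concatenate([group_a, group_b]).reshape(2, 50)
row_means = combined.mean(axis=1)

# Scipy: independent-samples t-test comparing the two groups.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Numpy's FFT: spectrum of a 5 Hz sine sampled at 100 Hz for one second.
t = np.arange(100) / 100.0
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1 / 100.0)
peak_hz = freqs[spectrum.argmax()]

# Matplotlib: plot the spectrum (use plt.show() in an interactive session).
fig, ax = plt.subplots()
ax.plot(freqs, spectrum)
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Magnitude")
```

If you are coming from MATLAB, most of these calls should look familiar; the main difference is that each function lives in a named library that you import first.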
3) Pandas is an amazing time saver
The number one reason I recommend Python is the Pandas library. Pandas is a tool for manipulating and analyzing data. It uses a structure called a data frame (copied from R), which allows you to sort, filter, and group your data. Pandas is seamlessly integrated with Numpy, Scipy, and Matplotlib. This means you can easily run statistical tests on your data in various groupings and with various filters. Pandas also has a number of tools for dealing with missing data and time series data which are particularly suited to many research purposes.
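As a rough illustration of the sorting, grouping, and missing-data tools mentioned above, here is a small sketch. The column names and values are invented for the example and are not from any real experiment.

```python
import numpy as np
import pandas as pd

# Hypothetical trial data; one reaction time is missing (NaN).
df = pd.DataFrame({
    "subject": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "condition": ["easy", "hard", "easy", "hard", "easy", "hard"],
    "rt": [0.42, 0.61, 0.39, np.nan, 0.45, 0.70],
})

# Missing data: drop trials with no recorded reaction time.
clean = df.dropna(subset=["rt"])

# Group: mean reaction time for each condition.
mean_rt = clean.groupby("condition")["rt"].mean()

# Sort: slowest trials first.
slowest = clean.sort_values("rt", ascending=False)
```

Dropping rows is only one way to handle missing values; depending on the analysis, filling them in (for example with `fillna`) may be more appropriate.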
The Pandas workflow is really much simpler and more efficient than anything I've seen or done in MATLAB. Here is an example:

Raw data may be imported from a .mat file like this
import scipy.io
import pandas
# loadmat returns a dict that maps MATLAB variable names to arrays
mat = scipy.io.loadmat( 'file.mat' )
df = pandas.DataFrame( mat[ 'saved_variable_name' ] )
The imported matrix is organized as a data frame. The next step is to rename the columns. This is as easy as
df.columns = ['trial', 'stim_value', 'response', 'etc']
Now it's no longer necessary to remember which columns are which variables, and the likelihood of thoughtless errors later in the analysis is greatly reduced. Let's say we only want to look at the data where the stimulus value is positive. We simply say
df_filtered = df[ df['stim_value'] > 0 ]
You can then analyze the data something like this
df_sum = df_filtered.sum()
And finally, we can plot the data as a bar graph with
df_sum.plot( kind='bar' )
That's it! Pandas is really easy to use, and it's by far my first choice for data analysis. It's also a good place to start if you want to begin using Python in your research. Often data analysis is separate from the hardware and proprietary software demands of the experiment itself, and researchers tend to code up the analysis on their own and only share the output with their collaborators, thus alleviating some of the costs of switching to a new language (more about that below). You can also check out this 10 minute introduction to Pandas.
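If you don't have a .mat file handy, the steps above can be tried end to end with a throwaway file. This is only a sketch: the array contents are made up, and the file is written to a temporary directory just so there is something to load.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import scipy.io

# Hypothetical data: 4 trials x 3 columns (trial, stim_value, response).
raw = np.array([
    [1, -0.5, 0],
    [2,  0.5, 1],
    [3,  1.0, 1],
    [4, -1.0, 0],
])

# Write a temporary .mat file so the example has something to import.
path = os.path.join(tempfile.mkdtemp(), "example.mat")
scipy.io.savemat(path, {"saved_variable_name": raw})

# Load the file and name the columns in one step.
mat = scipy.io.loadmat(path)
df = pd.DataFrame(mat["saved_variable_name"],
                  columns=["trial", "stim_value", "response"])

# Keep only trials with a positive stimulus value, then sum each column.
df_filtered = df[df["stim_value"] > 0]
df_sum = df_filtered.sum()
```

From here, calling `df_sum.plot(kind='bar')` gives the bar graph, provided Matplotlib is installed.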
4) IPython Notebook makes writing and sharing code easy
IPython is an interactive command shell (like MATLAB's command window) that allows you to import libraries and data into a temporary workspace. IPython Notebook is a huge and beautiful extension on this idea of interactive computing. The Notebook is a rich text web interface for IPython, which means it runs in the browser (e.g. Firefox). It allows you to create annotated code snippets, including inline plots and mathematical formulas with LaTeX support. Some people have even created blogs using IPython Notebook. I can't really do it justice, so if you want more information check out these links!
5) Transferable skills
Lastly, in the event you are considering a career outside of academia, any transferable skills you can develop while doing research are incredibly valuable. Python is widely used in industry, from tiny startups to major companies like Google and Dropbox. Learning it might be your ticket to a new career.

Of course, there are some costs to switching. In many fields MATLAB is still the predominant language, making it difficult to use Python when collaborating with others. Additionally, there is the issue of legacy code. Some MATLAB toolboxes and packages (for example Psychtoolbox and MrVista) have yet to be completely replicated in Python (although that is changing; check out PsychoPy and Lyman). A tougher challenge, however, is the legacy code passed down from researcher to researcher in individual labs. Having to use and maintain legacy lab code often forces new researchers to learn MATLAB whether they want to or not. Yet, even with the larger initial cost of learning two languages, I would still argue that learning Python saves you time in the long term.

Need even more reasons to learn Python? Here are two other blog posts extolling Python's virtues from an entirely different perspective:

Convinced, but don't know where to begin? Here are some resources for learning Python from the beginning and for switching from MATLAB.

  • You can download Python and all of its major scientific libraries for free through Enthought. Other libraries can be downloaded directly from their respective websites or through a software package manager such as Homebrew or pip.
  • A great tutorial for switching from MATLAB to Python may be found here.
  • Two excellent introductory Python tutorials: here, and here.
  • To learn more, I recommend reading through the documentation for various libraries you might use and following along with the exercises on your own computer (for example, here is the Pandas documentation).

Happy coding!
