This page describes the use of Python on the Odyssey cluster. See also the RC Website and the Practical Python on Odyssey tutorial

The Odyssey system Python is 2.6.6 and you can't do much with it

As of this writing, Odyssey uses a CentOS 6.8 operating system, which includes Python version 2.6.6. In addition to being very old, the site-packages directory is owned by root. As a result, any attempt to install a python package will be rebuffed.

On Odyssey, Python should be used within an Anaconda environment

Python is an extremely popular interpreted programming language in bioinformatics that has an array of excellent packages for scientific computing. It is not uncommon to see module systems on academic clusters provide some of the major Python packages like scipy and numpy. However, in our experience, this leads to complex PYTHONPATHs that can create incompatible environments.

Use of the Anaconda distribution from Continuum Analytics makes local control of Python package sets much easier. In addition to a large set of pre-installed packages, it is easy to create a local environment in your home directory and use that to define your environment.

You can use Python and Anaconda on Odyssey by running:

$ module load python/2.7.11-fasrc01

Loading this python module will load the 2.5.0 version of the Anaconda distribution.

(as of this writing, 2.7.6 is the Odyssey Python default, though that version is getting old enough to cause problems with default package repositories).

Anaconda has a concept of environments that can be used to manage alternative package sets and versions. This analogous to the environments that can be setup with virtualenv. For example, if you want newer versions of some packages in the default environment, you can make a new environment with your customizations.

First, load the base environment module, then create a new environment by cloning the Anaconda root distribution (substitute ody with whatever name you wish) and activate it.

$ module load python/2.7.11-fasrc01
$ conda create -n ody --clone $PYTHON_HOME
$ source activate ody
(ody) $ python
Python 2.7.13 |Anaconda 2.5.0 (64-bit)| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>>

If you want to use this environment all the time, add the above line to your ~/.bashrc (or other appropriate shell config file) after the line that loads the module.

You can see where your clone is installed by doing a which python when it is activated.

To stop using the custom environment, run:

$ source deactivate

You can share a clone with collaborators by installing with a full path (-p)

When creating a clone with the -n switch, you are "naming" the clone and installing it under your home directory. On Odyssey, this is typically under $HOME/.conda/envs.

To share a clone with members of your laboratory or a collaborator, you can install it under a shared location using the full path (-p):

$ module load python/2.7.11-fasrc01
$ conda create -p /n/pi_lab/software/envs/picrust --clone $PYTHON_HOME
$ source activate /n/pi_lab/software/envs/picrust
(/n/pi_lab/software/envs/picrust) $

You can activate a clone by updating your PATH

The standard way to activate a clone is to use the source activate command after loading the appropriate python module. This only does 2 things: 1) add the bin directory of your clone to your PATH variable and 2) setup the prompt to include the name of the clone. The ability to source deactivate is also handy.

However, if you only use a single clone and don't need the prompt modification, you can simply add the bin directory of your clone to your PATH variable directly. Then you don't even need to load the python module.

$ export PATH=$HOME/.conda/envs/ody/bin:$PATH
$ which python
~/.conda/envs/ody/bin/python

No modifications of PYTHONPATH are necessary.

Within a conda clone, you can install python packages with conda install.

In addition to environment management, Anaconda provides package installation through the conda install command. This works just like pip for pure Python packages. However conda packages can go above and beyond just Python and include compiled C libraries and executables. For example, the conda install of PySam includes the samtools C libraries:

(ody) $ conda install pysam
Fetching package metadata ...............
Solving package specifications: .

Package plan for installation in environment /n/home_rc/akitzmiller/.conda/envs/ody:

The following NEW packages will be INSTALLED:

    bcftools: 1.5-0                bioconda   
    bzip2:    1.0.6-1              conda-forge
    htslib:   1.5-0                bioconda   
    pysam:    0.11.2.2-htslib1.5_2 bioconda

The following packages will be UPDATED:

    samtools: 1.4-0                bioconda    --> 1.5-0 bioconda

Proceed ([y]/n)? y

bzip2-1.0.6-1. 100% |#############################################################| Time: 0:00:00   2.21 MB/s
bcftools-1.5-0 100% |#############################################################| Time: 0:00:00   7.60 MB/s
htslib-1.5-0.t 100% |#############################################################| Time: 0:00:00  14.61 MB/s
samtools-1.5-0 100% |#############################################################| Time: 0:00:00  15.20 MB/s
pysam-0.11.2.2 100% |#############################################################| Time: 0:00:00  12.58 MB/s

(ody)$

Note the use of "conda-forge" and "bioconda". These are "channels" that provide specialized sets of conda packages. You can add a channel like so:

$ conda config --add channels conda-forge
$ conda config --add channels bioconda

You do not have to have a clone activated to do this; the command adds text to the .condarc file in your home directory.

The bioconda channel can be particularly useful for informatics software setup.

Within a conda clone, you can also install packages with pip.

Most pip installs will be successfull:

(ody) $ pip install biom-format
Collecting biom-format
  Downloading biom-format-2.1.6.tar.gz (860kB)
    100% |████████████████████████████████| 870kB 496kB/s 
Collecting pandas>=0.19.2 (from biom-format)
  Downloading pandas-0.20.3-cp27-cp27mu-manylinux1_x86_64.whl (22.4MB)
    100% |████████████████████████████████| 22.4MB 29kB/s 
Building wheels for collected packages: biom-format
  Running setup.py bdist_wheel for biom-format ... done
Successfully built biom-format
Installing collected packages: pandas, biom-format
  Found existing installation: pandas 0.17.1
    Uninstalling pandas-0.17.1:
      Successfully uninstalled pandas-0.17.1
Successfully installed biom-format-2.1.6 pandas-0.20.3
(ody) $

pip or python setup.py installs are needed when linking to system-tuned libraries

Many Python packages used in informatics and scientific computing in general are Python wrappers on top of high performance, system-tuned libraries like MPI or blas/lapack. As a result, it is preferable in these cases to do a Python source install via pip or python setup.py rather than conda install.

For example, if you want to use MPI to parallelize some python calculations, you'll want to install mpi4py against your preferred Odyssey MPI module.

$ module load python/2.7.11-fasrc01
$ source activate ody
(ody) $ module load gcc/5.2.0-fasrc01 openmpi/2.0.1-fasrc01
(ody) $ pip install mpi4py
Collecting mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-2.0.0