Part of becoming an efficient data scientist is trying out and learning the tools that work best for you. There are definitely plenty out there to try. Here we have assembled lists of popular FREE software for common data science tasks. If you feel any information is inaccurate or out of date, or if you want to recommend a program to add to the lists, please contact me.

Programs listed with a GREEN BACKGROUND are ones used in this workshop.

Text editors

Your text editor will be your most used program and is how you will interact with your data, so it's important to find one that does exactly what you need!

| Editor | Type | Platform | Link |
| --- | --- | --- | --- |
| nano | Command line | Linux/Unix | Website |
| vi/vim | Command line | Linux/Mac/Windows | Website |
| Emacs | Command line | Linux/Mac/Windows | Website |
| Visual Studio Code | GUI | Linux/Mac/Windows | Website |
| BBEdit | GUI | Mac | Website |
| gedit | GUI | Linux/Mac/Windows | Website |
| Sublime Text | GUI | Linux/Mac/Windows | Website |
| Atom | GUI | Linux/Mac/Windows | Website |
| TextMate | GUI | Mac | Website |
| Notepad++ | GUI | Windows | Website |
| RStudio | IDE | Linux/Mac/Windows | Website |
| Visual Studio | IDE | Windows | Website |

File transfer programs

File transfer programs allow you to move files between your machine and your lab's or institution's server, or between servers. Sometimes you'll only want to move one or a few files to inspect them, and a graphical, drag-and-drop program is sufficient. Other times, though, you'll need to move thousands of files or very large files, and a command line file transfer program may be required to automate the process. Below is a list of some popular programs of each type.
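As a rough illustration of what automating a command line transfer can look like, here is a minimal Python sketch that wraps rsync over SSH. The function names, the host `user@server.example.edu`, and the paths are all made up for this example, and it assumes rsync and an SSH client are installed on your machine. By default it only prints the command it would run, so you can check it before transferring anything.

```python
import shlex
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir):
    """Return the rsync argument list for a recursive copy over SSH.

    All names here are hypothetical helpers for this example.
    """
    return [
        "rsync",
        "-a",          # archive mode: recurse and preserve times/permissions
        "-z",          # compress data in transit
        "--partial",   # keep partially transferred files if interrupted
        "-e", "ssh",   # use SSH as the transport
        source_dir.rstrip("/") + "/",    # trailing slash: copy the directory's contents
        f"{remote_host}:{remote_dir}",   # destination in host:path form
    ]


def transfer(source_dir, remote_host, remote_dir, dry_run=True):
    """Print the command (dry run) or actually run it."""
    cmd = build_rsync_command(source_dir, remote_host, remote_dir)
    if dry_run:
        print(shlex.join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError if rsync fails
    return cmd


if __name__ == "__main__":
    # Placeholder host and paths: replace with your own before using.
    transfer("results", "user@server.example.edu", "/home/user/project/results")
```

Calling `transfer(..., dry_run=False)` would run the real transfer. A loop over several project folders could call the same function once per folder.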

Cloud services like Box, Dropbox, OneDrive, or Google Drive are also extremely useful for syncing folders across devices, but can be difficult to set up on a server. I personally use Box to store all of my active documents and project folders (sans large data), and am able to sync between my home and work computers. These programs may not be free, but be sure to check with your institution, which may offer free accounts with large or even unlimited storage while you work for them.

| Program | Type | Platform | Protocol | Link |
| --- | --- | --- | --- | --- |
| WinSCP | GUI | Windows | FTP/SFTP | Website |
| FileZilla | GUI | Linux/Mac/Windows | FTP/SFTP | Website |
| Cyberduck | GUI | Mac/Windows | FTP/SFTP | Website |
| scp | Command line | Linux/Mac/Windows | SSH | Website |
| rsync | Command line | Linux/Mac | SSH/rsync | Website |
| sftp | Command line | Linux/Mac/Windows | SFTP | Website |

SSH clients to connect to remote servers

SSH is the protocol that allows us to connect our local machine to a remote machine and run commands on it in the terminal. The most widely used SSH client is OpenSSH. Until recently, however, it was not available on Windows and a third-party client was required. PuTTY has long been the most popular SSH client for Windows, and is still a great option for older versions of Windows or installations without OpenSSH.
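To make "run commands on a remote machine" concrete, here is a small Python sketch that calls the OpenSSH `ssh` client to run one command on a server and return its output. The function names and the host `server.example.edu` are placeholders invented for this example. It assumes `ssh` is on your PATH and that you can already log in to the server without being prompted for a password (for example, with an SSH key).

```python
import subprocess


def build_ssh_command(host, remote_command, user=None, port=22):
    """Return the argument list for running one command over SSH.

    Hypothetical helper for this example, not part of any library.
    """
    target = f"{user}@{host}" if user else host
    return ["ssh", "-p", str(port), target, remote_command]


def run_remote(host, remote_command, user=None, port=22):
    """Run remote_command on host and return what it printed to stdout."""
    result = subprocess.run(
        build_ssh_command(host, remote_command, user, port),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder host and user: this only prints the command that would run.
    print(" ".join(build_ssh_command("server.example.edu", "hostname", user="user")))
```

With a real host, `run_remote("server.example.edu", "ls", user="user")` would return the remote directory listing as a string.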

| Program | Type | Platform | Link |
| --- | --- | --- | --- |
| OpenSSH | Command line | Linux/Mac/Windows | Website |
| PuTTY | GUI | Windows | Website |

General genomics programs

| Program | Author | Year | Use cases | Link | Paper |
| --- | --- | --- | --- | --- | --- |
| bedtools | Quinlan and Hall | 2010 | Perform operations on sets of genomic coordinates. | Website | Paper |
| bcftools | NA | NA | Perform operations on VCF and BCF formatted files. | Website | NA |
| samtools | Li | 2009 | Perform operations on SAM/BAM/CRAM formatted files. | Website | Paper |
| Picard tools | Broad Institute | 2019 | Performs many operations on SAM/BAM/CRAM and VCF files. | Website | Paper |
| gffread | Pertea & Pertea | 2020 | General purpose GFF file manipulation. | Website | Paper |
| seqtk | Li | NA | A fast and lightweight tool for processing sequences in the FASTA or FASTQ format. | Website | NA |
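
If "operations on sets of genomic coordinates" sounds abstract, the toy Python sketch below shows one such operation: finding where two sets of intervals overlap, similar in spirit to `bedtools intersect`. This is not how bedtools is implemented. It compares every pair of intervals, which is fine for a handful of regions but far too slow for real data, and the `genes` and `peaks` lists are invented for the example. Coordinates follow the BED convention (0-based start, end not included).

```python
def intersect(a, b):
    """Return the overlapping portions of intervals in a and b.

    a and b are lists of (chrom, start, end) tuples in BED-style
    coordinates: 0-based start, end not included.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # non-empty overlap
                overlaps.append((chrom_a, start, end))
    return overlaps


# Made-up example data
genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
peaks = [("chr1", 150, 250), ("chr2", 0, 60), ("chr3", 10, 20)]

print(intersect(genes, peaks))
# prints [('chr1', 150, 200), ('chr2', 50, 60)]
```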