Resources
The FAS Informatics Group creates resources for bioinformatics analysis in the form of tutorials, walkthroughs, and both online and in-person workshops. We have also compiled links to other online resources.
Terms
Here's some helpful terminology that we use throughout our trainings. Let us know if there is something we should add!
Current workshops
Below is a list of all current workshops the Informatics Group runs. Workshop files may be temporarily unavailable as we update them during ongoing sessions.
Introduction to Python Intensive V 2.0 (January 2025)
This workshop is intended as both an introduction to programming concepts using Python and an introduction to using Python as a data science language. The first three days focus on programming concepts, while the last three days focus on introductory data analysis. During the course, you can find the Jupyter notebooks below.
- Day 1: Python basics and intro to control flow
- Day 2: Intro to writing functions
- Day 3: Metaprogramming tips and more advanced function writing
- Day 4: NumPy arrays; reading and writing files
- Day 5: Pandas dataframes and plotting
- Day 6: Using the internet & LLMs and a longer data analysis exercise
Past Workshops
Introduction to R (Fall 2023)
This workshop aims to introduce first-time users to the R programming language and the RStudio development environment. We will provide a basic introduction to coding in R and then shift to data manipulation using the tidyverse, a set of R libraries designed to handle data tables in a consistent and easy way. Then, we'll learn how to generate some basic plots to explore our data using ggplot. You do not need any prior programming experience to take this workshop. Note, however, that this workshop is neither a comprehensive programming class nor a comprehensive statistics class. The main goal of this workshop is to get you comfortable reading your data into R, performing basic operations, and generating figures.
- Workshop information
- Get started
- Part 1: Introduction to R syntax - Download student RMD file
- Part 2: Introduction to data manipulation with the tidyverse - Download student RMD file
- Part 3: Introduction to data visualization with ggplot - Download student RMD file
Unix tips and tricks for bioinformatics (Spring 2024)
This workshop aims to introduce students to some basic bioinformatics file formats, tools, and general best practices. The first two days of the workshop are dedicated to introducing bioinformatics file formats and the command line tools that we use to view, manipulate, and analyze them. After that, we will shift from using individual commands to writing shell scripts and constructing bioinformatics workflows.
- Workshop information
- Part 1: Bioinformatics tools and file formats 1: FASTA, FASTQ, grep, BAM/SAM, samtools - Download student RMD file
- Part 2: Bioinformatics tools and file formats 2: bed, awk, bedtools - Download student RMD file
- Part 3: Bioinformatics tools and file formats 3: GFF, VCF, bcftools - Download student RMD file
- Part 4: Shell scripting - Download student RMD file
Healthy Habits for Data Science (Spring 2024)
This workshop aims to teach students how to work more effectively on their projects using reproducible habits. Students learn how to organize projects on their local machine as well as on the Cannon cluster, how to manage software environments, how to use git and GitHub to track code changes, and how to write and scale scripts on an HPC cluster. Informal transcripts of the lectures are available below. Download the PDFs of the slides (where available) to follow along with the lecture.
- Workshop Information
- Day 1: Reproducibility and project organization - Download pdf of slides
- Day 2: Installing and Managing Software
- Day 3: Version control with git and GitHub - Download pdf of slides
- Day 4: Running scripts on the Cannon cluster
Intro to Python Intensive (Fall 2024)
This is a four-day workshop that introduces students to Python as a data science language. It assumes no prior knowledge of Python but moves at a quick pace to cover all the content. The workshop meets for four 3-hour sessions.
- Day 1: Whirlwind tour of Python, covering the basic concepts of data types, data structures, functions, and plotting in a broad overview
- Day 2: Deep dive into Python functions: how to write them, with additional practice exercises
- Day 3: Data structures with numpy and pandas, with some plotting with matplotlib
- Day 4: Putting it all together: We will cover some meta-cognitive tips & tricks as well as work through a longer exercise that combines the previous 3 days of concepts. Plus time for additional Q&A.
One-hour workshops (Fall 2024)
- Project organization & Data management
- Git & GitHub introduction
- Installing & Managing software (Conda, Containers)
- Submitting your first SLURM script or job array
- Data Transformation with R Tidyverse - Download RMD file
- Plotting with R ggplot - Download RMD file
- Introduction to Genome Annotation
- Workflow Management, nextflow demonstration - pdf of slides available here
- scRNA analysis introduction - Download the RMD file
- Scaling SLURM scripts on the HPC and benchmarking
- SNPArcher tutorial: A snakemake workflow for variant calling in non-model organisms
External resources
We have compiled a list of external resources and tagged them with the categories below. Click on each tag to see the links!