Welcome to the Harvard Informatics Bioinformatics Tips & Tricks workshop!

This web page will guide you through some of the activities we have planned for you today!

Instructors

Danielle Khost: A bioinformatics scientist in the FAS Informatics group at Harvard University.

Nathan Weeks: A research application developer in the FAS Informatics group at Harvard University.

Gregg Thomas: A bioinformatics scientist in the FAS Informatics group at Harvard University and a recent postdoc at the University of Montana, where he studied the phylogenetics and comparative genomics of the mouse and rat radiation. He earned his PhD at Indiana University, where he worked on comparative genomics of arthropods, mutation rate evolution in primates, and convergent evolution. In general, Gregg uses and develops computational methods to study molecular evolution and phylogenetics to determine what forces drive divergence and adaptation between species.

Lei Ma: Lei received her PhD from the MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering. Her dissertation focused on the ecology of marine microorganisms in coral reefs and in Atlantic killifish. She is particularly interested in genotype-environment-microbiome interactions in animal hosts, such as the influence of host evolution on its microbiome. Other interests include mentoring, finding coding shortcuts, cats, video games, sci-fi, and knitting.

Tim Sackton: Director of the FAS Informatics group at Harvard University.

Workshop Summary & Outline

This workshop introduces students to some basic bioinformatics file formats, tools, and general best practices. The first two days are dedicated to bioinformatics file formats and the command line tools that we use to view, manipulate, and analyze them. After that, we will shift from running individual commands to writing scripts and constructing bioinformatics workflows, including setting up environments with conda and interacting with SLURM, the job scheduling software on the cluster.

Here is a brief outline of the topics we'll be covering:

Day 1: Bioinformatics Tools, part 1

Wednesday November 8th, 9:30 am - 12:30 pm
Location: Jefferson Building room 453
  • Sequence files (FASTA, FASTQ)
  • Intro to commands useful for bioinformatics (grep, awk)
  • Alignment files (BAM/SAM) and samtools
  • Introduction to piping and redirecting

Day 2: Bioinformatics Tools, part 2

Thursday November 9th, 9:30 am - 12:30 pm
Location: Jefferson Building room 453
  • More on piping and redirecting
  • Interval files (BED, GFF)
  • More on grep and awk
  • Introduction to bedtools

Day 3: Shell scripting, part 1

Wednesday November 15th, 9:30 am - 12:30 pm
Location: Jefferson Building room 453
  • More about interval files (BED, GFF)
  • Variant files (VCF)
  • Introduction to bcftools
  • Shell scripting

Day 4: Shell scripting, part 2

Friday November 17th, 9:30 am - 12:30 pm
Location: CGIS South, S250
  • Loops
  • Conditional statements
  • Handling command line arguments in shell scripts
  • Reproducibility best practices

Click the Get Started link below to read some info before class. Additional links to resources will appear for each day of the workshop.


Get Started