Workshop Information

Topics

The workshop combines short instructional videos, special guest lectures, hands-on practice with real data, and live demonstration sessions. It aims to facilitate the learning of practical bioinformatics skills and to build familiarity and basic competency. Using established tools and publicly available resources, we will focus on the analysis and interpretation of genomic and genetic data, which makes the workshop well suited to researchers with limited experience in large-scale data analysis.

Unit 0: COVID and Microbiome Data Analysis

In this special unit, we will introduce datasets related to infectious and immune-mediated diseases, including COVID-19 and microbiome data. We will use literate programming techniques, including Jupyter Notebooks, to provide biological motivation and guided exploration of these rich datasets. Basic statistical analysis and R programming will be discussed.

The example dataset used in the COVID data analysis is part of the study by Assis et al., npj Vaccines, 2021.
The example datasets used in the microbiome data analysis are curated datasets from the Bioconductor packages curatedMetagenomicData and microbiomeDataSets.

Unit 1: Transcriptomic Analyses

This session covers a complete analysis workflow for bulk and single-cell RNA-Seq, and an introduction to spatial transcriptomics. It will cover experimental design, quality control, read mapping, differential expression analyses, as well as pathway and enrichment analyses. The single-cell RNA-seq part will also cover methods for unsupervised clustering and detection of subpopulations of cells.
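To make the differential expression step concrete, here is a minimal Python sketch of the underlying idea: scale raw read counts to counts per million (CPM) so that samples with different sequencing depths are comparable, then compute a log2 fold change between two samples. The gene names, counts, and pseudocount below are hypothetical, and this is not the method implemented by edgeR or any other tool used in the session, which additionally models dispersion across replicates and performs statistical testing.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(cpm_reference, cpm_treated, pseudocount=1.0):
    """log2(treated / reference) per gene; the pseudocount avoids log of zero."""
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_reference[gene] + pseudocount)
        )
        for gene in cpm_reference
    }


# Hypothetical raw counts for two samples (made-up numbers for illustration).
control = {"GENE_A": 500, "GENE_B": 500}
treated = {"GENE_A": 250, "GENE_B": 750}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
for gene, value in lfc.items():
    print(f"{gene}: log2FC = {value:.3f}")
```

A negative value indicates lower relative expression in the treated sample, a positive value higher relative expression. With single samples per condition, as here, no significance can be assessed; that requires replicates and a statistical model.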

Participants will work on example datasets in a high-performance computing (HPC) environment on the Anvil supercomputer, combining Unix command-line (bash) tools and R packages in a Jupyter notebook interface. Command-line tools used in the hands-on analysis include the SRA Toolkit and the STAR aligner; R tools include edgeR and Seurat.

The example datasets used in this session are from the GEO database and are part of the studies by Yun et al., Oncotarget, 2017, and Vickman et al., The Prostate, 2019.

Unit 2: Epigenomic Analyses

This session covers the analysis of epigenomic data from ChIP-seq and ATAC-seq experiments. Participants will analyze example epigenomic datasets, using command-line tools for preprocessing and peak or accessibility calling, followed by R/Bioconductor packages in Jupyter to integrate signal tracks, perform quality checks, and relate epigenomic signatures to functional genomic regions. Tools used in this session include command-line tools such as Bowtie2 and R packages such as GenomicAlignments, DiffBind, TFBSTools, and rGADEM.

The example dataset used in this session is part of the study by Pancholi et al., Oncogene, 2020.

Unit 3: Genome-wide Association Study

This session will focus on single-nucleotide polymorphism (SNP)-based genome-wide association analysis. Topics include sample and SNP quality control, association tests, logistic regression for case-control studies, linear regression for continuous traits, and gene-gene and gene-environment interactions. Lectures will also cover how to visualize the data and analysis results using popular packages.
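As a rough illustration of two of the topics above, the following Python sketch computes a minor allele frequency (a common SNP quality control metric) and runs a basic allelic chi-square association test for a single SNP in a case-control design. The genotype coding (0, 1, or 2 copies of the alternate allele, `None` for a missing call) and the toy genotypes are assumptions made for this example; the workshop's own tools and workflow may differ, and a real analysis would typically use logistic regression with covariates and correct for multiple testing.

```python
import math


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0/1/2 alternate-allele counts (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def allele_counts(genotypes):
    """Return (reference, alternate) allele counts, skipping missing calls."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    return 2 * len(called) - alt, alt


def allelic_chi_square(case_genotypes, control_genotypes):
    """1-df chi-square test on the 2x2 table of allele counts by case status."""
    a, b = allele_counts(case_genotypes)      # cases: ref, alt
    c, d = allele_counts(control_genotypes)   # controls: ref, alt
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # a zero margin means the test is uninformative
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy genotypes for one SNP (made-up values for illustration only).
cases = [2, 2, 1, 1, 2, 1, 2, 2]
controls = [0, 0, 1, 0, 1, 0, 0, 1]

maf = minor_allele_frequency(cases + controls)
chi2, p_value = allelic_chi_square(cases, controls)
print(f"MAF = {maf:.3f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

In a genome-wide setting the same test is repeated for every SNP that passes quality control, which is why stringent significance thresholds are used.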

Unit 4: Network Analysis

In this session, we will introduce the basic concepts behind constructing gene regulatory networks (GRNs), with a focus on the integrative analysis of transcriptomic and genomic data. Using a case study in cancer research, we will go through cis-eQTL analysis and 2SPLS, a state-of-the-art parallel algorithm for constructing genome-wide GRNs. Lectures will also cover exploring the results with widely used bioinformatics tools, including STRING and Ingenuity Pathway Analysis.
