Differential Splicing Tutorial

This tutorial demonstrates differential splicing analysis with leafcutter-ds (frequentist) or leafcutter-bayes (Bayesian).

Prerequisites

Clustering output from leafcutter-cluster (see Clustering Tutorial)
A sample groups file
Optional: exons file

Prepare groups file

Create a tab-separated file assigning each sample to a group:

sample1  control
sample2  control
sample3  case
sample4  case

For continuous phenotypes (e.g. age, RIN), use numeric values in the second column. Additional columns specify covariates/confounders:

sample1  control  batch1  25
sample2  control  batch2  30
sample3  case     batch1  27
sample4  case     batch2  31

Numeric covariate columns are z-scored; string columns are one-hot encoded (first level dropped).

Optional: exons file

This is a tsv file derived from a gtf file that has the following format:

chr	start	end	strand	gene_name
chromosome	exon start	exon end	gene strand	gene name

This can be prepared from a gtf using the following command:

leafcutter-gtf-to-exons input_gtf output_file

This supports gzip compressed input.

Run differential splicing

leafcutter-ds \
    my_study_perind_numers.counts.gz \
    groups.txt \
    --output_prefix ds_results \
    --exon_file gencode_exons.txt.gz

Interpret output

Both tools write two output files.

`<prefix>_cluster_significance.txt`

One row per tested cluster:

Column	Description
`cluster`	Cluster ID in `chr:clu_N_strand` format
`status`	Test outcome: `tested`, `NotEnoughSamples`, `SingleJunc`, etc.
`loglr`	Log-likelihood ratio statistic
`df`	Degrees of freedom
`p`	Nominal p-value
`p.adjust`	BH-adjusted p-value across all clusters
`genes`	Gene name(s) overlapping this cluster (requires `--exon_file`)
`annotations`	Splicing consequence categories present in the cluster (leafcutter2 annotated input only)

`<prefix>_effect_sizes.txt`

One row per intron:

Column	Description
`intron`	Intron ID in `chr:start:end:clu_N_strand` format
`logef_<group>`	Log effect size for each non-baseline group
`psi_<baseline>`	Estimated percent-spliced-in (PSI) in the baseline group
`psi_<group>`	Estimated PSI in the comparison group
`deltapsi_<group>`	Difference in PSI (`psi_<group>` − `psi_<baseline>`)

If using the leafcutter2 workflow, the intron will also have the predicted intron annotation appended to it (so chr:start:end:clu_N_strand:annotation), which can be any of the following: - PR (productive/protein-coding) - UP (unproductive) - NE (ambiguous in their functional effect) - IN (intergenic)

Options reference

The following options apply to both leafcutter-ds.

Positional arguments

Argument	Description
`counts_file`	Intron usage counts (`.txt` or `.txt.gz`), output of `leafcutter-cluster`
`groups_file`	Tab-separated: column 1 = sample name, column 2 = group/phenotype, optional further columns = covariates

Optional arguments

Option	Default	Description
`-0 / --baseline_group`	`Control`	Reference group for categorical comparisons; other groups are tested against this
`-o / --output_prefix`	`leafcutter_ds`	Prefix for the two output files
`-e / --exon_file`	—	Exon annotation file (`chr`, `start`, `end`, `strand`, `gene_name`). Used only to label clusters with gene names
`-s / --max_cluster_size`	∞	Skip clusters with more introns than this
`-i / --min_samples_per_intron`	`5`	Discard introns with at least one read in fewer than this many samples
`-g / --min_samples_per_group`	`3`	For categorical comparisons: require this many samples per group to have ≥ `min_coverage` reads
`-c / --min_coverage`	`20`	Minimum total reads in a cluster for it to be tested
`-u / --min_unique_vals`	`10`	For continuous phenotypes: minimum number of distinct phenotype values after coverage filtering
`--init`	`brr`	Initialisation strategy: `brr` (Bayesian ridge regression), `rr` (ridge regression), `mult` (multinomial logistic), or `0` (zero)
`-p / --num_threads`	`1`	Number of parallel threads (`leafcutter-ds` only)
`--timeit`	`False`	Print a breakdown of time spent at each stage (for benchmarking)

Proceed to Visualization to visualize splicing differences from leafcutter-ds.