Description

In this part of the project, Murine data was utilised.

  • Date started: 2020, November 18
  • Date projected: 2020, December 20
  • Date finished:

We're using the single-cell toolkit pipelines developed at Vijay's lab.

\(\checkmark\) Demultiplexing libraries

You can either use mkfastq or bcl2fastq. In this case we did it with bcl2fastq.

  • Please make sure names are appropriate and consistent across all runs.

Appropriate: e. g.: 102_B2_CD4_STIM_8D, 103_B2_CD8_STIM; the first part may not be really necessary (if you allow enough information to make the library unique). and the second name is missing a section (8D).

Consistent: e. g.: B1_CD4_STIM, B2_STIM_CD4.; the cell type and stimulation status changed in the second batch.

  • Prepare IEM sample sheet [carefully choosing the names].

Results are un run NV037.

\(\checkmark\) Demultiplexing cells

Pre-processing data with Cell Ranger. We have a wrapper for this if you don't want to prepare a script for each of your samples.

  1. Prepare feature reference and aggregation files.
  2. Check you're taking the right reference genome.

    sh /home/ciro/scripts/cellranger_wrappeR/run.sh -h
    sh /home/ciro/scripts/cellranger_wrappeR/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_cellranger_NV037.yaml -v

Some jobs will be created and you will need to wait a long time.1 Then you run the summary.

sh /home/ciro/scripts/cellranger_wrappeR/run.sh -y /home/ciro/simon/scripts/cellranger_config.yaml -s

Results

\(\checkmark\) Demultiplexing donors/subjects

You will make use of the ab_capture scripts.

  1. Have the donor metadata if you want to include it in the single-cell metadata. Up yo you if you want it in the object or if you'll make use of it after clustering.

  2. Check the hashtag structure is correct (eg. donor~hashtag_n~hashtag_id corresponds to "DONOR1-TSC5-C0305"). This is derived from the feature names (indicated in the column "names" when you run Cell Ranger).

    sh /home/ciro/scripts/ab_capture/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_ab_capture.yaml -v
    Rscript /home/ciro/scripts/ab_capture/summary.R -c /home/ciro/large/simon/results/ab_demux/SiEs09_100th \
      --tag_str time_days~donor~hashtag_n~hashtag_id # d11-I-TSC1-C0301

    Jobs will be created and you'll need to wait a few minutes.

Results

\(\checkmark\) Quality control

We're using this script tailored to explore QC metrics.2

  1. Be mindful of the variables you have in the metadata.

It takes about 5 minutes with 90K cells with 20gb and 1 node/1 processor.

Rscript /home/ciro/scripts/quality_control/single_cell.R -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_qc.yaml --log TRUE

- QC thresholds

Violins

QC metrics

Metrics' distributions table

QC metrics
nFeature_RNA nCount_RNA percent.mt
qc_filters.low 500.00 645.00 0.03
qc_filters.high 4000.00 11000.00 7.50
Min. 501.00 645.00 0.03
0.1% 504.76 659.50 0.05
10% 857.70 1275.20 0.79
1st Qu. 1210.75 1923.00 1.11
Median 1907.50 3965.00 1.70
Mean 2017.91 5167.53 2.46
3rd Qu. 2575.50 6671.00 2.98
90% 3279.80 10046.70 5.01
99.2% 5516.11 27639.74 11.11
Max. 6376.00 40979.00 38.90

\(\checkmark\) Clustering analysis

You need to be extra careful with this analysis because it branches very easily. The main variables you want to tweak are:

  1. Percentage of variability explained by the highly variable features.
  2. Number of principal components.
  3. Resolution.

    sh /home/ciro/scripts/clustering/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_clustering.yaml
    sh /home/ciro/scripts/clustering/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_clustering_filtered.yaml

Two results are explored here.

- 1. Loose

Using loose quality thresholds and keeping the doublets. The thresholds are selected based on the global QC metrics distribution. Note that the lines in this link are corresponding to the second option (next).

Results / Variable tweaking.

- 2. Strict

These were carefully selected based on the Singlet's QC metrics. We are aiming to keep only the singlets and throw the negative ones if they're too few cells.

Results / Variable tweaking.

\(\checkmark\) Biology/data lock

After we explore the biology revealed by the clustering analysis, we select a cluster resolution that gives us the most interesting prospect for a story.

Variance percentage PC Resolution
15 20 0.4

- Highlighted Plots

Clusters

Colouring by cluster

Subjects

Colouring by Mice-days

\(\checkmark\) TCR analysis

This analysis was performed using Vicente's pipeline.

This part was run in an interactive job.

mkdir --parents /home/ciro/large/simon/results/tcr/SiEs09
Rscript /home/vfajardo/scripts/TCR_data_analysis/10X_preliminary_analyses/preliminary_TCR_data_analysis.1.5.4.R \
  --ReportsPath=/home/ciro/large/simon/results/tcr/SiEs09 \
  --TCRContigs=/home/ciro/large/simon/raw/NV037/vdj/SiEs09_Mo_tumor_FOXP3PCD4P_TCR/outs/filtered_contig_annotations.csv \
  --TCRClonotypes=/home/ciro/large/simon/raw/NV037/vdj/SiEs09_Mo_tumor_FOXP3PCD4P_TCR/outs/clonotypes.csv \
  --SeuratObj=/home/ciro/large/simon/results/clustering/SiEs09_tfr_xdoublt_xclust4/.object_stem_seurat_mean0.01_pct15.rds \
  --Tags "c('orig.time_days', 'orig.donor', 'orig.ht_id', 'RNA_snn_res.0.4', 'cluster_days')"

- Highlighted Plots

Clonotype sharing per cluster

Venn diagram

Venn diagram

Clusters

Days

Days per cluster

\(\checkmark\) Gene Set Analyses

Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_signatures.R

- GSEA

Results

- Module Scoring

Briefly, the score is defined for each cell by subtracting the mean expression of an aggregate of control gene lists from the mean of the signature gene list. Control gene lists were randomly selected (same size as the signature list) from bins delimited based on the level of expression of the signature list.

Results

\(\Box\) DGE analysis

sh /home/ciro/simon/scripts/tumor_tfr/SiEs09_dgea.sh

This time we have 10x data, so MAST was used.

Comparisons
Group1 Group2 Column Name Block Filter
2 REST RNA_snn_res.0.4 none global none
d11 d18 orig.time_days none cluster2 none
d11 d18 orig.time_days none cluster2 list(c('RNA_snn_res.0.4', '2'))

Results

\(\Box\) Trajectory analysis

We're using Monocle 3 for this analysis. This also helps us preserve the UMAP from the Seurat analysis.

  • First, Using same HVG+UMAP:
Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_trajectory.R &> /home/ciro/simon/scripts/tumor_tfr/SiEs09_trajectory.out.txt

Clusters

Results

  • Standard (_std) Monocle3 different PCs
library(monocle3)
npcs = 20
out_dir = paste0("/home/ciro/large/simon/results/trajectory/SiEs09_tfr_xdoublt_xclust4_", npcs, "PCs_std")
if(!dir.exists(out_dir)) dir.create(out_dir); setwd(out_dir)
mycells = readRDS("/home/ciro/large/simon/results/clustering/SiEs09_tfr_xdoublt_xclust4/.object_stem_seurat_mean0.01_pct15.rds")
gene_annotation <- mycells@assays$RNA@meta.features
gene_annotation <- cbind(gene_short_name = rownames(gene_annotation), gene_annotation)
cds <- new_cell_data_set(
  expression_data = mycells@assays$RNA@counts,
  cell_metadata = mycells@meta.data,
  gene_metadata = gene_annotation
)
cds <- preprocess_cds(cds, num_dim = npcs)
cds <- align_cds(cds, residual_model_formula_str = "~nCount_RNA+percent.mt")
cds <- reduce_dimension(cds)
pdf("clusters.pdf")
plot_cells(cds, label_groups_by_cluster=FALSE,  color_cells_by = "RNA_snn_res.0.4", cell_size=2)
dev.off()
cds <- cluster_cells(cds)
cds <- learn_graph(cds)
p <- plot_cells(cds, color_cells_by = "partition")
pdf("partition.pdf"); print(p); dev.off()
pdf("trajectory.pdf");
plot_cells(cds,
           color_cells_by = "RNA_snn_res.0.4",
           label_groups_by_cluster=FALSE,
           label_leaves=FALSE,
           label_branch_points=FALSE)
dev.off()

Clusters - PC20

Results

Conclusion: It's hard to pull them appart but I'd say that cluster 2 is an extreme (starting/ending) a branch.

Task/Figures Tracker

- Custom plots

Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_figures.R

We can continue discussions and documenting tasks or [ready to publish] figure in our Google Drive tracker sheet



  1. Critical step; it takes a lot of time and memory. We should probably start checking kallisto-bustools.

  2. You can run this step in an interactive job.