TFR project workflow

Description

In this part of the project, Murine data was utilised.

Date started: 2020, November 18
Date projected: 2020, December 20
Date finished:

We're using the single-cell toolkit pipelines developed at Vijay's lab.

\(\checkmark\) Demultiplexing libraries

You can either use mkfastq or bcl2fastq. In this case we did it with bcl2fastq.

Please make sure names are appropriate and consistent across all runs.

Appropriate: e. g.: 102_B2_CD4_STIM_8D, 103_B2_CD8_STIM; the first part may not be really necessary (if you allow enough information to make the library unique). and the second name is missing a section (8D).

Consistent: e. g.: B1_CD4_STIM, B2_STIM_CD4.; the cell type and stimulation status changed in the second batch.

Prepare IEM sample sheet [carefully choosing the names].

Results are un run NV037.

\(\checkmark\) Demultiplexing cells

Pre-processing data with Cell Ranger. We have a wrapper for this if you don't want to prepare a script for each of your samples.

Prepare feature reference and aggregation files.

Check you're taking the right reference genome.

sh /home/ciro/scripts/cellranger_wrappeR/run.sh -h
sh /home/ciro/scripts/cellranger_wrappeR/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_cellranger_NV037.yaml -v

Some jobs will be created and you will need to wait a long time.¹ Then you run the summary.

sh /home/ciro/scripts/cellranger_wrappeR/run.sh -y /home/ciro/simon/scripts/cellranger_config.yaml -s

Results

\(\checkmark\) Demultiplexing donors/subjects

You will make use of the ab_capture scripts.

Have the donor metadata if you want to include it in the single-cell metadata. Up yo you if you want it in the object or if you'll make use of it after clustering.

Check the hashtag structure is correct (eg. donor~hashtag_n~hashtag_id corresponds to "DONOR1-TSC5-C0305"). This is derived from the feature names (indicated in the column "names" when you run Cell Ranger).

sh /home/ciro/scripts/ab_capture/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_ab_capture.yaml -v
Rscript /home/ciro/scripts/ab_capture/summary.R -c /home/ciro/large/simon/results/ab_demux/SiEs09_100th \
  --tag_str time_days~donor~hashtag_n~hashtag_id # d11-I-TSC1-C0301

Jobs will be created and you'll need to wait a few minutes.

Results

\(\checkmark\) Quality control

We're using this script tailored to explore QC metrics.²

Be mindful of the variables you have in the metadata.

It takes about 5 minutes with 90K cells with 20gb and 1 node/1 processor.

Rscript /home/ciro/scripts/quality_control/single_cell.R -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_qc.yaml --log TRUE

- QC thresholds

Violins

QC metrics

Metrics' distributions table

QC metrics
	nFeature_RNA	nCount_RNA	percent.mt
qc_filters.low	500.00	645.00	0.03
qc_filters.high	4000.00	11000.00	7.50
Min.	501.00	645.00	0.03
0.1%	504.76	659.50	0.05
10%	857.70	1275.20	0.79
1st Qu.	1210.75	1923.00	1.11
Median	1907.50	3965.00	1.70
Mean	2017.91	5167.53	2.46
3rd Qu.	2575.50	6671.00	2.98
90%	3279.80	10046.70	5.01
99.2%	5516.11	27639.74	11.11
Max.	6376.00	40979.00	38.90

Results

\(\checkmark\) Clustering analysis

You need to be extra careful with this analysis because it branches very easily. The main variables you want to tweak are:

Percentage of variability explained by the highly variable features.
Number of principal components.

Resolution.

sh /home/ciro/scripts/clustering/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_clustering.yaml
sh /home/ciro/scripts/clustering/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_clustering_filtered.yaml

Two results are explored here.

- 1. Loose

Using loose quality thresholds and keeping the doublets. The thresholds are selected based on the global QC metrics distribution. Note that the lines in this link are corresponding to the second option (next).

Results / Variable tweaking.

- 2. Strict

These were carefully selected based on the Singlet's QC metrics. We are aiming to keep only the singlets and throw the negative ones if they're too few cells.

Results / Variable tweaking.

\(\checkmark\) Biology/data lock

After we explore the biology revealed by the clustering analysis, we select a cluster resolution that gives us the most interesting prospect for a story.

Variance percentage	PC	Resolution
15	20	0.4

- Highlighted Plots

Clusters

Colouring by cluster

Subjects

Colouring by Mice-days

Results

\(\checkmark\) TCR analysis

This analysis was performed using Vicente's pipeline.

This part was run in an interactive job.

mkdir --parents /home/ciro/large/simon/results/tcr/SiEs09
Rscript /home/vfajardo/scripts/TCR_data_analysis/10X_preliminary_analyses/preliminary_TCR_data_analysis.1.5.4.R \
  --ReportsPath=/home/ciro/large/simon/results/tcr/SiEs09 \
  --TCRContigs=/home/ciro/large/simon/raw/NV037/vdj/SiEs09_Mo_tumor_FOXP3PCD4P_TCR/outs/filtered_contig_annotations.csv \
  --TCRClonotypes=/home/ciro/large/simon/raw/NV037/vdj/SiEs09_Mo_tumor_FOXP3PCD4P_TCR/outs/clonotypes.csv \
  --SeuratObj=/home/ciro/large/simon/results/clustering/SiEs09_tfr_xdoublt_xclust4/.object_stem_seurat_mean0.01_pct15.rds \
  --Tags "c('orig.time_days', 'orig.donor', 'orig.ht_id', 'RNA_snn_res.0.4', 'cluster_days')"

- Highlighted Plots

Clusters

Days

Days per cluster

Results

\(\checkmark\) Gene Set Analyses

Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_signatures.R

Libraries summary
List	Total	Used	Percentage
tcell_activation_GO.0042110	50	26	52%
cd4tcell_exhaustion_crawford	215	155	72.1%
tcell_costimulation_chen	21	13	61.9%
cell_anergy_safford	87	53	60.9%
cell_senescence_fridman	77	43	55.8%
tfh_signature_locci	280	160	57.1%
cell_cycling_best	251	74	29.5%
cell_cytotoxicity_patil	515	308	59.8%
treg_signature_simone	133	68	51.1%
tfr_signature_guo	34	25	73.5%
nontrm_signature_clarke	358	235	65.6%
trm_signature_clarke	338	217	64.2%
tcell_activation_flu	65	59	90.8%
cd8_activation_covid19	322	276	85.7%
tcr_signaling_RHSA202403	124	106	85.5%
tcell_interleukin_seumois	650	424	65.2%
treg_signature_schmiedle	350	95	27.1%
th2_signature0_arlehamn	285	122	42.8%
th1_signature1_arlehamn	168	57	33.9%
th17_signature2_arlehamndan	214	154	72%
type1n2_interferon_broadreact	297	196	66%
th2_il9_micosse	65	21	32.3%

- GSEA

Results

- Module Scoring

Briefly, the score is defined for each cell by subtracting the mean expression of an aggregate of control gene lists from the mean of the signature gene list. Control gene lists were randomly selected (same size as the signature list) from bins delimited based on the level of expression of the signature list.

Results

\(\Box\) DGE analysis

sh /home/ciro/simon/scripts/tumor_tfr/SiEs09_dgea.sh

This time we have 10x data, so MAST was used.

Comparisons
Group1	Group2	Column	Name	Block	Filter
2	REST	RNA_snn_res.0.4	none	global	none
d11	d18	orig.time_days	none	cluster2	none
d11	d18	orig.time_days	none	cluster2	list(c('RNA_snn_res.0.4', '2'))

Results

\(\Box\) Trajectory analysis

We're using Monocle 3 for this analysis. This also helps us preserve the UMAP from the Seurat analysis.

First, Using same HVG+UMAP:

Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_trajectory.R &> /home/ciro/simon/scripts/tumor_tfr/SiEs09_trajectory.out.txt

Clusters

Results

Standard (_std) Monocle3 different PCs

library(monocle3)
npcs = 20
out_dir = paste0("/home/ciro/large/simon/results/trajectory/SiEs09_tfr_xdoublt_xclust4_", npcs, "PCs_std")
if(!dir.exists(out_dir)) dir.create(out_dir); setwd(out_dir)
mycells = readRDS("/home/ciro/large/simon/results/clustering/SiEs09_tfr_xdoublt_xclust4/.object_stem_seurat_mean0.01_pct15.rds")
gene_annotation <- mycells@assays$RNA@meta.features
gene_annotation <- cbind(gene_short_name = rownames(gene_annotation), gene_annotation)
cds <- new_cell_data_set(
  expression_data = mycells@assays$RNA@counts,
  cell_metadata = mycells@meta.data,
  gene_metadata = gene_annotation
)
cds <- preprocess_cds(cds, num_dim = npcs)
cds <- align_cds(cds, residual_model_formula_str = "~nCount_RNA+percent.mt")
cds <- reduce_dimension(cds)
pdf("clusters.pdf")
plot_cells(cds, label_groups_by_cluster=FALSE,  color_cells_by = "RNA_snn_res.0.4", cell_size=2)
dev.off()
cds <- cluster_cells(cds)
cds <- learn_graph(cds)
p <- plot_cells(cds, color_cells_by = "partition")
pdf("partition.pdf"); print(p); dev.off()
pdf("trajectory.pdf");
plot_cells(cds,
           color_cells_by = "RNA_snn_res.0.4",
           label_groups_by_cluster=FALSE,
           label_leaves=FALSE,
           label_branch_points=FALSE)
dev.off()

Clusters - PC20

Results

Conclusion: It's hard to pull them appart but I'd say that cluster 2 is an extreme (starting/ending) a branch.

Task/Figures Tracker

- Custom plots

Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_figures.R

We can continue discussions and documenting tasks or [ready to publish] figure in our Google Drive tracker sheet

TFR project workflow

Ciro Ramirez Suastegui

Last update: 2021, January 05, at 07:45

Description

\(\checkmark\) Demultiplexing libraries

\(\checkmark\) Demultiplexing cells

\(\checkmark\) Demultiplexing donors/subjects

\(\checkmark\) Quality control

- QC thresholds

Violins

Metrics' distributions table

\(\checkmark\) Clustering analysis

- 1. Loose

- 2. Strict

\(\checkmark\) Biology/data lock

- Highlighted Plots

Clusters

Subjects

\(\checkmark\) TCR analysis

- Highlighted Plots

Clusters

Days

Days per cluster

\(\checkmark\) Gene Set Analyses

- GSEA

- Module Scoring

\(\Box\) DGE analysis

\(\Box\) Trajectory analysis

Task/Figures Tracker

- Custom plots

TFR project workflow

Ciro Ramirez Suastegui

Last update: 2021, January 05, at 07:45

Description

\(\checkmark\) Demultiplexing libraries

\(\checkmark\) Demultiplexing cells

\(\checkmark\) Demultiplexing donors/subjects

\(\checkmark\) Quality control

- QC thresholds

Violins

Metrics' distributions table

\(\checkmark\) Clustering analysis

- 1. Loose

- 2. Strict

\(\checkmark\) Biology/data lock

- Highlighted Plots

Clusters

Subjects

\(\checkmark\) TCR analysis

- Highlighted Plots

Clonotype sharing per cluster

Clusters

Days

Days per cluster

\(\checkmark\) Gene Set Analyses

- GSEA

- Module Scoring

\(\Box\) DGE analysis

\(\Box\) Trajectory analysis

Task/Figures Tracker

- Custom plots