In this part of the project, Murine data was utilised.
We're using the single-cell toolkit pipelines developed at Vijay's lab.
You can either use mkfastq or bcl2fastq. In this case we did it with bcl2fastq
.
Appropriate: e. g.: 102_B2_CD4_STIM_8D, 103_B2_CD8_STIM; the first part may not be really necessary (if you allow enough information to make the library unique). and the second name is missing a section (8D).
Consistent: e. g.: B1_CD4_STIM, B2_STIM_CD4.; the cell type and stimulation status changed in the second batch.
Results are un run NV037.
Pre-processing data with Cell Ranger. We have a wrapper for this if you don't want to prepare a script for each of your samples.
Check you're taking the right reference genome.
sh /home/ciro/scripts/cellranger_wrappeR/run.sh -h
sh /home/ciro/scripts/cellranger_wrappeR/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_cellranger_NV037.yaml -v
Some jobs will be created and you will need to wait a long time.1 Then you run the summary.
sh /home/ciro/scripts/cellranger_wrappeR/run.sh -y /home/ciro/simon/scripts/cellranger_config.yaml -s
You will make use of the ab_capture scripts.
Have the donor metadata if you want to include it in the single-cell metadata. Up yo you if you want it in the object or if you'll make use of it after clustering.
Check the hashtag structure is correct (eg. donor~hashtag_n~hashtag_id corresponds to "DONOR1-TSC5-C0305"). This is derived from the feature names (indicated in the column "names" when you run Cell Ranger).
sh /home/ciro/scripts/ab_capture/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_ab_capture.yaml -v
Rscript /home/ciro/scripts/ab_capture/summary.R -c /home/ciro/large/simon/results/ab_demux/SiEs09_100th \
--tag_str time_days~donor~hashtag_n~hashtag_id # d11-I-TSC1-C0301
Jobs will be created and you'll need to wait a few minutes.
We're using this script tailored to explore QC metrics.2
It takes about 5 minutes with 90K cells with 20gb and 1 node/1 processor.
Rscript /home/ciro/scripts/quality_control/single_cell.R -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_qc.yaml --log TRUE
QC metrics
nFeature_RNA | nCount_RNA | percent.mt | |
---|---|---|---|
qc_filters.low | 500.00 | 645.00 | 0.03 |
qc_filters.high | 4000.00 | 11000.00 | 7.50 |
Min. | 501.00 | 645.00 | 0.03 |
0.1% | 504.76 | 659.50 | 0.05 |
10% | 857.70 | 1275.20 | 0.79 |
1st Qu. | 1210.75 | 1923.00 | 1.11 |
Median | 1907.50 | 3965.00 | 1.70 |
Mean | 2017.91 | 5167.53 | 2.46 |
3rd Qu. | 2575.50 | 6671.00 | 2.98 |
90% | 3279.80 | 10046.70 | 5.01 |
99.2% | 5516.11 | 27639.74 | 11.11 |
Max. | 6376.00 | 40979.00 | 38.90 |
You need to be extra careful with this analysis because it branches very easily. The main variables you want to tweak are:
Resolution.
sh /home/ciro/scripts/clustering/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_clustering.yaml
sh /home/ciro/scripts/clustering/run.sh -y /home/ciro/simon/scripts/tumor_tfr/SiEs09_clustering_filtered.yaml
Two results are explored here.
Using loose quality thresholds and keeping the doublets. The thresholds are selected based on the global QC metrics distribution. Note that the lines in this link are corresponding to the second option (next).
These were carefully selected based on the Singlet's QC metrics. We are aiming to keep only the singlets and throw the negative ones if they're too few cells.
After we explore the biology revealed by the clustering analysis, we select a cluster resolution that gives us the most interesting prospect for a story.
Variance percentage | PC | Resolution |
---|---|---|
15 | 20 | 0.4 |
Colouring by cluster
Colouring by Mice-days
This analysis was performed using Vicente's pipeline.
This part was run in an interactive job.
mkdir --parents /home/ciro/large/simon/results/tcr/SiEs09
Rscript /home/vfajardo/scripts/TCR_data_analysis/10X_preliminary_analyses/preliminary_TCR_data_analysis.1.5.4.R \
--ReportsPath=/home/ciro/large/simon/results/tcr/SiEs09 \
--TCRContigs=/home/ciro/large/simon/raw/NV037/vdj/SiEs09_Mo_tumor_FOXP3PCD4P_TCR/outs/filtered_contig_annotations.csv \
--TCRClonotypes=/home/ciro/large/simon/raw/NV037/vdj/SiEs09_Mo_tumor_FOXP3PCD4P_TCR/outs/clonotypes.csv \
--SeuratObj=/home/ciro/large/simon/results/clustering/SiEs09_tfr_xdoublt_xclust4/.object_stem_seurat_mean0.01_pct15.rds \
--Tags "c('orig.time_days', 'orig.donor', 'orig.ht_id', 'RNA_snn_res.0.4', 'cluster_days')"
Venn diagram
Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_signatures.R
Briefly, the score is defined for each cell by subtracting the mean expression of an aggregate of control gene lists from the mean of the signature gene list. Control gene lists were randomly selected (same size as the signature list) from bins delimited based on the level of expression of the signature list.
sh /home/ciro/simon/scripts/tumor_tfr/SiEs09_dgea.sh
This time we have 10x data, so MAST was used.
Group1 | Group2 | Column | Name | Block | Filter |
---|---|---|---|---|---|
2 | REST | RNA_snn_res.0.4 | none | global | none |
d11 | d18 | orig.time_days | none | cluster2 | none |
d11 | d18 | orig.time_days | none | cluster2 | list(c('RNA_snn_res.0.4', '2')) |
We're using Monocle 3 for this analysis. This also helps us preserve the UMAP from the Seurat analysis.
Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_trajectory.R &> /home/ciro/simon/scripts/tumor_tfr/SiEs09_trajectory.out.txt
Clusters
library(monocle3)
npcs = 20
out_dir = paste0("/home/ciro/large/simon/results/trajectory/SiEs09_tfr_xdoublt_xclust4_", npcs, "PCs_std")
if(!dir.exists(out_dir)) dir.create(out_dir); setwd(out_dir)
mycells = readRDS("/home/ciro/large/simon/results/clustering/SiEs09_tfr_xdoublt_xclust4/.object_stem_seurat_mean0.01_pct15.rds")
gene_annotation <- mycells@assays$RNA@meta.features
gene_annotation <- cbind(gene_short_name = rownames(gene_annotation), gene_annotation)
cds <- new_cell_data_set(
expression_data = mycells@assays$RNA@counts,
cell_metadata = mycells@meta.data,
gene_metadata = gene_annotation
)
cds <- preprocess_cds(cds, num_dim = npcs)
cds <- align_cds(cds, residual_model_formula_str = "~nCount_RNA+percent.mt")
cds <- reduce_dimension(cds)
pdf("clusters.pdf")
plot_cells(cds, label_groups_by_cluster=FALSE, color_cells_by = "RNA_snn_res.0.4", cell_size=2)
dev.off()
cds <- cluster_cells(cds)
cds <- learn_graph(cds)
p <- plot_cells(cds, color_cells_by = "partition")
pdf("partition.pdf"); print(p); dev.off()
pdf("trajectory.pdf");
plot_cells(cds,
color_cells_by = "RNA_snn_res.0.4",
label_groups_by_cluster=FALSE,
label_leaves=FALSE,
label_branch_points=FALSE)
dev.off()
Clusters - PC20
Conclusion: It's hard to pull them appart but I'd say that cluster 2 is an extreme (starting/ending) a branch.
Rscript /home/ciro/simon/scripts/tumor_tfr/SiEs09_figures.R
We can continue discussions and documenting tasks or [ready to publish] figure in our Google Drive tracker sheet