In this vignette, we will find marker genes for every cluster in the PBMC3k dataset. For convenience, we use the pre-processed R object from Seurat.v3 tutorial Guide clustering (https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html) to find markers for the annotated clusters. You can download the object here. This vignette requires Signac and Seurat.v3 has been installed. If you have not installed Signac, please install the latest version at https://github.com/bioturing/signac
suppressMessages(library(Signac))
suppressMessages(library(Seurat))
suppressMessages(require(dplyr))
First let take a look at the structure of the pre-processed object
pbmc <- readRDS("~/Downloads/pbmc3k_final.rds")
pbmc
## An object of class Seurat
## 13714 features across 2638 samples within 1 assay
## Active assay: RNA (13714 features)
## 2 dimensional reductions calculated: pca, umap
table(Idents(pbmc))
##
## Naive CD4 T Memory CD4 T CD14+ Mono B CD8 T
## 697 483 480 344 271
## FCGR3A+ Mono NK DC Mk
## 162 155 32 14
Now we find marker genes for all clusters using the usual syntax in Seurat package
system.time(pbmc.markers <- HarmonyAllMarkers(pbmc, only.pos = TRUE, logfc.threshold = 0.25, verbose = F))
## user system elapsed
## 3.983 0.314 4.298
pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = Log2.fold.change)
## # A tibble: 18 x 8
## # Groups: cluster [9]
## Gene.Name Log2.fold.change Log10.adjusted.… Log10.p.value cluster
## <fct> <dbl> <dbl> <dbl> <fct>
## 1 CCR7 1.33 -26.9 -28.9 Naive …
## 2 LDLRAP1 1.10 -7.78 -9.38 Naive …
## 3 LTB 1.29 -77.4 -81.0 Memory…
## 4 AQP3 1.24 -14.8 -16.9 Memory…
## 5 S100A9 5.57 -288. -292. CD14+ …
## 6 S100A8 5.48 -281. -285. CD14+ …
## 7 CD79A 4.31 -210. -213. B
## 8 TCL1A 3.59 -97.9 -101. B
## 9 CCL5 3.06 -140. -144. CD8 T
## 10 GZMK 2.95 -58.5 -61.8 CD8 T
## 11 LST1 3.09 -94.0 -98.2 FCGR3A…
## 12 FCGR3A 3.31 -82.8 -86.2 FCGR3A…
## 13 GZMB 4.83 -103. -107. NK
## 14 GNLY 5.32 -103. -107. NK
## 15 HLA-DPB1 2.87 -11.7 -15.3 DC
## 16 FCER1A 3.87 -11.6 -15.2 DC
## 17 PPBP 8.58 -9.11 -13.2 Mk
## 18 PF4 7.24 -9.11 -13.2 Mk
## # … with 3 more variables: Dissimilarity <dbl>, Bin.count <dbl>,
## # Up.Down.score <dbl>
Compare with Seurat::FindAllMarkers
results
system.time(pbmc.markers.seurat <- FindAllMarkers(pbmc, only.pos = TRUE, logfc.threshold = 0.25, verbose = F))
## user system elapsed
## 109.916 1.383 111.310
pbmc.markers.seurat %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
## # A tibble: 18 x 7
## # Groups: cluster [9]
## p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
## <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
## 1 1.61e- 82 0.922 0.436 0.11 2.20e- 78 Naive CD4 T CCR7
## 2 7.37e- 30 0.764 0.245 0.084 1.01e- 25 Naive CD4 T LDLRAP1
## 3 7.95e- 89 0.892 0.981 0.642 1.09e- 84 Memory CD4 T LTB
## 4 1.85e- 60 0.859 0.422 0.11 2.54e- 56 Memory CD4 T AQP3
## 5 0. 3.86 0.996 0.215 0. CD14+ Mono S100A9
## 6 0. 3.80 0.975 0.121 0. CD14+ Mono S100A8
## 7 0. 2.99 0.936 0.041 0. B CD79A
## 8 9.48e-271 2.49 0.622 0.022 1.30e-266 B TCL1A
## 9 2.96e-189 2.12 0.985 0.24 4.06e-185 CD8 T CCL5
## 10 2.57e-158 2.05 0.587 0.059 3.52e-154 CD8 T GZMK
## 11 3.51e-184 2.30 0.975 0.134 4.82e-180 FCGR3A+ Mono FCGR3A
## 12 2.03e-125 2.14 1 0.315 2.78e-121 FCGR3A+ Mono LST1
## 13 7.95e-269 3.35 0.961 0.068 1.09e-264 NK GZMB
## 14 3.13e-191 3.69 0.961 0.131 4.30e-187 NK GNLY
## 15 1.48e-220 2.68 0.812 0.011 2.03e-216 DC FCER1A
## 16 1.67e- 21 1.99 1 0.513 2.28e- 17 DC HLA-DPB1
## 17 7.73e-200 5.02 1 0.01 1.06e-195 Mk PF4
## 18 3.68e-110 5.94 1 0.024 5.05e-106 Mk PPBP
Finally, we use DoHeatmap
function from Seurat package to draw two heatmaps of expression of the marker genes found by two method: Seurat default and Harmony to see the distinct expression pattern of each cell type (cluster). We only plot top 20 features (all features if less than 20).
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = Log2.fold.change)
DoHeatmap(pbmc, features = as.character(top10$Gene.Name)) + NoLegend()
top10 <- pbmc.markers.seurat %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(pbmc, features = as.character(top10$gene)) + NoLegend()