Signac: Finding marker genes using Harmony algorithm

In this vignette, we will find marker genes for every cluster in the PBMC3k dataset. For convenience, we use the pre-processed R object from Seurat.v3 tutorial Guide clustering (https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html) to find markers for the annotated clusters. You can download the object here. This vignette requires Signac and Seurat.v3 has been installed. If you have not installed Signac, please install the latest version at https://github.com/bioturing/signac

suppressMessages(library(Signac))
suppressMessages(library(Seurat))
suppressMessages(require(dplyr))

First let take a look at the structure of the pre-processed object

pbmc <- readRDS("~/Downloads/pbmc3k_final.rds")
pbmc

## An object of class Seurat 
## 13714 features across 2638 samples within 1 assay 
## Active assay: RNA (13714 features)
##  2 dimensional reductions calculated: pca, umap

table(Idents(pbmc))

## 
##  Naive CD4 T Memory CD4 T   CD14+ Mono            B        CD8 T 
##          697          483          480          344          271 
## FCGR3A+ Mono           NK           DC           Mk 
##          162          155           32           14

Now we find marker genes for all clusters using the usual syntax in Seurat package

system.time(pbmc.markers <- HarmonyAllMarkers(pbmc, only.pos = TRUE, logfc.threshold = 0.25, verbose = F))

##    user  system elapsed 
##   3.983   0.314   4.298

pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = Log2.fold.change)

## # A tibble: 18 x 8
## # Groups:   cluster [9]
##    Gene.Name Log2.fold.change Log10.adjusted.… Log10.p.value cluster
##    <fct>                <dbl>            <dbl>         <dbl> <fct>  
##  1 CCR7                  1.33           -26.9         -28.9  Naive …
##  2 LDLRAP1               1.10            -7.78         -9.38 Naive …
##  3 LTB                   1.29           -77.4         -81.0  Memory…
##  4 AQP3                  1.24           -14.8         -16.9  Memory…
##  5 S100A9                5.57          -288.         -292.   CD14+ …
##  6 S100A8                5.48          -281.         -285.   CD14+ …
##  7 CD79A                 4.31          -210.         -213.   B      
##  8 TCL1A                 3.59           -97.9        -101.   B      
##  9 CCL5                  3.06          -140.         -144.   CD8 T  
## 10 GZMK                  2.95           -58.5         -61.8  CD8 T  
## 11 LST1                  3.09           -94.0         -98.2  FCGR3A…
## 12 FCGR3A                3.31           -82.8         -86.2  FCGR3A…
## 13 GZMB                  4.83          -103.         -107.   NK     
## 14 GNLY                  5.32          -103.         -107.   NK     
## 15 HLA-DPB1              2.87           -11.7         -15.3  DC     
## 16 FCER1A                3.87           -11.6         -15.2  DC     
## 17 PPBP                  8.58            -9.11        -13.2  Mk     
## 18 PF4                   7.24            -9.11        -13.2  Mk     
## # … with 3 more variables: Dissimilarity <dbl>, Bin.count <dbl>,
## #   Up.Down.score <dbl>

Compare with Seurat::FindAllMarkers results

system.time(pbmc.markers.seurat <- FindAllMarkers(pbmc, only.pos = TRUE, logfc.threshold = 0.25, verbose = F))

##    user  system elapsed 
## 109.916   1.383 111.310

pbmc.markers.seurat %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)

## # A tibble: 18 x 7
## # Groups:   cluster [9]
##        p_val avg_logFC pct.1 pct.2 p_val_adj cluster      gene    
##        <dbl>     <dbl> <dbl> <dbl>     <dbl> <fct>        <chr>   
##  1 1.61e- 82     0.922 0.436 0.11  2.20e- 78 Naive CD4 T  CCR7    
##  2 7.37e- 30     0.764 0.245 0.084 1.01e- 25 Naive CD4 T  LDLRAP1 
##  3 7.95e- 89     0.892 0.981 0.642 1.09e- 84 Memory CD4 T LTB     
##  4 1.85e- 60     0.859 0.422 0.11  2.54e- 56 Memory CD4 T AQP3    
##  5 0.            3.86  0.996 0.215 0.        CD14+ Mono   S100A9  
##  6 0.            3.80  0.975 0.121 0.        CD14+ Mono   S100A8  
##  7 0.            2.99  0.936 0.041 0.        B            CD79A   
##  8 9.48e-271     2.49  0.622 0.022 1.30e-266 B            TCL1A   
##  9 2.96e-189     2.12  0.985 0.24  4.06e-185 CD8 T        CCL5    
## 10 2.57e-158     2.05  0.587 0.059 3.52e-154 CD8 T        GZMK    
## 11 3.51e-184     2.30  0.975 0.134 4.82e-180 FCGR3A+ Mono FCGR3A  
## 12 2.03e-125     2.14  1     0.315 2.78e-121 FCGR3A+ Mono LST1    
## 13 7.95e-269     3.35  0.961 0.068 1.09e-264 NK           GZMB    
## 14 3.13e-191     3.69  0.961 0.131 4.30e-187 NK           GNLY    
## 15 1.48e-220     2.68  0.812 0.011 2.03e-216 DC           FCER1A  
## 16 1.67e- 21     1.99  1     0.513 2.28e- 17 DC           HLA-DPB1
## 17 7.73e-200     5.02  1     0.01  1.06e-195 Mk           PF4     
## 18 3.68e-110     5.94  1     0.024 5.05e-106 Mk           PPBP

Finally, we use DoHeatmap function from Seurat package to draw two heatmaps of expression of the marker genes found by two method: Seurat default and Harmony to see the distinct expression pattern of each cell type (cluster). We only plot top 20 features (all features if less than 20).

Heatmap of marker genes found by Harmony

top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = Log2.fold.change)
DoHeatmap(pbmc, features = as.character(top10$Gene.Name)) + NoLegend()

Heatmap of marker genes found by Seurat default method

top10 <- pbmc.markers.seurat %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(pbmc, features = as.character(top10$gene)) + NoLegend()