5 Sample Quality

5.1 Sample quality index

A mean coverage depth (MCD) and a window coefficient of variation (wCV) can be calculated from the get_MCD and get_wCV functions. To adjust the effect of low base coverage to CV, we consider wCV within a restricted range of MCD. Both functions return the same dimension of a matrix, which is the number of genes \(\times\) the number of samples.

MCD.mat = get_MCD(genelist, pileupPath, sampleInfo)
wCV.mat = get_wCV(genelist, pileupPath, sampleInfo, rnum=100, method=1, winSize=20, egPct=10)

The get_SQI function gives the AUC of the fitted lines in the regression of wCV and log transformed MCD, and normalizes it using projection depth (PD)3. Finally, we can determine the quality of the samples by defining them as Bad if PD>3. Nineteen out of 171 samples are classified as bad quality samples. The SQI plot shows the distribution of AUC and PD and the shape of the regression by sample quality.

result = get_SQI(MCD=MCD.mat, wCV=wCV.mat, rstPct=20, obsPct=50)
auc.vec <- result$auc.vec
table(auc.vec$SQI)
## 
##  Bad Good 
##   19  152
plot_SQI(SQIresult=result)

5.2 Updated gene body coverage

The gene body coverage plots in gene body coverage chapter can be updated after removing bad samples using the plot_GBCg function. The coverage patterns become much more stable, especially in the long genes.

GBCg0 = plot_GBCg(stat=2, plot=TRUE, sampleInfo, GBCresult=GBC0, auc.vec=result$auc.vec)
GBCg5 = plot_GBCg(stat=2, plot=TRUE, sampleInfo, GBCresult=GBC5, auc.vec=result$auc.vec)

pg0 <- GBCg0$plotRI +
  coord_cartesian(ylim=c(0, 0.017)) +
  ggtitle("0~5 kb")
pg5 <- GBCg5$plotRI +
  coord_cartesian(ylim=c(0, 0.017)) +
  ggtitle("5+ kb")

ggpubr::ggarrange(pg0, pg5, common.legend=TRUE, legend="bottom", nrow=1)

This function provides a continuous legend option to select such as ratio intron or PD for the line color.

pg0 <- GBCg0$plotPD +
  coord_cartesian(ylim=c(0, 0.017)) +
  ggtitle("0~5 kb")
pg5 <- GBCg5$plotPD +
  coord_cartesian(ylim=c(0, 0.017)) +
  ggtitle("5+ kb")

ggpubr::ggarrange(pg0, pg5, common.legend=TRUE, legend="bottom", nrow=1)


  1. Choi, H. (2018). Scissor for finding outliers in RNA-seq. https://doi.org/10.17615/dv2e-7a29↩︎