RNAdegrDemo
2025-02-16
1 Data Description
1.1 Inputs
To obtain the sample quality index outputs and plots, we need genelist, coverage pileup, and sampleInfo. geneInfo is optional and is only needed to compare results by gene properties. The input data and variable names are described below:
genelist: a vector of gene names
pileupPath: a vector for file paths of coverage pileupData including .RData file names
geneInfo: a data frame of gene information including gene ID and properties based on gencode v36
    gene_id: ensembl gene ID
    geneSymbol: gene name
    merged: gene length
    exon.wtpct_gc: weighted percentage of GC from exon level data
    subcategory: protein coding or lncRNA
sampleInfo: a data frame of sample information including sample ID and properties from Picard RnaSeqMetrics
    SampleID: sample ID
    PF_BASES: the total number of bases within the PF_READS of the SAM or BAM file to be examined
    PF_ALIGNED_BASES: the total number of aligned bases, in all mapped PF reads, that are aligned to the reference sequence
    RIBOSOMAL_BASES: number of bases in primary alignments that align to ribosomal sequence
    CODING_BASES: number of bases in primary alignments that align to a non-UTR coding base for some gene, and not ribosomal sequence
    UTR_BASES: number of bases in primary alignments that align to a UTR base for some gene, and not a coding base
    INTRONIC_BASES: number of bases in primary alignments that align to an intronic base for some gene, and not a coding or UTR base
    INTERGENIC_BASES: number of bases in primary alignments that do not align to any gene
    RINs: RIN value
TPM: a data frame for TPM normalization with protein coding and lncRNA genes
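Before running the analysis, it can help to confirm that sampleInfo carries the columns listed above. The following is a minimal sketch in base R. The helper check_sample_info() is hypothetical (it is not part of RNAdegrProjR), and the toy data frame contains made-up values for illustration only.

```r
# Column names taken from the sampleInfo description above
required_sample_cols <- c(
  "SampleID", "PF_BASES", "PF_ALIGNED_BASES", "RIBOSOMAL_BASES",
  "CODING_BASES", "UTR_BASES", "INTRONIC_BASES", "INTERGENIC_BASES", "RINs"
)

# Hypothetical helper: stop with a readable message if any column is absent
check_sample_info <- function(sampleInfo) {
  missing_cols <- setdiff(required_sample_cols, colnames(sampleInfo))
  if (length(missing_cols) > 0) {
    stop("sampleInfo is missing columns: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy input with made-up values (not real data)
toy_sampleInfo <- data.frame(
  SampleID         = c("S1", "S2"),
  PF_BASES         = c(100, 200),
  PF_ALIGNED_BASES = c(90, 180),
  RIBOSOMAL_BASES  = c(0, 1),
  CODING_BASES     = c(30, 60),
  UTR_BASES        = c(20, 40),
  INTRONIC_BASES   = c(25, 50),
  INTERGENIC_BASES = c(15, 29),
  RINs             = c(5.3, NA)   # RIN may be missing for some samples
)

check_sample_info(toy_sampleInfo)
```

The same pattern could be applied to geneInfo with its own column list (gene_id, geneSymbol, merged, exon.wtpct_gc, subcategory).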
1.2 Alliance
This example uses 1,000 genes selected from protein coding and lncRNA genes and 171 fresh frozen, total RNA-seq (FFT) samples, all of which can be found in data. Of the samples, 156 are tumor and the remaining 15 are normal.
RNAdegrProjR/
data/
genelist.rda
geneInfo.rda
sampleInfo.rda
dataPrep/
SCISSOR_gaf.txt
pileup/
LINC01772_pileup_part_intron.RData
...
MIR133A1HG_pileup_part_intron.RData
TPM.rda
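The pileup file names in the listing follow the pattern <gene>_pileup_part_intron.RData, so pileupPath can be assembled from a vector of gene names. The sketch below assumes that pattern holds for every gene. The helper build_pileup_path() and the directory value "pileup" are hypothetical placeholders, so adjust pileup_dir to wherever the pileup folder sits on your system.

```r
# Hypothetical helper: build one pileup file path per gene name
build_pileup_path <- function(genes, pileup_dir) {
  file.path(pileup_dir, paste0(genes, "_pileup_part_intron.RData"))
}

# Two gene names taken from the file listing above
genes <- c("LINC01772", "MIR133A1HG")

# "pileup" is a placeholder directory, not a confirmed location
pileupPath <- build_pileup_path(genes, pileup_dir = "pileup")
print(pileupPath)

# Flag files that are not present before loading anything
found <- file.exists(pileupPath)
if (!all(found)) {
  message("Missing pileup files: ", paste(pileupPath[!found], collapse = ", "))
}
```

Checking file.exists() up front avoids a failure partway through a long run over 1,000 genes.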
library(dplyr)         # %>% and select()
library(summarytools)  # descr()
descr(sampleInfo %>% select(-c(SampleID)),
      stats = c("min", "med", "max", "n.valid"),
      transpose = TRUE,
      headings = FALSE)
##
##                                    Min           Median              Max   N.Valid
## ---------------------- --------------- ---------------- ---------------- ---------
##           CODING_BASES    360227386.00    3208673516.00    7201831541.00    171.00
##       INTERGENIC_BASES   1273275706.00    2567603596.00   16945505453.00    171.00
##         INTRONIC_BASES    434293079.00    5593128001.00   10076495286.00    171.00
##       PF_ALIGNED_BASES   5448216119.00   14566862147.00   23291961710.00    171.00
##               PF_BASES   6481274100.00   16315945500.00   26148083100.00    171.00
##        RIBOSOMAL_BASES            0.00           150.00          6600.00    171.00
##                   RINs            1.10             5.30             9.20    167.00
##              UTR_BASES    381863999.00    2783953295.00    4803719959.00    171.00
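In the summary, N.Valid counts the non-missing values per variable, which is why RINs shows 167 rather than 171. As a rough illustration of how these four statistics are computed, here is a base R sketch on a toy data frame. The helper summarise_columns() is hypothetical, the values are made up, and this is not the summarytools implementation.

```r
# Hypothetical helper: Min, Median, Max and count of non-missing values
# for each numeric column, with one row per variable
summarise_columns <- function(df) {
  t(sapply(df, function(x) c(
    Min     = min(x, na.rm = TRUE),
    Median  = median(x, na.rm = TRUE),
    Max     = max(x, na.rm = TRUE),
    N.Valid = sum(!is.na(x))
  )))
}

# Toy data with one missing RIN value (made-up numbers)
toy <- data.frame(
  RINs      = c(1.1, 5.3, NA, 9.2),
  UTR_BASES = c(10, 20, 30, 40)
)

res <- summarise_columns(toy)
print(res)
```

Here the RINs row reports N.Valid of 3 because one of the four toy values is NA, while UTR_BASES reports 4.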