Chapter 18 Generate Custom ArchRGenome for Macaque Mmul10
18.1 Description
Generate a custom ArchR genome annotation for the macaque (Macaca mulatta) Mmul10 assembly, based on the following data:
- Genome annotation:
  - A BSgenome object, which contains the sequence information for a genome
- Gene annotation:
  - A TxDb object (transcript database) from Bioconductor, which contains gene and transcript coordinates
  - An OrgDb object (organism database) from Bioconductor, which provides a unified framework for mapping between gene names and various gene identifiers
18.2 Create genome annotation
library(ArchR)
genomeAnnotation <- createGenomeAnnotation(genome = "BSgenome.Mmulatta.UCSC.rheMac10")
18.3 Create gene annotation
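library(GenomicFeatures)  # makeTxDbFromEnsembl(); provided by txdbmaker in recent Bioconductor releases
library(org.Mmu.eg.db)
# Build the transcript database from Ensembl
txdb <- makeTxDbFromEnsembl(organism = "Macaca mulatta")
# Prefix the sequence names with "chr" to match the UCSC-style BSgenome
seqlevels(txdb) <- paste0("chr", seqlevels(txdb))
# Keep only chromosomes 1 to 20, X and Y
seqlevels(txdb) <- paste0("chr", c(seq(1,20), "X", "Y"))
geneAnnotation <- createGeneAnnotation(TxDb = txdb, OrgDb = org.Mmu.eg.db)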
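The two `seqlevels()` assignments rely on plain string construction, which can be checked in base R without any Bioconductor package. The sketch below only shows which names `paste0()` produces; the variable names `ensembl_names` and `ucsc_names` are illustrative and not part of the original workflow.

```r
# Ensembl-style names for the 20 macaque autosomes plus X and Y.
# Mixing numbers and strings in c() coerces everything to character.
ensembl_names <- c(seq(1, 20), "X", "Y")

# UCSC-style names, as used in the second seqlevels() assignment.
ucsc_names <- paste0("chr", ensembl_names)

print(head(ucsc_names, 3))
print(tail(ucsc_names, 2))
print(length(ucsc_names))
```

Because the second assignment lists only these 22 names, unplaced scaffolds and the mitochondrial sequence are not carried into the gene annotation.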
18.4 Filter genes without a symbol
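# Indices of genes whose symbol matches "NA" (grep() matches substrings)
loci <- grep("NA", geneAnnotation$genes$symbol)
# Gene IDs that remain after dropping those genes
gid <- geneAnnotation$genes$gene_id[-loci]
# Transcript names belonging to the retained genes
df <- select(txdb, keys = gid, columns="TXNAME", keytype="GENEID")
genes <- geneAnnotation$genes[-loci]
exons <- geneAnnotation$exons[-grep("NA", geneAnnotation$exons$symbol)]
tss <- geneAnnotation$TSS[which(geneAnnotation$TSS$tx_name %in% df$TXNAME)]
# Rebuild the gene annotation from the filtered components
geneAnnotationSubset <- createGeneAnnotation(genes = genes,
                                             exons = exons,
                                             TSS = tss)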
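The filtering step uses `grep("NA", ...)`, which matches any symbol containing the letters "NA" and skips true missing values. The following base R sketch illustrates the difference on a hypothetical symbol vector; whether the real annotation stores missing symbols as `NA` or as the string "NA" is an assumption to check on your own data, and the names `symbols`, `missing`, and `kept` are illustrative only.

```r
# Hypothetical symbols: one missing value, one literal "NA" string,
# and two gene symbols that merely contain the letters "NA".
symbols <- c("TP53", NA, "RNASE1", "NA", "DNAJB1")

# Substring match: flags "RNASE1", "NA" and "DNAJB1", but not the true NA.
substring_hits <- grep("NA", symbols)

# Exact test: flags only the missing value and the literal "NA" string.
missing <- is.na(symbols) | symbols == "NA"

# Logical indexing stays safe when nothing is missing, whereas
# negative indexing with an empty index, x[-integer(0)], returns nothing.
kept <- symbols[!missing]

print(substring_hits)
print(kept)
```

If the exact test suits your data, the same logical vector could be used to subset the `genes` and `exons` objects in place of the negative `grep()` indices.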
18.5 Create ArchR genome object
dir.create("data/ArchR", recursive = TRUE, showWarnings = FALSE)
save(genomeAnnotation, geneAnnotationSubset, file = "data/ArchR/Macaca_mulatta_genomeAnnotation_geneAnnotationSubset.RData")
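In a later session the saved objects can be restored with `load()`. The sketch below is a minimal round trip using stand-in lists and a temporary file, so it runs without ArchR; the file name `annotation_demo.RData` and the placeholder contents are hypothetical.

```r
# Stand-in objects; in the real workflow these are the ArchR annotations.
genomeAnnotation <- list(genome = "BSgenome.Mmulatta.UCSC.rheMac10")
geneAnnotationSubset <- list(genes = "placeholder")

# Hypothetical output path inside the session's temporary directory.
out_file <- file.path(tempdir(), "annotation_demo.RData")
save(genomeAnnotation, geneAnnotationSubset, file = out_file)

# Load into a fresh environment to confirm both objects were stored.
restored <- new.env()
loaded_names <- load(out_file, envir = restored)
print(loaded_names)
```

Once restored, the two objects would typically be passed to the `genomeAnnotation` and `geneAnnotation` arguments of ArchR functions such as `createArrowFiles()` and `ArchRProject()`.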