2 Question 2: Lion King

2.1 A

Adapt the function from question 1 so that it can search any Ensembl dataset for associated filters and attributes. The function takes three arguments:
1. Ensembl dataset
2. search pattern (regex) for attributes
3. search pattern (regex) for filters

For example: function("dataset", "gene_id", "human")

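library(biomaRt)  # useEnsembl(), searchFilters(), searchAttributes()
library(dplyr)    # provides the %>% pipe

# Search one Ensembl dataset for filters and attributes matching the regex patterns.
find2 <- function(dataset, attributes_pattern, filter_pattern) {
  mart <- useEnsembl(biomart = "genes", dataset = dataset)

  filters <- searchFilters(mart, pattern = filter_pattern)
  attributes <- searchAttributes(mart, pattern = attributes_pattern)

  # Print only the first six matches of each table
  head(filters) %>% print()
  head(attributes) %>% print()
}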

find2(dataset = "hsapiens_gene_ensembl", "gene_id", "human")
##                            name                             description
## 149 with_illumina_humanht_12_v3 With ILLUMINA HumanHT 12 V3 probe ID(s)
## 150 with_illumina_humanht_12_v4 With ILLUMINA HumanHT 12 V4 probe ID(s)
## 151 with_illumina_humanref_8_v3 With ILLUMINA HumanRef 8 V3 probe ID(s)
## 152  with_illumina_humanwg_6_v1  With ILLUMINA HumanWG 6 V1 probe ID(s)
## 153  with_illumina_humanwg_6_v2  With ILLUMINA HumanWG 6 V2 probe ID(s)
## 154  with_illumina_humanwg_6_v3  With ILLUMINA HumanWG 6 V3 probe ID(s)
##                        name                        description         page
## 1           ensembl_gene_id                     Gene stable ID feature_page
## 2   ensembl_gene_id_version             Gene stable ID version feature_page
## 82            entrezgene_id NCBI gene (formerly Entrezgene) ID feature_page
## 106             wikigene_id                        WikiGene ID feature_page
## 204         ensembl_gene_id                     Gene stable ID    structure
## 205 ensembl_gene_id_version             Gene stable ID version    structure
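
The result above illustrates that the search patterns behave as regular expressions matched against substrings: "human" returns the Illumina probe filters because their names contain that substring. A minimal sketch of this matching on a small mock table, using base R `grepl()`, is given below. It is not biomaRt's implementation; `mock_filters` and `search_mock` are hypothetical names introduced only for illustration.

```r
# Hypothetical mock of a filter table; the real table comes from the mart.
mock_filters <- data.frame(
  name = c("chromosome_name",
           "with_illumina_humanht_12_v3",
           "ensembl_gene_id",
           "ensembl_gene_id_version"),
  description = c("Chromosome/scaffold name",
                  "With ILLUMINA HumanHT 12 V3 probe ID(s)",
                  "Gene stable ID",
                  "Gene stable ID version"),
  stringsAsFactors = FALSE
)

# Keep the rows whose name matches the regex (case-sensitive substring match).
search_mock <- function(tbl, pattern) {
  tbl[grepl(pattern, tbl$name), , drop = FALSE]
}

hits_human <- search_mock(mock_filters, "human")$name
hits_id    <- search_mock(mock_filters, "gene_id$")$name  # "$" anchors the end

print(hits_human)
print(hits_id)
```

Anchoring the pattern (here with `$`) is one way to narrow a search that otherwise returns many loosely related rows.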

2.2 B

For each dataset, look up the attribute and filter given in the table. First look up the names of the Ensembl datasets for the listed organisms.

Dataset   Attribute  Filter
Lion      protein    chromosome
Baboon    protein    chromosome
Elephant  protein    chromosome

Note: Do not run the function three times with the given arguments. Use an R function that can perform iterations.

martALL <- useEnsembl("genes")
searchDatasets(martALL, "(L|l)ion") 
##               dataset            description   version
## 147 pleo_gene_ensembl Lion genes (PanLeo1.0) PanLeo1.0
searchDatasets(martALL, "(B|b)aboon") 
##                  dataset                   description  version
## 139 panubis_gene_ensembl Olive baboon genes (Panu_3.0) Panu_3.0
searchDatasets(martALL, "(E|e)lephant") 
##                   dataset                                      description
## 42    cmilii_gene_ensembl Elephant shark genes (Callorhinchus_milii-6.1.3)
## 85 lafricana_gene_ensembl                       Elephant genes (Loxafr3.0)
##                      version
## 42 Callorhinchus_milii-6.1.3
## 85                 Loxafr3.0
DS <- c("pleo_gene_ensembl", "panubis_gene_ensembl", "lafricana_gene_ensembl")

for (x in DS) {
  print(x)
  find2(x, "protein", "chromosome")
}
## [1] "pleo_gene_ensembl"
##              name              description
## 1 chromosome_name Chromosome/scaffold name
##                                                 name
## 30                                   peptide_version
## 41                                        protein_id
## 120                                  peptide_version
## 160                                  peptide_version
## 173              cabingdonii_homolog_ensembl_peptide
## 177 cabingdonii_homolog_canonical_transcript_protein
##                                                        description         page
## 30                                               Version (protein) feature_page
## 41                                                INSDC protein ID feature_page
## 120                                              Version (protein)    structure
## 160                                              Version (protein)     homologs
## 173 Abingdon island giant tortoise protein or transcript stable ID     homologs
## 177                                 Query protein or transcript ID     homologs
## [1] "panubis_gene_ensembl"
##              name              description
## 1 chromosome_name Chromosome/scaffold name
##                                                 name
## 30                                   peptide_version
## 44                                        protein_id
## 167                                  peptide_version
## 207                                  peptide_version
## 220              cabingdonii_homolog_ensembl_peptide
## 224 cabingdonii_homolog_canonical_transcript_protein
##                                                        description         page
## 30                                               Version (protein) feature_page
## 44                                                INSDC protein ID feature_page
## 167                                              Version (protein)    structure
## 207                                              Version (protein)     homologs
## 220 Abingdon island giant tortoise protein or transcript stable ID     homologs
## 224                                 Query protein or transcript ID     homologs
## [1] "lafricana_gene_ensembl"
##              name              description
## 1 chromosome_name Chromosome/scaffold name
##                                                 name
## 30                                   peptide_version
## 42                                        protein_id
## 132                                  peptide_version
## 172                                  peptide_version
## 185              cabingdonii_homolog_ensembl_peptide
## 189 cabingdonii_homolog_canonical_transcript_protein
##                                                        description         page
## 30                                               Version (protein) feature_page
## 42                                                INSDC protein ID feature_page
## 132                                              Version (protein)    structure
## 172                                              Version (protein)     homologs
## 185 Abingdon island giant tortoise protein or transcript stable ID     homologs
## 189                                 Query protein or transcript ID     homologs
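
As an alternative to the `for` loop, the same iteration can be written with `lapply()`, which calls a function once per element and passes any extra arguments through. The sketch below uses a hypothetical stand-in, `find_stub`, instead of `find2` so that it runs without a connection to Ensembl; replacing `find_stub` with `find2` is assumed to give the same searches as the loop above.

```r
DS <- c("pleo_gene_ensembl", "panubis_gene_ensembl", "lafricana_gene_ensembl")

# Hypothetical stand-in for find2(): it only reports which search would run,
# so the example works offline.
find_stub <- function(dataset, attributes_pattern, filter_pattern) {
  paste(dataset, attributes_pattern, filter_pattern, sep = " | ")
}

# Each element of DS becomes the first argument (dataset);
# the two patterns are passed unchanged on every call.
results <- lapply(DS, find_stub,
                  attributes_pattern = "protein",
                  filter_pattern = "chromosome")
names(results) <- DS

print(results[["pleo_gene_ensembl"]])
```

Because `lapply()` returns a list, the results per dataset can be kept and inspected afterwards, which the printing-only loop does not allow.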