FAIR data in Aspergillus research

The FAIR Data Station: a solution for Aspergillus research?

Sibbe Bakker

To start with a question

Where do you look up or share (research) data?

What is the ASPAR_KR project again?

A project needed to…

  • Standardise research data.

  • Improve data sharing.

Aims and scope of ASPAR_KR

Introducing an existing standard.

We don’t want to do this [1]

Easy to understand and use.

Findings so far…

What standards are there?

  • No standards are fully ready yet.

  • Standards are impossible to develop without cooperation.

  • The FAIRDS [2] platform may be useful here.

FAIRDS, what is it for?

Standardisation of omics data.

Users register their study, and make it FAIR from the start.

  • For standardised data management.

  • Automated analysis pipelines.

Made by the friendly folks at WUR’s Laboratory of Systems and Synthetic Biology:

Jasper Koehorst

Bart Nijsse¹

Peter J Schaap

FAIRDS, how does it work?

  • Experimental design is part of the dataset.

  • Minimum information standards are used.

  • New templates can be introduced.

  • You make Excel templates.
  • Fill these in with your data.
  • Upload them; FAIRDS converts them to RDF.
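
The template-to-RDF step above can be pictured with a small sketch in base R. This is not how FAIRDS is implemented: the column names, the `row_to_triples()` helper and the `http://example.org/` namespace are assumptions made up for illustration only.

```r
# Hypothetical template rows: one row per sample, one column per metadata field.
template <- data.frame(
  sample_id   = c("sam_arnhem1", "sam_arnhem2"),
  packageName = c("DeltaTrap", "DeltaTrap"),
  medium_type = c("flamingo medium", "flamingo medium"),
  stringsAsFactors = FALSE
)

# Turn one row into N-Triples lines: the row identifier becomes the subject,
# each remaining column a predicate, and each cell an object literal.
# The base namespace is a placeholder, not the one FAIRDS uses.
row_to_triples <- function(row, base = "http://example.org/") {
  subject <- paste0("<", base, row[["sample_id"]], ">")
  fields  <- setdiff(names(row), "sample_id")
  vapply(fields, function(field) {
    sprintf('%s <%s%s> "%s" .', subject, base, field, row[[field]])
  }, character(1), USE.NAMES = FALSE)
}

triples <- unlist(lapply(seq_len(nrow(template)), function(i) {
  row_to_triples(template[i, ])
}))
cat(triples, sep = "\n")
```

Each spreadsheet cell ends up as one statement, which is why a filled-in template is enough to produce linked data.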

Investigation class

  • Title of the Investigation
  • Study authors.
  • Publication details.
  • Abstract.

Study class, sub question of Investigation.

  • Title of the study.
  • Aim and description.

Observation class, what a study observed.

  • Name and description
  • Observation level factors.

Sample class, what was sampled.

  • Name and description
  • Sample level factors.
  • Sample specific metadata

Assay class, what was measured.

  • Name and description
  • Assay specific metadata
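
The nesting of these classes shows up in the identifiers of the example dataset used later in this presentation, where a sample IRI contains its investigation, study and observation. A minimal sketch in base R, assuming the `inv_`/`stu_`/`obs_`/`sam_` prefixes seen in that data; the `make_iri()` helper is hypothetical, and the assay level is left out because its prefix does not appear in the example output.

```r
# Build a nested identifier from the class hierarchy, following the
# inv_/stu_/obs_/sam_ pattern visible in the example data's IRIs.
# make_iri() is an illustrative helper, not part of FAIRDS.
make_iri <- function(investigation, study = NULL, observation = NULL,
                     sample = NULL,
                     base = "http://fairbydesign.nl/ontology/") {
  parts <- c(
    paste0("inv_", investigation),
    if (!is.null(study)) paste0("stu_", study),
    if (!is.null(observation)) paste0("obs_", observation),
    if (!is.null(sample)) paste0("sam_", sample)
  )
  paste0(base, paste(parts, collapse = "/"))
}

iri <- make_iri("arnhemVsNijmegenComparison", "arnhemVsNijmegen",
                "nijmegenAirSamples", "CultureNijmegenStation2")
print(iri)
```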

A demonstration

That’s all well and good, but how does it work?

Data entry using a template.

Imagine the following situation.

  • A researcher takes air samples in Arnhem and Nijmegen.
  • He wants to know whether the resistance fraction is higher in Arnhem or in Nijmegen.
  • He uses the delta trap air sampling method of Hylke et al. [3].

The delta trap method – Image by Bo Briggeman

  • Per city, 4 locations are sampled.

  • Using the two layer culture…

    • Strips are grown on Flamingo agar…
    • And Flamingo agar with ITR.
  • Let’s see how to enter these things in ASPAR_KR.

Map data from the OpenStreetMap project [4].

The data set to be FAIRified.

Our FAIRification programme.

The FAIR data, how can we use this?

Analysis of FAIR data.

  • Great! We’ve done it, we made FAIR data.
  • How do we analyse it?

We need an

To explain linked data concepts

  • RDF files are plain text, written in one of several serialisation formats.

  • The basis is the triple.

An example of an RDF statement.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<https://example.com/data#statement> rdf:type "triple" .
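
To make the subject, predicate, object structure concrete, here is a rough sketch in base R that splits one statement from the example dataset into its three parts. The regular expression is an assumption that only covers this simple case (IRI subject and predicate); it is not a full N-Triples parser.

```r
# One statement from the example data, in N-Triples form.
statement <- paste(
  "<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen>",
  "<http://schema.org/identifier>",
  "\"arnhemVsNijmegen\" ."
)

# Split into subject, predicate and object. The pattern expects two IRIs
# followed by anything up to the closing dot.
pattern <- "^(<[^>]+>)\\s+(<[^>]+>)\\s+(.+?)\\s*\\.$"
parts <- regmatches(statement, regexec(pattern, statement, perl = TRUE))[[1]]
triple <- list(subject = parts[2], predicate = parts[3], object = parts[4])
str(triple)
```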

This…

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .
<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .
<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

Turns into this …

  • SPARQL is a query language for RDF data.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://www.perceive.net/schemas/relationship/>
PREFIX ex: <http://example.org/>
SELECT ?name ?enemy
WHERE
  {
    # Every person with a name and an enemy...
    ?person a foaf:Person ;
      foaf:name ?name ;
      rel:enemyOf ?enemy .
    # ...whose name starts with "Green".
    FILTER(STRSTARTS(?name, "Green"))
  }

should return: “Green Goblin” <http://example.org/#spiderman>
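
To see what such a query does, the same pattern can be matched by hand on a table of triples in base R. This only illustrates the matching logic and is not how a SPARQL engine works internally; the short identifiers stand in for the full IRIs, and the Russian name is left out for brevity.

```r
# The Green Goblin / Spiderman graph as a three-column table of triples.
triples <- data.frame(
  subject   = c("green-goblin", "green-goblin", "green-goblin",
                "spiderman", "spiderman", "spiderman"),
  predicate = c("rel:enemyOf", "rdf:type", "foaf:name",
                "rel:enemyOf", "rdf:type", "foaf:name"),
  object    = c("spiderman", "foaf:Person", "Green Goblin",
                "green-goblin", "foaf:Person", "Spiderman"),
  stringsAsFactors = FALSE
)

# ?person a foaf:Person
persons <- triples$subject[triples$predicate == "rdf:type" &
                           triples$object == "foaf:Person"]
# ?person foaf:name ?name
names_tbl <- triples[triples$predicate == "foaf:name" &
                     triples$subject %in% persons, c("subject", "object")]
colnames(names_tbl) <- c("person", "name")
# ?person rel:enemyOf ?enemy
enemy_tbl <- triples[triples$predicate == "rel:enemyOf", c("subject", "object")]
colnames(enemy_tbl) <- c("person", "enemy")

# Join on the shared variable ?person, then apply the FILTER.
result <- merge(names_tbl, enemy_tbl, by = "person")
result <- result[startsWith(result$name, "Green"), c("name", "enemy")]
print(result)
```

Each triple pattern selects rows, shared variables become join keys, and the FILTER is a row condition on the joined table.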

Analysis of RDF data.

Using the rdflib R package [5, 6]:

# read in the RDF file.
rdf <- rdflib::rdf_parse("hylke_air_method_example/data.ttl",
                         format = "turtle")
rdf
Total of 425 triples, stored in hashes
-------------------------------
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://schema.org/description> "A two layer culture made from the delta trap taken from the station square in Nijmegen" .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://fairbydesign.nl/ontology/biosafety_level> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen> <http://schema.org/identifier> "arnhemVsNijmegen" .
<http://fairbydesign.nl/ontology/selection_medium> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenPark1> <http://fairbydesign.nl/ontology/antibiotics> "CHEMBL1835949" .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_arnhemAirSamples/sam_CultureArnhemStation2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://jermontology.org/ontology/JERMOntology#Sample> .
<http://fairbydesign.nl/ontology/biosafety_level> <http://schema.org/valueRequired> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://jermontology.org/ontology/JERMOntology#Sample> .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_nijmegenAirSamples/sam_CultureNijmegenStation2> <http://fairbydesign.nl/ontology/medium_type> "flamingo medium" .
<http://fairbydesign.nl/ontology/inv_arnhemVsNijmegenComparison/stu_arnhemVsNijmegen/obs_arnhemAirSamples/sam_arnhem2> <http://fairbydesign.nl/ontology/packageName> "DeltaTrap" .

... with 415 more triples
sparql_query <- 
  '
  prefix ppeo:     <http://purl.org/ppeo/PPEO.owl#> 
  prefix jerm:     <http://jermontology.org/ontology/JERMOntology#> 
  prefix fair:     <http://fairbydesign.nl/ontology/> 
  prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> 
  prefix schema:   <http://schema.org/>
  SELECT 
  ?observation_label ?sample_label 
  ?total_cfu
  ?selection_cfu
  WHERE {
    # Get the samples of interest
    ?observation_unit a ppeo:observation_unit .
    ?observation_unit jerm:hasPart ?parts .
    ?parts a jerm:Sample .
    ?parts fair:packageName "DeltaTrap" .
    ?parts fair:derives ?cultures .
    ?observation_unit schema:name ?observation_label .
    ?parts schema:name ?sample_label .

    # Experimental data
    ?cultures fair:total_cfu ?total_cfu .
    ?cultures fair:selection_cfu ?selection_cfu .
  }
  '
result <- rdflib::rdf_query(rdf, sparql_query)
result
# A tibble: 8 × 4
  observation_label    sample_label             total_cfu selection_cfu
  <chr>                <chr>                        <dbl>         <dbl>
1 The city of Nijmegen Nijmegen Station plein 2        57            21
2 The city of Nijmegen Nijmegen Station plein 1        53            24
3 The city of Nijmegen Kronenburger park 2             61            26
4 The city of Nijmegen Kronenburger park 1             70            30
5 The city of Arnhem   Arnhem Centraal 2               66            28
6 The city of Arnhem   Arnhem Centraal 1               55            18
7 The city of Arnhem   Sonsbeek park 2                 51            23
8 The city of Arnhem   Sonsbeek park 1                 52            20
# Plot the result.
result |> 
  dplyr::mutate(resistance_fraction = selection_cfu / total_cfu) |> 
  ggplot2::ggplot(ggplot2::aes(x = observation_label, y = resistance_fraction)) +
  ggplot2::labs(x = "City", y = "Resistance fraction") +
  ggplot2::geom_boxplot()
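
Alongside the boxplot, the resistance fraction can be summarised per city as a number. The sketch below uses base R and retypes the counts from the query result above so that it runs on its own; with the real data, the `result` tibble would be used directly. The names `counts` and `summary_tbl` are only illustrative.

```r
# Counts copied from the query result shown above (8 samples, 4 per city).
counts <- data.frame(
  observation_label = rep(c("The city of Nijmegen", "The city of Arnhem"),
                          each = 4),
  total_cfu     = c(57, 53, 61, 70, 66, 55, 51, 52),
  selection_cfu = c(21, 24, 26, 30, 28, 18, 23, 20)
)

# Resistance fraction per sample, then the mean per city.
counts$resistance_fraction <- counts$selection_cfu / counts$total_cfu
summary_tbl <- aggregate(resistance_fraction ~ observation_label,
                         data = counts, FUN = mean)
print(summary_tbl)
```

With four samples per city this is a description of the example data, not a test of whether the two cities differ.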

How do we go forward?

  • We need feedback on FAIRDS.
  • We need community engagement.

And…

Questions?

Need more information?

Check out the git page.

Check out the documentation.

Thanks to

  • Martin Weichert, Bart Fraaije & Johanna Rhodes,
    for providing feedback on the prototype of ASPAR_KR.
  • Jasper Koehorst and Bart Nijsse,
    for helping me contribute to FAIRDS.
  • Mariana and Anna,
    for supervision during my stay at the genetics department.
  • Murambia Nyati,
    for providing comments on the presentation.

Additional slides

Extra slides for extra questions

Examples of ASPAR_KR alternatives

Alternative   | +                             | -
seek4science  | Supports ISA.                 | A bit more complicated to contribute to.
              | Sharing of templates online.  | Not available locally via Excel sheets.
FAIRshare     | Locally available.            | Limited in scope to genomics or immunology.
              |                               | Not clear how to expand it.

References

Used literature

1. Standards. https://xkcd.com/927/. Accessed 26 Oct 2023
2. Nijsse B, Schaap PJ, Koehorst JJ (2023) FAIR data station for lightweight metadata management and validation of omics studies. GigaScience 12:giad014. https://doi.org/10.1093/gigascience/giad014
3.
4. OpenStreetMap contributors (2017) Planet dump retrieved from https://planet.osm.org
5. Boettiger C, Mecum B, Krystalli A, Senderov V (2023) Rdflib: Tools to Manipulate and Query Semantic Data
6. Jones MB, Slaughter P, Ooms J, et al (2023) Redland: RDF Library Bindings in R

Footnotes

  1. Does not want to share his likeness