FAIR data in Aspergillus research

How do we make data more useful?

Sibbe Bakker
Mariana Santos Couto Silva
Anna Fensel

Why?

Aspergillus fumigatus: an opportunistic fungal pathogen.

  • A. fumigatus is a growing health concern [1].

  • A lot of research is being conducted.

    • Environmental spread.
    • Molecular biology.
  • There are some unanswered questions in this domain.

    • What mechanisms of azole resistance do environmental isolates have?
    • How does resistance spread through the A. fumigatus population?

A. fumigatus spreads quickly through its environment [2]

How is this research shared?

What ways of sharing are there?

Types of data sharing, as described by [3].

How is the data actually shared?¹

What are these data useful for?

  • A single study can be better understood with its data.
  • Using the data from the same study for a different question?
  • Combining specific results?
  • Finding specific results?

Can we face the open questions in the field?

What do we need to make it useful?

The FAIR principles (Findable, Accessible, Interoperable, Reusable) were devised by [7].

What is already available?

  • Disconnected databases.

  • Depositing new data is not easy.

  • No FAIR system.

NCBI [8]: metadata not always standardised.

Afrum database [9]: data upload not possible.

FunResDB [10]: data upload not possible.

What is there to do?

What can we do about this?

  • Come together as a field.

  • Make and adapt standards as a community.

  • Be vocal about your wishes.

What is my role in this?

  • Facilitate working together.

  • Educate about standards.

  • Introduce the standards that the community wants.

Solution?

The FAIRDS project [11]

  • A standardised metadata solution.
  • Excel templates.
  • RDF output.
  • Written by members of SSB.

  • Will be maintained after the ASPAR_KR project is done.

A quick demo for making FAIR data!

Experimental design is specified in the data format.

Classes available within the ASPAR_KR database. Each class 'owns' lower-level classes. For example, a sample has associated assays.

Example of data analysis

Analysis in R.
sparql_query <- 
  ' # Per observation unit: sample label, number of samples, and CFU counts
  prefix ppeo: <http://purl.org/ppeo/PPEO.owl#> 
  prefix jm: <http://jermontology.org/ontology/JERMOntology#> 
  prefix fair: <http://fairbydesign.nl/ontology/> 
  prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
  prefix schema: <http://schema.org/>
  SELECT ?observation_label ?sample_label 
    (COUNT(?sample_label) as ?n) ?total_cfu ?selection_cfu
  WHERE {
    # Get the samples of interest
    ?observation_unit a ppeo:observation_unit .
    ?observation_unit jm:hasPart ?parts .
    ?parts a jm:Sample .
    ?parts fair:packageName "TwoLayerCulture" .
    ?observation_unit schema:name ?observation_label .
    ?parts schema:name ?sample_label .
    
    # Experimental data
    ?parts fair:total_cfu ?total_cfu .
    ?parts fair:selection_cfu ?selection_cfu .
  } GROUP BY ?observation_unit
  '
# `rdf` is assumed to be the graph loaded earlier, e.g. with rdflib::rdf_parse()
result <- rdflib::rdf_query(rdf, sparql_query)
result
# A tibble: 2 × 5
  observation_label    sample_label                    n total_cfu selection_cfu
  <chr>                <chr>                       <dbl>     <dbl>         <dbl>
1 The city of Arnhem   Arnhem Centraal culture         4        66            28
2 The city of Nijmegen Nijmegen stationsplein cul…     4        57            21
result |> 
  dplyr::mutate(resistance_fraction = selection_cfu / total_cfu) |> 
  ggplot2::ggplot(ggplot2::aes(x = observation_label, y = resistance_fraction)) +
  ggplot2::geom_bar(stat="identity")
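
The numbers behind the bar chart can be checked by hand. The sketch below repeats the `resistance_fraction` arithmetic on the two rows printed above; it is written in Python only so that it runs standalone without the RDF file, and the `rows` structure is an assumption made for this illustration.

```python
# Values copied from the query result shown above (two observation units).
rows = [
    {"observation_label": "The city of Arnhem", "total_cfu": 66, "selection_cfu": 28},
    {"observation_label": "The city of Nijmegen", "total_cfu": 57, "selection_cfu": 21},
]

# Same calculation as the dplyr::mutate() step: selection_cfu / total_cfu.
fractions = {
    row["observation_label"]: row["selection_cfu"] / row["total_cfu"]
    for row in rows
}

for label, fraction in fractions.items():
    print(f"{label}: {fraction:.2f}")
```

This gives roughly 0.42 for Arnhem and 0.37 for Nijmegen.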

Linked data potential.

Linked data makes it easy to reuse information collected by others.

Lessons so far…

FAIR data is important for A. fumigatus research.

  • Without it, an overview of the field is hard to obtain.

  • Community awareness needs to increase.

  • Existing standards (FAIRDS) must be adapted.

Now let's hear what you think!

Where will you publish your data?

Questions?

Need more information?

Check out the git page.

Thanks to

  • Jasper Koehorst and Bart Nijsse
    For developing FAIRDS.
  • Mariana and Anna
    For supervision.
  • Members of the genetics chair group
    For the good discussions.

Additional slides

Extra slides for extra questions

How does RDF work?

  • RDF files are plain text and come in various serialisation formats (for example Turtle).

  • The basic unit is the triple: subject, predicate, object.

An example of an RDF statement.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<https://example.com/data#statement> rdf:type "triple" .

This…

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .
<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .
<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

Turns into this …
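
As a rough sketch of that expansion, the Turtle above can be written out by hand as one (subject, predicate, object) tuple per statement. The Python below is only an illustration: the constant names and the tuple layout are assumptions made for this example, not the API of any RDF library, and the `@ru` language tag is dropped for simplicity.

```python
# Hand-expanded triples from the Turtle example (illustrative only).
EX = "http://example.org/#"  # <#name> resolved against @base <http://example.org/>
FOAF = "http://xmlns.com/foaf/0.1/"
REL = "http://www.perceive.net/schemas/relationship/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what `a` abbreviates

triples = [
    (EX + "green-goblin", REL + "enemyOf", EX + "spiderman"),
    (EX + "green-goblin", RDF_TYPE, FOAF + "Person"),
    (EX + "green-goblin", FOAF + "name", "Green Goblin"),
    (EX + "spiderman", REL + "enemyOf", EX + "green-goblin"),
    (EX + "spiderman", RDF_TYPE, FOAF + "Person"),
    (EX + "spiderman", FOAF + "name", "Spiderman"),
    (EX + "spiderman", FOAF + "name", "Человек-паук"),  # language tag @ru omitted
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

In Turtle, `;` repeats the subject and `,` repeats both subject and predicate, which is why the two blocks expand to seven triples.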

How does SPARQL work?

  • SPARQL is a query language for RDF data.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://www.perceive.net/schemas/relationship/>
PREFIX ex: <http://example.org/#>
SELECT ?name ?enemy
WHERE
  {
    # Every person that has a name and an enemy ...
    ?person a foaf:Person ;
      foaf:name ?name ;
      rel:enemyOf ?enemy .
    # ... whose name starts with "Green"
    FILTER(STRSTARTS(?name, "Green"))
  }

Should return: "Green Goblin" ex:spiderman
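
To show what the query does conceptually, the sketch below evaluates the same pattern by hand over the Green Goblin triples, returning the name and the enemy. It is a simplified illustration in plain Python, not how a SPARQL engine is implemented; the prefixed-name strings and the `objects` helper are assumptions made for this example.

```python
# Triples from the Turtle example, written with prefixed names for brevity.
triples = [
    ("ex:green-goblin", "rel:enemyOf", "ex:spiderman"),
    ("ex:green-goblin", "rdf:type", "foaf:Person"),
    ("ex:green-goblin", "foaf:name", "Green Goblin"),
    ("ex:spiderman", "rel:enemyOf", "ex:green-goblin"),
    ("ex:spiderman", "rdf:type", "foaf:Person"),
    ("ex:spiderman", "foaf:name", "Spiderman"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


results = []
# ?person a foaf:Person
people = [s for s, p, o in triples if p == "rdf:type" and o == "foaf:Person"]
for person in people:
    for name in objects(person, "foaf:name"):         # foaf:name ?name
        for enemy in objects(person, "rel:enemyOf"):  # rel:enemyOf ?enemy
            if name.startswith("Green"):              # FILTER(STRSTARTS(?name, "Green"))
                results.append((name, enemy))

print(results)
```

Each triple pattern binds variables, and the filter then discards bindings that do not satisfy the condition, so only the Green Goblin row remains.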

Examples of ASPAR_KR alternatives

Advantages (+) and disadvantages (-) per alternative:

  • seek4science
    + Supports ISA.
    + Sharing of templates online.
    - A bit more complicated to contribute to.
    - Not available locally via Excel sheets.
  • FAIRshare
    + Locally available.
    - Limited in scope to genomics or immunology.
    - Not clear how to expand it.

References

1.
Parums DV (2022) Editorial: The World Health Organization (WHO) Fungal Priority Pathogens List in Response to Emerging Fungal Pathogens During the COVID-19 Pandemic. Med Sci Monit 28:e939088-1-e939088-3. https://doi.org/10.12659/MSM.939088
2.
Keller N (2017) Heterogeneity Confounds Establishment of “a” Model Microbial Strain. mBio 8:e00135–17. https://doi.org/10.1128/mBio.00135-17
3.
4.
Gonçalves P, Melo A, Dias M, et al (2021) Azole-Resistant Aspergillus fumigatus Harboring the TR34/L98H Mutation: First Report in Portugal in Environmental Samples. Microorganisms 9(1):57. https://doi.org/10.3390/microorganisms9010057
5.
Burks C, Darby A, Londoño LG, Momany M, Brewer MT (2021) Azole-resistant Aspergillus fumigatus in the environment: Identifying key reservoirs and hotspots of antifungal resistance. PLOS Pathogens 17(7):e1009711. https://doi.org/10.1371/journal.ppat.1009711
6.
Cao D, Wang F, Yu S, et al (2021) Prevalence of Azole-Resistant Aspergillus fumigatus is Highly Associated with Azole Fungicide Residues in the Fields. Environ Sci Technol 55(5):3041–3049. https://doi.org/10.1021/acs.est.0c03958
7.
Wilkinson MD, Dumontier M, Aalbersberg IjJ, et al (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3(1):160018. https://doi.org/10.1038/sdata.2016.18
8.
Barrett T, Clark K, Gevorgyan R, et al (2012) BioProject and BioSample databases at NCBI: Facilitating capture and organization of metadata. Nucleic Acids Research 40(D1):D57–D63. https://doi.org/10.1093/nar/gkr1163
9.
Sewell TR, Zhu J, Rhodes J, et al (2019) Nonrandom Distribution of Azole Resistance across the Global Population of Aspergillus fumigatus. mBio 10(3):e00392–19. https://doi.org/10.1128/mBio.00392-19
10.
Weber M, Schaer J, Walther G, et al (2018) FunResDB: A web resource for genotypic susceptibility testing of Aspergillus fumigatus. Med Mycol 56(1):117–120. https://doi.org/10.1093/mmy/myx015
11.
Nijsse B, Schaap PJ, Koehorst JJ (2023) FAIR data station for lightweight metadata management and validation of omics studies. GigaScience 12:giad014. https://doi.org/10.1093/gigascience/giad014

Footnotes

  1. Statements from [4], [5], [6]