FAIR data in Aspergillus research

How do we make data more useful?

Sibbe Bakker
Mariana Santos Couto Silva
Anna Fensel


Aspergillus fumigatus: an opportunistic fungal pathogen.

  • A. fumigatus is a growing health concern [1].

  • A lot of research is being conducted.

    • Environmental spread.
    • Molecular biology.
  • There are some unanswered questions in this domain.

    • What mechanisms of azole resistance do environmental isolates have?
    • How does resistance spread through the A. fumigatus population?

A. fumigatus spreads quickly through its environment [2]

How is this research shared?

What ways of sharing are there?

Types of data sharing, as described by [3].

How is the data actually shared?¹

What are these data useful for?

  • Understanding a single study better?
  • Using the data from the same study for a different question?
  • Combining specific results?
  • Finding specific results?

Can we face the open questions in the field?

What do we need to make it useful?

The FAIR principles (Findable, Accessible, Interoperable, Reusable) were devised by [7].

What is already available?

  • Disconnected databases.

  • Depositing new data is not easy.

  • No FAIR system.

NCBI [8]: metadata not always standardised.

Afrum database [9]: data upload not possible.

FunResDB [10]: data upload not possible.

What is there to do?

What can we do about this?

  • Come together as a field.

  • Make and adapt standards as a community.

  • Be vocal about your wishes.

What is my role in this?

  • Facilitate working together.

  • Educate about standards.

  • Introduce the standards that the community wants.


The FAIRDS project [11]

  • A standardised metadata solution.
  • Excel templates.
  • RDF output.
  • Written by members of SSB.

  • Will be maintained after the ASPAR_KR project is done.

A quick demo for making FAIR data!

Experimental design is specified in the data format.

Classes available within the ASPAR_KR database. Each class 'owns' lower-level classes. For example, a sample has associated assays.
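
The ownership idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three class names follow the terms used in this deck (observation unit, sample, assay), the field names and example records are hypothetical, and the real ASPAR_KR schema may contain more classes and properties.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    name: str


@dataclass
class Sample:
    name: str
    # A sample "owns" its assays.
    assays: List[Assay] = field(default_factory=list)


@dataclass
class ObservationUnit:
    name: str
    # An observation unit "owns" its samples.
    samples: List[Sample] = field(default_factory=list)


def count_assays(unit: ObservationUnit) -> int:
    """Count all assays reachable from one observation unit."""
    return sum(len(sample.assays) for sample in unit.samples)


# Hypothetical records, for illustration only.
unit = ObservationUnit(
    name="Example observation unit",
    samples=[Sample(name="Example sample", assays=[Assay(name="Example assay")])],
)
print(count_assays(unit))
```

Walking down from the top-level class is enough to reach every lower-level record, which is what makes the hierarchy convenient to query.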

Example of data analysis

Analysis in R.
# `rdf` holds the ASPAR_KR graph; it is assumed to have been loaded earlier,
# for example with rdflib::rdf_parse().
sparql_query <-
  ' # Per observation unit: count the samples and collect the CFU counts
  prefix ppeo: <http://purl.org/ppeo/PPEO.owl#>
  prefix jm: <http://jermontology.org/ontology/JERMOntology#>
  prefix fair: <http://fairbydesign.nl/ontology/>
  prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  prefix schema: <http://schema.org/>
  SELECT ?observation_label ?sample_label
    (COUNT(?sample_label) as ?n) ?total_cfu ?selection_cfu
  WHERE {
    # Get the samples of interest
    ?observation_unit a ppeo:observation_unit .
    ?observation_unit jm:hasPart ?parts .
    ?parts a jm:Sample .
    ?parts fair:packageName "TwoLayerCulture" .
    ?observation_unit schema:name ?observation_label .
    ?parts schema:name ?sample_label .
    # Experimental data
    ?parts fair:total_cfu ?total_cfu .
    ?parts fair:selection_cfu ?selection_cfu .
  } GROUP BY ?observation_unit'
result <- rdflib::rdf_query(rdf, sparql_query)
result

# A tibble: 2 × 5
  observation_label    sample_label                    n total_cfu selection_cfu
  <chr>                <chr>                       <dbl>     <dbl>         <dbl>
1 The city of Arnhem   Arnhem Centraal culture         4        66            28
2 The city of Nijmegen Nijmegen stationsplein cul…     4        57            21

# Fraction of colonies counted under selection, per observation unit
result |>
  dplyr::mutate(resistance_fraction = selection_cfu / total_cfu) |>
  ggplot2::ggplot(ggplot2::aes(x = observation_label, y = resistance_fraction)) +
  ggplot2::geom_col()  # plot layer assumed; the layer used on the slide is not shown
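
As a cross-check on the resistance fraction, the same arithmetic can be sketched in Python using the two rows from the result table. The numbers are copied from the table; reading `selection_cfu` as the colony count under selection is an assumption based on the column name.

```python
# Rows copied from the query result shown in the table.
rows = [
    {"observation_label": "The city of Arnhem", "total_cfu": 66, "selection_cfu": 28},
    {"observation_label": "The city of Nijmegen", "total_cfu": 57, "selection_cfu": 21},
]


def resistance_fraction(row):
    """Share of all counted colonies that were also counted under selection."""
    return row["selection_cfu"] / row["total_cfu"]


fractions = {row["observation_label"]: resistance_fraction(row) for row in rows}

for label, fraction in fractions.items():
    print(f"{label}: {fraction:.3f}")
# The city of Arnhem: 0.424
# The city of Nijmegen: 0.368
```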

Linked data potential.

Linked data makes it easy to reuse information collected by others.

Lessons so far…

FAIR data is important for A. fumigatus research.

  • Without it, an overview of the field is hard to obtain.

  • Community awareness needs to increase.

  • Existing standards (FAIRDS) must be adapted.

Now let's hear what you think!

Where will you publish your data?


Need more information?

Check out the git page.

Thanks to

  • Jasper Koehorst and Bart Nijsen
    For developing FAIRDS.
  • Mariana and Anna
  • Members of the genetics chair group
    For the good discussions.

Additional slides

Extra slides for extra questions

How does RDF work?

  • RDF files are plain text and can be written in several formats (for example Turtle).

  • The basis is the triple.

An example of an RDF statement.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<https://example.com/data#statement> rdf:type "triple" .


@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .

<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .

Turns into this …
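
One way to picture the result is as a flat set of (subject, predicate, object) triples. The Python sketch below writes out the Spiderman example in that form. It is a simplification for illustration, not how an RDF library stores data: IRIs and literals are both plain strings here, the language tag is dropped, and the helper `objects` is hypothetical.

```python
BASE = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
REL = "http://www.perceive.net/schemas/relationship/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # written as "a" in Turtle

green_goblin = BASE + "#green-goblin"
spiderman = BASE + "#spiderman"

# Each ";" in the Turtle text repeats the subject, each "," repeats
# subject and predicate, so two blocks expand to seven triples.
triples = {
    (green_goblin, REL + "enemyOf", spiderman),
    (green_goblin, RDF_TYPE, FOAF + "Person"),
    (green_goblin, FOAF + "name", "Green Goblin"),
    (spiderman, REL + "enemyOf", green_goblin),
    (spiderman, RDF_TYPE, FOAF + "Person"),
    (spiderman, FOAF + "name", "Spiderman"),
    (spiderman, FOAF + "name", "Человек-паук"),  # language tag @ru omitted here
}


def objects(subject, predicate):
    """Return all objects stored for one subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(len(triples))
print(objects(spiderman, FOAF + "name"))
```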

How does SPARQL work?

  • SPARQL is a query language for RDF data.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://www.perceive.net/schemas/relationship/>
PREFIX ex: <http://example.org/#>
SELECT ?name ?enemy
WHERE {
    ?person a foaf:Person ;
      foaf:name ?name ;
      rel:enemyOf ?enemy .
    FILTER(STRSTARTS(?name, "Green")) .
}

should return: "Green Goblin" ex:spiderman
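
To show what the query does step by step, here is a hedged Python sketch that applies the same pattern by hand to the Spiderman triples. It is not a SPARQL engine: the function `run_query` is hypothetical, IRIs and literals are plain strings, and it returns the name and the enemy for each match, in line with the expected result stated above.

```python
BASE = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
REL = "http://www.perceive.net/schemas/relationship/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

green_goblin = BASE + "#green-goblin"
spiderman = BASE + "#spiderman"

triples = [
    (green_goblin, REL + "enemyOf", spiderman),
    (green_goblin, RDF_TYPE, FOAF + "Person"),
    (green_goblin, FOAF + "name", "Green Goblin"),
    (spiderman, REL + "enemyOf", green_goblin),
    (spiderman, RDF_TYPE, FOAF + "Person"),
    (spiderman, FOAF + "name", "Spiderman"),
]


def run_query(graph):
    """Match: ?person a foaf:Person ; foaf:name ?name ; rel:enemyOf ?enemy."""
    results = []
    # Pattern 1: every subject typed as foaf:Person.
    people = sorted({s for s, p, o in graph if p == RDF_TYPE and o == FOAF + "Person"})
    for person in people:
        # Patterns 2 and 3: names and enemies of that same person.
        names = [o for s, p, o in graph if s == person and p == FOAF + "name"]
        enemies = [o for s, p, o in graph if s == person and p == REL + "enemyOf"]
        for name in names:
            # FILTER(STRSTARTS(?name, "Green"))
            if not name.startswith("Green"):
                continue
            for enemy in enemies:
                results.append((name, enemy))
    return results


print(run_query(triples))
# [('Green Goblin', 'http://example.org/#spiderman')]
```

The shared variable `?person` is what ties the three patterns together: only names and enemies belonging to the same subject end up in one result row.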

Examples of ASPAR_KR alternatives

Advantages (+) and disadvantages (-) of each alternative:

  • seek4science
    + Supports ISA.
    + Sharing of templates online.
    - A bit more complicated to contribute to.
    - Not available locally via Excel sheets.
  • FAIRshare
    + Locally available.
    - Limited in scope to genomics or immunology.
    - Not clear how to expand it.


  1. Statements from [4], [5], [6]