Metadatathon and Metadata4Chem Meeting with PSDI

With summer at its peak, we want to look back and report on two events from the beginning of the season: the NFDI4Chem Metadatathon in Halle and, a day later, the Metadata4Chem online meeting, held to team up with the Physical Sciences Data Infrastructure (PSDI) initiative.

NFDI4Chem Metadatathon

The NFDI4Chem Metadatathon aimed to develop and advance the harmonisation of metadata profiles, metadata schemas, and ontologies. The day began with a session led by Steffen Neumann and Oliver Koepler, who introduced overarching concepts related to schemas for datasets, projects, and samples. Key frameworks such as ISA-Tab, schema.org, and DCAT/DCAT-AP were discussed, along with the Minimum Information about Chemical Investigations (MIChI) profiles that form the basis for chemistry metadata schemas, with specific emphasis on their relevance to ongoing NFDI4Chem initiatives.

The afternoon discussions focused on data harmonisation efforts, particularly concerning Chemotion, other repositories such as nmrXiv, and databases such as MassBank EU. Oliver Koepler, Steffen Neumann, and Philip Strömert made the case for a robust, technology-agnostic schema that could be applied across different repositories. They also stressed the importance of establishing a common chemistry core (CCC) schema, with additional layers that can address subdiscipline-specific chemistry metadata requirements.
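
To illustrate the idea of a core schema with subdiscipline layers, here is a minimal Python sketch. It is not the CCC schema itself: every field name and layer name below (`identifier`, `nmr`, `solvent`, and so on) is a hypothetical placeholder chosen for illustration, and the check only covers required fields.

```python
# Hypothetical field names; NOT the actual NFDI4Chem CCC schema.
CORE_REQUIRED = {"identifier", "title", "creator"}

# Hypothetical subdiscipline layers that extend the core.
LAYER_REQUIRED = {
    "nmr": {"nucleus", "solvent"},
    "mass_spectrometry": {"ionisation_mode"},
}


def missing_fields(record, layers=()):
    """Return the sorted required fields that `record` lacks.

    The core fields always apply; each named layer adds its own
    required fields on top of the core.
    """
    required = set(CORE_REQUIRED)
    for layer in layers:
        required |= LAYER_REQUIRED[layer]
    return sorted(required - set(record))


if __name__ == "__main__":
    record = {
        "identifier": "example-001",
        "title": "Example dataset",
        "creator": "A. Chemist",
        "nucleus": "1H",
    }
    print(missing_fields(record))           # core only
    print(missing_fields(record, ["nmr"]))  # core plus the NMR layer
```

Because the core set is shared, any repository could validate the same minimal record, while each layer only adds requirements for the datasets it applies to.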

A key part of the discussion revolved around the granularity of metadata, the role of provenance, and the challenges posed by different roles and responsibilities within datasets, particularly in the context of electronic lab notebooks (ELNs) like Chemotion ELN and the connected Chemotion Repository. Taking one example from the CCC, the group intensely debated the complexities of representing chemical substances and chemical entities, recognising the need for a flexible yet standardised approach that accommodates diverse chemical data types.

The meeting concluded with action items focusing on refining the Chemotion data model, identifying common concepts across datasets, and continuing the MIChI discussions on samples, process descriptions, and additional analytical methods. These efforts will pave the way for more integrated and interoperable metadata frameworks developed by NFDI4Chem for our community.

Metadata4Chem

The following day, NFDI4Chem and PSDI from the UK met to explore potential collaborations, focusing in particular on metadata and semantic annotation in chemistry. The meeting began with a brief introduction from all participants, followed by an insightful presentation from Steffen and Oliver, who gave an overview of NFDI4Chem’s metadata initiatives, including a landscape overview of chemistry ontologies.

Aileen Day from PSDI provided an overview of PSDI’s platform, which integrates resources such as data, services, tools, and training materials. This platform employs SKOS terminologies in JSON-LD to ensure that metadata is applied consistently across all resources.
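
As a rough illustration of what a SKOS term serialised as JSON-LD can look like, the following Python sketch builds one concept and tags a resource with it. It does not reproduce PSDI's actual vocabularies or platform code: the `example.org` URIs, the labels, and the helper names `make_concept` and `tag_resource` are assumptions made for this example. Only the SKOS and Dublin Core Terms namespaces are real.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"


def make_concept(concept_id, pref_label, scheme_id, broader=None):
    """Build a SKOS concept as a JSON-LD dictionary (hypothetical helper)."""
    concept = {
        "@context": {"skos": SKOS},
        "@id": concept_id,
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": pref_label, "@language": "en"},
        "skos:inScheme": {"@id": scheme_id},
    }
    if broader is not None:
        concept["skos:broader"] = {"@id": broader}
    return concept


def tag_resource(resource_id, title, concept):
    """Describe a resource and point its subject at a SKOS concept."""
    return {
        "@context": {"dcterms": DCTERMS},
        "@id": resource_id,
        "dcterms:title": title,
        "dcterms:subject": {"@id": concept["@id"]},
    }


if __name__ == "__main__":
    # All URIs below are placeholders, not real PSDI identifiers.
    concept = make_concept(
        "https://example.org/vocab/resource-type/tool",
        "Tool",
        "https://example.org/vocab/resource-type",
    )
    resource = tag_resource(
        "https://example.org/resources/42", "Example converter", concept
    )
    print(json.dumps([concept, resource], indent=2))
```

Because every resource refers to a term by its URI rather than by a free-text label, data, services, tools, and training materials tagged with the same term can be matched consistently.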

The session also included a summary by Samantha Pearman-Kanza of the ongoing work at MADICES related to metadata and semantic annotation. Key points of discussion included the focus on JSON-LD, tools like LinkML, and ongoing work within the NFDI4Chem community and neighbouring consortia such as NFDI4Cat, including the voc4cat pipeline and automated metadata extraction.

The meeting concluded with a productive discussion on future collaboration opportunities between NFDI4Chem and PSDI. Areas of interest included cross-search capabilities, harmonised indexing across repositories, and the formation of joint teams with similar domain expertise. There was a mutual interest in continuing the dialogue and exploring joint workshops, activities, and possibly co-authoring letters of support for ongoing and future projects.