ON-DEM DB

A database of DEM contact models to aid Reproducible Research and enhance data FAIRness

J.P. Morrissey

j.morrissey@ed.ac.uk

School of Engineering, University of Edinburgh, UK

Hrachya Kocharyan

American University of Armenia, Armenia

Catherine O’Sullivan

Imperial College London, UK

Friday, July 4th, 2025

This presentation is based upon work from COST Action CA22132, supported by COST (European Cooperation in Science and Technology).

COST (European Cooperation in Science and Technology) is a funding agency for research and innovation networks. Our Actions help connect research initiatives across Europe and enable scientists to grow their ideas by sharing them with their peers, boosting their research, career and innovation.

www.cost.eu

Reminder - Research Data

Research Data Life-cycle

Managing research data is an often forgotten aspect of research.

Traditionally, datasets were quite small:

Data was not widely shared beyond colleagues
Often only mentioned (not documented) in publications
Typical data life-cycle on right

Ad-hoc management no longer sufficient

Poorly documented datasets are of limited use
Are they preserved beyond the current cloud storage subscription or PhD/post-doc project?

Many useful datasets have been lost because of poor research data management

Create: Collect, Generate, Process, Clean
Use: Manipulate, Analyse, Visualise
Re-Use: By colleagues or starting point for new dataset

Reproducibility & Replicability Crisis?

For the past two decades there has been significant discussion of the reproducibility crisis in science

A pattern of scientists being unable to obtain the same results as previous researchers

Distinction between Reproducibility & Replicability

Reproducibility: Can you get the same answers as me when you analyze my data?
Replicability: Can you get the same answers when you do my experiment and collect your own data?

AI will not make this better!

Its output depends on the quality of the data provided to it.

Translating …

Reproducibility Example

“Here’s the raw data from my simulation. If you read the paper you will see the analysis I did and you should get the same results if you do the exact same analysis.”

Dependent on the detailed description of all of the variables in the dataset
Dependent on the detailed description of the method of analysis in the paper

Replicability Example

“I ran a DEM simulation of a shear cell. I used a linear contact model with a rolling friction model. Read the paper and you should get a similar flow function if you set up your simulation exactly the same as me and do the exact same analysis.”

Dependent on the very careful definition of ALL parameters and inputs in the paper
Dependent on the definition of the analysis carried out
Dependent on the DEM code?

Change in Scientific Workflows

Significant changes in last decade:

The types of data generated have changed
- More numerical, less experimental

The amount of research data generated has grown
- much larger datasets (e.g. imaging, DEM, etc.) - 1TB is now more likely than 1MB!

Sharing & collaboration actively encouraged (seeking impact)
- Open science and open data are increasingly important to funders
- Citeable FAIR data
  - TUSAIL Data Collection
  - ON-DEM Data Collection

Revised Research Data Life-cycle

Data management best practices cover the entire data life-cycle, from project start to end

Plan: Data Management Plan in grant submissions.
Create: Experiment, Simulation, Survey, Merge, etc.
Document: Describe the data collected in detail. Sooner rather than later!
Use: Analyse/Discover/Collaborate. Document the process.
Preserve: Store for future use. Version Control. Databases & Archives.
Share: Essential for translation of results into knowledge. Open Data? Repositories?
Re-Use: Collaborate / Derive / Develop / Teach / Policy

What is FAIR Data?

FAIR is an acronym for Findable, Accessible, Interoperable and Reusable: the principles that should guide scientific data management and stewardship.

Findable: The first step in making data reusable is making it findable. Detailed and accurate metadata is key
Accessible: Data could be openly available or it could require prior authentication and authorisation
Interoperable: Data must be usable across different programs and workflows
Reusable: Well-defined data is essential, as it is easier to understand and therefore to use, combine and/or extend

**“FAIR Principles”**. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

FAIR Data - Machine Actionable

The FAIR principles also emphasise machine-actionability

The capacity of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention
We are increasingly dependent on computers as datasets grow and we move towards automation (AI/Deep Learning/etc.)

Machine readable and digitally accessible are not the same thing!

What is Machine Actionable?

For example, traditional word-processing documents and PDF files are easily read by humans but are typically difficult for machines to interpret.
A machine-readable format is a file in a standard, structured format (rather than free-form prose) that can be parsed automatically by a web browser or computer system (e.g. XML, JSON).
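To illustrate the difference, a minimal sketch in Python: simulation details that might otherwise sit in a sentence of a PDF are stored as JSON, which the standard-library `json` module parses straight into nested dictionaries with no human interpretation. The record layout and every field name (`dem_code`, `timestep_s`, etc.) are assumptions for illustration, not an ON-DEM schema.

```python
import json

# Hypothetical metadata record for a DEM simulation.
# All field names are illustrative assumptions, not a published schema.
RECORD_TEXT = """
{
  "dem_code": "ExampleDEM",
  "code_version": "1.2.0",
  "timestep_s": 1e-6,
  "contact_model": {
    "normal": "linear spring-dashpot",
    "tangential": "Coulomb friction"
  }
}
"""


def load_record(text: str) -> dict:
    """Parse a JSON metadata record into nested Python dicts."""
    return json.loads(text)


record = load_record(RECORD_TEXT)
# Values are typed and addressable by key, so a program can act on them directly.
print(record["contact_model"]["normal"])  # linear spring-dashpot
```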

How FAIR is your Data?

Some questions to consider about your own simulations:

Have you recorded all of the salient inputs for your simulation?
- Particle properties, material properties, interactions, boundaries, timestep, etc.

Have you used open, easily accessible file formats?
Have you documented the layout of your data (robust schema)?
Have you stored the data in a safe location where it is easily discoverable and accessible?
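One way to make the first question checkable by a machine is sketched below, assuming the simulation metadata is held as a dictionary (e.g. loaded from JSON). The key names in `REQUIRED_INPUTS` are hypothetical, taken loosely from the checklist above; a real schema would be far more detailed.

```python
# Hypothetical top-level inputs, loosely following the checklist above.
REQUIRED_INPUTS = {"particles", "materials", "interactions", "boundaries", "timestep"}


def missing_inputs(metadata: dict) -> set:
    """Return the required top-level keys that are absent or set to None."""
    return {key for key in REQUIRED_INPUTS if metadata.get(key) is None}


sim = {
    "particles": {"count": 10000, "diameter_m": 0.002},
    "materials": {"density_kg_m3": 2500},
    "interactions": {"normal": "linear"},
    "timestep": 1e-6,
}
print(missing_inputs(sim))  # {'boundaries'}
```

A check like this only confirms that an input was recorded, not that it was recorded correctly; the layout still needs to be documented.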

Guidance on how to assess the “fairness” of your data: https://bit.ly/yourFIP

Storing Details about Contact (Interaction) Models

Concept

Any movie fans? You may be familiar with IMDb…

It tracks the “implementation” of various different films
Because not all “implementations” are made equal

Let’s consider an example

James Bond is an iconic long running series of films

Even this has issues with various implementations…
Let’s consider Casino Royale

Casino Royale

Rating: 8.0
Year: 2006

Rating: 5.0
Year: 1967

Same original story, differently executed interpretations.

Possibly just like a DEM contact model in different DEM codes!

Breakdown of a Simulation

Simulations can be simplified into a set of components that need to be recorded

Each component may have several subcomponents such as material description and the interaction models
Each DEM code has many (different) supported interaction models
Some interaction models may exist in multiple DEM codes
- May have very different implementations or interpretations
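As an illustration of how one "model" can be interpreted differently, consider a linear spring-dashpot normal contact (an assumed example, not one taken from this talk). One code might ask the user for a coefficient of restitution e, another for a damping coefficient c. With effective mass m_eff and stiffness k, the standard relations are zeta = -ln(e) / sqrt(pi^2 + ln(e)^2) and c = 2 * zeta * sqrt(m_eff * k). The sketch converts between the two, assuming the contact ends when the overlap returns to zero.

```python
import math


def damping_from_restitution(e: float, k: float, m_eff: float) -> float:
    """Dashpot coefficient c [N s/m] giving restitution e for a linear spring-dashpot."""
    if not 0.0 < e <= 1.0:
        raise ValueError("restitution must be in (0, 1]")
    ln_e = math.log(e)
    zeta = -ln_e / math.sqrt(math.pi**2 + ln_e**2)  # damping ratio
    return 2.0 * zeta * math.sqrt(m_eff * k)


def restitution_from_damping(c: float, k: float, m_eff: float) -> float:
    """Inverse conversion: coefficient of restitution implied by dashpot coefficient c."""
    zeta = c / (2.0 * math.sqrt(m_eff * k))
    if zeta >= 1.0:
        return 0.0  # critically or over-damped: no rebound in this idealised model
    return math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta**2))


c = damping_from_restitution(0.5, k=1e5, m_eff=1e-3)
print(restitution_from_damping(c, k=1e5, m_eff=1e-3))  # approximately 0.5
```

Both inputs describe the same physics, but a record that only says "linear model, damping = 0.5" is ambiguous unless it states which parameter was meant.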

Why do we want to track Contact Models?

There are two main reasons, replicability and interoperability:

Tracking what contact models are available in which codes improves both replicability and interoperability
- is the implementation of one model the same as another?
- what has changed in a new version of the same implementation? (bug-fix or new feature?)
- licensing, maintenance and development status
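A simple way to see what changed between two versions of the same implementation is a field-by-field comparison of their metadata records. This is a hypothetical sketch over flat dictionaries (nested values are compared as a whole); deciding whether a change is a bug-fix or a new feature still needs the maintainer's description.

```python
def diff_records(old: dict, new: dict) -> dict:
    """Map each key whose value differs to its (old, new) pair; missing keys give None."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k)) for k in sorted(keys) if old.get(k) != new.get(k)}


v1 = {"version": "1.0", "gpu_support": False, "licence": "GPL-3.0"}
v2 = {"version": "1.1", "gpu_support": True, "licence": "GPL-3.0"}
print(diff_records(v1, v2))
# {'gpu_support': (False, True), 'version': ('1.0', '1.1')}
```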

There are other reasons:

A list of all implemented contact models in (all) DEM codes helps users understand what features are available for each DEM code (e.g. GPU support, OS, etc.)
As a referencing tool: a permanent UUID can be created for each model, somewhat similar to a DOI.
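A minimal sketch of one way such an identifier could be minted, using Python's standard `uuid` module. Deriving a name-based UUID (version 5) from the code, model and version is an assumption made here for illustration; a random `uuid4()` stored once at submission would be an equally valid choice. The namespace URL is a placeholder, and `model_uuid` is a hypothetical helper.

```python
import uuid

# Placeholder namespace; a real service would fix one value permanently.
ON_DEM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/on-dem-db")


def model_uuid(code: str, model: str, version: str) -> uuid.UUID:
    """Derive a stable identifier for one implementation of a contact model."""
    # Normalise case and whitespace so trivially different spellings map to the same ID.
    key = f"{code.strip().lower()}/{model.strip().lower()}/{version.strip()}"
    return uuid.uuid5(ON_DEM_NAMESPACE, key)


print(model_uuid("CodeA", "Hertz-Mindlin", "2.1"))
```

The same inputs always give the same identifier, while a new version of the implementation gets a new one.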

So how do you track implementations of different models?

Defining DEM Contact Models

Currently no universally accepted Metadata Structure / Ontology for classification of DEM contact models

This needs to be developed

ON-DEM DB: A Way to Store Details about Contact (Interaction) Models

Proposed Database

Databases provide efficient and safe multi-user access to data

Data records can be made “immutable”
Better handling of data redundancy and duplicate avoidance
Reduced curation requirements
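The immutability and duplicate-avoidance points can be sketched with SQLite from Python's standard library, used here only as a self-contained stand-in for a production database server. A `UNIQUE` constraint rejects a second copy of the same implementation, and triggers abort any `UPDATE` or `DELETE`. The table, column and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE implementation (
    id INTEGER PRIMARY KEY,
    dem_code TEXT NOT NULL,
    model_name TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (dem_code, model_name, version)  -- duplicate avoidance
);
-- Immutability: published rows can be neither changed nor removed.
CREATE TRIGGER implementation_no_update BEFORE UPDATE ON implementation
BEGIN
    SELECT RAISE(ABORT, 'records are immutable');
END;
CREATE TRIGGER implementation_no_delete BEFORE DELETE ON implementation
BEGIN
    SELECT RAISE(ABORT, 'records are immutable');
END;
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_implementation(conn: sqlite3.Connection, code: str, model: str, version: str) -> bool:
    """Insert a record; return False if an identical record already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO implementation (dem_code, model_name, version) VALUES (?, ?, ?)",
                (code, model, version),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = create_db()
print(add_implementation(conn, "CodeA", "Hertz-Mindlin", "1.0"))  # True
print(add_implementation(conn, "CodeA", "Hertz-Mindlin", "1.0"))  # False (duplicate)
```

Corrections would then be made by adding a new versioned record rather than editing the old one.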

Web interface to Relational Database

Individual Record (page) for each contact model implementation
- Provides permanent citeable description for each contact model
Searchable
Downloadable metadata records
Accessible via REST API

Initial Prototype

Initial “alpha” version available at: on-dem-db.onrender.com

Tracking various aspects such as:

Description (code, version, etc.)
Key Information of model (normal, tangential, ‘unified’, etc.)
Model Implementation Details - Reference, Repo, Licensing, GPU, OS, etc
Recommended Usage - Size range, shape, materials, etc
Detailed descriptions for models - with ability for complex formatting and equations. Can accept and render markdown.
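A sketch of what a downloadable metadata record covering these aspects might look like, as a Python dataclass serialised to JSON. The class and field names are assumptions that mirror the list above, not the prototype's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContactModelRecord:
    # Description
    dem_code: str
    code_version: str
    model_name: str
    # Key information of the model, e.g. "normal", "tangential", "unified"
    model_type: str
    # Implementation details
    reference: str = ""
    repository: str = ""
    licence: str = ""
    gpu_support: bool = False
    operating_systems: list[str] = field(default_factory=list)
    # Recommended usage, e.g. {"size_range": "...", "materials": "..."}
    recommended_usage: dict[str, str] = field(default_factory=dict)
    # Detailed description as Markdown source (rendered by the web interface)
    description_md: str = ""

    def to_json(self) -> str:
        """Serialise the record for download or exchange."""
        return json.dumps(asdict(self), indent=2)


rec = ContactModelRecord("CodeA", "2.1", "Hertz", "normal",
                         gpu_support=True, operating_systems=["Linux"])
print(rec.to_json())
```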

Database Prototype Demo

Initial Testing and Feedback

Additional requirements were identified:

Image Support (Model schematics)
Need some method of saving & editing submissions
- Need to be able to track who has submitted an entry
- Entry status (draft/public)
- Revision history (who, when, description?)
- Easiest method is to provide user accounts with a login
Maintainer/developer details (status, and handling multiple maintainers)
Some method of interfacing with the new common file format being developed
- provide a mechanism for contact model lookup (names, type, etc)
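The draft/public status, revision history and name-based lookup requirements can be sketched in a few lines. Everything below (the `Entry` and `Revision` classes, the `aliases` field, the `lookup` helper) is hypothetical and only shows one way the requirements fit together; in the production system this information would live in the database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Revision:
    author: str          # who
    timestamp: datetime  # when
    summary: str         # description of the change


@dataclass
class Entry:
    name: str
    model_type: str
    aliases: tuple[str, ...] = ()
    status: str = "draft"  # "draft" or "public"
    history: list[Revision] = field(default_factory=list)

    def revise(self, author: str, summary: str) -> None:
        self.history.append(Revision(author, datetime.now(timezone.utc), summary))

    def publish(self, author: str) -> None:
        self.status = "public"
        self.revise(author, "published")


def lookup(entries: list[Entry], name: Optional[str] = None,
           model_type: Optional[str] = None, include_drafts: bool = False) -> list[Entry]:
    """Find entries by (case-insensitive) name or alias and/or by model type."""
    name_key = name.lower() if name else None
    results = []
    for e in entries:
        if e.status != "public" and not include_drafts:
            continue
        if model_type and e.model_type != model_type:
            continue
        if name_key and name_key not in {e.name.lower(), *(a.lower() for a in e.aliases)}:
            continue
        results.append(e)
    return results
```

A lookup by name or alias is the kind of query a common file format could use to resolve a contact model reference.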

Database Implementation

New more robust “production ready” implementation

based on Django and PostgreSQL
Various JavaScript libraries to add required functionality
Complete rewrite from alpha version

Advantages

Provides a convenient admin backend for reviewing and editing submissions if required
Simpler database interfacing via Django’s models and views
Mature, with a strong community

Disadvantages

Extra complexity

Implemented Database Structure

Contact Model Relationships

Database Prototype Demo

Acknowledgements

Colleagues & Collaborators at:

The University of Edinburgh
TUSAIL
ON-DEM
CCC-ParaSolS

CCC-ParaSolS

CCC-ParaSolS is a two-year project — January 2025 to December 2026 — funded by the Science and Technology Facilities Council (STFC) to create a Collaborative Computational Community in particulate solids simulations.

If you are in the UK and you do DEM, then you should be part of CCC-ParaSolS!!

Thank You!

Any Questions?

Email: J.Morrissey@ed.ac.uk