ON-DEM DB

A database of DEM contact models to aid Reproducible Research and enhance data FAIRness

J.P. Morrissey

School of Engineering, University of Edinburgh, UK

Hrachya Kocharyan

American University of Armenia, Armenia

Catherine O’Sullivan

Imperial College London, UK

Friday, July 4th, 2025

This presentation is based upon work from COST Action CA22132, supported by COST (European Cooperation in Science and Technology).

COST (European Cooperation in Science and Technology) is a funding agency for research and innovation networks. Our Actions help connect research initiatives across Europe and enable scientists to grow their ideas by sharing them with their peers, boosting their research, career and innovation.

www.cost.eu

Reminder - Research Data

Research Data Life-cycle

Managing research data is an often forgotten aspect of research.

Traditionally datasets were quite small:

  • Data was not widely shared beyond colleagues
  • Often only documented or mentioned in publications
  • Typical data life-cycle on right

Ad-hoc management no longer sufficient

  • Poorly documented datasets are of limited use
  • Are they preserved beyond the current cloud storage subscription or Ph.D./Post-Doc project?

Many useful datasets have been lost because of poor research data management

  • Create: Collect, Generate, Process, Clean
  • Use: Manipulate, Analyse, Visualise
  • Re-Use: By colleagues or starting point for new dataset

Reproducibility & Replicability Crisis?

For the past two decades there has been significant discussion of the reproducibility crisis in science

  • A pattern of scientists being unable to obtain the same results as previous researchers

Distinction between Reproducibility & Replicability

  • Reproducibility: Can you get the same answers as me when you analyze my data?

  • Replicability: Can you get the same answers when you do my experiment and collect your own data?

AI will not make this better!

Its output depends on the quality of the data provided to it.

[Baker, 2015, Nature]

Translating …

Reproducibility Example

“Here’s the raw data from my simulation. If you read the paper you will see the analysis I did and you should get the same results if you do the exact same analysis.”

  • Dependent on the detailed description of all of the variables in the dataset
  • Dependent on the detailed description of the method of analysis in the paper
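Reproducibility also depends on both parties analysing exactly the same data. One small, practical safeguard is to publish a checksum alongside the raw data so that anyone re-running the analysis can confirm their copy is identical. A minimal sketch using Python's standard `hashlib`; the file name `raw_contacts.csv` and its contents are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks so large DEM outputs fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, published_digest: str) -> bool:
    """True if the local copy matches the digest published with the dataset."""
    return sha256_of_file(path) == published_digest.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "raw_contacts.csv"  # hypothetical raw data file
        data.write_text("particle_id,x,y,z\n1,0.0,0.0,0.0\n")
        digest = sha256_of_file(data)
        print(digest)
        print(verify(data, digest))  # True
```

The digest is a short string that fits in a paper, a README or a repository record, and changes if even one byte of the data changes.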

Replicability Example

“I ran a DEM simulation of a shear cell. I used a linear contact model with a rolling friction model. Read the paper and you should get a similar flow function if you set up your simulation exactly the same as me and do the exact same analysis.”

  • Dependent on the very careful definition of ALL parameters and inputs in the paper
  • Dependent on the definition of the analysis carried out
  • Dependent on the DEM code?

Change in Scientific Workflows

Significant changes in the last decade:

  • The types of data generated have changed
    • More numerical, less experimental
  • The amount of research data generated has grown
    • Much larger datasets (e.g. imaging, DEM, etc.) - 1TB is now more likely than 1MB!

Revised Research Data Life-cycle

Data management best practices involve the entire data lifecycle from project start to end

  • Plan: Data Management Plan in grant submissions.

  • Create: Experiment, Simulation, Survey, Merge, etc.

  • Document: Describe the data collected in detail. Sooner rather than later!

  • Use: Analyse/Discover/Collaborate. Document the process.

  • Preserve: Store for future use. Version Control. Databases & Archives.

  • Share: Essential for translation of results into knowledge. Open Data? Repositories?

  • Re-Use: Collaborate / Derive / Develop / Teach / Policy

What is FAIR Data?

FAIR is an acronym for Findable, Accessible, Interoperable and Reusable: the principles that should guide scientific data management and stewardship.

  • Findable: The first step in making data reusable is making it findable. Detailed and accurate metadata is key.

  • Accessible: Data could be openly available or it could require prior authentication and authorisation

  • Interoperable: Data must be usable across different programs and workflows

  • Reusable: Well defined data is essential as it makes it easier to understand and therefore use, combine and/or extend the dataset

“FAIR Principles”. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

FAIR Data - Machine Actionable

The FAIR principles also emphasise machine-actionability

  • The capacity of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention
  • We are increasingly dependent on computers as datasets grow and we move towards automation (AI/Deep Learning/etc.)

Machine readable and digitally accessible are not the same thing!

What is Machine Actionable?

  • For example, traditional word processing documents and PDF files are easily read by humans but typically are difficult for machines to interpret.

  • A machine readable format is a file in a standard, structured format (not free-form prose) that can be parsed (read) automatically by a web browser or computer system (e.g. XML, JSON).
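To illustrate the difference, the same contact model description can be written as prose or as a structured record. The sketch below uses Python's standard `json` module; the field names (`name`, `code`, `normal`, `tangential`, etc.) and the code name `ExampleDEM` are hypothetical and do not reflect an agreed schema:

```python
import json

# Human-readable only: a program cannot reliably extract the model details from this sentence.
prose = "We used a linear spring-dashpot model in the normal direction with Coulomb friction."

# Machine-actionable: the same information as a structured record (hypothetical field names).
record = {
    "name": "linear spring-dashpot with Coulomb friction",
    "code": "ExampleDEM",  # hypothetical DEM code name
    "code_version": "1.0.0",
    "normal": "linear spring-dashpot",
    "tangential": "Coulomb friction",
}

text = json.dumps(record, indent=2, sort_keys=True)  # serialise for storage or exchange
parsed = json.loads(text)                            # any JSON-aware tool can read it back

# A program can now query fields directly instead of interpreting sentences.
print(parsed["tangential"])  # Coulomb friction
```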

How FAIR is your Data?

Some questions to consider about your own simulations:

  • Have you recorded all of the salient inputs for your simulation?
    • Particle properties, material properties, interactions, boundaries, timestep, etc.
  • Have you used open, easily accessible file formats?
  • Have you documented the layout of your data (robust schema)?
  • Have you stored the data in a safe location where it is easily discoverable and accessible?
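The first question can be partly automated: if the simulation inputs are stored as a structured record, a script can flag anything missing before the data are archived. A minimal sketch, assuming a hypothetical set of required keys that mirrors the checklist above; the example values are illustrative only:

```python
from typing import Any

# Hypothetical minimum set of salient inputs, mirroring the checklist above.
REQUIRED_INPUTS = {
    "particle_properties",
    "material_properties",
    "interactions",
    "boundaries",
    "timestep",
}


def missing_inputs(metadata: dict[str, Any]) -> list[str]:
    """Return the required keys that are absent or empty, in sorted order."""
    return sorted(
        key for key in REQUIRED_INPUTS
        if key not in metadata or metadata[key] in (None, "", {}, [])
    )


sim = {
    "particle_properties": {"radius_m": 0.001, "density_kg_m3": 2500},
    "material_properties": {"youngs_modulus_Pa": 1e7, "poisson_ratio": 0.3},
    "interactions": {"normal": "linear", "tangential": "Coulomb"},
    "boundaries": {},
}
print(missing_inputs(sim))  # ['boundaries', 'timestep']
```

A check like this only tests that fields are present; whether the recorded values are sufficient to replicate the simulation still needs a human (or a much richer schema) to judge.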

Guidance on how to assess the “fairness” of your data: https://bit.ly/yourFIP

Storing Details about Contact (Interaction) Models

Concept

Any movie fans? You may be familiar with IMDb…

  • It tracks the “implementation” of various different films
  • Because not all “implementations” are made equal

Let’s consider an example

James Bond is an iconic long running series of films

  • Even this has issues with various implementations…
  • Let’s consider Casino Royale

Casino Royale

  • Year: 2006, Rating: 8.0

  • Year: 1967, Rating: 5.0

Same original story, differently executed interpretations.

  • Possibly just like a DEM contact model in different DEM codes!

Breakdown of a Simulation

Simulations can be simplified into a set of components that need to be recorded

  • Each component may have several subcomponents such as material description and the interaction models

  • Each DEM code has many (different) supported interaction models

  • Some interaction models may exist in multiple DEM codes

    • May have very different implementations or interpretations
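The relationships described above (one model, possibly many implementations across codes and versions) map naturally onto a small data structure. A minimal sketch using Python dataclasses; the class and field names, the code names `CodeA`/`CodeB` and the version numbers are hypothetical, and `Hertz-Mindlin` is used only as a familiar example model name:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One concrete implementation of a contact model in a specific DEM code and version."""
    model: str       # generic model name
    code: str        # DEM code providing it (hypothetical names below)
    version: str
    notes: str = ""  # free text on how this implementation differs from others


def codes_per_model(impls: list[Implementation]) -> dict[str, set[str]]:
    """Group implementations by model name, collecting the codes that provide each model."""
    index: dict[str, set[str]] = defaultdict(set)
    for impl in impls:
        index[impl.model].add(impl.code)
    return dict(index)


impls = [
    Implementation("Hertz-Mindlin", "CodeA", "2.1"),
    Implementation("Hertz-Mindlin", "CodeB", "5.0", notes="illustrative: differs in damping"),
    Implementation("linear spring-dashpot", "CodeA", "2.1"),
]

# Models available in more than one code: candidates for checking "same name, same model?"
shared = {m: sorted(c) for m, c in codes_per_model(impls).items() if len(c) > 1}
print(shared)  # {'Hertz-Mindlin': ['CodeA', 'CodeB']}
```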

Why do we want to track Contact Models?

There are two main reasons, replicability and interoperability:

  • Tracking what contact models are available in which codes improves both replicability and interoperability
    • is the implementation of one model the same as another?
    • what has changed in a new version of the same implementation? (bug-fix or new feature?)
    • licensing, maintenance and development status
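If each version of an implementation is stored as a structured record, the question of what changed between versions can be answered by comparing records field by field. A minimal sketch; the field names and values are hypothetical:

```python
from typing import Any


def diff_records(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Compare two versions of a record field by field.

    Returns {field: (old_value, new_value)} for every field that was added,
    removed or changed; a field absent from one version is reported as None.
    """
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


v1 = {"model": "Hertz-Mindlin", "version": "2.0", "gpu": False, "licence": "GPL-3.0"}
v2 = {"model": "Hertz-Mindlin", "version": "2.1", "gpu": True, "licence": "GPL-3.0",
      "changelog": "adds GPU support"}

for name, (before, after) in diff_records(v1, v2).items():
    print(f"{name}: {before!r} -> {after!r}")
# changelog: None -> 'adds GPU support'
# gpu: False -> True
# version: '2.0' -> '2.1'
```

Whether a change is a bug-fix or a new feature still has to be recorded explicitly (here, in a hypothetical `changelog` field); the diff only shows that something changed.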

There are other reasons:

  • A list of all implemented contact models in (all) DEM codes helps users understand what features are available for each DEM code (e.g. GPU support, OS, etc.)
  • As a referencing tool: a permanent UUID, somewhat similar to a DOI, can be created for each model.
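Python's standard `uuid` module can generate such identifiers. One option (an assumption here, not necessarily what ON-DEM DB does) is a name-based UUID (version 5), which always maps the same code/model/version string to the same identifier; `uuid4()` instead gives a random identifier that must be stored once issued. The namespace URL below is hypothetical:

```python
import uuid

# Hypothetical namespace for contact-model records, derived from a placeholder URL.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/contact-models")


def model_uuid(code: str, model: str, version: str) -> uuid.UUID:
    """Deterministic identifier: identical inputs always give the identical UUID."""
    return uuid.uuid5(NAMESPACE, f"{code}/{model}/{version}")


a = model_uuid("CodeA", "Hertz-Mindlin", "2.1")
b = model_uuid("CodeA", "Hertz-Mindlin", "2.1")
c = model_uuid("CodeA", "Hertz-Mindlin", "2.2")
print(a == b, a == c)  # True False
print(a.version)       # 5

random_id = uuid.uuid4()  # alternative: random, must be stored when first issued
```

Unlike a DOI, a UUID is not resolvable on its own; the database still has to map each identifier to its record page.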

So how do you track implementations of different models?

Defining DEM Contact Models

There is currently no universally accepted metadata structure / ontology for classifying DEM contact models

  • One needs to be developed

ON-DEM DB: A Way to Store Details about Contact (Interaction) Models

Proposed Database

Databases provide efficient and safe multi-user access to data

  • Data records can be made “immutable”
  • Better handling of data redundancy and duplicate avoidance
  • Reduced curation requirements
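Both immutability and duplicate avoidance can be enforced by the database itself rather than by curators. The sketch below uses Python's built-in `sqlite3` purely for illustration (the production system uses PostgreSQL): a `UNIQUE` constraint rejects duplicate entries and a trigger rejects updates to stored records. The table, column and trigger names are hypothetical:

```python
import sqlite3

# Hypothetical schema: one row per implementation of a contact model in a DEM code.
SCHEMA = """
CREATE TABLE implementation (
    id      INTEGER PRIMARY KEY,
    code    TEXT NOT NULL,
    model   TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (code, model, version)  -- duplicate avoidance
);
-- Immutability: reject any UPDATE to a stored record.
CREATE TRIGGER implementation_immutable
BEFORE UPDATE ON implementation
BEGIN
    SELECT RAISE(ABORT, 'records are immutable, submit a new version instead');
END;
"""

INSERT = "INSERT INTO implementation (code, model, version) VALUES (?, ?, ?)"


def make_db() -> sqlite3.Connection:
    """Create a fresh in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_sql(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.DatabaseError as exc:  # IntegrityError is a subclass
        return f"rejected: {exc}"


if __name__ == "__main__":
    db = make_db()
    print(try_sql(db, INSERT, ("CodeA", "Hertz-Mindlin", "2.1")))  # first submission
    print(try_sql(db, INSERT, ("CodeA", "Hertz-Mindlin", "2.1")))  # duplicate
    print(try_sql(db, "UPDATE implementation SET version = ? WHERE id = 1", ("2.2",)))  # in-place edit
    print(try_sql(db, INSERT, ("CodeA", "Hertz-Mindlin", "2.2")))  # new version as a new record
```

Changes are therefore made by adding a new version rather than overwriting an old one, which keeps earlier citations valid.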

Web interface to Relational Database

  • Individual Record (page) for each contact model implementation
    • Provides permanent citeable description for each contact model
  • Searchable
  • Downloadable metadata records
  • Accessible via REST API
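A REST API lets other tools look up contact models without scraping web pages. The sketch below only builds a query URL and parses a JSON response using the standard library. The base URL, the `api/models/` endpoint, its query parameters and the response layout are all hypothetical placeholders, not the documented ON-DEM DB API:

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical base URL and endpoint: placeholders, not the documented ON-DEM DB API.
BASE_URL = "https://example.org/"
ENDPOINT = "api/models/"


def search_url(**filters: str) -> str:
    """Build a search URL, e.g. search_url(code="CodeA") -> .../api/models/?code=CodeA."""
    query = urlencode(sorted(filters.items()))
    url = urljoin(BASE_URL, ENDPOINT)
    return f"{url}?{query}" if query else url


def model_names(response_body: str) -> list[str]:
    """Extract names from a JSON response assumed to look like [{"name": ...}, ...]."""
    return [entry["name"] for entry in json.loads(response_body)]


print(search_url(code="CodeA", type="normal"))
# https://example.org/api/models/?code=CodeA&type=normal

# A real client would fetch the URL (e.g. with urllib.request.urlopen);
# here a sample response body stands in for the server.
sample = '[{"name": "Hertz-Mindlin"}, {"name": "linear spring-dashpot"}]'
print(model_names(sample))  # ['Hertz-Mindlin', 'linear spring-dashpot']
```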

Initial Prototype

Initial “alpha” version available at: on-dem-db.onrender.com

Tracking various aspects such as:

  • Description (code, version, etc.)
  • Key information about the model (normal, tangential, ‘unified’, etc.)
  • Model Implementation Details - Reference, Repo, Licensing, GPU, OS, etc.
  • Recommended Usage - Size range, shape, materials, etc
  • Detailed descriptions for models - with ability for complex formatting and equations. Can accept and render markdown.

Database Prototype Demo

Initial Testing and Feedback

Additional requirements were identified:

  • Image Support (Model schematics)

  • Need some method of saving & editing submissions

    • Need to be able to track who has submitted an entry

    • Entry status (draft/public)

    • Revision history (who, when, description?)

    • Easiest method is to provide user accounts with a login

  • Maintainer / Developer details (status, and handling multiple maintainers)

  • Some method of interfacing with new common file format being developed

    • provide a mechanism for contact model lookup (names, type, etc)
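The submission requirements above (who submitted an entry, draft/public status, and a who/when/what revision history) can be captured with a small record type. A minimal sketch; the class and field names and the user names are hypothetical, and in the production system this information would live in the database rather than in memory:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLIC = "public"


@dataclass(frozen=True)
class Revision:
    author: str
    timestamp: datetime
    description: str


@dataclass
class Entry:
    """A contact-model submission with an owner, a status and an append-only revision log."""
    name: str
    submitted_by: str
    status: Status = Status.DRAFT
    history: list[Revision] = field(default_factory=list)

    def revise(self, author: str, description: str) -> None:
        """Record who changed the entry, when (UTC), and why."""
        self.history.append(Revision(author, datetime.now(timezone.utc), description))

    def publish(self, author: str) -> None:
        """Make the entry public and log the status change."""
        self.status = Status.PUBLIC
        self.revise(author, "status changed to public")


entry = Entry("Hertz-Mindlin (CodeA 2.1)", submitted_by="alice")
entry.revise("alice", "initial submission")
entry.revise("bob", "added GPU support details")
entry.publish("alice")
for rev in entry.history:
    print(rev.author, "-", rev.description)
# alice - initial submission
# bob - added GPU support details
# alice - status changed to public
```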

Database Implementation

New, more robust, “production-ready” implementation

  • Based on Django and PostgreSQL
  • Various JavaScript libraries to add required functionality
  • Complete rewrite from alpha version

Advantages

  • Provides a nice admin backend for reviewing and editing submissions if required
  • Simpler database interfacing via Django’s models (ORM) and views
  • Mature framework with a strong community

Disadvantages

  • Extra complexity

Implemented Database Structure

Contact Model Relationships

Database Prototype Demo

Acknowledgements

Colleagues & Collaborators at:

  • The University of Edinburgh
  • TUSAIL
  • ON-DEM
  • CCC-ParaSolS

CCC-ParaSolS

CCC-ParaSolS is a two-year project (January 2025 to December 2026) funded by the Science and Technology Facilities Council (STFC) to create a Collaborative Computational Community in particulate solids simulations.

If you are in the UK and you do DEM, then you should be part of CCC-ParaSolS!!

Thank You!

Any Questions?

Email: J.Morrissey@ed.ac.uk