Interaction Models Implementation Database

ON-DEM WG2

John P. Morrissey

School of Engineering, University of Edinburgh

Thursday, 5th of September, 2024

Who am I…

A little about me:

Research Fellow at The University of Edinburgh
Member of the Granular & Geomechanical Processes Research Group
Focus on Granular Materials and Particulate Mechanics
- Experimental Characterisation
- DEM Simulations
- Data Analysis

Granular Mechanics & Industrial Infrastructure Research Group

Reminder - Research Data

Photo by Kelly Sikkema on Unsplash

Reproducibility & Replicability Crisis in Science

For the past two decades there has been significant discussion on the reproducibility crisis in science

A pattern of scientists being unable to obtain the same results as previous researchers

Distinction between Reproducibility & Replicability

Reproducibility: Can you get the same answers as me when you analyze my data?
Replicability: Can you get the same answers when you do my experiment and collect your own data?

AI will not make this better!

Dependent on the quality of the data provided to it.

Research Data Life-cycle

Managing research data is an often forgotten aspect of research. Traditionally datasets were quite small:

Data was not widely shared beyond colleagues
Often only ~~documented~~ mentioned in publications

Significant changes in recent decades:

The amount of research data generated has grown
- much larger datasets (e.g. imaging, DEM, etc.)
Sharing & collaboration encouraged (impact)

Ad-hoc management no longer sufficient

Poorly documented datasets are of limited use
Are they preserved beyond the current cloud storage subscription?

Create: Collect, Generate, Process, Clean
Use: Manipulate, Analyse, Visualise
Re-Use: By colleagues or starting point for new dataset

Research Data Life-cycle

Data management best practices involve the entire data lifecycle from project start to end

Plan: Data Management Plan in grant submissions.
Create: Experiment, Simulation, Survey, Merge, etc…
Document: Describe the data collected in detail. Sooner rather than later!
Use: Analyse/Discover/Collaborate. Document the process.
Preserve: Store for future use. Version Control. Databases & Archives.
Share: Essential for translation of results into knowledge. Open Data? Repositories?
Re-Use: Collaborate / Derive / Develop. Teach. Policy.

What is FAIR Data?

FAIR is an acronym for Findable, Accessible, Interoperable, Reusable: the principles that should apply to scientific data management and stewardship.

Findable: The first step in making data reusable is to make it findable. Detailed and accurate metadata is key
Accessible: Data could be openly available or it could require prior authentication and authorisation
Interoperable: Data needs to be able to be used in different programs or workflows
Reusable: Well defined data is essential as it makes it easier to understand and therefore use, combine and/or extend the dataset

**“FAIR Principles”**. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

FAIR Data - Machine Actionable

The FAIR principles also emphasise machine-actionability

The capacity of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention
We are increasingly dependent on computers as datasets grow and we move towards automation (AI/Deep Learning/etc.)

Machine readable and digitally accessible are not the same thing!

For example, traditional word processing documents and PDF files are easily read by humans but typically are difficult for machines to interpret.
A machine-readable format is a file in a standard computer language (not English text) that can be read automatically by a web browser or computer system (e.g. XML, JSON).
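As a minimal sketch of the difference, the same fact can be written as a sentence for a person or as JSON for a program. The field names and the 95 mm value are made up for illustration and are not taken from any real dataset.

```python
import json

# Human-readable only: a program would need fragile text parsing to use this.
prose = "The shear cell had a diameter of about 95 mm."

# Machine-readable: the same (hypothetical) fact as structured JSON.
record_text = '{"cell_diameter": {"value": 95.0, "units": "mm"}}'

# The standard library turns the text into a dictionary in one call.
record = json.loads(record_text)
diameter = record["cell_diameter"]["value"]
units = record["cell_diameter"]["units"]

print(f"cell diameter = {diameter} {units}")
```

The value and its units are stored as separate fields, so no guessing is needed when the data is reused.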

A Way to Store Details about Interaction (Contact Models)

Breakdown of a Simulation

Simulations can be simplified into a set of components that need to be recorded

Each component may have several subcomponents such as material description and the interaction models
Each DEM code has many (different) supported interaction models
The same interaction model may exist in multiple DEM codes, but the implementations may differ slightly or substantially

DEM Material Parameters

Physical properties

Mass, volume, shape, size distribution

Mechanical properties

Contact stiffness
Static Friction (particle-particle, particle-wall)
Rolling Friction (particle-particle, particle-wall)
Coefficient of restitution (particle-particle, particle-wall)

Fabric properties

Porosity (Solid Fraction)
Initial Stress State
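One possible way to record the parameters listed above is a small typed structure that serialises to JSON. This is only a sketch: the class names, field names, units and every number are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PairValue:
    """A property that differs between particle-particle and particle-wall contacts."""
    particle_particle: float
    particle_wall: float


@dataclass
class DEMMaterial:
    # Physical properties
    mass_kg: float
    volume_m3: float
    shape: str
    size_distribution_mm: List[float]
    # Mechanical properties
    contact_stiffness_n_per_m: float
    static_friction: PairValue
    rolling_friction: PairValue
    restitution: PairValue
    # Fabric properties
    porosity: float
    initial_stress_pa: float


# Placeholder values only, not measured data.
material = DEMMaterial(
    mass_kg=1.1e-5,
    volume_m3=4.2e-9,
    shape="sphere",
    size_distribution_mm=[1.8, 2.0, 2.2],
    contact_stiffness_n_per_m=1.0e5,
    static_friction=PairValue(0.5, 0.3),
    rolling_friction=PairValue(0.01, 0.01),
    restitution=PairValue(0.6, 0.5),
    porosity=0.4,
    initial_stress_pa=0.0,
)

# asdict() converts nested dataclasses too, so the result is plain JSON.
as_json = json.dumps(asdict(material), indent=2)
print(as_json)
```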

Interaction Models Database

How do you track implementations of different models and why does it matter?

Concept

Any movie fans? You may be familiar with IMDB…

it tracks the “implementation” of various different films
because not all “implementations” are made equal

Let’s consider an example

James Bond is an iconic long running series of films

Even this has issues with various implementations…
Let’s consider Casino Royale

Casino Royale

ON-DEM Interaction Models Implementation Database

Initial Prototype

Initial “alpha” version available at: on-dem-db.onrender.com

Tracking various aspects such as:

Description (code, version, etc.)
Key information about the model (normal, tangential, ‘unified’, etc.)
Model Implementation Details - Reference, Repo, Licensing, GPU, OS, etc.
Recommended Usage - Size range, shape, materials, etc.

Database Prototype Demo

Discussion on what to include

Need to make sure we are tracking different versions of contact models in codes
- bug fixes
What do we do for tracking model parameters?
- Input parameters
- Internal variables
- Output results
Minimum requirements
- Sensible set of minimum requirements to encourage adoption
- By removing the requirement for a source code repository we can include contact (interaction) models from any commercial software as well
Interaction with WG3 - how to link the two?
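To make the discussion points concrete, here is one possible shape for a database entry. It ties a model to a specific code version, groups parameters by role, and leaves the repository optional so closed-source codes can be listed. All class, field and model names are hypothetical and do not describe the prototype's actual structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    name: str
    units: str
    role: str  # one of "input", "internal", "output"


@dataclass
class ModelImplementation:
    model: str
    code: str
    code_version: str                 # lets bug-fix releases be told apart
    repository: Optional[str] = None  # optional: commercial codes have none
    parameters: List[Parameter] = field(default_factory=list)

    def by_role(self, role: str) -> List[str]:
        """Return the names of all parameters with the given role."""
        return [p.name for p in self.parameters if p.role == role]


# Hypothetical entry for an imaginary code and model.
entry = ModelImplementation(
    model="ExampleLinearSpring",
    code="ExampleDEM",
    code_version="1.2.0",
    parameters=[
        Parameter("normal_stiffness", "N/m", "input"),
        Parameter("overlap", "m", "internal"),
        Parameter("normal_force", "N", "output"),
    ],
)

print(entry.by_role("input"))
```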

Input parameters

  {
    "$id": "equipment/interaction_model.v1-0-0.json",
    "$schema": "https://raw.githubusercontent.com/Vidminas/python-jsonschema-minmax/main/metaschema/minmax-metaschema.json",

    "type": "object",
    "properties": {
      "test_category": { "const": "Linear Shear" },
      "test_subcategory": { "const": "Jenike" },
      "class": { "label": "Test Regime", "const": "Quasi-static" },
      "rating": { "label": "Repeatability Rating", "const": 4 },
      "geometric_properties": { "$ref": "#/$defs/geometric_properties" },
      "measurement_parameters": { "$ref": "#/$defs/measurement_parameters" }
    },

    "required": ["geometric_properties", "measurement_parameters"]
  }

TO-DO

Front-End

Website contents (Someone to generate?)
Website interface (colour scheme)
Form Interface (Guided multi-step?)
Hosting (sub-domain of ondem.org)

Back-End

Implement database within GrainDB
- Add revision history functionality
Add “save progress” functionality

GrainDB

Adding the ‘F’ in FAIR

A well-executed Research Data Management Plan enables data to be Accessible, Interoperable and Reusable.

Making datasets (for specific materials or test equipment) Findable is a challenge:

Datasets may be spread across lots of different repositories which can make checking time consuming and difficult
Repository keywords are limited in number and the type of information they can store
Typically repositories do not have test or material specific metadata
- Need to browse each individual README and hope that it includes sufficient detail
How to store metadata?
What metadata to store?

What is GrainDB

We can consider a dataset as two separate components:

raw data: the actual recorded measurement/observation. (Typically time series data)
metadata: the description of all aspects of the experiment

GrainDB is an Experimental Measurements Database that collates and stores the relevant metadata for the material and test equipment that is missing from repositories, alongside a link to the full dataset which has been stored in an appropriate manner.

This becomes a searchable database of all available data, enabling the end-user to find detailed experimental datasets that match their needs
Enhancing dissemination
Provide high-quality machine-readable datasets suitable for use in AI/ML
Interface guides people on what information should be stored in accordance with best experimental practice

Real-world Challenge

(Bulk Powder Flow Characterisation Techniques, 2019, https://doi.org/10.1039/9781788016100-00064 )

Variety of test types in use to measure material properties

How to record details relating to all of these tests?
And others…

What is an Experiment?

Experiment or Measurement?

In a measurement, one performs parameter inference

Estimate quantities from observations

In an experiment, one performs hypothesis tests or model selection

Determine the best model for explaining observations

How to Record?

Databases provide efficient and safe multi-user access to data

Data records can be made “immutable”
Better handling of data redundancy and duplicate avoidance
Reduced curation requirements
Databases require a well-defined metadata structure
This provides a mechanism to make links between a material, a piece of equipment and a measurement

Accessible via a website: www.graindb.org
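The linking idea can be sketched with an in-memory SQLite database from the Python standard library. SQLite is used here only as a runnable stand-in. The table names, column names and example rows are assumptions, not the GrainDB structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the links between tables
conn.executescript("""
CREATE TABLE material  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE measurement (
    id           INTEGER PRIMARY KEY,
    material_id  INTEGER NOT NULL REFERENCES material(id),
    equipment_id INTEGER NOT NULL REFERENCES equipment(id),
    dataset_url  TEXT NOT NULL
);
""")

# Hypothetical records: one material, one piece of equipment, one measurement.
material_id = conn.execute(
    "INSERT INTO material (name) VALUES (?)", ("Example powder",)
).lastrowid
equipment_id = conn.execute(
    "INSERT INTO equipment (name) VALUES (?)", ("Jenike shear cell",)
).lastrowid
conn.execute(
    "INSERT INTO measurement (material_id, equipment_id, dataset_url) VALUES (?, ?, ?)",
    (material_id, equipment_id, "https://example.org/dataset/1"),
)

# The UNIQUE constraint is what avoids duplicate material records.
try:
    conn.execute("INSERT INTO material (name) VALUES (?)", ("Example powder",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A join follows the links from a measurement back to material and equipment.
rows = conn.execute("""
    SELECT m.name, e.name, x.dataset_url
    FROM measurement AS x
    JOIN material  AS m ON m.id = x.material_id
    JOIN equipment AS e ON e.id = x.equipment_id
""").fetchall()
print(rows, duplicate_rejected)
```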

Metadata Structure

Need to be prescriptive

Define carefully what information is required for all aspects

Avoid confusion about units, types, dimensions, etc.
What is minimum required data and what is optional data?
Enforce minimum quality of reported data

Currently no Ontology or common descriptive language for defining granular materials and their measurement

some examples in chemistry, physics

Entity Relationships

Material Description

Equipment Description

Capturing all Equipment?

Experiment Description

Result

Prototype Database

Database

Web interface to Relational Database

Individual page (record) for each material, equipment and experiment
- Provides permanent citeable description for material/equipment/experiment
Searchable
Downloadable metadata records
REST API

Database Implementation

Database backend utilises PostgreSQL

Why?

PostgreSQL is a relational database with NoSQL-style features: advanced data types (e.g. JSON) can be stored in hybrid tables
This is flexible and allows all the data for a specific category to be stored as one entry
Very good for handling complex data
Each category is defined by a schema
Provision of schemas will allow a POST API to be implemented in future

Database Schemas

All schemas are versioned so we can:
- Update schemas over time adding new metadata
- Track which schema version an entry was created or edited with
Schemas allow greater use of templating for dynamic page generation
- All additional information to create the page can be found in the schema
- Easier maintenance
- Slightly less control over aesthetics
Available at: git.ecdf.ed.ac.uk/jmorrise/tusail-experimental-database-schemas
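As a sketch of how a stored schema identifier could be used to track versions, the helper below splits an `$id` such as `equipment/linear_shear_jenike.v1-0-0.json` into category, name and version. Reading `v1-0-0` as major-minor-patch is an assumption, and `parse_schema_id` is a hypothetical helper rather than part of the database.

```python
import re

# Assumed layout: <category>/<name>.v<major>-<minor>-<patch>.json
SCHEMA_ID = re.compile(
    r"^(?P<category>[^/]+)/(?P<name>.+)"
    r"\.v(?P<major>\d+)-(?P<minor>\d+)-(?P<patch>\d+)\.json$"
)


def parse_schema_id(schema_id):
    """Split a schema $id into (category, name, (major, minor, patch))."""
    match = SCHEMA_ID.match(schema_id)
    if match is None:
        raise ValueError(f"unrecognised schema id: {schema_id!r}")
    version = tuple(int(match[key]) for key in ("major", "minor", "patch"))
    return match["category"], match["name"], version


category, name, version = parse_schema_id("equipment/linear_shear_jenike.v1-0-0.json")
print(category, name, version)

# Tuples compare element by element, so checking for a newer schema is simple.
entry_is_outdated = version < (1, 1, 0)
```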

Example Schema - Linear Shear Test

{
  "$id": "equipment/linear_shear_jenike.v1-0-0.json",
  "$schema": "https://raw.githubusercontent.com/Vidminas/python-jsonschema-minmax/main/metaschema/minmax-metaschema.json",

  "type": "object",
  "properties": {
    "test_category": { "const": "Linear Shear" },
    "test_subcategory": { "const": "Jenike" },
    "class": { "label": "Test Regime", "const": "Quasi-static" },
    "rating": { "label": "Repeatability Rating", "const": 4 },
    "geometric_properties": { "$ref": "#/$defs/geometric_properties" },
    "measurement_parameters": { "$ref": "#/$defs/measurement_parameters" }
  },
  "required": ["geometric_properties", "measurement_parameters"],

  "$defs": {
    "geometric_properties": { "label": "Geometric Properties",
      "type": "object",
      "properties": {
        "cell_diameter": { "label": "Cell Diameter", "type": "number", "minimum": 0, "units": ["mm"] },
        "base_height": { "label": "Base Height", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "ring_height": { "label": "Ring Height", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "wall_thickness": { "label": "Wall Thickness", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "max_translation_speed": { "label": "Max. Translation Speed", "type": "number", "minimum": 0, "units": [ "mm/s" ] },
        "cell_material": { "label": "Cell Material", "type": "string" },
        "stress_application": { "label": "Vertical Stress Application Method", "enum": [ "Constant Mass", "Servo-Controlled" ] }
      },
      "additionalProperties": false
    },
    "measurement_parameters": { "label": "Measurement Properties",
      "type": "object",
      "properties": {
        "force_accuracy": { "label": "Force Accuracy", "type": "number", "minimum": 0, "units": [ "Pa" ] },
        "force_resolution": { "label": "Force Resolution", "type": "number", "minimum": 0, "units": [ "Pa" ] },
        "displacement_accuracy": { "label": "Displacement Accuracy", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "displacement_resolution": { "label": "Displacement Resolution", "type": "number", "minimum": 0, "units": [ "mm" ] }
      },
      "additionalProperties": false
    }
  }
}
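To illustrate what such a schema enforces, the sketch below checks records against a trimmed copy of it using a tiny hand-written validator. It is a stand-in for a full JSON Schema validator and covers only the keywords used in the example. Storing values as plain numbers, and both example records, are assumptions.

```python
def resolve(schema, root):
    """Follow a local '#/...' reference, if the schema node has one."""
    ref = schema.get("$ref")
    if ref is None:
        return schema
    node = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


def validate(value, schema, root, path="record"):
    """Return a list of error messages (empty when the value is valid)."""
    schema = resolve(schema, root)
    errors = []
    if "const" in schema and value != schema["const"]:
        errors.append(f"{path}: expected {schema['const']!r}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: not one of {schema['enum']}")
    kind = schema.get("type")
    if kind == "number":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected a number")
        elif "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
    elif kind == "string" and not isinstance(value, str):
        errors.append(f"{path}: expected a string")
    elif kind == "object":
        if not isinstance(value, dict):
            return errors + [f"{path}: expected an object"]
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required '{key}'")
        for key, item in value.items():
            if key in props:
                errors.extend(validate(item, props[key], root, f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected property '{key}'")
    return errors


# Trimmed copy of the example schema (labels, units and most fields omitted).
schema = {
    "type": "object",
    "properties": {
        "test_category": {"const": "Linear Shear"},
        "geometric_properties": {"$ref": "#/$defs/geometric_properties"},
        "measurement_parameters": {"$ref": "#/$defs/measurement_parameters"},
    },
    "required": ["geometric_properties", "measurement_parameters"],
    "$defs": {
        "geometric_properties": {
            "type": "object",
            "properties": {
                "cell_diameter": {"type": "number", "minimum": 0},
                "stress_application": {"enum": ["Constant Mass", "Servo-Controlled"]},
            },
            "additionalProperties": False,
        },
        "measurement_parameters": {
            "type": "object",
            "properties": {"displacement_accuracy": {"type": "number", "minimum": 0}},
            "additionalProperties": False,
        },
    },
}

# Hypothetical records: one valid, one with three separate problems.
good = {
    "test_category": "Linear Shear",
    "geometric_properties": {"cell_diameter": 95.0, "stress_application": "Constant Mass"},
    "measurement_parameters": {"displacement_accuracy": 0.01},
}
bad = {"geometric_properties": {"cell_diameter": -1, "colour": "red"}}

good_errors = validate(good, schema, schema)
bad_errors = validate(bad, schema, schema)
for message in bad_errors:
    print(message)
```

The second record is rejected for a missing required block, a negative diameter and an unknown property. This is how a prescriptive schema enforces a minimum quality of reported data.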

Database API

RESTful API implemented using Swagger and OpenAPI Specification

An interface that two computer systems use to exchange information securely over the internet
Nicely documented, interactive API documentation
www.graindb.org/api/ui
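A client for such an API might look like the sketch below. Only the documentation address above comes from the slides. The API root, the `equipment` resource name and the `test_category` filter are assumptions, so check the interactive documentation for the real endpoints before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed API root; the slides only give the documentation page under /api/ui.
API_ROOT = "https://www.graindb.org/api"


def build_url(resource, **filters):
    """Build a query URL for a (hypothetical) resource with optional filters."""
    url = f"{API_ROOT}/{resource.strip('/')}"
    query = urlencode(sorted(filters.items()))
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; call fetch_json(url) to actually send the request.
url = build_url("equipment", test_category="Linear Shear")
print(url)
```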

Database Prototype Demo

Concluding Remarks

We haven’t always done a very good job of preserving our datasets for the future
- but we have the tools to do so
We need to be able to find all datasets easily
- A searchable database will make this a much easier task
A well defined set of metadata schemas will make recording experimental measurements for lots of different pieces of equipment more straightforward
Having detailed stored records with unique IDs (URLs) may make dissemination easier as well
- These could easily be included in a publication, much like a DOI

Thank You!

Any Questions?

Email: J.Morrissey@ed.ac.uk