Interaction Models Implementation Database

ON-DEM WG2

John P. Morrissey

School of Engineering, University of Edinburgh

Thursday, 5th of September, 2024

Who am I…

A little about me:

  • Research Fellow at The University of Edinburgh
  • Member of the Granular & Geomechanical Processes Research Group
  • Focus on Granular Materials and Particulate Mechanics
    • Experimental Characterisation
    • DEM Simulations
    • Data Analysis


Granular Mechanics & Industrial Infrastructure Research Group

Reminder - Research Data

Photo by Kelly Sikkema on Unsplash

Reproducibility & Replicability Crisis in Science

For the past two decades there has been significant discussion on the reproducibility crisis in science

  • A pattern of scientists being unable to obtain the same results as previous researchers

Distinction between Reproducibility & Replicability

  • Reproducibility: Can you get the same answers as me when you analyze my data?

  • Replicability: Can you get the same answers when you do my experiment and collect your own data?

AI will not make this better!

Dependent on the quality of the data provided to it.

[Baker, 2015, Nature]

Research Data Life-cycle

Managing research data is an often forgotten aspect of research. Traditionally datasets were quite small:

  • Data was not widely shared beyond colleagues
  • Often only documented or mentioned in publications

Significant changes in recent decades:

  • The amount of research data generated has grown
    • much larger datasets (e.g. imaging, DEM, etc.)
  • Sharing & collaboration encouraged (impact)

Ad-hoc management no longer sufficient

  • Poorly documented datasets are of limited use
  • Are they preserved beyond the current cloud storage subscription?

  • Create: Collect, Generate, Process, Clean

  • Use: Manipulate, Analyse, Visualise

  • Re-Use: By colleagues or starting point for new dataset

Research Data Life-cycle

Data management best practices involve the entire data lifecycle from project start to end

  • Plan: Data Management Plan in grant submissions.

  • Create: Experiment, Simulation, Survey, Merge, etc…

  • Document: Describe the data collected in detail. Sooner rather than later!

  • Use: Analyse/Discover/Collaborate. Document the process.

  • Preserve: Store for future use. Version Control. Databases & Archives.

  • Share: Essential for translation of results into knowledge. Open Data? Repositories?

  • Re-Use: Collaborate / Derive / Develop. Teach. Policy.

What is FAIR Data?

FAIR is an acronym for Findable, Accessible, Interoperable, Reusable: the principles that should apply to scientific data management and stewardship.

  • Findable: The first part of making data re-useable is to make the data findable. Detailed and accurate metadata is key

  • Accessible: Data could be openly available or it could require prior authentication and authorisation

  • Interoperable: Data needs to be able to be used in different programs or workflows

  • Reusable: Well defined data is essential as it makes it easier to understand and therefore use, combine and/or extend the dataset

“FAIR Principles”. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

FAIR Data - Machine Actionable

The FAIR principles also emphasise machine-actionability

  • The capacity of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention
  • We are increasingly dependent on computers as datasets grow and we move towards automation (AI/Deep Learning/etc.)

Machine readable and digitally accessible are not the same thing!

  • For example, traditional word processing documents and PDF files are easily read by humans but typically are difficult for machines to interpret.

  • A machine readable format is a file in a standard computer language (not English text) that can be read automatically by a web browser or computer system (e.g. XML, JSON).
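
As a minimal sketch of the difference, the same material description is given below once as a sentence and once as JSON. The field names and values are illustrative assumptions, not taken from any real dataset.

```python
import json

# Human-readable: easy for a person, awkward for a program to parse reliably.
prose = "Glass beads, 2 mm diameter, static friction about 0.5."

# Machine-readable: the same (illustrative) information as JSON.
record_json = """
{
  "material": "glass beads",
  "diameter": {"value": 2.0, "units": "mm"},
  "static_friction": 0.5
}
"""

record = json.loads(record_json)

# Values can now be read directly, with no text interpretation needed.
diameter = record["diameter"]
print(f"{record['material']}: d = {diameter['value']} {diameter['units']}, "
      f"mu = {record['static_friction']}")
```

Keeping the units next to the value, rather than in a free-text note, is what lets a program use the number without guessing.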

A Way to Store Details about Interaction (Contact) Models

Breakdown of a Simulation

Simulations can be simplified into a set of components that need to be recorded

  • Each component may have several subcomponents such as material description and the interaction models

  • Each DEM code has many (different) supported interaction models

  • Some interaction models may exist in multiple DEM codes but may have very different implementations or be slightly different

DEM Material Parameters

Physical properties

  • Mass, volume, shape, size distribution

Mechanical properties

  • Contact stiffness
  • Static Friction (particle-particle, particle-wall)
  • Rolling Friction (particle-particle, particle-wall)
  • Coefficient of restitution (particle-particle, particle-wall)

Fabric properties

  • Porosity (Solid Fraction)
  • Initial Stress State
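
One possible way to record the three groups of parameters above is sketched below. The class names, field names, units and values are all assumptions made for illustration; only the grouping comes from the list above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ContactPair:
    """A coefficient given separately for each contact type."""
    particle_particle: float
    particle_wall: float


@dataclass
class MaterialParameters:
    # Physical properties
    shape: str
    mean_diameter_mm: float
    # Mechanical properties
    contact_stiffness_n_per_m: float
    static_friction: ContactPair
    rolling_friction: ContactPair
    restitution: ContactPair
    # Fabric properties
    porosity: float

    @property
    def solid_fraction(self) -> float:
        # Solid fraction is the complement of porosity.
        return 1.0 - self.porosity


# Illustrative values only.
params = MaterialParameters(
    shape="sphere",
    mean_diameter_mm=2.0,
    contact_stiffness_n_per_m=1.0e5,
    static_friction=ContactPair(0.5, 0.3),
    rolling_friction=ContactPair(0.01, 0.01),
    restitution=ContactPair(0.9, 0.8),
    porosity=0.4,
)

# asdict() recurses into nested dataclasses, so the result is JSON-ready.
as_json = json.dumps(asdict(params), indent=2)
print(as_json)
print("solid fraction:", params.solid_fraction)
```

Storing porosity and deriving solid fraction from it avoids recording two values that could disagree.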

Interaction Models Database

How do you track implementations of different models and why does it matter?

Concept

Any movie fans? You may be familiar with IMDB…

  • it tracks the “implementation” of various different films
  • because not all “implementations” are made equal

Let’s consider an example

James Bond is an iconic long running series of films

  • Even this has issues with various implementations…
  • Let’s consider Casino Royale

Casino Royale

ON-DEM Interaction Models Implementation Database

Initial Prototype

Initial “alpha” version available at: on-dem-db.onrender.com

Tracking various aspects such as:

  • Description (code, version, etc.)
  • Key Information of model (normal, tangential, ‘unified’, etc.)
  • Model Implementation Details - Reference, Repo, Licensing, GPU, OS, etc.
  • Recommended Usage - Size range, shape, materials, etc.

Database Prototype Demo

Discussion on what to include

  • Need to make sure we are tracking different versions of contact models in codes

    • bug fixes
  • What do we do for tracking model parameters?

    • Input parameters
    • Internal variables
    • Output results
  • Minimum requirements

    • Sensible set of minimum requirements to encourage adoption
    • By removing the requirement for a source code repository we can include contact (interaction) models from any commercial software as well
  • Interaction with WG3 - how to link the two?
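
To make the discussion points concrete, a minimal sketch of an entry check follows. The choice of required and optional fields is an assumption for discussion, not an agreed ON-DEM specification, and the code names, URL and parameter names are hypothetical.

```python
# Assumed minimum set of fields; the repository is deliberately optional so
# that models from commercial (closed-source) codes can still be recorded.
REQUIRED_FIELDS = ("model_name", "code", "code_version", "reference")
OPTIONAL_FIELDS = ("repository", "licence")


def missing_required(entry: dict) -> list:
    """Return the required fields that are absent or empty in an entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


open_source_entry = {
    "model_name": "Hertz-Mindlin",
    "code": "ExampleOpenDEM",                   # hypothetical code name
    "code_version": "1.2.0",
    "reference": "placeholder citation",
    "repository": "https://example.org/repo",   # placeholder URL
    # Parameters split into the three groups raised in the discussion.
    "parameters": {
        "inputs": ["youngs_modulus", "poisson_ratio", "friction_coefficient"],
        "internal": ["tangential_overlap"],
        "outputs": ["normal_force", "tangential_force"],
    },
}

commercial_entry = {
    "model_name": "Hertz-Mindlin",
    "code": "ExampleCommercialDEM",             # hypothetical code name
    "code_version": "2024.1",
    "reference": "placeholder citation",
    # No repository: still acceptable under the assumed minimum set.
}

for entry in (open_source_entry, commercial_entry):
    problems = missing_required(entry)
    status = "ok" if not problems else f"missing {problems}"
    print(entry["code"], entry["code_version"], status)
```

Keying each entry on code and version together means a bug fix in a later release becomes a new record rather than overwriting the old one.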

Input parameters

  {
    "$id": "equipment/interaction_model.v1-0-0.json",
    "$schema": "https://raw.githubusercontent.com/Vidminas/python-jsonschema-minmax/main/metaschema/minmax-metaschema.json",

    "type": "object",
    "properties": {
      "test_category": { "const": "Linear Shear" },
      "test_subcategory": { "const": "Jenike" },
      "class": { "label": "Test Regime", "const": "Quasi-static" },
      "rating": { "label": "Repeatability Rating", "const": 4 },
      "geometric_properties": { "$ref": "#/$defs/geometric_properties" },
      "measurement_parameters": { "$ref": "#/$defs/measurement_parameters" }
    },

    "required": ["geometric_properties", "measurement_parameters"]
  }

TO-DO

Front-End

  • Website contents (Someone to generate?)
  • Website interface (colour scheme)
  • Form Interface (Guided multi-step?)
  • Hosting (sub-domain of ondem.org)

Back-End

  • Implement database within GrainDB
    • Add revision history functionality
  • Add “save progress” functionality

GrainDB

Adding the ‘F’ in FAIR

A well-executed Research Data Management Plan enables data to be Accessible, Interoperable and Reusable.

Making datasets (for specific materials or test equipment) Findable is a challenge:

  • Datasets may be spread across lots of different repositories which can make checking time consuming and difficult
  • Repository keywords are limited in number and the type of information they can store
  • Typically repositories do not have test or material specific metadata
    • Need to browse each individual README and hope that it includes sufficient detail
  • How to store metadata?
  • What metadata to store?

Photo by Jubbar J. on Unsplash

What is GrainDB

We can consider a dataset as two separate components:

  • raw data: the actual recorded measurement/observation. (Typically time series data)
  • metadata: the description of all aspects of the experiment

GrainDB is an Experimental Measurements Database that collates and stores the relevant metadata for the material and test equipment that is missing from repositories, alongside a link to the full dataset which has been stored in an appropriate manner.

  • This becomes a searchable database of all available data, enabling the end-user to find detailed experimental datasets that match their needs

  • Enhancing dissemination

  • Provide high-quality machine-readable datasets suitable for use in AI/ML

  • Interface guides people on what information should be stored in accordance with best experimental practice
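
The idea of searching metadata that points at externally stored data can be sketched as follows. The record fields, material names and URLs are placeholders and do not reflect the actual GrainDB record layout.

```python
# Each record holds metadata plus a link to where the raw data is stored.
# All values below are illustrative placeholders.
records = [
    {"material": "limestone powder", "test": "Linear Shear",
     "dataset_url": "https://example.org/dataset/1"},
    {"material": "glass beads", "test": "Linear Shear",
     "dataset_url": "https://example.org/dataset/2"},
    {"material": "limestone powder", "test": "Uniaxial Compression",
     "dataset_url": "https://example.org/dataset/3"},
]


def search(items, **criteria):
    """Return the records whose fields match every given criterion exactly."""
    return [
        item for item in items
        if all(item.get(key) == value for key, value in criteria.items())
    ]


matches = search(records, material="limestone powder", test="Linear Shear")
for match in matches:
    print(match["dataset_url"])
```

The raw data never has to move: only the description and the link live in the database.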

Real-world Challenge

(Bulk Powder Flow Characterisation Techniques, 2019, https://doi.org/10.1039/9781788016100-00064 )

Variety of test types in use to measure material properties

  • How to record details relating to all of these tests?
  • And others…

What is an Experiment?

Experiment or Measurement?

In a measurement, one performs parameter inference

  • Estimate quantities from observations

In an experiment, one performs hypothesis tests or model selection

  • Determine the best model for explaining observations

How to Record?

Databases provide efficient and safe multi-user access to data

  • Data records can be made “immutable”
  • Better handling of data redundancy and duplicate avoidance
  • Reduced curation requirements
  • Databases require a well-defined metadata structure
  • This provides a mechanism to make links between a material, a piece of equipment and a measurement
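
A minimal sketch of such links is shown below. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; the table and column names are assumptions and not the GrainDB layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce links
conn.executescript("""
CREATE TABLE material  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE measurement (
    id           INTEGER PRIMARY KEY,
    material_id  INTEGER NOT NULL REFERENCES material(id),
    equipment_id INTEGER NOT NULL REFERENCES equipment(id),
    dataset_url  TEXT NOT NULL
);
""")

# Illustrative placeholder rows.
conn.execute("INSERT INTO material (id, name) VALUES (1, 'limestone powder')")
conn.execute("INSERT INTO equipment (id, name) VALUES (1, 'Jenike shear cell')")
conn.execute(
    "INSERT INTO measurement (material_id, equipment_id, dataset_url) "
    "VALUES (1, 1, 'https://example.org/dataset/1')"
)

# Follow the links from a measurement back to its material and equipment.
rows = conn.execute("""
    SELECT material.name, equipment.name, measurement.dataset_url
    FROM measurement
    JOIN material  ON material.id  = measurement.material_id
    JOIN equipment ON equipment.id = measurement.equipment_id
""").fetchall()
print(rows)

# A measurement that points at a non-existent material is rejected.
try:
    conn.execute(
        "INSERT INTO measurement (material_id, equipment_id, dataset_url) "
        "VALUES (99, 1, 'https://example.org/dataset/2')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("dangling link rejected:", rejected)
```

The UNIQUE constraints illustrate duplicate avoidance: a material or piece of equipment is described once and referenced by every measurement that uses it.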

Accessible via a website: www.graindb.org

Metadata Structure

Need to be prescriptive

Define carefully what information is required for all aspects

  • Avoid confusion about units, types, dimensions, etc.
  • What is minimum required data and what is optional data?
  • Enforce minimum quality of reported data

Currently no Ontology or common descriptive language for defining granular materials and their measurement

  • some examples in chemistry, physics

Entity Relationships

Material Description

Equipment Description

Capturing all Equipment?

Experiment Description

Result

Prototype Database

Database

Web interface to Relational Database

  • Individual page (record) for each material, equipment and experiment
    • Provides permanent citeable description for material/equipment/experiment
  • Searchable
  • Downloadable metadata records
  • REST API

Database Implementation

Database backend utilises PostgreSQL

Why?

  • PostgreSQL is a relational database with NoSQL-style features: advanced data types (e.g. JSON) can be stored in hybrid tables
  • This is flexible and allows all the data for a specific category to be stored as one entry
  • Very good for handling complex data
  • Each category is defined by a schema
  • Provision of schemas will allow future implementations of POST API

Database Schemas

  • All schemas are versioned so we can:

    • Update schemas over time adding new metadata
    • Track which schema version an entry was created or edited with
  • Schemas allow greater use of templating for dynamic page generation

    • All additional information to create the page can be found in the schema
    • Easier maintenance
    • Slightly less control over aesthetics
  • Available at: git.ecdf.ed.ac.uk/jmorrise/tusail-experimental-database-schemas
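
As a sketch of how a version could be read back from a schema identifier, the helper below parses `$id` values of the form used in this talk. Reading `v1-0-0` as major-minor-patch is an assumption, and `parse_schema_id` is a hypothetical helper, not part of the schema repository.

```python
import re
from typing import NamedTuple, Tuple


class SchemaId(NamedTuple):
    category: str
    name: str
    version: Tuple[int, int, int]


# Matches e.g. "equipment/linear_shear_jenike.v1-0-0.json"
_PATTERN = re.compile(
    r"^(?P<category>[^/]+)/(?P<name>.+)"
    r"\.v(?P<major>\d+)-(?P<minor>\d+)-(?P<patch>\d+)\.json$"
)


def parse_schema_id(schema_id: str) -> SchemaId:
    """Split a schema $id into category, name and a numeric version tuple."""
    match = _PATTERN.match(schema_id)
    if match is None:
        raise ValueError(f"unrecognised schema id: {schema_id!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return SchemaId(match["category"], match["name"], version)


current = parse_schema_id("equipment/linear_shear_jenike.v1-0-0.json")
print(current)

# Tuples of integers compare numerically, so 1-10-0 is newer than 1-2-0
# (a plain string comparison would get this wrong).
older = parse_schema_id("equipment/linear_shear_jenike.v1-2-0.json")
newer = parse_schema_id("equipment/linear_shear_jenike.v1-10-0.json")
print(newer.version > older.version)
```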

Example Schema - Linear Shear Test

{
  "$id": "equipment/linear_shear_jenike.v1-0-0.json",
  "$schema": "https://raw.githubusercontent.com/Vidminas/python-jsonschema-minmax/main/metaschema/minmax-metaschema.json",

  "type": "object",
  "properties": {
    "test_category": { "const": "Linear Shear" },
    "test_subcategory": { "const": "Jenike" },
    "class": { "label": "Test Regime", "const": "Quasi-static" },
    "rating": { "label": "Repeatability Rating", "const": 4 },
    "geometric_properties": { "$ref": "#/$defs/geometric_properties" },
    "measurement_parameters": { "$ref": "#/$defs/measurement_parameters" }
  },
  "required": ["geometric_properties", "measurement_parameters"],

  "$defs": {
    "geometric_properties": { "label": "Geometric Properties",
      "type": "object",
      "properties": {
        "cell_diameter": { "label": "Cell Diameter", "type": "number", "minimum": 0, "units": ["mm"] },
        "base_height": { "label": "Base Height", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "ring_height": { "label": "Ring Height", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "wall_thickness": { "label": "Wall Thickness", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "max_translation_speed": { "label": "Max. Translation Speed", "type": "number", "minimum": 0, "units": [ "mm/s" ] },
        "cell_material": { "label": "Cell Material", "type": "string" },
        "stress_application": { "label": "Vertical Stress Application Method", "enum": [ "Constant Mass", "Servo-Controlled" ] }
      },
      "additionalProperties": false
    },
    "measurement_parameters": { "label": "Measurement Properties",
      "type": "object",
      "properties": {
        "force_accuracy": { "label": "Force Accuracy", "type": "number", "minimum": 0, "units": [ "Pa" ] },
        "force_resolution": { "label": "Force Resolution", "type": "number", "minimum": 0, "units": [ "Pa" ] },
        "displacement_accuracy": { "label": "Displacement Accuracy", "type": "number", "minimum": 0, "units": [ "mm" ] },
        "displacement_resolution": { "label": "Displacement Resolution", "type": "number", "minimum": 0, "units": [ "mm" ] }
      },
      "additionalProperties": false
    }
  }
}
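
To show what the keywords in the schema above enforce, here is a deliberately simplified checker written with the standard library only. It handles just `const`, `enum`, `type`, `minimum`, `required` and `additionalProperties`, ignores `$ref`, and is not a full JSON Schema validator; the trimmed schema and the example values are illustrative.

```python
def validate(value, schema, path="$"):
    """Return a list of error messages; an empty list means the value passed."""
    errors = []
    if "const" in schema and value != schema["const"]:
        errors.append(f"{path}: expected constant {schema['const']!r}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} is not one of {schema['enum']}")
    expected = schema.get("type")
    if expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected a number")
        elif "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: {value} is below minimum {schema['minimum']}")
    elif expected == "string" and not isinstance(value, str):
        errors.append(f"{path}: expected a string")
    elif expected == "object":
        if not isinstance(value, dict):
            return errors + [f"{path}: expected an object"]
        properties = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required property {name!r}")
        for name, item in value.items():
            if name in properties:
                errors.extend(validate(item, properties[name], f"{path}.{name}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}: unexpected property {name!r}")
    return errors


# Trimmed copy of the geometric_properties definition (labels and units omitted).
geometric_properties = {
    "type": "object",
    "properties": {
        "cell_diameter": {"type": "number", "minimum": 0},
        "ring_height": {"type": "number", "minimum": 0},
        "cell_material": {"type": "string"},
        "stress_application": {"enum": ["Constant Mass", "Servo-Controlled"]},
    },
    "additionalProperties": False,
}

good = {"cell_diameter": 95.0, "ring_height": 16.0,
        "cell_material": "aluminium", "stress_application": "Constant Mass"}
bad = {"cell_diameter": -1, "stress_application": "By hand", "colour": "red"}

print(validate(good, geometric_properties))
for message in validate(bad, geometric_properties):
    print(message)
```

In practice an existing validator library would be used instead; the point here is only that each keyword maps to a simple, automatable check.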

Database API

RESTful API implemented using Swagger and OpenAPI Specification

  • An interface that two computer systems use to exchange information securely over the internet
  • Nicely documented, interactive API documentation
  • www.graindb.org/api/ui
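
A client for such an API might look like the sketch below. The resource name `equipment`, the `test_category` filter and the `https` scheme are hypothetical; the real endpoints and parameters are those listed in the interactive documentation at www.graindb.org/api/ui.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.graindb.org/api"  # scheme assumed


def build_url(resource: str, **filters) -> str:
    """Build a query URL for a (hypothetical) collection resource."""
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only the URL is built here; no request is sent.
url = build_url("equipment", test_category="Linear Shear")
print(url)
```

Calling `fetch_json(url)` would then return the decoded records, provided the endpoint exists and responds with JSON.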

Database Prototype Demo

Concluding Remarks

  • We haven’t always done a very good job of preserving our datasets for the future

    • but we have the tools to do so
  • We need to be able to find all datasets easily

    • A searchable database will make this a much easier task
  • A well defined set of metadata schemas will make recording experimental measurements for lots of different pieces of equipment more straightforward

  • Having detailed stored records with unique IDs (URLs) may make dissemination easier as well

    • These could easily be cited in a publication, much like a DOI

Thank You!

Any Questions?

Email: J.Morrissey@ed.ac.uk