ON-DEM WG2
School of Engineering, University of Edinburgh
Thursday, 5th of September, 2024
A little about me:
Photo by Kelly Sikkema on Unsplash
For the past two decades there has been significant discussion on the reproducibility crisis in science
Distinction between Reproducibility & Replicability
Reproducibility: Can you get the same answers as me when you analyze my data?
Replicability: Can you get the same answers when you do my experiment and collect your own data?
AI will not make this better!
Dependent on the quality of the data provided to it.
Managing research data is an often forgotten aspect of research. Traditionally datasets were quite small:
Significant changes in recent decades:
Ad-hoc management no longer sufficient
Create: Collect, Generate, Process, Clean
Use: Manipulate, Analyse, Visualise
Re-Use: By colleagues or starting point for new dataset
Data management best practices involve the entire data lifecycle from project start to end
Plan: Data Management Plan in grant submissions.
Create: Experiment, Simulation, Survey, Merge, etc…
Document: Describe the data collected in detail. Sooner rather than later!
Use: Analyse/Discover/Collaborate. Document the process.
Preserve: Store for future use. Version Control. Databases & Archives.
Share: Essential for translation of results into knowledge. Open Data? Repositories?
Re-Use: Collaborate / Derive / Develop. Teach. Policy.
FAIR is an acronym for Findable, Accessible, Interoperable, Reusable: the principles that should apply to scientific data management and stewardship.
Findable: The first part of making data reusable is to make the data findable. Detailed and accurate metadata is key
Accessible: Data could be openly available or it could require prior authentication and authorisation
Interoperable: Data needs to be able to be used in different programs or workflows
Reusable: Well defined data is essential as it makes it easier to understand and therefore use, combine and/or extend the dataset
The FAIR principles also emphasise machine-actionability
Machine readable and digitally accessible are not the same thing!
For example, traditional word processing documents and PDF files are easily read by humans but typically are difficult for machines to interpret.
A machine-readable format is a structured, standardised file format (not free English text) that can be read automatically by a web browser or computer system (e.g. XML, JSON).
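As a minimal sketch of what machine readability buys, the snippet below parses a small JSON record with Python's standard library. The record and its field names are hypothetical, chosen only for illustration; they are not taken from any real schema.

```python
import json

# Hypothetical metadata record; the field names are illustrative assumptions.
record_text = (
    '{"material": "glass beads", '
    '"particle_diameter_mm": 2.0, '
    '"test": "Jenike shear"}'
)

# Parsing yields a plain dictionary with typed values, no text scraping needed.
record = json.loads(record_text)

# Numeric fields arrive as numbers, so they can be used in calculations directly.
diameter_m = record["particle_diameter_mm"] / 1000.0

print(record["material"], diameter_m)
```

The same information in a PDF or word-processing document would first need to be located and extracted by hand or by fragile text matching.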
Simulations can be simplified into a set of components that need to be recorded
Each component may have several subcomponents such as material description and the interaction models
Each DEM code has many (different) supported interaction models
Some interaction models may exist in multiple DEM codes but may have very different implementations or be slightly different
Physical properties
Mechanical properties
Fabric properties
How do you track implementations of different models and why does it matter?
Any movie fans? You may be familiar with IMDb…
James Bond is an iconic, long-running series of films
Initial “alpha” version available at: on-dem-db.onrender.com
Tracking various aspects such as:
Need to make sure we are tracking different versions of contact models in codes
What do we do for tracking model parameters?
Minimum requirements
Interaction with WG3 - how to link the two?
{
"$id": "equipment/interaction_model.v1-0-0.json",
"$schema": "https://raw.githubusercontent.com/Vidminas/python-jsonschema-minmax/main/metaschema/minmax-metaschema.json",
"type": "object",
"properties": {
"test_category": { "const": "Linear Shear" },
"test_subcategory": { "const": "Jenike" },
"class": { "label": "Test Regime", "const": "Quasi-static" },
"rating": { "label": "Repeatability Rating", "const": 4 },
"geometric_properties": { "$ref": "#/$defs/geometric_properties" },
"measurement_parameters": { "$ref": "#/$defs/measurement_parameters" }
},
"required": ["geometric_properties", "measurement_parameters"]
}
Front-End
Back-End
A well-executed Research Data Management Plan enables data to be Accessible, Interoperable and Reusable.
Making datasets (for specific materials or test equipment) Findable is a challenge:
We can consider a dataset as two separate components:
GrainDB is an Experimental Measurements Database that collates and stores the relevant metadata for the material and test equipment that is missing from repositories, alongside a link to the full dataset which has been stored in an appropriate manner.
This becomes a searchable database of all available data, enabling the end-user to find detailed experimental datasets that match their needs
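The idea of storing searchable metadata alongside a link to the full dataset can be sketched as follows. This is an assumption-laden illustration, not the GrainDB implementation: the table name, columns, example rows and `example.org` URLs are all hypothetical, and SQLite's in-memory mode is used only so the sketch runs anywhere.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical table: searchable metadata plus a link to the archived dataset.
conn.execute(
    """
    CREATE TABLE measurement (
        id INTEGER PRIMARY KEY,
        material TEXT NOT NULL,
        test_category TEXT NOT NULL,
        test_subcategory TEXT,
        dataset_url TEXT NOT NULL
    )
    """
)

# Made-up example rows; the URLs are placeholders.
rows = [
    ("glass beads", "Linear Shear", "Jenike", "https://example.org/datasets/1"),
    ("sand", "Linear Shear", "Jenike", "https://example.org/datasets/2"),
    ("glass beads", "Compression", None, "https://example.org/datasets/3"),
]
conn.executemany(
    "INSERT INTO measurement "
    "(material, test_category, test_subcategory, dataset_url) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def find_datasets(connection, material, test_category):
    """Return dataset links whose metadata match the requested material and test."""
    cursor = connection.execute(
        "SELECT dataset_url FROM measurement "
        "WHERE material = ? AND test_category = ? ORDER BY id",
        (material, test_category),
    )
    return [row[0] for row in cursor.fetchall()]


print(find_datasets(conn, "glass beads", "Linear Shear"))
```

The full dataset stays in its repository; only the metadata needed for searching and the link are held in the database.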
Enhancing dissemination
Provide high-quality machine-readable datasets suitable for use in AI/ML
Interface guides people on what information should be stored in accordance with best experimental practice
Variety of test types in use to measure material properties
Experiment or Measurement?
In a measurement, one performs parameter inference
In an experiment, one performs hypothesis tests or model selection
Databases provide efficient and safe multi-user access to data
Accessible via a website: www.graindb.org
Need to be prescriptive
Define carefully what information is required for all aspects
There is currently no ontology or common descriptive language for defining granular materials and their measurement
Web interface to Relational Database
Database backend utilises PostgreSQL
Why?
All schemas are versioned so we can:
Schemas allow greater use of templating for dynamic page generation
Available at: git.ecdf.ed.ac.uk/jmorrise/tusail-experimental-database-schemas
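The templating idea can be sketched roughly as below: the `label`, `units`, `type` and `enum` entries of a schema are enough to derive form fields without hand-writing a page per test type. The `form_fields` helper and the widget names are hypothetical, and the schema fragment is a trimmed example in the style of the schemas in this deck.

```python
# Trimmed schema fragment in the style used in this deck.
schema = {
    "label": "Geometric Properties",
    "type": "object",
    "properties": {
        "cell_diameter": {"label": "Cell Diameter", "type": "number",
                          "minimum": 0, "units": ["mm"]},
        "cell_material": {"label": "Cell Material", "type": "string"},
        "stress_application": {"label": "Vertical Stress Application Method",
                               "enum": ["Constant Mass", "Servo-Controlled"]},
    },
}


def form_fields(schema):
    """Derive one form-field description per schema property (hypothetical helper)."""
    fields = []
    for name, rule in schema["properties"].items():
        label = rule.get("label", name)
        units = rule.get("units", [])
        if units:
            # Show the first listed unit next to the label.
            label = f"{label} ({units[0]})"
        if "enum" in rule:
            widget = "select"
        elif rule.get("type") == "number":
            widget = "number"
        else:
            widget = "text"
        fields.append({"name": name, "label": label, "widget": widget})
    return fields


fields = form_fields(schema)
for field in fields:
    print(field)
```

Adding a new piece of equipment then means adding a schema file rather than writing a new page by hand.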
{
"$id": "equipment/linear_shear_jenike.v1-0-0.json",
"$schema": "https://raw.githubusercontent.com/Vidminas/python-jsonschema-minmax/main/metaschema/minmax-metaschema.json",
"type": "object",
"properties": {
"test_category": { "const": "Linear Shear" },
"test_subcategory": { "const": "Jenike" },
"class": { "label": "Test Regime", "const": "Quasi-static" },
"rating": { "label": "Repeatability Rating", "const": 4 },
"geometric_properties": { "$ref": "#/$defs/geometric_properties" },
"measurement_parameters": { "$ref": "#/$defs/measurement_parameters" }
},
"required": ["geometric_properties", "measurement_parameters"],
"$defs": {
"geometric_properties": { "label": "Geometric Properties",
"type": "object",
"properties": {
"cell_diameter": { "label": "Cell Diameter", "type": "number", "minimum": 0, "units": ["mm"] },
"base_height": { "label": "Base Height", "type": "number", "minimum": 0, "units": [ "mm" ] },
"ring_height": { "label": "Ring Height", "type": "number", "minimum": 0, "units": [ "mm" ] },
"wall_thickness": { "label": "Wall Thickness", "type": "number", "minimum": 0, "units": [ "mm" ] },
"max_translation_speed": { "label": "Max. Translation Speed", "type": "number", "minimum": 0, "units": [ "mm/s" ] },
"cell_material": { "label": "Cell Material", "type": "string" },
"stress_application": { "label": "Vertical Stress Application Method", "enum": [ "Constant Mass", "Servo-Controlled" ] }
},
"additionalProperties": false
},
"measurement_parameters": { "label": "Measurement Properties",
"type": "object",
"properties": {
"force_accuracy": { "label": "Force Accuracy", "type": "number", "minimum": 0, "units": [ "Pa" ] },
"force_resolution": { "label": "Force Resolution", "type": "number", "minimum": 0, "units": [ "Pa" ] },
"displacement_accuracy": { "label": "Displacement Accuracy", "type": "number", "minimum": 0, "units": [ "mm" ] },
"displacement_resolution": { "label": "Displacement Resolution", "type": "number", "minimum": 0, "units": [ "mm" ] }
},
"additionalProperties": false
}
}
}
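To illustrate how such a schema constrains a submitted record, here is a deliberately small, standard-library-only checker. It is a sketch under stated assumptions: the `check` function is hypothetical, it handles only the `type`, `minimum`, `enum` and `additionalProperties` keywords, and in practice a full JSON Schema validator would be used instead. The schema text is a trimmed copy of the `geometric_properties` definition above.

```python
import json

# Trimmed copy of the "geometric_properties" definition from the schema above.
schema_text = """
{
  "label": "Geometric Properties",
  "type": "object",
  "properties": {
    "cell_diameter": { "label": "Cell Diameter", "type": "number", "minimum": 0, "units": ["mm"] },
    "cell_material": { "label": "Cell Material", "type": "string" },
    "stress_application": { "label": "Vertical Stress Application Method",
                            "enum": ["Constant Mass", "Servo-Controlled"] }
  },
  "additionalProperties": false
}
"""
schema = json.loads(schema_text)


def check(record, schema):
    """Return a list of error strings; an empty list means the record passed."""
    errors = []
    props = schema["properties"]
    # Reject keys the schema does not define when additionalProperties is false.
    if schema.get("additionalProperties") is False:
        for key in record:
            if key not in props:
                errors.append(f"unexpected property: {key}")
    for key, value in record.items():
        rule = props.get(key)
        if rule is None:
            continue
        if rule.get("type") == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{key}: expected a number")
                continue
            if "minimum" in rule and value < rule["minimum"]:
                errors.append(f"{key}: below minimum {rule['minimum']}")
        elif rule.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected a string")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: not one of {rule['enum']}")
    return errors


# Made-up records for illustration.
good = {"cell_diameter": 95.0, "cell_material": "Aluminium",
        "stress_application": "Constant Mass"}
bad = {"cell_diameter": -1, "colour": "red"}

print(check(good, schema))
print(check(bad, schema))
```

Being prescriptive in the schema means incomplete or malformed entries are caught at submission time rather than discovered later by someone trying to reuse the data.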
RESTful API implemented using Swagger and OpenAPI Specification
We haven’t always done a very good job of preserving our datasets for future use
We need to be able to find all datasets easily
A well defined set of metadata schemas will make recording experimental measurements for lots of different pieces of equipment more straightforward
Having detailed stored records with unique IDs (URLs) may make dissemination easier as well
These could easily be included in a publication, much like a DOI