Developing a standardised file format for large-scale DEM visualisation and analytics

A Challenge in Open Data

John P. Morrissey

School of Engineering, University of Edinburgh

Thursday, 16th of May, 2024

Introduction

A little about me:

  • Research Fellow at The University of Edinburgh
  • Member of the Granular & Geomechanical Processes Research Group
  • Focus on Granular Materials and Particulate Mechanics
  • Spend my time doing Experimental Characterisation, DEM simulations & Data Analysis


Granular Mechanics & Industrial Infrastructure Research Group

Reminder - Research Data

Reproducibility & Replicability Crisis in Science

For the past two decades there has been significant discussion on the reproducibility crisis in science

  • A pattern of scientists being unable to obtain the same results as previous researchers
  • Distinction between Reproducibility & Replicability

Reproducibility: Can you get the same answers that I did when you analyze my data?

Replicability: Can you get the same answers that I did when you do my experiment and collect your own data?

[Baker, 2015, Nature]

What is FAIR Data?

FAIR stands for Findable, Accessible, Interoperable, Reusable: the principles that should apply to scientific data management and stewardship.

  • Findable: The first step towards reusable data is making it findable. Detailed and accurate metadata is key

  • Accessible: Data could be openly available or it could require prior authentication and authorisation

  • Interoperable: Data needs to be able to be used in different programs or workflows

  • Reusable: Well defined data is essential as it makes it easier to understand and therefore use, combine and/or extend the dataset

“FAIR Principles”. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Dataset Interoperability

For visualisation purposes and sharing basic simulation data, .vtk/.vtu/.vtp files are very well supported

  • Many DEM solvers
  • Many visualisation tools like ParaView
# vtk DataFile Version 3.1
Lattice Boltzmann data
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 9 INT 
0 0 0 1 0 0 2 0 0 
0 1 0 1 1 0 2 1 0 
0 2 0 1 2 0 2 2 0 

CELLS 4 20
4 0 1 3 4 
4 1 2 4 5 
4 3 4 6 7 
4 4 5 7 8 

CELL_TYPES 4
8 8 8 8 

CELL_DATA 4
SCALARS Scal_1 DOUBLE
LOOKUP_TABLE default
1 2 1 0 

SCALARS Scal_2 DOUBLE
LOOKUP_TABLE default
1 3 2 1

For Data Analytics?

  • Initial suggestions from VELaSSCo
    • Similar to LAMMPS dump files and MercuryCG files
  • Provided minimum requirements for stress/fabric calculations
<VTKFile type="PolyData" ...>
  <PolyData>
    <Piece NumberOfPoints="#" NumberOfVerts="#" NumberOfLines="#"
    NumberOfStrips="#" NumberOfPolys="#">
    <PointData>...</PointData>
    <CellData>...</CellData>
    <Points>...</Points>
    <Verts>...</Verts>
    <Lines>...</Lines>
    <Strips>...</Strips>
    <Polys>...</Polys>
    </Piece>
  </PolyData>
</VTKFile>
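
A minimal Python sketch of how a particle snapshot could be written into the PolyData skeleton above, using only the standard library. The function names (`build_vtp`, `_data_array`) and the `radius` array are illustrative assumptions, not part of any solver or of VTK itself. Each particle centre is stored as one vertex cell, and the unused Lines, Strips and Polys sections are written with empty arrays.

```python
import xml.etree.ElementTree as ET


def _data_array(parent, name, dtype, values, components=1):
    """Append an ASCII <DataArray> holding a flat sequence of numbers."""
    el = ET.SubElement(
        parent, "DataArray",
        Name=name, type=dtype, format="ascii",
        NumberOfComponents=str(components),
    )
    el.text = " ".join(str(v) for v in values)
    return el


def build_vtp(points, radii):
    """Build a PolyData tree with one vertex cell per particle centre."""
    n = len(points)
    root = ET.Element("VTKFile", type="PolyData", version="1.0",
                      byte_order="LittleEndian")
    poly = ET.SubElement(root, "PolyData")
    piece = ET.SubElement(
        poly, "Piece",
        NumberOfPoints=str(n), NumberOfVerts=str(n), NumberOfLines="0",
        NumberOfStrips="0", NumberOfPolys="0",
    )
    # Per-particle data: "radius" is a hypothetical example array.
    point_data = ET.SubElement(piece, "PointData", Scalars="radius")
    _data_array(point_data, "radius", "Float64", radii)
    ET.SubElement(piece, "CellData")
    # Particle centres, flattened to x0 y0 z0 x1 y1 z1 ...
    pts = ET.SubElement(piece, "Points")
    flat = [c for p in points for c in p]
    _data_array(pts, "Points", "Float64", flat, components=3)
    # One vertex per point: cell i uses point i and ends at offset i + 1.
    verts = ET.SubElement(piece, "Verts")
    _data_array(verts, "connectivity", "Int32", range(n))
    _data_array(verts, "offsets", "Int32", range(1, n + 1))
    # Unused cell sections are kept but left empty.
    for tag in ("Lines", "Strips", "Polys"):
        section = ET.SubElement(piece, tag)
        _data_array(section, "connectivity", "Int32", [])
        _data_array(section, "offsets", "Int32", [])
    return root


root = build_vtp([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [0.5, 0.25])
vtp_text = ET.tostring(root, encoding="unicode")
```

Writing `vtp_text` to a file with a `.vtp` extension gives a file in the layout sketched above. It has not been checked against any particular viewer here.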

VTK XML Format

<?xml version="1.0"?>
<VTKFile type="PolyData" version="1.0" byte_order="LittleEndian" header_type="UInt64">
  <PolyData>
    <Piece NumberOfPoints="20" NumberOfVerts="20" NumberOfLines="0" NumberOfStrips="0" NumberOfPolys="0">
      <PointData Scalars="1_temp" Vectors="3_force">
        <DataArray Name="1_temp" NumberOfComponents="1" type="Float64" format="ascii">
          0.8096282113891782 0.3047311952960293 0.6500663143518912 0.8456637563467384 0.3906199913247039 0.0493153326814327
          0.0953282361963043 0.4089217432664466 0.9734786110287174 0.7813877948631190 0.0869568546620415 0.7417237568666443
          0.5630292772867312 0.8719294300090183 0.8044927228429963 0.3760636376786568 0.1306022635272779 0.2908653129507012
          0.3505188438702752 0.2901130193235927
        </DataArray>
        <DataArray Name="2_pressure" NumberOfComponents="1" type="Float64" format="ascii">
          0.5017704290104593 0.3785371246509415 0.0205158449338875 0.9907952905474608 0.1545654933391684 0.8782385634278128
          0.6019489853133294 0.7906981230999223 0.2521929126363432 0.9716150255608206 0.0672474944937144 0.4673686197663716
          0.0337000095394104 0.5347613958477050 0.1164155276297819 0.9550270647902264 0.5796338522273943 0.3412746271057875
          0.8523204100536866 0.1972657806074329
        </DataArray>
        <DataArray Name="3_force" NumberOfComponents="3" type="Float64" format="ascii">
          0.0309346793427824 0.4092653096970628 0.9788367317195323 0.4223113198793231 0.0900433594418957 0.1787326345033293
          0.2966289948749271 0.6954539758304935 0.2832782062952210 0.3251789785540518 0.7688481562679232 0.4766407185048503
          0.9288534597190189 0.3341192036963977 0.9436476306446026 0.8789762057451738 0.0606108299327002 0.9254951873329387
          0.7412746153715273 0.8109872417934331 0.4360667170346966 0.7233454095760591 0.6976210279118584 0.2955977110451682
          0.0994604820813603 0.1875580422474142 0.8431228106210912 0.6213582623034001 0.6158399792301240 0.9406265592401716
          0.8845641277007388 0.9350856253927768 0.2970236729050861 0.8712474915967705 0.6910999373713885 0.3187032732785566
          0.0127516549008907 0.8755324934933010 0.5639212047451645 0.1967190822555136 0.9738387864540907 0.6568564085127088
          0.0096719914074663 0.4238716923324118 0.3943384132270378 0.5395962035976353 0.3564427828885958 0.2408352500616302
          0.7594055518045913 0.7632195886304822 0.4137798466081193 0.4414728807283862 0.5239588201936661 0.5330972627254891
          0.9141526095541878 0.6308069665444429 0.5170567967508454 0.1320175579472910 0.2107269322401846 0.5006082952330471
        </DataArray>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray Name="Points" NumberOfComponents="3" type="Float64" format="ascii">
          0.4679781489940003 0.0539201795235295 0.8785521137838496 0.6662011241227854 0.6798039006229138 0.5179184112127887
          0.7253673934646818 0.5500273845649549 0.9614782388606561 0.4565839967938599 0.5347282488901165 0.3175208696688832
          0.1659279070777949 0.0767387959844464 0.7871958705242772 0.7152392271177217 0.6711429183806140 0.1804037800561749
          0.0094476341085491 0.2176115805917300 0.6161777555253786 0.6766546248504622 0.4753594409825290 0.0787254199622341
          0.2116737146235712 0.9337516804459353 0.9051117007473103 0.5781205664492428 0.1021111481025472 0.0771094947349589
          0.3606986784532169 0.2044204606938260 0.9588362141721601 0.5373476404904579 0.0037939495868201 0.5440923148342509
          0.0312391309332078 0.3297690748477242 0.8634945621439319 0.0935309527775745 0.9241324203495521 0.2405794370719853
          0.5654258596612191 0.7982712785837751 0.4059594735292160 0.4933549421178993 0.0290714379783759 0.2925235663071808
          0.4346748309322130 0.2692194427848057 0.7309894801048282 0.3220642954109557 0.9695141104130361 0.9700483721187488
          0.0062263950599849 0.3090008720062422 0.8318148406832973 0.0737151178494130 0.8033901608302160 0.4849158872100066
        </DataArray>
      </Points>
      <Verts>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Int32" format="ascii">
          0 1 2 3 4 5
          6 7 8 9 10 11
          12 13 14 15 16 17
          18 19
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Int32" format="ascii">
          1 2 3 4 5 6
          7 8 9 10 11 12
          13 14 15 16 17 18
          19 20
        </DataArray>
      </Verts>
      <Lines>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Float64" format="ascii">
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Float64" format="ascii">
        </DataArray>
      </Lines>
      <Strips>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Float64" format="ascii">
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Float64" format="ascii">
        </DataArray>
      </Strips>
      <Polys>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Float64" format="ascii">
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Float64" format="ascii">
        </DataArray>
      </Polys>
    </Piece>
  </PolyData>
</VTKFile>
<?xml version="1.0"?>
<VTKFile type="PolyData" version="1.0" byte_order="LittleEndian" header_type="UInt64">
  <PolyData>
    <Piece NumberOfPoints="20" NumberOfVerts="20" NumberOfLines="0" NumberOfStrips="0" NumberOfPolys="0">
      <PointData Scalars="1_temp" Vectors="3_force">
        <DataArray Name="1_temp" NumberOfComponents="1" type="Float64" format="binary">
          oAAAAAAAAABgvDpseejpPw6Ed0W3gNM/kuwL31fN5D8Cfx1wrQ/rP2zR+f3q/9g/EMEd+ts/qT+oNtlobme4P/r4fBrGK9o/vym3nbwm7z/pxw36IAHpPwi+7+7NQrY/XxLNdTO85z+FYpT5VQTiP4DgSYzY5us/lGHPhWe+6T8g20I4bRHYP8ANUTGTt8A/EoCqi4md0j/yfcOW5m7WP5bfiDI2kdI/
        </DataArray>
        <DataArray Name="2_pressure" NumberOfComponents="1" type="Float64" format="binary">
          oAAAAAAAAADdZNbbgA7gP5ymrMbzOdg/oNUpDBsClT9G1D1TmLTvP7ysfVXNyMM/c0WAwoca7D8N/7iEKkPjP8EgdyZmTek/zpYDvu0j0D/tv+JkeBfvPziklr0hN7E/uK9EEl7p3T/gfLGtIEGhP9WISu7DHOE/mJ/qc2jNvT9kNELrlI/uP7QT30pcjOI/1u6XiHHX1T/HmtxzNUbrP0gXKk4BQMk/
        </DataArray>
        <DataArray Name="3_force" NumberOfComponents="3" type="Float64" format="binary">
          4AEAAAAAAAAgpVswV62fPwpRIiBnMdo/QHzbaKFS7z+4L+cOJgfbP9BgBuQUDbc/aLn1Abbgxj9aHwIu+PvSP6kVQrIoQeY/8Eft6Toh0j8oII99u8/UPxBX2HJnmug/xuV6EkiB3j/joAjkKrntP57cNYM1YtU/ZyAShFwy7j/yZzS1kiDsP6BQsPhhCK8/ZGlGFaid7T+kBcyKhbjnP7c7H4Sb8+k/XHAiYITo2z+l6LpFpSXnP0hEfFXpUuY/fIhtqRLr0j+Aqcj9PXa5P5T3xuTmAcg/TyJEsNz66j/nKPa4KuLjP0GbSwv2tOM/+e+13pwZ7j/mD/ZtWU7sP6KxgLA47O0/KPhQlG8C0z9OKmRrQuHrP23jqJ19HeY/5AL3aaJl1D9AVCcmih2KPy9zRLhcBOw/SNh8e6QL4j8ormREFy7JP1axbPWvKe8/+1wXu/cE5T8AEkK26M6DP/4kEby2INs/YHcXL9c82T9Z7O9BX0ThPzq22WP1z9Y/MI1egbDTzj+H2SzfDE3oPy60mnxLbOg/zD87d1572j8uKDN4F0HcP1/XpUlFxOA/cMef/SEP4T9N0zL5vEDtPxS3bBeSL+Q/UwUHsrqL4D+00fCK8+XAP/jSLaEZ+co/ZDsEsPsE4D8=
        </DataArray>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray Name="Points" NumberOfComponents="3" type="Float64" format="binary">
          4AEAAAAAAACI/0qfWvPdP7DHS/9sm6s/1RCRUhkd7D88ThUFhVHlP/nLGxz0wOU/dffEocmS4D+SfxCuNTbnP3eMkwfTmeE/Nxz3Am7E7j980oYVrDjdP21fp2p+HOE/CJ/BDUNS1D8cSTIrID3FP9htFlsnpbM/MFbuZLUw6T+KLilgPePmPyFgrLYAeuU/EEG1l3gXxz9AlSr8R1mDPyCp7z6y2ss/RyKQabq34z+xb42ZJ6flP/RYNwFKbN4/eAYaYFkntD/gk93QHxjLP/E1QzRL4e0/7fg90Kz27D98bMGz9n/iP9BOqMn0I7o/AAUZpnK9sz/or+rorxXXPyAYqBxzKso/S+DCSMmu7j8zqM+t8zHhPwDzvCx9FG8/NvpGSTRp4T9At9qWJv2fP3AA7b/vGtU/K9MVWb+h6z9A6Vf/pPG3P4tjUid+ku0/5Dkol07Lzj8J2PH49xfiPzZPWzVwi+k/kMyTcT372T+w2G2bIJPfP4CSdS3nxJ0/8E+mw7S40j9A4s9httHbPz5RjS/kOtE/IqbYDERk5z8m9f+Ps5zUPzeBp3RCBu8/0zc54qIK7z+ASXky2YB5P1ry7JerxtM/pP0hKDqe6j/ITmJ0/t6yP+c0Vkhftek/7C04pdwI3z8=
        </DataArray>
      </Points>
      <Verts>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Int32" format="binary">
          UAAAAAAAAAAAAAAAAQAAAAIAAAADAAAABAAAAAUAAAAGAAAABwAAAAgAAAAJAAAACgAAAAsAAAAMAAAADQAAAA4AAAAPAAAAEAAAABEAAAASAAAAEwAAAA==
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Int32" format="binary">
          UAAAAAAAAAABAAAAAgAAAAMAAAAEAAAABQAAAAYAAAAHAAAACAAAAAkAAAAKAAAACwAAAAwAAAANAAAADgAAAA8AAAAQAAAAEQAAABIAAAATAAAAFAAAAA==
        </DataArray>
      </Verts>
      <Lines>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Float64" format="binary">
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Float64" format="binary">
        </DataArray>
      </Lines>
      <Strips>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Float64" format="binary">
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Float64" format="binary">
        </DataArray>
      </Strips>
      <Polys>
        <DataArray Name="connectivity" NumberOfComponents="1" type="Float64" format="binary">
        </DataArray>
        <DataArray Name="offsets" NumberOfComponents="1" type="Float64" format="binary">
        </DataArray>
      </Polys>
    </Piece>
  </PolyData>
</VTKFile>
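
The binary arrays above appear to be a single base64 stream per DataArray: a UInt64 byte count (matching header_type="UInt64") followed directly by the raw little-endian values. For instance, the leading characters of the 1_temp array decode to 160 bytes, which is 20 Float64 values. The Python sketch below encodes and decodes arrays under that assumption. The names `encode_inline` and `decode_inline` are illustrative, and compressed or appended data is not handled.

```python
import base64
import struct


def encode_inline(values, fmt="d"):
    """Pack values little-endian behind a UInt64 byte count, then base64 it."""
    payload = struct.pack(f"<{len(values)}{fmt}", *values)
    header = struct.pack("<Q", len(payload))
    return base64.b64encode(header + payload).decode("ascii")


def decode_inline(text, fmt="d"):
    """Decode one inline DataArray body: base64 -> byte count -> numbers.

    fmt is a struct code: "d" for Float64, "i" for Int32.
    """
    raw = base64.b64decode(text.strip())
    (nbytes,) = struct.unpack_from("<Q", raw, 0)  # first 8 bytes: payload size
    payload = raw[8:8 + nbytes]
    count = nbytes // struct.calcsize("<" + fmt)
    return list(struct.unpack(f"<{count}{fmt}", payload))


# Round trip for 20 vertex indices stored as Int32 (0, 1, ..., 19).
encoded = encode_inline(range(20), "i")
decoded = decode_inline(encoded, "i")
```

The round trip costs 120 base64 characters for 20 integers, which is compact but no longer human-readable. That trade-off is the main difference between the ascii and binary variants shown above.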

VELaSSCo

What is VELaSSCo …

VELaSSCo was an EU-funded project (2014–2016) on end-user visualisation of “Big Data” for the petabyte era

  • Aimed to provide new visualisation methods for large-scale simulations across disciplines (FEM, CFD, DEM, etc.)

  • Developed the VELaSSCo platform for accessing, visualising, and querying distributed simulation information stored across multiple servers

VELaSSCo Wishlist

Data handling

  • Fast handling of “big” data (particles & timesteps)
  • Reduce storage requirements by optimising data format
  • Standard format to share simulation data
  • Efficient communication between simulation solver and post-processing

Visualisation

  • (Near) real-time visualisation of results
  • Visualisation and tracking of complex particle shapes
  • Visualisation of large data sets over many time steps
  • Visual comparison of results from different datasets

And WHY is it still Relevant?

Solver agnostic analytics and visualisation platform

  • Requires all data to be provided in a common format

  • Sent (streamed) to visualisation client (GiD, iFX) when required

  • Adopts ISO 10303-209, Multidisciplinary analysis and design (AP209) Standards

    • ISO has standardised an object-oriented schema for the storage and exchange of FEM and CFD data
    • VELaSSCo extends this for DEM Data

VELaSSCo Proposed File Format

A machine-readable ASCII format that stores particle and contact data in separate data files

  • Provides all the information required for advanced analytics such as Discrete2Continuum transformations
  • Developed to support analytics for both spherical and non-spherical particles

Particle Data (.p4p)

TIMESTEP PARTICLES
0.02 11
ID GROUP TYPE VOLUME MASS PX PY PZ VX VY VZ AngVel_X AngVel_Y AngVel_Z 
1 1 1 4.18879e-6 0.010472 0.015492 0.016146 0.0008229 0 0 0.19618 0 0 0 
5 2 1 4.18879e-6 0.010472 0.016643 0.019136 0.0092912 0 0 0.19618 0 0 0 
.......

Note: Angular velocity is shown here as an optional value
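
A minimal Python sketch of a reader for one block in this layout: a label line, a line with the time and record count, a column header, then one row per record. The function name `parse_p4_block` and the `INTEGER_COLUMNS` set are illustrative assumptions, and the sample holds only the two rows visible on the slide (the slide itself declares 11 particles and elides the rest).

```python
# Columns treated as integers; everything else is read as float (assumption).
INTEGER_COLUMNS = {"ID", "GROUP", "TYPE", "P1", "P2", "WALL"}


def parse_p4_block(text):
    """Parse one TIMESTEP block into (time, list of row dictionaries)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    time, count = lines[1]          # e.g. "0.02 11"
    columns = lines[2]              # e.g. ID GROUP TYPE ...
    rows = []
    for fields in lines[3:3 + int(count)]:
        row = {}
        for name, value in zip(columns, fields):
            row[name] = int(value) if name in INTEGER_COLUMNS else float(value)
        rows.append(row)
    return float(time), rows


sample = """TIMESTEP PARTICLES
0.02 2
ID GROUP TYPE VOLUME MASS PX PY PZ VX VY VZ AngVel_X AngVel_Y AngVel_Z
1 1 1 4.18879e-6 0.010472 0.015492 0.016146 0.0008229 0 0 0.19618 0 0 0
5 2 1 4.18879e-6 0.010472 0.016643 0.019136 0.0092912 0 0 0.19618 0 0 0
"""

time, particles = parse_p4_block(sample)
```

Because the column names are read from the header line, the same sketch should also cope with blocks that omit the optional angular velocity columns.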

Particle-Particle Contact Data (.p4c)

TIMESTEP CONTACTS
0.02 6
P1 P2 CX CY CZ FX FY FZ
11 1 0.004 -0.0055 0.0005 0.727312 -0.098406 2.70531
10 7 0.009 -0.0055 0.0005 -0.00396415 0.235619 0.199911
.......

Note: FX, FY, FZ are total contact forces

Particle-Geometry Contact Data (.p4w)

TIMESTEP CONTACTS
0.02 4
P1 WALL CX CY CZ FX FY FZ
10 1 -0.198716 -0.0265078 0.087761 -0 -0 0.00993776
11 1 -0.0178762 0.245043 3.74038 -0 -0 0.00993035
.......

Note: FX, FY, FZ are total contact forces
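
As an illustration of the analytics these files are meant to support, the Python sketch below computes a second-order fabric tensor, F_ij = (1/Nc) Σ n_i n_j, from particle centres and a particle-particle contact list. It assumes spherical particles, so that the contact normal n lies along the line joining the two centres. The function name and the toy data are hypothetical.

```python
import math


def fabric_tensor(positions, contacts):
    """Average of n (outer) n over all contacts, n being the unit branch vector.

    positions: dict mapping particle ID -> (x, y, z) centre
    contacts:  list of (P1, P2) particle ID pairs
    """
    fabric = [[0.0] * 3 for _ in range(3)]
    for p1, p2 in contacts:
        branch = [b - a for a, b in zip(positions[p1], positions[p2])]
        length = math.sqrt(sum(c * c for c in branch))
        n = [c / length for c in branch]
        for i in range(3):
            for j in range(3):
                fabric[i][j] += n[i] * n[j] / len(contacts)
    return fabric


# Toy data: particle 1 touches particle 2 along x and particle 3 along y.
positions = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (0.0, 1.0, 0.0)}
contacts = [(1, 2), (1, 3)]
fabric = fabric_tensor(positions, contacts)
```

The trace of the result is 1 by construction, and the diagonal entries show how contacts are distributed between the coordinate directions.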

Limitations of file format

  • Duplication of data
    • Mass, Volume, Type for particles of the same type
  • Missing information
    • Shape, Orientation, etc.
    • Normal and tangential forces need to be calculated during analysis
  • Multiple files for each timestep
    • This can be an advantage at times but leaves many files to manage
  • ASCII format
    • Not ideal for large datasets
    • Parsing can be difficult/slow (string conversion, delimiter problems, line endings, etc.)
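
Since only total contact forces are stored, the normal and tangential parts have to be recovered during analysis. A minimal Python sketch of that step is given below, assuming spherical particles so that the contact normal follows the line between the two centres. The function name `split_force` is hypothetical.

```python
import math


def split_force(centre_1, centre_2, force):
    """Split a total contact force into normal and tangential vectors."""
    branch = [b - a for a, b in zip(centre_1, centre_2)]
    length = math.sqrt(sum(c * c for c in branch))
    n = [c / length for c in branch]                 # unit contact normal
    fn = sum(f * c for f, c in zip(force, n))        # signed normal magnitude
    normal = [fn * c for c in n]
    tangential = [f - c for f, c in zip(force, normal)]
    return normal, tangential


# Toy contact: centres aligned with x, force with x and y components.
normal, tangential = split_force((0.0, 0.0, 0.0), (2.0, 0.0, 0.0),
                                 (3.0, 4.0, 0.0))
```

The split does not depend on which way the normal points, so the ordering of P1 and P2 in the contact file does not affect the result. For non-spherical particles the normal cannot be inferred from the centres alone, which is one reason shape and orientation are listed as missing information.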

What needs to be included?

Data & Metadata

HDF5

Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data

  • Uses a “file directory” like structure that allows you to organize data within the file
  • Different data types can be supported in single file
  • Self-describing: metadata can be attached directly to the data it describes
    • facilitates automation without the need for separate (and additional) metadata documents
  • Supports partial reading where only required data is loaded from file
  • A high-level API with C, C++, Fortran 90, and Java interfaces

www.neonscience.org/resources/learning-hub/tutorials/about-hdf5
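
A minimal Python sketch of the features listed above, using h5py (a third-party Python binding, not one of the interfaces named on the slide) together with NumPy. The group layout (`timesteps/000000/particles`), the attribute names and the solver string are illustrative assumptions, not a proposed standard.

```python
import os
import tempfile

import h5py
import numpy as np


def write_snapshot(path, time, ids, positions):
    """Store one timestep of particle data with metadata as attributes."""
    with h5py.File(path, "w") as f:
        f.attrs["solver"] = "example-solver"       # hypothetical metadata
        step = f.create_group("timesteps/000000")  # directory-like hierarchy
        step.attrs["time"] = time
        particles = step.create_group("particles")
        particles.create_dataset("id", data=ids)   # integer dataset
        pos = particles.create_dataset("position", data=positions,
                                       compression="gzip")  # float dataset
        pos.attrs["units"] = "m"                   # self-describing data


def read_positions(path, start, stop):
    """Partial read: only rows start..stop-1 are loaded from the file."""
    with h5py.File(path, "r") as f:
        dataset = f["timesteps/000000/particles/position"]
        return dataset[start:stop], dataset.attrs["units"]


ids = np.array([1, 5], dtype=np.int64)
positions = np.array([[0.015492, 0.016146, 0.0008229],
                      [0.016643, 0.019136, 0.0092912]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.h5")
    write_snapshot(path, 0.02, ids, positions)
    first_row, units = read_positions(path, 0, 1)
```

Keeping mass, volume and type in a separate per-type group rather than per particle would be one way to avoid the duplication noted for the ASCII format, though that choice is left open here.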

Example Simulation Data

Thank You!

Any Questions?

Email: J.Morrissey@ed.ac.uk

TUSAIL Zenodo Community

TUSAIL Community on Zenodo

  • Outputs from the TUSAIL project
  • Will accept deposits from the wider community who wish to share particle/granular data in a common forum