General HDF5 file reader

Fleur uses the HDF5 library for output files containing large datasets. The masci-tools library provides the reader.HDF5Reader class to extract and transform information from these files. Under the hood, the h5py library is used to read the .hdf files.

Basic Usage

The specification of what to extract and how to transform the data is given in the form of a python dictionary. Let us look at a usage example: extracting data for a bandstructure calculation from the banddos.hdf file produced by Fleur.

from masci_tools.io.parsers.hdf5 import HDF5Reader
from masci_tools.io.parsers.hdf5.recipes import FleurBands

# The HDF5Reader is used as a context manager to safely handle
# opening/closing the h5py.File object used to extract the information
with HDF5Reader('/path/to/banddos.hdf') as h5reader:
    datasets, attributes = h5reader.read(recipe=FleurBands)

The method reader.HDF5Reader.read() produces two python dictionaries. In the case of the FleurBands recipe these contain the following information (a short access sketch follows the list).

  • datasets
    • Eigenvalues converted to eV and shifted to E_F = 0 (if the Fermi energy is available in the banddos.hdf), split up into spin-up/down and flattened to one dimension

    • The kpath projected to 1D and reshaped to the same length as the weights/eigenvalues

    • The flattened weights of the interstitial region, each atom and each orbital on each atom for all eigenvalues

  • attributes
    • The coordinates of the used kpoints

    • Positions, atomic symbols and indices of symmetry-equivalent atoms

    • Dimensions of eigenvalues (nkpts and nbands)

    • Bravais matrix/Reciprocal cell of the system

    • Indices and labels of special k-points

    • Fermi energy

    • Number of spins in the calculation
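
A minimal sketch of how these outputs can be accessed (the key names eigenvalues_up and kpath are inferred from the split_array suffixes in the FleurBands recipe shown further below and should be verified for your version of masci-tools):

from masci_tools.io.parsers.hdf5 import HDF5Reader
from masci_tools.io.parsers.hdf5.recipes import FleurBands

with HDF5Reader('/path/to/banddos.hdf') as h5reader:
    datasets, attributes = h5reader.read(recipe=FleurBands)

# Scalar metadata is found in the attributes dictionary
print(attributes['fermi_energy'], attributes['nbands'])

# Flattened arrays of equal length, e.g. for a scatter plot of the bands
kpath = datasets['kpath']
bands_up = datasets['eigenvalues_up']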

The following pre-defined recipes are stored in masci_tools.io.parsers.hdf5.recipes:

  • Recipe for banddos.hdf for bandstructure calculations

  • Recipe for banddos.hdf for standard density of states calculations

  • Different DOS modes are also supported (jDOS, orbcomp, mcd)
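
The recipes can be imported directly from this module; for example, FleurBands (used above) and FleurDOS cover the bandstructure and standard DOS cases (the names of the recipes for the other DOS modes can be looked up in the module):

from masci_tools.io.parsers.hdf5.recipes import FleurBands, FleurDOS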

If no recipe is provided to the reader.HDF5Reader, it will create the datasets and attributes as two nested dictionaries, exactly mirroring the structure of the .hdf file and converting datasets into numpy arrays.
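
A minimal sketch of this recipe-less mode:

from masci_tools.io.parsers.hdf5 import HDF5Reader

# Without a recipe the complete file is read; the resulting nested
# dictionaries mirror the group structure of the .hdf file
with HDF5Reader('/path/to/banddos.hdf') as h5reader:
    datasets, attributes = h5reader.read()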

For big datasets it might be useful to keep the dataset as a reference to the file and not load it into memory. To achieve this you can pass move_to_memory=False when initializing the reader. Note that most transformations will still implicitly create numpy arrays, and after the hdf file is closed the datasets will no longer be available.
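
Continuing the bandstructure example from above, this looks like:

with HDF5Reader('/path/to/banddos.hdf', move_to_memory=False) as h5reader:
    datasets, attributes = h5reader.read(recipe=FleurBands)
    # Work with the data here; some datasets may still be
    # references into the open file
# After the context manager exits the file is closed and any
# dataset not converted to a numpy array is no longer accessible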

Structure of recipes for the reader.HDF5Reader

The recipe for extracting bandstructure information from the banddos.hdf looks like this:

def bands_recipe_format(group: Literal['Local', 'jDOS', 'Orbcomp', 'MCD'], simple: bool = False) -> HDF5Recipe:
    """
    Format for bandstructure calculations retrieving weights from the given group

    :param group: str of the group the weights should be taken from
    :param simple: bool, if True no additional weights are retrieved with the produced recipe

    :returns: dict of the recipe to retrieve a bandstructure calculation
    """

    if group == 'Local':
        atom_prefix = 'MT:'
        weight_atom_terminator = '[spdf]'
    elif group == 'jDOS':
        atom_prefix = 'jDOS:'
        weight_atom_terminator = '[spdf]'
    elif group == 'Orbcomp':
        atom_prefix = 'ORB:'
        weight_atom_terminator = ','
    elif group == 'MCD':
        atom_prefix = 'At'
        weight_atom_terminator = 'NC'
    else:
        raise ValueError(f'Unknown group: {group}')

    recipe = HDF5Recipe({
        'datasets': {
            'eigenvalues': {
                'h5path': f'/{group}/BS/eigenvalues',
                'transforms': [
                    AttribTransformation(name='shift_by_attribute',
                                         attrib_name='fermi_energy',
                                         kwargs={'negative': True}),
                    Transformation(name='multiply_scalar', args=(HTR_TO_EV,)),
                    Transformation(name='split_array',
                                   kwargs={'suffixes': ['up', 'down'],
                                           'name': 'eigenvalues'}),
                    Transformation(name='flatten_array')
                ],
                'unpack_dict': True
            },
            'kpath': {
                'h5path': '/kpts/coordinates',
                'transforms': [
                    AttribTransformation(name='multiply_by_attribute',
                                         attrib_name='reciprocal_cell',
                                         kwargs={'transpose': True}),
                    Transformation(name='calculate_norm', kwargs={'between_neighbours': True}),
                    Transformation(name='cumulative_sum'),
                    AttribTransformation(name='repeat_array_by_attribute', attrib_name='nbands'),
                ]
            },
        },
        'attributes': {
            'group_name': {
                'h5path': f'/{group}',
                'transforms': [
                    Transformation(name='get_name'),
                ],
            },
            'kpoints': {
                'h5path': '/kpts/coordinates',
            },
            'nkpts': {
                'h5path': '/Local/BS/eigenvalues',
                'transforms': [Transformation(name='get_shape'),
                               Transformation(name='index_dataset', args=(1,))]
            },
            'nbands': {
                'h5path': '/Local/BS/eigenvalues',
                'transforms': [Transformation(name='get_shape'),
                               Transformation(name='index_dataset', args=(2,))]
            },
            'atoms_elements': {
                'h5path': '/atoms/atomicNumbers',
                'description': 'Atomic numbers',
                'transforms': [Transformation(name='periodic_elements')]
            },
            'n_types': {
                'h5path': '/atoms',
                'description': 'Number of atom types',
                'transforms': [Transformation(name='get_attribute', args=('nTypes',)),
                               Transformation(name='get_first_element')]
            },
            'atoms_position': {
                'h5path': '/atoms/positions',
                'description': 'Atom coordinates per atom',
            },
            'atoms_groups': {
                'h5path': '/atoms/equivAtomsGroup'
            },
            'reciprocal_cell': {
                'h5path': '/cell/reciprocalCell'
            },
            'bravais_matrix': {
                'h5path': '/cell/bravaisMatrix',
                'description': 'Coordinate transformation internal to physical for atoms',
                'transforms': [Transformation(name='multiply_scalar', args=(BOHR_A,))]
            },
            'special_kpoint_indices': {
                'h5path': '/kpts/specialPointIndices',
                'transforms': [Transformation(name='shift_dataset', args=(-1,))]
            },
            'special_kpoint_labels': {
                'h5path': '/kpts/specialPointLabels',
                'transforms': [Transformation(name='convert_to_str')]
            },
            'fermi_energy': {
                'h5path': '/general',
                'description': 'fermi_energy of the system',
                'transforms': [Transformation(name='get_attribute', args=('lastFermiEnergy',)),
                               Transformation(name='get_first_element')]
            },
            'spins': {
                'h5path': '/general',
                'description': 'number of distinct spin directions in the system',
                'transforms': [Transformation(name='get_attribute', args=('spins',)),
                               Transformation(name='get_first_element')]
            }
        }
    })

    if simple:
        return recipe

    recipe['datasets']['weights'] = {
        'h5path': f'/{group}/BS',
        'transforms': [
            Transformation(name='get_all_child_datasets', kwargs={'ignore': ['eigenvalues', 'kpts']}),
            AttribTransformation(name='add_partial_sums',
                                 attrib_name='atoms_groups',
                                 args=(f'{atom_prefix}{{}}{weight_atom_terminator}'.format,),
                                 kwargs={'make_set': True,
                                         'replace_format': f'{atom_prefix}{{}}'.format}),
            Transformation(name='split_array', kwargs={'suffixes': ['up', 'down']}),
            Transformation(name='flatten_array')
        ],
        'unpack_dict': True
    }

    return recipe

Each recipe can define the datasets and attributes sections (if one is not defined, an empty dict is returned in its place). Each entry in these sections has the same structure.

# Example entry from the FleurBands recipe

'fermi_energy': {
    'h5path': '/general',
    'description': 'fermi_energy of the system',
    'transforms': [
        Transformation(name='get_attribute', args=('lastFermiEnergy',), kwargs={}),
        Transformation(name='get_first_element', args=(), kwargs={})
    ]
}

All entries must define the key h5path. This gives the path to the initial dataset inside the given .hdf file. The key of the entry corresponds to the key under which the result is stored in the output dictionary.

If the dataset should be transformed in some way after reading it, there are a number of pre-defined transformations in masci_tools.io.parsers.hdf5.transforms. These are added to an entry as a list of namedtuples (reader.Transformation for general transformations; reader.AttribTransformation for transformations using an attribute) under the key transforms. General transformations can be used in all entries, while transformations using an attribute value can only be used in the datasets entries. Each namedtuple takes the name of the transformation function together with the positional (args) and keyword (kwargs) arguments for the transformation. Attribute transformations additionally take attrib_name, the name of the attribute whose value should be passed to the transformation.
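
Putting this together, a minimal custom recipe could look like the following sketch. The h5path values are hypothetical examples, the conversion factor is illustrative, and the import location of the namedtuples should be checked against your version of masci-tools:

from masci_tools.io.parsers.hdf5 import HDF5Reader
from masci_tools.io.parsers.hdf5.reader import Transformation

# Hypothetical minimal recipe; the h5path values are examples only
my_recipe = {
    'datasets': {
        'eigenvalues': {
            'h5path': '/Local/BS/eigenvalues',
            # Example conversion from Hartree to eV
            'transforms': [Transformation(name='multiply_scalar', args=(27.21138,), kwargs={})]
        }
    },
    'attributes': {
        'fermi_energy': {
            'h5path': '/general',
            'transforms': [
                Transformation(name='get_attribute', args=('lastFermiEnergy',), kwargs={}),
                Transformation(name='get_first_element', args=(), kwargs={})
            ]
        }
    }
}

with HDF5Reader('/path/to/banddos.hdf') as h5reader:
    datasets, attributes = h5reader.read(recipe=my_recipe)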

At the moment a number of transformation functions are pre-defined; the ones appearing in the recipe above are listed below (refer to masci_tools.io.parsers.hdf5.transforms for the complete set):

General Transformations: multiply_scalar, shift_dataset, split_array, flatten_array, calculate_norm, cumulative_sum, get_shape, index_dataset, get_attribute, get_first_element, get_name, get_all_child_datasets, convert_to_str and periodic_elements

Transformations using an attribute: shift_by_attribute, multiply_by_attribute, repeat_array_by_attribute and add_partial_sums

Custom transformation functions can also be defined using the hdf5_transformation() decorator. For some transformations, e.g. get_all_child_datasets(), the result will be a subdictionary in the datasets or attributes dictionary. If this is not desired, the entry can include 'unpack_dict': True. With this option, all keys of the resulting dict are extracted after all transformations have been applied and put into the root dictionary.
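
As a sketch, a custom transformation might be defined as follows. The import location of hdf5_transformation and its attribute_needed keyword are assumptions modeled on the pre-defined transformations and should be verified against your version of masci-tools:

from masci_tools.io.parsers.hdf5.reader import hdf5_transformation

# Hypothetical custom transformation; attribute_needed=False marks it
# as a general transformation (assumption, see the note above)
@hdf5_transformation(attribute_needed=False)
def square_array(dataset):
    """Square each element of the given dataset (example transformation)"""
    return dataset**2

The function can then be referenced by name in a recipe entry, e.g. Transformation(name='square_array', args=(), kwargs={}).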