General HDF5 file reader
Fleur uses the HDF5 library for output files containing large datasets.
The masci-tools library provides the HDF5Reader class to extract and
transform information from these files. Under the hood, the h5py library
is used to read the .hdf files.
Basic Usage
The specifications of what to extract and how to transform the data are given in the form
of a python dictionary. Let us look at a usage example: extracting data for a bandstructure
calculation from the banddos.hdf file produced by Fleur.
from masci_tools.io.parsers.hdf5 import HDF5Reader
from masci_tools.io.parsers.hdf5.recipes import FleurBands
# The HDF5Reader is used with a contextmanager to safely handle
# opening/closing the h5py.File object that is produced to extract information
with HDF5Reader('/path/to/banddos.hdf') as h5reader:
    datasets, attributes = h5reader.read(recipe=FleurBands)
The method HDF5Reader.read() produces two python dictionaries.
In the case of the FleurBands recipe these contain the following information:
datasets
- Eigenvalues converted to eV and shifted to E_F=0 (if available in the banddos.hdf), split up into spin-up/down and flattened to one dimension
- The kpath projected to 1D and reshaped to the same length as the weights/eigenvalues
- The weights (flattened) of the interstitial region, each atom, and each orbital on each atom for all eigenvalues
attributes
- The coordinates of the used kpoints
- Positions, atomic symbols and indices of symmetry equivalent atoms
- Dimensions of eigenvalues (nkpts and nbands)
- Bravais matrix/reciprocal cell of the system
- Indices and labels of special k-points
- Fermi energy
- Number of spins in the calculation
The following pre-defined recipes are stored in masci_tools.io.parsers.hdf5.recipes:
- Recipe for banddos.hdf for bandstructure calculations
- Recipe for banddos.hdf for standard density of states calculations. Different DOS modes are also supported (jDOS, orbcomp, mcd)
If no recipe is provided to the HDF5Reader, it will create the datasets and attributes
as two nested dictionaries, exactly mirroring the structure of the .hdf file and
converting datasets into numpy arrays.
For big datasets it might be useful to keep the dataset as a reference to the file
instead of loading it into memory. To achieve this you can pass move_to_memory=False
when initializing the reader. Note that most of the transformations will still implicitly
create numpy arrays, and after the hdf file is closed such datasets will no longer be
available.
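The no-recipe behaviour can be pictured as a recursive walk over the file. The following is a minimal sketch of that idea only: plain nested dicts stand in for h5py groups, lists stand in for h5py datasets, and tuple conversion stands in for loading into a numpy array; it is not the library's actual implementation.

```python
# Illustrative sketch: dicts play the role of h5py.Group objects,
# lists the role of h5py.Dataset objects.
def mirror(node):
    """Recursively mirror a nested structure, converting each leaf."""
    if isinstance(node, dict):  # stands in for an h5py.Group
        return {key: mirror(value) for key, value in node.items()}
    return tuple(node)  # stands in for converting a dataset to a numpy array


fake_file = {'general': {'fermiEnergy': [0.2]}, 'kpts': {'coordinates': [0.0, 0.5]}}
mirrored = mirror(fake_file)
# mirrored has the same nesting as fake_file, with every leaf converted
```

With move_to_memory=False, the final conversion step would simply be skipped, leaving the leaves as references into the open file.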
Structure of recipes for the HDF5Reader
The recipe for extracting bandstructure information from the banddos.hdf looks like this:
def bands_recipe_format(group: Literal['Local', 'jDOS', 'Orbcomp', 'MCD'], simple: bool = False) -> HDF5Recipe:
    """
    Format for bandstructure calculations retrieving weights from the given group

    :param group: str of the group the weights should be taken from
    :param simple: bool, if True no additional weights are retrieved with the produced recipe

    :returns: dict of the recipe to retrieve a bandstructure calculation
    """

    if group == 'Local':
        atom_prefix = 'MT:'
        weight_atom_terminator = '[spdf]'
    elif group == 'jDOS':
        atom_prefix = 'jDOS:'
        weight_atom_terminator = '[spdf]'
    elif group == 'Orbcomp':
        atom_prefix = 'ORB:'
        weight_atom_terminator = ','
    elif group == 'MCD':
        atom_prefix = 'At'
        weight_atom_terminator = 'NC'
    else:
        raise ValueError(f'Unknown group: {group}')

    recipe = HDF5Recipe({
        'datasets': {
            'eigenvalues': {
                'h5path':
                f'/{group}/BS/eigenvalues',
                'transforms': [
                    AttribTransformation(name='shift_by_attribute',
                                         attrib_name='fermi_energy',
                                         kwargs={
                                             'negative': True,
                                         }),
                    Transformation(name='multiply_scalar', args=(HTR_TO_EV,)),
                    Transformation(name='split_array', kwargs={
                        'suffixes': ['up', 'down'],
                        'name': 'eigenvalues'
                    }),
                    Transformation(name='flatten_array')
                ],
                'unpack_dict':
                True
            },
            'kpath': {
                'h5path':
                '/kpts/coordinates',
                'transforms': [
                    AttribTransformation(name='multiply_by_attribute',
                                         attrib_name='reciprocal_cell',
                                         kwargs={'transpose': True}),
                    Transformation(name='calculate_norm', kwargs={'between_neighbours': True}),
                    Transformation(name='cumulative_sum'),
                    AttribTransformation(name='repeat_array_by_attribute', attrib_name='nbands'),
                ]
            },
        },
        'attributes': {
            'group_name': {
                'h5path': f'/{group}',
                'transforms': [
                    Transformation(name='get_name'),
                ],
            },
            'kpoints': {
                'h5path': '/kpts/coordinates',
            },
            'nkpts': {
                'h5path': '/Local/BS/eigenvalues',
                'transforms': [Transformation(name='get_shape'),
                               Transformation(name='index_dataset', args=(1,))]
            },
            'nbands': {
                'h5path': '/Local/BS/eigenvalues',
                'transforms': [Transformation(name='get_shape'),
                               Transformation(name='index_dataset', args=(2,))]
            },
            'atoms_elements': {
                'h5path': '/atoms/atomicNumbers',
                'description': 'Atomic numbers',
                'transforms': [Transformation(name='periodic_elements')]
            },
            'n_types': {
                'h5path':
                '/atoms',
                'description':
                'Number of atom types',
                'transforms':
                [Transformation(name='get_attribute', args=('nTypes',)),
                 Transformation(name='get_first_element')]
            },
            'atoms_position': {
                'h5path': '/atoms/positions',
                'description': 'Atom coordinates per atom',
            },
            'atoms_groups': {
                'h5path': '/atoms/equivAtomsGroup'
            },
            'reciprocal_cell': {
                'h5path': '/cell/reciprocalCell'
            },
            'bravais_matrix': {
                'h5path': '/cell/bravaisMatrix',
                'description': 'Coordinate transformation internal to physical for atoms',
                'transforms': [Transformation(name='multiply_scalar', args=(BOHR_A,))]
            },
            'special_kpoint_indices': {
                'h5path': '/kpts/specialPointIndices',
                'transforms': [Transformation(name='shift_dataset', args=(-1,))]
            },
            'special_kpoint_labels': {
                'h5path': '/kpts/specialPointLabels',
                'transforms': [Transformation(name='convert_to_str')]
            },
            'fermi_energy': {
                'h5path':
                '/general',
                'description':
                'fermi_energy of the system',
                'transforms': [
                    Transformation(name='get_attribute', args=('lastFermiEnergy',)),
                    Transformation(name='get_first_element')
                ]
            },
            'spins': {
                'h5path':
                '/general',
                'description':
                'number of distinct spin directions in the system',
                'transforms':
                [Transformation(name='get_attribute', args=('spins',)),
                 Transformation(name='get_first_element')]
            }
        }
    })

    if simple:
        return recipe

    recipe['datasets']['weights'] = {
        'h5path':
        f'/{group}/BS',
        'transforms': [
            Transformation(name='get_all_child_datasets', kwargs={'ignore': ['eigenvalues', 'kpts']}),
            AttribTransformation(name='add_partial_sums',
                                 attrib_name='atoms_groups',
                                 args=(f'{atom_prefix}{{}}{weight_atom_terminator}'.format,),
                                 kwargs={
                                     'make_set': True,
                                     'replace_format': f'{atom_prefix}{{}}'.format
                                 }),
            Transformation(name='split_array', kwargs={'suffixes': ['up', 'down']}),
            Transformation(name='flatten_array')
        ],
        'unpack_dict':
        True
    }

    return recipe
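The kpath entry above shows how a transformation chain composes: the kpoint coordinates are multiplied by the transposed reciprocal cell, the distances between neighbouring points are computed, and those distances are accumulated. A toy re-implementation of that chain in plain Python may make the composition clearer; the helper name project_kpath is hypothetical and not part of the library.

```python
import math

def project_kpath(kpoints, reciprocal_cell):
    """Toy version of the multiply_by_attribute -> calculate_norm ->
    cumulative_sum chain: project 3D kpoints onto a 1D path coordinate."""
    # multiply each kpoint by the transposed reciprocal cell
    cart = [[sum(k[i] * reciprocal_cell[j][i] for i in range(3)) for j in range(3)]
            for k in kpoints]
    # norm of the difference between neighbouring points
    steps = [math.dist(a, b) for a, b in zip(cart, cart[1:])]
    # cumulative sum, starting at 0 for the first kpoint
    path = [0.0]
    for step in steps:
        path.append(path[-1] + step)
    return path

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
kpath = project_kpath([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.5, 0.5, 0.0)], identity)
# kpath == [0.0, 0.5, 1.0]
```

In the real recipe a final repeat_array_by_attribute step additionally repeats each path value nbands times, so the 1D path lines up with the flattened eigenvalues.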
Each recipe can define the datasets and attributes entry (if one is not defined,
an empty dict is returned in its place). Each entry in these sections has the same structure.
# Example entry from the FleurBands recipe
'fermi_energy': {
    'h5path':
    '/general',
    'description':
    'fermi_energy of the system',
    'transforms': [
        Transformation(name='get_attribute', args=('lastFermiEnergy',), kwargs={}),
        Transformation(name='get_first_element', args=(), kwargs={})
    ]
}
All entries must define the key h5path. This gives the initial dataset for this key,
which will be extracted from the given .hdf file. The key of the entry corresponds to
the key under which the result will be saved in the output dictionary.
If the dataset should be transformed in some way after reading it, there are a number
of pre-defined transformations in masci_tools.io.parsers.hdf5.transforms.
These are added to an entry as a list of namedtuples (Transformation for general
transformations; AttribTransformation for transformations using an attribute) under the
key transforms. General transformations can be used in all entries, while transformations
using an attribute value can only be used in the datasets entries. Each namedtuple takes
the name of the transformation function and the positional (args) and keyword (kwargs)
arguments for the transformation. Attribute transformations also take, in attrib_name,
the name of the attribute whose value should be passed to the transformation.
At the moment the following transformation functions are pre-defined:
General transformations:
- get_first_element(): Get the element at index 0 of the dataset
- index_dataset(): Get the element at the given index of the dataset
- slice_dataset(): Slice the given dataset with the given argument
- get_shape(): Get the shape of the dataset
- tile_array(): Use np.tile to repeat the dataset a given number of times
- repeat_array(): Use np.repeat to repeat each element in the dataset a given number of times
- get_all_child_datasets(): Extract all datasets contained in the current hdf group and enter them into a dict
- merge_subgroup_datasets(): Extract all datasets contained in the subgroups of the current hdf group and enter them into a dict in a list (or one numpy array)
- stack_datasets(): Stack the given datasets in the dictionary along a given axis
- shift_dataset(): Shift the given dataset by a scalar value
- multiply_scalar(): Multiply the given dataset by a scalar value
- multiply_array(): Multiply the given dataset with a given array
- convert_to_complex_array(): Convert a real dataset to a complex array
- calculate_norm(): Calculate the norm of a list of vectors (either absolute or the difference between subsequent entries)
- cumulative_sum(): Calculate the cumulative sum of the dataset
- get_attribute(): Get the value of one given attribute on the dataset
- attributes(): Get all defined attributes on the dataset as a dict
- move_to_memory(): Convert the dataset to a numpy array (if not already done implicitly)
- flatten_array(): Create a copy of the dataset flattened into one dimension
- split_array(): Split the given dataset along its first index and store the result in a dictionary with suffixed keys
- convert_to_str(): Convert the datatype of the dataset to string
- periodic_elements(): Convert atomic numbers to their atomic symbols
Transformations using an attribute:
- multiply_by_attribute(): Multiply the dataset by the value of an attribute (both scalar and matrix)
- shift_by_attribute(): Shift the given dataset by the value of an attribute
- repeat_array_by_attribute(): Call repeat_array() with the value of an attribute as argument
- tile_array_by_attribute(): Call tile_array() with the value of an attribute as argument
- add_partial_sums(): Sum over entries in dictionary datasets with given patterns in the key (the pattern is formatted with the given attribute value)
Custom transformation functions can also be defined using the hdf5_transformation()
decorator. For some transformations, e.g. get_all_child_datasets(), the result
will be a subdictionary in the datasets or attributes dictionary. If this is not desired,
the entry can include 'unpack_dict': True. With this option, all keys from the resulting
dict will be extracted after all transformations and put into the root dictionary.
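The effect of 'unpack_dict': True can be illustrated with plain dictionaries; the helper name and the weight keys below are hypothetical, chosen only to echo the MT: naming used by the Local group.

```python
def store_entry(output, key, result, unpack_dict=False):
    """Toy sketch: store a transformation result, optionally merging a
    dict result into the root dictionary instead of nesting it."""
    if unpack_dict and isinstance(result, dict):
        output.update(result)  # keys land directly in the root dictionary
    else:
        output[key] = result  # result is nested under its entry key
    return output

datasets = {}
store_entry(datasets, 'weights',
            {'MT:1s_up': [0.5], 'MT:1p_up': [0.3]},
            unpack_dict=True)
# datasets now holds 'MT:1s_up' and 'MT:1p_up' at the top level,
# with no intermediate 'weights' key
```

This matches the weights entry of the bandstructure recipe above, where the flattened per-atom weights appear directly in the returned datasets dictionary.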