Python module#
Overview#
Convenience functions#
- dump – Dump (nested) dictionary to file.
- copy – Copy groups/datasets from one HDF5-archive to another.
- copy_dataset – Copy a dataset from one file to another.
- compare – Compare two files. Return dictionary with differences.
- compare_rename – Compare two files. Return three dictionaries with differences.
Manipulate path#
- abspath – Return absolute path.
- join – Join path components.
Iterators#
- getdatapaths – Get paths to all datasets and groups that contain attributes.
- getdatasets – Iterator to traverse all datasets in a HDF5-archive.
- getgroups – Paths of all groups in a HDF5-archive.
- filter_datasets – From a list of paths, filter those paths that do not point to datasets.
Verify#
- verify – Try reading each dataset.
- exists – Check if a path exists in the HDF5-archive.
- exists_any – Check if any of the input paths exist in the HDF5-archive.
- exists_all – Check if all of the input paths exist in the HDF5-archive.
- equal – Check that a dataset is equal in both files.
- allequal – Check that all listed datasets are equal in both files.
Documentation#
- class GooseHDF5.ExtendableList(file: File, key: str, dtype=None, chunk: int = 1000, **kwargs)#
Write extendable list to HDF5 file.
For example:

```python
data = np.random.random([100])

with h5py.File("foo.h5", "w") as file:
    with g5.ExtendableList(file, "foo", np.float64) as dset:
        for d in data:
            dset.append(d)
```
- Parameters
file – Opened HDF5 file (in write mode).
key – Path to the dataset.
dtype – Data-type to use (needed for new datasets).
chunk – Chunk size: flush after this many entries.
kwargs – An optional dictionary with attributes.
- flush()#
Flush the buffer.
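The chunked buffering behind ExtendableList can be sketched in plain Python. This is a simplified, hypothetical stand-in (a list plays the role of the HDF5 dataset), not the actual implementation:

```python
class BufferedAppender:
    """Sketch of ExtendableList's chunked buffering: entries are collected
    in memory and written out ("flushed") every `chunk` entries."""

    def __init__(self, storage: list, chunk: int = 1000):
        self.storage = storage  # stand-in for the HDF5 dataset
        self.chunk = chunk
        self.buffer = []

    def append(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.chunk:
            self.flush()

    def flush(self):
        """Write the buffered entries to storage and empty the buffer."""
        self.storage.extend(self.buffer)
        self.buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.flush()  # final flush so no entries are lost on exit


storage = []
with BufferedAppender(storage, chunk=3) as dset:
    for d in range(7):
        dset.append(d)

print(storage)  # → [0, 1, 2, 3, 4, 5, 6]
```

Buffering in chunks avoids resizing the HDF5 dataset on every single append; the context manager guarantees a final flush of any remaining entries.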
- class GooseHDF5.ExtendableSlice(file: File, name: str, shape: Optional[tuple[int, ...]] = None, dtype=None, chunk: int = 1, maxshape: Optional[tuple[int, ...]] = None, **kwargs)#
Write slices of an extendable dataset to HDF5 file.
For example:

```python
dataset = np.random.random([100, 10, 10])

with h5py.File("foo.h5", "w") as file:
    with g5.ExtendableSlice(file, "foo", (10, 10), np.float64) as dset:
        for i in range(dataset.shape[0]):
            dset += dataset[i, ...]
```
- Parameters
file – Opened HDF5 file (in write mode).
name – Path to the dataset.
shape – Shape of all dimensions >= 1. The shape of dimension 0 is dynamic.
dtype – Data-type to use (needed for new datasets).
chunk – Chunk size: flush after this many slices.
maxshape – Maximum shape of all dimensions >= 1. Default: same as shape.
kwargs – An optional dictionary with attributes.
- flush()#
Flush the buffer.
- GooseHDF5.G5compare(args: list[str])#
Command-line tool to compare two HDF5 files, see --help.
- Parameters
args – Command-line arguments (should all be strings).
- GooseHDF5.G5list(args: list[str])#
Command-line tool to list datasets in a file, see --help.
- Parameters
args – Command-line arguments (should all be strings).
- GooseHDF5.G5modify(args: list[str])#
Command-line tool to modify datasets in a file, see --help.
- Parameters
args – Command-line arguments (should all be strings).
- GooseHDF5.G5print(args: list[str])#
Command-line tool to print datasets from a file, see --help.
- Parameters
args – Command-line arguments (should all be strings).
- GooseHDF5.abspath(path: str) str #
Return absolute path.
- Parameters
path (str) – A HDF5-path.
- Returns
The absolute path.
- GooseHDF5.allequal(source: File, dest: File, source_datasets: list[str], dest_datasets: Optional[list[str]] = None, root: Optional[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False)#
Check that all listed datasets are equal in both files.
- Parameters
source (h5py.File) – The source HDF5-archive.
dest (h5py.File) – The destination HDF5-archive.
source_datasets (list) – List of dataset-paths in source.
dest_datasets (list) – List of dataset-paths in dest, defaults to source_datasets.
root – Path prefix for all dest_datasets.
attrs – Compare attributes (the same way as datasets).
matching_dtype – Check that not only the data but also the type matches.
shallow – Check only the presence of the dataset, not its value.
- GooseHDF5.compare(a: str | h5py.File, b: str | h5py.File, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, only_datasets: bool = False, fold: str | list[str] = None, max_depth: int = None, close: bool = False) dict[list] #
- GooseHDF5.compare(a: h5py.File, b: h5py.File, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, only_datasets: bool = False, max_depth: int = None, fold: str | list[str] = None, list_folded: bool = False, close: bool = False) dict[list]
- GooseHDF5.compare(a: str, b: str, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, only_datasets: bool = False, max_depth: int = None, fold: str | list[str] = None, list_folded: bool = False, close: bool = False) dict[list]
Compare two files. Return dictionary with differences:

```python
{
    "->": ["/path/in/b/but/not/in/a", ...],
    "<-": ["/path/in/a/but/not/in/b", ...],
    "!=": ["/path/in/both/but/different/data", ...],
    "==": ["/data/matching", ...],
}
```
Warning
Folded groups are not compared in any way! Use list_folded to include this in the output.
- Parameters
a – HDF5-archive (as opened h5py.File or with the filepath).
b – HDF5-archive (as opened h5py.File or with the filepath).
paths_a – Paths from a to consider. Default: read from getdatapaths().
paths_b – Paths from b to consider. Default: read from getdatapaths().
attrs – Compare attributes (the same way as datasets).
matching_dtype – Check that not only the data but also the type matches.
shallow – Check only the presence of datasets, not their values, size, or attributes.
only_datasets – Compare datasets only (not groups, regardless of whether they have attributes).
max_depth – Set a maximum depth beyond which groups are folded.
fold – Specify groups that are folded.
list_folded – Return folded groups under "??".
close – Use np.isclose also on float-int matches.
- Returns
Dictionary with differences.
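The shape of the returned dictionary can be illustrated with a plain-Python sketch. Here diff_paths is a hypothetical helper (not part of GooseHDF5) that classifies paths in two flat mappings of path to data:

```python
def diff_paths(a: dict, b: dict) -> dict:
    """Sketch of compare()'s return value: classify paths by presence
    in each mapping and by (exact) equality of the stored data."""
    return {
        "->": sorted(set(b) - set(a)),  # in b, but not in a
        "<-": sorted(set(a) - set(b)),  # in a, but not in b
        "!=": sorted(p for p in set(a) & set(b) if a[p] != b[p]),
        "==": sorted(p for p in set(a) & set(b) if a[p] == b[p]),
    }


a = {"/x": 1, "/y": 2}
b = {"/x": 1, "/y": 3, "/z": 4}
print(diff_paths(a, b))
# → {'->': ['/z'], '<-': [], '!=': ['/y'], '==': ['/x']}
```

The real function additionally handles attributes, dtypes, folding, and np.isclose comparisons, but the four-key structure of the result is the same.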
- GooseHDF5.compare_allow(comparison: dict[list], paths: list[str], keys: list[str] = ['->', '<-', '!='], root: Optional[str] = None) dict[list] #
Modify the output of compare() to allow specific differences. In practice this removes certain fields from the lists under specific keys in the dictionary.
- Parameters
comparison – The output of compare().
paths – List of paths to allow.
keys – List of comparison keys ("->", "<-", "!=").
root – Path prefix for paths.
- Returns
The modified comparison dictionary.
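Its effect amounts to simple list filtering, as in this sketch (allow_differences is a hypothetical helper name, not the GooseHDF5 API):

```python
def allow_differences(comparison: dict, paths: list, keys=("->", "<-", "!=")) -> dict:
    """Sketch of compare_allow(): drop the allowed paths from the lists
    under the given keys; lists under other keys (e.g. "==") are untouched."""
    allowed = set(paths)
    return {
        key: [p for p in value if key not in keys or p not in allowed]
        for key, value in comparison.items()
    }


comparison = {"->": ["/new"], "<-": [], "!=": ["/changed"], "==": ["/same"]}
print(allow_differences(comparison, ["/changed"]))
# → {'->': ['/new'], '<-': [], '!=': [], '==': ['/same']}
```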
- GooseHDF5.compare_rename(a: h5py.File, b: h5py.File, rename: list[str] = None, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, regex: bool = False, only_datasets: bool = True, max_depth: int = None, fold: str | list[str] = None, list_folded: bool = False, close: bool = False) dict[list] #
Compare two files. Return three dictionaries with differences:

```python
# plain comparison between a and b
{
    "->": ["/path/in/b/but/not/in/a", ...],
    "<-": ["/path/in/a/but/not/in/b", ...],
    "!=": ["/path/in/both/but/different/data", ...],
    "==": ["/data/matching", ...],
}

# comparison of renamed paths: list of paths in a
{
    "!=": ["/path/in/a/with/rename/path/not_equal", ...],
    "==": ["/path/in/a/with/rename/path/matching", ...],
}

# comparison of renamed paths: list of paths in b
{
    "!=": ["/path/in/b/with/rename/path/not_equal", ...],
    "==": ["/path/in/b/with/rename/path/matching", ...],
}
```
Warning
Folded groups are not compared in any way! Use list_folded to include this in the output.
- Parameters
a – HDF5-archive (as opened h5py.File or with the filepath).
b – HDF5-archive (as opened h5py.File or with the filepath).
rename – List with renamed pairs: [["/a/0", "/b/1"], ...].
paths_a – Paths from a to consider. Default: read from getdatapaths().
paths_b – Paths from b to consider. Default: read from getdatapaths().
attrs – Compare attributes (the same way as datasets).
matching_dtype – Check that not only the data but also the type matches.
shallow – Check only the presence of datasets, not their values, size, or attributes.
regex – Use regular expressions to match rename.
only_datasets – Compare datasets only (not groups, regardless of whether they have attributes).
max_depth – Set a maximum depth beyond which groups are folded.
fold – Specify groups that are folded.
list_folded – Return folded groups under "??".
close – Use np.isclose also on float-int matches.
- Returns
Three dictionaries with differences.
- GooseHDF5.copy(source: File, dest: File, source_paths: list[str], dest_paths: Optional[list[str]] = None, root: Optional[str] = None, source_root: Optional[str] = None, skip: bool = False, preserve_soft: bool = False, shallow: bool = False, expand_soft: bool = False, expand_external: bool = False, expand_refs: bool = False, without_attrs: bool = False)#
Copy groups/datasets from one HDF5-archive source to another HDF5-archive dest. The datasets can be renamed by specifying a list of dest_paths (whose entries should correspond to the source_paths). In addition, a root path prefix can be specified for the destination datasets. Likewise, a source_root path prefix can be specified for the source datasets.
For the options shallow, expand_soft, expand_external, expand_refs, without_attrs see: h5py.Group.copy.
- Parameters
source – The source HDF5-archive.
dest – The destination HDF5-archive.
source_paths – List of dataset-paths in source.
dest_paths – List of dataset-paths in dest, defaults to source_paths.
root – Path prefix for all dest_paths.
source_root – Path prefix for all source_paths.
.skip – Skip datasets that are not present in source.
preserve_soft – Preserve soft links.
shallow – Only copy immediate members of a group.
expand_soft – Expand soft links into new objects.
expand_external – Expand external links into new objects.
expand_refs – Copy objects which are pointed to by references.
without_attrs – Copy object(s) without copying HDF5 attributes.
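The path bookkeeping described above (dest_paths defaulting to source_paths, with optional root and source_root prefixes) can be sketched as follows; resolve_copy_paths is a hypothetical helper for illustration, not part of GooseHDF5:

```python
import posixpath


def resolve_copy_paths(source_paths, dest_paths=None, root=None, source_root=None):
    """Sketch of copy()'s path handling: pair each source path with a
    destination path, applying the optional prefixes. HDF5 paths follow
    POSIX conventions, so posixpath.join applies."""
    if dest_paths is None:
        dest_paths = list(source_paths)
    if root is not None:
        dest_paths = [posixpath.join(root, p.lstrip("/")) for p in dest_paths]
    if source_root is not None:
        source_paths = [posixpath.join(source_root, p.lstrip("/")) for p in source_paths]
    return list(zip(source_paths, dest_paths))


pairs = resolve_copy_paths(["/a", "/b"], root="/backup")
print(pairs)  # → [('/a', '/backup/a'), ('/b', '/backup/b')]
```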
- GooseHDF5.copy_dataset(source, dest, paths, compress=False, double_to_float=False)#
Copy a dataset from one file to another. This function also copies possible attributes.
- Parameters
source (h5py.File) – The source HDF5-archive.
dest (h5py.File) – The destination HDF5-archive.
paths (str, list) – (List of) HDF5-path(s) to copy.
compress (bool) – Compress the destination dataset(s).
double_to_float (bool) – Convert doubles to floats before copying.
- GooseHDF5.create_extendible(file: File, key: str, dtype, ndim: int = 1, **kwargs) Dataset #
Create extendible dataset.
- Parameters
file – Opened HDF5 file.
key – Path to the dataset.
dtype – Data-type to use.
ndim – Number of dimensions.
kwargs – An optional dictionary with attributes.
- GooseHDF5.dump(file: File, data: dict, root: str = '/')#
Dump (nested) dictionary to file.
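Conceptually, dump() flattens the nested dictionary into path/value pairs and writes each value as a dataset. That flattening can be sketched in plain Python (flatten is a hypothetical helper, not the GooseHDF5 implementation):

```python
import posixpath


def flatten(data: dict, root: str = "/") -> dict:
    """Sketch of dump()'s core idea: turn a nested dictionary into a flat
    mapping of absolute HDF5-style paths to values."""
    out = {}
    for key, value in data.items():
        path = posixpath.join(root, key)
        if isinstance(value, dict):
            out.update(flatten(value, path))  # nested dict -> nested group
        else:
            out[path] = value  # leaf value -> dataset
    return out


print(flatten({"a": 1, "b": {"c": 2}}))  # → {'/a': 1, '/b/c': 2}
```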
- GooseHDF5.equal(source: File, dest: File, source_dataset: str, dest_dataset: Optional[str] = None, root: Optional[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, close: bool = False)#
Check that a dataset is equal in both files.
- Parameters
source (h5py.File) – The source HDF5-archive.
dest (h5py.File) – The destination HDF5-archive.
source_dataset (str) – Path of the dataset in source.
dest_dataset (str) – Path of the dataset in dest, defaults to source_dataset.
root – Path prefix for dest_dataset.
attrs – Compare attributes (the same way as datasets).
matching_dtype – Check that not only the data but also the type matches.
shallow – Check only the presence of the dataset, not its value.
close – Use np.isclose also on float-int matches.
- GooseHDF5.exists(file, path)#
Check if a path exists in the HDF5-archive.
- Parameters
file (h5py.File) – A HDF5-archive.
path (str) – HDF5-path.
- GooseHDF5.exists_all(file, paths)#
Check if all of the input paths exist in the HDF5-archive.
- Parameters
file (h5py.File) – A HDF5-archive.
paths (list) – List of HDF5-paths.
- GooseHDF5.exists_any(file, paths)#
Check if any of the input paths exist in the HDF5-archive.
- Parameters
file (h5py.File) – A HDF5-archive.
paths (list) – List of HDF5-paths.
- GooseHDF5.filter_datasets(file, paths)#
From a list of paths, filter those paths that do not point to datasets.
- Parameters
file (h5py.File) – A HDF5-archive.
paths (list) – List of HDF5-paths.
- Returns
Filtered paths.
- GooseHDF5.getdatapaths(file: h5py.File, root: str = '/', max_depth: int = None, fold: str | list[str] = None, fold_symbol: str = '/...') list[str] #
Get paths to all datasets and groups that contain attributes.
Warning
getgroups() visits all groups in the file, regardless of whether they are folded (by fold or max_depth). Depending on the file, this can be quite costly. If runtime is an issue, consider searching for datasets only using getdatasets() if your use-case allows it.
- Parameters
file – A HDF5-archive.
root – Start a certain point along the path-tree.
max_depth – Set a maximum depth beyond which groups are folded.
fold – Specify groups that are folded.
fold_symbol – Use symbol to indicate that a group is folded.
- Returns
List of paths (always absolute, so includes the root if used).
- GooseHDF5.getdatasets(file: h5py.File, root: str = '/', max_depth: int = None, fold: str | list[str] = None, fold_symbol: str = '/...') Iterator #
Iterator to traverse all datasets in a HDF5-archive. One can choose to fold (not traverse deeper than):
- Groups deeper than a certain max_depth.
- A (list of) specific group(s).
- Parameters
file – A HDF5-archive.
root – Start a certain point along the path-tree.
max_depth – Set a maximum depth beyond which groups are folded.
fold – Specify groups that are folded.
fold_symbol – Use symbol to indicate that a group is folded.
- Returns
Iterator to paths (always absolute, so includes the root if used).
- Example
Consider this file:

```
/path/to/first/a
/path/to/first/b
/data/c
/data/d
/e
```

Calling:

```python
with h5py.File("...", "r") as file:
    for path in GooseHDF5.getdatasets(file, max_depth=2, fold="/data"):
        print(path)
```

Will print:

```
/path/to/...
/data/...
/e
```
The ... indicates that it concerns a folded group, not a dataset. Here, the first group was folded because of the maximum depth, the second because it was specifically requested to be folded.
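The folding logic of the example above can be sketched over a nested dictionary, with dicts standing in for groups and other values for datasets (iter_paths is a hypothetical helper, not the GooseHDF5 implementation):

```python
def iter_paths(tree: dict, root: str = "", max_depth=None, fold=(), fold_symbol="/..."):
    """Sketch of getdatasets()'s folding: recurse into groups, but stop
    (and emit a folded marker) at max_depth or at explicitly folded groups."""
    for name, value in tree.items():
        path = f"{root}/{name}"
        if not isinstance(value, dict):
            yield path  # a dataset
        elif path in fold or (max_depth is not None and path.count("/") >= max_depth):
            yield path + fold_symbol  # a folded group: do not recurse
        else:
            yield from iter_paths(value, path, max_depth, fold, fold_symbol)


tree = {
    "path": {"to": {"first": {"a": 1, "b": 2}}},
    "data": {"c": 3, "d": 4},
    "e": 5,
}
print(list(iter_paths(tree, max_depth=2, fold=("/data",))))
# → ['/path/to/...', '/data/...', '/e']
```

This reproduces the output of the example above: "/path/to" is folded by depth, "/data" by explicit request, and "/e" is a dataset at the root.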
- GooseHDF5.getgroups(file: h5py.File, root: str = '/', has_attrs: bool = False, max_depth: int = None, fold: str | list[str] = None, fold_symbol: str = '/...') list[str] #
Paths of all groups in a HDF5-archive.
Warning
The function visits all groups in the file, regardless of whether they are folded (by fold or max_depth). Depending on the file, this can be quite costly.
- Parameters
file – A HDF5-archive.
root – Start at a certain point along the path-tree.
has_attrs – Return only groups that have attributes.
max_depth (int) – Set a maximum depth beyond which groups are folded.
fold – Specify groups that are folded.
fold_symbol – Use symbol to indicate that a group is folded.
- Returns
List of paths (always absolute, so includes the root if used).
- GooseHDF5.info_table(source, paths: list[str], link_type: bool = False) PrettyTable #
Get a table with basic information per path:
- path
- size
- shape
- dtype
- attrs: Number of attributes
- link: Link type
- Parameters
source – A HDF5-archive.
paths – List of paths.
link_type – Include the link-type in the output.
- GooseHDF5.isnumeric(a)#
Returns True if an array contains numeric values.
- Parameters
a (array) – An array.
- Returns
bool
- GooseHDF5.join(*args, root: bool = False) str #
Join path components.
- Parameters
args (list) – Pieces of the path.
root – Prepend the output with the root "/".
- Returns
The concatenated path.
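Since HDF5 paths follow POSIX conventions, the joining behaviour can be illustrated with posixpath. This is a sketch (join_path is a hypothetical helper name), not the actual implementation:

```python
import posixpath


def join_path(*args, root: bool = False) -> str:
    """Sketch of join(): concatenate path components, optionally
    prepending the root "/". Stray slashes on components are stripped."""
    path = posixpath.join(*(a.strip("/") for a in args))
    return "/" + path if root else path


print(join_path("a", "b", "c"))               # → 'a/b/c'
print(join_path("a", "/b/", "c", root=True))  # → '/a/b/c'
```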
- GooseHDF5.print_attribute(source, paths: list[str])#
Print paths to a dataset and to all underlying attributes.
- Parameters
paths – List of paths.
- GooseHDF5.print_plain(source, paths: list[str], show_links: bool = False)#
Print the paths to all datasets (one per line).
- Parameters
paths – List of paths.
show_links – Show the path the link points to.
- GooseHDF5.verify(file, datasets, error=False)#
Try reading each dataset.
- Parameters
file (h5py.File) – A HDF5-archive.
datasets (list) – List of HDF5-paths to datasets.
error (bool) – If True, the function raises an error if reading fails. If False, the function just continues.
- Returns
List with only those datasets that can be successfully opened.
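The filtering behaviour can be sketched with callables standing in for dataset reads (verify_readable is a hypothetical helper, not the GooseHDF5 implementation):

```python
def verify_readable(read_funcs: dict, error: bool = False) -> list:
    """Sketch of verify(): attempt to read each entry; collect the paths
    that can be read, and raise or skip on failure depending on `error`."""
    ok = []
    for path, read in read_funcs.items():
        try:
            read()
        except Exception:
            if error:
                raise
            continue  # silently skip unreadable datasets
        ok.append(path)
    return ok


reads = {"/good": lambda: 1, "/bad": lambda: 1 / 0}
print(verify_readable(reads))  # → ['/good']
```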