Python module#

Overview#

Convenience functions#

GooseHDF5.dump(file, data[, root])

Dump (nested) dictionary to file.

GooseHDF5.copy(source, dest, source_paths[, ...])

Copy groups/datasets from one HDF5-archive source to another HDF5-archive dest.

GooseHDF5.copy_dataset(source, dest, paths)

Copy a dataset from one file to another.

GooseHDF5.compare(a, b[, paths_a, paths_b, ...])

Compare two files. Return dictionary with differences.

GooseHDF5.compare_rename(a, b[, rename, ...])

Compare two files. Return three dictionaries with differences.

Manipulate path#

GooseHDF5.abspath(path)

Return absolute path.

GooseHDF5.join(*args[, root])

Join path components.

Iterators#

GooseHDF5.getdatapaths(file[, root, ...])

Get paths to all datasets and groups that contain attributes.

GooseHDF5.getdatasets(file[, root, ...])

Iterator to traverse all datasets in a HDF5-archive.

GooseHDF5.getgroups(file[, root, has_attrs, ...])

Paths of all groups in a HDF5-archive.

GooseHDF5.filter_datasets(file, paths)

From a list of paths, remove those paths that do not point to datasets.

Verify#

GooseHDF5.verify(file, datasets[, error])

Try reading each dataset.

GooseHDF5.exists(file, path)

Check if a path exists in the HDF5-archive.

GooseHDF5.exists_any(file, paths)

Check if any of the input paths exists in the HDF5-archive.

GooseHDF5.exists_all(file, paths)

Check if all of the input paths exist in the HDF5-archive.

GooseHDF5.equal(source, dest, source_dataset)

Check that a dataset is equal in both files.

GooseHDF5.allequal(source, dest, source_datasets)

Check that all listed datasets are equal in both files.

Documentation#

class GooseHDF5.ExtendableList(file: File, key: str, dtype=None, chunk: int = 1000, **kwargs)#

Write extendable list to HDF5 file.

For example:

import h5py
import numpy as np
import GooseHDF5 as g5

data = np.random.random([100])

with h5py.File("foo.h5", "w") as file:
    with g5.ExtendableList(file, "foo", np.float64) as dset:
        for d in data:
            dset.append(d)
Parameters
  • file – Opened HDF5 file (in write mode).

  • key – Path to the dataset.

  • dtype – Data-type to use (needed for new datasets).

  • chunk – Chunk size: flush after this many entries.

  • kwargs – An optional dictionary with attributes.

flush()#

Flush the buffer.

class GooseHDF5.ExtendableSlice(file: File, name: str, shape: Optional[tuple[int, ...]] = None, dtype=None, chunk: int = 1, maxshape: Optional[tuple[int, ...]] = None, **kwargs)#

Write slices of an extendable dataset to HDF5 file.

For example:

import h5py
import numpy as np
import GooseHDF5 as g5

dataset = np.random.random([100, 10, 10])

with h5py.File("foo.h5", "w") as file:
    with g5.ExtendableSlice(file, "foo", (10, 10), np.float64) as dset:
        for i in range(dataset.shape[0]):
            dset += dataset[i, ...]
Parameters
  • file – Opened HDF5 file (in write mode).

  • name – Path to the dataset.

  • shape – Shape of all dimensions >= 1. The shape of dimension 0 is dynamic.

  • dtype – Data-type to use (needed for new datasets).

  • chunk – Chunk size: flush after this many slices.

  • maxshape – Maximum shape of all dimensions >= 1. Default: same as shape.

  • kwargs – An optional dictionary with attributes.

flush()#

Flush the buffer.

GooseHDF5.G5compare(args: list[str])#

Command-line tool to compare two HDF5-archives, see --help.

Parameters

args – Command-line arguments (should be all strings).

GooseHDF5.G5list(args: list[str])#

Command-line tool to list datasets in a file, see --help.

Parameters

args – Command-line arguments (should be all strings).

GooseHDF5.G5modify(args: list[str])#

Command-line tool to modify datasets in a file, see --help.

Parameters

args – Command-line arguments (should be all strings).

GooseHDF5.G5print(args: list[str])#

Command-line tool to print datasets from a file, see --help.

Parameters

args – Command-line arguments (should be all strings).

GooseHDF5.abspath(path: str) str#

Return absolute path.

Parameters

path (str) – A HDF5-path.

Returns

The absolute path.

GooseHDF5.allequal(source: File, dest: File, source_datasets: list[str], dest_datasets: Optional[list[str]] = None, root: Optional[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False)#

Check that all listed datasets are equal in both files.

Parameters
  • source (h5py.File) – The source HDF5-archive.

  • dest (h5py.File) – The destination HDF5-archive.

  • source_datasets (list) – List of dataset-paths in source.

  • dest_datasets (list) – List of dataset-paths in dest, defaults to source_datasets.

  • root – Path prefix for all dest_datasets.

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.

  • shallow – Check only the presence of the dataset, not its value.

GooseHDF5.compare(a: str | h5py.File, b: str | h5py.File, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, only_datasets: bool = False, fold: str | list[str] = None, max_depth: int = None, close: bool = False) dict[list]#
GooseHDF5.compare(a: h5py.File, b: h5py.File, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, only_datasets: bool = False, max_depth: int = None, fold: str | list[str] = None, list_folded: bool = False, close: bool = False) dict[list]
GooseHDF5.compare(a: str, b: str, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, only_datasets: bool = False, max_depth: int = None, fold: str | list[str] = None, list_folded: bool = False, close: bool = False) dict[list]

Compare two files. Return dictionary with differences:

{
    "->" : ["/path/in/b/but/not/in/a", ...],
    "<-" : ["/path/in/a/but/not/in/b", ...],
    "!=" : ["/path/in/both/but/different/data", ...],
    "==" : ["/data/matching", ...]
}

Warning

Folded groups are not compared in any way! Use list_folded to include this in the output.

Parameters
  • a – HDF5-archive (as opened h5py.File or with the filepath).

  • b – HDF5-archive (as opened h5py.File or with the filepath).

  • paths_a – Paths from a to consider. Default: read from getdatapaths().

  • paths_b – Paths from b to consider. Default: read from getdatapaths().

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.

  • shallow – Check only the presence of datasets, not their values, size, or attributes.

  • only_datasets – Compare datasets only (not groups, regardless of whether they have attributes).

  • max_depth – Set a maximum depth beyond which groups are folded.

  • fold – Specify groups that are folded.

  • list_folded – Return folded groups under “??”.

  • close – Use np.isclose also on float-int matches.

Returns

Dictionary with differences.

GooseHDF5.compare_allow(comparison: dict[list], paths: list[str], keys: list[str] = ['->', '<-', '!='], root: Optional[str] = None) dict[list]#

Modify the output of compare() to allow specific differences. In practice this removes certain fields from the lists under specific keys in the dictionary.

Parameters
  • comparison – The output of compare().

  • paths – List of paths to allow.

  • keys – List of comparison keys ("->", "<-", "!=").

  • root – Path prefix for paths.

Returns

The modified comparison dictionary.

GooseHDF5.compare_rename(a: h5py.File, b: h5py.File, rename: list[str] = None, paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, regex: bool = False, only_datasets: bool = True, max_depth: int = None, fold: str | list[str] = None, list_folded: bool = False, close: bool = False) dict[list]#

Compare two files. Return three dictionaries with differences:

# plain comparison between a and b

{
    "->" : ["/path/in/b/but/not/in/a", ...],
    "<-" : ["/path/in/a/but/not/in/b", ...],
    "!=" : ["/path/in/both/but/different/data", ...],
    "==" : ["/data/matching", ...]
}

# comparison of renamed paths: list of paths in a

{
    "!=" : ["/path/in/a/with/rename/path/not_equal", ...],
    "==" : ["/path/in/a/with/rename/path/matching", ...]
}

# comparison of renamed paths: list of paths in b

{
    "!=" : ["/path/in/b/with/rename/path/not_equal", ...],
    "==" : ["/path/in/b/with/rename/path/matching", ...]
}

Warning

Folded groups are not compared in any way! Use list_folded to include this in the output.

Parameters
  • a – HDF5-archive (as opened h5py.File or with the filepath).

  • b – HDF5-archive (as opened h5py.File or with the filepath).

  • rename – List with renamed pairs: [["/a/0", "/b/1"], ...].

  • paths_a – Paths from a to consider. Default: read from getdatapaths().

  • paths_b – Paths from b to consider. Default: read from getdatapaths().

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.

  • shallow – Check only the presence of datasets, not their values, size, or attributes.

  • regex – Use regular expressions to match rename.

  • only_datasets – Compare datasets only (not groups, regardless of whether they have attributes).

  • max_depth – Set a maximum depth beyond which groups are folded.

  • fold – Specify groups that are folded.

  • list_folded – Return folded groups under “??”.

  • close – Use np.isclose also on float-int matches.

Returns

Dictionary with differences.

GooseHDF5.copy(source: File, dest: File, source_paths: list[str], dest_paths: Optional[list[str]] = None, root: Optional[str] = None, source_root: Optional[str] = None, skip: bool = False, preserve_soft: bool = False, shallow: bool = False, expand_soft: bool = False, expand_external: bool = False, expand_refs: bool = False, without_attrs: bool = False)#

Copy groups/datasets from one HDF5-archive source to another HDF5-archive dest. The datasets can be renamed by specifying a list of dest_paths (whose entries should correspond to the source_paths). In addition, a root path prefix can be specified for the destination datasets. Likewise, a source_root path prefix can be specified for the source datasets.

For the options shallow, expand_soft, expand_external, expand_refs, without_attrs see: h5py.Group.copy.

Parameters
  • source – The source HDF5-archive.

  • dest – The destination HDF5-archive.

  • source_paths – List of dataset-paths in source.

  • dest_paths – List of dataset-paths in dest, defaults to source_paths.

  • root – Path prefix for all dest_paths.

  • source_root – Path prefix for all source_paths.

  • skip – Skip datasets that are not present in source.

  • preserve_soft – Preserve soft links.

  • shallow – Only copy immediate members of a group.

  • expand_soft – Expand soft links into new objects.

  • expand_external – Expand external links into new objects.

  • expand_refs – Copy objects which are pointed to by references.

  • without_attrs – Copy object(s) without copying HDF5 attributes.

GooseHDF5.copy_dataset(source, dest, paths, compress=False, double_to_float=False)#

Copy a dataset from one file to another. This function also copies possible attributes.

Parameters
  • source (h5py.File) – The source HDF5-archive.

  • dest (h5py.File) – The destination HDF5-archive.

  • paths (str, list) – (List of) HDF5-path(s) to copy.

  • compress (bool) – Compress the destination dataset(s).

  • double_to_float (bool) – Convert doubles to floats before copying.

GooseHDF5.create_extendible(file: File, key: str, dtype, ndim: int = 1, **kwargs) Dataset#

Create extendible dataset.

Parameters
  • file – Opened HDF5 file.

  • key – Path to the dataset.

  • dtype – Data-type to use.

  • ndim – Number of dimensions.

  • kwargs – An optional dictionary with attributes.

GooseHDF5.dump(file: File, data: dict, root: str = '/')#

Dump (nested) dictionary to file.

GooseHDF5.equal(source: File, dest: File, source_dataset: str, dest_dataset: Optional[str] = None, root: Optional[str] = None, attrs: bool = True, matching_dtype: bool = False, shallow: bool = False, close: bool = False)#

Check that a dataset is equal in both files.

Parameters
  • source (h5py.File) – The source HDF5-archive.

  • dest (h5py.File) – The destination HDF5-archive.

  • source_dataset (str) – Dataset-path in source.

  • dest_dataset (str) – Dataset-path in dest, defaults to source_dataset.

  • root – Path prefix for dest_dataset.

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.

  • shallow – Check only the presence of the dataset, not its value.

  • close – Use np.isclose also on float-int matches.

GooseHDF5.exists(file, path)#

Check if a path exists in the HDF5-archive.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • path (str) – HDF5-path.

GooseHDF5.exists_all(file, paths)#

Check if all of the input paths exist in the HDF5-archive.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • paths (list) – List of HDF5-paths.

GooseHDF5.exists_any(file, paths)#

Check if any of the input paths exists in the HDF5-archive.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • paths (list) – List of HDF5-paths.

GooseHDF5.filter_datasets(file, paths)#

From a list of paths, remove those paths that do not point to datasets.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • paths (list) – List of HDF5-paths.

Returns

Filtered paths.

GooseHDF5.getdatapaths(file: h5py.File, root: str = '/', max_depth: int = None, fold: str | list[str] = None, fold_symbol: str = '/...') list[str]#

Get paths to all datasets and groups that contain attributes.

Warning

getgroups() visits all groups in the file, regardless of whether they are folded (by fold or max_depth). Depending on the file, this can be quite costly. If runtime is an issue, consider searching for datasets only, using getdatasets(), if your use-case allows it.

Parameters
  • file – A HDF5-archive.

  • root – Start at a certain point along the path-tree.

  • max_depth – Set a maximum depth beyond which groups are folded.

  • fold – Specify groups that are folded.

  • fold_symbol – Use symbol to indicate that a group is folded.

Returns

List of paths (always absolute, so includes the root if used).

GooseHDF5.getdatasets(file: h5py.File, root: str = '/', max_depth: int = None, fold: str | list[str] = None, fold_symbol: str = '/...') Iterator#

Iterator to traverse all datasets in a HDF5-archive. One can choose to fold (not traverse deeper than):

  • Groups deeper than a certain max_depth.

  • A (list of) specific group(s).

Parameters
  • file – A HDF5-archive.

  • root – Start at a certain point along the path-tree.

  • max_depth – Set a maximum depth beyond which groups are folded.

  • fold – Specify groups that are folded.

  • fold_symbol – Use symbol to indicate that a group is folded.

Returns

Iterator to paths (always absolute, so includes the root if used).

Example

Consider this file:

/path/to/first/a
/path/to/first/b
/data/c
/data/d
/e

Calling:

with h5py.File("...", "r") as file:
    for path in GooseHDF5.getdatasets(file, max_depth=2, fold="/data"):
        print(path)

Will print:

/path/to/...
/data/...
/e

The ... indicates that it concerns a folded group, not a dataset. Here, the first group was folded because of the maximum depth, the second because it was specifically requested to be folded.

GooseHDF5.getgroups(file: h5py.File, root: str = '/', has_attrs: bool = False, max_depth: int = None, fold: str | list[str] = None, fold_symbol: str = '/...') list[str]#

Paths of all groups in a HDF5-archive.

Warning

The function visits all groups in the file, regardless of whether they are folded (by fold or max_depth). Depending on the file, this can be quite costly.

Parameters
  • file – A HDF5-archive.

  • root – Start at a certain point along the path-tree.

  • has_attrs – Return only groups that have attributes.

  • max_depth (int) – Set a maximum depth beyond which groups are folded.

  • fold – Specify groups that are folded.

  • fold_symbol – Use symbol to indicate that a group is folded.

Returns

List of paths (always absolute, so includes the root if used).

GooseHDF5.info_table(source, paths: list[str], link_type: bool = False) PrettyTable#

Get a table with basic information per path:

  • path

  • size

  • shape

  • dtype

  • attrs: Number of attributes

  • link: Link type

Parameters
  • source – A HDF5-archive.

  • paths – List of paths.

  • link_type – Include the link-type in the output.

GooseHDF5.isnumeric(a)#

Returns True if an array contains numeric values.

Parameters

a (array) – An array.

Returns

bool

GooseHDF5.join(*args, root: bool = False) str#

Join path components.

Parameters
  • args (list) – Piece of a path.

  • root – Prepend the output with the root "/".

Returns

The concatenated path.

GooseHDF5.print_attribute(source, paths: list[str])#

Print paths to datasets and to all underlying attributes.

Parameters

paths – List of paths.

GooseHDF5.print_plain(source, paths: list[str], show_links: bool = False)#

Print the paths to all datasets (one per line).

Parameters
  • paths – List of paths.

  • show_links – Show the path the link points to.

GooseHDF5.verify(file, datasets, error=False)#

Try reading each dataset.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • datasets (list) – List of HDF5-paths to datasets.

  • error (bool) –

    • If True, the function raises an error if reading failed.

    • If False, the function just continues.

Returns

List with only those datasets that can be successfully opened.