Python module

Overview

Convenience functions

GooseHDF5.dump(file, data[, root])

Dump (nested) dictionary to file.

GooseHDF5.copy(source, dest, source_datasets)

Copy groups/datasets from one HDF5-archive source to another HDF5-archive dest.

GooseHDF5.copy_dataset(source, dest, paths)

Copy a dataset from one file to another.

GooseHDF5.copydatasets(source, dest, ...[, ...])

Copy datasets from one HDF5-archive source to another HDF5-archive dest.

GooseHDF5.compare(a, b[, paths_a, paths_b, ...])

Compare two files. Return a dictionary with differences.

GooseHDF5.compare_rename(a, b[, rename, ...])

Compare two files. Return three dictionaries with differences.

Manipulate path

GooseHDF5.abspath(path)

Return absolute path.

GooseHDF5.join(*args[, root])

Join path components.

Iterators

GooseHDF5.getdatapaths(file[, root])

Get paths to all datasets and groups that contain attributes.

GooseHDF5.getdatasets(file[, root, ...])

Iterator to traverse all datasets in a HDF5-archive.

GooseHDF5.getgroups(file[, root, has_attrs, ...])

Paths of all groups in a HDF5-archive.

GooseHDF5.filter_datasets(file, paths)

From a list of paths, filter out those that do not point to datasets.

Verify

GooseHDF5.verify(file, datasets[, error])

Try reading each dataset.

GooseHDF5.exists(file, path)

Check if a path exists in the HDF5-archive.

GooseHDF5.exists_any(file, paths)

Check if any of the input paths exists in the HDF5-archive.

GooseHDF5.exists_all(file, paths)

Check if all of the input paths exist in the HDF5-archive.

GooseHDF5.equal(source, dest, source_dataset)

Check that a dataset is equal in both files.

GooseHDF5.allequal(source, dest, source_datasets)

Check that all listed datasets are equal in both files.

Documentation

GooseHDF5.G5list(args: list[str])

Command-line tool to print datasets from a file, see --help.

Parameters

args (list[str]) – Command-line arguments (should be all strings).

GooseHDF5.G5print(args: list[str])

Command-line tool to print datasets from a file, see --help.

Parameters

args (list[str]) – Command-line arguments (should be all strings).

GooseHDF5.abspath(path)

Return absolute path.

Parameters

path (str) – A HDF5-path.

Returns

The absolute path.
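Here is a minimal pure-Python sketch of what abspath computes. It assumes HDF5-paths can be normalized with POSIX path rules ("/" as separator, "." and ".." resolved). abspath_sketch is a hypothetical name, and GooseHDF5 may implement this differently.

```python
import posixpath


def abspath_sketch(path: str) -> str:
    """Hypothetical stand-in for GooseHDF5.abspath (assumes POSIX-style paths)."""
    # Anchor relative paths at the root "/", then normalize "." and ".." components.
    return posixpath.normpath(posixpath.join("/", path))


print(abspath_sketch("a/b"))       # /a/b
print(abspath_sketch("/a/../b/"))  # /b
```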

GooseHDF5.allequal(source: h5py._hl.files.File, dest: h5py._hl.files.File, source_datasets: list[str], dest_datasets: Optional[list[str]] = None, root: Optional[str] = None, attrs: bool = True, matching_dtype: bool = False)

Check that all listed datasets are equal in both files.

Parameters
  • source (h5py.File) – The source HDF5-archive.

  • dest (h5py.File) – The destination HDF5-archive.

  • source_datasets (list) – List of dataset-paths in source.

  • dest_datasets (list) – List of dataset-paths in dest, defaults to source_datasets.

  • root – Path prefix for all dest_datasets.

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.

GooseHDF5.compare(a: Union[str, h5py._hl.files.File], b: Union[str, h5py._hl.files.File], paths_a: list[str] = None, paths_b: list[str] = None, attrs: bool = True, matching_dtype: bool = False)

Compare two files. Return a dictionary with differences:

{
    "->" : ["/path/in/b/but/not/in/a", ...],
    "<-" : ["/path/in/a/but/not/in/b", ...],
    "!=" : ["/path/in/both/but/different/data", ...],
    "==" : ["/data/matching", ...]
}

Parameters
  • a – HDF5-archive (as opened h5py.File or with the filepath).

  • b – HDF5-archive (as opened h5py.File or with the filepath).

  • paths_a – Paths from a to consider. Default: read from getdatapaths().

  • paths_b – Paths from b to consider. Default: read from getdatapaths().

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.

Returns

Dictionary with differences.
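The following sketch makes the four keys concrete by classifying paths the same way. It assumes each file is replaced by a plain dict that maps paths to values; this is a hypothetical stand-in, because the real function reads the HDF5-archives and, with attrs=True, also compares attributes. With matching_dtype=True, the sketch also requires the Python types to match, as a stand-in for the HDF5 dtype.

```python
def compare_sketch(a: dict, b: dict, matching_dtype: bool = False) -> dict:
    """Classify paths of two {path: value} mappings into "->", "<-", "!=", "=="."""
    result = {"->": [], "<-": [], "!=": [], "==": []}
    for path in sorted(set(a) | set(b)):
        if path not in a:
            result["->"].append(path)  # present in b only
        elif path not in b:
            result["<-"].append(path)  # present in a only
        elif a[path] != b[path]:
            result["!="].append(path)  # data differs
        elif matching_dtype and type(a[path]) is not type(b[path]):
            result["!="].append(path)  # same value, different type
        else:
            result["=="].append(path)
    return result


a = {"/x": 1, "/y": 2, "/z": 3}
b = {"/y": 2, "/z": 4, "/w": 5}
print(compare_sketch(a, b))
# {'->': ['/w'], '<-': ['/x'], '!=': ['/z'], '==': ['/y']}
```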

GooseHDF5.compare_rename(a: h5py._hl.files.File, b: h5py._hl.files.File, rename: Optional[list[str]] = None, paths_a: Optional[list[str]] = None, paths_b: Optional[list[str]] = None, attrs: bool = True, matching_dtype: bool = False)

Compare two files. Return three dictionaries with differences:

# plain comparison between a and b

{
    "->" : ["/path/in/b/but/not/in/a", ...],
    "<-" : ["/path/in/a/but/not/in/b", ...],
    "!=" : ["/path/in/both/but/different/data", ...],
    "==" : ["/data/matching", ...]
}

# comparison of renamed paths: list of paths in a

{
    "!=" : ["/path/in/a/with/rename/path/not_equal", ...],
    "==" : ["/path/in/a/with/rename/path/matching", ...]
}

# comparison of renamed paths: list of paths in b

{
    "!=" : ["/path/in/b/with/rename/path/not_equal", ...],
    "==" : ["/path/in/b/with/rename/path/matching", ...]
}

Parameters
  • a – HDF5-archive (as opened h5py.File or with the filepath).

  • b – HDF5-archive (as opened h5py.File or with the filepath).

  • rename – List of renamed pairs (path in a, path in b): [["/a/0", "/b/1"], ...].

  • paths_a – Paths from a to consider. Default: read from getdatapaths().

  • paths_b – Paths from b to consider. Default: read from getdatapaths().

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.
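The second and third dictionaries can be illustrated with a small sketch. It assumes that each rename pair is [path in a, path in b], that both paths exist, and that the files are again replaced by plain {path: value} dicts. compare_rename_sketch is hypothetical and leaves out the first (plain) comparison.

```python
def compare_rename_sketch(a: dict, b: dict, rename: list) -> tuple:
    """Return ({"!=": [...], "==": [...]} for paths in a, the same for paths in b)."""
    in_a = {"!=": [], "==": []}
    in_b = {"!=": [], "==": []}
    for path_a, path_b in rename:
        # A pair is "==" when the data under the old and the new name agree.
        key = "==" if a[path_a] == b[path_b] else "!="
        in_a[key].append(path_a)
        in_b[key].append(path_b)
    return in_a, in_b


a = {"/a/0": 1, "/a/1": 2}
b = {"/b/1": 1, "/b/2": 3}
print(compare_rename_sketch(a, b, [["/a/0", "/b/1"], ["/a/1", "/b/2"]]))
# ({'!=': ['/a/1'], '==': ['/a/0']}, {'!=': ['/b/2'], '==': ['/b/1']})
```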

GooseHDF5.copy(source: h5py._hl.files.File, dest: h5py._hl.files.File, source_datasets: list[str], dest_datasets: Optional[list[str]] = None, root: Optional[str] = None, recursive: bool = True, skip: bool = False, expand_soft: bool = True)

Copy groups/datasets from one HDF5-archive source to another HDF5-archive dest. The datasets can be renamed by specifying a list of dest_datasets (whose entries should correspond to the source_datasets). In addition, a root (path prefix) for the names of the destination datasets can be specified.

Parameters
  • source – The source HDF5-archive.

  • dest – The destination HDF5-archive.

  • source_datasets – List of dataset-paths in source.

  • dest_datasets – List of dataset-paths in dest, defaults to source_datasets.

  • root – Path prefix for all dest_datasets.

  • recursive – If the source is a group, copy all objects within that group recursively.

  • skip – Skip datasets that are not present in source.

  • expand_soft – If True, copy the data underlying a soft link; if False, copy the soft link itself, pointing to the same path.
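The sketch below shows how source_datasets, dest_datasets and root combine into destination paths. It assumes that root is simply prefixed to each destination path and that a missing dest_datasets means "same path as in source". copy_targets is a hypothetical helper that only computes the (source, destination) pairs and does not touch any file.

```python
import posixpath
from typing import Optional


def copy_targets(source_datasets: list[str],
                 dest_datasets: Optional[list[str]] = None,
                 root: Optional[str] = None) -> list[tuple[str, str]]:
    """Pair each source path with the destination path it would be copied to."""
    if dest_datasets is None:
        dest_datasets = list(source_datasets)  # default: keep the same paths
    if len(dest_datasets) != len(source_datasets):
        raise ValueError("dest_datasets must correspond one-to-one to source_datasets")
    if root is not None:
        # Assumption: root is a plain prefix in front of every destination path.
        dest_datasets = [posixpath.join(root, path.lstrip("/")) for path in dest_datasets]
    return list(zip(source_datasets, dest_datasets))


print(copy_targets(["/a", "/b/c"], root="/backup"))
# [('/a', '/backup/a'), ('/b/c', '/backup/b/c')]
```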

GooseHDF5.copy_dataset(source, dest, paths, compress=False, double_to_float=False)

Copy a dataset from one file to another. This function also copies any attributes.

Parameters
  • source (h5py.File) – The source HDF5-archive.

  • dest (h5py.File) – The destination HDF5-archive.

  • paths (str, list) – (List of) HDF5-path(s) to copy.

  • compress (bool) – Compress the destination dataset(s).

  • double_to_float (bool) – Convert doubles to floats before copying.

GooseHDF5.copydatasets(source: h5py._hl.files.File, dest: h5py._hl.files.File, source_datasets: list[str], dest_datasets: Optional[list[str]] = None, root: Optional[str] = None)

Copy datasets from one HDF5-archive source to another HDF5-archive dest. The datasets can be renamed by specifying a list of dest_datasets (whose entries should correspond to the source_datasets). If the source is a Group object, by default all objects within that group will be copied recursively.

In addition, a root (path prefix) for the names of the destination datasets can be specified.

Parameters
  • source – The source HDF5-archive.

  • dest – The destination HDF5-archive.

  • source_datasets – List of dataset-paths in source.

  • dest_datasets – List of dataset-paths in dest, defaults to source_datasets.

  • root – Path prefix for all dest_datasets.

GooseHDF5.dump(file: h5py._hl.files.File, data: dict, root: str = '/')

Dump (nested) dictionary to file.
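The sketch below maps a nested dictionary to HDF5-paths in pure Python. It assumes that dump turns every nested dictionary into a group and every other value into a dataset under root. flatten_sketch is hypothetical: it only computes the paths and does not write to a file.

```python
import posixpath


def flatten_sketch(data: dict, root: str = "/") -> dict:
    """Map a nested dict onto {HDF5-path: leaf value}."""
    out = {}
    for key, value in data.items():
        path = posixpath.join(root, str(key))
        if isinstance(value, dict):
            out.update(flatten_sketch(value, path))  # nested dict -> group
        else:
            out[path] = value  # anything else -> dataset
    return out


print(flatten_sketch({"a": {"b": 1, "c": [1, 2]}, "d": 3}))
# {'/a/b': 1, '/a/c': [1, 2], '/d': 3}
```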

GooseHDF5.equal(source: h5py._hl.files.File, dest: h5py._hl.files.File, source_dataset: str, dest_dataset: Optional[str] = None, root: Optional[str] = None, attrs: bool = True, matching_dtype: bool = False)

Check that a dataset is equal in both files.

Parameters
  • source (h5py.File) – The source HDF5-archive.

  • dest (h5py.File) – The destination HDF5-archive.

  • source_dataset (str) – Path to the dataset in source.

  • dest_dataset (str) – Path to the dataset in dest, defaults to source_dataset.

  • root – Path prefix for dest_dataset.

  • attrs – Compare attributes (the same way as datasets).

  • matching_dtype – Check that not only the data but also the type matches.
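The effect of attrs and matching_dtype can be illustrated with a small sketch. It uses a hypothetical stand-in for a dataset: FakeDataset holds the data, a dtype name and an attrs dict. The sketch assumes that attributes are compared by value, like the data. It is not GooseHDF5's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDataset:
    """Hypothetical stand-in for an h5py.Dataset."""
    data: list
    dtype: str
    attrs: dict = field(default_factory=dict)


def equal_sketch(a: FakeDataset, b: FakeDataset,
                 attrs: bool = True, matching_dtype: bool = False) -> bool:
    if matching_dtype and a.dtype != b.dtype:
        return False  # equal values are not enough: the types must match too
    if a.data != b.data:
        return False
    if attrs and a.attrs != b.attrs:
        return False  # attributes are compared just like the data
    return True


x = FakeDataset([1, 2], "int64", {"unit": "m"})
y = FakeDataset([1, 2], "float64", {"unit": "mm"})
print(equal_sketch(x, y))                                    # False (attributes differ)
print(equal_sketch(x, y, attrs=False))                       # True
print(equal_sketch(x, y, attrs=False, matching_dtype=True))  # False
```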

GooseHDF5.exists(file, path)

Check if a path exists in the HDF5-archive.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • path (str) – HDF5-path.

GooseHDF5.exists_all(file, paths)

Check if all of the input paths exist in the HDF5-archive.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • paths (list) – List of HDF5-paths.

GooseHDF5.exists_any(file, paths)

Check if any of the input paths exists in the HDF5-archive.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • paths (list) – List of HDF5-paths.
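exists_any and exists_all amount to Python's any and all applied to membership tests. The minimal sketch below assumes that file supports the in operator, as an h5py.File does; a set of paths serves as a stand-in. With an empty paths list, the sketch returns False for exists_any and True for exists_all. This is the sketch's own behavior; the documentation above does not specify that case.

```python
from collections.abc import Container, Iterable


def exists_any_sketch(file: Container[str], paths: Iterable[str]) -> bool:
    return any(path in file for path in paths)  # stops at the first hit


def exists_all_sketch(file: Container[str], paths: Iterable[str]) -> bool:
    return all(path in file for path in paths)  # stops at the first miss


file = {"/a", "/b/c"}  # stand-in for an opened h5py.File
print(exists_any_sketch(file, ["/x", "/a"]))  # True
print(exists_all_sketch(file, ["/x", "/a"]))  # False
```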

GooseHDF5.filter_datasets(file, paths)

From a list of paths, filter out those that do not point to datasets.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • paths (list) – List of HDF5-paths.

Returns

Filtered paths.

GooseHDF5.getdatapaths(file, root: str = '/')

Get paths to all datasets and groups that contain attributes.

Parameters
  • file – A HDF5-archive.

  • root – Start at a certain point along the path-tree.

Returns

list[str].

GooseHDF5.getdatasets(file, root='/', max_depth=None, fold=None)

Iterator to traverse all datasets in a HDF5-archive. One can choose to fold (not traverse deeper than):

  • Groups deeper than a certain max_depth.

  • A (list of) specific group(s).

Parameters
  • file (h5py.File) – A HDF5-archive.

  • root (str) – Start at a certain point along the path-tree.

  • max_depth (int) – Set a maximum depth beyond which groups are folded.

  • fold (list) – Specify groups that are folded.

Returns

Iterator.

Example

Consider this file:

/path/to/first/a
/path/to/first/b
/data/c
/data/d
/e

Calling:

import h5py
import GooseHDF5

with h5py.File("...", "r") as file:
    for path in GooseHDF5.getdatasets(file, max_depth=2, fold="/data"):
        print(path)

Will print:

/path/to/...
/data/...
/e

The ... indicates that it concerns a folded group, not a dataset. Here, the first group was folded because of the maximum depth, and the second because it was specifically requested to be folded.
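The folding rules can be reproduced on a plain list of dataset paths. This is a sketch built on assumptions: the depth of a dataset is taken to be the number of groups above it (which reproduces the example above), and fold may be a single path or a list. fold_paths is a hypothetical generator that works on strings rather than on an opened file.

```python
from typing import Iterable, Iterator, Optional, Union


def fold_paths(paths: Iterable[str], max_depth: Optional[int] = None,
               fold: Union[str, list[str], None] = None) -> Iterator[str]:
    """Yield dataset paths, replacing folded groups by "<group>/..."."""
    groups = [fold] if isinstance(fold, str) else list(fold or [])
    groups = [group.rstrip("/") for group in groups]
    seen = set()
    for path in paths:
        shown = path
        for group in groups:
            if path.startswith(group + "/"):
                shown = group + "/..."  # explicitly requested fold
                break
        else:
            parts = path.strip("/").split("/")
            # len(parts) - 1 groups sit above the dataset itself.
            if max_depth is not None and len(parts) - 1 > max_depth:
                shown = "/" + "/".join(parts[:max_depth] + ["..."])
        if shown not in seen:  # report each folded group only once
            seen.add(shown)
            yield shown


paths = ["/path/to/first/a", "/path/to/first/b", "/data/c", "/data/d", "/e"]
for path in fold_paths(paths, max_depth=2, fold="/data"):
    print(path)
# /path/to/...
# /data/...
# /e
```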

GooseHDF5.getgroups(file: h5py._hl.files.File, root: str = '/', has_attrs: bool = False, max_depth: Optional[int] = None) → list[str]

Paths of all groups in a HDF5-archive.

Parameters
  • file – A HDF5-archive.

  • root – Start at a certain point along the path-tree.

  • has_attrs – Return only groups that have attributes.

  • max_depth (int) – Set a maximum depth beyond which groups are folded.

Returns

list[str].

GooseHDF5.getpaths(data, root='/', max_depth=None, fold=None)

Iterator to traverse all datasets in a HDF5-archive. One can choose to fold (not traverse deeper than):

  • Groups deeper than a certain max_depth.

  • A (list of) specific group(s).

Parameters
  • data (h5py.File) – A HDF5-archive.

  • root (str) – Start at a certain point along the path-tree.

  • max_depth (int) – Set a maximum depth beyond which groups are folded.

  • fold (list) – Specify groups that are folded.

Returns

Iterator.

Example

Consider this file:

/path/to/first/a
/path/to/first/b
/data/c
/data/d
/e

Calling:

import h5py
import GooseHDF5

with h5py.File('...', 'r') as data:
    for path in GooseHDF5.getpaths(data, max_depth=2, fold='/data'):
        print(path)

Will print:

/path/to/...
/data/...
/e

The ... indicates that it concerns a folded group, not a dataset. Here, the first group was folded because of the maximum depth, and the second because it was specifically requested to be folded.

GooseHDF5.isnumeric(a)

Returns True if an array contains numeric values.

Parameters

a (array) – An array.

Returns

bool

GooseHDF5.join(*args, root=False)

Join path components.

Parameters

args (list) – Pieces of a path.

Returns

The concatenated path.
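Below is a minimal sketch of joining path components. It assumes that slashes around every component after the first are stripped, so a leading "/" does not reset the path as it would with plain posixpath.join. It also assumes that root=True forces the result to start at "/". join_sketch is hypothetical, and the meaning of root in GooseHDF5.join is an assumption here.

```python
import posixpath


def join_sketch(*args: str, root: bool = False) -> str:
    """Hypothetical stand-in for GooseHDF5.join."""
    # Strip slashes from all but the first component so that an absolute-looking
    # component does not discard everything before it.
    parts = [args[0]] + [arg.strip("/") for arg in args[1:]]
    if root:
        return posixpath.join("/", *parts)  # assumption: root=True anchors at "/"
    return posixpath.join(*parts)


print(join_sketch("a", "/b/", "c"))       # a/b/c
print(join_sketch("a", "b", root=True))   # /a/b
```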

GooseHDF5.print_attribute(source, paths: list[str])

Print the paths to datasets and to all their underlying attributes.

Parameters

paths (list[str]) – List of paths.

GooseHDF5.print_info(source, paths: list[str])

Print the paths to all datasets (one per line), including type information.

Parameters

paths (list[str]) – List of paths.

GooseHDF5.print_plain(source, paths: list[str], show_links: bool = False)

Print the paths to all datasets (one per line).

Parameters
  • paths (list[str]) – List of paths.

  • show_links (bool) – Show the path the link points to.

GooseHDF5.verify(file, datasets, error=False)

Try reading each dataset.

Parameters
  • file (h5py.File) – A HDF5-archive.

  • datasets (list) – List of HDF5-paths to datasets.

  • error (bool) –

    • If True, the function raises an error if reading failed.

    • If False, the function just continues.

Returns

List with only those datasets that can be successfully opened.
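The error switch can be sketched as a try/except loop. The sketch assumes that "reading" a dataset is delegated to a callable read(path); with h5py, something like read=lambda path: file[path][...] could play that role. verify_sketch is hypothetical, and a dict lookup stands in for the file here.

```python
from typing import Callable, Iterable


def verify_sketch(read: Callable[[str], object], datasets: Iterable[str],
                  error: bool = False) -> list[str]:
    """Return the paths that could be read; optionally re-raise the first failure."""
    readable = []
    for path in datasets:
        try:
            read(path)
        except Exception:
            if error:
                raise  # error=True: stop at the first unreadable dataset
            continue  # error=False: skip it and keep going
        readable.append(path)
    return readable


store = {"/a": 1, "/c": 3}  # stand-in for an HDF5-archive
print(verify_sketch(store.__getitem__, ["/a", "/b", "/c"]))  # ['/a', '/c']
```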