hdf5storage¶
Module to read and write python types to/from HDF5.
This is the hdf5storage package, a Python package to read and write Python data types to/from HDF5 (Hierarchical Data Format) files, beyond just Numpy types.
Version 0.2
write – Writes one piece of data into an HDF5 file.
writes – Writes data into an HDF5 file.
read – Reads one piece of data from an HDF5 file.
reads – Reads pieces of data from an HDF5 file.
savemat – Save a dictionary of python objects to a MATLAB MAT file.
loadmat – Loads data from a MATLAB MAT file.
get_default_MarshallerCollection – Gets the default MarshallerCollection.
make_new_default_MarshallerCollection – Makes a new default MarshallerCollection.
File – Wrapper that allows writing and reading data from an HDF5 file.
Options – Set of options governing how data is read/written to/from disk.
MarshallerCollection – Represents, maintains, and retrieves a set of marshallers.
write¶
hdf5storage.write(data, path='/', **keywords)[source]¶
Writes one piece of data into an HDF5 file.
Wrapper around File and File.write. Specifically, this function is

>>> with File(writable=True, **keywords) as f:
...     f.write(data, path)
- Parameters
data (any) – The python object to write.
path (str or bytes or pathlib.PurePath or Iterable, optional) – The path to write the data to. str and bytes paths must be POSIX style. The directory name is the Group to put it in and the basename is the Dataset name to write it to. The default is '/'.
**keywords – Extra keyword arguments to pass to File.
- Raises
TypeError – If an argument has an invalid type.
ValueError – If an argument has an invalid value.
IOError – If the file cannot be opened or some other file operation failed.
NotImplementedError – If writing data is not supported.
exceptions.TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and the action_for_matlab_incompatible option is set to 'error'.
writes¶
hdf5storage.writes(mdict, **keywords)[source]¶
Writes data into an HDF5 file.
Wrapper around File and File.writes. Specifically, this function is

>>> with File(writable=True, **keywords) as f:
...     f.writes(mdict)
- Parameters
mdict (Mapping) – The dict or other dictionary type object of paths and data to write to the file. The paths are the keys (str and bytes paths must be POSIX style) where the directory name is the Group to put it in and the basename is the name to write it to. The values are the data to write.
**keywords – Extra keyword arguments to pass to File.
- Raises
TypeError – If an argument has an invalid type.
ValueError – If an argument has an invalid value.
IOError – If the file cannot be opened or some other file operation failed.
NotImplementedError – If writing anything in mdict is not supported.
exceptions.TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and the action_for_matlab_incompatible option is set to 'error'.
read¶
hdf5storage.read(path='/', **keywords)[source]¶
Reads one piece of data from an HDF5 file.
Wrapper around File and File.read, with the exception that the matlab_compatible option is set to False if it isn't given explicitly. Specifically, this function does

>>> if ('matlab_compatible' in keywords
...         or ('options' in keywords
...             and keywords['options'] is not None)):
...     extra_kws = dict()
... else:
...     extra_kws = {'matlab_compatible': False}
>>> with File(writable=False, **extra_kws, **keywords) as f:
...     f.read(path)
- Parameters
path (str or bytes or pathlib.PurePath or Iterable, optional) – The path to read from. str and bytes paths must be POSIX style. The default is '/'.
**keywords – Extra keyword arguments to pass to File.
- Returns
data – The data that is read.
- Return type
any
- Raises
TypeError – If an argument has an invalid type.
ValueError – If an argument has an invalid value.
KeyError – If the path cannot be found.
IOError – If the file cannot be opened or some other file operation failed.
IOError – If the file is closed.
exceptions.CantReadError – If reading the data can’t be done.
reads¶
hdf5storage.reads(paths, **keywords)[source]¶
Reads pieces of data from an HDF5 file.
Wrapper around File and File.reads, with the exception that the matlab_compatible option is set to False if it isn't given explicitly. Specifically, this function does

>>> if ('matlab_compatible' in keywords
...         or ('options' in keywords
...             and keywords['options'] is not None)):
...     extra_kws = dict()
... else:
...     extra_kws = {'matlab_compatible': False}
>>> with File(writable=False, **extra_kws, **keywords) as f:
...     f.reads(paths)
- Parameters
paths (Iterable) – An iterable of paths to read data from. str and bytes paths must be POSIX style.
**keywords – Extra keyword arguments to pass to File.
- Returns
datas – An iterable holding the piece of data for each path in paths in the same order.
- Return type
iterable
- Raises
TypeError – If an argument has an invalid type.
ValueError – If an argument has an invalid value.
KeyError – If a path cannot be found.
IOError – If the file cannot be opened or some other file operation failed.
IOError – If the file is closed.
exceptions.CantReadError – If reading the data can’t be done.
savemat¶
hdf5storage.savemat(file_name, mdict, appendmat=True, format='7.3', oned_as='row', store_python_metadata=True, action_for_matlab_incompatible='error', marshaller_collection=None, truncate_existing=False, truncate_invalid_matlab=False, **keywords)[source]¶
Save a dictionary of python objects to a MATLAB MAT file.
Saves the data provided in the dictionary mdict to a MATLAB MAT file. format determines which kind/version of file to use. The '7.3' version, which is HDF5 based, is handled by this package and all types that this package can write are supported. Versions 4 and 5 are not HDF5 based, so everything is dispatched to the SciPy package's scipy.io.savemat function, which this function is modelled after (arguments not specific to this package have the same names, etc.).
- Parameters
file_name (str or file-like object) – Name of the MAT file to store in. The '.mat' extension is added on automatically if not present if appendmat is set to True. An open file-like object can be passed if the writing is being dispatched to SciPy (format < 7.3).
mdict (dict) – The dictionary of variables and their contents to store in the file.
appendmat (bool, optional) – Whether to append the '.mat' extension to file_name if it doesn't already end in it or not.
format ({'4', '5', '7.3'}, optional) – The MATLAB mat file format to use. The '7.3' format is handled by this package while the '4' and '5' formats are dispatched to SciPy.
oned_as ({'row', 'column'}, optional) – Whether 1D arrays should be turned into row or column vectors.
store_python_metadata (bool, optional) – Whether or not to store Python type information. Doing so allows most types to be read back perfectly. Only applicable if not dispatching to SciPy (format >= 7.3).
action_for_matlab_incompatible (str, optional) – The action to perform when writing data that is not MATLAB compatible. The actions are to write the data anyways ('ignore'), don't write the incompatible data ('discard'), or throw a TypeNotMatlabCompatibleError exception ('error').
marshaller_collection (MarshallerCollection, optional) – Collection of marshallers to disk to use. Only applicable if not dispatching to SciPy (format >= 7.3).
truncate_existing (bool, optional) – Whether to truncate the file if it already exists before writing to it.
truncate_invalid_matlab (bool, optional) – Whether to truncate a file if the file doesn't have the proper header (userblock in HDF5 terms) setup for MATLAB metadata to be placed.
**keywords – Additional keyword arguments to be passed onto scipy.io.savemat if dispatching to SciPy (format < 7.3).
- Raises
ImportError – If format < 7.3 and the scipy module can't be found.
NotImplementedError – If writing a variable in mdict is not supported.
exceptions.TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and action_for_matlab_incompatible is set to 'error'.
Notes
Writing the same data and then reading it back from disk using the HDF5 based version 7.3 format (the functions in this package) or the older format (SciPy functions) can lead to very different results. Each package supports a different set of data types and converts them to and from the same MATLAB types differently.
See also
loadmat()
Equivalent function to do reading.
scipy.io.savemat()
SciPy function this one models after and dispatches to.
writes()
Function used to do the actual writing.
loadmat¶
hdf5storage.loadmat(file_name, mdict=None, appendmat=True, variable_names=None, marshaller_collection=None, **keywords)[source]¶
Loads data from a MATLAB MAT file.
Reads data from the specified variables (or all) in a MATLAB MAT file. There are many different formats of MAT files. This package can only handle the HDF5 based ones (version 7.3 and later). As SciPy's scipy.io.loadmat function can handle the earlier formats, if this function cannot read the file, it will dispatch it onto the SciPy function with all the calling arguments it uses passed on. This function is modelled after the SciPy one (arguments not specific to this package have the same names, etc.).
Warning
Variables in variable_names that are missing from the file do not cause an exception and will just be missing from the output.
- Parameters
file_name (str) – Name of the MAT file to read from. The '.mat' extension is added on automatically if not present if appendmat is set to True.
mdict (dict, optional) – The dictionary to insert read variables into.
appendmat (bool, optional) – Whether to append the '.mat' extension to file_name if it doesn't already end in it or not.
variable_names (None or sequence, optional) – The variable names to read from the file. None selects all.
marshaller_collection (MarshallerCollection, optional) – Collection of marshallers from disk to use. Only applicable if not dispatching to SciPy (version 7.3 and newer files).
**keywords – Additional keyword arguments to be passed onto scipy.io.loadmat if dispatching to SciPy (the file is not a version 7.3 or later format).
- Returns
mdict – Dictionary of all the variables read from the MAT file (name as the key, and content as the value). If a variable was missing from the file, it will not be present here.
- Return type
dict
- Raises
ImportError – If it is not a version 7.3 .mat file and the scipy module can't be found when dispatching to SciPy.
KeyError – If a variable cannot be found.
exceptions.CantReadError – If reading the data can't be done.
Notes
Writing the same data and then reading it back from disk using the HDF5 based version 7.3 format (the functions in this package) or the older format (SciPy functions) can lead to very different results. Each package supports a different set of data types and converts them to and from the same MATLAB types differently.
See also
savemat()
Equivalent function to do writing.
scipy.io.loadmat()
SciPy function this one models after and dispatches to.
reads()
Function used to do the actual reading.
get_default_MarshallerCollection¶
hdf5storage.get_default_MarshallerCollection()[source]¶
Gets the default MarshallerCollection.
The initial default only includes the builtin marshallers in the Marshallers submodule.
- Returns
mc – The default MarshallerCollection.
- Return type
MarshallerCollection
Warning
Any changes made to mc after getting it will be persistent to future calls of this function until make_new_default_MarshallerCollection is called.
make_new_default_MarshallerCollection¶
hdf5storage.make_new_default_MarshallerCollection(*args, **keywords)[source]¶
Makes a new default MarshallerCollection.
Replaces the current default MarshallerCollection with a new one.
- Parameters
*args (positional arguments) – Positional arguments to use in creating the MarshallerCollection.
**keywords (keyword arguments) – Keyword arguments to use in creating the MarshallerCollection.
File¶
class hdf5storage.File(filename='data.h5', writable=False, truncate_existing=False, truncate_invalid_matlab=False, options=None, **keywords)[source]¶
Bases: collections.abc.MutableMapping
Wrapper that allows writing and reading data from an HDF5 file.
Opens an HDF5 file for reading (and optionally writing) Python objects from/to. The close method must be called to close the file. This class supports context handling with the with statement.
Python objects are read and written from/to paths. Paths can be given directly as POSIX style str or bytes, as pathlib.PurePath, or the separated path can be given as an iterable of str, bytes, and pathlib.PurePath. Each part of a separated path is escaped using pathesc.escape_path. Otherwise, the path is assumed to be already escaped. Escaping is done so that targets with a part that starts with one or more periods, contains slashes, and/or contains nulls can be used without causing the wrong Group to be looked in or the wrong target to be looked at. It essentially allows one to make a Dataset named '..' or 'a/a' instead of moving around in the Dataset hierarchy.
There are various options that can be used to influence how the data is read and written. They can be passed as an already constructed Options into options, or as additional keywords that will be used to make one by options = Options(**keywords).
Two very important options are store_python_metadata and matlab_compatible, which are bool. The first makes it so that enough metadata (HDF5 Attributes) are written that data can be read back accurately without it (or its contents if it is a container type) ending up as different types, transposed in the case of numpy arrays, etc. The latter makes it so that the appropriate metadata is written, string and bool and complex types are converted properly, and numpy arrays are transposed; which is needed to make sure that MATLAB can import data correctly (the HDF5 header is also set so MATLAB will recognize it).
This class is a MutableMapping, meaning that it supports many of the operations allowed on dict, including operations that modify it.
Example

>>> import hdf5storage
>>> with hdf5storage.File('data.h5', writable=True) as f:
...     f.write(4, '/a')
...     a = f.read('/a')
>>> a
4
Note
This class is threadsafe with respect to the threading module, but not the multiprocessing module.
Warning
The passed Options object is shallow copied, meaning that changes to the original will not affect an instance of this class, with the exception of changes within the marshaller_collection option, which is not deep copied. The marshaller_collection option should not be changed while a method of an instance of this class is running.
- Parameters
filename (str, optional) – The path to the HDF5 file to open. The default is 'data.h5'.
writable (bool, optional) – Whether writing should be allowed or not. The default is False (readonly).
truncate_existing (bool, optional) – If writable is True, whether to truncate the file if it already exists before writing to it.
truncate_invalid_matlab (bool, optional) – If writable is True, whether to truncate a file if MATLAB compatibility is being done and the file doesn't have the proper header (userblock in HDF5 terms) setup for MATLAB metadata to be placed.
options (Options or None, optional) – The options to use when reading and/or writing. Is mutually exclusive with any additional keyword arguments given (set to None or don't provide the argument at all to use them).
**keywords – If options was not provided or was None, these are used as arguments to make an Options.
- Raises
TypeError – If an argument has an invalid type.
ValueError – If an argument has an invalid value.
IOError – If the file cannot be opened or some other file operation failed.
- Variables
closed (bool) – Whether the file is closed or not.
__delitem__(path)[source]¶
Deletes one path from the file.
Deletes one location from the file specified by path.
- Parameters
path (str or bytes or pathlib.PurePath or Iterable) – The path to delete. str and bytes paths must be POSIX style. The directory name is the Group it is in and the basename is the Dataset/Group name to delete.
- Raises
__eq__(other)¶
Return self==value.
__getitem__(path)[source]¶
Reads the object at the specified path from the file.
A wrapper around the reads method to read a single piece of data at the single location path.
- Parameters
path (str or bytes or pathlib.PurePath or Iterable) – The path to read from. str and bytes paths must be POSIX style.
- Returns
data – The data that is read.
- Return type
any
- Raises
IOError – If the file is closed.
KeyError – If the path cannot be found.
exceptions.CantReadError – If reading the data can't be done.
__iter__()[source]¶
Get an Iterator over the names in the file root.
Warning
The names are returned as is, rather than unescaped. Use pathesc.unescape_path to unescape them.
- Returns
it – Iterator over the names of the objects in the file root.
- Return type
Iterator
- Raises
IOError – If the file is not open.
__ne__(other)¶
Return self!=value.
__setitem__(path, data)[source]¶
Writes one piece of data into the file.
A wrapper around the writes method to write a single piece of data, data, to a single location, path.
- Parameters
path (str or bytes or pathlib.PurePath or Iterable) – The path to write the data to. str and bytes paths must be POSIX style. The directory name is the Group to put it in and the basename is the Dataset/Group name to write it to.
data (any) – The python object to write.
- Raises
IOError – If the file is closed or it isn't writable.
TypeError – If path is an invalid type.
NotImplementedError – If writing data is not supported.
exceptions.TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and the action_for_matlab_incompatible option is set to 'error'.
clear() → None. Remove all items from D.¶
get(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
items() → a set-like object providing a view on D's items¶
keys() → a set-like object providing a view on D's keys¶
pop(k[, d]) → v, remove specified key and return the corresponding value.¶
If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.¶
read(path='/')[source]¶
Reads one piece of data from the file.
A wrapper around the reads method to read a single piece of data at the single location path.
- Parameters
path (str or bytes or pathlib.PurePath or Iterable, optional) – The path to read from. str and bytes paths must be POSIX style. The default is '/'.
- Returns
data – The data that is read.
- Return type
any
- Raises
IOError – If the file is closed.
KeyError – If the path cannot be found.
exceptions.CantReadError – If reading the data can't be done.
reads(paths)[source]¶
Read pieces of data from the file.
- Parameters
paths (Iterable) – An iterable of paths to read data from. str and bytes paths must be POSIX style.
- Returns
datas – An Iterable holding the piece of data for each path in paths in the same order.
- Return type
Iterable
- Raises
IOError – If the file is closed.
KeyError – If a path cannot be found.
exceptions.CantReadError – If reading the data can't be done.
setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
update([E, ]**F) → None. Update D from mapping/iterable E and F.¶
If E present and has a .keys() method, does: for k in E: D[k] = E[k]. If E present and lacks a .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v.
values() → an object providing a view on D's values¶
write(data, path='/')[source]¶
Writes one piece of data into the file.
A wrapper around the writes method to write a single piece of data, data, to a single location, path.
- Parameters
data (any) – The python object to write.
path (str or bytes or pathlib.PurePath or Iterable, optional) – The path to write the data to. str and bytes paths must be POSIX style. The directory name is the Group to put it in and the basename is the Dataset/Group name to write it to. The default is '/'.
- Raises
IOError – If the file is closed or it isn't writable.
TypeError – If path is an invalid type.
NotImplementedError – If writing data is not supported.
exceptions.TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and the action_for_matlab_incompatible option is set to 'error'.
writes(mdict)[source]¶
Write one or more pieces of data to the file.
Stores one or more python objects in mdict to the specified locations in the HDF5 file. The paths are specified as POSIX style paths where the directory name is the Group to put it in and the basename is the Dataset/Group name to write to.
- Parameters
mdict (Mapping) – A dict or similar Mapping type of paths and the data to write to the file. The paths are the keys (str and bytes paths must be POSIX style) where the directory name is the Group to put it in and the basename is the name to write it to. The values are the data to write.
- Raises
IOError – If the file is closed or it isn't writable.
TypeError – If a path in mdict is an invalid type.
NotImplementedError – If writing an object in mdict is not supported.
exceptions.TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and the action_for_matlab_incompatible option is set to 'error'.
Options¶
class hdf5storage.Options(store_python_metadata=True, matlab_compatible=True, action_for_matlab_incompatible='error', delete_unused_variables=False, structured_numpy_ndarray_as_struct=False, make_atleast_2d=False, convert_numpy_bytes_to_utf16=False, convert_numpy_str_to_utf16=False, convert_bools_to_uint8=False, reverse_dimension_order=False, structs_as_dicts=False, store_shape_for_empty=False, complex_names=('r', 'i'), group_for_references='/#refs#', oned_as='row', dict_like_keys_name='keys', dict_like_values_name='values', compress=True, compress_size_threshold=16384, compression_algorithm='gzip', gzip_compression_level=7, shuffle_filter=True, compressed_fletcher32_filter=True, uncompressed_fletcher32_filter=False, marshaller_collection=None, **keywords)[source]¶
Bases: object
Set of options governing how data is read/written to/from disk.
There are many ways that data can be transformed as it is read or written from a file, and many attributes can be used to describe the data depending on its format. The option with the most effect is the matlab_compatible option. It makes sure that the file is compatible with MATLAB’s HDF5 based version 7.3 mat file format. It overrides several options to the values in the following table.
attribute                              value
delete_unused_variables                True
structured_numpy_ndarray_as_struct     True
make_atleast_2d                        True
convert_numpy_bytes_to_utf16           True
convert_numpy_str_to_utf16             True
convert_bools_to_uint8                 True
reverse_dimension_order                True
store_shape_for_empty                  True
complex_names                          ('real', 'imag')
group_for_references                   '/#refs#'
compression_algorithm                  'gzip'
In addition to setting these options, a specially formatted block of bytes is put at the front of the file so that MATLAB can recognize its format.
- Parameters
store_python_metadata (bool, optional) – See Attributes.
matlab_compatible (bool, optional) – See Attributes.
action_for_matlab_incompatible (str, optional) – See Attributes. Only valid values are ‘ignore’, ‘discard’, and ‘error’.
delete_unused_variables (bool, optional) – See Attributes.
structured_numpy_ndarray_as_struct (bool, optional) – See Attributes.
make_atleast_2d (bool, optional) – See Attributes.
convert_numpy_bytes_to_utf16 (bool, optional) – See Attributes.
convert_numpy_str_to_utf16 (bool, optional) – See Attributes.
convert_bools_to_uint8 (bool, optional) – See Attributes.
reverse_dimension_order (bool, optional) – See Attributes.
store_shape_for_empty (bool, optional) – See Attributes.
complex_names (tuple of two str, optional) – See Attributes.
group_for_references (str, optional) – See Attributes.
oned_as (str, optional) – See Attributes.
dict_like_keys_name (str, optional) – See Attributes.
dict_like_values_name (str, optional) – See Attributes.
compress (bool, optional) – See Attributes.
compress_size_threshold (int, optional) – See Attributes.
compression_algorithm (str, optional) – See Attributes.
gzip_compression_level (int, optional) – See Attributes.
shuffle_filter (bool, optional) – See Attributes.
compressed_fletcher32_filter (bool, optional) – See Attributes.
uncompressed_fletcher32_filter (bool, optional) – See Attributes.
marshaller_collection (MarshallerCollection, optional) – See Attributes.
**keywords – Additional keyword arguments. They are ignored. They are allowed to be given to be more compatible with future versions of this package where more options will be added.
- Variables
make_atleast_2d (bool)
complex_names (tuple of two str)
oned_as ({'row', 'column'})
compression_algorithm ({'gzip', 'lzf', 'szip'})
shuffle_filter (bool)
marshaller_collection (MarshallerCollection) – Collection of marshallers to disk.
property action_for_matlab_incompatible¶
The action to do when writing non-MATLAB compatible data.
{'ignore', 'discard', 'error'}
The action to perform when doing MATLAB compatibility but a type being written is not MATLAB compatible. The actions are to write the data anyways ('ignore'), don't write the incompatible data ('discard'), or throw a TypeNotMatlabCompatibleError exception ('error'). The default is 'error'.
property complex_names¶
Names to use for the real and imaginary fields.
tuple of two str
(r, i) where r and i are two str. When reading and writing complex numbers, the real part gets the name in r and the imaginary part gets the name in i. h5py uses ('r', 'i') by default, unless MATLAB compatibility is being done, in which case its default is ('real', 'imag').
Must be ('real', 'imag') if doing MATLAB compatibility.
property compress¶
Whether to compress large python objects (datasets).
bool
If True, python objects (datasets) larger than compress_size_threshold will be compressed.
property compress_size_threshold¶
Minimum size of a python object before it is compressed.
int
Minimum size in bytes a python object must be for it to be compressed if compress is set. Must be non-negative.
property compressed_fletcher32_filter¶
Whether to use the fletcher32 filter on compressed python objects.
bool
If True, python objects (datasets) that are compressed are run through the fletcher32 filter, which stores a checksum with each chunk so that data corruption can be more easily detected.
See also
compress, shuffle_filter, uncompressed_fletcher32_filter, h5py.Group.create_dataset
property compression_algorithm¶
Algorithm to use for compression.
{'gzip', 'lzf', 'szip'}
Compression algorithm to use when the compress option is set and a python object is larger than compress_size_threshold. 'gzip' is the only MATLAB compatible option.
'gzip' is also known as the Deflate algorithm, which is the default compression algorithm of ZIP files and is a common compression algorithm used on tarballs. It is the most compatible option. It has good compression and is reasonably fast. Its compression level is set with the gzip_compression_level option, which is an integer between 0 and 9 inclusive.
'lzf' is a very fast but low to moderate compression algorithm. It is less commonly used than gzip/Deflate, but doesn't have any patent or license issues.
'szip' is a compression algorithm that has some patents and license restrictions. It is not always available.
See also
compress, compress_size_threshold, h5py.Group.create_dataset
http://www.hdfgroup.org/doc_resource/SZIP/Commercial_szip.html
property convert_bools_to_uint8¶
Whether or not to convert bools to numpy.uint8.
bool
If True (defaults to False unless MATLAB compatibility is being done), bool types are converted to numpy.uint8 before being written to file.
Must be True if doing MATLAB compatibility. MATLAB doesn't use the enums that h5py wants to use by default and also uses uint8 instead of int8.
property convert_numpy_bytes_to_utf16¶
Whether or not to convert numpy.bytes_ to UTF-16.
bool
If True (defaults to False unless MATLAB compatibility is being done), numpy.bytes_ and anything that is converted to them (bytes and bytearray) are converted to UTF-16 before being written to file as numpy.uint16.
Must be True if doing MATLAB compatibility. MATLAB uses UTF-16 for its strings.
See also
numpy.bytes_, convert_numpy_str_to_utf16
property convert_numpy_str_to_utf16¶
Whether or not to convert numpy.unicode_ to UTF-16.
bool
If True (defaults to False unless MATLAB compatibility is being done), numpy.unicode_ and anything that is converted to them (str) will be converted to UTF-16 if possible before being written to file as numpy.uint16. If doing so would lead to a loss of data (a character can't be translated to UTF-16) or would change the shape of an array of numpy.unicode_ due to a character being converted into a pair of 2-bytes, the conversion will not be made and the string will be stored in UTF-32 form as a numpy.uint32.
Must be True if doing MATLAB compatibility. MATLAB uses UTF-16 for its strings.
See also
numpy.unicode_, convert_numpy_bytes_to_utf16
property delete_unused_variables¶
Whether or not to delete file variables not written to.
bool
If True (defaults to False unless MATLAB compatibility is being done), variables in the file below where writing starts that are not written to are deleted.
Must be True if doing MATLAB compatibility.
property dict_like_keys_name¶
The Dataset name for the keys of dict like objects.
str
When a dict like object has at least one key that isn't a str or is a str with invalid characters, the objects are stored as an array of keys and an array of values. This option sets the name of the Dataset for the keys.
New in version 0.2.
property dict_like_values_name¶
The Dataset name for the values of dict like objects.
str
When a dict like object has at least one key that isn't a str or is a str with invalid characters, the objects are stored as an array of keys and an array of values. This option sets the name of the Dataset for the values.
New in version 0.2.
property group_for_references¶
Path for where to put objects pointed at by references.
str
The absolute POSIX path for the Group to place all data that is pointed to by another piece of data (needed for numpy.object_ and similar types). This path is automatically excluded from its parent group when reading back a dict.
Must be '/#refs#' if doing MATLAB compatibility.
Must already be escaped.
property gzip_compression_level¶
The compression level to use when doing the gzip algorithm.
int
Compression level to use when data is being compressed with the 'gzip' algorithm. Must be an integer between 0 and 9 inclusive. Lower values are faster while higher values give better compression.
property make_atleast_2d¶
Whether or not to convert scalar types to 2D arrays.
bool
If True (defaults to False unless MATLAB compatibility is being done), all scalar types are converted to 2D arrays when written to file. oned_as determines whether 1D arrays are turned into row or column vectors.
Must be True if doing MATLAB compatibility. MATLAB can only import 2D and higher dimensional arrays.
property marshaller_collection¶
The MarshallerCollection to use.
MarshallerCollection
The MarshallerCollection (collection of marshallers to disk) to use. The default is to use the default one from get_default_MarshallerCollection.
Warning
This property does NOT return a copy.
-
property
matlab_compatible
¶ Whether or not to make the file compatible with MATLAB.
bool
If
True
(default), data is written to file in such a way that it compatible with MATLAB’s version 7.3 mat file format which is HDF5 based. Setting it toTrue
forces other options to hold the specific values in the table below.attribute
value
delete_unused_variables
True
structured_numpy_ndarray_as_struct
True
make_atleast_2d
True
convert_numpy_bytes_to_utf16
True
convert_numpy_str_to_utf16
True
convert_bools_to_uint8
True
reverse_dimension_order
True
store_shape_for_empty
True
complex_names
('real', 'imag')
group_for_references
'/#refs#'
compression_algorithm
'gzip'
In addition to setting these options, a specially formatted block of bytes is put at the front of the file so that MATLAB can recognize its format.
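As a mental model, setting this option can be pictured as merging the table above on top of whatever values the options held before. A minimal sketch (the dict and helper below are illustrative only, not hdf5storage internals):

```python
# Hypothetical model of the forced values in the table above.
MATLAB_OVERRIDES = {
    'delete_unused_variables': True,
    'structured_numpy_ndarray_as_struct': True,
    'make_atleast_2d': True,
    'convert_numpy_bytes_to_utf16': True,
    'convert_numpy_str_to_utf16': True,
    'convert_bools_to_uint8': True,
    'reverse_dimension_order': True,
    'store_shape_for_empty': True,
    'complex_names': ('real', 'imag'),
    'group_for_references': '/#refs#',
    'compression_algorithm': 'gzip',
}

def apply_matlab_compatibility(options):
    """Return a copy of ``options`` with the MATLAB-forced values applied."""
    merged = dict(options)
    merged.update(MATLAB_OVERRIDES)
    return merged

# Forced options win; unrelated options pass through untouched.
opts = apply_matlab_compatibility({'compression_algorithm': 'lzf', 'oned_as': 'row'})
assert opts['compression_algorithm'] == 'gzip'
assert opts['oned_as'] == 'row'
```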
-
property
oned_as
¶ Vector that 1D arrays become when making everything >= 2D.
{‘row’, ‘column’}
When the
make_atleast_2d
option is set (set implicitly by doing MATLAB compatibility), this option controls whether 1D arrays become row vectors or column vectors.
See also
-
property
reverse_dimension_order
¶ Whether or not to reverse the order of array dimensions.
bool
If
True
(defaults to False
unless MATLAB compatibility is being done), the dimension order of numpy.ndarray
and numpy.matrix
is reversed. This switches them from C ordering to Fortran ordering. The switch of ordering is essentially a transpose.
Must be
True
if doing MATLAB compatibility. MATLAB uses Fortran ordering.
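Why reversing the dimension order amounts to a transpose can be checked with flat-offset arithmetic: the Fortran-order offset of an element in the original shape equals the C-order offset of the reversed index in the reversed shape. A stdlib sketch (no numpy or hdf5storage required):

```python
def c_order_index(idx, shape):
    """Flat offset of multi-index ``idx`` in a C-ordered (row-major) array."""
    flat, stride = 0, 1
    for i, n in zip(reversed(idx), reversed(shape)):
        flat += i * stride
        stride *= n
    return flat

shape = (2, 3)
for i in range(shape[0]):
    for j in range(shape[1]):
        # Fortran offset in (2, 3) == C offset of the swapped index in (3, 2),
        # i.e. reversing dimensions transposes the memory layout.
        fortran = i + j * shape[0]
        assert fortran == c_order_index((j, i), shape[::-1])
```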
-
property
shuffle_filter
¶ Whether to use the shuffle filter on compressed python objects.
bool
If
True
, python objects (datasets) that are compressed are run through the shuffle filter, which reversibly rearranges the data to improve compression.
See also
compress, h5py.Group.create_dataset
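The shuffle filter itself is easy to sketch: for elements of a fixed byte width, it regroups the k-th byte of every element together, so slowly-varying high bytes form long runs the compressor handles well. A pure-Python illustration (not the actual HDF5 filter implementation):

```python
def shuffle(data: bytes, itemsize: int) -> bytes:
    """Byte-shuffle: gather the k-th byte of every element together."""
    n = len(data) // itemsize
    return bytes(data[i * itemsize + k] for k in range(itemsize) for i in range(n))

def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of ``shuffle``: reassemble each element's bytes in order."""
    n = len(data) // itemsize
    return bytes(data[k * n + i] for i in range(n) for k in range(itemsize))

# Three little-endian 32-bit integers: the high (mostly zero) bytes end up
# adjacent after shuffling, and the transform is fully reversible.
raw = b''.join(v.to_bytes(4, 'little') for v in (1001, 1002, 1003))
assert unshuffle(shuffle(raw, 4), 4) == raw
```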
-
property
store_python_metadata
¶ Whether or not to store Python metadata.
bool
If
True
(default), information on the Python type for each object written to disk is put in its attributes so that it can be read back into Python as the same type.
-
property
store_shape_for_empty
¶ Whether to write the shape if an object has no elements.
bool
If
True
(defaults to False
unless MATLAB compatibility is being done), objects that have no elements (e.g. a 0x0x2 array) will have their shape (an array of the number of elements along each axis) written to disk instead of nothing being written.
Must be
True
if doing MATLAB compatibility. For empty arrays, MATLAB requires that the shape array be written in its place along with the attribute ‘MATLAB_empty’ set to 1 to flag it.
-
property
structs_as_dicts
¶ Whether MATLAB structs should be read as dicts.
bool
Setting this to
True
can be helpful if your structures contain very large arrays such that their dtypes, if converted to np.ndarray
, would exceed the 2GB maximum allowed by NumPy.
-
property
structured_numpy_ndarray_as_struct
¶ Whether or not to convert structured ndarrays to structs.
bool
If
True
(defaults to False
unless MATLAB compatibility is being done), all numpy.ndarray
with fields (compound dtypes) are written as HDF5 Groups with the fields as Datasets (corresponding to struct arrays in MATLAB).
Must be
True
if doing MATLAB compatibility. MATLAB cannot handle the compound types made by writing these types.
-
property
uncompressed_fletcher32_filter
¶ Whether to use the fletcher32 filter on uncompressed non-scalar python objects.
bool
If
True
, python objects (datasets) that are NOT compressed and are not scalars (when converted to a Numpy type, their shape is not an empty tuple
) are run through the fletcher32 filter, which stores a checksum with each chunk so that data corruption can be more easily detected. This forces all uncompressed data to be chunked regardless of how small it is, and can increase file sizes.
See also
compress, shuffle_filter, compressed_fletcher32_filter, h5py.Group.create_dataset
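For intuition, a Fletcher-32-style checksum over 16-bit words takes only a few lines. This is a sketch of the checksum family the filter uses; HDF5's exact variant differs in details such as odd-length and byte-order handling:

```python
def fletcher32(data: bytes) -> int:
    """Fletcher-32 over 16-bit little-endian words (illustrative sketch;
    not bit-identical to HDF5's filter)."""
    if len(data) % 2:
        data += b'\x00'  # pad odd-length input to a whole word
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        s1 = (s1 + int.from_bytes(data[i:i + 2], 'little')) % 65535
        s2 = (s2 + s1) % 65535  # running sum of sums makes order matter
    return (s2 << 16) | s1

payload = b'hdf5storage'
good = fletcher32(payload)
# A single flipped byte changes the checksum, so corruption is detectable.
assert fletcher32(b'hdf5storagf') != good
```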
MarshallerCollection¶
-
class
hdf5storage.
MarshallerCollection
(load_plugins=False, lazy_loading=True, priority=('builtin', 'plugin', 'user'), marshallers=[])[source]¶ Bases:
object
Represents, maintains, and retrieves a set of marshallers.
Maintains a list of marshallers used to marshal data types to and from HDF5 files. It includes the builtin marshallers from the
hdf5storage.Marshallers
module, optionally any marshallers from installed third party plugins, as well as any user supplied or added marshallers. While the builtin list cannot be changed, user ones can be added or removed. It also has functions to get the marshaller appropriate for the type
or type_string for a python data type.
User marshallers must inherit from
hdf5storage.Marshallers.TypeMarshaller
and provide its interface.
The priority with which marshallers are chosen (builtin, plugin, or user) can be set using the priority option. Within marshallers from third party plugins, those supporting higher Marshaller API versions take priority over those supporting lower versions.
Changed in version 0.2: All marshallers must now inherit from
hdf5storage.Marshallers.TypeMarshaller
.
New in version 0.2: Marshallers can be loaded from third party plugins that declare the
'hdf5storage.marshallers.plugins'
entry point.
Changed in version 0.2: The order of marshaller priority (builtin, plugin, or user) can be changed. The default is now builtin, plugin, user, whereas previously it was user, builtin.
- Parameters
load_plugins (bool, optional) – Whether to load marshallers from the third party plugins or not. Default is
False
.
lazy_loading (bool, optional) – Whether to attempt to load the required modules for each marshaller right away when added/given, or to only do so when required (when the marshaller is needed). Default is
True
.
priority (Sequence, optional) – 3-element Sequence specifying the priority ordering (first has highest priority). The three elements must be
'builtin'
for the builtin marshallers included in this package, 'plugin'
for marshallers provided by other python packages via plugin, and 'user'
for marshallers provided to this class explicitly during creation. The default priority order is builtin, plugin, user.
marshallers (marshaller or Iterable of marshallers, optional) – The user marshaller(s) to add to the collection. Must inherit from
hdf5storage.Marshallers.TypeMarshaller
.
- Variables
priority (tuple of str) –
- Raises
TypeError – If one of the arguments is the wrong type.
ValueError – If one of the arguments has an invalid value.
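The priority mechanism can be pictured as a first-match search across three registries; a toy sketch with placeholder marshaller names (illustrative only, not hdf5storage internals):

```python
# Hypothetical registries mapping Python types to marshaller names.
registries = {
    'builtin': {dict: 'BuiltinDictMarshaller', list: 'BuiltinListMarshaller'},
    'plugin':  {dict: 'PluginDictMarshaller'},
    'user':    {list: 'UserListMarshaller'},
}

def lookup(tp, priority=('builtin', 'plugin', 'user')):
    """Return the first marshaller found for ``tp``, searching in priority order."""
    for source in priority:
        if tp in registries[source]:
            return registries[source][tp]
    return None

# With the default priority the builtin marshaller wins; reversing the
# priority lets a plugin or user marshaller shadow it.
assert lookup(dict) == 'BuiltinDictMarshaller'
assert lookup(dict, ('user', 'plugin', 'builtin')) == 'PluginDictMarshaller'
```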
-
add_marshaller
(marshallers)[source]¶ Add one or more marshallers to the user provided list.
Adds a marshaller or an Iterable of them to the user provided set of marshallers.
Changed in version 0.2: All marshallers must now inherit from
hdf5storage.Marshallers.TypeMarshaller
.
- Parameters
marshallers (marshaller or Iterable) – The user marshaller(s) to add to the user provided collection. Must inherit from
hdf5storage.Marshallers.TypeMarshaller
.
- Raises
TypeError – If one of the marshallers is the wrong type.
-
clear_marshallers
()[source]¶ Clears the list of user provided marshallers.
Removes all user provided marshallers, but not the builtin ones from the
hdf5storage.Marshallers
module or those from plugins, from the list of marshallers used.
-
get_marshaller_for_matlab_class
(matlab_class)[source]¶ Gets the appropriate marshaller for a MATLAB class string.
Retrieves the marshaller, if any, that can be used to read/write a Python object associated with the given MATLAB class string. The modules it requires, if available, will be loaded.
- Parameters
matlab_class (str) – MATLAB class string for a Python object.
- Returns
marshaller (marshaller or None) – The marshaller that can read/write the type to file.
None
if no appropriate marshaller is found.
has_required_modules (bool) – Whether the required modules for reading the type are present or not.
-
get_marshaller_for_type
(tp)[source]¶ Gets the appropriate marshaller for a type.
Retrieves the marshaller, if any, that can be used to read/write a Python object with type ‘tp’. The modules it requires, if available, will be loaded.
- Parameters
tp (type or str) – Python object
type
(which would be the class reference) or its string representation like 'collections.deque'
.
- Returns
marshaller (marshaller or None) – The marshaller that can read/write the type to file.
None
if no appropriate marshaller is found.
has_required_modules (bool) – Whether the required modules for reading the type are present or not.
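The string form accepted for tp can be resolved back to the class object with importlib; a small sketch of what "type or its string representation" implies (the helper is hypothetical, not part of the hdf5storage API):

```python
import collections
import importlib

def resolve_type(name: str) -> type:
    """Resolve a dotted type string like 'collections.deque' to the class
    object itself (illustrative helper)."""
    module_name, _, attr = name.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)

# The string and the class reference name the same type.
assert resolve_type('collections.deque') is collections.deque
```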
-
get_marshaller_for_type_string
(type_string)[source]¶ Gets the appropriate marshaller for a type string.
Retrieves the marshaller, if any, that can be used to read/write a Python object with the given type string. The modules it requires, if available, will be loaded.
- Parameters
type_string (str) – Type string for a Python object.
- Returns
marshaller (marshaller or None) – The marshaller that can read/write the type to file.
None
if no appropriate marshaller is found.
has_required_modules (bool) – Whether the required modules for reading the type are present or not.
-
property
priority
¶ The priority order when choosing the marshaller to use.
tuple of str
3-element
tuple
specifying the priority ordering (first has highest priority). The three elements are 'builtin'
for the builtin marshallers included in this package, 'plugin'
for marshallers provided by other python packages via plugin, and 'user'
for marshallers provided to this class explicitly during creation.