[hdf-forum] HDF5 Deleting Datasets AND Recovering Space
Quincey Koziol
koziol at hdfgroup.org
Tue Jun 2 14:54:59 EDT 2009
Hi Tom,
On Jun 2, 2009, at 1:35 PM, Tom wrote:
> I understand section 5.2 of the user guide (below) says that when one
> deletes groups/datasets using H5G.unlink, the space on disk is NOT
> recovered. This has come to be an issue in my application. We are
> working with very large datasets that after some time no longer need
> to be stored. However, with this limitation, it seems that my HDF5
> files are essentially equivalent to a very large CD-R from a storage
> perspective. I understand from the guide as well that I can recover
> space by copying the data over to a new file, but when my file size is
> several gigabytes, this can be a slow process.
>
> Has this not become an issue for other users and applications? I
> understand HDF5 is commonly used for oceanography and satellite
> imagery. Wouldn't these applications require intermittent deleting of
> data, especially in situations where the file size is on the order of
> terabytes?
You could use the 'h5repack' utility on your file, which might be an
OK solution for you. Also, the latest 1.8.x release (1.8.3 currently)
is much more efficient about recovering space in the file, but only
while the file remains open. The next major version of HDF5 (1.10.x) should have a
mechanism for persistent free space tracking, which will take even
more pressure off the problem. However, it's still possible that even
with persistent free space tracking the file will need to be repacked
if the internal fragmentation gets to be too large.
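
For illustration, here is a minimal Python sketch of the h5repack route
mentioned above. It assumes the 'h5repack' tool is on the PATH and that
there is enough free disk space for a second copy of the file. The
function names and the ".repacked" suffix are made up for this example;
only the basic "h5repack <infile> <outfile>" invocation comes from the
tool itself.

```python
import os
import shutil
import subprocess


def build_repack_command(src, dst, tool="h5repack"):
    """Return the argument list for: h5repack <src> <dst>."""
    return [tool, src, dst]


def repack_in_place(path, tool="h5repack"):
    """Rewrite `path` through h5repack, then swap the new copy in.

    Returns the number of bytes saved (can be zero or negative).
    Hypothetical helper: not part of HDF5 or any HDF5 binding.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")

    tmp = path + ".repacked"  # assumed temp-file naming, same directory
    size_before = os.path.getsize(path)

    # h5repack copies only reachable objects, so unlinked data is dropped.
    subprocess.run(build_repack_command(path, tmp, tool), check=True)

    size_after = os.path.getsize(tmp)
    os.replace(tmp, path)  # replaces the original with the compacted copy
    return size_before - size_after
```

Calling repack_in_place("data.h5") after unlinking the datasets you no
longer need would leave a compacted data.h5 behind. The file must not be
open in any other process while this runs, and the cost is still a full
copy of the remaining data, so it does not remove the slowness Tom
describes for multi-gigabyte files.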
Quincey
> ---------------------------------------------------------------------------------------------------------------
> From the User Guide:
>
> The size of the dataset cannot be reduced after it is created. The
> dataset can be expanded by extending one or more dimensions, with
> H5Dextend. It is not possible to contract a dataspace, or to reclaim
> allocated space.
>
> HDF5 does not at this time provide a mechanism to remove a dataset
> from a file, or to reclaim the storage from deleted objects. Through
> the H5Gunlink function one can remove links to a dataset from the file
> structure. Once all links to a dataset have been removed, that dataset
> becomes inaccessible to any application and is effectively removed
> from the file. But this does not recover the space the dataset
> occupies.
>
> The only way to recover the space is to write all the objects of
> the file into a new file. Any unlinked object is inaccessible to the
> application and will not be included in the new file.
>
> ---------------------------------------------------------------------------------------------------------------