[hdf-forum] Re: HDF5 Deleting Datasets AND Recovering Space
Quincey Koziol
koziol at hdfgroup.org
Tue Jun 2 15:20:45 EDT 2009
Hi Tom,
On Jun 2, 2009, at 2:14 PM, Tom wrote:
> Thanks for the responses. And great ideas!
>
> For our application, we are using MATLAB and MATLAB's HDF5 library.
> Unfortunately, the latest version of MATLAB only supports version
> 1.6.5 of the HDF5 library, so there is no built-in 'h5repack'
> or external-link capability. I like the idea of keeping several
> smaller HDF5 files rather than one large one: less chance of corruption,
> etc. But because our processing steps are interlinked at various
> levels, we would need the external-link capability to process
> several files 'simultaneously'.
>
> Any ideas on how to access h5repack or the new external-link
> capability from MATLAB when it uses an older HDF5 library? MATLAB offers
> a C interface called 'MEX'. So perhaps I can take some of
> those functions from the C library, build them as MEX functions,
> and call them from MATLAB?
Hmm, I know that the "Spring '09" MATLAB release included an
HDF5-1.8.x library. Which version are you using? Even if it's an
older version, you should be able to run h5repack on your files (it's a
command-line utility) without affecting their readability by older
versions of the library.
Quincey
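Quincey's suggestion of running h5repack outside MATLAB can be scripted. Below is a minimal Python sketch, assuming the `h5repack` command-line utility is installed and on `PATH`; the helper name `repack_in_place` and the `.repack-tmp` suffix are hypothetical. h5repack writes a fresh copy containing only the objects still reachable in the file, so space left behind by unlinked datasets is not carried over; the sketch swaps that copy in only if the tool exits successfully.

```python
import os
import subprocess
from typing import Sequence


def repack_in_place(path: str, tool: Sequence[str] = ("h5repack",)) -> int:
    """Rewrite an HDF5 file with h5repack and swap the copy into place.

    Returns the number of bytes reclaimed (negative if the copy grew).
    Assumes h5repack is on PATH; `tool` may carry extra h5repack
    options, which are placed before the two file names.
    """
    tmp = path + ".repack-tmp"  # hypothetical temp name in the same directory
    before = os.path.getsize(path)
    try:
        # h5repack copies only the objects still linked into the file,
        # so space freed by unlinked datasets is not carried over.
        subprocess.run([*tool, path, tmp], check=True)
        after = os.path.getsize(tmp)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        # Leave the original untouched and clean up any partial copy.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return before - after
```

For example, `repack_in_place("run42.h5")` would compact a single file, and options can be passed ahead of the file names, e.g. `tool=("h5repack", "-v")`. Run it only when no other process has the file open.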
> Thanks again,
>
> Tom
>
> On Jun 2, 3:06 pm, Daniel Kahn <daniel_k... at ssaihq.com> wrote:
>> Tom wrote:
>>> I understand section 5.2 of the user guide (below) says
>>> that when one deletes groups/datasets using H5G.unlink, the space
>>> on disk is NOT recovered. This has become an issue in my
>>> application. We are working with very large datasets that after
>>> some time no longer need to be stored. However, with this
>>> limitation, it seems that my HDF5 files are essentially equivalent
>>> to a very large CD-R from a storage perspective. I understand from
>>> the guide as well that I can recover space by copying the data over
>>> to a new file, but when my file size is several gigabytes, this can
>>> be a slow process. Has this not become an issue for other users and
>>> applications? I understand HDF5 is commonly used for oceanography
>>> and satellite imagery. Wouldn't these applications require
>>> intermittent deleting of data, especially in situations where the
>>> file size is on the order of terabytes?
>>
>> Tom,
>> I have not encountered this need in the area of satellite remote
>> sensing, in which I have some experience. In fact, the one
>> application I know of that only added data was problematic, not
>> because of HDF5 but because of our storage methodology. The process of
>> reducing data is subdivided into a sequence of "atomic" steps, and
>> the data are stored in HDF5 files with unique names at each step.
>> The number of minutes of data stored in each HDF5 file is chosen to keep
>> the file sizes and processing times reasonable. If an improvement
>> to a step in the sequence is developed and all the data are
>> reprocessed, the old, unnecessary files are simply deleted. In
>> our business, once a file is created and its unique name assigned,
>> it is considered bad form to modify it.
>> HDF5 versions 1.8.0 and later allow you to link from an HDF5
>> file to objects in another HDF5 file using an external link. I
>> think applications reading the data can follow the path from a
>> "master" file into external ones transparently, i.e., without
>> knowing whether the link is external or not, which allows a developer to
>> introduce external links with a minimum of code changes. If you
>> remove the link and delete the external file, your disk space is
>> recovered. The key here is to design your HDF5 hierarchy and
>> external links to match the pattern of usage you expect, in
>> particular the pattern of how datasets and groups are deleted when
>> they are no longer needed.
>> (Note: I have not used this feature myself; see
>> http://www.docstoc.com/docs/5688152/External-Links-in-HDF5 for
>> a description of the abilities and limitations of this technique.)
>> --dan--
>> Daniel Kahn
>> Science Systems and Applications Inc.
>> 301-867-2162
>>
>> ----------------------------------------------------------------------
>> This mailing list is for HDF software users discussion.
>> To subscribe to this list, send a message to hdf-forum-subscribe at hdfgroup.org.
>> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
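Daniel's write-once layout keeps each processing step and time span in its own file, so reclaiming space becomes ordinary file deletion. The Python sketch below illustrates one possible bookkeeping scheme. The `<step>_v<version>_<start>.h5` naming pattern, the version number standing in for "an improvement to a step", and all function names are hypothetical, and the files are handled purely by name (nothing here touches HDF5 itself).

```python
import os
import re
from datetime import datetime, timedelta
from typing import Iterator, List

# Hypothetical naming scheme: <step>_v<version>_<YYYYmmddTHHMM>.h5
# (step names must not contain underscores for this pattern to parse).
NAME_RE = re.compile(
    r"^(?P<step>[A-Za-z0-9]+)_v(?P<ver>\d+)_(?P<start>\d{8}T\d{4})\.h5$"
)


def chunk_name(step: str, version: int, start: datetime) -> str:
    """Unique, write-once file name for one step and one time chunk."""
    return f"{step}_v{version}_{start:%Y%m%dT%H%M}.h5"


def chunk_starts(t0: datetime, t1: datetime, minutes: int) -> Iterator[datetime]:
    """Split [t0, t1) into fixed-length chunks so each file stays a manageable size."""
    t = t0
    while t < t1:
        yield t
        t += timedelta(minutes=minutes)


def purge_old_versions(directory: str, step: str, current: int) -> List[str]:
    """Delete files written by earlier versions of `step`.

    Deleting whole files hands the space back to the filesystem,
    unlike unlinking objects inside a single HDF5 file.
    """
    removed = []
    for name in sorted(os.listdir(directory)):
        m = NAME_RE.match(name)
        if m and m["step"] == step and int(m["ver"]) < current:
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed
```

After reprocessing a step as version 2, `purge_old_versions(outdir, "calib", 2)` would remove every `calib_v1_*` file while leaving other steps alone.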
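The external-link layout Daniel describes can be sketched with h5py, the Python binding to HDF5 (it needs an HDF5 1.8 or later library underneath, so this would not run against MATLAB's 1.6.5). The helper names `add_external_chunk` and `drop_external_chunk`, the `/data` dataset path inside each chunk file, and the assumption that the stored chunk file name is usable as-is from the current directory (for example an absolute path) are illustrative choices, not part of the original thread.

```python
import os

import h5py


def add_external_chunk(master_path: str, chunk_path: str, link_name: str, data) -> None:
    """Write `data` to its own file and link it into the master file."""
    with h5py.File(chunk_path, "w") as chunk:
        chunk.create_dataset("data", data=data)
    with h5py.File(master_path, "a") as master:
        # Readers of master_path see link_name as an ordinary dataset;
        # the file name is stored exactly as given (assumed absolute here).
        master[link_name] = h5py.ExternalLink(chunk_path, "/data")


def drop_external_chunk(master_path: str, link_name: str) -> None:
    """Remove the link, then delete the target file to give the space back."""
    with h5py.File(master_path, "a") as master:
        link = master.get(link_name, getlink=True)
        if not isinstance(link, h5py.ExternalLink):
            raise ValueError(f"{link_name} is not an external link")
        target = link.filename
        del master[link_name]  # removes only the link; the master barely changes
    os.remove(target)  # deleting the whole chunk file is what frees the disk space
```

Readers open only the master file and access `master["obs_0001"]` as if it were a local dataset. Deleting the link leaves the master file essentially unchanged in size; the bulk of the space comes back when the chunk file is deleted.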