[hdf-forum] Re: HDF5 Deleting Datasets AND Recovering Space

Quincey Koziol koziol at hdfgroup.org
Tue Jun 2 15:20:45 EDT 2009


Hi Tom,

On Jun 2, 2009, at 2:14 PM, Tom wrote:

> Thanks for the responses. And great ideas!
>
> For our application, we are using MATLAB, and MATLAB's HDF5 library.
> Unfortunately, the latest version of MATLAB only supports version
> HDF5-1.6.5 of the library. So there is no built-in function 'h5repack'
> or capability for external links. I like the idea of keeping several
> smaller h5 files rather than one large. Less chance of corruption,
> etc. But because our processing steps are interlinked at various
> levels, we would need that external linking capability to process
> along several files 'simultaneously'.
>
> Any ideas on how to access h5repack or the new external linking
> capability in MATLAB that uses an older HDF5 library? MATLAB offers a
> C-conversion capability called 'MEX'. So perhaps I can take some of
> those functions from the C library, convert them manually to C-code,
> and access via MATLAB?

	Hmm, I know that the "Spring '09" MATLAB release included an
HDF5-1.8.x library.  Which version are you using?  Even if it's an
older version, you should be able to run h5repack on your files (it's
a command-line utility) without affecting whether older versions of
the library can read them.
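
For illustration only, here is a minimal sketch of driving h5repack from a script. It assumes the h5repack utility from an HDF5 installation is on your PATH and uses its basic "h5repack <input> <output>" form. The helper names and the ".repacked" temporary suffix are hypothetical, not part of HDF5 or MATLAB.

```python
import os
import shutil
import subprocess


def build_repack_command(src, dst):
    """Return the argv list for a plain `h5repack <input> <output>` run."""
    return ["h5repack", src, dst]


def repack_in_place(path):
    """Copy `path` through h5repack, then swap the compacted copy in.

    Assumes h5repack is installed and that there is enough free disk
    space to hold the repacked copy alongside the original.
    """
    if shutil.which("h5repack") is None:
        raise RuntimeError("h5repack not found on PATH")
    tmp = path + ".repacked"  # hypothetical temporary file name
    subprocess.run(build_repack_command(path, tmp), check=True)
    # Replace the original only after h5repack has exited successfully.
    os.replace(tmp, path)
```

Calling repack_in_place("results.h5") (a made-up file name) after unlinking datasets should leave a file that no longer carries the unused space, since h5repack writes only the objects still reachable in the source file. It is still a full copy, so it will not be fast on multi-gigabyte files.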

	Quincey


> Thanks again,
>
> Tom
>
> On Jun 2, 3:06 pm, Daniel Kahn <daniel_k... at ssaihq.com> wrote:
>> Tom wrote:
>>> I understand section 5.2 of the user guide (below) says that when
>>> one deletes groups/datasets using H5G.unlink, the space on disk is
>>> NOT recovered. This has become an issue in my application. We are
>>> working with very large datasets that after some time no longer
>>> need to be stored. However, with this limitation, it seems that my
>>> HDF5 files are essentially equivalent to a very large CD-R from a
>>> storage perspective. I understand from the guide as well that I
>>> can recover space by copying the data over to a new file, but when
>>> my file size is several gigabytes, this can be a slow process. Has
>>> this not become an issue for other users and applications? I
>>> understand HDF5 is commonly used for oceanography and satellite
>>> imagery. Wouldn't these applications require intermittent deleting
>>> of data, especially in situations where the file size is on the
>>> order of terabytes?
>>
>> Tom,
>> I have not encountered this need in the area of satellite remote
>> sensing, in which I have some experience. In fact, the one
>> application I know of that only added data was problematic, not
>> because of HDF5 but because of our storage methodology. The process
>> of reducing data is subdivided into a sequence of "atomic" steps,
>> and the data are stored in HDF5 files with unique names at each
>> step. The number of minutes of data stored in an HDF5 file is chosen
>> to keep the file sizes and processing times reasonable. If an
>> improvement to a step in the sequence is developed and all the data
>> are reprocessed, the old, unnecessary files are just deleted. In
>> our business, once a file is created and its unique name assigned,
>> it is considered bad form to modify it.
>> Versions 1.8.0 and later of HDF5 allow you to link from an HDF5
>> file to objects in another HDF5 file using an external link. I
>> think applications reading the data can follow the path from a
>> "master" file into external ones transparently, i.e. without
>> knowing whether the link is external or not, which allows a
>> developer to introduce external links with a minimum of code
>> changes. If you remove the link and delete the external file, your
>> disk space is recovered. The key here is to design your HDF5
>> hierarchy and external links to match the pattern of usage you
>> expect, in particular the pattern of how datasets and groups are
>> deleted when they are no longer needed.
>> (Note: I have not used this feature myself; see
>> http://www.docstoc.com/docs/5688152/External-Links-in-HDF5 for a
>> description of the abilities and limitations of this technique.)
>> --dan--
>> Daniel Kahn
>> Science Systems and Applications Inc.
>> 301-867-2162
>> ----------------------------------------------------------------------
>> This mailing list is for HDF software users discussion.
>> To subscribe to this list, send a message to hdf-forum-subscribe at hdfgroup.org.
>> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
>
