[hdf-forum] speeding up h5repack

Elena Pourmal epourmal at hdfgroup.org
Wed Oct 29 13:50:33 EDT 2008


Brock,

For this particular dataset, try specifying the chunk size with the
-l CHUNK=64x16x16x16 flag.
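
As a rough sketch, the flag can be appended to the command from the
original message. The Python snippet below only builds and prints the
command line; it does not run h5repack. The file names are copied from
the original message as given. Passing -l without a dataset name is
assumed here to apply the layout to every dataset, so a file holding
datasets of other shapes may need the flag restricted to named
datasets (check the h5repack help for the exact form in your version).

```python
import shlex

# File names and the GZIP filter come from the original message;
# the chunk dimensions are the ones suggested above.
INPUT_FILE = "rt_3d_71nm_5micron_hdf5_plt_cnt_0010"
OUTPUT_FILE = "lt_cnt_0010_zipped"


def build_repack_command(chunk_dims, gzip_level=1):
    """Return the h5repack argument list for the given chunk dimensions."""
    chunk_spec = "CHUNK=" + "x".join(str(d) for d in chunk_dims)
    return [
        "h5repack", "-v",
        "-i", INPUT_FILE,
        "-o", OUTPUT_FILE,
        "-f", f"GZIP={gzip_level}",
        "-l", chunk_spec,
    ]


command = build_repack_command((64, 16, 16, 16))
# Print a shell-ready command line (shlex.join needs Python 3.8+).
print(shlex.join(command))
```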

Explanation:

When the user doesn't specify a chunk size, h5repack uses the
dimensions of the dataset (9970x16x16x16 in this case) to set up the
chunking parameters: the current implementation sets the chunk
dimensions equal to the dataset dimensions. The result is one very
large chunk that doesn't fit into the chunk cache (1MB by default;
tuning it is not available for h5repack at this point).
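
To make the sizes concrete, here is a small back-of-the-envelope
calculation in Python. The element size of 4 bytes (single-precision
floats) is an assumption, since the datatype is not stated in this
thread, and the 1MB cache is taken to mean 1024*1024 bytes. The
"fits" test (chunk size less than or equal to cache size) is a
simplification of what the library actually does.

```python
from math import prod

# Default chunk cache size mentioned above, taken as 1 MiB (assumption).
CHUNK_CACHE_BYTES = 1024 * 1024

# Assumed element size: 4-byte floats. The real datatype is not given.
ELEMENT_SIZE = 4


def chunk_bytes(chunk_dims, element_size=ELEMENT_SIZE):
    """Uncompressed size of one chunk in bytes."""
    return prod(chunk_dims) * element_size


def fits_in_cache(chunk_dims, element_size=ELEMENT_SIZE,
                  cache_bytes=CHUNK_CACHE_BYTES):
    """Simplified check: does a single chunk fit in the chunk cache?"""
    return chunk_bytes(chunk_dims, element_size) <= cache_bytes


for dims in [(9970, 16, 16, 16), (64, 16, 16, 16)]:
    size = chunk_bytes(dims)
    label = "x".join(str(d) for d in dims)
    print(f"{label}: {size} bytes, fits in cache: {fits_in_cache(dims)}")
```

Under these assumptions the default chunk is about 156 MiB, while the
64x16x16x16 chunk is exactly 1 MiB. With 8-byte elements the smaller
chunk would be 2 MiB and would no longer fit under this simple model,
so the right first dimension depends on the datatype.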

h5repack writes a dataset by hyperslabs. Since the chunk doesn't fit
into the chunk cache, the HDF5 library writes part of the chunk, evicts
it from the chunk cache, compresses it, and writes it to the file. When
the next hyperslab needs to be written, HDF5 reads the chunk back,
uncompresses it, writes the new data, compresses it, writes it to the
file, and so on.

This behavior is avoided if each hyperslab corresponds to a chunk, or
to several chunks that fit into the chunk cache.
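
The cost of the recompression cycle can be illustrated with a toy
model in Python. It is only a sketch, not a description of the
library internals: it assumes hyperslabs and chunks both span whole
"rows" along the first dimension, that a chunk too large for the cache
is recompressed in full on every hyperslab write that touches it, and
that a chunk that fits is compressed once. The hyperslab size of 64
rows is hypothetical; the size h5repack actually uses is not stated
here.

```python
import math


def rows_compressed(total_rows, chunk_rows, slab_rows, chunk_fits_cache):
    """Toy model: rows pushed through the compressor while a dataset
    is written hyperslab by hyperslab along its first dimension."""
    if chunk_fits_cache:
        # Each chunk is filled in the cache and compressed once.
        return math.ceil(total_rows / chunk_rows) * chunk_rows
    total = 0
    for start in range(0, total_rows, slab_rows):
        end = min(start + slab_rows, total_rows)
        first_chunk = start // chunk_rows
        last_chunk = (end - 1) // chunk_rows
        # Every chunk touched by this hyperslab is recompressed in full.
        total += (last_chunk - first_chunk + 1) * chunk_rows
    return total


SLAB_ROWS = 64  # hypothetical hyperslab size along the first dimension

one_big_chunk = rows_compressed(9970, 9970, SLAB_ROWS, chunk_fits_cache=False)
small_chunks = rows_compressed(9970, 64, SLAB_ROWS, chunk_fits_cache=True)
print("rows compressed, single 9970-row chunk:", one_big_chunk)
print("rows compressed, 64-row chunks:", small_chunks)
print("ratio: %.0fx" % (one_big_chunk / small_chunks))
```

In this model the single large chunk goes through the compressor once
per hyperslab, so the work grows with the number of hyperslabs instead
of staying proportional to the data size.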

We are aware of the problem and are working on improving the
performance of the HDF5 tools, including a better default strategy for
choosing chunking parameters and hyperslabs.

Elena

On Oct 28, 2008, at 8:00 PM, Elena Pourmal wrote:

> Brock,
>
> It is hard to say for sure why performance is bad.
>
> Do you know if the original dataset was chunked?
>
> Try the
>
> h5dump -p -H
>
> command on your file and check for the CHUNKED_LAYOUT keyword in the
> output.
>
> Elena
>
> On Oct 28, 2008, at 1:20 PM, Brock Palen wrote:
>
>> Are there any tweaks that can be done to speed up compressing
>> already created hdf5 files?
>>
>> For example
>>
>> h5repack -v -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o  
>> lt_cnt_0010_zipped -f GZIP=1
>>
>> Takes 129 Minutes
>>
>> While:
>> gzip rt_3d_71nm_5micron_hdf5_plt_cnt_0010
>>
>> Takes 1.5 Minutes
>>
>> hdf5-1.6.7
>>
>> We don't have szip enabled, but would be interested in trying  
>> (academic work so licensing should not be a problem).
>>
>> It just seemed strange that it took so long. The uncompressed hdf5
>> file is from FLASH2.5.
>>
>> Any insight would be nice.
>>
>>
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>>
>>
>>
>>
>> ----------------------------------------------------------------------
>> This mailing list is for HDF software users discussion.
>> To subscribe to this list, send a message to hdf-forum-subscribe at hdfgroup.org.
>> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
>>
>>
>
>
