[hdf-forum] speeding up h5repack
Elena Pourmal
epourmal at hdfgroup.org
Wed Oct 29 13:50:33 EDT 2008
Brock,
For this particular dataset, try specifying the chunk size with the
-l CHUNK=64x16x16x16 flag.
Explanation:
When the user doesn't specify a chunk size, h5repack uses the
dimensions of the dataset (9970x16x16x16 in this case) to set up the
chunking parameters: the current implementation sets the chunk
dimensions equal to the dataset dimensions. The result is one very
large chunk that doesn't fit into the chunk cache (1 MB by default;
the cache size cannot currently be tuned for h5repack).
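To put numbers on this, the short Python sketch below computes the uncompressed size of the default chunk and of the suggested 64x16x16x16 chunk. The thread does not say which datatype the dataset uses, so the sketch assumes 4-byte single-precision floats and takes "1MB" to mean 1024 * 1024 bytes; the helper names chunk_bytes and largest_leading_dim are hypothetical. Under those assumptions the default chunk is about 156 MiB, while a 64x16x16x16 chunk is exactly 1 MiB, the size of the default cache. With 8-byte doubles the same arithmetic gives a leading dimension of 32 instead of 64.

```python
# Sizes of uncompressed chunks for the dataset discussed in this thread.
# ASSUMPTION: 4-byte elements (single-precision float); the thread does not
# state the dataset's datatype.
ELEMENT_BYTES = 4
# ASSUMPTION: the "1MB" default chunk cache is taken to mean 1024 * 1024 bytes.
CACHE_BYTES = 1024 * 1024


def chunk_bytes(dims, element_bytes=ELEMENT_BYTES):
    """Size in bytes of one uncompressed chunk with the given dimensions."""
    n_elements = 1
    for d in dims:
        n_elements *= d
    return n_elements * element_bytes


def largest_leading_dim(dims, cache_bytes=CACHE_BYTES, element_bytes=ELEMENT_BYTES):
    """Hypothetical helper: largest first-dimension extent whose chunk is at
    most cache_bytes, keeping the remaining dimensions whole."""
    slice_bytes = chunk_bytes(dims[1:], element_bytes)
    return max(1, min(dims[0], cache_bytes // slice_bytes))


default_chunk = (9970, 16, 16, 16)  # h5repack default: chunk = whole dataset
print(chunk_bytes(default_chunk))          # 163348480 (about 156 MiB)
print(chunk_bytes((64, 16, 16, 16)))       # 1048576 (exactly 1 MiB)
print(largest_leading_dim(default_chunk))  # 64
```

Keeping the trailing 16x16x16 dimensions whole and shrinking only the first one is just one simple way to pick a chunk shape; it happens to reproduce the 64x16x16x16 suggestion under the 4-byte assumption.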
h5repack writes a dataset one hyperslab at a time. Since the chunk
doesn't fit into the chunk cache, the HDF5 library writes part of the
chunk, evicts it from the chunk cache, compresses it, and writes it to
the file. When the next hyperslab needs to be written, HDF5 reads the
chunk back, uncompresses it, writes the new data, compresses it again,
writes it to the file, and so on.
This repeated read-uncompress-compress-write cycle is avoided when each
hyperslab corresponds to a chunk, or to several chunks that fit into
the chunk cache.
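The cycle described above can be illustrated with a toy model that counts how many times whole chunks are compressed and decompressed. This is not HDF5's actual caching code: it assumes that chunks and hyperslabs both span whole rows of the slowest-varying dimension, and the hyperslab height of 64 rows in the example is hypothetical, since the thread does not say how large h5repack's hyperslabs are. The function name count_chunk_passes is likewise made up for illustration.

```python
def count_chunk_passes(n_rows, chunk_rows, slab_rows, chunk_fits_in_cache):
    """Toy model (not HDF5's real implementation) of writing a chunked,
    compressed dataset one hyperslab at a time.

    Hyperslabs of slab_rows rows are written in order; chunks span
    chunk_rows rows. Returns (compressions, decompressions) of whole chunks.
    """
    touches = 0       # (hyperslab, chunk) pairs that overlap
    touched = set()   # distinct chunks written at least once
    for start in range(0, n_rows, slab_rows):
        end = min(start + slab_rows, n_rows)
        for chunk in range(start // chunk_rows, (end - 1) // chunk_rows + 1):
            touches += 1
            touched.add(chunk)
    if chunk_fits_in_cache:
        # Each chunk stays cached until it is complete, then is compressed once.
        return len(touched), 0
    # Otherwise every overlapping hyperslab evicts the partial chunk, so it is
    # compressed and written each time, and read back and decompressed on
    # every touch after the first.
    return touches, touches - len(touched)


# Hypothetical 64-row hyperslabs over the 9970-row dataset from this thread.
print(count_chunk_passes(9970, 9970, 64, chunk_fits_in_cache=False))  # (156, 155)
print(count_chunk_passes(9970, 64, 64, chunk_fits_in_cache=True))     # (156, 0)
```

In this model the default single 9970-row chunk (roughly 156 MiB, assuming 4-byte elements) is compressed 156 times and decompressed 155 times, whereas with 64-row chunks that fit in the cache each of the 156 small chunks is compressed exactly once.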
We are aware of the problem and are working on improving the
performance of the HDF5 tools, including a better default strategy
for choosing chunking parameters and hyperslabs.
Elena
On Oct 28, 2008, at 8:00 PM, Elena Pourmal wrote:
> Brock,
>
> It is hard to say for sure why the performance is bad.
>
> Do you know if the original dataset was chunked?
>
> Try running the
>
> h5dump -p -H
>
> command on your file and check the output for the CHUNKED_LAYOUT
> keyword.
>
> Elena
>
> On Oct 28, 2008, at 1:20 PM, Brock Palen wrote:
>
>> Are there any tweaks that can be made to speed up compressing
>> already-created HDF5 files?
>>
>> For example,
>>
>> h5repack -v -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o lt_cnt_0010_zipped -f GZIP=1
>>
>> takes 129 minutes,
>>
>> while
>>
>> gzip rt_3d_71nm_5micron_hdf5_plt_cnt_0010
>>
>> takes 1.5 minutes.
>>
>> hdf5-1.6.7
>>
>> We don't have szip enabled, but we would be interested in trying it
>> (this is academic work, so licensing should not be a problem).
>>
>> It just seemed strange that it took so long; the uncompressed HDF5
>> file is from FLASH2.5.
>>
>> Any insight would be nice.
>>
>>
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>>
>>
>>
>>
>> ----------------------------------------------------------------------
>> This mailing list is for HDF software users discussion.
>> To subscribe to this list, send a message to hdf-forum-subscribe at hdfgroup.org
>> .
>> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
>>
>>
>
>
More information about the Hdf-forum mailing list