[hdf-forum] speeding up h5repack

George N. White III gnwiii at gmail.com
Tue Oct 28 19:21:27 EDT 2008


On Tue, Oct 28, 2008 at 3:20 PM, Brock Palen <brockp at umich.edu> wrote:

> Are there any tweaks that can be done to speed up compressing already created
> hdf5 files?
>
> For example
>
> h5repack -v -i rt_3d_71nm_5micron_hdf5_plt_cnt_0010 -o lt_cnt_0010_zipped -f GZIP=1
>
> Takes 129 Minutes
>
> While:
> gzip rt_3d_71nm_5micron_hdf5_plt_cnt_0010
>
> Takes 1.5 Minutes
>
> hdf5-1.6.7
>
> We don't have szip enabled, but would be interested in trying (academic work
> so licensing should not be a problem).
>
> Just seemed strange that it took so long,  the uncompressed hdf5 file is
> from FLASH2.5.
>
> Any insight would be nice.

There is overhead in processing the structural information in hdf5 files,
and there is startup overhead for the compression library (setting up
its internal structures) for each chunk to be compressed.  It would be
interesting to see the time for some trivial h5repack operation
(-f NONE, scaling?), since that would show how much of the 129 minutes
goes to rewriting the file rather than to compression.  In principle,
h5repack should be able to take advantage of parallel processing, so if
you could get 1000 processors going you might beat gzip by a large factor.
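
To get a rough feel for the per-chunk startup cost, here is a minimal
sketch using Python's zlib module.  It is not what h5repack does
internally; the function names, the test data, and the chunk sizes are
all made up for illustration.  It compresses the same buffer once as a
single stream and then as many independent chunks, creating a fresh
compressor for every chunk:

```python
import time
import zlib


def compress_whole(data, level=1):
    """Compress the whole buffer as a single deflate stream."""
    return len(zlib.compress(data, level))


def compress_chunked(data, chunk_size, level=1):
    """Compress each chunk as its own stream, with a fresh compressor
    per chunk. Returns (total compressed bytes, number of streams)."""
    total = 0
    streams = 0
    for start in range(0, len(data), chunk_size):
        comp = zlib.compressobj(level)  # per-chunk setup cost is paid here
        chunk = data[start:start + chunk_size]
        total += len(comp.compress(chunk)) + len(comp.flush())
        streams += 1
    return total, streams


if __name__ == "__main__":
    # Made-up, highly repetitive 1 MiB buffer (an assumption, not FLASH data).
    data = bytes(range(256)) * 4096
    print(f"single stream: {compress_whole(data)} bytes")
    for chunk_size in (len(data), 64 * 1024, 1024):
        t0 = time.perf_counter()
        size, streams = compress_chunked(data, chunk_size)
        elapsed = time.perf_counter() - t0
        print(f"chunk={chunk_size:>8} streams={streams:>5} "
              f"bytes={size:>8} time={elapsed:.4f}s")
```

Small chunks pay the setup cost many times over and also compress worse,
because each stream starts with an empty history.  How much of this
carries over to a real hdf5 file depends on its chunk layout.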

What were the file sizes?

The gzip program supports levels 1--9 (from fast, less compressed to slow,
more compressed), with a default of 6, so your gzip run should have been
doing more compression work than h5repack with GZIP=1.  The question is
how much of the overhead comes from dealing with the hdf5 structure and
how much from the compression library startup.  Function call profiles
would give you the number of calls to deflate and deflateInit for the
two runs.
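
As a quick check on the level argument, the following sketch times
zlib at several levels on one buffer.  The helper name compare_levels
and the sample data are hypothetical, and Python's zlib module is only
a stand-in for the library calls that gzip and h5repack make:

```python
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Return {level: (compressed size in bytes, seconds taken)}."""
    results = {}
    for level in levels:
        t0 = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - t0
        # Sanity check: the data must survive a round trip.
        assert zlib.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    # Made-up sample data; substitute a slice of a real file to get
    # numbers that mean something for your case.
    sample = b"density pressure temperature velocity " * 50000
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level={level} bytes={size} time={seconds:.4f}s")
```

If level 6 on a whole file is still far quicker than level 1 inside
h5repack, the compression level itself is unlikely to be the bottleneck.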

-- 
George N. White III <aa056 at chebucto.ns.ca>
Head of St. Margarets Bay, Nova Scotia
