[Hdf-forum] Slow conversion to binary using h5dump
Mark Howison
mark.howison at gmail.com
Fri Jan 15 11:26:25 EST 2010
Hi Peter and Elena,
I agree that the problem is most likely the chunking. I tried
repacking the dataset, but this was as slow as running h5dump on the
chunked dataset (probably for the same reason: non-contiguous disk
access on Lustre). So I regenerated my dataset without chunking, and I
was able to see about 10x better throughput with h5dump to binary (the
bottleneck at this point is probably in Lustre and not in h5dump). I
also tried increasing the buffer size to 4MB but found the effect was
negligible.
Thanks for your help,
Mark
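
The chunking explanation can be checked with a little arithmetic. The sketch below is only a rough model, not a description of h5dump's actual read path: it assumes elements are stored in row-major order inside each chunk, and it takes the dataset shape, the chunk shapes, and the 1MB default buffer from the messages quoted below. The function names are made up for illustration.

```python
from math import prod

ITEM_BYTES = 4  # H5T_IEEE_F32LE is a 4-byte float


def chunk_bytes(chunk_shape, item_bytes=ITEM_BYTES):
    """Size in bytes of a single chunk."""
    return prod(chunk_shape) * item_bytes


def contiguous_run_bytes(dataset_shape, chunk_shape, item_bytes=ITEM_BYTES):
    """Longest run of bytes that is contiguous both inside a chunk and in
    the row-major binary output (assumes row-major order within chunks)."""
    run = 1
    # Walk from the fastest-varying dimension outwards. The run can keep
    # growing only while the chunk spans the full extent of the dataset.
    for dset_dim, chunk_dim in zip(reversed(dataset_shape), reversed(chunk_shape)):
        run *= chunk_dim
        if chunk_dim != dset_dim:
            break
    return run * item_bytes


dataset = (3072, 3072, 3072)
total_bytes = prod(dataset) * ITEM_BYTES
buffer_bytes = 1024 * 1024  # h5dump default buffer, per Peter's message

for label, chunk in [("original", (1024, 768, 768)), ("suggested", (1, 64, 3072))]:
    run = contiguous_run_bytes(dataset, chunk)
    print(label, chunk)
    print("  chunk size (bytes):     ", chunk_bytes(chunk))
    print("  fits in 1MB buffer:     ", chunk_bytes(chunk) <= buffer_bytes)
    print("  contiguous run (bytes): ", run)
    print("  runs to assemble output:", total_bytes // run)
```

Under these assumptions the original 1024x768x768 chunks are about 2.25GB each and offer only 3072-byte contiguous runs, so assembling the output takes tens of millions of small scattered reads. With 1x64x3072 chunks, each 768KB chunk is one contiguous run and fits inside the 1MB buffer.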
On Wed, Jan 13, 2010 at 2:56 PM, Peter Cao <xcao at hdfgroup.org> wrote:
> Mark,
>
> The default buffer size used in h5dump is 1MB. Setting it to 4MB will
> improve the performance, but it may still be slow because the buffer size
> is much less than the chunk size.
>
> If you use h5repack to change the chunk size to 1x64x3072 (a little less
> than 1MB) and try h5dump again, you will see the difference.
>
> Thanks
> --pc
>
>
> Mark Howison wrote:
>>
>> I tried using BE and LE and both are equally slow. Here is the header
>> info. Also, I should note that the dataset is roughly 108GB, but it
>> does fit into local memory (196GB is available). Also, it seems to
>> continuously write at 4MB/s, instead of sitting and processing for a
>> while and then bursting at 100MB/s or something. It is also chunked.
>> Maybe this is causing problems, because h5dump has to jump around to
>> non-contiguous offsets to contiguously assemble the binary output?
>>
>> Thanks,
>> Mark
>>
>> mhowison at davinci:/project/projectdirs/vacet/mark> h5dump -p -H -d /Step#0/Block/Analyze7.5/0 combustion.h5part
>> HDF5 "combustion.h5part" {
>> DATASET "/Step#0/Block/Analyze7.5/0" {
>>    DATATYPE  H5T_IEEE_F32LE
>>    DATASPACE  SIMPLE { ( 3072, 3072, 3072 ) / ( 3072, 3072, 3072 ) }
>>    STORAGE_LAYOUT {
>>       CHUNKED ( 1024, 768, 768 )
>>       SIZE 115964116992
>>    }
>>    FILTERS {
>>       NONE
>>    }
>>    FILLVALUE {
>>       FILL_TIME H5D_FILL_TIME_IFSET
>>       VALUE  0
>>    }
>>    ALLOCATION_TIME {
>>       H5D_ALLOC_TIME_EARLY
>>    }
>> }
>> }
>>
>>
>> On Tue, Jan 12, 2010 at 3:40 PM, Elena Pourmal <epourmal at hdfgroup.org>
>> wrote:
>>
>>>
>>> Mark,
>>>
>>> h5dump performance may be affected by many factors (size of the h5dump
>>> default read buffer, chunking sizes of the dataset, compression, etc.).
>>> Would it be possible for you to run h5dump -p -H -d .... to print the
>>> header information for the dataset you are trying to export? Then we may
>>> have a better idea of what is going wrong.
>>>
>>> Thank you!
>>>
>>> Elena
>>> On Jan 12, 2010, at 12:15 PM, Jonathan Kim wrote:
>>>
>>>
>>>>
>>>> Hi,
>>>>
>>>> It's different from h5copy.
>>>>
>>>> What is the original data format? BE or LE?
>>>> If it's BE, could you try -b BE and see if there is any performance
>>>> difference?
>>>>
>>>> And could you try with a smaller HDF5 file? (under 10GB)
>>>>
>>>> Also, could you try on other filesystems? (non-parallel as well)
>>>>
>>>> Since it's a performance issue and not a specific bug, more testing
>>>> results would be helpful.
>>>>
>>>> Thanks.
>>>>
>>>> - Jonathan
>>>>
>>>> -----Original Message-----
>>>> From: hdf-forum-bounces at hdfgroup.org
>>>> [mailto:hdf-forum-bounces at hdfgroup.org]
>>>> On Behalf Of Mark Howison
>>>> Sent: Tuesday, January 12, 2010 10:26 AM
>>>> To: HDF forum
>>>> Subject: [Hdf-forum] Slow conversion to binary using h5dump
>>>>
>>>> Hi, I tried converting a 108GB HDF5 file to binary using the "-b LE"
>>>> flag in h5dump, but it ran at a crawling pace, only about 4MB/s. This
>>>> is in comparison to an h5copy I did on the same machine (our SGI
>>>> Altix) that ran at 600MB/s. The filesystem is GPFS. Any ideas why
>>>> h5dump is having so much trouble? Is there a conversion phase (to LE)
>>>> that is bogging things down? Thanks, Mark
>>>>
>>>> _______________________________________________
>>>> Hdf-forum is for HDF software users discussion.
>>>> Hdf-forum at hdfgroup.org
>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>>