[Hdf-forum] Slow conversion to binary using h5dump

Mark Howison mark.howison at gmail.com
Fri Jan 15 11:26:25 EST 2010


Hi Peter and Elena,

I agree that the problem is most likely the chunking. I tried
repacking the dataset, but this was as slow as running h5dump on the
chunked dataset (probably for the same reason: non-contiguous disk
access on Lustre). So I regenerated my dataset without chunking, and I
was able to see about 10x better throughput with h5dump to binary (the
bottleneck at this point is probably in Lustre and not in h5dump). I
also tried increasing the buffer size to 4MB but found the effect was
negligible.
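
As a rough check on the chunk-size explanation, the numbers can be worked
out from the header dump quoted further down in this thread. The sketch
below is plain Python with no HDF5 dependency. The function names are made
up for illustration. The "contiguous run" figure assumes that each chunk is
stored as one row-major block in the file and that the binary output is
written in row-major order.

```python
from math import prod

ITEMSIZE = 4                      # bytes per H5T_IEEE_F32LE element
DATASET_SHAPE = (3072, 3072, 3072)
ORIGINAL_CHUNK = (1024, 768, 768)
SUGGESTED_CHUNK = (1, 64, 3072)   # the h5repack suggestion from the thread
MIB = 1024 * 1024


def chunk_bytes(chunk, itemsize=ITEMSIZE):
    """Size of one chunk in bytes."""
    return prod(chunk) * itemsize


def chunk_count(shape, chunk):
    """Number of chunks needed to tile the dataset (ceiling per dimension)."""
    return prod(-(-s // c) for s, c in zip(shape, chunk))


def contiguous_run_bytes(shape, chunk, itemsize=ITEMSIZE):
    """Longest piece of one chunk that is also contiguous in the
    row-major flattened dataset (assumption: row-major chunk storage)."""
    run = itemsize
    for s, c in zip(reversed(shape), reversed(chunk)):
        run *= c
        if c != s:
            # This dimension is only partly covered, so contiguity ends here.
            break
    return run


if __name__ == "__main__":
    for name, chunk in (("original", ORIGINAL_CHUNK),
                        ("suggested", SUGGESTED_CHUNK)):
        print(name, chunk)
        print("  chunk size (MiB):", chunk_bytes(chunk) / MIB)
        print("  chunks in dataset:", chunk_count(DATASET_SHAPE, chunk))
        print("  contiguous run (bytes):",
              contiguous_run_bytes(DATASET_SHAPE, chunk))
```

With the original chunking, each chunk is 2304 MiB, far larger than a 1MB
or 4MB buffer, and it offers only 3072-byte pieces that are contiguous in
the output. That is consistent with the jumping-around explanation. A
1x64x3072 chunk is 768 KiB, fits in the default buffer, and maps to a single
contiguous piece of the output.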

Thanks for your help,
Mark

On Wed, Jan 13, 2010 at 2:56 PM, Peter Cao <xcao at hdfgroup.org> wrote:
> Mark,
>
> the default buffer size used in h5dump is 1MB. Setting it to 4MB will
> improve the performance, but it may still be slow because the buffer size
> is much less than the chunk size.
>
> If you use h5repack to change the chunk size to 1x64x3072 (a little less
> than 1MB) and try h5dump again, you will see the difference.
>
> Thanks
> --pc
>
>
> Mark Howison wrote:
>>
>> I tried using BE and LE and both are equally slow. Here is the header
>> info. Also, I should note that the dataset is roughly 108GB, but it
>> does fit into local memory (196GB is available). Also, it seems to
>> continuously write at 4MB/s, instead of sitting and processing for a
>> while and then bursting at 100MB/s or something. It is also chunked.
>> Maybe this is causing problems, because h5dump has to jump around to
>> non-contiguous offsets to contiguously assemble the binary output?
>>
>> Thanks,
>> Mark
>>
>> mhowison at davinci:/project/projectdirs/vacet/mark> h5dump -p -H -d
>> /Step#0/Block/Analyze7.5/0 combustion.h5part
>> HDF5 "combustion.h5part" {
>> DATASET "/Step#0/Block/Analyze7.5/0" {
>>   DATATYPE  H5T_IEEE_F32LE
>>   DATASPACE  SIMPLE { ( 3072, 3072, 3072 ) / ( 3072, 3072, 3072 ) }
>>   STORAGE_LAYOUT {
>>      CHUNKED ( 1024, 768, 768 )
>>      SIZE 115964116992
>>    }
>>   FILTERS {
>>      NONE
>>   }
>>   FILLVALUE {
>>      FILL_TIME H5D_FILL_TIME_IFSET
>>      VALUE  0
>>   }
>>   ALLOCATION_TIME {
>>      H5D_ALLOC_TIME_EARLY
>>   }
>> }
>> }
>>
>>
>> On Tue, Jan 12, 2010 at 3:40 PM, Elena Pourmal <epourmal at hdfgroup.org>
>> wrote:
>>
>>>
>>> Mark,
>>>
>>> h5dump performance may be affected by many factors (the size of h5dump's
>>> default read buffer, the chunk sizes of the dataset, compression, etc.).
>>> Would it be possible for you to run h5dump -p -H -d .... to print the
>>> header information for the dataset you are trying to export? That would
>>> give us a better idea of what may be going wrong.
>>>
>>> Thank you!
>>>
>>> Elena
>>> On Jan 12, 2010, at 12:15 PM, Jonathan Kim wrote:
>>>
>>>
>>>>
>>>> Hi,
>>>>
>>>> It's different from h5copy.
>>>>
>>>> What is the original data format? BE or LE?
>>>> If it's BE, could you try -b BE and see if there is any performance
>>>> difference?
>>>>
>>>> And could you try with a smaller HDF5 file? (under 10GB)
>>>>
>>>> Also, could you try on other filesystems? (non-parallel as well)
>>>>
>>>> Since it's a performance issue and not a specific bug, more testing
>>>> results would be helpful.
>>>>
>>>> Thanks.
>>>>
>>>> - Jonathan
>>>>
>>>> -----Original Message-----
>>>> From: hdf-forum-bounces at hdfgroup.org
>>>> [mailto:hdf-forum-bounces at hdfgroup.org]
>>>> On Behalf Of Mark Howison
>>>> Sent: Tuesday, January 12, 2010 10:26 AM
>>>> To: HDF forum
>>>> Subject: [Hdf-forum] Slow conversion to binary using h5dump
>>>>
>>>> Hi, I tried converting a 108GB HDF5 file to binary using the "-b LE"
>>>> flag in h5dump, but it ran at a crawling pace, only about 4MB/s. This
>>>> is in comparison to an h5copy I did on the same machine (our SGI
>>>> Altix) that ran at 600MB/s. The filesystem is GPFS. Any ideas why
>>>> h5dump is having so much trouble? Is there a conversion phase (to LE)
>>>> that is bogging things down? Thanks, Mark
>>>>
>>>> _______________________________________________
>>>> Hdf-forum is for HDF software users discussion.
>>>> Hdf-forum at hdfgroup.org
>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>>
>>>
>>
>
