[Hdf-forum] Slow conversion to binary using h5dump
Elena Pourmal
epourmal at hdfgroup.org
Fri Jan 15 11:34:45 EST 2010
Mark,
On Jan 15, 2010, at 10:26 AM, Mark Howison wrote:
> Hi Peter and Elena,
>
> I agree that the problem is most likely the chunking. I tried
> repacking the dataset, but this was as slow as running h5dump on the
> chunked dataset (probably for the same reason: non-contiguous disk
> access on Lustre). So I regenerated my dataset without chunking, and I
> was able to see about 10x better throughput with h5dump to binary (the
> bottleneck at this point is probably in Lustre and not in h5dump). I
> also tried increasing the buffer size to 4MB but found the effect was
> negligible.
>
Thank you for trying a bigger buffer. It is good to know the result.
Elena
> Thanks for your help,
> Mark
>
> On Wed, Jan 13, 2010 at 2:56 PM, Peter Cao <xcao at hdfgroup.org> wrote:
>> Mark,
>>
>> The default buffer size used in h5dump is 1MB. Setting it to 4MB will
>> improve the performance, but it may still be slow because the buffer
>> size is much less than the chunk size.
>>
>> If you use h5repack to change the chunk size to 1x64x3072 (a little less
>> than 1MB) and try h5dump again, you will see the difference.
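
The sizes behind this suggestion can be checked with a little arithmetic. The sketch below takes the chunk layout from the h5dump header quoted further down, assumes 4-byte elements (the dataset is H5T_IEEE_F32LE), and assumes that "1MB" means 2**20 bytes. The name chunk_bytes is only illustrative.

```python
from math import prod

ELEM_BYTES = 4          # H5T_IEEE_F32LE is a 4-byte float
BUFFER_BYTES = 2**20    # assumption: h5dump's 1MB default buffer means 1 MiB


def chunk_bytes(chunk_dims, elem_bytes=ELEM_BYTES):
    """Return the size in bytes of one uncompressed chunk."""
    return prod(chunk_dims) * elem_bytes


current = chunk_bytes((1024, 768, 768))   # layout reported by h5dump -p -H
proposed = chunk_bytes((1, 64, 3072))     # layout suggested for h5repack

# How many buffer-sized reads it takes to cover one chunk.
print(current, current / BUFFER_BYTES)
print(proposed, proposed / BUFFER_BYTES)
```

Under these assumptions the current chunk is 2,415,919,104 bytes (2304 times the default buffer), while the proposed chunk is 786,432 bytes and fits inside a single buffer.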
>>
>> Thanks
>> --pc
>>
>>
>> Mark Howison wrote:
>>>
>>> I tried using BE and LE and both are equally slow. Here is the header
>>> info. Also, I should note that the dataset is roughly 108GB, but it
>>> does fit into local memory (196GB is available). Also, it seems to
>>> continuously write at 4MB/s, instead of sitting and processing for a
>>> while and then bursting at 100MB/s or something. It is also chunked.
>>> Maybe this is causing problems, because h5dump has to jump around to
>>> non-contiguous offsets to contiguously assemble the binary output?
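
A rough way to gauge this guess: within a chunk, HDF5 stores elements in row-major order, so a stretch of the row-major output is contiguous in the file only for as long as it stays inside one chunk. The sketch below is back-of-the-envelope arithmetic using the dimensions from the header that follows. It assumes the output is assembled in row-major order through a 1 MiB buffer, which is a simplification and not a description of h5dump's internals; the function name is hypothetical.

```python
ELEM_BYTES = 4                        # H5T_IEEE_F32LE
DATASET_DIMS = (3072, 3072, 3072)
BUFFER_BYTES = 2**20                  # assumed 1 MiB read buffer


def contiguous_run_elems(dataset_dims, chunk_dims):
    """Elements of row-major output that sit contiguously inside one chunk.

    Walk from the fastest dimension outwards. The run keeps growing only
    while the chunk spans the full dataset extent in every faster dimension.
    """
    run = 1
    for d, c in zip(reversed(dataset_dims), reversed(chunk_dims)):
        run *= c
        if c != d:    # chunk is narrower than the dataset here, so the run ends
            break
    return run


for chunk_dims in [(1024, 768, 768), (1, 64, 3072)]:
    run_bytes = contiguous_run_elems(DATASET_DIMS, chunk_dims) * ELEM_BYTES
    # Number of separate on-disk pieces needed to fill one buffer.
    print(chunk_dims, run_bytes, BUFFER_BYTES / run_bytes)
```

With the 1024x768x768 layout each contiguous piece is only 3072 bytes, so filling one 1 MiB buffer takes roughly 341 separate pieces. With a 1x64x3072 layout a whole 786,432-byte chunk is one contiguous piece.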
>>>
>>> Thanks,
>>> Mark
>>>
>>> mhowison at davinci:/project/projectdirs/vacet/mark> h5dump -p -H -d /Step#0/Block/Analyze7.5/0 combustion.h5part
>>> HDF5 "combustion.h5part" {
>>> DATASET "/Step#0/Block/Analyze7.5/0" {
>>>    DATATYPE  H5T_IEEE_F32LE
>>>    DATASPACE  SIMPLE { ( 3072, 3072, 3072 ) / ( 3072, 3072, 3072 ) }
>>>    STORAGE_LAYOUT {
>>>       CHUNKED ( 1024, 768, 768 )
>>>       SIZE 115964116992
>>>    }
>>>    FILTERS {
>>>       NONE
>>>    }
>>>    FILLVALUE {
>>>       FILL_TIME H5D_FILL_TIME_IFSET
>>>       VALUE  0
>>>    }
>>>    ALLOCATION_TIME {
>>>       H5D_ALLOC_TIME_EARLY
>>>    }
>>> }
>>> }
>>>
>>>
>>> On Tue, Jan 12, 2010 at 3:40 PM, Elena Pourmal <epourmal at hdfgroup.org>
>>> wrote:
>>>
>>>>
>>>> Mark,
>>>>
>>>> h5dump performance may be affected by many factors (size of the h5dump
>>>> default read buffer, chunk sizes of the dataset, compression, etc.).
>>>> Would it be possible for you to run h5dump -p -H -d .... to print the
>>>> header information for the dataset you are trying to export? Then we may
>>>> have a better idea of what is going wrong.
>>>>
>>>> Thank you!
>>>>
>>>> Elena
>>>> On Jan 12, 2010, at 12:15 PM, Jonathan Kim wrote:
>>>>
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> It's different from h5copy.
>>>>>
>>>>> What is the original data format? BE or LE?
>>>>> If it's BE, could you try -b BE and see if there is any performance
>>>>> difference?
>>>>>
>>>>> And could you try with a smaller HDF5 file (under 10GB)?
>>>>>
>>>>> Also, could you try on other filesystems (non-parallel as well)?
>>>>>
>>>>> Since it's a performance issue rather than a specific bug, more test
>>>>> results would be helpful.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> - Jonathan
>>>>>
>>>>> -----Original Message-----
>>>>> From: hdf-forum-bounces at hdfgroup.org
>>>>> [mailto:hdf-forum-bounces at hdfgroup.org]
>>>>> On Behalf Of Mark Howison
>>>>> Sent: Tuesday, January 12, 2010 10:26 AM
>>>>> To: HDF forum
>>>>> Subject: [Hdf-forum] Slow conversion to binary using h5dump
>>>>>
>>>>> Hi, I tried converting a 108GB HDF5 file to binary using the "-b LE"
>>>>> flag in h5dump, but it ran at a crawling pace, only about 4MB/s. This
>>>>> is in comparison to an h5copy I did on the same machine (our SGI
>>>>> Altix) that ran at 600MB/s. The filesystem is GPFS. Any ideas why
>>>>> h5dump is having so much trouble? Is there a conversion phase (to LE)
>>>>> that is bogging things down? Thanks, Mark
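
To put the two rates in perspective, here is a rough estimate of the wall-clock time for the whole file. It assumes "108GB" means 108 * 2**30 bytes and that "MB/s" means 10**6 bytes per second; the post does not pin down either unit, and transfer_hours is just an illustrative helper.

```python
FILE_BYTES = 108 * 2**30    # assumption: "108GB" means 108 GiB


def transfer_hours(total_bytes, mb_per_s):
    """Hours needed to move total_bytes at a steady rate in MB/s (10**6 bytes)."""
    return total_bytes / (mb_per_s * 10**6) / 3600


print(f"h5dump at 4 MB/s:   {transfer_hours(FILE_BYTES, 4):.1f} h")
print(f"h5copy at 600 MB/s: {transfer_hours(FILE_BYTES, 600) * 60:.1f} min")
```

Under these assumptions the export takes about 8 hours at 4 MB/s, against a little over 3 minutes at the 600 MB/s seen with h5copy.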
>>>>>
>>>>> _______________________________________________
>>>>> Hdf-forum is for HDF software users discussion.
>>>>> Hdf-forum at hdfgroup.org
>>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>>>