[Hdf-forum] Slow conversion to binary using h5dump

Elena Pourmal epourmal at hdfgroup.org
Fri Jan 15 11:34:45 EST 2010


Mark,

On Jan 15, 2010, at 10:26 AM, Mark Howison wrote:

> Hi Peter and Elena,
> 
> I agree that the problem is most likely the chunking. I tried
> repacking the dataset, but this was as slow as running h5dump on the
> chunked dataset (probably for the same reason: non-contiguous disk
> access on Lustre). So I regenerated my dataset without chunking, and I
> was able to see about 10x better throughput with h5dump to binary (the
> bottleneck at this point is probably in Lustre and not in h5dump). I
> also tried increasing the buffer size to 4MB but found the effect was
> negligible.
> 
Thank you for trying a bigger buffer. It is good to know the result. 

Elena
> Thanks for your help,
> Mark
> 
> On Wed, Jan 13, 2010 at 2:56 PM, Peter Cao <xcao at hdfgroup.org> wrote:
>> Mark,
>> 
>> The default buffer size used in h5dump is 1MB. Setting it to 4MB will
>> improve the performance, but it may still be slow because the buffer
>> size is much smaller than the chunk size.
>> 
>> If you use h5repack to change the chunk size to 1x64x3072 (a little less
>> than 1MB) and try h5dump again, you will see the difference.
>> 
>> Thanks
>> --pc
>> 
>> 
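To put numbers on the buffer-versus-chunk comparison above, here is a rough sketch in Python. The contiguous-run calculation is a simplified model of row-major access, not a description of what h5dump or the HDF5 library does internally. The helper names and the output file name are made up for illustration, and the h5repack command is only built, not run.

```python
from math import prod

ELEM_BYTES = 4                      # H5T_IEEE_F32LE is a 4-byte float
DIMS = (3072, 3072, 3072)           # dataset extent from the header in this thread


def chunk_bytes(chunk):
    """Size of one chunk in bytes."""
    return prod(chunk) * ELEM_BYTES


def contiguous_run_bytes(chunk, dims=DIMS):
    """Bytes that are contiguous both in row-major output and inside one chunk.

    Simplified model: walk from the fastest-varying dimension outward and stop
    after the first dimension where the chunk does not span the dataset.
    """
    run = 1
    for c, d in zip(reversed(chunk), reversed(dims)):
        run *= c
        if c != d:
            break
    return run * ELEM_BYTES


def h5repack_cmd(dataset, chunk, src, dst):
    """Build (but do not run) an h5repack command that rechunks one dataset."""
    layout = f"{dataset}:CHUNK=" + "x".join(str(c) for c in chunk)
    return ["h5repack", "-l", layout, src, dst]


current = (1024, 768, 768)
proposed = (1, 64, 3072)
buffer_bytes = 1024 * 1024          # h5dump default buffer of 1MB, per the message above

for name, chunk in (("current", current), ("proposed", proposed)):
    # name, chunk size, contiguous run, and whether one chunk fits in the buffer
    print(name, chunk_bytes(chunk), contiguous_run_bytes(chunk),
          chunk_bytes(chunk) <= buffer_bytes)

# "combustion_rechunked.h5part" is a made-up output name.
print(" ".join(h5repack_cmd("/Step#0/Block/Analyze7.5/0", proposed,
                            "combustion.h5part", "combustion_rechunked.h5part")))
```

Under this model the current chunking offers only 3072-byte contiguous runs inside chunks of about 2.25GB each, while a 1x64x3072 chunk is 786432 bytes, is contiguous in output order, and fits in the 1MB buffer.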
>> Mark Howison wrote:
>>> 
>>> I tried using BE and LE and both are equally slow. Here is the header
>>> info. Also, I should note that the dataset is roughly 108GB, but it
>>> does fit into local memory (196GB is available). Also, it seems to
>>> continuously write at 4MB/s, instead of sitting and processing for a
>>> while and then bursting at 100MB/s or something. It is also chunked.
>>> Maybe this is causing problems, because h5dump has to jump around to
>>> non-contiguous offsets to contiguously assemble the binary output?
>>> 
>>> Thanks,
>>> Mark
>>> 
>>> mhowison at davinci:/project/projectdirs/vacet/mark> h5dump -p -H -d
>>> /Step#0/Block/Analyze7.5/0 combustion.h5part
>>> HDF5 "combustion.h5part" {
>>> DATASET "/Step#0/Block/Analyze7.5/0" {
>>>   DATATYPE  H5T_IEEE_F32LE
>>>   DATASPACE  SIMPLE { ( 3072, 3072, 3072 ) / ( 3072, 3072, 3072 ) }
>>>   STORAGE_LAYOUT {
>>>      CHUNKED ( 1024, 768, 768 )
>>>      SIZE 115964116992
>>>    }
>>>   FILTERS {
>>>      NONE
>>>   }
>>>   FILLVALUE {
>>>      FILL_TIME H5D_FILL_TIME_IFSET
>>>      VALUE  0
>>>   }
>>>   ALLOCATION_TIME {
>>>      H5D_ALLOC_TIME_EARLY
>>>   }
>>> }
>>> }
>>> 
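As a quick sanity check on the header above, the following Python sketch recomputes the dataset size, the size of one chunk, and the number of chunks from the printed dimensions. The variable names are illustrative. The only assumption is that H5T_IEEE_F32LE occupies 4 bytes per element.

```python
from math import prod

# Values copied from the h5dump header above.
dims = (3072, 3072, 3072)
chunk = (1024, 768, 768)
elem_bytes = 4  # H5T_IEEE_F32LE is a 4-byte float

dataset_bytes = prod(dims) * elem_bytes
chunk_bytes = prod(chunk) * elem_bytes
chunks_per_dim = tuple(d // c for d, c in zip(dims, chunk))
n_chunks = prod(chunks_per_dim)

print(dataset_bytes)             # 115964116992, the SIZE reported in the header
print(dataset_bytes / 2**30)     # 108.0
print(chunk_bytes / 2**30)       # 2.25
print(chunks_per_dim, n_chunks)  # (3, 4, 4) 48
```

So the file holds 48 chunks of 2.25GB each, which is far larger than any read buffer of a few megabytes.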
>>> 
>>> On Tue, Jan 12, 2010 at 3:40 PM, Elena Pourmal <epourmal at hdfgroup.org>
>>> wrote:
>>> 
>>>> 
>>>> Mark,
>>>> 
>>>> h5dump performance may be affected by many factors (the size of the
>>>> h5dump default read buffer, the chunk sizes of the dataset,
>>>> compression, etc.). Would it be possible for you to run
>>>> h5dump -p -H -d .... to print the header information for the dataset
>>>> you are trying to export? That may give us a better idea of what is
>>>> going wrong.
>>>> 
>>>> Thank you!
>>>> 
>>>> Elena
>>>> On Jan 12, 2010, at 12:15 PM, Jonathan Kim wrote:
>>>> 
>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> It's different from h5copy.
>>>>> 
>>>>> What is the original data format? BE or LE?
>>>>> If it's BE, could you try -b BE and see if there is any performance
>>>>> difference?
>>>>> 
>>>>> And could you try with a smaller HDF5 file? (under 10GB)
>>>>> 
>>>>> Also, could you try on other filesystems? (non-parallel as well)
>>>>> 
>>>>> Since this is a performance issue rather than a specific bug, more
>>>>> test results would be helpful.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> - Jonathan
>>>>> 
>>>>> -----Original Message-----
>>>>> From: hdf-forum-bounces at hdfgroup.org
>>>>> [mailto:hdf-forum-bounces at hdfgroup.org]
>>>>> On Behalf Of Mark Howison
>>>>> Sent: Tuesday, January 12, 2010 10:26 AM
>>>>> To: HDF forum
>>>>> Subject: [Hdf-forum] Slow conversion to binary using h5dump
>>>>> 
>>>>> Hi, I tried converting a 108GB HDF5 file to binary using the "-b LE"
>>>>> flag in h5dump, but it ran at a crawling pace, only about 4MB/s. This
>>>>> is in comparison to an h5copy I did on the same machine (our SGI
>>>>> Altix) that ran at 600MB/s. The filesystem is GPFS. Any ideas why
>>>>> h5dump is having so much trouble? Is there a conversion phase (to LE)
>>>>> that is bogging things down? Thanks, Mark
>>>>> 
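To give a sense of scale for the two rates quoted above, here is a small Python estimate of how long each would take to move the whole file. It assumes "108GB" and "MB/s" are decimal units; the function name is illustrative.

```python
SIZE_BYTES = 108 * 10**9   # "108GB" from the message, read as decimal gigabytes (assumption)


def seconds_at(mb_per_s):
    """Time in seconds to move SIZE_BYTES at the given rate in decimal MB/s."""
    return SIZE_BYTES / (mb_per_s * 10**6)


h5dump_s = seconds_at(4)     # rate reported for h5dump -b LE
h5copy_s = seconds_at(600)   # rate reported for h5copy

print(h5dump_s / 3600)       # hours at 4MB/s
print(h5copy_s / 60)         # minutes at 600MB/s
print(h5dump_s / h5copy_s)   # slowdown factor
```

At these rates the h5dump export would take about 7.5 hours, against about 3 minutes for h5copy, a factor of 150.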
>>>>> _______________________________________________
>>>>> Hdf-forum is for HDF software users discussion.
>>>>> Hdf-forum at hdfgroup.org
>>>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>>>> 



