[hdf-forum] Reading across multiple chunks is very slow

Ger van Diepen diepen at astron.nl
Fri Mar 20 04:22:15 EDT 2009


Hi Francesc,

The only thing I can think of is that reading [1,1978,1556,2] requires much more data shuffling: effectively the 2 tiles have to be interleaved, whereas in the other case the 2 tiles just have to be concatenated. But I doubt that this alone costs so much more time.
Do you have the amount of user and system time it took?
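
To make the interleaving argument concrete, here is a small pure-Python sketch. It is not HDF5 code and does not claim to show how the library gathers data internally. It assumes each chunk is stored row-major and counts how many separate contiguous pieces have to be copied to fill a row-major output buffer. The function name `contiguous_runs` and the small 4x3 plane are made up for illustration.

```python
from itertools import product

def contiguous_runs(start, count, chunk_shape):
    """Count the contiguous copy runs needed to fill a row-major output
    buffer from a chunked dataset (assumes row-major storage in each chunk)."""
    runs = 0
    prev = None  # (chunk id, linear offset inside chunk) of the previous element
    # Walk the selection in row-major order, i.e. the order of the output buffer.
    for idx in product(*(range(s, s + c) for s, c in zip(start, count))):
        chunk_id = tuple(i // c for i, c in zip(idx, chunk_shape))
        offset = 0
        for i, c in zip(idx, chunk_shape):
            offset = offset * c + (i % c)
        # A new run starts whenever the source is not the next element
        # of the same chunk.
        if prev is None or chunk_id != prev[0] or offset != prev[1] + 1:
            runs += 1
        prev = (chunk_id, offset)
    return runs

chunk = (1, 4, 3, 1)  # scaled-down stand-in for {1, 1978, 1556, 1}
print(contiguous_runs((0, 0, 0, 0), (2, 4, 3, 1), chunk))  # 2: tiles concatenated
print(contiguous_runs((0, 0, 0, 0), (1, 4, 3, 2), chunk))  # 24: tiles interleaved
```

Under this model, the real [1,1978,1556,2] read would need 2 x 1978 x 1556 (about 6.2 million) single-element copies, against 2 block copies for [2,1978,1556,1].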

Cheers,
Ger
 
>>> Francesc Alted <faltet at pytables.org> 03/19/09 7:35 PM >>> 
Hi,

A PyTables user has reported a performance problem when reading a 
dataset in some cases.  I've tracked the problem down to the HDF5 
library, as the output of the attached script reveals:

Time for creating dataset with dims {3, 1978, 1556, 288} --> 0.000000
Time for writing hyperslice {2, 1978, 1556, 2} --> 12.010000
Time for reading hyperslice {2, 1978, 1556, 1} --> 0.020000
Time for reading hyperslice {1, 1978, 1556, 2} --> 2.490000

[This dataset has a chunksize of: {1, 1978, 1556, 1}]
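
The figures above are single timings per operation. A breakdown into wall-clock, user and system time can help tell CPU-bound shuffling apart from I/O. The following is a minimal sketch using only the Python standard library; the helper name `time_call` is hypothetical, and the function being timed would be whatever performs the hyperslice read.

```python
import os
import time

def time_call(func, *args):
    """Run func(*args) once; return (result, wall, user, system) in seconds."""
    cpu0 = os.times()            # process CPU times before the call
    wall0 = time.perf_counter()
    result = func(*args)
    wall1 = time.perf_counter()
    cpu1 = os.times()            # process CPU times after the call
    return (result,
            wall1 - wall0,
            cpu1.user - cpu0.user,
            cpu1.system - cpu0.system)

# Example with a stand-in workload instead of a real dataset read.
result, wall, user, system = time_call(sum, range(1000))
print("wall %.6f  user %.6f  system %.6f" % (wall, user, system))
```

A read dominated by user time would point at in-memory data movement, while a large system or idle share would point at file access.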

The question is: why does it take about 100x longer to read a hyperslice 
with a count of {1, 1978, 1556, 2} than one with a count of 
{2, 1978, 1556, 1}?

I have been trying to figure out what's happening, but as I can't find a 
clear explanation, I think that perhaps this is a bug in HDF5.  I've 
tried with HDF5 1.6.5, 1.8.2 and 1.8.2-post8, all with similar results.

Thanks,

-- 
Francesc Alted


