[hdf-forum] Reading across multiple chunks is very slow
Ger van Diepen
diepen at astron.nl
Fri Mar 20 04:22:15 EDT 2009
Hi Francesc,
The only thing I can think of is that reading [1,1978,1556,2] requires much more data shuffling: effectively, the 2 tiles have to be interleaved element by element, whereas in the other case the 2 tiles just have to be concatenated. But I doubt that this costs so much more time.
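To make the interleaving argument concrete, here is a minimal sketch in plain Python. It is not HDF5 code and does not reflect how the library is implemented; it only assumes a row-major (C-ordered) read buffer and a selection that starts at the dataset origin, and it uses a scaled-down stand-in for the chunk shape. The helper names are made up for this illustration.

```python
from itertools import product

def source_chunk_per_element(count, chunk):
    """For each element of a C-ordered read buffer of shape `count`,
    return the index of the chunk that element comes from.

    Assumption: the selection starts at the origin of the dataset.
    """
    owners = []
    # itertools.product varies the last index fastest, matching C order.
    for index in product(*(range(n) for n in count)):
        owners.append(tuple(i // c for i, c in zip(index, chunk)))
    return owners

def contiguous_runs(owners):
    """Count maximal runs of consecutive buffer elements from the same chunk."""
    runs = 0
    previous = None
    for owner in owners:
        if owner != previous:
            runs += 1
            previous = owner
    return runs

# Scaled-down stand-in for the chunk shape {1, 1978, 1556, 1}.
chunk = (1, 4, 3, 1)

# Count {2, N, M, 1}: each chunk fills one contiguous half of the buffer.
concatenated = contiguous_runs(source_chunk_per_element((2, 4, 3, 1), chunk))
# Count {1, N, M, 2}: the source chunk alternates at every buffer element.
interleaved = contiguous_runs(source_chunk_per_element((1, 4, 3, 2), chunk))

print(concatenated)
print(interleaved)
```

With the real dimensions the same counting gives 2 contiguous runs for the first read against 2 x 1978 x 1556 = 6,155,536 single-element runs for the second, which is the extra shuffling referred to above.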
Do you have the amount of user and system time it took?
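One possible way to collect those numbers from a Python test script is sketched below, using only the standard library. `timed` and `read_stand_in` are hypothetical names; the stand-in function would be replaced by the actual hyperslab read.

```python
import os
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, wall, user, system), times in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    # CPU time spent by this process in user mode and in the kernel.
    user = cpu_after.user - cpu_before.user
    system = cpu_after.system - cpu_before.system
    return result, wall, user, system

def read_stand_in():
    # Hypothetical placeholder for the hyperslab read being measured.
    return sum(range(100_000))

result, wall, user, system = timed(read_stand_in)
print(f"wall={wall:.3f}s user={user:.3f}s system={system:.3f}s")
```

As a rough guide, a read dominated by user time suggests the cost is in copying data inside the process, while a large system time points more towards I/O or other kernel work.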
Cheers,
Ger
>>> Francesc Alted <faltet at pytables.org> 03/19/09 7:35 PM >>>
Hi,
A PyTables user has reported a performance problem when reading a
dataset in some cases. I've tracked the problem down to the HDF5
library, as the output of the attached script reveals:
Time for creating dataset with dims {3, 1978, 1556, 288} --> 0.000000
Time for writing hyperslice {2, 1978, 1556, 2} --> 12.010000
Time for reading hyperslice {2, 1978, 1556, 1} --> 0.020000
Time for reading hyperslice {1, 1978, 1556, 2} --> 2.490000
[This dataset has a chunksize of: {1, 1978, 1556, 1}]
The question is: why did it take about 100x longer to read a hyperslice
with a count of {1, 1978, 1556, 2} than one with a count of
{2, 1978, 1556, 1}?
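As a quick sanity check on why the difference is surprising, the following sketch compares the two reads using the numbers reported above. It assumes both selections start on a chunk boundary (the start offsets are not given in this message), and `chunks_touched` is a hypothetical helper, not an HDF5 call.

```python
from math import ceil, prod

def chunks_touched(count, chunk):
    """Number of chunks a hyperslab overlaps.

    Assumption: the hyperslab starts on a chunk boundary.
    """
    return prod(ceil(n / c) for n, c in zip(count, chunk))

chunk = (1, 1978, 1556, 1)
fast_read = (2, 1978, 1556, 1)   # reported as 0.02 s
slow_read = (1, 1978, 1556, 2)   # reported as 2.49 s

for count in (fast_read, slow_read):
    # Shape, number of elements read, number of chunks involved.
    print(count, prod(count), chunks_touched(count, chunk))

slowdown = 2.49 / 0.02
print(f"slowdown: {slowdown:.1f}x")
```

Under that assumption both reads transfer the same number of elements from the same number of chunks, so the timing gap cannot be explained by data volume alone.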
I tried to figure out what's happening, but as I can't find a
clear explanation, I think that this is perhaps a bug in HDF5. I've
tried with HDF5 1.6.5, 1.8.2 and 1.8.2-post8, all with similar results.
Thanks,
--
Francesc Alted