[Hdf-forum] Chunk cache size and performance
Ger van Diepen
diepen at astron.nl
Fri Jan 8 02:25:09 EST 2010
Hi Francesc,
This might be related to a problem I reported last June.
I did tests using a 3-dim array with various chunk shapes and access
patterns. It got very slow when iterating through the data by vector in
the Z-direction. I believe it was filed as a bug by the HDF5 group. I
sent a test program to Quincey that shows the behaviour. I'll forward
that mail and the test program to you, so you can try it out yourself if
you like.
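To make the Z-direction case concrete, here is a minimal sketch (not the test program mentioned above) that counts how many chunks a single vector along one axis crosses, and how many bytes the chunk cache would need to keep all of them resident. The array shape, chunk shapes, item size, and function names are hypothetical, chosen only for illustration.

```python
def chunks_along_axis(array_shape, chunk_shape, axis):
    """Number of distinct chunks crossed by one full vector along `axis`."""
    # Integer ceiling division: a partial chunk at the edge still counts.
    return -(-array_shape[axis] // chunk_shape[axis])


def chunk_nbytes(chunk_shape, itemsize):
    """Size of one uncompressed chunk in bytes."""
    n = itemsize
    for extent in chunk_shape:
        n *= extent
    return n


def vector_cache_bytes(array_shape, chunk_shape, axis, itemsize):
    """Bytes of chunk cache needed to hold every chunk one vector touches."""
    return (chunks_along_axis(array_shape, chunk_shape, axis)
            * chunk_nbytes(chunk_shape, itemsize))
```

For a hypothetical 256x256x256 array of 4-byte values, cubic 32x32x32 chunks need 1 MiB of cache to hold one Z-vector's chunks, whereas flat 256x256x1 chunks need 64 MiB. If the cache is smaller than that, every step along Z has to reload a chunk.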
I suspect the cache lookup algorithm to be the culprit. The larger the
cache and the more often it has to look up, the slower things get. BTW,
did you adapt the cache's hash size to the number of slots in the cache?
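As a rough illustration of sizing the hash table, the sketch below follows the rule of thumb given in the HDF5 documentation for the chunk cache settings (H5Pset_cache / H5Pset_chunk_cache): the number of slots should be a prime number, about 10 to 100 times the number of chunks that fit in the cache. The helper names are hypothetical and the code only does the arithmetic; it does not call HDF5.

```python
def is_prime(n):
    """Trial-division primality test; adequate for slot counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def suggest_nslots(cache_bytes, chunk_bytes, factor=100):
    """Prime slot count of roughly `factor` times the chunks that fit.

    `factor` between 10 and 100 follows the documented rule of thumb;
    the value to use in practice is an assumption to be benchmarked.
    """
    chunks_in_cache = max(1, cache_bytes // chunk_bytes)
    return next_prime(factor * chunks_in_cache)
```

For example, an 8 MiB cache holding 1 MiB chunks fits 8 chunks, so `suggest_nslots(8 * 2**20, 2**20)` looks for the first prime at or above 800.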
In your tests you only mention the chunk size, but not the chunk shape.
Isn't that important? It gives me the impression that in your tests the
data are stored and accessed fully sequentially which makes the cache
useless.
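The point about sequential access can be sketched with a toy model of a hashed chunk cache. This is a deliberate simplification and an assumption, not HDF5's actual code: each hash slot holds at most one chunk, the slot is the chunk index modulo the number of slots, and the byte budget of the cache is ignored.

```python
def hit_rate(chunk_ids, nslots):
    """Fraction of accesses served from a one-chunk-per-slot hashed cache."""
    slots = {}  # slot number -> chunk id currently held in that slot
    hits = 0
    for cid in chunk_ids:
        slot = cid % nslots
        if slots.get(slot) == cid:
            hits += 1
        else:
            # Miss: load the chunk, evicting whatever occupied the slot.
            slots[slot] = cid
    return hits / len(chunk_ids) if chunk_ids else 0.0


# Fully sequential read: every chunk is visited exactly once.
sequential = list(range(1000))

# Repeatedly revisiting the same four chunks, as when stepping through
# neighbouring vectors that cross the same chunks.
revisiting = [0, 8, 16, 24] * 10
```

In this model `hit_rate(sequential, nslots)` is 0 for any number of slots, because no chunk is ever requested twice. The revisiting pattern gets no hits with 8 slots (all four chunks collide in slot 0) but 36 hits out of 40 with 521 slots, which is why both the access pattern and the hash size matter.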
Cheers,
Ger
>>> Francesc Alted 01/07/10 8:32 PM >>>
Hi,
I'm doing some small benchmarks to present at a forthcoming workshop,
and I'd be very grateful if someone could shed some light on the
performance figures that I'm getting.
What I want to stress during the workshop is the dependency of I/O
throughput on the chunksize for a given dataset. To make the attached
plots, I have chosen a dataset of 2 GB (2-dim, shape is (512, 65536) and
datatype is double precision) so that it easily fits into my OS cache
memory (my machine has 8 GB), which makes the effects clearer. The X
axis represents the chunksize for every dataset (from 1 KB up to 8 MB).
The Y axis shows the performance for reading the dataset sequentially.
Now, for a chunk cache size of 1 MB (figure 'sequential-1MB.pdf'), it
can be seen that HDF5 can read at up to 1.6 GB/s (which is pretty good
:-). However, if I raise the chunk cache size to 8 MB, the peak
performance falls to a mere 1.0 GB/s, that is, almost 40% less. The
Blosc compressor performance is also strongly affected by this (slower
compressors like LZO or Zlib do not notice this effect very much because
they are the obvious bottleneck).
I've tried other cache sizes, and the smaller, the better (reaching
almost 1.9 GB/s on my machine for a cache size of 128 KB). Varying the
number of slots in the cache does not seem to affect performance much
here.
My guess is that the culprit behind this significant performance
penalty is the chunk cache subsystem of HDF5. Could anyone confirm this?
If so, do you think that some optimization in that regard could be
carried out in the future?
Thanks,
--
Francesc Alted