[Hdf-forum] Chunk cache size and performance

Ger van Diepen diepen at astron.nl
Mon Jan 11 02:11:19 EST 2010


Hi Francesc,

Why does HDF5 have to reserve space for each IO operation? Space is only
needed for the cache and the buffer supplied by the caller. The cache
space can be allocated only once because all chunks have the same size.
I would expect the IO and cache to work like:

The cache contains uncompressed chunks (otherwise the system's cache can
be used as well).
Reading (or writing) is done like:
- Iterate over the data to be read and determine which chunks to read
- Per chunk check if in the cache. If so, copy the required data part to
the supplied buffer.
- If not in cache, read that chunk into a (pre-allocated) buffer,
uncompress to a free slot in the cache, and copy to supplied buffer. If
no free slot, it has to remove a chunk from the cache using e.g. a
least-recently-used algorithm.
- If the data are not compressed (as in my case), it can read directly
into the cache slot.
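
The steps listed above can be sketched as a toy model. This is only an illustration of the behaviour described in this message, not HDF5 code: the `ChunkCache` class, the `read_vector` helper and the `fake_disk` reader are hypothetical names, the dataset is 1-D for brevity, and the data are assumed to be uncompressed so that a chunk is read straight into its cache slot.

```python
from collections import OrderedDict


class ChunkCache:
    """Toy LRU cache of uncompressed chunks (illustrative only, not HDF5 code)."""

    def __init__(self, max_chunks, read_chunk):
        self.max_chunks = max_chunks   # number of equally sized slots
        self.read_chunk = read_chunk   # hypothetical callable: chunk_id -> list of values
        self.slots = OrderedDict()     # chunk_id -> chunk data, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, chunk_id):
        if chunk_id in self.slots:
            self.hits += 1
            self.slots.move_to_end(chunk_id)   # mark as most recently used
            return self.slots[chunk_id]
        self.misses += 1
        chunk = self.read_chunk(chunk_id)      # uncompressed: read directly into the slot
        if self.max_chunks == 0:
            return chunk                       # cache disabled, nothing is kept
        if len(self.slots) >= self.max_chunks:
            self.slots.popitem(last=False)     # no free slot: evict the least recently used
        self.slots[chunk_id] = chunk
        return chunk


def read_vector(cache, start, count, chunk_len):
    """Copy `count` elements starting at `start` of a 1-D dataset into a new buffer."""
    out = []
    pos = start
    end = start + count
    while pos < end:
        chunk_id, offset = divmod(pos, chunk_len)
        n = min(chunk_len - offset, end - pos)   # part of the request inside this chunk
        out.extend(cache.get(chunk_id)[offset:offset + n])
        pos += n
    return out


def fake_disk(chunk_id, chunk_len=4):
    # Stand-in for file I/O: element i of the dataset simply has value i.
    return list(range(chunk_id * chunk_len, (chunk_id + 1) * chunk_len))


cache = ChunkCache(max_chunks=2, read_chunk=fake_disk)
first = read_vector(cache, 2, 5, chunk_len=4)    # touches chunks 0 and 1 (two misses)
second = read_vector(cache, 3, 2, chunk_len=4)   # both chunks are already cached (two hits)
```

In this model the only per-request allocation is the caller's buffer; the cache slots are reused.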

If it is not working that way, I probably have a very incorrect view of
the purpose of the HDF5 cache. Quincey may be able to shed some light.

I noticed in your pictures that apart from H5FL_reg_malloc and _free,
also a lot of calls to H5D-btree-cmp3 are done. I assume these can be
expensive calls. Such calls are expected and therefore I was wondering
if they are the culprit. However, I don't understand why it is using a
btree comparison, because the documentation says that the chunk cache
uses a hash algorithm, not a btree to find chunks in the cache.
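
To make the distinction concrete, here is a minimal sketch of what a hash-based chunk lookup could look like, as opposed to a tree search. It assumes a direct-mapped table in which the chunk index modulo the number of slots selects the slot; that scheme and the `HashedChunkCache` name are assumptions for illustration, and this thread does not confirm that HDF5 works exactly this way.

```python
class HashedChunkCache:
    """Toy direct-mapped chunk table: one chunk per slot, located by hashing."""

    def __init__(self, nslots):
        self.nslots = nslots
        self.slots = [None] * nslots   # each entry is (chunk_index, data) or None

    def lookup(self, chunk_index):
        # Constant-time lookup: one modulo and one comparison, no tree traversal.
        entry = self.slots[chunk_index % self.nslots]
        if entry is not None and entry[0] == chunk_index:
            return entry[1]
        return None

    def insert(self, chunk_index, data):
        # A different chunk hashing to the same slot is simply replaced.
        self.slots[chunk_index % self.nslots] = (chunk_index, data)


table = HashedChunkCache(nslots=7)
table.insert(3, "chunk 3")
table.insert(10, "chunk 10")   # 10 % 7 == 3, so this displaces chunk 3
```

With a lookup of this kind, no key comparisons beyond the single slot check would be needed to find a cached chunk.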

Note that casacore (that we are using as well), also supports chunking
and caching. It is much faster for the smaller IO operations, so it is
an implementation issue.

Cheers,
Ger

>>> Francesc Alted  01/11/10 7:07 AM >>>
Hi Ger,

On Friday 08 January 2010 08:25:09 you wrote:
> Hi Francesc,
> 
> This might be related to a problem I reported last June.
> I did tests using a 3-dim array with various chunk shapes and access
> patterns. It got very slow when iterating through the data by vector in
> the Z-direction. I believe it was filed as a bug by the HDF5 group. I
> sent a test program to Quincey that shows the behaviour. I'll forward
> that mail and the test program to you, so you can try it out yourself if
> you like to.
> 
> I suspect the cache lookup algorithm to be the culprit. The larger the
> cache and the more often it has to look up, the slower things get. BTW,
> did you adapt the cache's hash size to the number of slots in the cache?
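
On adapting the hash size: the HDF5 documentation for `H5Pset_chunk_cache` suggests, as a rule of thumb, a slot count of roughly 100 times the number of chunks that fit in the cache, preferably a prime number (worth checking against the current documentation). The sketch below applies that rule; `suggested_nslots` is a hypothetical helper, and whether the test program derives its value this way is an assumption. For a cache of 32 chunks it yields 3203, the slot count the benchmark prints later in this message.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small slot counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggested_nslots(chunks_in_cache, factor=100):
    """Smallest prime >= factor * chunks_in_cache (rule of thumb, not an HDF5 API)."""
    n = chunks_in_cache * factor
    while not is_prime(n):
        n += 1
    return n


nslots = suggested_nslots(32)   # 32 chunks fit in the 256 KB cache
```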

Thanks for your suggestion.  I've been looking at your problem, and my 
profiles seem to say that it is not a cache issue.

Have a look at the attached screenshots showing profiles for your test bed
reading along the x axis with a cache size of 4 KB (so the HDF5 cache
subsystem does not come into play at all) and 256 KB (your size).  I've also
added a profile for the tiled case for comparison purposes.  For all the
profiles (except tiled), the bottleneck is clearly in the `H5FL_reg_free` and
`H5FL_reg_malloc` calls, no matter how large the cache size is (even if it
does not come into play).

I think this is expected, because HDF5 has to reserve space for each I/O
operation.  When you walk the dataset following directions x or y, you have
to do (32*2)x more I/O operations than for the tiled case, and HDF5 needs to
allocate (and free again!) (32*2)x more memory areas.  Also, when you read
through the z axis, memory has to be allocated and released (32*32)x more
often.  All of this is consistent with both the profiles and running the
benchmark manually:

faltet at antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 t
real    0m0.057s
user    0m0.048s
sys     0m0.004s

faltet at antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 x
setting cache to 32 chunks (4096 bytes) with 3203 slots  // forcing no cache
real    0m1.055s
user    0m0.860s
sys     0m0.168s

faltet at antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 y
setting cache to 32 chunks (262144 bytes) with 3203 slots
real    0m1.211s
user    0m1.176s
sys     0m0.028s

faltet at antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2
sys     0m0.024s
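
The (32*2)x and (32*32)x factors can be checked by counting chunk-level accesses. The sketch below assumes that the benchmark arguments mean a 1024 x 1024 x 10 array with 32 x 32 x 2 chunks, that each read fetches one full vector along the chosen axis, and that elements are 4 bytes (which is what makes 32 chunks equal 262144 bytes). The helper names are hypothetical.

```python
def chunk_visits(shape, chunk, axis):
    """Chunk-level accesses when reading the array as full vectors along `axis`."""
    nvectors = 1
    for d, n in enumerate(shape):
        if d != axis:
            nvectors *= n
    chunks_per_vector = -(-shape[axis] // chunk[axis])   # ceiling division
    return nvectors * chunks_per_vector


def total_chunks(shape, chunk):
    """Number of chunks in the dataset, i.e. accesses in the tiled case."""
    total = 1
    for n, c in zip(shape, chunk):
        total *= -(-n // c)
    return total


shape = (1024, 1024, 10)   # assumed meaning of the first three arguments
chunk = (32, 32, 2)        # assumed meaning of the next three arguments

tiled = total_chunks(shape, chunk)
ratio_x = chunk_visits(shape, chunk, 0) // tiled
ratio_y = chunk_visits(shape, chunk, 1) // tiled
ratio_z = chunk_visits(shape, chunk, 2) // tiled

itemsize = 4                                    # assumption: 4-byte elements
chunk_bytes = 32 * 32 * 2 * itemsize
chunks_in_256k = 262144 // chunk_bytes          # chunks that fit in the 256 KB cache
chunks_in_4k = 4096 // chunk_bytes              # a 4 KB cache cannot hold a single chunk
```

Under these assumptions the x and y walks touch chunks 64 times as often as the tiled read and the z walk 1024 times as often, and a 4096-byte cache holds no chunk at all, which fits the "forcing no cache" remark.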

So, in my opinion, there is little that HDF5 can do here.  You would do
better to adapt the chunk shape to your most used case (if you have just
one, but I know that this is not typically the case).

> In your tests you only mention the chunk size, but not the chunk shape.
> Isn't that important? It gives me the impression that in your tests the
> data are stored and accessed fully sequentially which makes the cache
> useless.

Yes, chunk shape is important; sorry, I forgot this important detail.  As I
mentioned in a previous message to Rob Latham, I want to optimize
'semi-random' access mode in a certain row of the dataset, so I normally
choose the chunk shape as (1, X), where X is the value needed to obtain a
chunksize between 1 KB and 8 MB.  If X is larger than the maximum number of
columns, I expand the number of rows in the chunk shape accordingly.
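
The (1, X) rule described above might be sketched as follows. The function name `row_chunk_shape` and its parameters are hypothetical, and rounding down to whole elements and whole rows is an assumption about details the message does not spell out.

```python
def row_chunk_shape(target_bytes, itemsize, ncols):
    """Pick a (rows, cols) chunk shape of about `target_bytes` favouring row access."""
    elements = max(1, target_bytes // itemsize)   # elements that fit in the target size
    if elements <= ncols:
        return (1, elements)                      # a slice of a single row is enough
    # X would exceed the row length: use whole rows and stack several of them.
    return (elements // ncols, ncols)


narrow = row_chunk_shape(8192, itemsize=8, ncols=10000)   # 8 KB chunk, wide dataset
stacked = row_chunk_shape(65536, itemsize=8, ncols=1000)  # 64 KB chunk, short rows
```

Here `narrow` stays within one row, while `stacked` covers several complete rows because a single row is too short to reach the target chunk size.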

Thanks,

-- 
Francesc Alted
