[Hdf-forum] Chunk cache size and performance

Francesc Alted faltet at pytables.org
Fri Jan 8 12:45:08 EST 2010


Hi Ger,

On Friday 08 January 2010 08:25:09 you wrote:
> Hi Francesc,
> 
> This might be related to a problem I reported last June.
> I did tests using a 3-dim array with various chunk shapes and access
> patterns. It got very slow when iterating through the data by vector in
> the Z-direction. I believe it was filed as a bug by the HDF5 group. I
> sent a test program to Quincey that shows the behaviour. I'll forward
> that mail and the test program to you, so you can try it out yourself if
> you like.
> 
> I suspect the cache lookup algorithm to be the culprit. The larger the
> cache and the more often it has to look up, the slower things get. BTW,
> did you adapt the cache's hash size to the number of slots in the cache?

Thanks for your suggestion.  I've been looking at your problem, and my 
profiles suggest that it is not a cache issue.

Have a look at the attached screenshots showing profiles for your test bed 
reading along the x axis with a cache size of 4 KB (so small that the HDF5 
chunk cache does not come into play at all) and 256 KB (your size).  I've also 
added a profile for the tiled case for comparison.  For all the profiles 
(except tiled), the bottleneck is clearly in the `H5FL_reg_free` and 
`H5FL_reg_malloc` calls, no matter how large the cache is (even when it 
does not come into play).

I think this is expected, because HDF5 has to allocate memory for each I/O 
operation.  When you walk the dataset along the x or y direction, you have to 
do (32*2) times as many I/O operations as for the tiled case, and HDF5 needs 
to allocate (and free again!) (32*2) times as many memory areas.  Likewise, 
when you read along the z axis, the number of allocate/release operations 
grows by a factor of (32*32).  All of this is consistent both with the 
profiles and with running the benchmark manually:

faltet@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 t
real    0m0.057s
user    0m0.048s
sys     0m0.004s

faltet@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 x
setting cache to 32 chunks (4096 bytes) with 3203 slots  // forcing no cache
real    0m1.055s
user    0m0.860s
sys     0m0.168s

faltet@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 y
setting cache to 32 chunks (262144 bytes) with 3203 slots
real    0m1.211s
user    0m1.176s
sys     0m0.028s

faltet@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 z
setting cache to 5 chunks (40960 bytes) with 503 slots
real    0m14.813s
user    0m14.777s
sys     0m0.024s
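
The (32*2) and (32*32) factors can be checked with a few lines of Python.  
This is only a sketch of the arithmetic, not of what HDF5 does internally: it 
assumes a chunk shape of 32x32x2 (the last three arguments of the benchmark), 
that each read fetches one vector along the chosen axis within a chunk, and 
that the tiled case reads each chunk in a single operation.  The function 
name `reads_per_chunk` is made up for illustration.

```python
def reads_per_chunk(chunk_shape, axis):
    """Count the vector reads needed to cover one chunk along `axis`.

    Assumption: every read fetches one vector that spans the chunk along
    `axis`, so the count is the product of the other chunk dimensions.
    """
    count = 1
    for i, n in enumerate(chunk_shape):
        if i != axis:
            count *= n
    return count


chunk = (32, 32, 2)  # chunk shape assumed from the benchmark arguments
for name, axis in (("x", 0), ("y", 1), ("z", 2)):
    # The tiled case needs 1 read per chunk, so this is also the ratio.
    print(name, reads_per_chunk(chunk, axis))
```

The counts only say how many more allocate/free cycles are needed per chunk; 
the measured times do not have to scale by exactly the same factors.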

So, in my opinion, there is little that HDF5 can do here.  You would do better 
to adapt the chunk shape to your most common access pattern (if you have just 
one, though I know that this is not typically the case).

> In your tests you only mention the chunk size, but not the chunk shape.
> Isn't that important? It gives me the impression that in your tests the
> data are stored and accessed fully sequentially which makes the cache
> useless.

Yes, chunk shape is important; sorry, I forgot to mention it.  As I 
mentioned in a previous message to Rob Latham, I want to optimize 
'semi-random' access within a given row of the dataset, so I normally choose 
the chunk shape as (1, X), where X is the value needed to obtain a chunk size 
between 1 KB and 8 MB.  If X is larger than the maximum number of columns, I 
expand the number of rows in the chunk shape accordingly.
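
A minimal sketch of that rule in Python, under some assumptions: the target 
chunk size in bytes is supplied by the caller (how it is picked within the 
1 KB to 8 MB range is not covered here), the dataset is two-dimensional, and 
the function name `row_chunk_shape` is hypothetical rather than part of any 
library.

```python
def row_chunk_shape(ncols, itemsize, target_bytes):
    """Pick a (rows, cols) chunk shape that favours access within a row.

    Assumption: `target_bytes` is the desired chunk size, somewhere
    between 1 KB and 8 MB; how that value is chosen is left to the caller.
    """
    if not 1024 <= target_bytes <= 8 * 1024**2:
        raise ValueError("target_bytes must be between 1 KB and 8 MB")
    # X: number of elements that fit in the target chunk size.
    x = max(1, target_bytes // itemsize)
    if x <= ncols:
        return (1, x)
    # X exceeds the row length: use whole rows and add more of them.
    return (max(1, x // ncols), ncols)


# A wide dataset keeps a single-row chunk; a narrow one spans several rows.
print(row_chunk_shape(ncols=1_000_000, itemsize=8, target_bytes=64 * 1024))
print(row_chunk_shape(ncols=1000, itemsize=8, target_bytes=64 * 1024))
```

With 8-byte elements and a 64 KB target, the first call gives (1, 8192) and 
the second gives (8, 1000).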

Thanks,

-- 
Francesc Alted
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tHDF5-tile.png
Type: image/png
Size: 44166 bytes
Desc: not available
URL: <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100108/5a65fadb/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tHDF5-x-4KB.png
Type: image/png
Size: 46587 bytes
Desc: not available
URL: <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100108/5a65fadb/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tHDF5-x-256KB.png
Type: image/png
Size: 46509 bytes
Desc: not available
URL: <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100108/5a65fadb/attachment-0005.png>

