[Hdf-forum] Chunk cache size and performance

Francesc Alted faltet at pytables.org
Thu Jan 7 14:30:57 EST 2010


Hi,

I'm doing some small benchmarks to present at a forthcoming workshop, and I'd 
be very grateful if someone could shed some light on the performance 
figures that I'm getting.

What I want to stress during the workshop is the dependency of I/O throughput 
on the chunksize for a given dataset.  To make the attached plots, I have 
chosen a dataset of 2 GB (2-dim, shape (512, 65536), double-precision 
datatype) so that it easily fits into my OS cache memory (my machine has 
8 GB), which makes the effects clearer.  On the X axis, I represent the 
chunksize for every dataset (from 1 KB up to 8 MB).  The Y axis shows the 
throughput for reading the dataset sequentially.

Now, for a chunk cache size of 1 MB (figure 'sequential-1MB.pdf'), it can 
be seen that HDF5 can read at up to 1.6 GB/s (which is pretty good :-).  
However, if I raise the chunk cache size to 8 MB, the peak performance falls 
to a mere 1.0 GB/s, that is, almost 40% less.  The performance of the Blosc 
compressor is also strongly affected by this (slower compressors like LZO or 
Zlib are not affected as much because they are the obvious bottleneck).
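
The relative drop quoted above can be checked with a couple of lines of 
Python.  This is only a sketch of the arithmetic: the two throughput values 
are the peaks read off the plots, and the helper name is made up for 
illustration.

```python
def relative_drop(before_gbs, after_gbs):
    """Return the throughput loss as a percentage of the original value."""
    return (before_gbs - after_gbs) / before_gbs * 100.0

# Peak sequential read speed with a 1 MB chunk cache vs. an 8 MB one.
drop = relative_drop(1.6, 1.0)
print(f"{drop:.1f}% slower")
```

That works out to 37.5%, which is the "almost 40%" mentioned above.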

I've tried with other cache sizes, and the smaller, the better (reaching a 
performance of almost 1.9 GB/s on my machine for a cache size of 128 KB).  
Varying the number of slots in cache does not seem to affect performance too 
much here.
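
One way to look at these numbers is to count how many whole chunks fit in 
the chunk cache for each combination in the benchmark.  The sketch below 
assumes power-of-two steps on the chunksize axis (only the 1 KB and 8 MB 
endpoints are given above) and binary units; the function names are 
hypothetical.  It says nothing about how HDF5 actually manages its cache, 
it is just the arithmetic.

```python
KB = 1024
MB = 1024 * KB

def chunk_sizes(start=1 * KB, stop=8 * MB):
    """Yield chunk sizes in bytes from start to stop, doubling each step.

    The doubling is an assumption; only the endpoints come from the benchmark.
    """
    size = start
    while size <= stop:
        yield size
        size *= 2

def chunks_that_fit(cache_bytes, chunk_bytes):
    """Number of whole chunks that a cache of cache_bytes can hold."""
    return cache_bytes // chunk_bytes

# The three chunk cache sizes mentioned in the benchmark.
cache_sizes = {"128 KB": 128 * KB, "1 MB": 1 * MB, "8 MB": 8 * MB}

print("chunksize".rjust(10), *(name.rjust(8) for name in cache_sizes))
for chunk in chunk_sizes():
    counts = (chunks_that_fit(cache, chunk) for cache in cache_sizes.values())
    print(f"{chunk // KB:>7} KB", *(f"{n:>8}" for n in counts))
```

A zero in the table means the chunk is larger than the whole cache, so it 
cannot be held there at all.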

My guess is that the culprit behind this significant performance penalty is 
the chunk cache subsystem of HDF5.  Could anyone confirm this?  If so, do you 
think that some optimization in that regard could be carried out in the 
future?

Thanks,

-- 
Francesc Alted
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sequential-8MB.pdf
Type: application/pdf
Size: 18698 bytes
Desc: not available
URL: <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100107/7c29e565/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sequential-1MB.pdf
Type: application/pdf
Size: 18704 bytes
Desc: not available
URL: <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100107/7c29e565/attachment-0003.pdf>

