[Hdf-forum] Chunk cache size and performance
Francesc Alted
faltet at pytables.org
Fri Jan 8 12:27:48 EST 2010
[Trying again to post a stripped-down version of a message that was previously
refused by the list for being too large]
Just for archival purposes,
I've been profiling the case below, and I have come up with more clues. I've got
profiles for three different HDF5 chunk cache sizes: 256 KB, 1 MB and 8
MB. The dataset size is 2 GB and the chunksize for these tests was 512 KB.
When the cache size is 256 KB, the cache cannot hold even a single 512 KB chunk
(so it never comes into play), most of the time is spent in Python code, and
maximum throughput is achieved (~1.8 GB/s). With a 1 MB cache, the HDF5 chunk
cache comes into play, lots of memcpy calls happen, and throughput is slightly
lower (~1.7 GB/s). Finally, with 8 MB, the number of memcpy calls grows
considerably (2x that of the 1 MB case), making throughput fall to 1.0 GB/s.
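To make the cache-size arithmetic above concrete, here is a minimal sketch in plain Python (it does not call HDF5). It assumes, consistent with the 256 KB observation, that a chunk larger than the whole chunk cache is not cached at all; the helper name `chunks_that_fit` is hypothetical.

```python
# Minimal sketch: how many whole 512 KB chunks fit in each profiled cache size.
# Assumption: a chunk bigger than the entire chunk cache bypasses it (0 => bypassed).

KB = 1024
MB = 1024 * KB


def chunks_that_fit(cache_bytes: int, chunk_bytes: int) -> int:
    """Return the number of whole chunks the cache can hold (0 means bypassed)."""
    return cache_bytes // chunk_bytes


if __name__ == "__main__":
    chunk_bytes = 512 * KB  # chunksize used in the profiled runs
    for label, cache_bytes in [("256 KB", 256 * KB), ("1 MB", 1 * MB), ("8 MB", 8 * MB)]:
        n = chunks_that_fit(cache_bytes, chunk_bytes)
        status = "bypassed" if n == 0 else f"holds {n} chunk(s)"
        print(f"cache {label:>6}: {status}")
```

Running it prints "bypassed" for 256 KB, 2 chunks for 1 MB and 16 chunks for 8 MB, which matches the three regimes described above.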
Now, I don't completely understand what makes the 8 MB scenario issue 2x as
many memcpy calls as the 1 MB case. Of course, these additional calls should be
responsible for the evident performance degradation.
Ideas?
On Thursday 07 January 2010 20:30:57, Francesc Alted wrote:
> Hi,
>
> I'm doing some small benchmarks to present at a forthcoming workshop, and
> I'd be very grateful if someone could shed some light on the
> performance figures that I'm getting.
>
> What I want to stress during the workshop is how I/O throughput depends
> on the chunksize for a given dataset. For the plots I've attached, I have
> chosen a 2 GB dataset (2-dim, shape (512, 65536), double precision) so
> that it easily fits into my OS cache memory (my machine has 8 GB), which
> makes the effects clearer. The X axis represents the chunksize of each
> dataset (from 1 KB up to 8 MB). The Y axis shows the throughput for
> reading the dataset sequentially.
>
> Now, for a chunk cache size of 1 MB (figure 'sequential-1MB.pdf'), it
> can be seen that HDF5 can read at up to 1.6 GB/s (which is pretty good
> :-). However, if I raise the chunk cache size to 8 MB, the peak
> performance falls to a mere 1.0 GB/s, that is, almost 40% less.
> Blosc compressor performance is also strongly affected by this (slower
> compressors like LZO or Zlib barely notice this effect because they are
> themselves the obvious bottleneck).
>
> I've tried other cache sizes, and the smaller the better (reaching
> almost 1.9 GB/s on my machine with a 128 KB cache). Varying the number
> of cache slots does not seem to affect performance much here.
>
> My guess is that the culprit behind this significant performance penalty
> is HDF5's chunk cache subsystem. Could anyone confirm this? If that is
> the case, do you think some optimization in that regard could be carried
> out in the future?
>
> Thanks,
>
--
Francesc Alted
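Regarding the number of cache slots mentioned in the quoted message: the H5Pset_chunk_cache reference documentation recommends that rdcc_nslots be a prime number, roughly 100 times the number of chunks that fit in rdcc_nbytes. The sketch below applies that rule of thumb in plain Python; `suggest_nslots` and `is_prime` are hypothetical helpers for illustration, not HDF5 or PyTables APIs, and nothing here touches HDF5 itself.

```python
# Hypothetical helper: pick an rdcc_nslots value for a given chunk cache size,
# following the documented rule of thumb (prime, ~100x the chunks that fit).


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small slot counts."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def suggest_nslots(cache_bytes: int, chunk_bytes: int, factor: int = 100) -> int:
    """Smallest prime >= factor * (chunks that fit), counting at least one chunk."""
    chunks = max(1, cache_bytes // chunk_bytes)
    n = factor * chunks
    while not is_prime(n):
        n += 1
    return n


if __name__ == "__main__":
    chunk_bytes = 512 * 1024  # chunksize from the profiled runs
    for cache_mb in (1, 8):
        print(cache_mb, "MB ->", suggest_nslots(cache_mb * 1024 * 1024, chunk_bytes))
```

With 512 KB chunks this suggests 211 slots for a 1 MB cache and 1601 slots for an 8 MB cache. Since the cache only holds a handful of chunks in these runs, it is plausible that the slot count matters little, as observed.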
-------------- next part --------------
Attachments scrubbed by the archive (callgrind profiles):
callgrind-1MB.png (image/png, 109948 bytes): <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100108/c0524f72/attachment-0003.png>
callgrind-8MB.png (image/png, 111824 bytes): <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100108/c0524f72/attachment-0004.png>
callgrind-256KB.png (image/png, 110853 bytes): <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20100108/c0524f72/attachment-0005.png>