[hdf-forum] Re: I/O optimization when writing many datasets

Neil Fortner nfortne2 at hdfgroup.org
Wed Jun 10 15:03:02 EDT 2009


Patrick,

Patrick wrote:
> Hello all,
>
> In the image processing problem I am currently working on, I am 
> obliged to write large HDF5 files (100s of GB) containing several 
> million smallish (~KB-MB) extendible datasets. Although I/O 
> performance has been encouraging so far (especially when compared to 
> writing individual binary files...), as far as I can tell there are 
> three main parameters open to tweaking which could further increase 
> write performance, i.e. chunk size, metadata cache and buffer size.
>
> Since I am working on high performance servers with at least 128GB of 
> RAM, write performance is paramount and I could easily cope with a 
> reasonable increase in memory usage and final file size. Being 
> relatively new to HDF5, I am unsure about how to best set the 
> cache/buffer sizes as well as the chunk size, or whether the default 
> settings are already adequate. I would be very grateful for any 
> suggestions!

If you are only writing to very small datasets, then the default chunk 
cache size (1 MB) is most likely large enough, since this limit is 
applied to each dataset individually. However, if you regularly 
rewrite or reread the same portions of a dataset, and that dataset can 
grow beyond 1 MB, then you may see a benefit from increasing the cache 
size. Depending on your chunk size, you may also want to increase the 
number of elements (hash table slots) in the chunk cache from the 
default of 521 (make sure it stays a prime number). Be careful about 
having too many datasets open at once though, as the limit is 1 MB for 
each dataset. So if you have several million datasets open, you 
potentially have several million megabytes (that is, several terabytes) 
of cache.
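
To make the arithmetic above concrete, here is a minimal sketch in plain 
Python (no HDF5 calls). The function names are made up for illustration, 
and the factor of 100 slots per cached chunk is only a rule of thumb I am 
assuming here, not a requirement. The 1 MB and 521 defaults are the ones 
mentioned above.

```python
import math

DEFAULT_CHUNK_CACHE_BYTES = 1024 * 1024  # 1 MB per open dataset (default cited above)
DEFAULT_CHUNK_CACHE_SLOTS = 521          # default number of cache elements (cited above)


def worst_case_cache_bytes(open_datasets, cache_bytes=DEFAULT_CHUNK_CACHE_BYTES):
    """Upper bound on chunk cache memory if every open dataset fills its cache."""
    return open_datasets * cache_bytes


def is_prime(n):
    """Trial-division primality test, sufficient for slot counts."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


def suggest_slots(cache_bytes, chunk_bytes, slots_per_chunk=100):
    """Prime slot count, never below the default.

    slots_per_chunk is an assumed rule of thumb (hypothetical parameter).
    """
    chunks_in_cache = max(1, cache_bytes // chunk_bytes)
    wanted = max(DEFAULT_CHUNK_CACHE_SLOTS, chunks_in_cache * slots_per_chunk)
    return next_prime(wanted)


if __name__ == "__main__":
    gib = worst_case_cache_bytes(3_000_000) / 1024**3
    print(f"3 million open datasets: up to {gib:.0f} GiB of chunk cache")
    print("slots for a 16 MB cache with 64 KB chunks:",
          suggest_slots(16 * 1024 * 1024, 64 * 1024))
```

The resulting values would then be handed to whatever interface you use 
to configure the chunk cache, for example H5Pset_chunk_cache in the C 
API or the rdcc_nbytes and rdcc_nslots arguments of h5py.File. Closing 
datasets as soon as you are done with them keeps the worst case small.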

The chunk size should align as closely as possible to your typical 
selection for writing (or reading). This minimizes the amount of costly 
scattering as well as wasted space in the cache. However, you should not 
make the chunks too small, since every chunk adds its own overhead.
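
One way to combine the two constraints above is to start from the typical 
write selection and, if that would make a very small chunk, grow it by 
whole multiples along the first (extendible) dimension so that each write 
still falls inside a single chunk. This is only a sketch: the function 
name and the 16 KB floor are assumptions chosen for illustration, and the 
right floor depends on your data and storage.

```python
import math


def chunk_shape_for_writes(write_shape, itemsize, min_chunk_bytes=16 * 1024):
    """Derive a chunk shape from the typical write selection.

    write_shape     -- shape of one typical write, e.g. (1, 512)
    itemsize        -- bytes per element, e.g. 4 for float32
    min_chunk_bytes -- assumed lower bound on chunk size (hypothetical default)
    """
    if not write_shape or any(n < 1 for n in write_shape):
        raise ValueError("write_shape must contain positive extents")
    shape = list(write_shape)
    selection_bytes = itemsize * math.prod(shape)
    if selection_bytes < min_chunk_bytes:
        # Grow along axis 0 in whole multiples of the selection, so that
        # consecutive writes tile the chunk exactly.
        factor = math.ceil(min_chunk_bytes / selection_bytes)
        shape[0] *= factor
    return tuple(shape)


if __name__ == "__main__":
    # One row of 512 float32 values per write: 2 KB, below the floor.
    print(chunk_shape_for_writes((1, 512), itemsize=4))
    # A 64 x 64 tile of float64 values per write: 32 KB, used as is.
    print(chunk_shape_for_writes((64, 64), itemsize=8))
```

The returned tuple could be passed as the chunk dimensions when creating 
the dataset (H5Pset_chunk in the C API, or the chunks argument of 
create_dataset in h5py).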

Thanks,
-Neil Fortner

> Thanks,
>
> Patrick
>
>
>
>
> ----------------------------------------------------------------------
> This mailing list is for HDF software users discussion.
> To subscribe to this list, send a message to 
> hdf-forum-subscribe at hdfgroup.org.
> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
>
