[hdf-forum] Large number of incrementally growing extendible datasets
Björn Andres
bjoern.andres at iwr.uni-heidelberg.de
Thu Jun 18 08:40:33 EDT 2009
Hello!
In a computational geometry application, I am dealing with sets of
points in 3D space. Each point set can be represented as a matrix (e.g.
a 2D dataset in HDF5) having as many rows as there are points, and three
columns for the three coordinates of each point.
The exact number of sets (about 10^6) is known at the initialization of
an algorithm while the number of points in each set is unknown.
Incrementally, new points are computed and have to be appended to the
datasets that have already been constructed.
I have written code (C++, HDF5 1.8.3) which creates 10^6 datasets in one
HDF5 file. These datasets are made extendible in the first dimension
so that new rows of coordinates can be appended.
On average, there are 10 appends per dataset, and each append adds about
250 points (3 kBytes). After about 7 GB have been written to the hard
drive, the performance drops to almost zero. Note that at most one
dataset is open at any time.
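To put these numbers in perspective, here is a rough back-of-the-envelope calculation in C++. It assumes 4-byte float coordinates (which matches 250 points * 3 coordinates = 3 kBytes) and 10^6 datasets; the function names are made up for this example.

```cpp
#include <cstdint>

// Assumption: each coordinate is a 4-byte float, three coordinates per point.
constexpr std::uint64_t bytes_per_append(std::uint64_t points,
                                         std::uint64_t bytes_per_coord) {
    return points * 3 * bytes_per_coord;
}

// Raw payload only; HDF5 metadata and partially filled chunks are not counted.
constexpr std::uint64_t expected_total_bytes(std::uint64_t datasets,
                                             std::uint64_t appends_per_dataset,
                                             std::uint64_t append_bytes) {
    return datasets * appends_per_dataset * append_bytes;
}
```

With these averages, `bytes_per_append(250, 4)` is 3000 bytes and `expected_total_bytes(1000000, 10, 3000)` is 3 * 10^10 bytes, i.e. about 30 GB of raw data. Under these assumptions, the slowdown at 7 GB sets in less than a quarter of the way through the run.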
I am now wondering whether the introduction of 10^6 extendible datasets
is a bad idea overall.
- Does HDF5 internally move data around so that extendible datasets
remain contiguous after new data has been appended?
- Does HDF5 require that the file is contiguous on the hard drive? Can
the file system cause the problem?
- Can caching be a problem? Note that at most one dataset is open at any
time. I close it right after having appended data.
I would appreciate any hints!
Kind regards,
Bjoern