[hdf-forum] How best to model objects that have lists of sub objects.
Richard Corden
richard.corden at gmail.com
Thu Jun 18 16:36:24 EDT 2009
Hi,
I am unable to find a satisfactory way to model object hierarchies,
specifically something as follows:
class B { ... };
class A {
std::vector <B> m_b;
};
Here are some of my requirements:
* The number of B's is relatively small, probably fewer than 10
* There's no upper bound on that number, though
* It needs to be possible to append data later
* The order the items are written needs to be preserved
My initial thoughts were to have a new dataset for each "list" of sub
objects and then reference it from the owning object. In this model,
each 'A' object has its own list of indexes to B objects:
[A DATASET]
data1, "DATASET REF A_B_1"
data2, "DATASET REF A_B_2"
data3, "DATASET REF A_B_3"
[B DATASET]
B1
B2
B3
B4
B5
[A_B_1 DATASET]
Index to B1
Index to B3
Index to B4
[A_B_2 DATASET]
Index to B2
Index to B5
[A_B_3 DATASET]
Index to B3
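As a rough in-memory illustration of the layout above (plain C++, not
HDF5 API calls), each A carries its own index list into a shared B
table. The member names `data`, `name` and `b_indexes` and the helper
`resolve` are hypothetical, chosen only for this sketch:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the payload of a B record.
struct B {
    std::string name;
};

// One A record: its own data plus a private index list, mirroring the
// per-A "A_B_n" dataset in the layout above.
struct A {
    std::string data;
    std::vector<std::size_t> b_indexes;  // positions in the shared B table
};

// Look up an A's sub-objects in the order in which they were written.
std::vector<B> resolve(const A& a, const std::vector<B>& b_table) {
    std::vector<B> out;
    out.reserve(a.b_indexes.size());
    for (std::size_t i : a.b_indexes) {
        out.push_back(b_table.at(i));  // at() throws on a bad index
    }
    return out;
}
```

With a B table holding B1..B5, an A whose `b_indexes` is {0, 2, 3}
resolves to B1, B3, B4, which corresponds to A_B_1 above.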
Unfortunately, at least the way I implemented this, the approach was too
slow. It also seemed that performance degraded significantly as more and
more datasets were added.
My current approach is to use region references. In order to preserve
the write order, I append the index of each B, as it is written, to a
LIST dataset, and once all B's have been written I create region
references to the list indexes.
The above therefore looks like:
[A's DATASET]
data1, "DATASET REGION REF 0,2"
data2, "DATASET REGION REF 3,4"
data3, "DATASET REGION REF 5"
[B DATASET]
B1
B2
B3
B4
B5
[A_B_LIST]
Index to B1
Index to B3
Index to B4
Index to B2
Index to B5
Index to B3
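The same layout can be sketched in plain C++ (again not HDF5 API
calls): one shared index list, plus one contiguous range per A. The
`Range` pair is only a stand-in for the region reference, and the names
`Store`, `add` and `lookup` are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a region reference: a contiguous slice of the shared list.
struct Range {
    std::size_t start;
    std::size_t count;
};

struct Store {
    std::vector<std::size_t> list;  // plays the role of A_B_LIST
    std::vector<Range> ranges;      // one entry per A, in write order

    // Append the B indexes of a new A; returns that A's position.
    std::size_t add(const std::vector<std::size_t>& b_indexes) {
        ranges.push_back({list.size(), b_indexes.size()});
        list.insert(list.end(), b_indexes.begin(), b_indexes.end());
        return ranges.size() - 1;
    }

    // Return the B indexes of the given A, in the order they were added.
    std::vector<std::size_t> lookup(std::size_t a) const {
        const Range& r = ranges.at(a);
        auto first = list.begin() + static_cast<std::ptrdiff_t>(r.start);
        auto last = first + static_cast<std::ptrdiff_t>(r.count);
        return std::vector<std::size_t>(first, last);
    }
};
```

Adding {0, 2, 3}, {1, 4} and {2} in that order yields the ranges
(0, 3), (3, 2) and (5, 1), i.e. list positions 0-2, 3-4 and 5 as in the
region references above.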
Time-wise, this performed significantly better than the previous model
(and meets my requirements). However, it seems that the HDF file is now
extremely large, mostly due to the region references.
I have a test case with about 10k records, and each record has 4 region
references. If I create and write out all the data with the correct
region references, then the size of the resulting file is about 2.5 MB.
If I write the data with "empty" region references, then the size of the
file is about 250 KB.
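For scale, a back-of-the-envelope estimate of the extra space per
region reference can be computed from those figures. This assumes
decimal units and that the whole size difference is attributable to the
references, neither of which is confirmed; the function name is
hypothetical:

```cpp
// Rough extra bytes per reference, given the file size with real
// references, the file size with "empty" ones, and the reference count.
double bytes_per_reference(double full_bytes, double empty_bytes,
                           double records, double refs_per_record) {
    return (full_bytes - empty_bytes) / (records * refs_per_record);
}
```

With the approximate numbers above, bytes_per_reference(2.5e6, 250e3,
10000, 4) comes to roughly 56 bytes per reference.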
Is this expected? Is there a way I can optimise this?
Finally, is there a standard approach for modelling this kind of data in
HDF that I should be using?
Many thanks for your time,
Richard