[hdf-forum] How best to model objects that have lists of sub objects.

Richard Corden richard.corden at gmail.com
Thu Jun 18 16:36:24 EDT 2009


Hi,

I have been unable to find a satisfactory way to model object
hierarchies, specifically something like the following:

#include <vector>

class B { /* ... */ };

class A {
  std::vector<B> m_b;  // ordered list of sub objects owned by this A
};


Here are some of my requirements:
   * The number of B's per A is relatively small, probably fewer than 10
   * There is, however, no fixed upper bound
   * It needs to be possible to append data later
   * The order in which the items are written needs to be preserved

My initial thought was to have a new dataset for each "list" of sub
objects and then reference it from the owning object.  In this model,
each 'A' object has its own list of indexes to B objects:

[A DATASET]
data1, "DATASET REF A_B_1"
data2, "DATASET REF A_B_2"
data3, "DATASET REF A_B_3"

[B DATASET]
B1
B2
B3
B4
B5

[A_B_1 DATASET]
Index to B1
Index to B3
Index to B4

[A_B_2 DATASET]
Index to B2
Index to B5

[A_B_3 DATASET]
Index to B3


Unfortunately, at least the way I implemented this, the approach was too
slow.  Performance also seemed to degrade significantly as more and more
datasets were added.


My current approach is to use region references.  In order to preserve
the write order, I append the index of each B to a LIST dataset as it is
written, and once all B's have been written I create region references
to the corresponding ranges of list indexes.

The above therefore looks like:

[A's DATASET]
data1, "DATASET REGION REF 0,2"
data2, "DATASET REGION REF 3,4"
data3, "DATASET REGION REF 5"

[B DATASET]
B1
B2
B3
B4
B5

[A_B_LIST]
Index to B1
Index to B3
Index to B4
Index to B2
Index to B5
Index to B3
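
To make the flattened layout concrete, here is a minimal in-memory
sketch in plain C++ with no HDF5 calls.  The names FlatLayout, ARow,
append_a, b_indexes_of and build_example are hypothetical, and each
region reference is stood in for by a plain (start, count) pair purely
for illustration; B indexes are 0-based, so B1 is 0 and B5 is 4.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of the A dataset: its own data plus a contiguous slice of the
// shared index list (an illustrative stand-in for a region reference).
struct ARow {
    std::string data;
    std::size_t start;  // first position in the index list
    std::size_t count;  // number of list entries belonging to this A
};

struct FlatLayout {
    std::vector<ARow> a_rows;             // [A's DATASET]
    std::vector<std::size_t> index_list;  // [A_B_LIST]: indexes into B
};

// Append a new A together with its ordered B indexes.  Existing rows and
// list entries are not modified.
void append_a(FlatLayout& layout, const std::string& data,
              const std::vector<std::size_t>& b_indexes) {
    layout.a_rows.push_back(
        {data, layout.index_list.size(), b_indexes.size()});
    layout.index_list.insert(layout.index_list.end(),
                             b_indexes.begin(), b_indexes.end());
}

// Recover the B indexes of one A in the order they were written.
std::vector<std::size_t> b_indexes_of(const FlatLayout& layout,
                                      std::size_t a) {
    const ARow& row = layout.a_rows.at(a);
    assert(row.start + row.count <= layout.index_list.size());
    auto first = layout.index_list.begin() +
                 static_cast<std::ptrdiff_t>(row.start);
    return std::vector<std::size_t>(
        first, first + static_cast<std::ptrdiff_t>(row.count));
}

// The example from the diagram above.
FlatLayout build_example() {
    FlatLayout layout;
    append_a(layout, "data1", {0, 2, 3});  // B1, B3, B4 -> list 0..2
    append_a(layout, "data2", {1, 4});     // B2, B5     -> list 3..4
    append_a(layout, "data3", {2});        // B3         -> list 5
    return layout;
}
```

Note that this sketch only supports appending whole new A rows; adding
a further B to an existing A would break the contiguous slice.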


Time-wise, this performed significantly better than the previous model
(and meets my requirements).  However, the HDF file is now extremely
large, and this seems to be mostly due to the region references.

I have a test case with about 10k records, and each record has 4 region
references.  If I create and write out all the data with the correct
region references, then the size of the resulting file is about 2.5 MB.
If I write the data with "empty" region references, then the size of the
file is about 250 KB.
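
As a rough back-of-the-envelope check of what those two sizes imply per
reference, the difference can be divided by the number of references.
This is only a sketch: the function name bytes_per_reference is made
up, and it assumes decimal units (2.5 MB = 2,500,000 bytes, 250 KB =
250,000 bytes) and exactly 10,000 records, whereas the figures quoted
are approximate.

```cpp
#include <cassert>

// Average extra bytes per reference implied by two file sizes: one file
// written with real references, one with "empty" references.
double bytes_per_reference(double full_bytes, double empty_bytes,
                           long records, long refs_per_record) {
    assert(records > 0 && refs_per_record > 0);
    return (full_bytes - empty_bytes) /
           static_cast<double>(records * refs_per_record);
}
```

Calling bytes_per_reference(2.5e6, 250e3, 10000, 4) gives 56.25, i.e.
somewhere in the region of 56 bytes of extra file space per region
reference under the stated assumptions.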

Is this expected?  Is there a way I can optimise this? 


Finally, is there a standard approach for modelling this kind of data in
HDF that I should be using?


Many thanks for your time,

Richard




