[hdf-forum] Memory Leak while reading large number of arrays

Francesc Alted faltet at pytables.org
Fri Jun 26 11:02:39 EDT 2009


Hi Bruno,

On Thursday 25 June 2009 23:29:38 Bruno Oliveira wrote:
> Hi all,
> This is my first post in this list. I work at a company specialized in
> engineering applications, and we have been using HDF for several years now
> and so far we have been really happy with it.
>
> Lately, though, we have been trying to track down a memory leak while
> reading some large datasets. First we found the memory leak problem while
> trying to execute some simulations in our application. Tracking down the
> memory usage, we narrowed it down to our in-house routines that read/write
> HDF files. To verify if it was a problem with our code (more likely) or a
> problem in the HDF library (highly unlikely), we created some sample code
> that only uses the HDF library routines, and that reads a file similar to
> the one where we originally found the problem. Unfortunately, the memory
> leak still occurs. I'm writing here because we are out of ideas on how to
> try to figure this problem out, so perhaps you guys can shed some light on
> the matter and point us in the right direction.
>
> The file layout we use in this case is (roughly) as follows:
[clip]
> Above, everything is a group, except for the member "values", which is a
> 50,000 x 1 dataset of doubles.
>
> As you can see, we have 501 "Timestep" root groups, each containing 17
> datasets. We try to read this as follows:

Hmm, that accounts for 17*501 = 8517 groups.  In my experience, you should 
expect HDF5 to consume hundreds of MB when you walk over tens of thousands of 
groups/datasets.
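
To put rough numbers on the layout described above, here is a back-of-the-envelope sketch in Python. It only uses figures quoted in this thread (501 timesteps, 17 datasets each, 50,000 doubles per dataset); the 8-byte size of a double is an assumption, and the variable names are purely illustrative.

```python
# Figures quoted in the thread; names are illustrative only.
TIMESTEPS = 501
DATASETS_PER_TIMESTEP = 17
VALUES_PER_DATASET = 50_000
BYTES_PER_DOUBLE = 8  # assumption: IEEE 754 double precision

total_datasets = TIMESTEPS * DATASETS_PER_TIMESTEP
bytes_per_dataset = VALUES_PER_DATASET * BYTES_PER_DOUBLE
reused_buffer_bytes = DATASETS_PER_TIMESTEP * bytes_per_dataset
total_bytes_read = total_datasets * bytes_per_dataset

print(f"datasets visited:      {total_datasets}")
print(f"one dataset:           {bytes_per_dataset / 1e6:.1f} MB")
print(f"17 reused buffers:     {reused_buffer_bytes / 1e6:.1f} MB")
print(f"raw data read overall: {total_bytes_read / 1e9:.2f} GB")
```

Under these assumptions the 17 reused buffers account for about 6.8 MB, while roughly 3.4 GB of raw data passes through them over the whole walk.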

> 1. Pre-allocate 17 buffers, each one being able to accommodate an entire
> dataset;
> 2. Go over each time-step, and read the 17 buffers.
>
> Measuring the memory during each Timestep read (i.e., the reading of the 17
> datasets inside that Timestep), the memory keeps accumulating, until by the
> end of the read of the last Timestep it is over 100 MB. Since we always use
> the same buffers, we have no idea of what the problem is. The sample
> routine we are using for reading is as follows:

As I see it, 100 MB can be expected for 8500 groups, so it may well be HDF5's 
'fault'.  Perhaps there is a way to instruct HDF5 not to consume so much 
memory in these scenarios, but in general, I recommend not putting too many 
groups in a single file.
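
As a rough sanity check on that figure, the sketch below divides the reported growth by the number of objects visited. It assumes the 100 MB is spread evenly over the 8517 objects and that MB means 10^6 bytes; both are simplifications made for illustration, not measurements.

```python
# Hypothetical estimate; inputs are the figures quoted in this thread.
reported_growth_bytes = 100 * 10**6   # "over 100 MB" by the last Timestep
objects_visited = 17 * 501            # 8517

# Assumption: the growth is spread evenly over all visited objects.
per_object_bytes = reported_growth_bytes / objects_visited
print(f"about {per_object_bytes / 1e3:.1f} kB retained per object")
```

That works out to roughly 12 kB per object, which is in line with the rule of thumb of hundreds of MB for tens of thousands of groups/datasets.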

At any rate, it always helps if you can submit a sample of the code 
reproducing this behaviour.

-- 
Francesc Alted
