[hdf-forum] h5py -- most efficient way to load a hdf5
Mag Gam
magawake at gmail.com
Tue Jun 23 07:32:20 EDT 2009
Thanks for the response
On Tue, Jun 23, 2009 at 2:46 AM, Andrew Collette <andrew.collette at gmail.com> wrote:
> Hi,
>
>> I have a very large csv file 14G and I am planning to move all of my
>> data to hdf5. I am using h5py to load the data. The biggest problem I
>> am having is, I am putting the entire file into memory and then
>> creating a dataset from it. This is very inefficient and it takes over
>> 4 hours to create the hdf5 file.
>>
>> The csv file has various types:
>> int4, int4, str, str, str, str, str
>
> Since you're using Python you should investigate the functions
> numpy.fromfile and numpy.loadtxt. The biggest thing you should worry
> about is finding a way to iterate over rows in the input file. You
> can create an HDF5 dataset with the proper size and dtype, and then
> fill it in row by row as you read records in from the csv file. That
> way you avoid having to load the entire file into memory.
Correct, this is the approach I am trying to take, but do I have to
worry about resizing? Each file has a different number of rows.
numpy.fromfile and numpy.loadtxt actually load everything into memory.
That's the way I am doing it now, and it's very inefficient.
Do you have any sample code that reads the csv line by line and pushes
each row into the hdf5 file?
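
For reference, here is a rough sketch of what a line-by-line load could
look like when the row count is not known up front. It is not code from
this thread. It assumes a csv with no header row and exactly the seven
columns listed above, and it relies on a resizable dataset
(maxshape=(None,)) that grows one batch at a time. The function names
(append_rows, csv_to_hdf5), the dataset name "table", the field names
and the batch size are all made up for illustration. It also uses
h5py.special_dtype(vlen=str), the spelling in later h5py releases, in
place of new_vlen.

```python
import csv

import h5py
import numpy as np

# Two 32-bit ints and five variable-length strings, following the
# column list in the question. Field names are placeholders.
VL_STR = h5py.special_dtype(vlen=str)
ROW_DTYPE = np.dtype(
    [("f1", "i4"), ("f2", "i4")] + [("s%d" % i, VL_STR) for i in range(1, 6)]
)


def append_rows(dset, rows):
    """Grow the 1-D dataset by len(rows) and write the rows at the end."""
    start = dset.shape[0]
    dset.resize((start + len(rows),))
    dset[start:] = np.array(rows, dtype=ROW_DTYPE)


def csv_to_hdf5(csv_path, h5_path, batch_rows=10000):
    """Stream csv_path into a resizable dataset; return the row count."""
    with open(csv_path, newline="") as src, h5py.File(h5_path, "w") as out:
        dset = out.create_dataset(
            "table",
            shape=(0,),
            maxshape=(None,),  # unlimited along the row axis
            dtype=ROW_DTYPE,
            chunks=(batch_rows,),
        )
        batch = []
        for rec in csv.reader(src):
            # Only one batch of rows is held in memory at a time.
            batch.append((int(rec[0]), int(rec[1])) + tuple(rec[2:7]))
            if len(batch) == batch_rows:
                append_rows(dset, batch)
                batch = []
        if batch:  # write the final, partially filled batch
            append_rows(dset, batch)
        return dset.shape[0]
```

Writing in batches rather than one row per call keeps the number of
HDF5 writes down; batch_rows is the knob that trades memory for speed.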
>
> As far as the datatypes, if all the rows of your CSV have the same
> fields, the dtype for the HDF5 file should be something like:
>
> vl_str = h5py.new_vlen(str)
> mydtype = numpy.dtype([('Field1', 'i4'), ('Field2', 'i4'), ('Field3',
> vl_str), ('Field4', vl_str), ... <the rest> ...])
>
> This strategy will create an HDF5 dataset whose elements are a
> compound type with two integers and five variable-length strings.
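
To make the dtype above concrete, here is a small sketch that spells
out all seven fields and fills a fixed-size dataset one record at a
time. It assumes the row count is known before the dataset is created.
The field names Field3 to Field7 and the helper write_rows are
placeholders, and special_dtype(vlen=str) is the spelling in later h5py
releases, standing in for new_vlen(str).

```python
import h5py
import numpy as np

# Variable-length string type (later h5py spelling of new_vlen(str)).
vl_str = h5py.special_dtype(vlen=str)

# Two ints plus five strings, following the column list in the question.
mydtype = np.dtype(
    [("Field1", "i4"), ("Field2", "i4")]
    + [("Field%d" % i, vl_str) for i in range(3, 8)]
)


def write_rows(h5file, name, rows):
    """Create a dataset sized to len(rows) and fill it record by record."""
    dset = h5file.create_dataset(name, shape=(len(rows),), dtype=mydtype)
    for i, row in enumerate(rows):
        # Each row is a 7-tuple; wrap it in a one-element structured array.
        dset[i:i + 1] = np.array([row], dtype=mydtype)
    return dset
```

Calling write_rows(f, "table", rows) on an open h5py.File gives a
dataset whose dtype.names match mydtype.names.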
>
> HTH,
> Andrew
>
> ----------------------------------------------------------------------
> This mailing list is for HDF software users discussion.
> To subscribe to this list, send a message to hdf-forum-subscribe at hdfgroup.org.
> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
>
>