[Hdf-forum] writing data as fast as possible
Dimitris Servis
servisster at gmail.com
Fri Jan 15 06:47:48 EST 2010
Hi Ger
I am not sure whether you can compose a multi file afterwards. The final result
is what you want: multiple raw data files and multiple metadata files,
physically separated. Maybe with a little tweak of the VFL driver (not a
very difficult task) and some prerequisites (raw data file size and type)
you can easily force HDF5 to use the address spaces of the separate files in
datasets. After all, 1GB of doubles on disk is little different than 1GB of
doubles in memory... Maybe this can give you a hint?
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetExternal
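
To make the "doubles on disk" point concrete, here is a minimal sketch in
plain Python (standard library only; it is not HDF5 code and not a VFL
driver). It appends doubles to a flat file with no header or index, then
reads a slice back given only a byte offset and a count. The function names
write_raw_doubles and read_raw_doubles and the file name raw.bin are made up
for illustration. The external-file property linked above describes a raw
file by name, byte offset and size, which is the same information the reader
below needs.

```python
import mmap
import os
import tempfile
from array import array


def write_raw_doubles(path, values):
    """Append native-endian doubles to a flat file (no header, no index)."""
    with open(path, "ab") as f:
        array("d", values).tofile(f)
        f.flush()
        # Force the bytes to disk; doing this regularly limits what a crash
        # can lose, at some cost in throughput.
        os.fsync(f.fileno())


def read_raw_doubles(path, offset=0, count=None):
    """Read `count` doubles starting at byte `offset` (default: to the end)."""
    itemsize = array("d").itemsize
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            end = len(m) if count is None else offset + count * itemsize
            out = array("d")
            out.frombytes(m[offset:end])  # copies the slice out of the mapping
            return out


# Demo: the "writer" appends two chunks, the "reader" addresses them by offset.
demo_path = os.path.join(tempfile.mkdtemp(), "raw.bin")
write_raw_doubles(demo_path, [1.0, 2.5])
write_raw_doubles(demo_path, [3.0])
print(list(read_raw_doubles(demo_path)))
print(list(read_raw_doubles(demo_path, offset=8, count=1)))
```

Because the writer never touches any metadata, the only thing that has to be
recorded elsewhere is the element type and how many bytes have been written
so far.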
HTH
--ds
2010/1/15 Ger van Diepen <diepen at astron.nl>
>
> Hi Dimitris,
>
> Thanks for your suggestion.
> I'll look into the VFL possibility. A disadvantage I see is that you
> have to make the driver known, so common tools won't recognize it. I guess
> it is not possible to load it dynamically. In principle it should be
> possible by storing the driver name and using it as the name of the shared
> library.
>
>
> Does using a specific VFL mean that all data are written through that
> VFL and that it is not possible to store the meta data in the 'normal'
> way and only store the bulk data through that VFL?
> Preferably I would like to bypass HDF5 entirely when writing (apart
> from telling the VFL) and use that VFL when reading. In that way I can
> write as I like and still use tools like h5view to look at the data
> (provided the VFL can be loaded dynamically).
>
> As I said, writing happens in parallel, so each disk has to do part of the
> job. But we would like to keep the amount of resources needed to a
> minimum, for which optimal write performance is needed. Reading is less
> of an issue.
>
> Cheers,
> Ger
>
> >>> Dimitris Servis <servisster at gmail.com> 1/15/2010 11:47 AM >>>
> Ger
>
> couldn't you use the multi VFL driver? In any case, a rate of n GB/sec
> will not be sustainable by any hard drive on the market today, right?
> Even if you buffer in SSDs, the disk write speed is still your bottleneck
> rather than HDF5...
>
> HTH
>
> -- dimitris
>
>
> 2010/1/15 Ger van Diepen <diepen at astron.nl>
>
>
>
>
> We have an instrument generating several GBytes/second.
> We would like to dump these data to disk as fast as possible (in
> parallel, of course), preferably with O_DIRECT to bypass the OS file
> cache and to avoid seeks to maintain meta data.
> Yet, it would be nice if these data could be seen as an HDF5 dataset.
>
> My question is whether it is possible to do something like this: either
> HDF5 has an option to write data in this way, or there is some mechanism
> by which HDF5 can treat such an external file as HDF5 data, possibly by
> dynamically loading a shared library that knows how to interpret the
> data.
>
> I'm asking this because we have the impression that something like an
> extendible dataset does not give us the performance/robustness we need.
> In case of a crash we would like to lose as little data as possible,
> which means that a regular flush has to be done, which might kill
> performance. I guess that without a flush the indices are only written
> at the end, so the data cannot be found in case of a crash.
>
> Cheers,
> Ger van Diepen
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> Hdf-forum at hdfgroup.org
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>