[hdf-forum] data layout for parallel read-only access
Mark Howison
mark.howison at gmail.com
Tue Jun 23 01:34:10 EDT 2009
On Fri, Jun 19, 2009 at 9:47 AM, Mark Moll <mmoll at cs.rice.edu> wrote:
> Hi Mark,
>
> Just a few clarifications:
> - The reading of data sets is mostly asynchronous; each node reads its "own"
> data sets.
Hmm, so each dataset belongs to only one node? It sounds like you
might not want to go the parallel HDF5 route. If the access is
asynchronous then you don't want to be using synchronized collective
calls.
> - File-per-node is indeed a problem with varying concurrency. That's why I
> was thinking of a data-set-per-file organization. Each file needs to be
> accessed by only one node. Each node would have to read many files.
The disadvantage of dataset-per-file is that it can lead to a very large
number of files. We have had problems on Lustre when using basic filesystem
commands (ls, mv, cp, etc.) on large collections of small files (in a
recent example, 400K files totaling 230TB). Those commands fail
with an error like "argument list too long..." That said, this may be a
limitation of the commands themselves rather than something specific to Lustre.
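The "argument list too long" error typically comes from the shell expanding
a wildcard into more arguments than the kernel accepts for a single exec
call, not from the filesystem. One way around it is to stream through the
directory entries instead of globbing. The following is only a rough sketch,
assuming the files sit in one flat directory; the function name
summarize_directory and the file names in the demo are made up for
illustration.

```python
import os
import tempfile


def summarize_directory(path):
    """Count regular files and total their sizes without building an argument list."""
    count = 0
    total_bytes = 0
    # os.scandir yields entries lazily, so nothing is expanded onto a command line.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                count += 1
                total_bytes += entry.stat(follow_symlinks=False).st_size
    return count, total_bytes


if __name__ == "__main__":
    # Demo on a throwaway directory holding a few small placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(5):
            with open(os.path.join(tmp, f"dataset_{i:03d}.h5"), "wb") as f:
                f.write(b"x" * 10)
        print(summarize_directory(tmp))
```

The same pattern works for moving or copying files one entry at a time,
although each per-file metadata call still has a cost on a parallel
filesystem.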
> So the question is whether there is a performance difference between
> parallel asynchronous reading of many files vs. one large file. I'm
> guessing HDF5 does some preemptive fetching, which would work in favor of
> one large file.
I don't think that HDF5 will do any speculative read-aheads within a
file, if that is what you mean by preemptive fetching. It will only
read the regions you specify with a hyperslab selection.
Collective/parallel access only makes sense if you have many nodes
selecting different hyperslabs within the same dataset. If each node
is loading a different dataset, I think that collective access will
lead to unnecessary MPI communication, which will only become worse as
you scale up.
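If each dataset lives in its own file and the nodes read independently, the
only coordination needed is agreeing on which node reads which files. Below
is a minimal sketch, assuming a round-robin split over a sorted file list
(the thread does not say how datasets are assigned to nodes). The name
files_for_rank and the file names are hypothetical.

```python
def files_for_rank(filenames, rank, nranks):
    """Return the files one rank should read, using a round-robin split."""
    if nranks < 1 or not 0 <= rank < nranks:
        raise ValueError("rank must satisfy 0 <= rank < nranks")
    # Sort first so every rank derives the same ordering independently.
    return sorted(filenames)[rank::nranks]


if __name__ == "__main__":
    # Hypothetical file names; a real job would list the data directory.
    names = [f"dataset_{i:04d}.h5" for i in range(10)]
    # Show how the split changes when the job runs at a different concurrency.
    for nranks in (2, 4):
        for rank in range(nranks):
            print(nranks, rank, files_for_rank(names, rank, nranks))
```

Because the split depends only on the rank and the number of ranks, it
adapts when the concurrency changes between runs, and no communication
between nodes is needed. Each node can then open its files with the ordinary
serial HDF5 calls.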
Mark