[hdf-forum] Re: Question about meta data for chunked datasets
Rob Latham
robl at mcs.anl.gov
Mon Mar 23 15:46:54 EDT 2009
On Mon, Mar 23, 2009 at 12:05:20PM -0700, Mark Howison wrote:
> However, as you can see in the attached plot, the truncate (purple) at
> the end is still taking up a substantial amount of the total IO time
> for this test application. For now, I will probably disable the
> truncate directly in the MPI-POSIX VFD code, like Noel Keen has done,
> but in the long term we should figure out why it is there and when it
> is necessary. Hopefully, the lustre/HDF5 funding will come through
> soon!
Maybe HDF5 needs to truncate, maybe it doesn't. But if I'm reading
your plot right, only one process is calling truncate. Sounds to me
like you've found a Lustre issue.
What does Lustre do if you run a standalone program that calls
ftruncate to create a 2GB file? A 2250776576-byte file? What
if you do a few writes before calling ftruncate?
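
A minimal sketch of such a standalone test, in C. The helper name
timed_ftruncate, the 1 MiB write size, and the idea of timing only the
ftruncate call are assumptions made for illustration; none of this comes
from HDF5 or from the traced application.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t so sizes above 2 GB work */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wall-clock time in seconds. */
static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/*
 * Create (or overwrite) `path`, optionally issue `nwrites` sequential
 * writes of 1 MiB each starting at offset 0, then call ftruncate(fd, size).
 * Returns the seconds spent inside ftruncate alone, or -1.0 on error.
 * (Hypothetical helper; the 1 MiB write size is an arbitrary choice.)
 */
static double timed_ftruncate(const char *path, off_t size, int nwrites)
{
    static char buf[1 << 20];
    double t0, t1;
    int i;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);

    if (fd < 0) {
        perror("open");
        return -1.0;
    }

    memset(buf, 'x', sizeof buf);
    for (i = 0; i < nwrites; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            perror("write");
            close(fd);
            return -1.0;
        }
    }

    t0 = now_sec();
    if (ftruncate(fd, size) != 0) {
        perror("ftruncate");
        close(fd);
        return -1.0;
    }
    t1 = now_sec();

    close(fd);
    return t1 - t0;
}
```

Calling timed_ftruncate(path, 2147483648LL, 0), then
timed_ftruncate(path, 2250776576LL, 0), then
timed_ftruncate(path, 2250776576LL, 4) from a small main on the Lustre
mount, and printing the three return values, would cover the three cases
asked about above.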
==rob
> Thanks
> Mark
>
> ifi=5 -1 open64("../output/prs.h5part",2,-1) 3.47368e+01 2.34790e-02
> ifi=5 41 open64("../output/prs.h5part",578,-1) 3.47642e+01 1.86651e-02
> ifi=5 0 lseek64(41,0,0) 3.47932e+01 2.14577e-06
> ifi=5 96 write(41,0x7fffffffb6e0,96) 3.47932e+01 1.29604e-03
> ifi=5 1048576 lseek64(41,1048576,0) 3.48958e+01 3.09944e-06
> ifi=5 1757600 write(41,0x37137d40,1757600) 3.48958e+01 1.58372e+00
> ifi=5 1757600 write(41,0x372e4ee0,1757600) 3.64796e+01 4.77600e-01
> ifi=5 1757600 write(41,0x37492080,1757600) 3.69572e+01 1.23870e-02
> ifi=5 1757600 write(41,0x3763f220,1757600) 3.69696e+01 3.26340e-02
> ifi=5 96 lseek64(41,96,0) 4.13958e+01 1.90735e-06
> ifi=5 40 write(41,0x5d01aaa8,40) 4.13959e+01 4.13990e-03
> ifi=5 544 write(41,0x5d01a548,544) 4.14000e+01 1.71661e-05
> ifi=5 120 write(41,0x5d01b118,120) 4.14000e+01 1.50204e-05
> ifi=5 40 write(41,0x5d01bc78,40) 4.14001e+01 1.50204e-05
> ifi=5 544 write(41,0x5d01a548,544) 4.14001e+01 1.47820e-05
> ifi=5 120 write(41,0x5d01c1c8,120) 4.14001e+01 1.38283e-05
> ifi=5 328 write(41,0x7fffffffb630,328) 4.14001e+01 1.50204e-05
> ifi=5 40 write(41,0x5d024248,40) 4.14001e+01 1.50204e-05
> ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
> ifi=5 120 write(41,0x5d024798,120) 4.14002e+01 1.50204e-05
> ifi=5 328 write(41,0x7fffffffb630,328) 4.14002e+01 1.50204e-05
> ifi=5 40 write(41,0x5d026ae8,40) 4.14002e+01 1.40667e-05
> ifi=5 544 write(41,0x5d01a548,544) 4.14002e+01 1.50204e-05
> ifi=5 120 write(41,0x5d0270d8,120) 4.14003e+01 1.50204e-05
> ifi=5 328 write(41,0x7fffffffb630,328) 4.14003e+01 1.40667e-05
> ifi=5 272 write(41,0x5d02b0d8,272) 4.14003e+01 1.54972e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.62125e-05
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14005e+01 1.54018e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14007e+01 1.49965e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14009e+01 1.52111e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14010e+01 1.69277e-05
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14011e+01 1.52111e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14012e+01 1.53065e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14014e+01 1.53065e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.59740e-05
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14016e+01 1.51157e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14018e+01 1.52111e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14020e+01 1.52111e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14021e+01 1.59740e-05
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14022e+01 1.49965e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14023e+01 1.50919e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14025e+01 1.53065e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14027e+01 1.50919e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14028e+01 1.59740e-05
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14029e+01 1.49965e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14030e+01 1.48058e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14032e+01 1.49965e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.69277e-05
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14034e+01 1.51873e-04
> ifi=5 3136 write(41,0x5d02a118,3136) 4.14036e+01 1.52826e-04
> ifi=5 328 write(41,0x7fffffffb630,328) 4.14037e+01 1.59740e-05
> ifi=5 0 lseek64(41,0,0) 4.14039e+01 9.53674e-07
> ifi=5 96 write(41,0x7fffffffb4e0,96) 4.14039e+01 1.69277e-05
> ifi=5 0 ftruncate64(41,2250776576) 4.14039e+01 1.06910e+00
> ifi=5 0 lseek64(41,0,0) 4.24736e+01 3.09944e-06
> ifi=5 96 write(41,0x7fffffffb4a0,96) 4.24737e+01 4.69685e-05
> ifi=5 0 close(41) 4.24739e+01 3.16906e-03
>
>
> On Tue, Feb 17, 2009 at 9:19 AM, Quincey Koziol <koziol at hdfgroup.org> wrote:
> > Hi Mark,
> >
> > On Feb 13, 2009, at 3:41 PM, Mark Howison wrote:
> >
> >> Also, here is a graph showing that same activity on node 0 (the first
> >> row of pixels). The color key is:
> >>
> >> blue = write
> >> dark purple = truncate
> >> purple = fsync
> >> teal = fflush
> >>
> >> Mark
> >>
> >>
> >> On Fri, Feb 13, 2009 at 12:03 PM, Mark Howison <MHowison at lbl.gov> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I have a parallel HDF5 application that is writing out chunked data to
> >>> a 3D dataset and is exhibiting a large number of small writes upon
> >>> closing the file. Below I've attached a trace of POSIX calls on node 0
> >>> showing the file open, then 4 chunks of size 1757600 bytes being
> >>> written, then a series of 40 - 3136 byte writes (mostly 3136), and
> >>> then a truncate call before the file is closed. The small writes are
> >>> not ideal because this is a lustre file system on a Cray XT at NERSC.
> >>> Together, those small writes and truncate take about 30% of the time
> >>> from file open to close.
> >>>
> >>> My hypothesis is that the small writes represent meta data related to
> >>> the chunk indexing. Does that sound right?
> >
> > Yes, that's probably correct.
> >
> >>> What is the best way for me to consolidate these small writes into one
> >>> large write? Should I use
> >>> H5Pset_meta_block_size() to set the block size to the lustre stripe
> >>> width of 1MB?
> >
> > Yes, that would probably help.
> >
> >>> I'm a little concerned by the fact that the 3136 byte
> >>> writes are not to contiguous offsets, and perhaps cannot be
> >>> consolidated into a single write.
> >>>
> >>> What is the purpose of the truncate? Can it be removed?
> >
> > I think with some analysis we could eliminate the truncate in
> > some/all cases, but we'll need to finish getting funding in place to work on
> > these issues with Lustre.
> >
> > Quincey
> >
> >>> Thanks,
> >>>
> >>> Mark Howison
> >>> mhowison at lbl.gov
> >>> Student Research Assistant
> >>> Visualization Group, Lawrence Berkeley National Labs
> >>>
> >>>
> >>> ifi=5 41 open64("../output/prs.h5part",66,-1) 3.26833e+01 6.02412e-03
> >>> ifi=5 0 close(41) 3.26894e+01 1.08004e-04
> >>> ifi=5 41 open64("../output/prs.h5part",2,-1) 3.26895e+01 3.85680e-02
> >>> ifi=5 0 lseek64(41,0,2) 3.27325e+01 1.83105e-03
> >>> ifi=5 0 lseek64(41,0,0) 3.27344e+01 9.53674e-07
> >>> ifi=5 96 write(41,0x7fffffffb740,96) 3.27358e+01 3.38793e-04
> >>> ifi=5 7304 lseek64(41,7304,0) 3.27391e+01 3.09944e-06
> >>> ifi=5 1757600 write(41,0x371cdc80,1757600) 3.27391e+01 9.63148e-01
> >>> ifi=5 1757600 write(41,0x3737ae20,1757600) 3.37023e+01 2.74949e-02
> >>> ifi=5 1757600 write(41,0x37527fc0,1757600) 3.37299e+01 1.32360e-02
> >>> ifi=5 1757600 write(41,0x376d5160,1757600) 3.37432e+01 1.96590e-02
> >>> ifi=5 96 lseek64(41,96,0) 3.45493e+01 1.90735e-06
> >>> ifi=5 40 write(41,0x5d0b8188,40) 3.45493e+01 1.57619e-03
> >>> ifi=5 544 write(41,0x5d0b7ca8,544) 3.45509e+01 1.69277e-05
> >>> ifi=5 120 write(41,0x5d0b87f8,120) 3.45510e+01 1.50204e-05
> >>> ifi=5 40 write(41,0x5d0b9308,40) 3.45510e+01 1.38283e-05
> >>> ifi=5 544 write(41,0x5d0b7ca8,544) 3.45510e+01 1.50204e-05
> >>> ifi=5 120 write(41,0x5d0b9858,120) 3.45510e+01 1.38283e-05
> >>> ifi=5 328 write(41,0x7fffffffb660,328) 3.45510e+01 1.59740e-05
> >>> ifi=5 40 write(41,0x5d0c18d8,40) 3.45511e+01 1.40667e-05
> >>> ifi=5 544 write(41,0x5d0b7ca8,544) 3.45511e+01 1.50204e-05
> >>> ifi=5 120 write(41,0x5d0c1ed8,120) 3.45511e+01 1.40667e-05
> >>> ifi=5 328 write(41,0x7fffffffb660,328) 3.45511e+01 1.50204e-05
> >>> ifi=5 40 write(41,0x5d0c4288,40) 3.45511e+01 1.40667e-05
> >>> ifi=5 544 write(41,0x5d0b7ca8,544) 3.45512e+01 1.50204e-05
> >>> ifi=5 120 write(41,0x5d0c4918,120) 3.45512e+01 1.40667e-05
> >>> ifi=5 328 write(41,0x7fffffffb660,328) 3.45512e+01 1.40667e-05
> >>> ifi=5 272 write(41,0x5d0c8948,272) 3.45512e+01 3.58105e-03
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.45548e+01 1.69277e-05
> >>> ifi=5 114251304 lseek64(41,114251304,0) 3.45549e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.45549e+01 1.46720e-02
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.45696e+01 1.78814e-05
> >>> ifi=5 214440776 lseek64(41,214440776,0) 3.45696e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.45696e+01 1.86720e-02
> >>> ifi=5 314627112 lseek64(41,314627112,0) 3.45883e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.45883e+01 1.42689e-02
> >>> ifi=5 414813448 lseek64(41,414813448,0) 3.46026e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.46026e+01 1.24190e-02
> >>> ifi=5 514999784 lseek64(41,514999784,0) 3.46150e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.46150e+01 1.48160e-02
> >>> ifi=5 615186120 lseek64(41,615186120,0) 3.46299e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.46299e+01 3.93460e-02
> >>> ifi=5 715372456 lseek64(41,715372456,0) 3.46693e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.46693e+01 1.76220e-02
> >>> ifi=5 815558792 lseek64(41,815558792,0) 3.46869e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.46869e+01 1.06070e-02
> >>> ifi=5 915745128 lseek64(41,915745128,0) 3.46975e+01 1.19209e-06
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.46975e+01 1.74150e-02
> >>> ifi=5 1015931464 lseek64(41,1015931464,0) 3.47150e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47150e+01 1.11501e-02
> >>> ifi=5 1116117800 lseek64(41,1116117800,0) 3.47262e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47262e+01 1.67122e-02
> >>> ifi=5 1216304136 lseek64(41,1216304136,0) 3.47429e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47429e+01 5.77402e-03
> >>> ifi=5 1316490472 lseek64(41,1316490472,0) 3.47487e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47487e+01 1.83940e-02
> >>> ifi=5 1416676808 lseek64(41,1416676808,0) 3.47671e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47671e+01 1.35159e-02
> >>> ifi=5 1516863144 lseek64(41,1516863144,0) 3.47806e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47806e+01 1.70491e-02
> >>> ifi=5 1617049480 lseek64(41,1617049480,0) 3.47977e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.47977e+01 9.32908e-03
> >>> ifi=5 1717235816 lseek64(41,1717235816,0) 3.48071e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.48071e+01 1.15631e-02
> >>> ifi=5 1817422152 lseek64(41,1817422152,0) 3.48187e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.48187e+01 8.60000e-03
> >>> ifi=5 1917608488 lseek64(41,1917608488,0) 3.48273e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.48273e+01 6.62398e-03
> >>> ifi=5 2017794824 lseek64(41,2017794824,0) 3.48339e+01 1.19209e-06
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.48339e+01 7.51495e-03
> >>> ifi=5 2117981160 lseek64(41,2117981160,0) 3.48415e+01 0.00000e+00
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.48415e+01 1.77360e-02
> >>> ifi=5 2218167496 lseek64(41,2218167496,0) 3.48592e+01 9.53674e-07
> >>> ifi=5 3136 write(41,0x5d0c7948,3136) 3.48592e+01 1.63181e-02
> >>> ifi=5 2249807432 lseek64(41,2249807432,0) 3.48756e+01 0.00000e+00
> >>> ifi=5 328 write(41,0x7fffffffb660,328) 3.48756e+01 7.10177e-03
> >>> ifi=5 0 lseek64(41,0,0) 3.48828e+01 0.00000e+00
> >>> ifi=5 96 write(41,0x7fffffffb510,96) 3.48828e+01 2.69413e-05
> >>> ifi=5 0 ftruncate64(41,2249809480) 3.48829e+01 7.08644e-01
> >>> ifi=5 0 fsync(41) 3.55917e+01 3.28633e-01
> >>> ifi=5 0 lseek64(41,0,0) 3.59472e+01 1.90735e-06
> >>> ifi=5 96 write(41,0x7fffffffb4d0,96) 3.59473e+01 5.88894e-05
> >>> ifi=5 0 close(41) 3.59477e+01 9.05991e-06
> >>>
> >>
> >> <node0-meta-data.png>
> >> ----------------------------------------------------------------------
> >> This mailing list is for HDF software users discussion.
> >> To subscribe to this list, send a message to
> >> hdf-forum-subscribe at hdfgroup.org.
> >> To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.
> >
> >
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA