[hdf-forum] setting chunk dimensions
Ruth Aydt
aydt at hdfgroup.org
Tue Oct 21 22:19:45 EDT 2008
Hi Natalie,
You can think of the hyperslabs as the way you logically access (write
or read) subsets of a complete dataset from your application's
perspective. By specifying different hyperslabs you can access
different subsets of the dataset. You can also access the entire
dataset -- it just depends on what you specify in the write or read.
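To make the "logical subset" idea concrete, here is a minimal sketch in plain C, not an HDF5 call. It assumes a simple selection described by an offset and a count per dimension, applied to a row-major 2-D array. The function name copy_block_2d is hypothetical and only for illustration.

```c
#include <stddef.h>

/* Copy a rectangular selection (offset + count per dimension) out of a
 * row-major 2-D array with ncols columns into a dense buffer dst.
 * This only mimics what a simple hyperslab selection describes;
 * it does not use the HDF5 library. */
void copy_block_2d(const int *src, size_t ncols,
                   const size_t offset[2], const size_t count[2],
                   int *dst)
{
    for (size_t r = 0; r < count[0]; r++) {
        for (size_t c = 0; c < count[1]; c++) {
            /* Position in the full array vs. position in the subset. */
            dst[r * count[1] + c] =
                src[(offset[0] + r) * ncols + (offset[1] + c)];
        }
    }
}
```

Calling it with offset {1, 1} and count {2, 2} on a 4x3 array picks out the 2x2 block starting at row 1, column 1. Changing offset and count selects a different subset of the same array.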
Chunked storage defines how the dataset is physically written to /
read from disk. The chunk size is set when the dataset is created
and remains constant. Typically you want to choose a chunk layout that
will perform well for the most frequent logical access pattern -- or
for the access pattern that you want the best performance with.
So hyperslabs are about logical access and chunks are about physical
storage organization on disk. Both hyperslabs and chunks will have
the same number of dimensions as the dataset. But, the dimension
*sizes* for both hyperslabs and chunks may be (and usually are)
different from your dataset's dimension sizes.
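As a rough illustration of chunk sizes that differ from the dataset sizes, the sketch below counts how many chunks are needed to cover a 2-D dataset. It is plain C arithmetic, not an HDF5 call. It uses size_t in place of hsize_t so it compiles without the library, and the function names are hypothetical. It assumes every covering chunk is allocated at the full chunk size and that no compression filter is applied.

```c
#include <stddef.h>

/* Number of chunks needed along one dimension: ceil(dim / chunk). */
size_t chunks_along(size_t dim, size_t chunk)
{
    return (dim + chunk - 1) / chunk;
}

/* Total number of chunks needed to cover a 2-D dataset. */
size_t total_chunks_2d(const size_t dims[2], const size_t chunk[2])
{
    return chunks_along(dims[0], chunk[0]) * chunks_along(dims[1], chunk[1]);
}

/* Element slots occupied if every covering chunk is stored in full
 * (assumption: no filters, all covering chunks allocated). */
size_t allocated_elements_2d(const size_t dims[2], const size_t chunk[2])
{
    return total_chunks_2d(dims, chunk) * chunk[0] * chunk[1];
}
```

For the example in the question, a 10x3 dataset with chunk_dims {2, 5} needs 5 x 1 = 5 chunks of 10 elements each, so under these assumptions 50 element slots hold 30 data values. The initial 3x3 dataset needs 2 chunks.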
The interaction of chunk sizes, hyperslab selections, and various
other factors can dramatically impact performance.
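One way to get a feel for this interaction is to count how many chunks a given selection touches, since each touched chunk generally has to be read or written as a unit. The sketch below is plain C arithmetic with hypothetical function names, not part of the HDF5 API. It assumes a single contiguous block selection given by an offset and a count per dimension. The offset {3, 0} used in the note afterwards is an assumption for illustration; the quoted code does not show it.

```c
#include <stddef.h>

/* Chunks intersected along one dimension by the index range
 * [offset, offset + count - 1]. count must be at least 1. */
size_t chunks_touched_along(size_t offset, size_t count, size_t chunk)
{
    size_t first = offset / chunk;               /* chunk holding first index */
    size_t last = (offset + count - 1) / chunk;  /* chunk holding last index */
    return last - first + 1;
}

/* Chunks intersected by a contiguous 2-D block selection. */
size_t chunks_touched_2d(const size_t offset[2], const size_t count[2],
                         const size_t chunk[2])
{
    return chunks_touched_along(offset[0], count[0], chunk[0]) *
           chunks_touched_along(offset[1], count[1], chunk[1]);
}
```

With chunk_dims {2, 5}, a 7x1 selection at offset {3, 0} touches 4 chunks, while a full 10x1 column at offset {0, 0} touches all 5. A chunk shape of {10, 1} would reduce the column case to a single chunk, which is the kind of trade-off to weigh against your most frequent access pattern.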
You may be interested in sections 4.1 and 5 of the NetCDF-4
Performance Report found at www.hdfgroup.org/pubs/papers. They give
some explanation about hyperslabs and chunked storage, and how
performance may vary, as well as how chunked storage may impact
file size.
-Ruth
On Oct 21, 2008, at 3:10 AM, Natalie Happenhofer wrote:
> Hi!
> I'm trying to write my data via hyperslabs, and there is also a nice
> example of how to do it on the HDF5.org webpage. I just don't
> understand how to set the chunk_dims, or, more precisely, what do
> these chunking dimensions do?
> Here is the part of the example code using the chunk-dims:
>
> int
> main (void)
> {
> hid_t file; /* handles */
> hid_t dataspace, dataset;
> hid_t filespace;
> hid_t cparms;
> hsize_t dims[2] = { 3, 3}; /*
> * dataset dimensions
> * at the creation time
> */
> hsize_t dims1[2] = { 3, 3}; /* data1 dimensions */
> hsize_t dims2[2] = { 7, 1}; /* data2 dimensions */
>
>
> hsize_t dims3[2] = { 2, 2}; /* data3 dimensions */
>
> hsize_t maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
> hsize_t chunk_dims[2] ={2, 5};
> hsize_t size[2];
> hsize_t offset[2];
>
> herr_t status;
>
> int data1[3][3] = { {1, 1, 1}, /* data to write */
> {1, 1, 1},
> {1, 1, 1} };
>
> int data2[7] = { 2, 2, 2, 2, 2, 2, 2};
>
> int data3[2][2] = { {3, 3},
> {3, 3} };
> int fillvalue = 0;
>
> /*
> * Create the data space with unlimited dimensions.
> */
> dataspace = H5Screate_simple(RANK, dims, maxdims);
>
> /*
> * Create a new file. If file exists its contents will be
> overwritten.
> */
> file = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT,
> H5P_DEFAULT);
>
> /*
> * Modify dataset creation properties, i.e. enable chunking.
> */
> cparms = H5Pcreate(H5P_DATASET_CREATE);
> status = H5Pset_chunk( cparms, RANK, chunk_dims);
> status = H5Pset_fill_value (cparms, H5T_NATIVE_INT, &fillvalue );
>
>
>
> chunk_dims is set to {2,5}, which I don't understand, because the
> initial dataset is 3x3 and is then extended to 10x3 - why the {2,5}?
>
> thx,
> NH
>
>
> /*
> * Create a new dataset within the file using cparms
> * creation properties.
> */
> dataset = H5Dcreate2(file, DATASETNAME, H5T_NATIVE_INT,
> dataspace, H5P_DEFAULT,
> cparms, H5P_DEFAULT);
------------------------------------------------------------
Ruth Aydt
The HDF Group
1901 South First Street, Suite C-2
Champaign, IL 61820
aydt at hdfgroup.org (217)265-7837
------------------------------------------------------------