[hdf-forum] provenance
Matthew Dougherty
matthewd at bcm.edu
Tue Mar 24 19:30:57 EDT 2009
On Mar 24, 2009, at 5:02 PM, Quincey Koziol wrote:
>> 1) should be allowable to have more than one UUID.
>> They may be independent of each other, and added at different times.
>
> Hmm, what do you mean here? Below you were asking for a single
> unique ID for each HDF5 file...
1) Other scientific groups may have their own UID schemes (e.g., LSID,
the Life Science Identifier, or DOI).
2) We definitely need an HDF-created UID that is not easy to change.
3) Tracking the provenance of an HDF file might be accomplished by
logging a UID (e.g., a timestamp) each time the file is opened.
The HDF file then carries a collection of open times that is unique to
that file.
If any write activity occurs after an open, that open UID is
flagged as such.
When a file is copied, the copies diverge from that point on and are
identified by their different open UIDs.
Such a provenance scheme should be automatic and optional.
In some instances you don't want the overhead, for example when you
are doing a million opens.
In that case you get only the one UID assigned when the HDF file was
created.
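
The open-log idea in 3) can be sketched roughly as below. This is only a model in plain Python, not HDF5 code: the names ProvenanceLog, OpenRecord, record_open, record_write and track_opens are hypothetical, and random UUIDs stand in for whatever UID scheme is eventually chosen.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OpenRecord:
    """One entry in the hypothetical per-file open log."""
    uid: str             # unique ID minted for this open
    opened_at: str       # ISO 8601 timestamp of the open (UTC)
    wrote: bool = False  # set once a write follows this open


@dataclass
class ProvenanceLog:
    """Hypothetical audit trail carried inside one file."""
    creation_uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    track_opens: bool = True  # optional: switch off for a million opens
    opens: list = field(default_factory=list)

    def record_open(self):
        """Log an open; returns None when open tracking is disabled."""
        if not self.track_opens:
            return None
        rec = OpenRecord(uid=str(uuid.uuid4()),
                         opened_at=datetime.now(timezone.utc).isoformat())
        self.opens.append(rec)
        return rec

    def record_write(self):
        """Flag the most recent open as having modified the file."""
        if self.opens:
            self.opens[-1].wrote = True

    def copy(self):
        """Model a file copy: the history is shared up to this point."""
        history = [OpenRecord(r.uid, r.opened_at, r.wrote) for r in self.opens]
        return ProvenanceLog(self.creation_uid, self.track_opens, history)


original = ProvenanceLog()
original.record_open()
original.record_write()        # first open is now flagged as a writer
duplicate = original.copy()    # same history so far
original.record_open()
duplicate.record_open()        # from here on the two logs diverge
```

After the copy, both logs agree on the creation UID and the first open, but their later open UIDs differ, which is what lets the two files be told apart. With track_opens set to False, only the creation UID remains.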
> We could allow an application to choose which type of UUID to
> store. I've filed a bug for adding a UUID to a file and will amend
> it to suggest giving the application the choice of which version of
> the UUID to store.
Sounds good. I would like one of the choices to be non-opaque:
it should include time, computer, and username.
Audit trails are key to provenance.
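
To illustrate what a non-opaque ID could look like, here is a minimal Python sketch. The function name readable_uid and the time/host/user layout are assumptions made for illustration, not a proposed HDF5 format, and the result is only as unique as the clock resolution allows.

```python
import getpass
import socket
from datetime import datetime, timezone


def readable_uid(when=None, host=None, user=None):
    """Build a human-readable ID of the form <UTC time>/<host>/<user>.

    Any field left as None is filled in from the running system.
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    host = host or socket.gethostname()
    user = user or getpass.getuser()
    return f"{when.strftime('%Y%m%dT%H%M%S.%fZ')}/{host}/{user}"


# Explicit values keep the example reproducible; "hostA" and "alice"
# are made-up names.
example = readable_uid(
    datetime(2009, 3, 24, 23, 30, 57, tzinfo=timezone.utc), "hostA", "alice")
```

For comparison, a version 1 UUID already encodes a timestamp and a node identifier, but it carries no username and is not readable at a glance, so it would not meet the audit-trail need by itself.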
>
>
>> 4) have a non changeable flag set in the HDF creation that would
>> override calls to H5Pset_obj_track_times ignoring 'track_times'
>> parameter set to FALSE.
>> set it at creation and modifications & changes are always noted.
>
> Hmm, I don't think that's very helpful, really. We don't have any
> other "override" properties like this...
My main concern is that the audit trail could get turned off.
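
The behavior asked for in 4) can be modeled as follows. This is a plain Python sketch of the semantics only: FileCreationProps, force_track_times and effective_track_times are hypothetical names, and none of this is an existing HDF5 property or call.

```python
class FileCreationProps:
    """Hypothetical file-level settings, fixed when the file is created."""

    def __init__(self, force_track_times=False):
        self._force_track_times = bool(force_track_times)

    @property
    def force_track_times(self):
        # Read-only: no setter is defined, so the flag cannot be
        # changed after creation.
        return self._force_track_times


def effective_track_times(file_props, requested):
    """Resolve the tracking setting an object actually gets.

    `requested` plays the role of the 'track_times' parameter; it is
    ignored whenever the file was created with the forcing flag.
    """
    if file_props.force_track_times:
        return True
    return bool(requested)


audited = FileCreationProps(force_track_times=True)
ordinary = FileCreationProps()
audited_result = effective_track_times(audited, requested=False)
ordinary_result = effective_track_times(ordinary, requested=False)
```

In this model a request to turn tracking off is honored for the ordinary file but ignored for the audited one, so the audit trail stays on once the file has been created that way.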