[hdf-forum] provenance
Quincey Koziol
koziol at hdfgroup.org
Wed Mar 25 11:34:22 EDT 2009
Hi Matthew,
On Mar 24, 2009, at 6:30 PM, Matthew Dougherty wrote:
>
> On Mar 24, 2009, at 5:02 PM, Quincey Koziol wrote:
>>> 1) should be allowable to have more than one UUID.
>>> They may be independent of each other, and added at different times.
>>
>> Hmm, what do you mean here? Below you were asking for a single
>> unique ID for each HDF5 file...
>
> 1) Other scientific groups may have their own UID schemes (e.g., LSIDs,
> i.e. Life Science Identifiers, and DOIs).
Yes, they could add those IDs to objects as metadata. Perhaps we
could come up with a suggested standard, but I don't think we could
easily work with every group's scheme in particular.
> 2) We definitely need an HDF-created UID that is not easy to change.
Yes.
> 3) Tracking the provenance of an HDF file might be accomplished by
> logging a UID (e.g., a timestamp) each time the file is opened.
> The file then carries a collection of open times that is unique to
> that file.
> If any write activity occurs after an open, that open UID is
> flagged accordingly.
> When a file is copied, the two files diverge and are identified by
> different open UIDs.
>
> Such a provenance scheme should be automatic and optional.
> In some instances you don't want the overhead, for example when you
> are doing a million opens.
> In that case you get only the one UID assigned when the HDF file was
> created.
Good ideas toward provenance features, yes.
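
The open-logging scheme quoted above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not anything HDF5 provides: the names `ProvenanceLog`, `OpenRecord`, `record_open`, `record_write`, and `track_opens` are hypothetical, and a random UUID is used for each open (an assumption; the proposal suggested a timestamp) so that two copies opened at the same instant still diverge.

```python
import time
import uuid
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class OpenRecord:
    """One entry in the (hypothetical) per-file open log."""
    open_uid: str
    opened_at: float
    wrote: bool = False  # flagged if any write followed this open


@dataclass
class ProvenanceLog:
    """Hypothetical provenance data carried inside a file."""
    # The single UID assigned when the file is created.
    creation_uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Optional tracking: turn off to avoid overhead on many opens.
    track_opens: bool = True
    opens: List[OpenRecord] = field(default_factory=list)

    def record_open(self) -> Optional[OpenRecord]:
        """Log a new open UID, or do nothing if tracking is disabled."""
        if not self.track_opens:
            return None
        rec = OpenRecord(open_uid=str(uuid.uuid4()), opened_at=time.time())
        self.opens.append(rec)
        return rec

    def record_write(self, rec: Optional[OpenRecord]) -> None:
        """Flag the given open as one that modified the file."""
        if rec is not None:
            rec.wrote = True

    def copy(self) -> "ProvenanceLog":
        """Model a file copy: the history is duplicated, later opens diverge."""
        return ProvenanceLog(
            creation_uid=self.creation_uid,
            track_opens=self.track_opens,
            opens=[replace(r) for r in self.opens],
        )


if __name__ == "__main__":
    original = ProvenanceLog()
    first = original.record_open()
    original.record_write(first)

    duplicate = original.copy()
    original.record_open()
    duplicate.record_open()

    # Shared history up to the copy, different open UIDs afterwards.
    print(original.opens[0].open_uid == duplicate.opens[0].open_uid)
    print(original.opens[1].open_uid != duplicate.opens[1].open_uid)
```

With `track_opens=False` the log stays empty and the file is identified by `creation_uid` alone, which matches the "one UID when the HDF was created" case.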
>> We could allow an application to choose which type of UUID to
>> store. I've filed a bug for adding a UUID to a file and will amend
>> it to suggest giving the application the choice of which version of
>> the UUID to store.
>
> Sounds good. I would like to have one to choose from that is not
> opaque; it should include time, computer, and username.
> Audit trails are key to provenance.
I think we are working to different purposes here. I'm just trying
to get a unique ID into the file (and perhaps for each object) and
don't want to tie it into any provenance effort. I also want to
pursue the provenance idea, but it should be a separate, probably
higher-level, project (which might use the UUID for some purpose).
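
To make the "which version of the UUID" choice concrete, here is a small sketch using Python's standard `uuid` module (not the HDF5 library, which had no such feature at the time of this thread). A version 1 UUID encodes a timestamp and a node ID, so it is not fully opaque; a version 4 UUID is random. No standard UUID version carries a username, so that part of an audit trail would have to be stored separately. The helper name `describe` is hypothetical.

```python
import datetime
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the
# Unix epoch (1970-01-01), as defined for time-based UUIDs.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def describe(u: uuid.UUID) -> dict:
    """Return whatever information can be recovered from a UUID."""
    info = {"version": u.version}
    if u.version == 1:
        # u.time is the 60-bit timestamp in 100-ns units.
        unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
        info["timestamp"] = datetime.datetime.fromtimestamp(
            unix_seconds, tz=datetime.timezone.utc
        )
        # u.node is a 48-bit node ID: usually a MAC address, but it may
        # be a random value if no hardware address is available.
        info["node"] = u.node
    return info


if __name__ == "__main__":
    print(describe(uuid.uuid1()))  # time-based: timestamp and node
    print(describe(uuid.uuid4()))  # random: only the version is known
```

An application that wants an audit-friendly ID would pick the time-based form; one that only needs uniqueness, or does not want to expose the host, would pick the random form.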
>>> 4) Have a non-changeable flag, set at HDF file creation, that would
>>> override calls to H5Pset_obj_track_times, ignoring a 'track_times'
>>> parameter set to FALSE.
>>> Set it at creation, and modifications and changes are always noted.
>>
>> Hmm, I don't think that's very helpful, really. We don't have any
>> other "override" properties like this...
>
>
> The main concern is that the audit trail could get turned off.
Sure, I understand.
Quincey