[hdf-forum] provenance

Quincey Koziol koziol at hdfgroup.org
Wed Mar 25 11:34:22 EDT 2009


Hi Matthew,

On Mar 24, 2009, at 6:30 PM, Matthew Dougherty wrote:

>
> On Mar 24, 2009, at 5:02 PM, Quincey Koziol wrote:
>>> 1) should be allowable to have more than one UUID.
>>> They may be independent of each other, and added at different times.
>>
>> 	Hmm, what do you mean here?  Below you were asking for a single  
>> unique ID for each HDF5 file...
>
>
>
> 1) other scientific groups may have their own UID schemes (e.g., LSID,
> Life Science Identifiers, or DOI)

	Yes, they could add those IDs to objects as metadata.  Perhaps we  
could come up with a suggested standard, but I don't think we could  
easily support every group's scheme individually.

> 2) definitely need an HDF created UID that is not easy to change.

	Yes.

> 3) tracking the provenance of an HDF file might be accomplished by
> logging a UID (e.g., the time) whenever the HDF file is opened.
> The HDF file then has a collection of open times, which are unique to
> that file.
> If any write activity occurs after opening, the open UID is
> flagged as such.
> When a file is copied, the files diverge and are identified by
> different open UIDs.
>
>
> Such a provenance scheme should be automatic and optional.
> In some instances you don't want the overhead, such as when you might be
> doing a million opens.
> In such a case you get one UID, assigned when the HDF file was created.

	Good ideas toward provenance features, yes.
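
The open-log scheme described in point 3 can be illustrated with a small in-memory sketch. This is only a model of the idea, not an HDF5 feature: the names `OpenRecord`, `ProvenanceLog`, and `copy_file_log` are hypothetical, and a random UUID plus a timestamp stands in for whatever "open UID" would actually be logged.

```python
import copy
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class OpenRecord:
    open_id: str          # UID logged when the file is opened (assumed: random UUID)
    opened_at: float      # wall-clock time of the open, seconds since the Unix epoch
    wrote: bool = False   # flagged once any write follows this open


@dataclass
class ProvenanceLog:
    """Hypothetical per-file log of opens; models the proposal only."""
    creation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    enabled: bool = True  # the scheme is optional: disable to avoid the overhead
    opens: list = field(default_factory=list)

    def record_open(self):
        # With logging disabled, the file keeps only its creation UID.
        if not self.enabled:
            return None
        rec = OpenRecord(open_id=str(uuid.uuid4()), opened_at=time.time())
        self.opens.append(rec)
        return rec

    def record_write(self):
        # Flag the most recent open as having modified the file.
        if self.enabled and self.opens:
            self.opens[-1].wrote = True


def copy_file_log(log):
    # Copying a file copies its history; later opens diverge.
    return copy.deepcopy(log)


original = ProvenanceLog()
original.record_open()
original.record_write()

duplicate = copy_file_log(original)
original.record_open()
duplicate.record_open()

# The shared history is identical; the opens made after the copy differ.
shared = original.opens[0].open_id == duplicate.opens[0].open_id
diverged = original.opens[1].open_id != duplicate.opens[1].open_id
```

In this model the two files share a creation UID and their first open record, and can be told apart only by the opens recorded after the copy.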

>> 	We could allow an application to choose which type of UUID to  
>> store.  I've filed a bug for adding a UUID to a file and will amend  
>> it to suggest giving the application the choice of which version of  
>> the UUID to store.
>
> Sounds good. I would like to have one to choose from that is not
> opaque; it should include time, computer, and username.
> Audit trails are key to provenance.

	I think we are working toward different purposes here.  I'm just trying  
to get a unique ID into the file (and perhaps for each object) and  
don't want to tie it into any provenance effort.  I also want to  
pursue the provenance idea, but it should be a separate, probably  
higher-level, project (which might use the UUID for some purpose).
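
To make the "which version of the UUID" choice concrete, here is a minimal sketch using Python's standard `uuid` module. The helper names `make_file_uuid` and `uuid1_timestamp` are made up for illustration and say nothing about how HDF5 would store or expose the ID. A version 1 UUID embeds a timestamp and a node ID (usually derived from a network address), so it is not opaque; a version 4 UUID is random. No standard UUID version carries a username, so that would have to be recorded separately.

```python
import datetime
import uuid

# Number of 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def make_file_uuid(kind="random"):
    """Hypothetical helper: let the application pick the UUID version."""
    if kind == "time":
        return uuid.uuid1()   # timestamp + node ID
    if kind == "random":
        return uuid.uuid4()   # opaque, random bits
    raise ValueError("kind must be 'time' or 'random'")


def uuid1_timestamp(u):
    """Recover the creation time embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs embed a timestamp")
    seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


file_id = make_file_uuid("time")
created = uuid1_timestamp(file_id)   # when the UUID was generated (UTC)
node = file_id.node                  # 48-bit node ID of the generating host
```

Note that `uuid.uuid1()` may fall back to a random node ID when no hardware address is available, so the node field identifies the machine only on a best-effort basis.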

>>> 4) have a non-changeable flag, set at HDF file creation, that would
>>> override calls to H5Pset_obj_track_times, ignoring a 'track_times'
>>> parameter set to FALSE.
>>> Set it at creation, and modifications and changes are always noted.
>>
>> 	Hmm, I don't think that's very helpful, really.  We don't have any  
>> other "override" properties like this...
>
>
> The main concern is that the audit trail gets turned off.

	Sure, I understand.

		Quincey

