[hdf-forum] provenance

Matthew Dougherty matthewd at bcm.edu
Tue Mar 24 19:30:57 EDT 2009


On Mar 24, 2009, at 5:02 PM, Quincey Koziol wrote:
>> 1) should be allowable to have more than one UUID.
>> They may be independent of each other, and added at different times.
>
> 	Hmm, what do you mean here?  Below you were asking for a single  
> unique ID for each HDF5 file...



1) Other scientific groups may have their own UID schemes (e.g., LSID,
Life Science Identifiers; DOI), so a file may need to carry more than one ID.


2) We definitely need an HDF-created UID that is not easy to change.


3) Tracking the provenance of an HDF file might be accomplished by
logging a UID (e.g., a timestamp) each time the file is opened.
The file then holds a collection of open times that is unique to
that file.
If any write activity occurs after an open, that open's UID is
flagged accordingly.
When a file is copied, the two files diverge from that point on and
are identified by their different open UIDs.
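
A rough sketch of the open-log idea in item 3, in plain Python rather than anything in the HDF5 library. The names ProvenanceLog, OpenRecord, record_open, record_write and divergence_point are all hypothetical, and pairing a random UUID with a UTC timestamp for each open is an assumption made for illustration:

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OpenRecord:
    uid: str             # unique ID for this open event
    opened_at: str       # UTC timestamp of the open, ISO 8601
    wrote: bool = False  # flagged if any write followed this open


@dataclass
class ProvenanceLog:
    # One UID assigned at creation; it never changes afterwards.
    creation_uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Per-open tracking can be switched off to avoid the overhead.
    track_opens: bool = True
    opens: list = field(default_factory=list)

    def record_open(self):
        """Log one open event; returns None when tracking is disabled."""
        if not self.track_opens:
            return None
        rec = OpenRecord(
            uid=str(uuid.uuid4()),
            opened_at=datetime.now(timezone.utc).isoformat(),
        )
        self.opens.append(rec)
        return rec

    def record_write(self):
        """Flag the most recent open as having modified the file."""
        if self.opens:
            self.opens[-1].wrote = True

    def copy(self):
        """A copied file carries the history up to the moment of the copy."""
        return ProvenanceLog(
            creation_uid=self.creation_uid,
            track_opens=self.track_opens,
            opens=[OpenRecord(r.uid, r.opened_at, r.wrote) for r in self.opens],
        )


def divergence_point(a, b):
    """Count the leading open UIDs that two logs have in common."""
    shared = 0
    for ra, rb in zip(a.opens, b.opens):
        if ra.uid != rb.uid:
            break
        shared += 1
    return shared
```

Two copies of the same file share a creation UID and a common prefix of open UIDs. The first open after the copy gets a different UID in each file, which is where divergence_point stops counting.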


Such a provenance scheme should be automatic but optional.
In some cases you don't want the overhead, for example when you are
doing a million opens.
In that case you get just the one UID assigned when the HDF file was
created.



> 	We could allow an application to choose which type of UUID to  
> store.  I've filed a bug for adding a UUID to a file and will amend  
> it to suggest giving the application the choice of which version of  
> the UUID to store.

Sounds good. I would like one of the choices to be non-opaque:
it should include the time, computer, and username.
Audit trails are key to provenance.
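
To make "not opaque" concrete, here is a minimal sketch of an ID that a person can read back into time, computer, and username. The layout (fields separated by "/", plus a short random suffix to keep IDs distinct within the same second) and the function names readable_uid and parse_uid are assumptions for illustration, not a proposed HDF5 format. For comparison, a standard version 1 UUID also embeds a timestamp and a node ID, but it carries no username and is not human-readable.

```python
import getpass
import socket
import uuid
from datetime import datetime, timezone


def readable_uid(now=None, host=None, user=None):
    """Build an ID of the form <UTC time>/<host>/<user>/<random suffix>."""
    now = now or datetime.now(timezone.utc)
    host = host or socket.gethostname()
    user = user or getpass.getuser()
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]  # keeps IDs apart within one second
    return f"{stamp}/{host}/{user}/{suffix}"


def parse_uid(uid):
    """Split a readable ID back into its audit fields."""
    stamp, host, user, suffix = uid.split("/")
    return {"time": stamp, "host": host, "user": user, "suffix": suffix}
```

Called with no arguments, readable_uid() fills in the current time, the local hostname, and the login name, so each entry in an audit trail records who touched the file, where, and when.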



>
>
>> 4) have a non changeable flag set in the HDF creation that would  
>> override calls to  H5Pset_obj_track_times ignoring 'track_times'  
>> parameter set to FALSE.
>> set it at creation and modifications & changes are always noted.
>
> 	Hmm, I don't think that's very helpful, really.  We don't have any  
> other "override" properties like this...


My main concern is that the audit trail could get turned off.



