Recording and Using Provenance in a Protein Compressibility Experiment

Groth, P., Miles, S., Fang, W., Wong, S. C., Zauner, K. P. and Moreau, L. (2005) Recording and Using Provenance in a Protein Compressibility Experiment. In: The 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), 24-27 July, 2005, Research Triangle Park, North Carolina.

Abstract

Very large scale computations are now becoming routinely
used as a methodology to undertake scientific research.
In this context, ‘provenance systems’ are regarded
as the equivalent of the scientist’s logbook for in silico experimentation:
provenance captures the documentation of
the process that led to some result. Using a protein compressibility
analysis application, we derive a set of generic
use cases for a provenance system. In order to support
these, we address the following fundamental questions:
what is provenance? how to record it? what is the performance
impact for grid execution? what is the performance
of reasoning? In doing so, we define a technology-independent
notion of provenance that captures interactions
between components, internal component information and
grouping of interactions, so as to allow us to analyse and
reason about the execution of scientific processes. In order
to support persistent provenance in heterogeneous applications,
we introduce a separate provenance store, in
which provenance documentation can be stored, archived
and queried independently of the technology used to run the
application. Through a series of practical tests, we evaluate
the performance impact of such a provenance system. In
summary, we demonstrate that provenance recording overhead
of our prototype system remains under 10% of execution
time, and we show that the recorded information successfully
supports our use cases in a performant manner.
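
The notion of provenance described above (interactions between components, internal component information, and grouping of interactions, all held in a separate store) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, field names, component names ("collate", "compress", "measure") and the codec label are all hypothetical, and an in-memory list stands in for a persistent, technology-independent store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Interaction:
    """One message passed between two components (hypothetical schema)."""
    sender: str
    receiver: str
    message: str
    group: str  # ties related interactions together, e.g. one experiment run


@dataclass
class ProvenanceStore:
    """In-memory stand-in for a separate provenance store (illustrative only)."""
    interactions: List[Interaction] = field(default_factory=list)
    component_info: Dict[Tuple[str, str], Dict[str, Any]] = field(default_factory=dict)

    def record_interaction(self, interaction: Interaction) -> None:
        # Append-only: documentation of the process is never rewritten.
        self.interactions.append(interaction)

    def record_component_info(self, component: str, group: str,
                              info: Dict[str, Any]) -> None:
        # Internal component information, keyed by component and group.
        self.component_info.setdefault((component, group), {}).update(info)

    def interactions_in_group(self, group: str) -> List[Interaction]:
        # Query the store independently of whatever ran the application.
        return [i for i in self.interactions if i.group == group]


# Hypothetical usage: two runs of an assumed three-step workflow.
store = ProvenanceStore()
store.record_interaction(Interaction("collate", "compress", "sample-1", "run-A"))
store.record_interaction(Interaction("compress", "measure", "sample-1.compressed", "run-A"))
store.record_interaction(Interaction("collate", "compress", "sample-2", "run-B"))
store.record_component_info("compress", "run-A", {"algorithm": "hypothetical-codec"})

run_a = store.interactions_in_group("run-A")
```

Grouping by a run identifier is one simple way to reconstruct the process that led to a given result; a real store would also need persistence and archiving, which this sketch leaves out.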

Item Type: Conference or Workshop Item
Creator/Authors:
    Paul Groth
    Simon Miles
    Weijan Fang
    Sylvia C. Wong
    Klaus-Peter Zauner
    Luc Moreau
Keywords: Provenance, Grid, protein compressibility
Research Group:
    Old ECS Groups > Science and Engineering of Natural Systems
    Old ECS Groups > BIO@ECS Research Group
    Current ECS Groups > Web and Internet Science
    Old ECS Groups > Intelligence, Agents, Multimedia
    Current ECS Groups > Agents, Interaction and Complexity
Date: 2005
Information about this record:
    Performance Indicator: EZ~06~06~04
    Citations: ISI: 1, Google Scholar: 56
    Downloads (2010): 18
    ID Code: 10910
    Last Modified: 23 Sep 2011 10:32
    Deposited On: 24 May 2005 by Groth, Paul
