Automatic Extraction of Knowledge from Web Documents

Alani, H., Kim, S., Millard, D. E., Weal, M. J., Lewis, P. H., Hall, W. and Shadbolt, N. R. (2003) Automatic Extraction of Knowledge from Web Documents. In: 2nd International Semantic Web Conference - Workshop on Human Language Technology for the Semantic Web and Web Services, October 20-23, Sanibel Island, Florida, USA.

Download: PDF (276Kb)

Abstract

Much of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents across multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. The extracted knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.

Item Type: Conference or Workshop Item
Creator/Authors:
Harith Alani
Sanghee Kim
David E. Millard
Mark J. Weal
Paul H. Lewis
Wendy Hall
Nigel R. Shadbolt
Research Group:
Old ECS Groups > Science and Engineering of Natural Systems
Current ECS Groups > Web and Internet Science
Old ECS Groups > Intelligence, Agents, Multimedia
Date: 2003
Information about this record:
Performance Indicator: EZ~07~07~04
Citations: ISI: 98, Google Scholar: 29
Downloads (2010): 154
ID Code: 8194
Last Modified: 23 Sep 2011 10:29
Deposited On: 18 Oct 2003 by Alani, Harith

References in Article

1. Alani, H., Kim, S., Millard, D., Weal, M., Hall, W., Lewis, P., and Shadbolt, N. “Automatic Ontology-based Knowledge Extraction from Web Documents”. IEEE Intelligent Systems, 18(1), pages 14-21, 2003.

2. Ciravegna, F. “Adaptive Information Extraction from Text by Rule Induction and Generalisation”. Proc. 17th Int. Joint Conf. on AI (IJCAI), pp. 1251-1256, Seattle, USA, 2001.

3. Cunningham, H., Maynard, D., Bontcheva, K., and Tablan, V. “GATE: a framework and graphical development environment for robust NLP tools and applications”. Proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, Philadelphia, USA, 2002.

4. Handschuh, S., Staab, S., and Ciravegna, F. “S-CREAM: Semi-automatic Creation of Metadata”. Semantic Authoring, Annotation and Markup Workshop, 15th European Conf. on Artificial Intelligence, pp. 27-33, Lyon, France, 2002.

5. Kim, S., Alani, H., Hall, W., Lewis, P.H., Millard, D.E., Shadbolt, N., and Weal, M.J. “Artequakt: Generating Tailored Biographies with Automatically Annotated Fragments from the Web”. Workshop on Semantic Authoring, Annotation & Knowledge Markup, 15th European Conf. on Artificial Intelligence (ECAI), pp. 1-6, Lyon, France, July 2002.

6. Lee, K., D. Luparello, and J. Roudaire, “Automatic Construction of Personalised TV News Programs,” Proc. 7th ACM Conf. on Multimedia, Orlando, Florida, 1999, pp. 323-332.

7. Maedche, A., G. Neumann and S. Staab. “Bootstrapping an Ontology-based Information Extraction System”. In: Intelligent Exploration of the Web, Springer, 2002.

8. Marsh, E. and D. Perzanowski (NRL). “MUC-7 Evaluation of IE Technology: Overview of Results”. Available at http://www.itl.nist.gov/iaui/894.02/related_projects/muc/index.html

9. McKeown, K. R., R. Barzilay, D. Evans, V. Hatzivassiloglou, J. L. Klavans, A. Nenkova, C. Sable, B. Schiffman and S. Sigelman. “Tracking and Summarizing News on a Daily Basis with Columbia's Newsblaster”. Proc. Human Language Technology Conf., CA, USA. 2002.

10. Michaelides, D.T., Millard, D.E., Weal, M.J., and DeRoure, D. “Auld Leaky: A Contextual Open Hypermedia Link Server”. Proc. of the 7th Hypermedia: Openness, Structural Awareness, and Adaptivity, pp. 59-70, Springer Verlag, Heidelberg, 2001, LNCS.

11. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. “Introduction to WordNet: An on-line lexical database”. Int. Journal of Lexicography, 3(4):235-312, 1993.

12. Radev, D. R. and K. R. McKeown. “Generating natural language summaries from multiple on-line sources”. Computational Linguistics, 24(3):469-500, 1998.

13. Reidsma, D., J. Kuper, T. Declerck, H. Saggion and H. Cunningham. “Cross document annotation for multimedia retrieval”. EACL Workshop on Language Technology and the Semantic Web (NLPXML), Budapest, Hungary, 2003.

14. Rutledge, L., B. Bailey, J.V. Ossenbruggen, L. Hardman, and J. Geurts, “Generating Presentation Constraints from Rhetorical Structure,” Proc. 11th ACM Conf. on Hypertext and Hypermedia, San Antonio, Texas, USA, 2000, pp. 19-28.

15. Sekine, S. and Grishman R., “A corpus-based probabilistic grammar with only two non-terminals”, Proc. of the 1st Int. Workshop on Multimedia annotation, Japan, 2001.

16. Staab, S., Maedche, A., and Handschuh, S. “An Annotation Framework for the Semantic Web”. Proc. 1st Int. Workshop on MultiMedia Annotation, Tokyo, Japan, January 2001.

17. Vargas-Vera, M., E. Motta, J. Domingue, M. Lanzoni, A. Stutt and F. Ciravegna. “MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup”. 13th Int. Conf. on Knowledge Engineering and Knowledge Management (EKAW 02), Spain, 2002.

18. White, M., T. Korelsky, C. Cardie, V. Ng, D. Pierce and K. Wagstaff. “Multidocument Summarization via Information Extraction”. Proc. of Human Language Technology Conf. (HLT 2001), San Diego, CA, 2001.
