General Architecture for Text Engineering

From Wikipedia

GATE

GATE Developer v5 main window
Developer(s): GATE research team, Dept. of Computer Science, University of Sheffield
Initial release: 1996
Stable release: 5.1 (2009-12-09)
Preview release: 5.2 (nightly builds released every day)
Written in: Java
Operating system: Cross-platform
Available in: English
Type: Text mining, information extraction
License: LGPL
Website: http://gate.ac.uk/

General Architecture for Text Engineering, or GATE, is a Java suite of tools originally developed at the University of Sheffield beginning in 1995. It is now used worldwide by scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

GATE includes[1]:

  • an IDE, GATE Developer: an integrated development environment for natural language processing components bundled with a very widely used information extraction system and a comprehensive set of other plugins
  • a web app, GATE Teamware: a collaborative annotation environment for factory-style semantic annotation projects built around a workflow engine and a heavily-optimised backend service infrastructure
  • a framework, GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the services used by GATE Developer and more
  • an architecture: a high-level organisational picture of language processing software composition
  • a process for the creation of robust and maintainable services.

Under development:

  • a wiki/CMS[2]
  • a cloud computing solution for hosted large-scale text processing, GATE Cloud

GATE aims to remove the necessity for solving common engineering problems before doing useful research, or re-engineering before deploying research results into applications. Core functions of GATE take care of the lion’s share of the engineering:

  • modelling and persistence of specialised data structures
  • measurement, evaluation, benchmarking
  • visualisation and editing of annotations, ontologies, parse trees, etc.
  • a finite state transduction language for rapid prototyping and efficient implementation of shallow analysis methods (JAPE, see below)
  • extraction of training instances for machine learning
  • pluggable machine learning implementations (Weka, SVM Light, an in-house uneven margins SVM implementation[3], and more)
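
Two of the core functions listed above, the annotation data structures and their evaluation, can be illustrated with a small plain-Java sketch. It does not use the GATE API: the `Span` class, the `score` method and the example offsets are hypothetical names and values chosen for illustration, and strict matching (same type and same offsets) is only one of several possible scoring schemes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class AnnotationEval {

    /** A stand-off annotation: a typed span over character offsets [start, end). */
    static final class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) return false;
            Span s = (Span) o;
            return type.equals(s.type) && start == s.start && end == s.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, start, end);
        }
    }

    /** Returns {precision, recall, F1} under strict matching (same type, same offsets). */
    static double[] score(Set<Span> gold, Set<Span> predicted) {
        Set<Span> correct = new HashSet<>(predicted);
        correct.retainAll(gold); // keep only predictions that also appear in the gold set
        double p = predicted.isEmpty() ? 0.0 : (double) correct.size() / predicted.size();
        double r = gold.isEmpty() ? 0.0 : (double) correct.size() / gold.size();
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f1};
    }

    public static void main(String[] args) {
        // Offsets refer to the text "Hamish works in Sheffield".
        Set<Span> gold = new HashSet<>(List.of(
            new Span("Person", 0, 6), new Span("Location", 16, 25)));
        // The second prediction has a wrong start offset, so it does not count as correct.
        Set<Span> predicted = new HashSet<>(List.of(
            new Span("Person", 0, 6), new Span("Location", 10, 25)));
        double[] s = score(gold, predicted);
        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", s[0], s[1], s[2]);
    }
}
```

Keeping annotations as offsets into an unmodified text, rather than as inline markup, is what allows independently produced annotation sets to be compared in this way.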

On top of the core functions, GATE includes components for diverse natural language processing tasks, e.g. parsers, morphology, tagging, information retrieval tools, information extraction components for various languages, and many others. It has been widely applied in fields such as bioinformatics[4]. GATE Developer and Embedded are supplied with an information extraction system (ANNIE) which has been adapted and evaluated very widely (numerous industrial systems, research systems evaluated in MUC, TREC, ACE, DUC, Pascal, NTCIR, etc.). ANNIE is often used to create RDF or OWL (metadata) for unstructured content (semantic annotation). GATE has been compared to NLTK, R and RapidMiner[5]. As well as being widely used in its own right, it forms the basis of the KIM semantic platform[6].

The GATE community and its research have been involved in several European research projects including TAO, SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and KnowledgeWeb, as well as many other projects.

As of December 4, 2009, 691 people were on the gate-users mailing list at SourceForge.net, and 98,858 downloads from SourceForge had been recorded since the project moved to SourceForge in 2005[7]. The paper "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications"[8] has received over 800 citations in the seven years since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide[9], include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady[10], and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock[11].


Features

GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System) which is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part of speech tagger, a named entities transducer and a coreference tagger. ANNIE can be used as-is to provide basic information extraction functionality, or provide a starting point for more specific tasks.
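
The idea of a pipeline of modules that each add annotations to a shared document can be sketched in plain Java. This is not ANNIE and does not use the GATE API: `MiniPipeline`, `Stage`, `Doc` and the annotation type names are assumptions made for this illustration, the tokenizer and sentence splitter are deliberately naive, and the part-of-speech tagger, named entity transducer and coreference tagger are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniPipeline {

    /** A typed span over character offsets [start, end) of the document text. */
    static final class Ann {
        final String type;
        final int start;
        final int end;

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** A document: fixed text plus the annotations added by each stage. */
    static final class Doc {
        final String text;
        final List<Ann> anns = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }

        /** Returns the text covered by every annotation of the given type, in insertion order. */
        List<String> covered(String type) {
            List<String> out = new ArrayList<>();
            for (Ann a : anns) {
                if (a.type.equals(type)) out.add(text.substring(a.start, a.end));
            }
            return out;
        }
    }

    /** One processing stage; stages run in order and may read earlier annotations. */
    interface Stage {
        void run(Doc doc);
    }

    /** Tokenizer: every run of letters or digits becomes a Token. */
    static final Stage TOKENIZER = doc -> {
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(doc.text);
        while (m.find()) doc.anns.add(new Ann("Token", m.start(), m.end()));
    };

    /** Gazetteer: Tokens whose text is in a fixed word list also get a Lookup annotation. */
    static Stage gazetteer(Set<String> entries) {
        return doc -> {
            List<Ann> found = new ArrayList<>();
            for (Ann a : doc.anns) {
                if (a.type.equals("Token")
                        && entries.contains(doc.text.substring(a.start, a.end))) {
                    found.add(new Ann("Lookup", a.start, a.end));
                }
            }
            doc.anns.addAll(found);
        };
    }

    /** Sentence splitter: naive split after '.', '!' or '?', trimming leading whitespace. */
    static final Stage SPLITTER = doc -> {
        Matcher m = Pattern.compile("[^.!?]+[.!?]?").matcher(doc.text);
        while (m.find()) {
            int s = m.start();
            while (s < m.end() && Character.isWhitespace(doc.text.charAt(s))) s++;
            if (s < m.end()) doc.anns.add(new Ann("Sentence", s, m.end()));
        }
    };

    /** Runs the stages in order over a new document and returns it. */
    static Doc run(String text, List<Stage> stages) {
        Doc doc = new Doc(text);
        for (Stage stage : stages) stage.run(doc);
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("GATE was built in Sheffield. It is written in Java.",
            List.of(TOKENIZER, gazetteer(Set.of("Sheffield", "Java")), SPLITTER));
        System.out.println(doc.covered("Lookup"));
        System.out.println(doc.covered("Sentence"));
    }
}
```

The order of the stages matters: the gazetteer in this sketch reads the Token annotations, so it has to run after the tokenizer.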

Languages currently handled in GATE include English, Spanish, Chinese, Arabic, French, German, Hindi, Italian, Cebuano, Romanian and Russian.

Plugins are included for machine learning (Weka, RASP, MAXENT and SVM Light, as well as a fast LibSVM integration and an in-house perceptron implementation), for managing ontologies such as WordNet, for querying search engines such as Google or Yahoo, for part-of-speech tagging with Brill or TreeTagger, and many more.

GATE can handle input in various document formats, such as TXT, HTML, XML, Doc and PDF. It can also work with Java serialisation, Lucene, and PostgreSQL or Oracle databases, the latter through RDBMS storage over JDBC.

GATE also provides JAPE (Java Annotation Patterns Engine), a language for writing rules that annotate documents. JAPE provides finite state transduction over annotations based on regular expressions, and is a version of CPSL (Common Pattern Specification Language). JAPE transducers are used within GATE to manipulate annotations on text. Documentation is provided in the GATE User Guide[12]. A tutorial has also been written by Press Association Images[13].
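
The notion of regular expressions over annotations, rather than over characters, can be sketched in plain Java. This is neither JAPE syntax nor GATE's implementation: as an assumption for illustration, each input annotation is encoded as a single character so that `java.util.regex` can match over the sequence of annotation types, and the class name, type names and codes below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotationPattern {

    /** A typed span over character offsets [start, end) of the text. */
    static final class Ann {
        final String type;
        final int start;
        final int end;

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical one-character codes for the annotation types used in this sketch.
    static final Map<String, Character> CODES =
        Map.of("Title", 'T', "UpperToken", 'U', "Token", 't');

    /**
     * Matches a regular expression against the sequence of annotation types and
     * emits one new annotation per match, spanning the matched input annotations.
     * A match at positions [i, j) of the encoded string covers input.get(i..j-1).
     */
    static List<Ann> apply(List<Ann> input, String regex, String outputType) {
        StringBuilder encoded = new StringBuilder();
        for (Ann a : input) {
            char code = CODES.getOrDefault(a.type, '?'); // unknown types never match T, U or t
            encoded.append(code);
        }
        List<Ann> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(encoded);
        while (m.find()) {
            if (m.start() == m.end()) continue; // ignore empty matches
            Ann first = input.get(m.start());
            Ann last = input.get(m.end() - 1);
            out.add(new Ann(outputType, first.start, last.end));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Yesterday Dr Hamish Cunningham spoke";
        // Input annotations in text order, as an earlier stage might have produced them.
        List<Ann> input = List.of(
            new Ann("UpperToken", 0, 9),
            new Ann("Title", 10, 12),
            new Ann("UpperToken", 13, 19),
            new Ann("UpperToken", 20, 30),
            new Ann("Token", 31, 36));
        // Rule: a Title followed by one or more capitalised tokens is a Person.
        for (Ann a : apply(input, "TU+", "Person")) {
            System.out.println(a.type + ": " + text.substring(a.start, a.end));
        }
    }
}
```

The rule consumes existing annotations and produces new ones without modifying the text, which is the sense in which such a component acts as a transducer over annotations.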

GATE Developer

GATE Developer is the GATE graphical user interface. It is analogous to systems like Mathematica for mathematicians, or Eclipse for Java programmers[14], providing a convenient graphical environment for research and development of language processing software. As well as being a powerful research tool in its own right, it is also useful in conjunction with GATE Embedded (the GATE API by which GATE functionality can be included in applications); for example, GATE Developer can be used to create applications that can then be embedded via the API.

GATE 5 main window.

The GATE Developer GUI consists of a top menu and row of icons, a left vertical resources tree, a central-right tabbed pane of the resource viewers and a message field at the bottom.

The resources tree and the menu are used to load, save and run resources. The resources tree displays the loaded resources; double-clicking a resource or pressing the Enter key opens it in a resource viewer.

Each loaded resource can be displayed in a specific resource viewer that takes up most of the space in the GUI.

The screenshot shows the document viewer, which is used to display a document and its annotations. The <A> hyperlink annotations from an HTML file are highlighted in pink. The list on the right is the annotation sets list and the table at the bottom is the annotation list. In the center is the annotation editor window.

GATE Teamware

Teamware is a web-based management platform for collaborative annotation and curation. GATE Teamware delivers a multi-function user interface over the Internet for viewing, adding and editing text annotations. The web-based management interface allows for project set-up, tracking, and management:

  • Loading document collections (a "corpus" or "corpora")
  • Creating re-usable project templates
  • Initiating projects based on templates
  • Assigning project roles to specific users
  • Monitoring progress and various project statistics in real time
  • Reporting of project status, annotator activity and statistics
  • Applying GATE-based processing routines (automatic annotations or post-annotation processing)


References

  1. GATE Family page on the GATE website
  2. GATE Wiki
  3. "Adapting SVM for Data Sparseness and Imbalance: A Case Study on Information Extraction", by Y. Li, K. Bontcheva and H. Cunningham (Natural Language Engineering, 2009)
  4. "Combining Biological Databases and Text Mining to Support New Bioinformatics Applications", by René Witte and Christopher J.O. Baker (Lecture Notes in Computer Science, Springer Berlin, Volume 3513, 2005)
  5. "Open Source Text Analytics", web article by Seth Grimes
  6. "KIM – a semantic platform for information extraction and retrieval", by Popov et al. (Natural Language Engineering, 2004, 10:375-392)
  7. GATE project page on SourceForge
  8. "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications", by H. Cunningham, D. Maynard, K. Bontcheva and V. Tablan (Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002)
  9. GATE User Guide
  10. "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady
  11. "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock
  12. JAPE chapter in the GATE User Guide
  13. A JAPE tutorial from Press Association Images, UK
  14. GATE Developer chapter in the GATE User Guide
