YAKE

Problem Addressed

Keywords are useful for various tasks, such as information retrieval, document summarization, text categorization, and sentiment analysis. However, keyword extraction is challenging, especially for single documents, because it requires capturing the specificity and importance of terms within a document without relying on external resources or prior knowledge.

Most existing keyword extraction methods are either supervised or unsupervised. Supervised methods require large amounts of labelled data, which are not always available or suitable for different domains and languages. Unsupervised methods often rely on global features, such as document frequency or inverse document frequency, which are not effective for single documents.

Technology

Our solution is YAKE, an unsupervised algorithm for keyword extraction from single documents based on multiple local features, such as term frequency, position, and relatedness. It uses a language-independent scoring function that assigns a relevance score to each candidate keyword.
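
To illustrate what scoring with local features can look like, the following is a minimal sketch that ranks words using only information found in the document itself: how often a word occurs and how early it first appears. It is a simplified stand-in and not YAKE's actual scoring function. The function name `toy_keyword_scores`, the small stopword list, and the formula are all assumptions made for illustration, and in this sketch a lower score means a more relevant word.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption made for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "from", "in", "is", "on", "to", "with"}


def toy_keyword_scores(text, top=5):
    """Rank single words using two local features: frequency and first position.

    Hypothetical simplification, NOT YAKE's scoring function.
    Lower score = more relevant in this sketch.
    """
    # Naive sentence split on ., ! and ? (empty fragments are dropped).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    freq = Counter()       # how many times each word occurs
    first_sentence = {}    # index of the sentence where each word first occurs

    for idx, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word in STOPWORDS or len(word) < 3:
                continue
            freq[word] += 1
            first_sentence.setdefault(word, idx)

    scores = {}
    for word, count in freq.items():
        # Earlier first occurrence gives a smaller position term.
        position = math.log(2 + first_sentence[word])
        # Frequent words that appear early get the lowest scores.
        scores[word] = position / count

    # Sort by score, breaking ties alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))[:top]


if __name__ == "__main__":
    sample = (
        "Keyword extraction finds important terms. "
        "Extraction of a keyword works on single documents. "
        "Documents vary."
    )
    for word, score in toy_keyword_scores(sample, top=3):
        print(f"{word}: {score:.3f}")
```

Because the sketch needs no training data, external corpus, or language-specific tooling beyond a stopword list, it reflects the general idea of an unsupervised, single-document approach, although the real algorithm combines more features than the two shown here.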

Experiments on several datasets and languages showed that YAKE outperforms state-of-the-art methods in terms of precision, recall, and F1-score, and that it extracts keywords from different types of documents, such as news articles, scientific papers, and political party programmes.

Advantages

Multilingual - Relies on statistical features only, with no language-specific tools;

Adaptable - Unsupervised, requiring no labelled data, and based on multiple local features that capture term importance;

Fast and simple - Effective scoring function.

Possible Applications

- Search engine optimization by easing the identification of relevant keywords;

- Data annotation and summarization by quickly extracting keywords from a single document;

- Automatic indexing in libraries, archives, and museums;

- Knowledge extraction and enrichment of graphical representations.

Commercial Rights
INESC TEC has exclusive rights
Development Stage
Mature Technology (TRL 7-9)
Further Information

Intellectual Property Status
Full copyrights

Opportunity
- Licensing (AGPLv3 or commercial license)
- Contract Research

Publications
YAKE! Keyword extraction from single documents using multiple local features
A Text Feature Based Automatic Keyword Extraction Method for Single Documents
YAKE! Collection-independent Automatic Keyword Extractor

Demo/Video
YAKE Demo

Git/Repository
https://github.com/LIAAD/yake

Awards & News
ECIR'18 Best Short Paper

Industrial Categories
Digital
Tags
Natural Language Processing (NLP), Keyword extraction, Language-Independent, Unsupervised Method

Daniel Marques Vasconcelos