YAKE

Problem Addressed

Keywords are useful for various tasks, such as information retrieval, document summarization, text categorization, and sentiment analysis. However, keyword extraction is a challenging task, especially for single documents, because it requires capturing the specificity and importance of terms within a document without relying on external resources or prior knowledge. 

Most existing methods for keyword extraction are either supervised, requiring large amounts of labelled data that are not always available or suitable for different domains and languages, or unsupervised, often relying on global features, such as document frequency or inverse document frequency, that are not effective for single documents. 

 

Technology

Our solution is YAKE, an unsupervised algorithm for keyword extraction from single documents based on multiple local features, such as term frequency, position, and relatedness. This software also uses a language-independent scoring function that assigns a relevance score to each candidate keyword. 
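
As a rough illustration of scoring candidates from local features only, the sketch below ranks the terms of a single document by term frequency weighted by where the term first appears. It is not YAKE's actual scoring function, which combines more features (including relatedness); the function name toy_keyword_scores, the weighting formula, and the minimum token length are assumptions made for this example.

```python
import math
import re
from collections import Counter


def toy_keyword_scores(text, top=5, stopwords=frozenset()):
    """Rank single-word candidates using only the document itself.

    Hypothetical simplification: score = term frequency * position weight,
    where terms first seen in earlier sentences get a larger weight.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    freq = Counter()
    first_sentence = {}
    for idx, sentence in enumerate(sentences):
        for token in re.findall(r"\w+", sentence.lower()):
            # Skip caller-supplied stopwords and very short tokens (assumed filter).
            if token in stopwords or len(token) < 3:
                continue
            freq[token] += 1
            first_sentence.setdefault(token, idx)

    scores = {}
    for term, count in freq.items():
        # The weight decreases as the first occurrence moves later in the text.
        position_weight = 1.0 / math.log(2 + first_sentence[term])
        scores[term] = count * position_weight

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    sample = (
        "Keyword extraction finds keyword candidates. "
        "Other words appear later. Keyword scoring is local."
    )
    for term, score in toy_keyword_scores(sample):
        print(f"{term}\t{score:.3f}")
```

No corpus statistics, labelled data, or language-specific tools are involved here: everything is computed from the one input document, which is the property the description above emphasises.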

Experiments on several datasets and languages showed that YAKE! outperforms state-of-the-art methods in terms of precision, recall, and F1-score, and that it extracts keywords from different types of documents, such as news articles, scientific papers, and political party programmes.

Advantages

Multilingual - Relies on statistical features only, with no language-specific tools;

Adaptable - Unsupervised, requiring no labelled data, and based on multiple local features (term importance);

Fast and simple - Effective scoring function.

Possible Applications 

- Search engine optimization by easing the identification of relevant keywords; 

- Data annotation and summarization by quickly extracting keywords from a single document; 

- Automatic indexation in libraries, archives, and museums;

- Knowledge extraction and enrichment of graphical representations. 

  • Industrial Categories

    Digital
  • Tags

    Natural Language Processing (NLP), Keyword extraction, Language-Independent, Unsupervised Method