Problem Addressed
Keywords are useful for various tasks, such as information retrieval, document summarization, text categorization, and sentiment analysis. However, keyword extraction is challenging, especially for single documents, because it requires capturing the specificity and importance of terms within a document without relying on external resources or prior knowledge.
Most existing keyword extraction methods are either supervised, requiring large amounts of labelled data that are not always available or suitable across domains and languages, or unsupervised, often relying on global (corpus-level) features such as document frequency or inverse document frequency, which are not effective for single documents.
Technology
Our solution is YAKE!, an unsupervised algorithm for keyword extraction from single documents based on multiple local features, such as term frequency, position, and relatedness. The software uses a language-independent scoring function that assigns a relevance score to each candidate keyword.
Experiments on several datasets and languages showed that YAKE! outperforms state-of-the-art methods in terms of precision, recall, and F1-score, and that it extracts keywords from different types of documents, such as news articles, scientific papers, and political party programmes.
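To make the idea of scoring candidates from local features more concrete, the sketch below ranks single-word candidates using only information found inside one document: how often a term occurs and how early it first appears. It is a simplified illustration, not the YAKE! scoring function; the function name `toy_keyword_scores`, the stopword set, and the way the two features are combined are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def toy_keyword_scores(text, stopwords=frozenset()):
    """Rank single-word candidates using only local (in-document) features.

    Hypothetical illustration: combines relative term frequency with the
    index of the sentence in which the term first appears. Higher is better
    in this toy score.
    """
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    term_frequency = Counter()
    first_sentence = {}
    for index, sentence in enumerate(sentences):
        for token in re.findall(r"\w+", sentence.lower()):
            if token in stopwords:
                continue
            term_frequency[token] += 1
            # Remember only the first sentence in which the term occurs.
            first_sentence.setdefault(token, index)

    if not term_frequency:
        return []

    max_tf = max(term_frequency.values())
    scores = {}
    for term, count in term_frequency.items():
        frequency = count / max_tf  # relative frequency, in (0, 1]
        # Terms that first occur earlier get a larger position weight.
        position = 1.0 / math.log(3 + first_sentence[term])
        scores[term] = frequency * position

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


example = (
    "Keyword extraction finds keywords. "
    "Extraction works on a single document. "
    "The document is short."
)
for term, score in toy_keyword_scores(example, stopwords={"a", "on", "the", "is"}):
    print(f"{term}: {score:.3f}")
```

No corpus statistics, labelled data, or language-specific tooling are needed for this kind of scoring, apart from the optional stopword set, which is why the same approach can be applied to one document at a time.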
Advantages
- Multilingual: relies on statistical features only, with no language-specific tools;
- Adaptable: unsupervised, requiring no labelled data, and based on multiple local features of term importance;
- Fast and simple: effective scoring function.
Possible Applications
- Search engine optimization by easing the identification of relevant keywords;
- Data annotation and summarization by quickly extracting keywords from a single document;
- Automatic indexation in libraries, archives, and museums;
- Knowledge extraction and enrichment of graphical representations.
Commercial Rights
INESC TEC has exclusive rights
Development Stage
Mature Technology (TRL 7-9)
Further Information
Intellectual Property Status
Full copyrights
Opportunity
- Licensing (AGPLv3 or commercial license)
- Contract Research
Demo/Video
Scientific Publications
Information Sciences, 509 (2020), 257-289
Git/Repository
Awards & News
-
Industrial Categories
Digital
Tags
Natural Language Processing (NLP), Keyword extraction, Language-Independent, Unsupervised Method