Publications

Publications by Alípio Jorge

2018

A Text Feature Based Automatic Keyword Extraction Method for Single Documents

Authors
Campos, R; Mangaravite, V; Pasquali, A; Jorge, AM; Nunes, C; Jatowt, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)

Abstract
In this work, we propose a lightweight approach for keyword extraction and ranking based on an unsupervised methodology to select the most important keywords of a single document. To understand the merits of our proposal, we compare it against the RAKE, TextRank and SingleRank methods (three well-known unsupervised approaches) and the TF.IDF baseline, over four different collections to illustrate the generality of our approach. The experimental results suggest that extracting keywords from documents using our method yields superior effectiveness when compared to similar approaches.
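As a rough illustration of the TF.IDF baseline named in this abstract, the sketch below ranks the terms of one document against a small background collection. It is not the authors' method or their experimental setup: the function names (`tokenize`, `tfidf_keywords`), the regex tokenizer, and the smoothed IDF formula are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc, collection, top=3):
    """Rank the terms of `doc` by TF.IDF against a background `collection`."""
    docs_tokens = [set(tokenize(d)) for d in collection]
    n_docs = len(docs_tokens)
    tf = Counter(tokenize(doc))
    scores = {}
    for term, freq in tf.items():
        # Document frequency: number of background documents containing the term.
        df = sum(1 for tokens in docs_tokens if term in tokens)
        # Smoothed IDF (an assumption) so unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = freq * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


collection = ["the cat sat", "the dog ran", "the bird flew"]
keywords = tfidf_keywords("the cat chased the cat", collection)
print(keywords)
```

Note that this baseline needs a background collection to compute document frequencies, which is the dependency that single-document, collection-independent extractors aim to avoid.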


2018

YAKE! Collection-Independent Automatic Keyword Extractor

Authors
Campos, R; Mangaravite, V; Pasquali, A; Jorge, AM; Nunes, C; Jatowt, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)

Abstract
In this paper, we present YAKE!, a novel feature-based system for multi-lingual keyword extraction from single documents, which supports texts of different sizes, domains or languages. Unlike most systems, YAKE! does not rely on dictionaries or thesauri, nor is it trained on any corpus. Instead, we follow an unsupervised approach that builds upon features extracted from the text, making it applicable to documents written in many different languages without the need for external knowledge. This can be beneficial for a large number of tasks and situations where access to training corpora is either limited or restricted. In this demo, we offer an easy-to-use, interactive session, where users from both academia and industry can try our system, either by using a sample document or by introducing their own text. As an add-on, we compare our extracted keywords against the output produced by the IBM Natural Language Understanding (IBM NLU) and RAKE systems. The YAKE! demo is available at http://bit.ly/YakeDemoECIR2018. A Python implementation of YAKE! is also available in the PyPI repository (https://pypi.python.org/pypi/yake/).


2013

Binary recommender systems: Introduction, an application and outlook

Authors
Jorge, AM;

Publication
ACM International Conference Proceeding Series

Abstract
Recommender systems are a hot application area these days, made popular by well-known web sites. The problem of predicting user preferences is very demanding from the data mining algorithm design point of view, but it also poses challenges to evaluation and monitoring. Moreover, there is a lot of information that can be exploited, from clickstreams and background information to musical content and social interaction. As data grows and recommendation requests must be answered in a split second, online and agile solutions must be implemented. In this talk we will give a brief introduction to binary recommender systems, describe a particular hybrid application to music recommendation (from algorithm to online evaluation), and refer to context-aware and online recommender algorithms. © 2013 ACM.


2018

Forgetting techniques for stream-based matrix factorization in recommender systems

Authors
Matuszyk, P; Vinagre, J; Spiliopoulou, M; Jorge, AM; Gama, J;

Publication
KNOWLEDGE AND INFORMATION SYSTEMS

Abstract
Forgetting is often considered a malfunction of intelligent agents; however, in a changing world forgetting has an essential advantage. It provides a means of adapting to changes by removing the effects of obsolete (not necessarily old) information from models. This also applies to intelligent systems, such as recommender systems, which learn users' preferences and predict future items of interest. In this work, we present unsupervised forgetting techniques that make recommender systems adapt to changes in users' preferences over time. We propose eleven techniques that select obsolete information and three algorithms that enforce the forgetting in different ways. In our evaluation on real-world datasets, we show that forgetting obsolete information significantly improves the predictive power of recommender systems.


2014

A data warehouse to support web site automation

Authors
Domingues, MA; Soares, C; Jorge, AM; Rezende, SO;

Publication
Journal of the Brazilian Computer Society

Abstract
Background: Due to the constant demand for new information and timely updates of services and content in order to satisfy the user’s needs, web site automation has emerged as a solution to automate several personalization and management activities of a web site. One goal of automation is the reduction of the editor’s effort and consequently of the costs for the owner. The other goal is that the site can adapt in a more timely way to the behavior of the user, improving the browsing experience and helping the user achieve his/her own goals. Methods: A database to store rich web data is an essential component for web site automation. In this paper, we propose a data warehouse that is developed to be a repository of information to support different web site automation and monitoring activities. We implemented our data warehouse and used it as a repository of information in three different case studies related to the areas of e-commerce, e-learning, and e-news. Results: The case studies showed that our data warehouse is appropriate for web site automation in different contexts. Conclusion: In all cases, the data warehouse was quite simple to use and had a good response time, mainly because of the simplicity of its structure. © 2014, Domingues et al.; licensee Springer.


2013

Comparing relational and non-relational algorithms for clustering propositional data

Authors
Motta, R; Nogueira, BM; Jorge, AM; De Andrade Lopes, A; Rezende, SO; De Oliveira, MCF;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Cluster detection methods are widely studied in Propositional Data Mining. In this context, each data example is represented as a feature vector. This data has a natural non-relational structure, but can be represented in a relational form through similarity-based network models. In these models, examples are represented by vertices and an edge connects two examples with high similarity. This relational representation allows network-based algorithms from Relational Data Mining to be employed. Specifically in clustering tasks, these models allow community detection algorithms for networks to be used to detect data clusters. In this work, we compared traditional non-relational data-based clustering algorithms with cluster detection algorithms based on relational data using measures for community detection in networks. We carried out an exploratory analysis over 23 numerical datasets and 10 textual datasets. Results show that network models can efficiently represent the data topology, allowing their application in cluster detection with higher precision when compared to non-relational methods. Copyright 2013 ACM.
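The similarity-based network model described in this abstract can be sketched as follows: each feature vector becomes a vertex, and an edge links two vertices whose similarity is high. This is only an illustrative sketch, not the paper's procedure. Cosine similarity, the 0.9 threshold, and the function names are assumptions, and connected components stand in here for a real community detection algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_network(vectors, threshold=0.9):
    """Adjacency sets linking examples whose cosine similarity >= threshold."""
    adj = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Group vertices into components (a crude stand-in for community detection)."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = connected_components(similarity_network(vectors))
print(clusters)
```

The choice of threshold controls how dense the network is: a lower value adds edges and tends to merge clusters, while a higher value fragments them.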
