2016
Authors
Pasquali, A; Canavarro, M; Campos, R; Jorge, AM;
Publication
Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, C3S2E '16, Porto, Portugal, July 20-22, 2016
Abstract
Automatic topic detection in document collections is an important tool for various tasks. In particular, it is valuable for studying and understanding socio-political phenomena. A currently relevant example is the automatic analysis of the streams of posts that different activist groups publish on the web during the current Brazilian turmoil. It is useful to determine the relative importance of the different topics identified. Proposals for measuring topic relevance can be found in the literature. In this paper, we adopt two such measures and apply them to data sets extracted from Facebook pages related to Brazilian political activism. On top of the analysis, we then carry out an experimental evaluation of the human interpretability of these two measures by comparing their outcomes with the opinion of three Brazilian professionals from the field of Communication Science and media-activists. Copyright 2016 ACM.
2017
Authors
Pereira, J; Pasquali, A; Saleiro, P; Rossetti, R;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)
Abstract
In recent years, researchers in the field of intelligent transportation systems have made several efforts to extract valuable information from social media streams. However, collecting domain-specific data from social media is a challenging task that demands appropriate and robust classification methods. In this work we focus on exploring geolocated tweets in order to create a travel-related tweet classifier using a combination of bag-of-words and word embeddings. The resulting classification makes it possible to identify interesting spatio-temporal relations in São Paulo and Rio de Janeiro.
2018
Authors
Campos, R; Mangaravite, V; Pasquali, A; Jorge, AM; Nunes, C; Jatowt, A;
Publication
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)
Abstract
In this work, we propose a lightweight approach for keyword extraction and ranking based on an unsupervised methodology to select the most important keywords of a single document. To understand the merits of our proposal, we compare it against the RAKE, TextRank and SingleRank methods (three well-known unsupervised approaches) and the TF.IDF baseline, over four different collections to illustrate the generality of our approach. The experimental results suggest that extracting keywords from documents using our method results in superior effectiveness when compared to similar approaches.
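As a rough illustration of the TF.IDF baseline named in the abstract above, the following minimal sketch ranks the terms of one document by term frequency times inverse document frequency. It is not the authors' method or their experimental setup: the tokenizer, the function name `tfidf_keywords`, the unsmoothed `log(N / df)` weighting, and the three toy documents are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by TF x IDF (illustrative only)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Relative term frequency inside the target document.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy collection, invented for this example.
docs = [
    "keyword extraction selects keyword candidates from a document",
    "the traffic in the city is heavy",
    "the classifier labels a tweet",
]
top = tfidf_keywords(docs, 0)
print(top)
```

Unlike a single-document method, this baseline needs a reference collection to estimate document frequencies, which is why the sketch takes the whole list of documents as input.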
2018
Authors
Campos, R; Mangaravite, V; Pasquali, A; Jorge, AM; Nunes, C; Jatowt, A;
Publication
ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018)
Abstract
In this paper, we present YAKE!, a novel feature-based system for multi-lingual keyword extraction from single documents, which supports texts of different sizes, domains or languages. Unlike most systems, YAKE! does not rely on dictionaries or thesauri, nor is it trained on any corpus. Instead, we follow an unsupervised approach which builds upon features extracted from the text, thus making it applicable to documents written in many different languages without the need for external knowledge. This can be beneficial for a large number of tasks and a plethora of situations where access to training corpora is either limited or restricted. In this demo, we offer an easy-to-use, interactive session, where users from both academia and industry can try our system, either by using a sample document or by introducing their own text. As an add-on, we compare our extracted keywords against the output produced by the IBM Natural Language Understanding (IBM NLU) and RAKE systems. The YAKE! demo is available at http://bit.ly/YakeDemoECIR2018. A Python implementation of YAKE! is also available on the PyPI repository (https://pypi.python.org/pypi/yake/).