Publications - INESC TEC

Publications by Pavel Brazdil

2019

Simplifying the Algorithm Selection Using Reduction of Rankings of Classification Algorithms

Authors
Abdulrahman, SM; Brazdil, P; Zainon, WMNW; Adamu, A

Publication
2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND COMPUTER APPLICATIONS (ICSCA 2019)

Abstract
The average ranking method (AR) is one of the simplest and most effective algorithm selection methods. This method uses metadata in the form of test results of a given set of algorithms on a given set of datasets and calculates an average rank for each algorithm. The ranks are used to construct the average ranking. In this paper, we investigate how the rankings can be reduced by removing non-competitive and redundant algorithms, thereby reducing the number of tests a user needs to conduct on a new dataset to identify the most suitable algorithm. The proposed method involves two phases. In the first, the aim is to identify the most competitive algorithms for each dataset used in the past. This is done with recourse to a statistical test. The second phase involves a covering method whose aim is to reduce the set of algorithms by eliminating redundant variants. The proposed method differs from an earlier proposal in various aspects; an important one is that it takes both accuracy and time into consideration. The proposed method was compared to the baseline strategy, which consists of executing all algorithms in the ranking. It is shown that the proposed method leads to much better performance than the baseline.
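The average ranking step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical results table mapping each past dataset to the accuracy of each algorithm, ranks the algorithms on every dataset (rank 1 = best, tied algorithms share the mean rank) and then averages the ranks across datasets. It uses accuracy only; the time-aware criterion and the two-phase reduction (statistical test plus covering method) from the paper are not reproduced here.

```python
from statistics import mean

def rank_row(scores):
    """Rank algorithms on one dataset: rank 1 = highest score; tied scores share the mean rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # extend j over every algorithm tied with ordered[i]
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = tied_rank
        i = j + 1
    return ranks

def average_ranking(results):
    """results: {dataset: {algorithm: accuracy}} -> [(algorithm, mean rank)], best first."""
    per_dataset = [rank_row(scores) for scores in results.values()]
    algorithms = list(per_dataset[0])
    avg = {a: mean([r[a] for r in per_dataset]) for a in algorithms}
    return sorted(avg.items(), key=lambda kv: kv[1])

# Hypothetical metadata: accuracy of three algorithms on three past datasets.
results = {
    "d1": {"A": 0.90, "B": 0.80, "C": 0.70},
    "d2": {"A": 0.85, "B": 0.85, "C": 0.60},
    "d3": {"A": 0.70, "B": 0.75, "C": 0.72},
}
ranking = average_ranking(results)
```

Here B has mean rank 1.5, A about 1.83 and C about 2.67, so a user following the average ranking on a new dataset would test B first, then A, then C. Reducing such a ranking shortens this test sequence.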

2021

Towards a Human-AI Hybrid Framework for Inter-Researcher Similarity Detection

Authors
Guimaraes, D; Paulino, D; Correia, A; Trigo, L; Brazdil, P; Paredes, H

Publication
PROCEEDINGS OF THE 2021 IEEE INTERNATIONAL CONFERENCE ON HUMAN-MACHINE SYSTEMS (ICHMS)

Abstract
Understanding the intellectual landscape of scientific communities and their collaborations has become an indispensable part of research per se. In this regard, measuring similarities among scientific documents can help researchers identify groups with similar interests as a basis for strengthening collaboration and university-industry linkages. To this end, we intend to evaluate the performance of hybrid crowd-computing methods in measuring the similarity between document pairs by comparing the results achieved by crowds and artificial intelligence (AI) algorithms. In this paper, we designed two types of experiments to illustrate some issues in calculating how similar an automatic solution is to a given ground truth. In the first type of experiment, we created a crowdsourcing campaign consisting of four human intelligence tasks (HITs) in which the participants had to indicate whether or not a set of papers belonged to the same author. The second type involves a set of natural language processing (NLP) processes in which we used the TF-IDF measure and the Bidirectional Encoder Representations from Transformers (BERT) model. The results of the two types of experiments provide preliminary insight into the main contributions of human-AI cooperation to similarity calculation, with a view to better decision support. We believe that decision makers can thus be better informed about potential collaborators based on content-based insights enhanced by hybrid human-AI mechanisms.

2021

Text documents streams with improved incremental similarity

Authors
Sarmento, RP; Cardoso, DO; Dearo, K; Brazdil, P; Gama, J

Publication
SOCIAL NETWORK ANALYSIS AND MINING

Abstract
There has been a significant effort by the research community to address the problem of organizing documents with the help of Information Retrieval methods. In this paper, we present several experiments with stream analysis methods to explore streams of text documents. We also present possible architectures for Text Document Stream Organization using incremental algorithms such as Incremental Sparse TF-IDF and Incremental Similarity. Our results show that this architecture achieves significant improvements in the efficiency of grouping similar documents. These improvements are important because the analysis of large amounts of text is well known to be a high-dimensional and complex subject of study in the data analysis area.
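The idea of keeping TF-IDF statistics up to date as documents arrive in a stream can be sketched in Python as follows. This is a hypothetical illustration, not the paper's Incremental Sparse TF-IDF or Incremental Similarity algorithms: the class name `IncrementalTfidf`, whitespace tokenization and the smoothed IDF formula are assumptions. Document frequencies are updated once per incoming document, vectors are stored sparsely as dictionaries, and cosine similarity is recomputed on demand from the current statistics.

```python
import math
from collections import Counter

class IncrementalTfidf:
    """Hypothetical sparse TF-IDF whose corpus statistics are updated one document at a time."""

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()  # term -> number of documents containing it
        self.tf = {}         # doc_id -> Counter of raw term counts

    def add(self, doc_id, text):
        """Consume one document from the stream and update N and the document frequencies."""
        counts = Counter(text.lower().split())
        self.tf[doc_id] = counts
        self.n_docs += 1
        self.df.update(counts.keys())  # each distinct term counted once per document

    def vector(self, doc_id):
        """Sparse TF-IDF vector under the current statistics; the smoothed IDF keeps weights > 0."""
        return {
            t: c * (math.log((1 + self.n_docs) / (1 + self.df[t])) + 1)
            for t, c in self.tf[doc_id].items()
        }

    def similarity(self, a, b):
        """Cosine similarity between two stored documents."""
        u, v = self.vector(a), self.vector(b)
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

stream = IncrementalTfidf()
for doc_id, text in [("a", "stream of text documents"),
                     ("b", "incremental text documents"),
                     ("c", "football match report")]:
    stream.add(doc_id, text)
```

Adding a document only touches the counts of its own terms, so earlier vectors never need to be rebuilt eagerly; their weights simply reflect the updated statistics the next time they are requested.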

2022

Detection of Loanwords in Angolan Portuguese: A Text Mining Approach

Authors
Muhongo, TS; Brazdil, PB; Silva, F

Publication
INTELIGENCIA ARTIFICIAL-IBEROAMERICAN JOURNAL OF ARTIFICIAL INTELLIGENCE

Abstract
Angola is characterized by many different languages and social, cultural and political realities, which have had a marked effect on Angolan Portuguese (AP). Consequently, AP is characterized by diatopic variation. One of the marked effects is the presence of loanwords imported from other Angolan languages. Our objective is to analyze different Angolan texts, examine the lexical forms used and conduct a comparative study with European Portuguese, aiming at identifying possible loanwords in Angolan Portuguese. This process was automated, as was the identification of the cotexts of all loanwords. In addition, we determine the lexical class of each loanword and the Angolan language of its origin. Most lexical loanwords come from Kimbundu, although AP also includes loanwords from some other Angolan languages. Our study serves as a basis for preparing a dictionary of Angolan regionalisms. We noticed that more than 700 of the identified loanwords do not appear in existing dictionaries.
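The comparative step, flagging word forms from Angolan texts that are absent from a European Portuguese lexicon and recording their cotexts, might be sketched in Python as below. The function name, the tiny lexicon, the example sentence and the fixed-size cotext window are all assumptions for illustration; a real pipeline would also need lemmatization and filtering of proper nouns, and this sketch does not determine lexical class or language of origin.

```python
import re

def candidate_loanwords(text, ep_lexicon, window=2):
    """Return {form: [cotext, ...]} for word forms not found in the European Portuguese lexicon."""
    tokens = re.findall(r"\w+", text.lower())  # \w matches accented letters in Python 3 str patterns
    candidates = {}
    for i, tok in enumerate(tokens):
        if tok.isalpha() and tok not in ep_lexicon:
            # cotext: up to `window` tokens on each side of the candidate form
            cotext = " ".join(tokens[max(0, i - window): i + window + 1])
            candidates.setdefault(tok, []).append(cotext)
    return candidates

# Hypothetical mini-lexicon and sentence
ep_lexicon = {"a", "criança", "comeu", "e", "bebeu", "água"}
found = candidate_loanwords("A criança comeu funje e bebeu água.", ep_lexicon)
```

With this toy lexicon, only "funje" is flagged, together with the cotext "criança comeu funje e bebeu".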

2021

Exploiting Performance-based Similarity between Datasets in Metalearning

Authors
Leite, R; Brazdil, P

Publication
AAAI Workshop on Meta-Learning and MetaDL Challenge, MetaDL@AAAI 2021, virtual, February 9, 2021.

2022

Semi-Automatic Approaches for Exploiting Shifter Patterns in Domain-Specific Sentiment Analysis

Authors
Brazdil, P; Muhammad, SH; Oliveira, F; Cordeiro, J; Silva, F; Silvano, P; Leal, A

Publication
MATHEMATICS

Abstract
This paper describes two different approaches to sentiment analysis. The first is a form of symbolic approach that exploits a sentiment lexicon together with a set of shifter patterns and rules. The sentiment lexicon includes single words (unigrams) and is developed automatically by exploiting labeled examples. The shifter patterns cover intensification, attenuation/downtoning and inversion/reversal, and are developed manually. The second approach exploits a deep neural network, which uses a pre-trained language model. Both approaches were applied to texts in the economics and finance domains from European Portuguese newspapers. We show that the symbolic approach achieves virtually the same performance as the deep neural network. In addition, the symbolic approach provides understandable explanations, and the acquired knowledge can be communicated to others. We release the shifter patterns to motivate future research in this direction.
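How a unigram sentiment lexicon might combine with intensification, attenuation and inversion shifters can be sketched in Python as follows. The words, polarity values and shifter weights are hypothetical (and in English for readability, whereas the paper works with European Portuguese), and the rule that a shifter affects only the next sentiment-bearing word is an assumption, not the paper's released pattern set.

```python
# Hypothetical unigram lexicon (polarity in [-1, 1]) and shifter sets
LEXICON = {"growth": 1.0, "profit": 1.0, "crisis": -1.0, "loss": -1.0}
INTENSIFIERS = {"strong": 1.5, "sharp": 1.5}  # amplify the next sentiment word
DOWNTONERS = {"slight": 0.5, "modest": 0.5}   # attenuate the next sentiment word
INVERTERS = {"no", "not", "without"}          # reverse the polarity of the next sentiment word

def score(text):
    """Sum lexicon polarities, each scaled by the shifters seen since the previous sentiment word."""
    total = 0.0
    factor = 1.0
    for tok in text.lower().split():
        if tok in INTENSIFIERS:
            factor *= INTENSIFIERS[tok]
        elif tok in DOWNTONERS:
            factor *= DOWNTONERS[tok]
        elif tok in INVERTERS:
            factor = -factor
        elif tok in LEXICON:
            total += factor * LEXICON[tok]
            factor = 1.0  # shifters apply only to the next sentiment word
    return total
```

For example, "strong growth despite slight loss" scores 1.5 - 0.5 = 1.0. Because every contribution comes from a named lexicon entry and shifter, the score can be traced back word by word, which is the kind of transparency a symbolic approach offers.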
