Publications - INESC TEC

Publications

Publications by Pavel Brazdil

2022

Metalearning

Authors
Brazdil, P; van Rijn, JN; Soares, C; Vanschoren, J;

Publication
Cognitive Technologies

Abstract

2022

NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

Authors
Muhammad, SH; Adelani, DI; Ruder, S; Ahmad, IS; Abdulmumin, I; Bello, BS; Choudhury, M; Emezue, CC; Abdullahi, SS; Aremu, A; Jorge, A; Brazdil, P;

Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian-Pidgin, and Yoruba), consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing, and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.

2022

On Usefulness of Outlier Elimination in Classification Tasks

Authors
Hetlerovic, D; Popelinsky, L; Brazdil, P; Soares, C; Freitas, F;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XX, IDA 2022

Abstract
Although outlier detection/elimination has been studied before, few comprehensive studies exist on when exactly this technique is useful as a preprocessing step in classification tasks. The objective of our study is to fill this gap. We have performed experiments with 12 different outlier elimination methods (OEMs) and 10 classification algorithms on 50 different datasets. The results were then processed by the proposed reduction method, whose aim is to identify the most useful workflows for a given set of tasks (datasets). The reduction method identified just three OEMs that are generally useful for the given set of tasks. We have shown that the inclusion of these OEMs is indeed useful, as it leads to lower loss in accuracy and the difference is quite significant (0.5%) on average.

1991

Learning in multi-agent environments

Authors
Brazdil, P;

Publication
Algorithmic Learning Theory, 2nd International Workshop, ALT '91, Tokyo, Japan, October 23-25, 1991, Proceedings

Abstract

2001

Improving the Robustness and Encoding Complexity of Behavioural Clones

Authors
Camacho, R; Brazdil, P;

Publication
Machine Learning: ECML 2001, 12th European Conference on Machine Learning, Freiburg, Germany, September 5-7, 2001, Proceedings

Abstract
The aim of behavioural cloning is to synthesize artificial controllers that are robust and comprehensible to humans. To attain these two objectives, we propose the use of the Incremental Correction model, which is based on a closed-loop control strategy, to model the reactive aspects of human control skills. We have investigated the use of three different representations to encode the artificial controllers: univariate decision trees as induced by C4.5; multivariate decision and regression trees as induced by CART; and clausal theories induced by an Inductive Logic Programming (ILP) system. We obtained an increase in robustness and a lower complexity of the controllers when compared with results using other models. The controllers synthesized by CART proved to be the most robust. The ILP system produced the simplest encodings. © Springer-Verlag Berlin Heidelberg 2001.

1991

Panel: Learning in Distributed Systems and Multi-Agent Environments

Authors
Brazdil, P; Gams, M; Sian, SS; Torgo, L; de Velde, WV;

Publication
Machine Learning - EWSL-91, European Working Session on Learning, Porto, Portugal, March 6-8, 1991, Proceedings

Abstract
The paper begins with a discussion of why we should be concerned with machine learning in the context of distributed AI. The rest of the paper is dedicated to various problems of multi-agent learning. First, a common framework for comparing different existing systems is presented. It is pointed out that it is useful to distinguish when the individual agents communicate: some systems communicate during the learning phase, for example, and others during the problem-solving phase. It is also important to consider how, that is, in what language, the communication is established. The paper analyses several systems in this framework. Particular attention is paid to previous work done by the authors in this area. The paper covers the use of redundant knowledge, knowledge integration, evaluation of hypotheses by a community of agents, and resolution of language differences between agents. © Springer-Verlag Berlin Heidelberg 1991.
