About

I am an Associate Professor at the School of Economics of the University of Porto, where I teach Statistics and Multivariate Data Analysis at undergraduate and postgraduate (Master's, PhD) levels, and a member of the Artificial Intelligence and Decision Support Lab (LIAAD) of INESC-TEC. I hold a doctorate in Applied Mathematics from the University of Paris Dauphine (1991).

My current research focuses on the analysis of multidimensional complex data known as symbolic data (data representing inherent variability, in the form of intervals or distributions), for which I develop statistical approaches and multivariate analysis methodologies. More generally, I am interested in multivariate data analysis, with a particular focus on clustering methods.
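
As a rough illustration of what interval-valued symbolic data can look like, the sketch below aggregates individual records into one closed interval per group. The function name `to_intervals`, the field names, and the sample records are all hypothetical, and taking the minimum and maximum is only one simple way to build such intervals.

```python
from collections import defaultdict


def to_intervals(records, group_key, value_key):
    """Summarize each group's values as a closed interval (min, max)."""
    grouped = defaultdict(list)
    for row in records:
        grouped[row[group_key]].append(row[value_key])
    return {group: (min(values), max(values)) for group, values in grouped.items()}


# Hypothetical microdata: one record per individual.
records = [
    {"region": "North", "income": 1200},
    {"region": "North", "income": 1850},
    {"region": "South", "income": 900},
    {"region": "South", "income": 1400},
    {"region": "South", "income": 1100},
]

# Each region is now described by an interval rather than a single number.
intervals = to_intervals(records, "region", "income")
```

Each group is then described by the range of its values, so the variability inside the group is kept instead of being collapsed into a single mean.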

Details

  • Name

    Paula Brito
  • Role

    Research Coordinator
  • Since

    1st January 2008
Publications

2024

Community detection in interval-weighted networks

Authors
Alves, H; Brito, P; Campos, P;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
In this paper we introduce and develop the concept of interval-weighted networks (IWN), a novel approach in Social Network Analysis, where the edge weights are represented by closed intervals composed with precise information, comprehending intrinsic variability. We extend IWN for both Newman's modularity and modularity gain and the Louvain algorithm, considering a tabular representation of networks by contingency tables. We apply our methodology to two real-world IWN. The first is a commuter network in mainland Portugal, between the twenty-three NUTS 3 Regions (IWCN). The second focuses on annual merchandise trade between 28 European countries, from 2003 to 2015 (IWTN). The optimal partition of geographic locations (regions or countries) is developed and compared using two new different approaches, designated as Classic Louvain and Hybrid Louvain, which allow taking into account the variability observed in the original network, thereby minimizing the loss of information present in the raw data. Our findings suggest the division of the twenty-three Portuguese regions in three main communities for the IWCN and between two and three country communities for the IWTN. However, we find different geographical partitions according to the community detection methodology used. This analysis can be useful in many real-world applications, since it takes into account that the weights may vary within the ranges, rather than being constant.

2024

Anomaly detection-based undersampling for imbalanced classification problems

Authors
Park, YJ; Brito, P; Ma, YC;

Publication
ENGINEERING OPTIMIZATION

Abstract
In various machine learning applications, classification plays an important role in categorizing and predicting data. To improve the classification performance, it is crucial to identify and remove the anomalies. Also, class imbalance in many machine learning applications is a very common problem since most classifiers tend to be biased toward the majority class by ignoring the minority class instances. Thus, in this research, we propose a new under-sampling technique based on anomaly detection and removal to enhance the performance of imbalanced classification problems. To demonstrate the effectiveness of the proposed method, comprehensive experiments are conducted on forty imbalanced data sets and two non-parametric hypothesis tests are employed to show the statistical difference in classification performances between the proposed method and other traditional resampling methods. From the experiment, it is shown that the proposed method improves the classification performance by effectively detecting and eliminating the anomalies among true-majority or pseudo-majority class instances.
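
The general idea of removing anomalous majority-class instances before training can be sketched as follows. This is not the method proposed in the paper: the anomaly score used here (Euclidean distance to the majority-class centroid) is a simple stand-in detector chosen for illustration, and the function name, parameters, and sample data are hypothetical.

```python
import math


def undersample_majority(X, y, majority_label, n_remove):
    """Drop the n_remove majority instances farthest from the majority centroid.

    Distance to the centroid is an assumed, simplistic anomaly score.
    """
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    dim = len(X[0])
    centroid = [sum(X[i][d] for i in maj_idx) / len(maj_idx) for d in range(dim)]

    # Rank majority instances from most to least anomalous.
    ranked = sorted(maj_idx, key=lambda i: math.dist(X[i], centroid), reverse=True)
    drop = set(ranked[:n_remove])

    keep = [i for i in range(len(y)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical data: class 0 is the majority and contains one clear outlier.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [1.0, 1.0], [1.1, 0.9]]
y = [0, 0, 0, 0, 1, 1]

X_res, y_res = undersample_majority(X, y, majority_label=0, n_remove=1)
```

In this toy case the outlying majority point `[5.0, 5.0]` is removed, which both reduces the imbalance and discards an instance that is unrepresentative of its class.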

2024

Immigrant groups in the Luxembourgish labour market: A Symbolic Data Analysis approach

Authors
Silva, CC; Brito, P; Campos, P;

Publication
Statistical Journal of the IAOS

Abstract
Luxembourg, known for its immigration history, attracts immigrants to work. This study analyses different immigrant groups in the labour market from 2014 to 2022 by using Labor Force Survey (LFS) data, Symbolic Data Analysis (SDA), and the Monitoring the Evolution of Clusters (MEC) framework. Based on the birthplace and length of residence in Luxembourg, in each year, microdata were aggregated into 21 symbolic objects. They were primarily described by 16 modal variables which are multi-valued variables with a frequency attached to each category. Moreover, clustering using complete linkage and the Chernoff’s distance was applied. The Heuristic Identification of Noisy Variables (HINoV) suggested that with just six variables, objects may be grouped homogeneously. The MEC framework traced temporal relations and transitions between the clusters, revealing some movements across the different years. Results indicate that people from the European Union (EU) and Neighbouring countries have similar profiles while the Portuguese have opposite characteristics. The Luxembourgers are somewhere in between. Profiling people from non-EU countries was challenging. The data and methodology used make it easy to replicate the work in other nations, enabling comparison of results and monitoring to continue in the future.
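
A modal variable, as described in this abstract, attaches a frequency to each category of a multi-valued variable. The sketch below shows one possible way to aggregate microdata into such a description. The function `to_modal`, the group and category names, and the records are hypothetical and are not taken from the LFS data.

```python
from collections import Counter, defaultdict


def to_modal(records, group_key, category_key):
    """Describe each group by the relative frequency of every observed category."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[group_key]][row[category_key]] += 1
    return {
        group: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for group, counter in counts.items()
    }


# Hypothetical microdata: one record per respondent.
records = [
    {"group": "A", "status": "employed"},
    {"group": "A", "status": "employed"},
    {"group": "A", "status": "employed"},
    {"group": "A", "status": "unemployed"},
    {"group": "B", "status": "employed"},
    {"group": "B", "status": "unemployed"},
]

# Each group becomes one symbolic object with a frequency per category.
modal = to_modal(records, "group", "status")
```

Each group is then a single symbolic object whose `status` variable is a frequency distribution rather than a single category.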

2024

New skills in symbolic data analysis for official statistics

Authors
Verde, R; Batagelj, V; Brito, P; Silva, APD; Korenjak-Cerne, S; Dobša, J; Diday, E;

Publication
Statistical Journal of the IAOS

Abstract
The paper draws attention to the use of Symbolic Data Analysis (SDA) in the field of Official Statistics. It is composed of three sections presenting three pilot techniques in the field of SDA. The three contributions range from a technique based on the notion of exactly unified summaries for the creation of symbolic objects, a model-based approach for interval data as an innovative parametric strategy in this context, and measures of similarity defined between a class and a collection of classes based on the frequency of the categories which characterize them. The paper shows the effectiveness of the proposed approaches as prototypes of numerous techniques developed within the SDA framework and opens to possible further developments.

2024

Special issue on New methodologies in clustering and classification for complex and/or big data

Authors
Brito, P; Cerioli, A; Garcia-Escudero, LA; Saporta, G;

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
[No abstract available]

Supervised thesis

2024

The evolution of immigrant groups in Luxembourg - What are the different pathways in the labour market?

Author
Catarina Campos de Melo Sousa Silva

Institution
UP-FEP

2024

Anomaly Detection Methods for Complex Data: Applications to Internet Traffic and Financial Markets

Author
Catarina Padrela Loureiro

Institution
UP-FEP

2023

The evolution of immigrant groups in Luxembourg - What are the different pathways in the labour market?

Author
Catarina Campos de Melo Sousa Silva

Institution
UP-FEP

2023

Multi-class Classification of Distributional Data

Author
Ana Carolina Silva Rodrigues dos Santos

Institution
UP-FEP

2023

Searching for Symbolic Patterns in Attributed Networks

Author
Maria Hermínia Esteves de Carvalho

Institution
UP-FEP