
Publications by Paula Brito

2015

Clustering of symbolic data

Authors
Brito, P;

Publication
Handbook of Cluster Analysis

Abstract
In this chapter, we present clustering methods for symbolic data. We start by recalling that symbolic data are data presenting inherent variability, and the motivations for the introduction of this new paradigm. We then define the different types of variables that allow for the representation of symbolic data, and recall some distance measures appropriate for these new data types. Next, we present clustering methods for different types of symbolic data, both hierarchical and nonhierarchical. An application illustrates two well-known methods for clustering symbolic data. © 2016 by Taylor & Francis Group, LLC.
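To make the idea of clustering interval-valued (symbolic) data concrete, here is a minimal sketch under stated assumptions. Each observation is a list of closed intervals, one per variable; the distance between observations is assumed to be the sum of per-variable Hausdorff distances (one commonly used measure for intervals, not necessarily the one used in the chapter), and the grouping uses plain single-linkage agglomeration. The names `hausdorff`, `distance` and `single_linkage` are hypothetical, chosen for illustration only.

```python
from itertools import combinations

# An interval-valued observation: one closed interval [low, high] per variable.
Interval = tuple[float, float]
Observation = list[Interval]


def hausdorff(a: Interval, b: Interval) -> float:
    """Hausdorff distance between closed intervals [a1, a2] and [b1, b2]."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def distance(x: Observation, y: Observation) -> float:
    """Assumed dissimilarity: sum of per-variable Hausdorff distances."""
    return sum(hausdorff(a, b) for a, b in zip(x, y))


def single_linkage(data: list[Observation], n_clusters: int) -> list[list[int]]:
    """Agglomerative single-linkage clustering until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = min(distance(data[p], data[q]) for p in ci for q in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # combinations yields i < j, so deleting index j leaves i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Four observations described by two interval-valued variables.
data = [
    [(1.0, 2.0), (10.0, 12.0)],
    [(1.5, 2.5), (11.0, 12.5)],
    [(8.0, 9.0), (30.0, 33.0)],
    [(8.5, 9.5), (31.0, 34.0)],
]
print(single_linkage(data, 2))  # [[0, 1], [2, 3]]
```

Single linkage is used only because it is the shortest hierarchical method to write out; any other linkage or a nonhierarchical method could be plugged onto the same interval distance.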

2024

Anomaly detection-based undersampling for imbalanced classification problems

Authors
Park, YJ; Brito, P; Ma, YC;

Publication
ENGINEERING OPTIMIZATION

Abstract
In various machine learning applications, classification plays an important role in categorizing and predicting data. To improve classification performance, it is crucial to identify and remove anomalies. Class imbalance is also a very common problem in machine learning applications, since most classifiers tend to be biased toward the majority class and ignore minority-class instances. In this research, we therefore propose a new undersampling technique based on anomaly detection and removal to enhance performance on imbalanced classification problems. To demonstrate the effectiveness of the proposed method, comprehensive experiments are conducted on forty imbalanced data sets, and two non-parametric hypothesis tests are employed to show the statistical difference in classification performance between the proposed method and other traditional resampling methods. The experiments show that the proposed method improves classification performance by effectively detecting and eliminating anomalies among true-majority or pseudo-majority class instances.
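The general idea of anomaly-based undersampling (score majority-class instances for anomalousness, then drop the most anomalous ones while keeping every minority instance) can be sketched as follows. This is not the paper's method: the abstract does not name the detector, so a simple k-nearest-neighbour distance score is swapped in as an assumption, and the function names and the `contamination` and `k` parameters are hypothetical.

```python
import math


def knn_score(point, others, k):
    """Assumed anomaly score: mean Euclidean distance to the k nearest others.

    Assumes len(others) >= k.
    """
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k]) / k


def anomaly_undersample(X, y, majority_label, contamination=0.2, k=3):
    """Drop the most anomalous majority-class rows; keep all minority rows.

    contamination (hypothetical parameter): fraction of majority rows removed.
    """
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    scores = {}
    for i in majority_idx:
        # Score each majority point against the other majority points only.
        others = [X[j] for j in majority_idx if j != i]
        scores[i] = knn_score(X[i], others, k)
    n_drop = int(contamination * len(majority_idx))
    ranked = sorted(majority_idx, key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:n_drop])
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Majority class 0 with one outlier at (10, 10); minority class 1.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
Xr, yr = anomaly_undersample(X, y, majority_label=0)
print(yr)  # [0, 0, 0, 0, 0, 1, 1]
```

Any other detector (for example an isolation-based or density-based one) could replace `knn_score` without changing the undersampling step itself.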

2023

Community detection in interval-weighted networks

Authors
Alves, H; Brito, P; Campos, P;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
In this paper we introduce and develop the concept of interval-weighted networks (IWN), a novel approach in Social Network Analysis in which edge weights are represented by closed intervals built from precise information, thereby capturing intrinsic variability. We extend Newman's modularity, the modularity gain, and the Louvain algorithm to IWN, using a tabular representation of networks as contingency tables. We apply our methodology to two real-world IWN. The first is a commuter network in mainland Portugal between the twenty-three NUTS 3 regions (IWCN). The second focuses on annual merchandise trade between 28 European countries from 2003 to 2015 (IWTN). The optimal partition of geographic locations (regions or countries) is determined and compared using two new approaches, designated Classic Louvain and Hybrid Louvain, which take into account the variability observed in the original network, thereby minimizing the loss of information present in the raw data. Our findings suggest a division of the twenty-three Portuguese regions into three main communities for the IWCN, and two to three country communities for the IWTN. However, the geographical partitions differ depending on the community detection methodology used. This analysis can be useful in many real-world applications, since it accounts for weights that may vary within ranges rather than being constant.
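For reference, Newman's weighted modularity of a partition is Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where A_ij is the edge weight, k_i the node strength and 2m the total strength. The sketch below computes it and, as a deliberately naive illustration, evaluates it separately at the lower and upper endpoints of interval weights. This endpoint evaluation is an assumption for illustration only and is not the paper's Classic or Hybrid Louvain approach; the two values are also not guaranteed to bound the modularity of every weighting inside the intervals. The function names are hypothetical, and self-loops are assumed absent.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman's weighted modularity for an undirected graph without self-loops.

    edges: dict (u, v) -> weight, each undirected edge listed once.
    community: dict node -> community label.
    """
    strength = defaultdict(float)
    two_m = 0.0
    internal = 0.0  # sum of A_ij over ordered pairs in the same community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        two_m += 2 * w
        if community[u] == community[v]:
            internal += 2 * w
    # Expected term: sum over communities of (total strength)^2 / 2m.
    totals = defaultdict(float)
    for node, k in strength.items():
        totals[community[node]] += k
    expected = sum(t * t for t in totals.values()) / two_m
    return (internal - expected) / two_m


def endpoint_modularity(interval_edges, community):
    """Naive illustration: modularity at lower and at upper interval endpoints."""
    lower = {e: lo for e, (lo, hi) in interval_edges.items()}
    upper = {e: hi for e, (lo, hi) in interval_edges.items()}
    return modularity(lower, community), modularity(upper, community)


# Two triangles joined by one weak bridge, with interval-valued weights.
interval_edges = {
    ("a", "b"): (2, 3), ("b", "c"): (2, 3), ("a", "c"): (2, 3),
    ("d", "e"): (2, 3), ("e", "f"): (2, 3), ("d", "f"): (2, 3),
    ("c", "d"): (0.5, 1),
}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_lo, q_hi = endpoint_modularity(interval_edges, community)
print(round(q_lo, 4), round(q_hi, 4))  # 0.46 0.4474
```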
