Publicacoes - INESC TEC

Publicações

Publicações por Pedro Manuel Ribeiro

2023

The GANfather: Controllable generation of malicious activity to improve defence systems

Autores
Pereira, RR; Bono, J; Ascensao, JT; Aparício, D; Ribeiro, P; Bizarro, P;

Publicação
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023

Abstract
Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice this can lead to low detection rates and high false positive rates, which characterise for example anti-money laundering systems. In fact, it is estimated that 1.7-4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity, without label requirements. We propose to reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Networks (GANs) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems. This setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases, money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.

FecharLer Abstract

2023

From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs

Autores
Eddin, AN; Bono, J; Aparício, D; Ferreira, H; Ascensao, J; Ribeiro, P; Bizarro, P;

Publicação
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023

Abstract
Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling khop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints a general purpose feature extraction framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models. To achieve this, a streaming, low latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graphsprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.

FecharLer Abstract

2024

Multilayer quantile graph for multivariate time series analysis and dimensionality reduction

Autores
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;

Publicação
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
In recent years, there has been a surge in the prevalence of high- and multidimensional temporal data across various scientific disciplines. These datasets are characterized by their vast size and challenging potential for analysis. Such data typically exhibit serial and cross-dependency and possess high dimensionality, thereby introducing additional complexities to conventional time series analysis methods. To address these challenges, a recent and complementary approach has emerged, known as network-based analysis methods for multivariate time series. In univariate settings, quantile graphs have been employed to capture temporal transition properties and reduce data dimensionality by mapping observations to a smaller set of sample quantiles. To confront the increasingly prominent issue of high dimensionality, we propose an extension of quantile graphs into a multivariate variant, which we term Multilayer Quantile Graphs. In this innovative mapping, each time series is transformed into a quantile graph, and inter-layer connections are established to link contemporaneous quantiles of pairwise series. This enables the analysis of dynamic transitions across multiple dimensions. In this study, we demonstrate the effectiveness of this new mapping using synthetic and benchmark multivariate time series datasets. We delve into the resulting network's topological structures, extract network features, and employ these features for original dataset analysis. Furthermore, we compare our results with a recent method from the literature. The resulting multilayer network offers a significant reduction in the dimensionality of the original data while capturing serial and cross-dimensional transitions. This approach facilitates the characterization and analysis of large multivariate time series datasets through network analysis techniques.

FecharLer Abstract

2024

Computing Motifs in Hypergraphs

Autores
Nóbrega, D; Ribeiro, P;

Publicação
COMPLEX NETWORKS XV, COMPLENET 2024

Abstract
Motifs are overrepresented and statistically significant sub-patterns in a network, whose identification is relevant to uncover its underlying functional units. Recently, its extraction has been performed on higher-order networks, but due to the complexity arising from polyadic interactions, and the similarity with known computationally hard problems, its practical application is limited. Our main contribution is a novel approach for hyper-subgraph census and higher-order motif discovery, allowing for motifs with sizes 3 or 4 to be found efficiently, in real-world scenarios. It is consistently an order of magnitude faster than a baseline state-of-art method, while using less memory and supporting a wider range of base algorithms.

FecharLer Abstract

2024

Deep-Graph-Sprints: Accelerated Representation Learning in Continuous-Time Dynamic Graphs

Autores
Eddin, AN; Bono, J; Aparício, DO; Ferreira, H; Pinto Ribeiro, PM; Bizarro, P;

Publicação
Trans. Mach. Learn. Res.

Abstract
Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Traditional methods for extracting knowledge from these graphs often depend on feature engineering or deep learning. Feature engineering is limited by the manual and time-intensive nature of crafting features, while deep learning approaches suffer from high inference latency, making them impractical for real-time applications. This paper introduces Deep-Graph-Sprints (DGS), a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements. We benchmark DGS against state-of-the-art (SOTA) feature engineering and graph neural network methods using five diverse datasets. The results indicate that DGS achieves competitive performance while inference speed improves between 4x and 12x compared to other deep learning approaches on our benchmark datasets. Our method effectively bridges the gap between deep representation learning and low-latency application requirements for CTDGs.

FecharLer Abstract

2025

Next Higher Point: Two Novel Approaches for Computing Natural Visibility Graphs

Autores
Daniel, P; Silva, VF; Ribeiro, P;

Publicação
COMPLEX NETWORKS & THEIR APPLICATIONS XIII, COMPLEX NETWORKS 2024, VOL 1

Abstract
With the huge amount of data that has been collected over time, many methods are being developed to allow better understanding and forecasting in several domains. Time series analysis is a powerful tool to achieve this goal. Despite being a well-established area, there are some gaps, and new methods are emerging to overcome these limitations, such as visibility graphs. Visibility graphs allow the analyses of times series as complex networks and make possible the use of more advanced techniques from another well-established area, network science. In this paper, we present two new efficient approaches for computing natural visibility graphs from times series, one for online scenarios in.O(n log n) and the other for offline scenarios in.O(nm), the latter taking advantage of the number of different values in the time series (m).

FecharLer Abstract