2023
Authors
Pereira, RR; Bono, J; Ascensao, JT; Aparício, D; Ribeiro, P; Bizarro, P;
Publication
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023
Abstract
Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice, this can lead to low detection rates and high false positive rates, which characterise, for example, anti-money laundering systems. In fact, it is estimated that 1.7-4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity, without label requirements. We propose to reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Networks (GANs) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems. This setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases: money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.
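One way to picture the "extra objective" idea in this abstract is as a generator loss with additional weighted terms. The formula below is only a sketch under stated assumptions, not the loss given in the paper: the symbols $O$ (a score of how well a generated sample achieves the malicious goal), $A$ (the alert score of an optional pre-existing detection system), and the weights $\lambda, \mu \ge 0$ are all hypothetical, and the first term is the standard non-saturating GAN generator loss.

```latex
\mathcal{L}_G \;=\;
\underbrace{\mathbb{E}_{z \sim p_z}\!\big[-\log D(G(z))\big]}_{\text{typical GAN generator loss}}
\;-\; \lambda\, \mathbb{E}_{z \sim p_z}\!\big[O(G(z))\big]
\;+\; \mu\, \mathbb{E}_{z \sim p_z}\!\big[A(G(z))\big]
```

Minimising this pushes generated samples to look realistic to the discriminator $D$, to score highly on the objective $O$, and (when $\mu > 0$) to score low on the existing detector $A$.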
2023
Authors
Eddin, AN; Bono, J; Aparício, D; Ferreira, H; Ascensao, J; Ribeiro, P; Bizarro, P;
Publication
PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023
Abstract
Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models. To achieve this, a streaming, low-latency approximation to the random-walk-based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.
2024
Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
In recent years, there has been a surge in the prevalence of high- and multidimensional temporal data across various scientific disciplines. These datasets are characterized by their vast size and challenging potential for analysis. Such data typically exhibit serial and cross-dependency and possess high dimensionality, thereby introducing additional complexities to conventional time series analysis methods. To address these challenges, a recent and complementary approach has emerged, known as network-based analysis methods for multivariate time series. In univariate settings, quantile graphs have been employed to capture temporal transition properties and reduce data dimensionality by mapping observations to a smaller set of sample quantiles. To confront the increasingly prominent issue of high dimensionality, we propose an extension of quantile graphs into a multivariate variant, which we term Multilayer Quantile Graphs. In this innovative mapping, each time series is transformed into a quantile graph, and inter-layer connections are established to link contemporaneous quantiles of pairwise series. This enables the analysis of dynamic transitions across multiple dimensions. In this study, we demonstrate the effectiveness of this new mapping using synthetic and benchmark multivariate time series datasets. We delve into the resulting network's topological structures, extract network features, and employ these features for original dataset analysis. Furthermore, we compare our results with a recent method from the literature. The resulting multilayer network offers a significant reduction in the dimensionality of the original data while capturing serial and cross-dimensional transitions. This approach facilitates the characterization and analysis of large multivariate time series datasets through network analysis techniques.
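The mapping described in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function names (`quantile_bins`, `transition_counts`, `multilayer_quantile_graph`) are hypothetical, and weighting inter-layer edges by contemporaneous co-occurrence counts is an assumption made for illustration. Each series becomes one layer whose nodes are sample-quantile bins and whose edge weights count transitions between consecutive observations; inter-layer edges link the bins that two series occupy at the same time step.

```python
import numpy as np

def quantile_bins(series, n_quantiles):
    """Assign each observation to a sample-quantile bin in 0..n_quantiles-1."""
    series = np.asarray(series, dtype=float)
    # Interior cut points, e.g. only the median when n_quantiles == 2.
    cuts = np.quantile(series, np.linspace(0, 1, n_quantiles + 1)[1:-1])
    return np.searchsorted(cuts, series, side="right")

def transition_counts(src_bins, dst_bins, n_quantiles):
    """Weighted adjacency matrix: entry (i, j) counts pairs (src=i, dst=j)."""
    counts = np.zeros((n_quantiles, n_quantiles), dtype=int)
    np.add.at(counts, (src_bins, dst_bins), 1)
    return counts

def multilayer_quantile_graph(series_list, n_quantiles):
    """Return (intra-layer matrices, inter-layer matrices keyed by series pair)."""
    bins = [quantile_bins(s, n_quantiles) for s in series_list]
    # Intra-layer edges: transition from the bin at time t to the bin at t+1.
    intra = [transition_counts(b[:-1], b[1:], n_quantiles) for b in bins]
    # Inter-layer edges (assumed weighting): bins occupied at the same time t.
    inter = {
        (a, b): transition_counts(bins[a], bins[b], n_quantiles)
        for a in range(len(bins))
        for b in range(a + 1, len(bins))
    }
    return intra, inter

intra, inter = multilayer_quantile_graph([[1, 2, 3, 4], [4, 3, 2, 1]], n_quantiles=2)
```

With `n_quantiles` nodes per layer, the graph size depends on the number of series and quantiles rather than on the series length, which is the dimensionality reduction the abstract refers to.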
2023
Authors
Pereira, RR; Bono, J; Ascensão, JT; Aparício, D; Ribeiro, P; Bizarro, P;
Publication
CoRR
Abstract
2024
Authors
Nóbrega, D; Ribeiro, P;
Publication
COMPLEX NETWORKS XV, COMPLENET 2024
Abstract
Motifs are overrepresented and statistically significant sub-patterns in a network, whose identification is relevant to uncover its underlying functional units. Recently, their extraction has been performed on higher-order networks, but due to the complexity arising from polyadic interactions and the similarity with known computationally hard problems, its practical application is limited. Our main contribution is a novel approach for hyper-subgraph census and higher-order motif discovery, allowing for motifs with sizes 3 or 4 to be found efficiently in real-world scenarios. It is consistently an order of magnitude faster than a baseline state-of-the-art method, while using less memory and supporting a wider range of base algorithms.
2024
Authors
Eddin, AN; Bono, J; Aparício, D; Ferreira, H; Ribeiro, P; Bizarro, P;
Publication
CoRR
Abstract