Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Pedro Manuel Ribeiro

2020

PseudoChecker: an integrated online platform for gene inactivation inference

Authors
Alves, LQ; Ruivo, R; Fonseca, MM; Lopes Marques, M; Ribeiro, P; Castro, LFC;

Publication
NUCLEIC ACIDS RESEARCH

Abstract
The rapid expansion of high-quality genome assemblies, exemplified by ongoing initiatives such as the Genome-10K and i5k, demands novel automated methods to approach comparative genomics. Of these, the study of inactivating mutations in the coding region of genes, or pseudogenization, as a source of evolutionary novelty is mostly overlooked. Thus, to address such evolutionary/genomic events, a systematic, accurate and computationally automated approach is required. Here, we present PseudoChecker, the first integrated online platform for gene inactivation inference. Unlike the few existing methods, our comparative genomics-based approach displays full automation, a built-in graphical user interface and a novel index, PseudoIndex, for an empirical evaluation of the gene coding status. As a multi-platform online service, PseudoChecker simplifies access and usability, allowing a fast identification of disruptive mutations. An analysis of 30 genes previously reported to be eroded in mammals, and 30 viable genes from the same lineages, demonstrated that PseudoChecker was able to correctly infer 97% of loss events and 95% of functional genes, confirming its reliability. PseudoChecker is freely available, without login required, at http://pseudochecker.ciimar.up.pt.

2020

Condensed Graphs: A Generic Framework for Accelerating Subgraph Census Computation

Authors
Martins, M; Ribeiro, P;

Publication
COMPLEX NETWORKS XI

Abstract
Determining subgraph frequencies is at the core of several graph mining methodologies such as discovering network motifs or computing graphlet degree distributions. Current state-of-the-art algorithms for this task either take advantage of common patterns emerging on the networks or target a set of specific subgraphs for which analytical calculations are feasible. Here, we propose a novel network generic framework revolving around a new data-structure, a Condensed Graph, that combines both the aforementioned approaches, but generalized to support any subgraph topology and size. Furthermore, our methodology can use as a baseline any enumeration based census algorithm, speeding up its computation. We target simple topologies that allow us to skip several redundant and heavy computational steps using combinatorics. We were are able to achieve substantial improvements, with evidence of exponential speedup for our best cases, where these patterns represent up to 97% of the network, from a broad set of real and synthetic networks.

2021

Time series analysis via network science: Concepts and algorithms

Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
There is nowadays a constant flux of data being generated and collected in all types of real world systems. These data sets are often indexed by time, space, or both requiring appropriate approaches to analyze the data. In univariate settings, time series analysis is a mature field. However, in multivariate contexts, time series analysis still presents many limitations. In order to address these issues, the last decade has brought approaches based on network science. These methods involve transforming an initial time series data set into one or more networks, which can be analyzed in depth to provide insight into the original time series. This review provides a comprehensive overview of existing mapping methods for transforming time series into networks for a wide audience of researchers and practitioners in machine learning, data mining, and time series. Our main contribution is a structured review of existing methodologies, identifying their main characteristics, and their differences. We describe the main conceptual approaches, provide authoritative references and give insight into their advantages and limitations in a unified way and language. We first describe the case of univariate time series, which can be mapped to single layer networks, and we divide the current mappings based on the underlying concept: visibility, transition, and proximity. We then proceed with multivariate time series discussing both single layer and multiple layer approaches. Although still very recent, this research area has much potential and with this survey we intend to pave the way for future research on the topic. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts Fundamental Concepts of Data and Knowledge > Knowledge Representation

2020

StreamFaSE: An Online Algorithm for Subgraph Counting in Dynamic Networks

Authors
Branquinho, H; Grácio, L; Ribeiro, P;

Publication
Complex Networks & Their Applications IX - Volume 2, Proceedings of the Ninth International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2020, 1-3 December 2020, Madrid, Spain.

Abstract
Counting subgraph occurrences in complex networks is an important analytical task with applicability in a multitude of domains such as sociology, biology and medicine. This task is a fundamental primitive for concepts such as motifs and graphlet degree distributions. However, there is a lack of online algorithms for computing and updating subgraph counts in dynamic networks. Some of these networks exist as a streaming of edge additions and deletions that are registered as they occur in the real world. In this paper we introduce StreamFaSE, an efficient online algorithm for keeping track of exact subgraph counts in dynamic networks, and we explain in detail our approach, showcasing its general applicability in different network scenarios. We tested our method on a set of diverse real directed and undirected network streams, showing that we are always faster than the current existing methods for this task, achieving several orders of magnitude speedup when compared to a state-of-art baseline. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.

2021

A Survey on Subgraph Counting: Concepts, Algorithms, and Applications to Network Motifs and Graphlets

Authors
Ribeiro, P; Paredes, P; Silva, MEP; Aparicio, D; Silva, F;

Publication
ACM COMPUTING SURVEYS

Abstract
Computing subgraph frequencies is a fundamental task that lies at the core of several network analysis methodologies, such as network motifs and graphlet-based metrics, which have been widely used to categorize and compare networks from multiple domains. Counting subgraphs is, however, computationally very expensive, and there has been a large body of work on efficient algorithms and strategies to make subgraph counting feasible for larger subgraphs and networks. This survey aims precisely to provide a comprehensive overview of the existing methods for subgraph counting. Our main contribution is a general and structured review of existing algorithms, classifying them on a set of key characteristics, highlighting their main similarities and differences. We identify and describe the main conceptual approaches, giving insight on their advantages and limitations, and we provide pointers to existing implementations. We initially focus on exact sequential algorithms, but we also do a thorough survey on approximate methodologies (with a trade-off between accuracy and execution time) and parallel strategies (that need to deal with an unbalanced search space).

2022

Network Science - 7th International Winter Conference, NetSci-X 2022, Porto, Portugal, February 8-11, 2022, Proceedings

Authors
Ribeiro, P; Silva, F; Ferreira Mendes, JF; Laureano, RD;

Publication
NetSci-X

Abstract

  • 6
  • 12