Publications

Publications by Pedro Manuel Ribeiro

2012

Comparison of co-authorship networks across scientific fields using motifs

Authors
Choobdar, S; Ribeiro, P; Bugla, S; Silva, F;

Publication
2012 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM)

Abstract
Comparing scientific production across different fields of knowledge is commonly controversial and subject to disagreement. Such comparisons are often based on quantitative indicators, such as papers per researcher, and data normalization is very difficult to accomplish. Different approaches can provide new insight and in this paper we focus on the comparison of different scientific fields based on their research collaboration networks. We use co-authorship networks where nodes are researchers and the edges show the existing co-authorship relations between them. Our comparison methodology is based on network motifs, which are overrepresented patterns, or subgraphs. We derive motif fingerprints for 22 scientific fields based on 29 different small motifs found in the corresponding co-authorship networks. These fingerprints provide a metric for assessing similarity among scientific fields, and our analysis shows that the discrimination power of the 29 motif types is not identical. We use a co-authorship dataset built from over 15,361 publications inducing a co-authorship network with over 32,842 researchers. Our results also show that we can group different fields according to their fingerprints, supporting the notion that some fields present higher similarity and can be more easily compared.
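
Illustrative sketch (not from the paper): the abstract describes co-authorship networks in which nodes are researchers and edges are co-authorship relations. Assuming each publication is given simply as a list of author names, such a network could be built roughly as follows. The function name and the toy author names are hypothetical.

```python
from itertools import combinations


def coauthorship_edges(publications):
    """Return the set of undirected co-authorship edges.

    Each publication is a list of author names; every pair of authors
    appearing on the same publication contributes one edge.
    """
    edges = set()
    for authors in publications:
        # Sorting gives each pair a fixed order, so (a, b) and (b, a)
        # are stored as the same undirected edge.
        for a, b in combinations(sorted(set(authors)), 2):
            edges.add((a, b))
    return edges


# Toy input with made-up names (an assumption, not data from the paper).
papers = [["Ana", "Bruno", "Carla"], ["Ana", "Bruno"], ["Carla", "Dinis"]]
edges = coauthorship_edges(papers)
```

Repeated collaborations collapse into a single edge here; a weighted variant would count them instead.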


2010

Parallel Calculation of Subgraph Census in Biological Networks

Authors
Ribeiro, P; Silva, F; Lopes, L;

Publication
BIOINFORMATICS 2010: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON BIOINFORMATICS

Abstract
Mining meaningful data from complex biological networks is a critical task in many areas of research. One important example is calculating the frequency of all subgraphs of a certain size, also known as the subgraph census problem. This can provide a very comprehensive structural characterization of a network and is also used as an intermediate step in the computation of network motifs, an important basic building block of networks, that try to bridge the gap between structure and function. The subgraph census problem is computationally hard and here we present several parallel strategies to solve this problem. Our initial strategies were refined towards achieving an efficient and scalable adaptive parallel algorithm. This algorithm achieves almost linear speedups up to 128 cores when applied to a representative set of biological networks from different domains and makes the calculation of census for larger subgraph sizes feasible.
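
Illustrative sketch (not the authors' algorithm): the subgraph census counts how often each subgraph type of a given size occurs. For undirected graphs and size 3 there are only two connected types, the open path and the triangle, so a sequential brute-force census can be written as below. The function name and the toy graph are assumptions; the parallel strategies described in the abstract are not reproduced here.

```python
from itertools import combinations


def census3(edges):
    """Count connected induced 3-node subgraphs of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {"path": 0, "triangle": 0}
    # Brute force: inspect every triple of vertices, O(n^3) overall.
    for a, b, c in combinations(sorted(adj), 3):
        k = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if k == 3:
            counts["triangle"] += 1
        elif k == 2:
            counts["path"] += 1
        # k < 2 means the triple is not connected, so it is skipped.
    return counts


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
toy = [(1, 2), (2, 3), (1, 3), (3, 4)]
result = census3(toy)
```

The cubic cost of this naive version already hints at why larger subgraph sizes call for the specialized and parallel approaches the paper proposes.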


2010

g-tries: an efficient data structure for discovering network motifs

Authors
Pinto Ribeiro, PM; Silva, FMA;

Publication
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010

Abstract
In this paper we propose a novel specialized data structure that we call g-trie, designed to deal with collections of subgraphs. The main conceptual idea is akin to a prefix tree in the sense that we take advantage of common topology by constructing a multiway tree where the descendants of a node share a common substructure. We give algorithms to construct a g-trie, to list all stored subgraphs, and to find occurrences on another graph of the subgraphs stored in the g-trie. We evaluate the implementation of this structure and its associated algorithms on a set of representative benchmark biological networks in order to find network motifs. To assess the efficiency of our algorithms we compare their performance with other known network motif algorithms also implemented in the same common platform. Our results show that indeed, g-tries are a feasible, adequate and very efficient data structure for network motifs discovery, clearly outperforming previous algorithms and data structures. © 2010 ACM.
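
Illustrative sketch (a simplification, not the published data structure): the prefix-tree idea can be mimicked by storing each graph as a sequence of truncated adjacency-matrix rows, so that graphs agreeing on their first vertices share a path in the tree. The sketch assumes the vertices of every graph are already given in a fixed order; it does not implement the canonical form or the occurrence search discussed in the abstract, and all names are hypothetical.

```python
def insert(trie, adjacency):
    """Insert a graph given as a 0/1 adjacency matrix (list of lists)."""
    node = trie
    for i, row in enumerate(adjacency):
        # Key for level i: connections of vertex i to vertices 0..i.
        key = tuple(row[: i + 1])
        node = node.setdefault(key, {})
    node["leaf"] = True  # marks the end of a stored graph


def count_nodes(trie):
    """Number of tree nodes below the root (leaf markers excluded)."""
    return sum(
        1 + count_nodes(child)
        for key, child in trie.items()
        if key != "leaf"
    )


path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # path 0-1-2
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # triangle 0-1-2

trie = {}
insert(trie, path3)
insert(trie, triangle)
total = count_nodes(trie)
```

Both toy graphs start with the same edge 0-1, so they share the first two levels and only branch at the third vertex.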


2012

Querying subgraph sets with g-tries

Authors
Pinto Ribeiro, PM; Silva, FMA;

Publication
Proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2012, Scottsdale, AZ, USA, May 20, 2012

Abstract
In this paper we present a universal methodology for finding all the occurrences of a given set of subgraphs in one single larger graph. Past approaches would either enumerate all possible subgraphs of a certain size or query a single subgraph. We use g-tries, a data structure specialized in dealing with subgraph sets. G-tries store the topological information on a tree that exposes common substructure. Using a specialized canonical form and symmetry breaking conditions, a single non-redundant search of the entire set of subgraphs is possible. We give results of applying g-tries querying to different social networks, showing that we can efficiently find the occurrences of a set containing subgraphs of multiple sizes, outperforming previous methods. Copyright 2012 ACM.


2010

Efficient Parallel Subgraph Counting Using G-Tries

Authors
Pinto Ribeiro, PM; Silva, FMA; Lopes, LMB;

Publication
Proceedings of the 2010 IEEE International Conference on Cluster Computing, Heraklion, Crete, Greece, 20-24 September, 2010

Abstract
Finding and counting the occurrences of a collection of subgraphs within another larger network is a computationally hard problem, closely related to graph isomorphism. The subgraph count is by itself a very powerful characterization of a network and it is crucial for other important network measurements. G-tries are a specialized data structure designed to store and search for subgraphs. By taking advantage of subgraph common substructure, g-tries can provide considerable speedups over previously used methods. In this paper we present a parallel algorithm based precisely on g-tries that is able to efficiently find and count subgraphs. The algorithm relies on randomized receiver-initiated dynamic load balancing and is able to stop its computation at any given time, efficiently store its search position, divide what is left to compute in two halves, and resume from where it left off. We apply our algorithm to several representative real complex networks from various domains and examine its scalability. We obtain an almost linear speedup up to 128 processors, thus allowing us to reach previously unfeasible limits. We showcase the multidisciplinary potential of the algorithm by also applying it to network motif discovery. © 2010 IEEE.


2009

Strategies for Network Motifs Discovery

Authors
Pinto Ribeiro, PM; Silva, FMA; Kaiser, M;

Publication
Fifth International Conference on e-Science, e-Science 2009, 9-11 December 2009, Oxford, UK

Abstract
Complex networks from domains like Biology or Sociology are present in many e-Science data sets. Dealing with networks can often form a workflow bottleneck as several related algorithms are computationally hard. One example is detecting characteristic patterns or "network motifs" - a problem involving subgraph mining and graph isomorphism. This paper provides a review and runtime comparison of current motif detection algorithms in the field. We present the strategies and the corresponding algorithms in pseudo-code yielding a framework for comparison. We categorize the algorithms outlining the main differences and advantages of each strategy. We finally implement all strategies in a common platform to allow a fair and objective efficiency comparison using a set of benchmark networks. We hope to inform the choice of strategy and critically discuss future improvements in motif detection. © 2009 IEEE.
