Publications

Publications by Fernando Silva

2009

Strategies for Network Motifs Discovery

Authors
Pinto Ribeiro, PM; Silva, FMA; Kaiser, M;

Publication
Fifth International Conference on e-Science, e-Science 2009, 9-11 December 2009, Oxford, UK

Abstract
Complex networks from domains like Biology or Sociology are present in many e-Science data sets. Dealing with networks can often form a workflow bottleneck as several related algorithms are computationally hard. One example is detecting characteristic patterns or "network motifs", a problem involving subgraph mining and graph isomorphism. This paper provides a review and runtime comparison of current motif detection algorithms in the field. We present the strategies and the corresponding algorithms in pseudo-code, yielding a framework for comparison. We categorize the algorithms, outlining the main differences and advantages of each strategy. We finally implement all strategies in a common platform to allow a fair and objective efficiency comparison using a set of benchmark networks. We hope to inform the choice of strategy and critically discuss future improvements in motif detection. © 2009 IEEE.


1998

Distribution and Mobility with Lexical Scoping in Process Calculi

Authors
Vasconcelos, VT; Lopes, LMB; Silva, FMA;

Publication
Electr. Notes Theor. Comput. Sci.

Abstract
We propose a simple model of distribution for mobile processes, independent of the underlying calculus. Conventional processes compute within sites; inter-site computation is achieved by message sending and object migration, both obeying a lexical scope. We focus on the semantics of networks, on programming practice, and on physical realization with current technology. ©1998 Published by Elsevier Science B.V.


2012

Motif Mining in Weighted Networks

Authors
Choobdar, S; Ribeiro, P; Silva, F;

Publication
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012)

Abstract
Unexpectedly frequent subgraphs, known as motifs, can help in characterizing the structure of complex networks. Most of the existing methods for finding motifs are designed for unweighted networks, where only the existence of a connection between nodes is considered, and not its strength or capacity. However, in many real-world networks, edges contain more information than just simple node connectivity. In this paper, we propose a new method to incorporate edge weight information in motif mining. We think of a motif as a subgraph that contains unexpected information, and we define a new significance measurement to assess this subgraph exceptionality. The proposed metric embeds the weight distribution in subgraphs and it is based on weight entropy. We use the g-trie data structure to find instances of k-sized subgraphs and to calculate their significance scores. Following a statistical approach, the random entropy of subgraphs is then calculated, avoiding the time-consuming step of random network generation. The discrimination power of the motif profile derived by the proposed method is assessed against the results of the traditional unweighted motifs through a graph classification problem. We use a set of labeled ego networks of co-authorship in the biology and mathematics fields. The new proposed method is shown to be feasible, achieving even slightly better accuracy. Since it does not require the generation of random networks, it is also computationally faster, and because we are able to use the weight information in computing the motif importance, we can avoid converting weighted networks into unweighted ones.


2006

April - An inductive logic programming system

Authors
Fonseca, NA; Silva, F; Camacho, R;

Publication
LOGICS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
Inductive Logic Programming (ILP) is a Machine Learning research field that has been quite successful in knowledge discovery in relational domains. ILP systems use a set of pre-classified examples (positive and negative) and prior knowledge to learn a theory in which positive examples succeed and the negative examples fail. In this paper we present a novel ILP system called April, capable of exploring several parallel strategies in distributed and shared memory machines.


2009

BIORED - A Genetic Algorithm for Pattern Detection in Biosequences

Authors
Pereira, P; Silva, F; Fonseca, NA;

Publication
2ND INTERNATIONAL WORKSHOP ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (IWPACBB 2008)

Abstract
We present a new, efficient and scalable tool, named BIORED, for pattern discovery in proteomic and genomic sequences. It uses a genetic algorithm to find interesting patterns in the form of regular expressions, and a new efficient pattern matching procedure to count pattern occurrences. We studied the performance, scalability and usefulness of BIORED using several databases of biosequences. The results show that BIORED was successful in finding previously known patterns, which is an excellent indicator of its potential. BIORED is available for download under the GNU Public License at http://www.dcc.fc.up.pt/bi-ored/. An online demo is available at the same address.


2009

Improving the efficiency of inductive logic programming systems

Authors
Fonseca, NA; Costa, VS; Rocha, R; Camacho, R; Silva, F;

Publication
SOFTWARE-PRACTICE & EXPERIENCE

Abstract
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The results observed show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve speedups of several orders of magnitude in some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright (C) 2008 John Wiley & Sons, Ltd.
