2009
Authors
Fonseca, NA; Srinivasan, A; Silva, F; Camacho, R;
Publication
MACHINE LEARNING
Abstract
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard, and their complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state of the art in parallel ILP. We implement several parallel algorithms and study their performance using standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, the one that simply constructs models in parallel on each processor using a subset of the data, and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomised searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.
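The winning strategy in finding (1) lends itself to a compact illustration. Below is a minimal Python sketch of that split-learn-combine approach; the learn_rules() routine is a hypothetical stand-in for a sequential ILP run and the union-based merge is one of several possible combination schemes, so this is a generic outline of the idea, not the paper's implementation.

```python
# Sketch of the best-performing strategy from the abstract: learn a
# model on each processor from a disjoint subset of the examples, then
# merge the per-processor models. `learn_rules` and the string-based
# clause representation are hypothetical stand-ins for an ILP engine.
from multiprocessing import Pool

def learn_rules(examples):
    """Placeholder for a sequential ILP run over a subset of examples."""
    return {f"rule_covering({e})" for e in examples}  # hypothetical clauses

def combine(models):
    """Merge per-processor models; here, a simple union of clauses."""
    merged = set()
    for m in models:
        merged |= m
    return merged

def parallel_ilp(examples, n_workers=4):
    chunks = [examples[i::n_workers] for i in range(n_workers)]  # split data
    with Pool(n_workers) as pool:
        models = pool.map(learn_rules, chunks)                   # learn in parallel
    return combine(models)                                       # combine models

if __name__ == "__main__":
    print(parallel_ilp(list(range(20))))
```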
2009
Authors
Ribeiro, P; Simonotto, J; Kaiser, M; Silva, F;
Publication
JOURNAL OF NEUROSCIENCE METHODS
Abstract
When calculating correlation networks from multi-electrode array (MEA) data, the computations involved are extensive. Worse, as MEAs grow larger, the computation time grows even faster: calculating pair-wise correlations for current 60-channel systems can take hours on commodity computers, whereas for future 1000-channel systems it would take almost 280 times as long, since the number of pairs grows with the square of the number of channels. Even accounting for increases in processor speed, computing correlations on a single computer may soon become infeasible. Parallel computing is a way to sustain reasonable calculation times in the future. We provide a general tool for the rapid computation of correlation networks, tested on: (a) a single computer cluster with 16 cores, (b) the Newcastle Condor System, which utilizes idle processors of university computers, and (c) an inter-cluster setup with 192 cores. Our reusable tool provides a simple interface for neuroscientists, automating data partitioning and job submission, while allowing coding in any programming language. It is also sufficiently flexible to be used in other high-performance computing environments.
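The quadratic claim can be checked directly: n channels yield n(n-1)/2 pairs, and (1000/60)^2 ≈ 278, which is the "almost 280 times" figure above. The Python sketch below illustrates the per-pair parallelism such a tool can exploit; the function names and the simple pool-based splitting are illustrative assumptions, not the paper's tool.

```python
# Pair-wise correlation over MEA channels: the pair list has n*(n-1)/2
# entries, hence the quadratic growth, and each pair is independent, so
# the work parallelises trivially across processes.
from itertools import combinations
from multiprocessing import Pool

import numpy as np

def pair_correlation(args):
    """Pearson correlation for one channel pair."""
    x, y = args
    return float(np.corrcoef(x, y)[0, 1])

def correlation_network(signals, n_workers=4):
    """signals: array of shape (n_channels, n_samples)."""
    pairs = list(combinations(range(len(signals)), 2))  # n*(n-1)/2 pairs
    with Pool(n_workers) as pool:
        values = pool.map(pair_correlation,
                          [(signals[i], signals[j]) for i, j in pairs])
    return dict(zip(pairs, values))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((6, 1000))  # 6 channels -> 15 pairs
    print(len(correlation_network(data)))
```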
2010
Authors
Martins, R; Narasimhan, P; Lopes, L; Silva, F;
Publication
2010 29TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS SRDS 2010
Abstract
We address the problem of providing transparent, lightweight fault-tolerance mechanisms for generic peer-to-peer middleware systems. The main idea is to use the peer-to-peer overlay itself to provide fault tolerance, rather than supporting it higher up in the middleware architecture, e.g., in the form of services. To evaluate our approach we have implemented a fault-tolerant middleware prototype that uses a hierarchical peer-to-peer overlay in which the leaf peers connect to sensors that provide data streams. Clients connect to the root of the overlay and request streams, which are routed through intermediate peers in the overlay up to the client. We report encouraging preliminary results for latency, jitter and resource consumption in both the non-faulty and faulty cases.
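As a rough picture of the topology described, the toy Python sketch below models sensor readings entering at a leaf peer and being forwarded through intermediate peers toward the root, where clients attach. All names are hypothetical and the fault-tolerance machinery itself is not modeled; this only shows the hierarchical routing path.

```python
# Toy hierarchical overlay: leaves ingest sensor streams and each peer
# forwards data to its parent until the root (where clients connect).
class Peer:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent:
            parent.children.append(self)

    def publish(self, stream, payload):
        """Forward a sensor reading up the overlay toward the root."""
        print(f"{self.name} forwards {stream}={payload}")
        if self.parent:
            self.parent.publish(stream, payload)

root = Peer("root")            # clients attach here
mid = Peer("intermediate", parent=root)
leaf = Peer("leaf", parent=mid)  # connected to a sensor
leaf.publish("temperature", 21.5)
```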
2010
Authors
Ribeiro, P; Silva, F;
Publication
ALGORITHMS IN BIOINFORMATICS
Abstract
Many biological networks contain recurring overrepresented substructures, called network motifs. Finding these substructures is a computationally hard task related to graph isomorphism. A g-trie is an efficient data structure, based on multiway trees, for identifying common substructures in a set of subgraphs. G-tries are highly successful in constraining the search space when finding the occurrences of those subgraphs in a larger original graph, leading to speedups of up to 100 times over previous methods that aim for exact and complete results. In this paper we present a new, efficient sampling algorithm for subgraph frequency estimation based on g-tries. It is able to uniformly traverse a fraction of the search space, providing an accurate, unbiased estimation of subgraph frequencies. Our results show that, in the same amount of time, our algorithm achieves better precision than previous methods, as it is able to sustain higher sampling speeds.
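The sampling idea can be sketched independently of the g-trie machinery: descend each depth of the subgraph enumeration tree with a fixed probability, and weight every occurrence found by the inverse of the product of the probabilities used, which keeps the expected count unbiased. The Python sketch below applies this per-depth Bernoulli scheme to a plain ESU-style enumeration rather than to a g-trie, so it illustrates the estimator, not the paper's data structure.

```python
# Depth-wise sampled subgraph enumeration (ESU-style): with all
# probabilities set to 1.0 this counts every connected k-subgraph
# exactly once; with probs[d] < 1 it visits a uniform fraction of the
# search tree and reweights hits to keep the estimate unbiased.
import random

def rand_esu(adj, k, probs, seed=0):
    """adj: {vertex: set(neighbours)}; probs: one probability per depth."""
    rng = random.Random(seed)
    weight = 1.0
    for p in probs:
        weight /= p                     # 1 / prod(probs)
    estimate = 0.0

    def extend(sub, ext):
        nonlocal estimate
        if len(sub) == k:
            estimate += weight          # weighted occurrence
            return
        depth = len(sub)
        ext = set(ext)
        while ext:
            w = ext.pop()
            if rng.random() < probs[depth]:
                # exclusive neighbourhood: each subgraph is found once
                new_ext = ext | {u for u in adj[w]
                                 if u > sub[0] and u not in sub
                                 and all(u not in adj[s] for s in sub)}
                extend(sub + [w], new_ext)

    for v in adj:
        if rng.random() < probs[0]:
            extend([v], {u for u in adj[v] if u > v})
    return estimate

if __name__ == "__main__":
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(rand_esu(g, 3, [1.0, 1.0, 1.0]))  # exact: 3 connected 3-subgraphs
```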
2011
Authors
Choobdar, S; Silva, F; Ribeiro, P;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE
Abstract
Complex networks are ubiquitous in the real world and represent a multitude of natural and artificial systems. Some of these networks are inherently dynamic and their structure changes over time, but only recently has the research community begun trying to characterize them better. In this paper we propose a novel, general methodology to characterize time-evolving networks, analyzing the dynamics of their structure by labeling the nodes and tracking how these labels evolve. Node labeling is formulated as a clustering task that assigns a classification to each node according to its local properties. Association rule mining is then applied to the sequences of node labels to extract rules that best describe changes in the network. We evaluate our method on two different networks, a real-world network of annual world trade and a synthetic scale-free network, in order to uncover evolution patterns. The results show that our approach is valid and gives insights into the dynamics of the network. As an example, the rules derived for the scale-free network capture the property of preferential node attachment.
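A minimal sketch of this two-stage pipeline follows, assuming scikit-learn's KMeans for the node-labeling step and simple high-confidence label-transition rules as a stand-in for full association rule mining; the feature choice and thresholds are illustrative, not the paper's.

```python
# Stage 1: label each node per snapshot by clustering its local
# properties. Stage 2: mine label -> label rules from the per-node
# label sequences across snapshots.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def label_snapshot(features, k=3, seed=0):
    """features: (n_nodes, n_props) array of local node properties."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)

def transition_rules(sequences, min_conf=0.6):
    """Keep label -> label rules whose empirical confidence is high."""
    pair_counts, label_counts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            label_counts[a] += 1
    return {(a, b): c / label_counts[a]
            for (a, b), c in pair_counts.items()
            if c / label_counts[a] >= min_conf}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    snapshots = [rng.random((50, 2)) for _ in range(4)]        # 4 time steps
    labels = np.array([label_snapshot(f) for f in snapshots])  # (4, 50)
    print(transition_rules(labels.T.tolist()))                 # per-node sequences
```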
2012
Authors
Dutra, I; Rocha, R; Costa, VS; Silva, F; Santos, J;
Publication
2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW)
Abstract
In this work we perform a detailed study of different or-scheduling strategies, varying several parameters in two or-parallel systems, YapOr and ThOr, running on multi-core machines. Our results show that some kinds of applications are sensitive to the choice of scheduling strategy. In particular, the choice of scheduling parameters mostly affects applications with short execution times, which, despite still achieving speedups, have their performance significantly affected. Our results also show that topmost dispatching can be more advantageous than bottommost dispatching, a finding that contradicts previous work in this area. A final finding is that YapOr and ThOr are affected differently by changes in scheduling, with ThOr performing significantly better than YapOr in several applications.
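The dispatching distinction at the heart of this study can be pictured as the end of a busy worker's chain of open choice points from which idle workers take work: topmost means stealing near the root of the search tree (older alternatives, typically larger subtrees), bottommost means stealing near the leaves (younger, finer-grained work). The toy Python sketch below is a generic work-stealing illustration of that choice, not YapOr or ThOr code.

```python
# One busy worker's open choice points, oldest (topmost) on the left,
# newest (bottommost) on the right; idle workers steal from either end.
from collections import deque

class ChoicePointStack:
    def __init__(self):
        self.alternatives = deque()   # left = topmost, right = bottommost

    def push(self, alt):
        """Owner opens a new choice point as execution deepens."""
        self.alternatives.append(alt)

    def steal(self, topmost=True):
        """Idle worker takes an unexplored alternative from one end."""
        if not self.alternatives:
            return None
        return (self.alternatives.popleft() if topmost
                else self.alternatives.pop())

stack = ChoicePointStack()
for alt in ["goal_1", "goal_2", "goal_3"]:
    stack.push(alt)
print(stack.steal(topmost=True))    # goal_1: oldest, near the root
print(stack.steal(topmost=False))   # goal_3: youngest, near the leaves
```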