Publications

Publications by CRACS

2012

Finding language-independent contextual supernodes on coreference networks

Authors
Devezas, J; Figueira, A;

Publication
IAENG International Journal of Computer Science

Abstract
We propose a method for creating news context by taking advantage of a folksonomy of web clipping based on online news. We experiment with an ontology-based named entity recognition process, describing two alternate implementation approaches, and we study two different ways of modeling the relationships induced by the coreference of named entities on news clips. We try to establish a context by identifying the community structure for a clip-centric network and for an entity-centric network, based on a small test set from the Breadcrumbs system. Finally, we compare both models, based on the detected news communities, and show the advantages of each network specilication.

CloseRead Abstract

2012

Comparison of co-authorship networks across scientific fields using motifs

Authors
Choobdar, S; Ribeiro, P; Bugla, S; Silva, F;

Publication
2012 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM)

Abstract
Comparing scientific production across different fields of knowledge is commonly controversial and subject to disagreement. Such comparisons are often based on quantitative indicators, such as papers per researcher, and data normalization is very difficult to accomplish. Different approaches can provide new insight and in this paper we focus on the comparison of different scientific fields based on their research collaboration networks. We use co-authorship networks where nodes are researchers and the edges show the existing co-authorship relations between them. Our comparison methodology is based on network motifs, which are over represented patterns, or subgraphs. We derive motif fingerprints for 22 scientific fields based on 29 different small motifs found in the corresponding co-authorship networks. These fingerprints provide a metric for assessing similarity among scientific fields, and our analysis shows that the discrimination power of the 29 motif types is not identical. We use a co-authorship dataset built from over 15,361 publications inducing a co-authorship network with over 32,842 researchers. Our results also show that we can group different fields according to their fingerprints, supporting the notion that some fields present higher similarity and can be more easily compared.

CloseRead Abstract

2012

A design and implementation of the Extended Andorra Model

Authors
Lopes, R; Costa, VS; Silva, F;

Publication
THEORY AND PRACTICE OF LOGIC PROGRAMMING

Abstract
Logic programming provides a high-level view of programming, giving implementers a vast latitude into what techniques to explore to achieve the best performance for logic programs. Towards obtaining maximum performance, one of the holy grails of logic programming has been to design computational models that could be executed efficiently and that would allow both for a reduction of the search space and for exploiting all the available parallelism in the application. These goals have motivated the design of the Extended Andorra Model (EAM), a model where goals that do not constrain nondeterministic goals can execute first. In this work, we present and evaluate the Basic design for EAM, a system that builds upon David H. D. Warren's original EAM with Implicit Control. We provide a complete description and implementation of the Basic design for EAM System as a set of rewrite and control rules. We present the major data structures and execution algorithms that are required for efficient execution, and evaluate system performance. A detailed performance study of our system is included. Our results show that the system achieves acceptable base performance and that a number of applications benefit from the advanced search inherent to the EAM.

CloseRead Abstract

2012

Querying subgraph sets with g-tries

Authors
Pinto Ribeiro, PM; Silva, FMA;

Publication
Proceedings of the 2nd ACM SIGMOD Workshop on Databases and Social Networks, DBSocial 2012, Scottsdale, AZ, USA, May 20, 2012

Abstract
In this paper we present an universal methodology for finding all the occurrences of a given set of subgraphs in one single larger graph. Past approaches would either enumerate all possible subgraphs of a certain size or query a single subgraph. We use g-tries, a data structure specialized in dealing with subgraph sets. G-Tries store the topological information on a tree that exposes common substructure. Using a specialized canonical form and symmetry breaking conditions, a single non-redundant search of the entire set of subgraphs is possible. We give results of applying g-tries querying to different social networks, showing that we can efficiently find the occurrences of a set containing subgraphs of multiple sizes, outperforming previous methods. Copyright 2012 ACM.

CloseRead Abstract

2012

Or-parallel prolog execution on multicores based on stack splitting

Authors
Vieira, R; Rocha, R; Silva, FMA;

Publication
Proceedings of the POPL 2012 Workshop on Declarative Aspects of Multicore Programming, DAMP 2012, Philadelphia, PA, USA, Saturday, January 28, 2012

Abstract
Many or-parallel Prolog computational models exploiting implicit parallelism have been proposed in the past. The Muse and YapOr systems are arguably two of the most efficient systems exploiting or-parallelism on shared memory architectures, both based on the environment copying model. Stack splitting emerged as an alternative model specially geared to distributed memory architectures as it basically splits the computation in such a way that no further, or just minimal, synchronization is required. With the new multicore architectures, it just makes sense to recover the body of knowledge there is in this area and reengineer prior computational models to evaluate their performance on newer architectures. In this paper, we focus on the design and implementation of stack splitting in the YapOr system. Our aim is to take advantage of its robustness to efficiently implement stack splitting support using shared memory, and then be able to directly compare the original YapOr with the YapOr using stack splitting. We consider two splitting models, vertical splitting and half splitting, and have adapted data structures, scheduling and incremental copy procedures in YapOr to cope with the new models. Experimental results, on a multicore machine with 24 cores, show that YapOr using stack splitting is, in general, comparable to the original YapOr, obtaining in some cases better performance than with only environment copying. Copyright © 2012 ACM.

CloseRead Abstract

2012

Motif Mining in Weighted Networks

Authors
Choobdar, S; Ribeiro, P; Silva, F;

Publication
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012)

Abstract
Unexpectedly frequent subgraphs, known as motifs, can help in characterizing the structure of complex networks. Most of the existing methods for finding motifs are designed for unweighted networks, where only the existence of connection between nodes is considered, and not their strength or capacity. However, in many real world networks, edges contain more information than just simple node connectivity. In this paper, we propose a new method to incorporate edge weight information in motif mining. We think of a motif as a subgraph that contains unexpected information, and we define a new significance measurement to assess this subgraph exceptionality. The proposed metric embeds the weight distribution in subgraphs and it is based on weight entropy. We use the g-trie data structure to find instances of k-sized subgraphs and to calculate its significance score. Following a statistical approach, the random entropy of subgraphs is then calculated, avoiding the time consuming step of random network generation. The discrimination power of the derived motif profile by the proposed method is assessed against the results of the traditional unweighted motifs through a graph classification problem. We use a set of labeled ego networks of co-authorship in the biology and mathematics fields. The new proposed method is shown to be feasible, achieving even slightly better accuracy. Since it does not require the generation of random networks, it is also computationally faster, and because we are able to use the weight information in computing the motif importance, we can avoid converting weighted networks into unweighted ones.

CloseRead Abstract