Publications

Publications by CRACS

2013

Creating and analysing a social network built from clips of online news

Authors
Figueira, Á; Devezas, J; Cravino, N; Revilla, LF;

Publication
Information Systems and Technology for Organizations in a Networked Society

Abstract
Current online news media increasingly depend on the participation of readers on their websites, while readers increasingly use more sophisticated technology to access online news. In this context, the authors present the Breadcrumbs system and project, which aims to provide news readers with tools to collect online news, to create a personal digital library (PDL) of clips taken from news, and to navigate not only their own PDL but also external PDLs related to it. In this article, the authors present and describe the system and its paradigm for accessing news. They complement the description with the results of several tests that confirm the validity of the approach for clustering news and for analysing the gathered data.

2013

The community structure of a multidimensional network of news clips

Authors
Devezas, JL; Figueira, AR;

Publication
IJWBC

Abstract
We analysed the community structure of a network of news clips where relationships were established by the co-reference of entities in pairs of clips. Community detection was applied to a unidimensional version of the news clips network, as well as to a multidimensional version where dimensions were defined based on three different classes of entities: places, people, and dates. The goal was to study the impact on the quality of the identified community structure when using multiple dimensions to model the network. We performed a two-fold evaluation, first based on the modularity metric and then based on human input regarding community semantics. We verified that the assessments of the evaluators differed from the results provided by the modularity metric, pointing towards the relevance of the utility and network integration phases in the identification of semantically cohesive groups of news clips. Copyright © 2013 Inderscience Enterprises Ltd.
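
To make the modularity comparison concrete, here is a minimal sketch, assuming each clip is a dictionary mapping a dimension name ("places", "people", and likewise "dates") to a set of entity strings. The clip data, the function names, and the choice of "number of shared entities" as edge weight are hypothetical illustrations, not the paper's pipeline; community detection itself is not shown, so a fixed partition is scored per dimension and on a flattened (unidimensional) graph.

```python
from collections import defaultdict
from itertools import combinations


def dimension_graph(clips, dimension):
    """Link two clips when they co-reference at least one entity of `dimension`.
    Edge weight = number of shared entities (an assumption for this sketch)."""
    edges = {}
    for a, b in combinations(sorted(clips), 2):
        shared = clips[a].get(dimension, set()) & clips[b].get(dimension, set())
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def flatten(graphs):
    """Unidimensional view: sum the edge weights of every dimension."""
    merged = defaultdict(int)
    for graph in graphs:
        for edge, weight in graph.items():
            merged[edge] += weight
    return dict(merged)


def modularity(edges, communities):
    """Weighted modularity Q = sum over communities c of [w_in(c)/m - (d_c/(2m))^2]."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    inside = defaultdict(float)   # total weight of edges inside each community
    degree = defaultdict(float)   # total degree of the nodes of each community
    for (a, b), w in edges.items():
        degree[label[a]] += w
        degree[label[b]] += w
        if label[a] == label[b]:
            inside[label[a]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(communities)))


# Hypothetical clips: the same partition is good for "places" but bad for "people".
clips = {
    "c1": {"places": {"Porto"}, "people": {"Ana"}},
    "c2": {"places": {"Porto"}, "people": {"Rui"}},
    "c3": {"places": {"Lisbon"}, "people": {"Rui"}},
    "c4": {"places": {"Lisbon"}, "people": {"Ana"}},
}
partition = [{"c1", "c2"}, {"c3", "c4"}]
graphs = {dim: dimension_graph(clips, dim) for dim in ("places", "people")}
for dim, graph in graphs.items():
    print(dim, modularity(graph, partition))             # places 0.5, then people -0.5
print("flattened", modularity(flatten(graphs.values()), partition))  # flattened 0.0
```

The toy example shows why the choice of dimensions matters: one grouping can score well in one dimension, poorly in another, and neutrally once the dimensions are merged.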

2013

Temporal Visualization of a Multidimensional Network of News Clips

Authors
Gomes, F; Devezas, J; Figueira, A;

Publication
Advances in Information Systems and Technologies

Abstract
The exploration of large networks carries inherent challenges in visualizing a great amount of data. We built an interactive visualization system for exploring a large multidimensional network of news clips over time. The clips are gathered by users from web news sources, and references to people or places are extracted from them. In this paper, we present the system's capabilities and user interface and discuss its advantages for browsing the data and extracting knowledge from it. These capabilities include textual search with associated event detection, and temporal navigation that lets the user seek a specific date and timespan.
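
Seeking a date and timespan amounts to a range query over time-ordered clips. The sketch below is an illustration under stated assumptions: clips are hypothetical `(date, text)` pairs, `ClipTimeline` and `burst_days` are invented names, and the threshold-based "burst" check is a deliberately naive stand-in for event detection, not the system's actual method.

```python
from bisect import bisect_left
from collections import Counter
from datetime import date, timedelta


class ClipTimeline:
    """Clips kept sorted by date so a timespan query costs two binary searches."""

    def __init__(self, clips):
        # clips: iterable of (date, text) pairs (hypothetical data model)
        self._clips = sorted(clips, key=lambda clip: clip[0])
        self._dates = [day for day, _ in self._clips]

    def window(self, start, span_days):
        """Clips dated in the half-open interval [start, start + span_days)."""
        end = start + timedelta(days=span_days)
        return self._clips[bisect_left(self._dates, start):bisect_left(self._dates, end)]

    def daily_counts(self, term):
        """Per-day number of clips whose text mentions `term` (case-insensitive)."""
        term = term.lower()
        return Counter(day for day, text in self._clips if term in text.lower())


def burst_days(counts, threshold):
    """Naive event detection (assumption): days whose mention count reaches `threshold`."""
    return sorted(day for day, n in counts.items() if n >= threshold)


timeline = ClipTimeline([
    (date(2013, 1, 5), "Election results announced"),
    (date(2013, 1, 1), "Storm hits Porto"),
    (date(2013, 1, 2), "Porto counts storm damage"),
    (date(2013, 1, 2), "Flooding in Porto after storm"),
])
print(len(timeline.window(date(2013, 1, 1), 2)))  # 3
print(burst_days(timeline.daily_counts("porto"), threshold=2))
```

Keeping a parallel sorted list of dates lets the standard `bisect` module locate both ends of the window without scanning the whole collection.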

2013

Clustering and Classifying Text Documents - A Revisit to Tagging Integration Methods

Authors
Cunha, E; Figueira, A; Mealha, O;

Publication
KDIR/KMIS 2013 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing, Vilamoura, Algarve, Portugal, 19 - 22 September, 2013

Abstract
In this paper we analyze and discuss two methods that are based on the traditional k-means for document clustering and that integrate social tags into the process. The first integrates tags directly into a Vector Space Model, and the second uses tags to select the initial seeds. We created a predictive model for the impact of tag integration in both models, and compared the two methods using the traditional k-means++ and the novel k-C algorithm. To compare the results, we propose a new internal measure that computes cluster compactness. The experimental results indicate that the careful selection of seeds in the k-C algorithm yields better results than those obtained with k-means++, both with and without tag integration.
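
One simple way to integrate tags "directly into a Vector Space Model" is to append them as extra features of each document vector. The sketch below assumes plain term frequencies, a `tag:` prefix to keep tags apart from ordinary words, and a hypothetical `tag_weight` parameter; the paper's actual weighting scheme is not given in the abstract.

```python
import math
from collections import Counter


def document_vector(text, tags=(), tag_weight=1.0):
    """Term-frequency vector; each social tag becomes an extra feature prefixed with
    'tag:' (an assumed encoding) so it never collides with an ordinary word."""
    vector = Counter(text.lower().split())
    for tag in tags:
        vector["tag:" + tag.lower()] += tag_weight
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


a_text, b_text = "match report", "league table"
plain = cosine(document_vector(a_text), document_vector(b_text))
tagged = cosine(document_vector(a_text, ["Football"]), document_vector(b_text, ["football"]))
print(plain, tagged)  # no shared words gives 0.0; the shared tag raises it to about 1/3
```

Raising `tag_weight` above 1.0 would let tags dominate the similarity, which is one knob a comparison like the one described above could vary.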

2013

Clustering Documents Using Tagging Communities and Semantic Proximity

Authors
Cunha, E; Figueira, A; Mealha, O;

Publication
Proceedings of the 2013 8th Iberian Conference on Information Systems and Technologies (CISTI 2013)

Abstract
Euclidean distance and cosine similarity are frequently used measures for implementing the k-means clustering algorithm. Cosine similarity is widely used because it is independent of document length, which allows patterns to be identified: two documents can be considered identical if they use the same words in the same proportions, even when their absolute frequencies differ. However, during each clustering iteration, new centroids are still computed following Euclidean distance. Based on a consideration of these two measures, we propose the k-Communities clustering algorithm (k-C), which changes how new centroids are computed when cosine similarity is used. It begins by selecting the seeds from a network of tags to which a community detection algorithm has been applied. Each seed is the document with the highest degree inside its community. The experimental results obtained with external evaluation measures show that the k-C algorithm is more effective than both k-means and k-means++. In addition, we computed all the external evaluation measures using both a manual and an automatic "Ground Truth", and the results show a strong correlation, which strongly indicates that it is possible to perform tests with this kind of measure even if the dataset structure is unknown.
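
The two ingredients described above, seeds chosen as the highest-degree document in each community and centroid updates consistent with cosine similarity, can be sketched as follows. Assumptions: documents are dense vectors over a shared vocabulary, the tag network links documents that share tags, and communities are already given. Since the abstract does not state the k-C centroid rule, this sketch swaps in spherical k-means (the mean of unit vectors, renormalized to unit length), a standard cosine-consistent update; all names are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(u, v):
    """Length-independent: cosine([1, 2], [2, 4]) is 1 (up to rounding)."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def community_seeds(adjacency, communities):
    """In each community, pick the document with the most neighbours inside that community."""
    return [max(sorted(comm), key=lambda n: len(adjacency.get(n, set()) & comm))
            for comm in communities]


def spherical_kmeans(vectors, seeds, iterations=10):
    """Assign by cosine similarity; each centroid is the renormalized mean of unit vectors."""
    centroids = [normalize(vectors[s]) for s in seeds]
    labels = {}
    for _ in range(iterations):
        labels = {doc: max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
                  for doc, vec in vectors.items()}
        updated = []
        for c in range(len(centroids)):
            members = [normalize(vectors[d]) for d, lab in labels.items() if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                updated.append(normalize(mean))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid unchanged
        centroids = updated
    return labels


vectors = {"d1": [3, 1, 0], "d2": [6, 2, 0], "d3": [0, 1, 3], "d4": [0, 2, 5], "d5": [1, 0, 0]}
adjacency = {"d1": {"d2", "d5"}, "d2": {"d1"}, "d5": {"d1"}, "d3": {"d4"}, "d4": {"d3"}}
communities = [{"d1", "d2", "d5"}, {"d3", "d4"}]
seeds = community_seeds(adjacency, communities)
print(seeds)                              # ['d1', 'd3']
print(spherical_kmeans(vectors, seeds))
```

Note that `d1` and `d2` differ only in length, so cosine similarity treats them as identical, which is the length independence the abstract relies on.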

2013

Community Detection by Local Influence

Authors
Cravino, N; Figueira, A;

Publication
Advances in Information Systems and Technologies

Abstract
We present a new algorithm for discovering overlapping communities in networks with a scale-free structure. The algorithm is based on a node evaluation function that scores the local influence of a node from its degree and neighbourhood, allowing hubs within a network to be identified. Using this function, we are able to identify communities and also to attribute meaningful titles to the communities that are discovered. Our novel methodology is assessed using the LFR benchmark for networks with overlapping community structure and the generalized normalized mutual information (NMI) measure. We show that the evaluation function is able to detect influential nodes in a network, and that it is possible to build a well-performing community detection algorithm based on this function.
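
The abstract does not give the evaluation function, so the following is only a hypothetical illustration of the general idea: score a node by its degree relative to the largest degree in its closed neighbourhood, treat nodes scoring 1.0 as local hubs, and grow one community per hub (the hub plus its neighbours). Because a node can neighbour several hubs, communities overlap, and each is keyed by its hub, which can serve as a title. The scoring formula and all names are assumptions, not the paper's method.

```python
def local_influence(adjacency, node):
    """Hypothetical score: degree of `node` divided by the largest degree among
    `node` and its neighbours (1.0 means `node` is a local hub)."""
    degrees = {n: len(adjacency[n]) for n in adjacency[node] | {node}}
    top = max(degrees.values())
    return degrees[node] / top if top else 0.0


def hub_communities(adjacency):
    """One community per local hub: the hub plus its neighbours, keyed by the hub
    (usable as a title). A node adjacent to several hubs joins several communities."""
    communities = {}
    for node in sorted(adjacency):
        if adjacency[node] and local_influence(adjacency, node) == 1.0:
            communities[node] = {node} | adjacency[node]
    return communities


# Two star-shaped groups joined by the bridge node "x" (undirected, symmetric adjacency).
adjacency = {
    "A": {"a1", "a2", "a3", "x"}, "a1": {"A"}, "a2": {"A"}, "a3": {"A"},
    "B": {"b1", "b2", "b3", "x"}, "b1": {"B"}, "b2": {"B"}, "b3": {"B"},
    "x": {"A", "B"},
}
for hub, members in hub_communities(adjacency).items():
    print(hub, sorted(members))
```

In this toy graph the bridge node `x` ends up in both communities, which is the kind of overlap the LFR benchmark and generalized NMI are designed to evaluate.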
