Publications

Publications by CRACS

2015

Automatic network configuration in virtualized environment using GNS3

Authors
Emiliano, R; Antunes, M;

Publication
10TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2015)

Abstract
Computer networking is a central topic in the computer science curricula offered by higher education institutions. Network virtualization and simulation tools, like GNS3, allow students and practitioners to test real-world networking configuration scenarios and to build complex network topologies by configuring virtualized equipment, such as routers and switches, through each device's virtual console. Configuring advanced network topics in GNS3 requires students to carry out basic and very repetitive IP configuration tasks on all network equipment. As the network topology grows, so does the amount of equipment to be configured, which may lead to logical configuration errors. In this paper we propose an extension to the GNS3 network virtualizer that automatically generates a valid configuration for all the network equipment in a GNS3 scenario. Our implementation automatically produces an initial IP and routing configuration for all the Cisco virtual equipment by using the GNS3 specification files. We tested this extension against a set of networked scenarios, which demonstrated the robustness, readiness and speedup of the overall configuration tasks. In a learning environment, this feature may save time for networking practitioners, both beginners and advanced, who aim to configure and test network topologies, since it automatically produces a valid and operational configuration for all the equipment designed in a GNS3 environment.
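
A minimal sketch of the kind of automation described, assuming a simplified JSON topology file with a hypothetical "links" field (real GNS3 project files are richer and differ in structure): each point-to-point link gets a /30 subnet and basic Cisco-style interface commands are emitted per router.

    import json
    import ipaddress

    # Hypothetical, simplified topology format:
    # {"links": [{"a": ["R1", "f0/0"], "b": ["R2", "f0/0"]}, ...]}
    def generate_configs(topology_path):
        with open(topology_path) as f:
            links = json.load(f)["links"]

        pool = ipaddress.ip_network("10.0.0.0/16").subnets(new_prefix=30)
        configs = {}
        for link, subnet in zip(links, pool):
            hosts = list(subnet.hosts())               # two usable addresses per /30
            for (router, iface), ip in zip((link["a"], link["b"]), hosts):
                configs.setdefault(router, []).extend([
                    f"interface {iface}",
                    f" ip address {ip} {subnet.netmask}",
                    " no shutdown",
                ])
        return configs

    if __name__ == "__main__":
        for router, lines in generate_configs("topology.json").items():
            print(f"! {router}")
            print("\n".join(lines))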

2015

The Impact of Longstanding Messages In Micro-Blogging Classification

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)

Abstract
Social networks are part of the daily routine of millions of users. Twitter, together with Facebook and Instagram, is one of the most used, and can be seen as a relevant source of information, as users share not only daily status updates but also rapidly propagate news and events that occur worldwide. Considering the dynamic nature of social networks and their potential for information spread, it is imperative to find learning strategies able to learn in these environments and cope with their dynamic nature. Time plays an important role by easily outdating information, so it is crucial to understand how informative past events can be to current learning models, and for how long it is relevant to store previously seen information, in order to avoid the computational burden associated with the amount of data produced. In this paper we study the impact of longstanding messages in micro-blogging classification by using different training time-window sizes in the learning process. Since there are few studies dealing with drift in Twitter, and thus little is known about the types of drift that may occur, we simulate different types of drift in an artificial dataset to evaluate and validate our strategy. Results shed light on the relevance of previously seen examples according to different types of drift.
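
A minimal sketch of the training time-window idea under study, assuming a stream split into time-ordered batches of labelled messages (the data layout, window sizes and model choice are illustrative, not the paper's exact setup): only the most recent `window` batches are used to train the classifier that predicts the next batch.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    def prequential_windowed(batches, window):
        """batches: list of (texts, labels) pairs ordered in time;
        window: how many past batches to keep for training."""
        accuracies = []
        for t in range(1, len(batches)):
            past = batches[max(0, t - window):t]          # drop longstanding messages
            X_train = [x for texts, _ in past for x in texts]
            y_train = [y for _, labels in past for y in labels]
            X_test, y_test = batches[t]
            model = make_pipeline(TfidfVectorizer(), LinearSVC())
            model.fit(X_train, y_train)
            accuracies.append(model.score(X_test, y_test))
        return accuracies

Comparing the accuracy curves for different values of `window` is the simplest way to see how much past information still helps under a given type of drift.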

2015

Health Twitter Big Data Management with Hadoop Framework

Authors
Cunha, J; Silva, C; Antunes, M;

Publication
CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS/INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT/CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES, CENTERIS/PROJMAN / HCIST 2015

Abstract
Social media advancements and the rapid increase in volume and complexity of the data generated by Internet services are becoming challenging not only technologically, but also in terms of application areas. Performance and availability of data processing are critical factors that need to be evaluated, since conventional data processing mechanisms may not provide adequate support. Apache Hadoop with Mahout is a framework to store and process data at large scale, including different tools to distribute processing. It has been considered an effective tool, currently used by both small and large businesses and corporations, like Google and Facebook, but also by public and private healthcare institutions. Given its recent emergence and the increasing complexity of the associated technological issues, a variety of holistic framework solutions have been put forward for each specific application. In this work, we propose a generic functional architecture with the Apache Hadoop framework and Mahout for handling, storing and analyzing big data, which can be used in different scenarios. To demonstrate its value, we show its features, advantages and applications on health Twitter data. We show that big health social data can generate important information, valuable both for common users and practitioners. Preliminary results of data analysis on Twitter health data using Apache Hadoop demonstrate the potential of the combination of these technologies. (C) 2015 The Authors. Published by Elsevier B.V.
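
A minimal sketch of the kind of distributed processing such an architecture relies on, written as a Hadoop Streaming mapper and reducer in one Python file; the input format (one tweet's text per line), the keyword list and the file name are assumptions for illustration, not the paper's actual pipeline.

    #!/usr/bin/env python
    # health_wordcount.py -- hypothetical Hadoop Streaming job counting health terms in tweets
    import sys

    KEYWORDS = {"flu", "fever", "diabetes", "headache"}   # illustrative keyword list

    def mapper():
        # emit (keyword, 1) for each health-related term found in a tweet
        for line in sys.stdin:
            for token in line.lower().split():
                if token in KEYWORDS:
                    print(f"{token}\t1")

    def reducer():
        # sum counts per keyword; Hadoop delivers keys already grouped/sorted
        current, total = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t")
            if key != current:
                if current is not None:
                    print(f"{current}\t{total}")
                current, total = key, 0
            total += int(value)
        if current is not None:
            print(f"{current}\t{total}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()

Locally the job can be smoke-tested with a pipeline such as cat tweets.txt | python health_wordcount.py map | sort | python health_wordcount.py; on a cluster it would be submitted through the Hadoop Streaming jar with the usual -mapper and -reducer options.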

2015

Active Manifold Learning with Twitter Big Data

Authors
Silva, C; Antunes, M; Costa, J; Ribeiro, B;

Publication
INNS CONFERENCE ON BIG DATA 2015 PROGRAM

Abstract
The data produced by Internet applications have increased substantially. Big data is a flourishing field that deals with this deluge of data by using storage techniques, dedicated infrastructures and development frameworks for parallelizing tasks and subsequently reducing the data. These solutions nevertheless fall short in online and highly data-demanding scenarios, since users expect swift feedback. Reduction techniques are efficiently used in online big data applications to improve classification problems. Reduction in big data usually falls into one of two main methods: (i) reduce the dimensionality by pruning or reformulating the feature set; (ii) reduce the sample size by choosing the most relevant examples. Both approaches have benefits, not only in the time needed to build a model, but eventually also performance-wise, usually by reducing overfitting and improving generalization capabilities. In this paper we investigate reduction techniques that tackle both the dimensionality and the size of big data. We propose a framework that combines a manifold learning approach to reduce dimensionality and an active learning SVM-based strategy to reduce the size of the labeled sample. Results on Twitter data show the potential of the proposed active manifold learning approach.
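
A minimal sketch of the two reduction steps combined in such a framework, using scikit-learn as a stand-in (the manifold method, the uncertainty criterion and the labeling budget below are illustrative assumptions, not the authors' exact configuration): dimensionality is reduced with a manifold embedding, then an SVM repeatedly queries labels for the examples closest to its decision boundary.

    import numpy as np
    from sklearn.manifold import Isomap
    from sklearn.svm import SVC

    def active_manifold_learning(X, y, n_components=10, seed_size=20, budget=100):
        """X: feature matrix; y: NumPy array of binary labels, used only as a simulated oracle."""
        X_low = Isomap(n_components=n_components).fit_transform(X)   # manifold reduction

        rng = np.random.default_rng(0)
        labeled = list(rng.choice(len(X_low), size=seed_size, replace=False))
        for _ in range(budget):
            clf = SVC(kernel="linear").fit(X_low[labeled], y[labeled])
            margins = np.abs(clf.decision_function(X_low))           # distance to boundary
            candidates = np.argsort(margins)                         # most uncertain first
            query = next(i for i in candidates if i not in labeled)
            labeled.append(query)                                    # "ask the oracle" for y[query]
        return clf, labeled

Only the examples in `labeled` ever need a human label, which is where the reduction in sample size comes from.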

2015

DOTS: Drift Oriented Tool System

Authors
Costa, J; Silva, C; Antunes, M; Ribeiro, B;

Publication
NEURAL INFORMATION PROCESSING, ICONIP 2015, PT IV

Abstract
Drift is a given in most machine learning applications. The idea that models must accommodate change, and thus be dynamic, is ubiquitous. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. There are multiple types of drift patterns: concepts that appear and disappear suddenly, recurrently, or even gradually or incrementally. Researchers strive to propose and test algorithms and techniques to deal with drift in text classification, but it is difficult to find adequate benchmarks for such dynamic environments. In this paper we present DOTS, the Drift Oriented Tool System, a framework that allows for the definition and generation of text-based datasets where drift characteristics can be thoroughly defined, implemented and tested. The usefulness of DOTS is presented using a Twitter stream case study: DOTS is used to define datasets and test the effectiveness of using different document representations in a Twitter scenario. Results show the potential of DOTS in machine learning research.
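
A minimal sketch of drift-oriented dataset generation in the spirit of what such a tool provides (the vocabularies, drift schedule and generator below are hypothetical, not the actual DOTS interface): a stream of short messages whose dominant concept either switches suddenly at a chosen point or blends in gradually over a transition period.

    import random

    CONCEPTS = {
        "sports": ["match", "goal", "league", "coach"],
        "politics": ["election", "vote", "parliament", "policy"],
    }   # illustrative vocabularies

    def make_stream(n_messages=1000, drift_at=500, gradual=0):
        """Generate (text, label) pairs with a sudden drift at `drift_at`,
        or a gradual one spread over `gradual` messages around that point."""
        stream = []
        for t in range(n_messages):
            if gradual:
                p_new = min(max((t - drift_at + gradual / 2) / gradual, 0.0), 1.0)
            else:
                p_new = 1.0 if t >= drift_at else 0.0
            label = "politics" if random.random() < p_new else "sports"
            text = " ".join(random.choices(CONCEPTS[label], k=8))
            stream.append((text, label))
        return stream

Recurring or incremental drift patterns follow the same idea, with the mixing probability `p_new` driven by a periodic or slowly varying schedule instead.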

2015

Performance Evaluation of Statistical Functions

Authors
Rodrigues, A; Silva, C; Borges, P; Silva, S; Dutra, I;

Publication
2015 IEEE INTERNATIONAL CONFERENCE ON SMART CITY/SOCIALCOM/SUSTAINCOM (SMARTCITY)

Abstract
Statistical data analysis methods are well known for their difficulty in handling large numbers of instances or parameters. This is most noticeable in the presence of "big data", i.e., data that are heterogeneous and come from several sources, which makes their volume increase very rapidly. In this paper, we study popular and well-known statistical functions generally applied to data analysis and assess their performance using our own implementation (DataIP), MatLab and R. We show that DataIP outperforms MatLab and R by several orders of magnitude and that the design and implementation of these functions need to be rethought to adapt to today's data challenges.
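
A minimal sketch of the kind of micro-benchmark such a comparison relies on, timing a common statistical function over increasingly large inputs; the sizes, the chosen function and the NumPy baseline are illustrative assumptions, since DataIP itself is not publicly callable here.

    import timeit
    import numpy as np

    def benchmark_mean(sizes=(10**5, 10**6, 10**7), repeats=5):
        """Time the sample mean over arrays of growing size and report the best run."""
        rng = np.random.default_rng(42)
        for n in sizes:
            data = rng.random(n)
            best = min(timeit.repeat(lambda: data.mean(), number=10, repeat=repeats))
            print(f"n={n:>9,d}  best of {repeats}: {best:.4f}s for 10 calls")

    if __name__ == "__main__":
        benchmark_mean()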
