
Publications by CRACS

2016

A Lock-Free Hash Trie Design for Concurrent Tabled Logic Programs

Authors
Areias, M; Rocha, R;

Publication
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

Abstract
Tabling is an implementation technique that improves the declarativeness and expressiveness of Prolog systems in dealing with recursion and redundant sub-computations. A critical component in the design of a concurrent tabling system is the implementation of the table space. One of the most successful proposals for representing tables is based on a two-level trie data structure, where one trie level stores the tabled subgoal calls and the other stores the computed answers. In previous work, we presented a sophisticated lock-free design where both levels of the tries were shared among threads in a concurrent environment. To implement lock-freedom we used the CAS atomic instruction, which is nowadays widely available on common architectures. CAS reduces the granularity of the synchronization when threads access concurrent areas, but still suffers from problems such as false sharing or cache memory effects. In this work, we present a simpler and more efficient lock-free design based on hash tries that minimizes these problems by dispersing the concurrent areas as much as possible. Experimental results in the Yap Prolog system show that our new lock-free design effectively reduces the execution time and scales better than previous designs.
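As a rough illustration of the CAS-based synchronization mentioned above (hypothetical names, not the actual Yap hash-trie code), the following C++ sketch links a new node at the head of a bucket chain with a compare-and-swap retry loop instead of a lock.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical bucket node for a hash-trie leaf chain; the published design
// differs, this only illustrates lock-free insertion with compare-and-swap.
struct Node {
    std::uint64_t key;
    Node* next;
};

// Insert a new key at the head of a bucket. Threads race on the bucket head;
// the CAS retries until the new node is linked, with no lock taken.
void lockfree_insert(std::atomic<Node*>& bucket_head, std::uint64_t key) {
    Node* node = new Node{key, bucket_head.load(std::memory_order_relaxed)};
    while (!bucket_head.compare_exchange_weak(
               node->next, node,
               std::memory_order_release, std::memory_order_relaxed)) {
        // On failure, node->next has been refreshed with the current head; retry.
    }
}
```

Confining each CAS to a single bucket pointer keeps contention local to threads that hash to the same bucket, which is the spirit of dispersing the concurrent areas; node removal, hash-level expansion and ABA hazards are omitted in this sketch.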

2016

Estimation-Based Search Space Traversal in PILP Environments

Authors
Real, JC; Dutra, I; Rocha, R;

Publication
Inductive Logic Programming - 26th International Conference, ILP 2016, London, UK, September 4-6, 2016, Revised Selected Papers

Abstract
Probabilistic Inductive Logic Programming (PILP) systems extend ILP by allowing the world to be represented using probabilistic facts and rules, and by learning probabilistic theories that can be used to make predictions. However, such systems can be inefficient both due to the large search space inherited from the ILP algorithm and to the probabilistic evaluation needed whenever a new candidate theory is generated. To address the latter issue, this work introduces probability estimators aimed at improving the efficiency of PILP systems. An estimator can avoid the computational cost of probabilistic theory evaluation by providing an estimate of the value of the combination of two subtheories. Experiments are performed on three real-world datasets from different areas (biology, medicine and the web) and show that, by reducing the number of theories to be evaluated, the estimators can significantly shorten the execution time without losing probabilistic accuracy.
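The abstract leaves the estimators themselves unspecified; purely as an illustration of the idea, the C++ sketch below estimates the probability of a combination of two subtheories from their individual, already-computed probabilities under an independence assumption (an assumption introduced here for the example, not taken from the paper), so that only promising combinations need the full probabilistic evaluation.

```cpp
// Hypothetical estimator: predict the probability of the disjunction of two
// subtheories from their individual probabilities, assuming independence.
// The estimate replaces the expensive evaluation of the merged theory when
// ranking candidate combinations; only the best-ranked ones are evaluated exactly.
double estimate_combined(double p1, double p2) {
    // P(T1 or T2) = P(T1) + P(T2) - P(T1)P(T2) under the independence assumption.
    return p1 + p2 - p1 * p2;
}
```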

2016

Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems

Authors
Camacho, R; Barbosa, JG; Sampaio, AM; Ladeiras, J; Fonseca, NA; Costa, VS;

Publication
Resource Management for Big Data Platforms - Algorithms, Modelling, and High-Performance Computing Techniques

Abstract

2016

Processing Markov Logic Networks with GPUs: Accelerating Network Grounding

Authors
Martinez Angeles, CA; Dutra, I; Costa, VS; Buenabad Chavez, J;

Publication
INDUCTIVE LOGIC PROGRAMMING, ILP 2015

Abstract
Markov Logic is an expressive and widely used knowledge representation formalism that combines logic and probabilities, providing a powerful framework for inference and learning tasks. Most Markov Logic implementations perform inference by transforming the logic representation into a set of weighted propositional formulae that encode a Markov network, the ground Markov network. Probabilistic inference is then performed over the grounded network. Constructing, simplifying, and evaluating the network are the main steps of the inference phase. As the size of a Markov network can grow rather quickly, Markov Logic Network (MLN) inference can become very expensive, motivating a rich vein of research on the optimization of MLN performance. We claim that parallelism can play a large role in this task. Namely, we demonstrate that widely available Graphics Processing Units (GPUs) can be used to improve the performance of a state-of-the-art MLN system, Tuffy, with minimal changes. Indeed, comparing the performance of our GPU-based system, TuGPU, to that of the Alchemy, Tuffy and RockIt systems on three widely used applications shows that TuGPU is up to 15 times faster than the other systems.
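As a toy illustration of the grounding step described above (not Tuffy's or TuGPU's actual code; the predicate and constant names are made up), the C++ sketch below enumerates the ground instances of a single weighted clause, Smokes(x) => Cancer(x), over a small domain of constants. Each ground clause is produced independently of the others, which is why grounding maps well onto GPU threads.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical ground instance of the weighted formula  Smokes(x) => Cancer(x).
struct GroundClause {
    std::string x;   // constant substituted for the variable x
    double weight;   // weight of the first-order formula
};

int main() {
    // Toy domain of constants; in a real MLN the domain comes from the evidence.
    std::vector<std::string> constants = {"anna", "bob", "carol"};
    const double w = 1.5;

    // Grounding: one ground clause per substitution of x. Each iteration is
    // independent, so the loop can be distributed over GPU threads.
    std::vector<GroundClause> network;
    for (const auto& c : constants)
        network.push_back({c, w});

    for (const auto& g : network)
        std::cout << g.weight << "  Smokes(" << g.x
                  << ") => Cancer(" << g.x << ")\n";
}
```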

2016

Predicting Wildfires: Propositional and Relational Spatio-Temporal Pre-processing Approaches

Authors
Oliveira, M; Torgo, L; Costa, VS;

Publication
DISCOVERY SCIENCE, (DS 2016)

Abstract
We present and evaluate two different methods for building spatio-temporal features: a propositional method and a method based on propositionalisation of relational clauses. Our motivating application, a regression problem, requires the prediction of the fraction of each Portuguese parish burnt yearly by wildfires, a problem with a strong socio-economic and environmental impact in the country. We evaluate and compare how these methods perform individually and combined together. We successfully use under-sampling to deal with the high skew in the data set. We find that combining the approaches significantly improves on the similar results obtained by each method individually.
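The abstract does not detail the under-sampling scheme, so the C++ sketch below only illustrates one common variant under assumed, illustrative names: keep every parish-year with a non-zero burnt fraction and only a random subset of the dominant zero-burn cases, reducing the skew of the regression target.

```cpp
#include <random>
#include <vector>

// Hypothetical training case: the yearly burnt fraction of one parish plus its
// spatio-temporal features (contents omitted; names are illustrative only).
struct Case {
    double burnt_fraction;          // regression target, zero for most parish-years
    std::vector<double> features;   // propositional or propositionalised features
};

// Random under-sampling of the dominant zero-target cases: keep every case with
// a non-zero target, and each zero-target case with probability keep_ratio.
std::vector<Case> undersample(const std::vector<Case>& data,
                              double keep_ratio, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::bernoulli_distribution keep(keep_ratio);
    std::vector<Case> out;
    for (const auto& c : data)
        if (c.burnt_fraction > 0.0 || keep(gen))
            out.push_back(c);
    return out;
}
```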

2016

Relational Learning with GPUs: Accelerating Rule Coverage

Authors
Martinez Angeles, CA; Wu, HC; Dutra, I; Costa, VS; Buenabad Chavez, J;

Publication
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

Abstract
Relational learning algorithms mine complex databases for interesting patterns. Usually, the search space of patterns grows very quickly with the increase in data size, making it impractical to solve important problems. In this work we present the design of a relational learning system that takes advantage of graphics processing units (GPUs) to perform the most time-consuming function of the learner, rule coverage. To evaluate performance, we use four applications: a widely used relational learning benchmark for predicting carcinogenesis in rodents, an application in chemo-informatics, an application in opinion mining, and an application in mining health record data. We compare results using a single CPU, multiple CPUs in a multicore host, and the GPU version. Results show that the GPU version of the learner is up to eight times faster than the best CPU version.
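Rule coverage, as used above, amounts to counting how many examples a candidate rule succeeds on. A minimal C++ sketch with a deliberately simplified, hypothetical rule representation (not the system's actual one) shows the core of the computation: every example is tested for every candidate rule, and each test is independent, so the loop can be split across CPU cores or GPU threads.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical example encoding: a fixed-width vector of attribute values.
using Example = std::vector<int>;
// A rule is abstracted here as a predicate over a single example.
using Rule = std::function<bool(const Example&)>;

// Coverage = number of examples the candidate rule succeeds on. Each test is
// independent of the others, which makes the loop easy to parallelise.
std::size_t coverage(const Rule& rule, const std::vector<Example>& examples) {
    std::size_t covered = 0;
    for (const auto& e : examples)
        if (rule(e)) ++covered;
    return covered;
}
```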
