Publications

Publications by AI

2017

Comparison Between Co-training and Self-training for Single-target Regression in Data Streams using AMRules

Authors
Sousa, R; Gama, J;

Publication
Proceedings of the Workshop on IoT Large Scale Learning from Data Streams co-located with the 2017 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2017), Skopje, Macedonia, September 18-22, 2017.

Abstract
A comparison between co-training and self-training methods for single-target regression based on multiple learners is performed. Data streaming systems can produce significant amounts of unlabeled data, because labels may be impossible to assign, expensive to obtain, or only available after long labeling tasks. In supervised learning, this data is wasted. To take advantage of unlabeled data, semi-supervised approaches such as Co-training and Self-training have been created to exploit the input information it contains. However, these approaches have been applied to classification and batch training scenarios. For these reasons, this paper presents a comparison between Co-training and Self-training methods for single-target regression in data streams. Rule learning is used in this context, since this methodology makes it possible to explore the input information. The experimental evaluation compared the standard scenario, in which all unlabeled data is discarded, with scenarios in which unlabeled data is used to improve the regression model. The results show evidence of better performance, in terms of error reduction, when the proportion of unlabeled examples in the stream is high. Even so, the improvements are not substantial.
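
A minimal sketch of the self-training idea described above, under stated assumptions: scikit-learn's SGDRegressor stands in for the incremental base learner (the paper uses AMRules, which is not shown here), and confidence on an unlabeled example is approximated by the agreement of a small ensemble; the threshold is illustrative.

```python
# Semi-supervised single-target regression on a data stream: a sketch.
# Assumptions (not from the paper): SGDRegressor as the base learner and a
# confidence heuristic based on the spread of a small ensemble of learners.
import numpy as np
from sklearn.linear_model import SGDRegressor

ensemble = [SGDRegressor(random_state=seed) for seed in range(3)]
CONF_THRESHOLD = 0.1  # illustrative: maximum allowed disagreement between members

def process(x, y=None):
    """Update the learners with one labeled (y given) or unlabeled stream example."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    if y is not None:                          # labeled example: ordinary supervised update
        for model in ensemble:
            model.partial_fit(x, [y])
        return
    try:
        preds = np.array([model.predict(x)[0] for model in ensemble])
    except Exception:                          # learners not fitted yet: skip unlabeled example
        return
    if preds.std() < CONF_THRESHOLD:           # learners agree: trust the pseudo-label
        pseudo_y = preds.mean()
        for model in ensemble:
            model.partial_fit(x, [pseudo_y])   # semi-supervised update on unlabeled data
```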

2016

Sequential anomalies: a study in the Railway Industry

Authors
Ribeiro, RP; Pereira, P; Gama, J;

Publication
MACHINE LEARNING

Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact both at a technical and at a financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets and build predictive systems that can detect a failure as it starts to evolve and issue an alert, avoiding an unnoticed progression up to breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen, using data from the doors' logging system. We use sensor data from the pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown; a cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a single cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection, and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.
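
The two-stage approach above (point-anomaly detection followed by a low-pass filter, so that only sequences of failed cycles raise an alarm) can be sketched as follows; the moving-average filter, window length, and threshold are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch of the two-stage detection: per-cycle point-anomaly flags are smoothed
# by a low-pass filter (here a simple moving average), and an alarm is raised
# only when the smoothed score stays high, i.e. for sequences of failed cycles
# rather than isolated ones. Window size and threshold are illustrative.
from collections import deque

WINDOW = 20      # number of recent door cycles considered
THRESHOLD = 0.6  # fraction of anomalous cycles that triggers an alarm

def sequence_alarms(point_flags):
    """Yield (cycle_index, alarm) pairs from a stream of 0/1 point-anomaly flags."""
    window = deque(maxlen=WINDOW)
    for i, flag in enumerate(point_flags):
        window.append(flag)
        smoothed = sum(window) / len(window)   # low-pass filtered failure rate
        yield i, smoothed >= THRESHOLD

# Isolated cycle failures (e.g. user interaction) do not trigger an alarm,
# but a long run of failed cycles does.
flags = [0] * 30 + [1, 0, 0, 0, 0] * 4 + [1] * 25
print([i for i, alarm in sequence_alarms(flags) if alarm])
```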

2016

Online Multi-label Classification with Adaptive Model Rules

Authors
Sousa, R; Gama, J;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016

Abstract
Interest in online classification has been increasing with the growth of data stream systems, and the need for Multi-label Classification applications has followed the same trend. However, most classification methods do not operate online. Moreover, data streams produce huge amounts of data, and the available processing resources may not be sufficient. This work-in-progress paper proposes an algorithm for Multi-label Classification in data stream scenarios. The proposed method is derived from the multi-target structured regressor AMRules and produces models using subsets of output attributes (output specialization strategy). Performance tests were conducted in which the global, local, and subset operation modes of the proposed method were compared to each other and to other online multi-label classifiers described in the literature. Three datasets from real scenarios were used for evaluation. The results indicate that the subset specialization mode is competitive with the local and global approaches and with other online multi-label classifiers.
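
As a point of reference for the decomposition strategies mentioned above, the sketch below shows the simplest "local" mode (one incremental binary classifier per label); the AMRules-based subset specialization itself is not reproduced, and scikit-learn's SGDClassifier and the label count are illustrative stand-ins.

```python
# Online multi-label classification, "local" decomposition: one incremental
# binary classifier per label, updated example by example. Illustrative
# stand-in for the rule-based learners discussed in the paper.
import numpy as np
from sklearn.linear_model import SGDClassifier

N_LABELS = 3  # illustrative number of output labels
models = [SGDClassifier() for _ in range(N_LABELS)]

def learn_one(x, y):
    """x: feature vector; y: 0/1 relevance vector of length N_LABELS."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    for model, label in zip(models, y):
        model.partial_fit(x, [label], classes=[0, 1])  # incremental (online) update

def predict_one(x):
    """Return the predicted 0/1 relevance vector for one example."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    return [int(model.predict(x)[0]) for model in models]
```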

2015

An overview on the exploitation of time in collaborative filtering

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Classic Collaborative Filtering (CF) algorithms rely on the assumption that data are static, and they usually disregard the temporal effects in natural user-generated data. These temporal effects include user preference drifts and shifts, seasonal effects, the inclusion of new users, items entering the system and old ones leaving, fluctuations in user and item activity rates, and other similar time-related phenomena. These phenomena continuously change the underlying relations between users and items that recommendation algorithms essentially try to capture. In the past few years, a new generation of CF algorithms has emerged, using the time dimension as a key factor to improve recommendation models. In this overview, we present a comprehensive analysis of these algorithms and identify important challenges to be faced in the near future. © 2015 John Wiley & Sons, Ltd.
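
One simple example of the kind of time-aware mechanism surveyed here is an exponential time decay on past rating events, so that recent preferences weigh more than old ones; this is a generic sketch, not a specific algorithm from the overview, and the half-life value is an illustrative assumption.

```python
# Exponential time decay for collaborative filtering events: each observed
# (user, item, rating, timestamp) tuple is weighted so that recent preferences
# dominate the model. The half-life value is an illustrative assumption.
import time

HALF_LIFE_DAYS = 30.0  # illustrative: an event's weight halves every 30 days

def decay_weight(event_ts, now):
    """Weight of a rating observed at event_ts (Unix seconds), evaluated at `now`."""
    age_days = (now - event_ts) / 86400.0
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

now = time.time()
old_ts = now - 60 * 86400            # a rating from 60 days ago...
print(decay_weight(old_ts, now))     # ...counts roughly a quarter as much (about 0.25)
```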

2013

Rule Induction for Sentence Reduction

Authors
Cordeiro, J; Dias, G; Brazdil, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2013

Abstract
Sentence Reduction has recently received great attention from the Automatic Text Summarization research community. Sentence Reduction consists of eliminating sentence components such as words, part-of-speech tag sequences, or chunks without severely degrading the information contained in the sentence or its grammatical correctness. In this paper, we present an unsupervised, scalable methodology for learning sentence reduction rules. Paraphrases are first discovered within a collection of automatically crawled Web News Stories and then textually aligned in order to extract candidate interchangeable text fragments, in particular reduction cases. As only positive examples exist, Inductive Logic Programming (ILP) provides an interesting learning paradigm for the extraction of sentence reduction rules. As a consequence, reduction cases are transformed into first-order logic clauses to supply a massive set of suitable learning instances, and an ILP learning environment is defined within the context of the Aleph framework. Experiments show good results in terms of irrelevancy elimination, syntactic correctness, and reduction rate in a real-world environment, as opposed to other methodologies proposed so far.
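
To make the "positive examples as first-order clauses" step concrete, the sketch below turns an aligned sentence pair into Prolog-style facts that an ILP system such as Aleph could consume; the predicate names and representation are illustrative assumptions, not the encoding used in the paper.

```python
# Sketch: encode a reduction case (aligned full vs. reduced sentence) as
# Prolog-style facts for an ILP learner. Predicate names are illustrative.
def reduction_case_facts(case_id, full_tokens, reduced_tokens):
    """Return facts stating which tokens were kept and which were dropped."""
    reduced_set = set(reduced_tokens)
    facts = [f"reduction_case({case_id})."]
    facts += [f"kept({case_id}, '{tok}')." for tok in reduced_tokens]
    facts += [f"dropped({case_id}, '{tok}')." for tok in full_tokens if tok not in reduced_set]
    return "\n".join(facts)

print(reduction_case_facts(
    "c1",
    ["the", "very", "old", "bridge", "collapsed", "yesterday"],
    ["the", "bridge", "collapsed"]))
```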
