2016
Authors
Ribeiro, RP; Pereira, P; Gama, J;
Publication
MACHINE LEARNING
Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact at both the technical and the financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets and build predictive systems that detect a failure as it starts to develop and issue an alert, so that it does not progress unnoticed to a breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen, using data from the doors' logging system. We use sensor data from the pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown: a cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, which show up not as a single cycle failure but as sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection and novelty detection, using different windowing strategies. We propose a two-stage approach, in which the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.
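The two-stage idea in this abstract can be illustrated with a minimal sketch. It assumes the first stage has already produced one 0/1 flag per door cycle, and it uses an exponential moving average as the low-pass filter; the function name `low_pass_alarms`, the smoothing factor `alpha`, and the `threshold` are hypothetical choices for illustration, not the filter or parameters used in the paper.

```python
def low_pass_alarms(point_flags, alpha=0.3, threshold=0.5):
    """Second stage: smooth per-cycle anomaly flags and raise alarms.

    point_flags: iterable of 0/1 outputs from a point-anomaly detector
                 (assumed to exist; not implemented here).
    alpha:       smoothing factor of the exponential moving average
                 (hypothetical value).
    threshold:   smoothed level above which an alarm is raised
                 (hypothetical value).
    """
    smoothed = 0.0
    alarms = []
    for flag in point_flags:
        # Exponential moving average acts as a simple low-pass filter:
        # isolated spikes are damped, sustained runs accumulate.
        smoothed = alpha * flag + (1 - alpha) * smoothed
        alarms.append(smoothed > threshold)
    return alarms


# An isolated cycle failure (e.g. a passenger blocking the door) stays silent,
# while a run of consecutive failures triggers an alarm.
isolated = low_pass_alarms([0, 0, 1, 0, 0, 1, 0, 0])
sustained = low_pass_alarms([0, 1, 1, 1, 1])
```

With these illustrative settings, a single failed cycle never pushes the smoothed value past the threshold, which is the mechanism by which a low-pass stage can affect the false alarm rate.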
2016
Authors
Branco, P; Torgo, L; Ribeiro, RP;
Publication
ACM COMPUTING SURVEYS
Abstract
Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes). Moreover, the events may have different costs and benefits, which, when combined with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.
2016
Authors
Branco, P; Ribeiro, RP; Torgo, L;
Publication
CoRR
Abstract
2016
Authors
Zarmehri, MN; Soares, C;
Publication
COLLABORATION IN A HYPERCONNECTED WORLD
Abstract
Taxi trip duration affects the efficiency of operation, the satisfaction of drivers and, above all, the satisfaction of customers; it is therefore an important metric for taxi companies. In particular, knowing the predicted trip duration beforehand is very useful for allocating taxis to taxi stands and for finding the best route for each trip. A hyperconnected network can help collect data from connected taxis in the city environment and share it among taxis to obtain better predictions. Given the high volume of data available for each individual taxi, several models can be generated. Moreover, taking into account the differences between the data collected by the taxis, this data can be organized into different levels of a hierarchy. However, finding the level of granularity that leads to the best model for an individual taxi can be computationally expensive. In this paper, we propose the use of metalearning to select the right level of the hierarchy and the right algorithm, that is, those that generate the best-performing model for each taxi. The proposed approach is evaluated on data collected in the Drive-In project. The results show that metalearning helps select the algorithm with the best performance.
2016
Authors
Cerqueira, V; Pinto, F; Sa, C; Soares, C;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV
Abstract
We describe a data mining workflow for predictive maintenance of the Air Pressure System in heavy trucks. Our approach is composed of four steps: (i) a filter that excludes a subset of features and examples based on the number of missing values; (ii) a metafeature engineering procedure used to create a meta-level feature set with the goal of increasing the information in the original data; (iii) a biased sampling method to deal with the class imbalance problem; and (iv) boosted trees to learn the target concept. Results show that the metafeature engineering and the biased sampling method are critical for improving the performance of the classifier.
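As a rough illustration of step (i) only, the sketch below drops features and then examples whose share of missing values exceeds a cutoff. The abstract does not state the actual criteria, so the function name `filter_missing`, the use of `None` to mark a missing value, and both cutoffs are assumptions made for this example.

```python
def filter_missing(rows, max_col_missing=0.5, max_row_missing=0.5):
    """Drop features, then examples, with too many missing values.

    rows:            list of equal-length lists; None marks a missing value
                     (an assumed encoding).
    max_col_missing: largest tolerated missing fraction per feature
                     (hypothetical cutoff).
    max_row_missing: largest tolerated missing fraction per example,
                     measured on the retained features (hypothetical cutoff).
    Returns the indices of the retained features and the filtered rows.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0

    # Keep only the features whose missing fraction is within the cutoff.
    keep_cols = [
        j for j in range(n_cols)
        if sum(r[j] is None for r in rows) / n_rows <= max_col_missing
    ]

    # Keep only the examples that are sufficiently complete on those features.
    filtered = []
    for r in rows:
        kept = [r[j] for j in keep_cols]
        if kept and sum(v is None for v in kept) / len(kept) <= max_row_missing:
            filtered.append(kept)
    return keep_cols, filtered


# Toy data: the second feature is always missing, and the second example
# is missing half of the remaining values.
data = [[1, None, 3], [4, None, None], [7, None, 9]]
cols, clean = filter_missing(data, max_col_missing=0.5, max_row_missing=0.4)
```

Filtering features before examples is a design choice of this sketch: an example is then judged only on the features that survive, so a uniformly empty feature does not cause every example to be discarded.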
2016
Authors
Lopes, MA; Soares, C; Almeida, A; Almada Lobo, B;
Publication
HEALTH SYSTEMS
Abstract
With rising healthcare costs, using health personnel and resources efficiently and effectively is critical. International cross-country comparisons and simple worker-to-population ratio comparisons are frequently used for improving the efficiency of health systems, planning health human resources and guiding policy changes. These comparisons are typically made between countries of the same continental region. However, if used imprudently, inconsistencies arising from weak comparisons of health systems may outweigh the benefits brought by new policy insights. In this work, we propose a different approach to international health system comparisons. We present a methodology to group similar countries in terms of mortality, morbidity, utilisation levels, and human and physical resources, which are all factors that influence health gains. Instead of constructing an absolute rank or comparing against the average, the method finds countries that share similar ground, upon which more reliable comparisons can then be conducted, including performance analysis. We apply this methodology using data from the World Health Organization's Health for All database, and we present some empirical relationships between indicators that may provide new insights into how such information can be used to promote better healthcare planning and policy guidance.