Publications

Publications by LIAAD

2015

Validating the coverage of bus schedules: A Machine Learning approach

Authors
Mendes Moreira, J; Moreira Matias, L; Gama, J; de Sousa, JF;

Publication
INFORMATION SCIENCES

Abstract
Nowadays, every public transportation company uses Automatic Vehicle Location (AVL) systems to track the services provided by each vehicle. Such information can be used to improve operational planning. This paper describes an AVL-based evaluation framework to test whether the actual Schedule Plan fits the network's operational conditions in terms of the days covered by each schedule. Firstly, clustering is employed to group days with similar travel-time profiles (this is done separately for each route). Secondly, consensus clustering is used to obtain a unique set of clusters for all routes. Finally, a set of rules describing the content of the groups is drawn based on appropriate decision variables. Each group corresponds to a different schedule, and the rules identify the days covered by each schedule. This methodology is simultaneously an evaluator of the schedules offered by the company (regarding their coverage) and an advisor on possible changes to that offer. It was tested using data collected over one year in a company operating in Porto, Portugal. The results are sound. The main contribution of this paper is that it proposes a way to combine Machine Learning techniques to add a novel dimension to Schedule Plan evaluation methods: the day coverage. Such an approach has no parallel in the current literature.

2015

Visualization of Evolving Large Scale Ego-Networks

Authors
Sarmento, R; Cordeiro, M; Gama, J;

Publication
30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II

Abstract
Streaming and visualization of large scale social networks has been a hot topic in recent research. Researchers strive to achieve efficient streaming methods and to be able to gather knowledge from the results. Moreover, treating the data as a continuous real-time flow answers a demand for immediate response to events in daily life. Our contribution is to treat the data as a continuous stream and represent it by streaming the egocentric networks (Ego-Networks) of particular nodes. We propose a non-standard node forgetting factor in the representation of the network data stream. Thus, this representation is sensitive to recent events in users' networks and less sensitive to past node events. The aim of these techniques is the visualization of large scale Ego-Networks from telecommunications social networks with power law distributions.
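
The abstract does not define its "non-standard" forgetting factor, so the following is only a minimal sketch of the general idea, assuming plain exponential decay. The function name `step` and the parameters `alpha` and `threshold` are hypothetical and do not come from the paper.

```python
def step(weights, active_nodes, alpha=0.9, threshold=0.1):
    """Advance the ego-network representation by one time step.

    weights: dict mapping node -> current weight in [0, 1].
    active_nodes: nodes seen in the stream during this step.
    alpha, threshold: illustrative values, not taken from the paper.
    """
    # Fade every known node: older events count less.
    decayed = {node: w * alpha for node, w in weights.items()}
    # Nodes with a recent event are restored to full weight.
    for node in active_nodes:
        decayed[node] = 1.0
    # Forget nodes whose weight has faded below the threshold.
    return {node: w for node, w in decayed.items() if w >= threshold}
```

Calling `step` repeatedly keeps recently active nodes in the drawn ego-network while nodes with no recent events fade out and are eventually dropped.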

2015

Very fast decision rules for classification in data streams

Authors
Kosina, P; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Many decision tasks can be formulated as stream mining problems and therefore many new algorithms for data streams are being proposed. Decision rules are one of the most interpretable and flexible models for predictive data mining. Nevertheless, few algorithms have been proposed in the literature to learn rule models for time-changing and high-speed flows of data. In this paper we present the very fast decision rules (VFDR) algorithm and discuss interesting extensions to the base version. All the proposed versions are one-pass and any-time algorithms. They work on-line and learn ordered or unordered rule sets. Algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. In order to manage these situations we also present the adaptive extension (AVFDR) to detect changes in the process generating data and adapt the decision model. Detecting local drifts takes advantage of the modularity of the rule sets. In AVFDR, each individual rule monitors the evolution of performance metrics to detect concept drift. AVFDR prunes rules whenever a drift is signaled. This explicit change detection mechanism provides useful information about the dynamics of the process generating data, faster adaptation to changes and generates more compact rule sets. The experimental evaluation demonstrates that algorithms achieve competitive results in comparison to alternative methods and the adaptive methods are able to learn fast and compact rule sets from evolving streams.
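
The abstract says that each rule monitors its own performance to detect drift but does not name the test. As an illustration only, the sketch below assumes a statistical process control check on the rule's error rate in the style of the DDM detector. The class name `RuleDriftMonitor`, the `min_samples` default, and the 2-sigma and 3-sigma levels are assumptions, not details from the paper.

```python
import math


class RuleDriftMonitor:
    """Tracks one rule's error rate and flags warning or drift (assumed scheme)."""

    def __init__(self, min_samples=30):
        self.n = 0
        self.errors = 0
        self.min_samples = min_samples
        self.p_min = math.inf  # error rate at the best point seen so far
        self.s_min = math.inf  # its standard deviation

    def update(self, is_error):
        """Feed one prediction outcome; return the current state as a string."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "in_control"
        # Remember the lowest p + s observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            return "drift"  # the rule would be pruned here
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "in_control"
```

Keeping one monitor per rule means that a local change affects only the rules covering the drifting region, which is the modularity argument the abstract makes.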

2015

Streaming networks sampling using top-K networks

Authors
Sarmento, R; Cordeiro, M; Gama, J;

Publication
ICEIS 2015 - 17th International Conference on Enterprise Information Systems, Proceedings

Abstract
The combination of a top-K network representation of the data stream with community detection is a novel approach to streaming network sampling. The method keeps an always up-to-date sample of the full network; its advantage over previous methods is that it preserves larger communities and the original network distribution. Empirically, it will also be shown that these techniques, in conjunction with community detection, provide effective ways to perform sampling and analysis of large scale streaming networks with power law distributions.

2015

Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality

Authors
Saez, C; Rodrigues, P; Gama, J; Robles, M; Garcia Gomez, JM;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Knowledge discovery on biomedical data can be based on on-line, data-stream analyses, or using retrospective, timestamped, off-line datasets. In both cases, changes in the processes that generate data or in their quality features through time may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of data temporal stability. This work establishes the temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes, and (2) characterizing changes, trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen-Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change in three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, discovering conceptually-related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset.
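
The change detection described above is built on the Jensen-Shannon distance between probability distributions. A minimal sketch of that distance for two discrete distributions follows, assuming base-2 logarithms so that the result lies in [0, 1]. The function names are illustrative, and the paper's Beta-posterior monitoring (PDF-SPC) is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))
```

Comparing the distribution of each new time window against a reference distribution with this distance yields a bounded change score: 0 for identical distributions and 1 for distributions with disjoint support.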

2015

Classification of Evolving Data Streams with Infinitely Delayed Labels

Authors
Souza, VMA; Silva, DF; Batista, GEAPA; Gama, J;

Publication
2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA)

Abstract
The majority of evolving data stream classification algorithms assume that the actual labels of the predicted examples are readily available, without any time delay, just after a prediction is made. However, given high labeling costs, dependence on an expert, limitations in data transmission, or even restrictions imposed by the problem's nature, there is a large number of real-world applications in which the availability of actual labels is infinitely delayed (never available). In these cases, it is necessary to use algorithms that do not follow the traditional process of monitoring the error rate to detect changes in the data distribution and using the most recent labeled data to update the classification model. In this paper, we propose the method MClassification to classify evolving data streams with infinitely delayed labels. Our method is inspired by the Micro-Cluster representation used in online clustering algorithms. Considering the presence of incremental drifts, our approach uses a distance-based strategy to keep the Micro-Clusters' positions updated. An evaluation on several synthetic and real datasets shows that MClassification achieves accuracy competitive with state-of-the-art methods at an adequate computational cost. The main advantage of the proposed method is the absence of critical parameters that require the user's prior knowledge, as occurs with rival methods.
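
The micro-cluster idea in this abstract can be illustrated with a heavily simplified sketch. It assumes each micro-cluster stores only a count, a linear sum, and a class label, and that every incoming example is absorbed by its nearest micro-cluster. The names `MicroCluster` and `classify_and_update` are hypothetical, and details of the published method (such as any radius check, or creating and merging clusters) are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    n: int             # number of points absorbed so far
    linear_sum: list   # per-dimension sum of absorbed points
    label: str         # class assigned from the initial labeled data

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, x):
        # Incremental update: the centroid follows the incoming data.
        self.n += 1
        self.linear_sum = [s + xi for s, xi in zip(self.linear_sum, x)]


def classify_and_update(clusters, x):
    """Predict the label of the nearest micro-cluster, then move it towards x."""
    nearest = min(clusters, key=lambda mc: math.dist(mc.centroid(), x))
    nearest.absorb(x)
    return nearest.label
```

Because each prediction also shifts the chosen centroid, the clusters can track a gradual (incremental) drift without receiving any further true labels.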
