Publications

Publications by Pedro Pereira Rodrigues

2010

A Simple Dense Pixel Visualization for Mobile Sensor Data Mining

Authors
Rodrigues, PP; Gama, J;

Publication
Knowledge Discovery from Sensor Data

Abstract
Sensor data is usually represented by streaming time series. Current state-of-the-art visualization systems include line plots and three-dimensional representations, which most of the time require screen resolutions that are not available on small, transient mobile devices. Moreover, when data present cyclic behavior, as in the electricity domain, predictive models may tend to produce higher errors at certain recurrent points in time, but the human eye is not trained to notice these cycles in a long stream. In these contexts, information is usually hard to extract from the visualization. New visualization techniques may help to detect recurrent faulty predictions. In this paper we inspect visualization techniques in the scope of a real-world sensor network, briefly delving into future trends in visualization on transient mobile devices. We propose a simple dense pixel display visualization system, exploring the benefits it may bring to detecting and correcting recurrent faulty predictions. A case study is also presented, where a simple corrective strategy is studied in the context of global electrical load demand, exemplifying the utility of the new visualization method when compared with automatic detection of recurrent errors.
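
A minimal sketch of the dense pixel idea, assuming hourly error values and a 24-hour cycle (neither is specified in the abstract): each value becomes one character standing in for a coloured pixel, with one row per day, so errors that recur at the same hour line up as a vertical column. The function name `dense_pixel_rows`, the shade ramp `SHADES`, and the synthetic errors are illustrative assumptions, not the authors' system.

```python
from typing import List, Sequence

# Characters standing in for pixel intensity, from low to high.
# (Assumption: a real dense pixel display would map values to colours.)
SHADES = " .:-=+*#%@"


def dense_pixel_rows(errors: Sequence[float], period: int = 24) -> List[str]:
    """Lay out a stream of prediction errors as one text row per cycle.

    Each value becomes a single character ("pixel"), so one cycle of
    `period` values fits in `period` characters and errors that recur at
    the same position in the cycle line up vertically.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # Normalise by the largest absolute error in the stream.
    top = max((abs(e) for e in errors), default=0.0)
    last = len(SHADES) - 1
    rows = []
    for start in range(0, len(errors), period):
        row = []
        for e in errors[start:start + period]:
            level = abs(e) / top if top > 0 else 0.0  # in [0, 1]
            row.append(SHADES[min(int(level * last), last)])
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    # Synthetic example: three days of small hourly errors,
    # with a large error at 18:00 every day.
    errors = [0.1] * 72
    for day in range(3):
        errors[day * 24 + 18] = 2.0
    for row in dense_pixel_rows(errors):
        print(f"|{row}|")
```

With these synthetic values the recurrent 18:00 error appears as a column of `@` characters, which is the kind of pattern that is hard to spot in a long line plot.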

2009

An overview on mining data streams

Authors
Gama, J; Rodrigues, PP;

Publication
Studies in Computational Intelligence

Abstract
The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges of learning from data streams and the most relevant issues in knowledge discovery from them: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area. © 2009 Springer-Verlag Berlin Heidelberg.
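
Change detection is one of the issues listed above. As an illustration only, the sketch below implements the Page-Hinkley test, a classic detector for an increase in the mean of a stream; the chapter may present different algorithms, and the `delta` and `threshold` values here are arbitrary assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (often called lambda)
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0        # running mean of all values seen so far
        self.cumsum = 0.0      # m_T = sum of (x_t - mean_t - delta)
        self.min_cumsum = 0.0  # M_T = minimum of m_t so far

    def update(self, x: float) -> bool:
        """Consume one value; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumsum += x - self.mean - self.delta
        self.min_cumsum = min(self.min_cumsum, self.cumsum)
        return self.cumsum - self.min_cumsum > self.threshold


if __name__ == "__main__":
    detector = PageHinkley()
    stream = [0.0] * 100 + [5.0] * 50  # synthetic: mean jumps at index 100
    for i, x in enumerate(stream):
        if detector.update(x):
            print("change detected at index", i)
            detector.reset()
            break
```

With these settings the alarm fires at index 110, a few values after the jump; a smaller `threshold` reacts faster at the cost of more false alarms.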

2007

Semi-fuzzy splitting in Online Divisive-Agglomerative Clustering

Authors
Rodrigues, PP; Gama, J;

Publication
Progress in Artificial Intelligence, Proceedings

Abstract
The Online Divisive-Agglomerative Clustering (ODAC) system is an incremental approach for clustering streaming time series using a hierarchical procedure over time. It constructs a tree-like hierarchy of clusters of streams, using a top-down strategy based on the correlation between streams. The system also has an agglomerative phase that supports dynamic behavior capable of detecting structural changes. However, the split decision used in the algorithm focuses on the crisp boundary between two groups, which carries a high risk since the decision has to be made on only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.
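
A rough sketch of the ingredients described above, written from our own reading rather than from the paper: correlation between streams is maintained incrementally from sufficient statistics, turned into a dissimilarity assumed here to be sqrt((1 - corr) / 2), and a split is accepted when the gap between the two largest dissimilarities exceeds a Hoeffding bound. The `semi_fuzzy_assign` rule (a variable whose distances to the two new pivots differ by less than the bound is flagged as ambiguous rather than forced into one child) is a hypothetical stand-in, since the abstract does not give the exact semi-fuzzy criterion. All names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


class CorrelationStats:
    """Sufficient statistics for incremental Pearson correlation of k streams."""

    def __init__(self, k: int):
        self.k = k
        self.n = 0
        self.s = [0.0] * k                       # sums of x_i
        self.ss = [0.0] * k                      # sums of x_i ** 2
        self.sp = [[0.0] * k for _ in range(k)]  # sums of x_i * x_j (i < j)

    def update(self, xs: Sequence[float]) -> None:
        self.n += 1
        for i in range(self.k):
            self.s[i] += xs[i]
            self.ss[i] += xs[i] * xs[i]
            for j in range(i + 1, self.k):
                self.sp[i][j] += xs[i] * xs[j]

    def corr(self, i: int, j: int) -> float:
        a, b = min(i, j), max(i, j)
        n = self.n
        cov = self.sp[a][b] - self.s[a] * self.s[b] / n
        var_a = self.ss[a] - self.s[a] ** 2 / n
        var_b = self.ss[b] - self.s[b] ** 2 / n
        if var_a <= 0.0 or var_b <= 0.0:
            return 0.0
        # Clamp to [-1, 1] to absorb floating-point round-off.
        return max(-1.0, min(1.0, cov / math.sqrt(var_a * var_b)))

    def dissimilarity(self, i: int, j: int) -> float:
        # sqrt((1 - corr) / 2): 0 for corr = 1, 1 for corr = -1.
        return math.sqrt((1.0 - self.corr(i, j)) / 2.0)


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def split_decision(stats: CorrelationStats, delta: float = 0.05
                   ) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return (pivots, eps); pivots is None if the split is not yet safe.

    Requires at least three streams (two candidate pairs to compare).
    """
    ranked = sorted(((stats.dissimilarity(i, j), (i, j))
                     for i, j in combinations(range(stats.k), 2)), reverse=True)
    (d1, pivots), (d2, _) = ranked[0], ranked[1]
    eps = hoeffding_bound(1.0, delta, stats.n)  # dissimilarities lie in [0, 1]
    return (pivots if d1 - d2 > eps else None), eps


def semi_fuzzy_assign(stats: CorrelationStats, pivots: Tuple[int, int],
                      eps: float) -> Dict[str, List[int]]:
    """Hypothetical rule: near-ties (within eps) are flagged, not forced."""
    p1, p2 = pivots
    groups: Dict[str, List[int]] = {"first": [p1], "second": [p2], "ambiguous": []}
    for v in range(stats.k):
        if v in pivots:
            continue
        d_1, d_2 = stats.dissimilarity(v, p1), stats.dissimilarity(v, p2)
        if abs(d_1 - d_2) <= eps:
            groups["ambiguous"].append(v)
        elif d_1 < d_2:
            groups["first"].append(v)
        else:
            groups["second"].append(v)
    return groups


if __name__ == "__main__":
    stats = CorrelationStats(3)
    for t in range(1000):
        # Synthetic streams: a trend, its mirror image, and an unrelated cycle.
        stats.update([float(t), -float(t), math.sin(t)])
    pivots, eps = split_decision(stats)
    print(pivots, round(eps, 4))
    if pivots is not None:
        print(semi_fuzzy_assign(stats, pivots, eps))
```

With these synthetic streams the two mirrored trends become the pivots and the unrelated cycle, roughly equidistant from both, is flagged as ambiguous instead of being committed to one side on limited evidence.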

2007

Stream-based electricity load forecast

Authors
Gama, J; Rodrigues, PP;

Publication
Knowledge Discovery in Databases: PKDD 2007, Proceedings

Abstract
Sensors distributed all around electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors) producing a continuous flow of data in a dynamic, non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecasts. We propose an architecture based on an online clustering algorithm where each cluster (a group of sensors with high correlation) contains a neural-network-based predictive model. The goal is to maintain, in real time, a clustering model and a predictive model able to incorporate new information at the speed data arrive, detecting changes and adapting the decision models to the most recent information. We present results illustrating the advantages of the proposed architecture on several temporal horizons, and its competitiveness with another predictive strategy.
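
To illustrate the cluster-plus-model architecture, the sketch keeps one incrementally trained predictor per cluster and processes the stream prequentially (predict first, then learn). As simplifying assumptions, a linear model trained with one stochastic gradient step per example stands in for the neural network, and cluster identifiers are supplied by the caller rather than produced by an online clustering algorithm. The classes `OnlineLinearPredictor` and `ClusteredForecaster` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


class OnlineLinearPredictor:
    """Linear model updated with one stochastic gradient step per example.

    Assumption: a stand-in for the per-cluster neural network.
    """

    def __init__(self, n_inputs: int, learning_rate: float = 0.1):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def learn_one(self, x: Sequence[float], y: float) -> float:
        """Update on (x, y) and return the error made before the update."""
        err = self.predict(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi  # gradient of 0.5 * err ** 2
        self.b -= self.lr * err
        return err


class ClusteredForecaster:
    """One predictor per cluster; cluster ids are supplied by the caller."""

    def __init__(self, n_inputs: int, learning_rate: float = 0.1):
        self.models: Dict[int, OnlineLinearPredictor] = defaultdict(
            lambda: OnlineLinearPredictor(n_inputs, learning_rate))

    def process(self, cluster_id: int, x: Sequence[float], y: float) -> float:
        # Prequential evaluation: the returned error is computed before learning.
        return self.models[cluster_id].learn_one(x, y)


if __name__ == "__main__":
    forecaster = ClusteredForecaster(n_inputs=1)
    for t in range(2000):
        x = ((t % 20) - 10) / 10.0               # synthetic input in [-1, 0.9]
        forecaster.process(0, [x], 2.0 * x + 1.0)  # cluster 0: y = 2x + 1
        forecaster.process(1, [x], -x + 3.0)       # cluster 1: y = -x + 3
    for cid, model in sorted(forecaster.models.items()):
        print(cid, [round(w, 3) for w in model.w], round(model.b, 3))
```

Each cluster's weights approach the slope and intercept used to generate its synthetic data. Keeping models per cluster means a change in one group of sensors only disturbs the model that serves that group.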

2007

Clustering techniques in sensor networks

Authors
Rodrigues, PP; Gama, J;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
The traditional knowledge discovery environment, where data and processing units are centralized in controlled laboratories and servers, is now completely transformed into a web of sensorial devices, some of them with local processing ability. This scenario represents a new knowledge-extraction environment, possibly not completely observable, that is much less controlled by both the human user and a common centralized control process. © 2007 Springer-Verlag Berlin Heidelberg.

2007

Data stream processing

Authors
Gama, J; Rodrigues, PP;

Publication
Learning from Data Streams: Processing Techniques in Sensor Networks

Abstract
The rapid growth in information science and technology in general, and the complexity and volume of data in particular, have introduced new challenges for the research community. Many sources produce data continuously. Examples include sensor networks, wireless networks, radio frequency identification (RFID), customer click streams, telephone records, multimedia data, scientific data, sets of retail chain transactions, etc. These sources are called data streams. A data stream is an ordered sequence of instances that can be read only once, or a small number of times, using limited computing and storage capabilities. These sources of data are characterized by being open-ended, flowing at high speed, and generated by non-stationary distributions in dynamic environments. What distinguishes current data from earlier data is automatic data feeds. We do not just have people entering information into a computer. Instead, we have computers entering data into each other [25]. Nowadays there are applications in which the data are best modeled as transient data streams instead of as persistent tables. Examples of applications include network monitoring, user modeling in web applications, sensor networks in electrical networks, telecommunications data management, prediction in stock markets, monitoring radio frequency identification, etc. In these applications it is not feasible to load the arriving data into a traditional database management system (DBMS), and traditional DBMSs are not designed to directly support the continuous queries required by these applications [3]. Carney et al. [6] pointed out the significant differences between databases that are passive repositories of data and databases that actually monitor applications and alert humans when abnormal activity is detected. In the former, only the current state of the data is relevant for analysis. Humans initiate queries, usually one-time, predefined queries.
In the latter, data come from external sources (e.g., sensors) and require processing historic data. For example, in monitoring activity, queries should run continuously. The answer to a continuous query is produced over time, reflecting the data seen so far. Moreover, if the process is not strictly stationary (as in most real-world applications), the target concept could gradually change over time. For example, the type of abnormal activity (e.g., attacks in TCP/IP networks, fraud in credit card transactions, etc.) changes over time. Organizations use decision support systems to identify potentially useful patterns in data. Data analysis is complex, interactive, and exploratory over very large volumes of historic data, possibly stored in distributed environments. The traditional pattern discovery process requires online ad hoc queries, not previously defined, that are successively refined. Nowadays, given the current trends in decision support and data analysis, the computer plays a much more active role, by searching hypotheses and evaluating and suggesting patterns. Due to the exploratory nature of these queries, an exact answer may not be required. A user may prefer a fast approximate answer. Range queries and selectivity estimation (the proportion of tuples that satisfy a query) are two illustrative examples where fast but approximate answers are more useful than slow and exact ones. Sensor networks are distributed environments producing multiple streams of data. We can consider the network as a distributed database that we are interested in querying and mining. In this chapter we review the main techniques used for querying and mining data streams that are of potential use in sensor networks. In Sect. 3.2 we refer to the data stream models and identify their main research challenges. Section 3.3 presents basic stream models. Section 3.4 presents basic stream algorithms for maintaining synopses over data streams.
Section 3.5 concludes the chapter and points out future directions for research. © 2007 Springer-Verlag Berlin Heidelberg.
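
The abstract describes synopses that answer queries approximately over a stream read only once. As one classic example, not necessarily one the chapter uses, reservoir sampling (Algorithm R) keeps a fixed-size uniform sample of everything seen so far, and that sample can estimate the selectivity of a range query. The class name `Reservoir`, the `selectivity` method, the capacity, and the seed are illustrative assumptions.

```python
import random
from typing import List, Optional


class Reservoir:
    """Uniform random sample of fixed size over a stream (Algorithm R)."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[float] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, x: float) -> None:
        """Read one stream element exactly once."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(x)
        else:
            # Keep the new element with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = x

    def selectivity(self, low: float, high: float) -> float:
        """Estimated fraction of all elements seen so far in [low, high]."""
        if not self.items:
            return 0.0
        return sum(low <= x <= high for x in self.items) / len(self.items)


if __name__ == "__main__":
    sample = Reservoir(capacity=1000, seed=42)
    for value in range(100_000):  # synthetic stream
        sample.add(float(value))
    # True selectivity of [0, 24999] is 0.25; the estimate should be close.
    print(round(sample.selectivity(0, 24_999), 3))
```

Memory stays at `capacity` elements no matter how long the stream runs, which is the trade-off the abstract describes: a fast approximate answer instead of a slow exact one.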
