2016
Authors
Faria, ER; Goncalves, IJCR; de Carvalho, ACPLF; Gama, J;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
In massive data analysis, data usually come in streams. In recent years, several studies have investigated novelty detection in these data streams. Different approaches have been proposed and validated in many application domains. A review of the main aspects of these studies can provide useful information to improve the performance of existing approaches, allow their adaptation to new applications, and help to identify important new issues to be addressed in future studies. This article presents and analyses different aspects of novelty detection in data streams, such as the offline and online phases, the number of classes considered at each phase, the use of an ensemble versus a single classifier, supervised and unsupervised approaches for the learning task, information used for decision model update, forgetting mechanisms for outdated concepts, concept drift treatment, how to distinguish noise and outliers from novelty concepts, classification strategies for data with unknown label, and how to deal with recurring classes. This article also describes several applications of novelty detection in data streams investigated in the literature and discusses important challenges and future research directions.
2016
Authors
Gama, J;
Publication
Encyclopedia of Machine Learning and Data Mining
Abstract
2016
Authors
Tabassum, S; Gama, J;
Publication
DISCOVERY SCIENCE, (DS 2016)
Abstract
With networks now recognized in many real-world domains, research in network science has gained much attention. Real-world interaction networks are exploited to gain insights into real-world connections. One line of inquiry is to analyze how these networks grow and evolve. Most works rely on socio-centric networks. A socio-centric network comprises several ego networks, and how these ego networks evolve greatly influences the structure of the network. In this work, we analyzed the evolution of ego networks from a massive call network stream by using an extensive list of graph metrics. By doing this, we studied the evolution of the structural properties of the graph and related them to real-world user behaviors. We also showed that the densification power law holds over the temporal call ego networks. Many evolving networks obey the densification power law, and the number of edges increases as a function of time. Therefore, we discuss a sequential sampling method with a forgetting factor to sample the evolving ego network stream. This method captures the most active and recent nodes from the network while preserving the tie strengths between them, maintaining the density of the graph, and decreasing redundancy.
2016
Authors
Pereira, FSF; de Amo, S; Gama, J;
Publication
DISCOVERY SCIENCE, (DS 2016)
Abstract
User preferences are fairly dynamic, since users tend to exploit a wide range of information and modify their tastes accordingly over time. Existing models and formulations are too constrained to capture the complexity of this underlying phenomenon. In this paper, we investigate the interplay between user preferences and social networks over time. We propose to analyze the dynamics of a user's preferences together with the user's social network, modeled as a temporal network. First, we define a temporal preference model for reasoning with preferences. Then, we use evolving centralities from temporal networks to link with preference dynamics. Our results indicate that modeling Twitter as a temporal network is more appropriate for analyzing user preference dynamics than using just snapshots of a static network.
2016
Authors
Sousa, MR; Gama, J; Brandao, E;
Publication
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
We propose a new dynamic modeling framework for credit risk assessment that extends the prevailing credit scoring models, which are built on static settings of historical data. The driving idea mimics the principle of films, by composing the model with a sequence of snapshots rather than a single photograph. In doing so, the dynamic modeling consists of sequential learning from the new incoming data. A key contribution is the insight that different amounts of memory can be explored concurrently. Memory refers to the amount of historic data being used for estimation. This is important in the credit risk area, which often seems to undergo shocks. During a shock, limited memory is important. At other times, a larger memory has merit. An application to a real-world financial dataset of credit cards from a financial institution in Brazil illustrates our methodology, which is able to consistently outperform the static modeling schema.
2016
Authors
Correa, FE; Oliveira, MDB; Gama, J; Correa, PLP; Rady, J;
Publication
COMPUTERS AND ELECTRONICS IN AGRICULTURE
Abstract
Agribusiness is an activity that generates huge amounts of temporal data. There are research centers that collect, store and create indexes of agricultural activities, providing multidimensional time series composed of years of data. In this paper, we are interested in studying the behavior of these time series, especially regarding the evolution of agricultural price indexes over the years. We explore data mining techniques tailored to analyze temporal data, aiming to generate spatio-temporal trajectories of grain price indexes for six years of data. We propose the use of Tucker decomposition to both analyze the temporal patterns of these price indexes and map trajectories that represent their behavior over time in a concise and representative low-dimensional subspace. The case study presents an application of this methodology to real databases of price indexes of corn and soybeans in Brazil and the United States.