Publications - INESC TEC

Publications by Pedro Gabriel Ferreira

2006

Establishing fraud detection patterns based on signatures

Authors
Ferreira, P; Alves, R; Belo, O; Cortesao, L

Publication
ADVANCES IN DATA MINING: APPLICATIONS IN MEDICINE, WEB MINING, MARKETING, IMAGE AND SIGNAL MINING

Abstract
Worldwide, the use of telecommunication systems has increased significantly. Day after day, people face strong marketing campaigns seeking their attention for new telecommunication products and services. Telecommunication companies struggle in a highly competitive business arena. Their efforts appear to have paid off: customers are readily adopting the new trends and systematically use (and abuse) communication services in their daily lives. Although fraud situations are rare, they are increasing, and they account for a large amount of money that telecommunication companies lose every year. In this work, we studied the problem of fraud detection in telecommunication systems, especially cases of superimposed fraud, and provide an anomaly detection technique supported by a signature schema. Our main goal is to detect deviant behavior in a timely manner, giving fraud analysts a better basis for accurate decisions when establishing potential fraud situations.
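
The abstract does not detail the signature schema, so the following is only a minimal sketch of the general idea of signature-based anomaly detection, not the authors' method. It assumes a signature is a statistical summary (mean and standard deviation) of a customer's past daily usage and flags a new observation that deviates by more than a chosen number of standard deviations. The function names, the single usage feature, and the threshold are all hypothetical.

```python
from statistics import mean, pstdev


def build_signature(history):
    """Summarize past daily usage as (mean, population standard deviation)."""
    return mean(history), pstdev(history)


def deviation_score(signature, current):
    """Distance of the current value from the signature, in standard deviations."""
    mu, sigma = signature
    if sigma == 0:
        # A constant history: any change at all is an unbounded deviation.
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma


def is_anomalous(history, current, threshold=3.0):
    """Flag the current value when it deviates more than `threshold` sigmas."""
    return deviation_score(build_signature(history), current) > threshold


# Hypothetical daily call minutes for one customer.
history = [10, 12, 11, 9, 13]
typical_day = is_anomalous(history, 12)
suspicious_day = is_anomalous(history, 60)
```

A flagged day is only a candidate for review; in the setting the abstract describes, the final decision stays with the fraud analyst.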

2004

Clickstreams, the basis to establish user navigation patterns on web sites

Authors
Alves, R; Belo, O; Cavalcanti, F; Ferreira, P

Publication
DATA MINING V: DATA MINING, TEXT MINING AND THEIR BUSINESS APPLICATIONS

Abstract
Collecting and mining clickstream data from e-commerce sites has become increasingly important for marketing, advertising, and traffic analysis activities. Organizations are promoting many initiatives concerning user navigation pattern discovery in order to build better sites that are more functional and closer to customers' needs. The main idea is to provide better quality of service on their sites and, consequently, to increase profitability. However, clickstream processing is not a simple task. Sequences of clicks are very difficult to handle using conventional techniques, essentially due to their diversity and nature. They include many aspects that reveal the multidimensional perspective of web data. OLAP technology today provides the means and techniques to represent, store and analyse such multidimensional data. However, it does not offer discovery-driven analysis to support traversal pattern identification processes on web sites. Traversal pattern mining techniques can be applied in conjunction with OLAP as an integrated alternative for understanding those particular sequences of clicks. In this paper we present an integrated OLAP and mining approach specifically designed for exploring user navigation patterns based on clickstreams. We also describe the multidimensional structure provided for modelling click sequences and the OLAP operations and mining techniques that can be pushed over data cubes to bring up navigation patterns.
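
To make the notion of a traversal pattern concrete, here is a small sketch that counts contiguous page paths shared by several sessions. It is an illustration under stated assumptions, not the integrated OLAP and mining approach of the paper: sessions are assumed to be plain lists of page names, and the function name and parameters are hypothetical.

```python
from collections import Counter


def frequent_paths(sessions, length=2, min_support=2):
    """Return contiguous page paths of `length` seen in at least `min_support` sessions."""
    counts = Counter()
    for pages in sessions:
        # Use a set so that each path is counted at most once per session.
        seen = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        counts.update(seen)
    return {path: n for path, n in counts.items() if n >= min_support}


# Hypothetical click sequences, one list per user session.
sessions = [
    ["home", "search", "product"],
    ["home", "search", "cart"],
    ["search", "product", "cart"],
]
patterns = frequent_paths(sessions, length=2, min_support=2)
```

Counting per session rather than per click keeps a single user who reloads the same path many times from dominating the result.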

2009

Spatial Clustering of Molecular Dynamics Trajectories in Protein Unfolding Simulations

Authors
Ferreira, PG; Silva, CG; Azevedo, PJ; Brito, RMM

Publication
COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS

Abstract
Molecular dynamics simulation is a valuable tool to study protein unfolding in silico. Analyzing the relative spatial position of the residues during the simulation may indicate which residues are essential in determining the protein structure. We present a method, inspired by a popular data mining technique called Frequent Itemset Mining, that clusters sets of amino acid residues with a synchronized trajectory during the unfolding process. The proposed approach has several advantages over traditional hierarchical clustering. © 2009 Springer Berlin Heidelberg.
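
For readers unfamiliar with Frequent Itemset Mining, the sketch below shows the basic technique by brute force: it enumerates candidate item sets and keeps those contained in at least `min_support` transactions. The abstract does not say how residues and trajectories are encoded, so the mapping used here (one transaction per time window, listing hypothetical residue labels that moved together) is an assumption made purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for item sets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if t.issuperset(candidate))
            if support >= min_support:
                result[candidate] = support
    return result


# Hypothetical time windows; each set lists residues that moved together.
windows = [
    {"A10", "L12", "V30"},
    {"A10", "L12"},
    {"L12", "V30"},
    {"A10", "L12", "V30"},
]
groups = frequent_itemsets(windows, min_support=3)
```

Exhaustive enumeration is only practical for small inputs; algorithms such as Apriori prune candidates whose subsets are already infrequent.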

2007

Evaluating deterministic motif significance measures in protein databases

Authors
Ferreira, PG; Azevedo, PJ

Publication
ALGORITHMS FOR MOLECULAR BIOLOGY

Abstract
Background: Assessing the outcome of motif mining algorithms is an essential task, as the number of reported motifs can be very large. Significance measures play a central role in automatically ranking those motifs, thereby reducing the analysis work. Spotting the most interesting and relevant motifs then depends on choosing the right measures. The combined use of several measures may provide more robust results. However, caution must be taken to avoid spurious evaluations. Results: The experiments showed that several of the selected significance measures behave very similarly in a wide range of situations and therefore provide redundant information. Some measures proved more appropriate for ranking highly conserved motifs, while others are more appropriate for weakly conserved ones. Support appears to be a very important feature to consider for correct motif ranking. We observed that not all the measures are suitable for situations with poorly balanced class information, for instance when there is significantly less positive data than negative data. Finally, a visualization scheme was proposed that, when several measures are applied, enables easy identification of high-scoring motifs. Conclusion: In this work we have surveyed and categorized 14 significance measures for pattern evaluation. Their ability to rank three types of deterministic motifs was evaluated. Measures were applied under different testing conditions, in which relations among them were identified. This study provides some pertinent insights into the choice of the right set of significance measures for the evaluation of deterministic motifs extracted from protein databases.
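
The abstract does not list the 14 measures, so the sketch below only illustrates the kind of quantity involved. It computes support together with two common class-based scores (recall and precision) for a motif over labelled sequences. Treating the motif as a Python regular expression, and the choice of these three scores, are assumptions for illustration and not the paper's selection.

```python
import re


def motif_measures(motif, positives, negatives):
    """Score a motif by how it matches positive versus negative sequences."""
    rx = re.compile(motif)
    tp = sum(1 for s in positives if rx.search(s))  # positives matched
    fp = sum(1 for s in negatives if rx.search(s))  # negatives matched
    matched = tp + fp
    return {
        "support": tp,
        "recall": tp / len(positives) if positives else 0.0,
        "precision": tp / matched if matched else 0.0,
    }


# Hypothetical sequences: positives belong to the protein family, negatives do not.
positives = ["ACDKV", "GCDRV", "TTTTT"]
negatives = ["ACDAA", "MCDEV"]
scores = motif_measures("CD.V", positives, negatives)
```

With far fewer positives than negatives, a handful of false positives can sharply reduce precision while support stays unchanged, which is one way the class imbalance mentioned in the abstract can affect a ranking.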

2007

Deterministic motif mining in protein databases

Authors
Ferreira, PG; Azevedo, PJ

Publication
Successes and New Directions in Data Mining

Abstract
Protein sequence motifs describe, by means of an enhanced regular expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have implications at the structural and functional level of the proteins. Sequence motif analysis can bring significant improvements towards a better understanding of the protein sequence-structure-function relation. In this chapter, we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and their respective specificities. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural level information patterns is also provided. © 2008, IGI Global.
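
As an illustration of a deterministic motif written in a regular-expression-like syntax, the sketch below converts a PROSITE-style pattern into a Python regular expression and tests it against sequences. The abstract does not name a specific notation, so the use of PROSITE conventions here is an assumption, and the helper name is hypothetical. The converter handles only the core elements: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repetition, and `<` / `>` for sequence ends.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into Python regular expression syntax."""
    regex = pattern.rstrip(".")                          # drop the optional final period
    regex = regex.replace("-", "")                       # element separators
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # {ED} = any residue except E, D
    regex = regex.replace("(", "{").replace(")", "}")    # x(4) = four repetitions
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex


# Hypothetical pattern: A or C, any residue, V, any four residues, then not E or D.
motif = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
hit = re.search(motif, "ALVKKKKQ")
miss = re.search(motif, "ALVKKKKE")
```

A motif of this kind is deterministic in the sense that a sequence either matches it or does not; no probability is attached to the match.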

2009

Deterministic pattern mining on genetic sequences

Authors
Ferreira, PG; Azevedo, PJ

Publication
Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques

Abstract
The recent increase in the number of complete genetic sequences freely available through specialized Internet databases presents big challenges for the research community. One such challenge is the efficient and effective search for sequence patterns, also known as motifs, among a set of related genetic sequences. Such patterns describe regions that may provide important insights into the structural and functional role of DNA and proteins. Two main classes can be considered: probabilistic patterns, which represent a model that simulates the sequences or part of the sequences under consideration, and deterministic patterns, which either match or do not match the input sequences. This chapter presents a general overview of deterministic sequence mining over sets of genetic sequences. The authors formulate an architecture that divides the mining process workflow into a set of blocks. Each of these blocks is discussed individually. © 2010, IGI Global.