Publications

Publications by LIAAD

2020

Fraud Detection using Heavy Hitters: a Case Study

Authors
Veloso, B; Martins, C; Espanha, R; Azevedo, R; Gama, J;

Publication
PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20)

Abstract
The high asymmetry of international termination rates, where calls are charged at higher values, is fertile ground for the appearance of fraud in telecom companies. In this paper, we present three different and complementary solutions for a real problem called Interconnect Bypass Fraud. This problem is one of the most common in the telecommunications domain and can be detected by the occurrence of abnormal behaviours from specific numbers. Our goal is to detect, as soon as possible, numbers with abnormal behaviours, e.g. bursts of calls, repetitions and mirror behaviours. Based on this assumption, we propose: (i) the adoption of a new fast forgetting technique that works together with the Lossy Counting algorithm; (ii) a single-pass hierarchical heavy hitter algorithm that also contains a forgetting technique; and (iii) the application of HyperLogLog sketches for each phone number. We use the heavy hitters to detect abnormal behaviours, e.g. bursts of calls, repetitions and mirror behaviours. The hierarchical heavy hitters algorithm is used to detect numbers that make calls to a huge set of destinations, and destination numbers that receive a huge set of calls, to provoke a denial of service. Additionally, to estimate the cardinality of destination numbers for each origin number, we use the HyperLogLog algorithm. The results show that these three approaches combined complement the techniques used by the telecom company and make the fraud task more difficult.
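A minimal sketch of the Lossy Counting idea behind approach (i) is shown below; the exponential forgetting factor and all parameter names here are illustrative stand-ins, not the paper's exact fast forgetting technique:

```python
class LossyCounter:
    """Lossy Counting (Manku & Motwani) over a stream, with a simple
    exponential forgetting factor applied at each bucket boundary -- a
    hypothetical stand-in for the paper's fast forgetting technique."""

    def __init__(self, epsilon=0.01, forgetting=0.99):
        self.epsilon = epsilon           # error bound; bucket width = 1/epsilon
        self.forgetting = forgetting     # decay applied at each bucket boundary
        self.width = int(1 / epsilon)
        self.n = 0                       # items seen so far
        self.counts = {}                 # item -> (count, max_error)

    def add(self, item):
        self.n += 1
        count, err = self.counts.get(item, (0, (self.n - 1) // self.width))
        self.counts[item] = (count + 1, err)
        if self.n % self.width == 0:     # bucket boundary: decay, then prune
            bucket = self.n // self.width
            self.counts = {k: (c * self.forgetting, e)
                           for k, (c, e) in self.counts.items()
                           if c * self.forgetting + e > bucket}

    def heavy_hitters(self, support=0.05):
        """Items whose estimated frequency exceeds (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k for k, (c, e) in self.counts.items() if c >= threshold}
```

Feeding the stream of originating numbers to `add`, a number producing a burst of calls keeps its decayed count above the pruning threshold, while one-off numbers are forgotten at each bucket boundary.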

2020

Improving Prediction with Causal Probabilistic Variables

Authors
Nogueira, AR; Gama, J; Ferreira, CA;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XVIII, IDA 2020

Abstract
Feature engineering is commonly applied in classification problems as a means to increase the performance of classification algorithms. Many methods already exist for constructing features based on combinations of attributes but, to the best of our knowledge, none of these methods takes into account a particular characteristic found in many problems: causality. In many observational data sets, causal relationships can be found between the variables, meaning that it is possible to extract those relations from the data and use them to create new features. The main goal of this paper is to propose a framework for the creation of new supposedly causal probabilistic features, which encode the inferred causal relationships between the target and the other variables. An improvement in performance was achieved when these features were applied with the Random Forest algorithm.
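To illustrate the flavour of a probabilistic feature (only the encoding step; the paper's framework first infers which variables are causally related to the target, which this sketch skips entirely), an attribute can be replaced by a smoothed estimate of P(target = 1 | bin(attribute)):

```python
from collections import defaultdict

def probabilistic_feature(x, y, bins=4, alpha=1.0):
    """Toy probabilistic feature: replace a numeric attribute x by a
    Laplace-smoothed estimate of P(y = 1 | bin(x)). Hypothetical
    illustration only -- not the paper's causal framework."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0            # guard against constant x
    counts = defaultdict(lambda: [0, 0])       # bin -> [positives, total]
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), bins - 1)
        counts[b][0] += yi
        counts[b][1] += 1

    def encode(xi):
        b = min(int((xi - lo) / width), bins - 1)
        pos, tot = counts[b]
        return (pos + alpha) / (tot + 2 * alpha)  # smoothed P(y=1 | bin)

    return [encode(xi) for xi in x]
```

The new column can then be appended to the original attributes before training, e.g. a Random Forest, as in the paper's evaluation.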

2020

A scalable saliency-based feature selection method with instance-level information

Authors
Cancela, B; Bolon Canedo, V; Alonso Betanzos, A; Gama, J;

Publication
KNOWLEDGE-BASED SYSTEMS

Abstract
Classic feature selection techniques remove irrelevant or redundant features to achieve a subset of relevant features in compact models that are easier to interpret, thereby improving knowledge extraction. Most such techniques operate on the whole dataset, but are unable to provide the user with useful information when only instance-level information is required; in other words, classic feature selection algorithms do not identify the most relevant information in a sample. We have developed a novel feature selection method, called saliency-based feature selection (SFS), based on deep-learning saliency techniques. Our algorithm works with any architecture trained by gradient descent (neural networks, SVMs, ...), and can be used for classification or regression problems. Experimental results show our algorithm is robust, as it allows the feature ranking to be transferred between different architectures, achieving remarkable results. The versatility of our algorithm has also been demonstrated, as it works both in big data environments and with small datasets.
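The gradient-based ranking idea can be illustrated with a toy model. This is a generic input-gradient saliency sketch on a hand-rolled logistic regression, not the SFS algorithm itself; the data and model are made up for the example:

```python
import math
import random

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def saliency(w, b, X):
    """Saliency of feature j: |dp/dx_j| = p(1-p)|w_j|. Per-sample values give
    instance-level relevance; averaging over samples gives a global ranking
    that can be compared across architectures."""
    scores = [0.0] * len(X[0])
    for xi in X:
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        for j, wj in enumerate(w):
            scores[j] += p * (1.0 - p) * abs(wj)
    return [s / len(X) for s in scores]

random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if xi[0] > 0 else 0 for xi in X]     # only feature 0 is informative
w, b = train_logreg(X, y)
scores = saliency(w, b, X)
```

Since the labels depend only on the first feature, its average saliency dominates, so ranking by `scores` recovers the informative feature.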

2020

Impact of Trust and Reputation Based Brokerage on the CloudAnchor Platform

Authors
Veloso, B; Malheiro, B; Burguillo, JC; Gama, J;

Publication
Advances in Practical Applications of Agents, Multi-Agent Systems, and Trustworthiness. The PAAMS Collection - 18th International Conference, PAAMS 2020, L'Aquila, Italy, October 7-9, 2020, Proceedings

Abstract
This paper analyses the impact of trust and reputation modelling on CloudAnchor, a business-to-business brokerage platform for the transaction of single and federated resources on behalf of Small and Medium-sized Enterprises (SMEs). In CloudAnchor, businesses act as providers or consumers of Infrastructure as a Service (IaaS) resources. The platform adopts a multi-layered multi-agent architecture, where providers, consumers and virtual providers, representing provider coalitions, engage in trust- and reputation-based provider look-up, invitation, acceptance and resource negotiations. The goal of this work is to assess the relevance of the distributed trust model and the centralised fuzzified reputation service to the number of resources successfully transacted, global turnover, brokerage fees, losses, expenses and response time. The results show that trust- and reputation-based brokerage has a positive impact on CloudAnchor performance, reducing losses and the execution time for the provision of both single and federated resources, and considerably increasing the number of federated resources provided.

2020

REST framework: A modelling approach towards cooling energy stress mitigation plans for future cities in warming Global South

Authors
Bardhan, R; Debnath, R; Gama, J; Vijay, U;

Publication
SUSTAINABLE CITIES AND SOCIETY

Abstract
Future cities of the Global South will not only rapidly urbanise but will also get warmer through climate change and urbanisation-induced effects. This will trigger a multi-fold increase in cooling demand, especially at the residential level, the mitigation of which remains a policy and research gap. This study puts forward a novel residential energy stress mitigation framework, called REST, to estimate warming-climate-induced energy stress in residential buildings using a GIS-driven urban heat island and energy modelling approach. REST further estimates rooftop solar potential to enable solar photovoltaic (PV) based decentralised energy solutions and establishes an optimised network for peer-to-peer energy sharing at a neighbourhood scale. The optimised network is classified through a decision tree algorithm to derive sustainability rules for mitigating energy stress at an urban planning scale. These sustainability rules establish distributive energy justice variables in an urban planning context. The REST framework is applied as a proof of concept to a future smart city of India, named Amaravati. Results show that cooling energy stress can be reduced by 80 % in the study area through sensitive use of planning variables such as Floor Space Index (FSI) and built-up density. This has crucial policy implications for the design and implementation of national-level cooling action plans in the future cities of the Global South, to meet the UN SDG 7 (clean and affordable energy) and SDG 11 (sustainable cities and communities) targets.

2020

A drift detection method based on dynamic classifier selection

Authors
Pinage, F; dos Santos, EM; Gama, J;

Publication
DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Machine learning algorithms can be applied to several practical problems, such as spam, fraud and intrusion detection, and customer preferences, among others. In most of these problems, data come in streams, which means that the data distribution may change over time, leading to concept drift. The literature is abundant in supervised methods based on error monitoring for explicit drift detection. However, these methods may become infeasible in some real-world applications where no fully labeled data is available, and they may depend on a significant decrease in accuracy to be able to detect drifts. There are also methods based on blind approaches, where the decision model is updated constantly; however, this may lead to unnecessary system updates. To overcome these drawbacks, we propose in this paper a semi-supervised drift detector that uses an ensemble of classifiers based on self-training online learning and dynamic classifier selection. For each unknown sample, a dynamic selection strategy is used to choose, among the ensemble's component members, the classifier most likely to classify it correctly. The prediction assigned by the chosen classifier is used to compute an estimate of the error produced by the ensemble members. The proposed method monitors this pseudo-error in order to detect drifts, and updates the decision model only after drift detection. This method is relevant in that it allows drift detection and reaction and is applicable to several practical problems. The experiments conducted indicate that the proposed method attains high performance and detection rates, while reducing the amount of labeled data used to detect drift.
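A much-simplified sketch of monitoring an error rate for drift is shown below, modelled on the classic DDM error-monitoring scheme (Gama et al., 2004). The paper's detector derives a pseudo-error from dynamically selected ensemble members; here that estimation is omitted and the detector is fed a stream of 0/1 error signals directly, with thresholds that are illustrative:

```python
class PseudoErrorDriftDetector:
    """DDM-style monitor of a (pseudo-)error rate: track the running error
    mean p and its standard deviation s, remember their minimum, and signal
    a warning or drift when p + s rises too far above that minimum."""

    def __init__(self, warn=2.0, drift=3.0):
        self.warn, self.drift = warn, drift
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                       # running error-rate estimate
        self.s = 0.0
        self.p_min = self.s_min = float("inf")

    def update(self, error):               # error: 1 if prediction judged wrong
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = (self.p * (1 - self.p) / self.n) ** 0.5
        if self.n > 30 and self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.n > 30 and self.p + self.s > self.p_min + self.drift * self.s_min:
            self.reset()                   # drift: update the decision model now
            return "drift"
        if self.n > 30 and self.p + self.s > self.p_min + self.warn * self.s_min:
            return "warning"
        return "stable"
```

On a stream whose error rate jumps (e.g. from 10 % to 100 %), the detector passes through the warning level and then signals drift shortly after the change, which is the point at which the paper's method updates its ensemble.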
