Publications

Publications by Carlos Manuel Soares

2014

Analysing Collaborative Filtering algorithms in a multi-agent environment

Authors
Cunha, T; Rossetti, RJF; Soares, C;

Publication
Modelling and Simulation 2014 - European Simulation and Modelling Conference, ESM 2014

Abstract
The huge amount of online information prevents users from keeping up with their interests and preferences. Recommender Systems appeared to solve this problem by employing social behavioural paradigms in order to recommend potentially interesting items to users. Among the several kinds of Recommender Systems, one of the most mature and most used in real-world applications is Collaborative Filtering. These methods recommend items based on the preferences of similar users, using only a user-item rating matrix. In this paper we explain a methodology to use Multi-Agent based simulation to study the evolution of the data rating matrix and its effect on the performance of several Collaborative Filtering algorithms. Our results show that the best performing methods are user-based and item-based Collaborative Filtering and that the average algorithm performance is surprisingly constant for different rating schemes.


2017

Arbitrated Ensemble for Time Series Forecasting

Authors
Cerqueira, V; Torgo, L; Pinto, F; Soares, C;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT II

Abstract
This paper proposes an ensemble method for time series forecasting tasks. Combining different forecasting models is a common approach to tackle these problems. State-of-the-art methods track the loss of the available models and adapt their weights accordingly. Metalearning strategies such as stacking are also used in these tasks. We propose a metalearning approach for adaptively combining forecasting models that specializes them across the time series. Our assumption is that different forecasting models have different areas of expertise and a varying relative performance. Moreover, many time series show recurring structures due to factors such as seasonality. Therefore, the ability of a method to deal with changes in relative performance of models as well as recurrent changes in the data distribution can be very useful in dynamic environments. Our approach is based on an ensemble of heterogeneous forecasters, arbitrated by a metalearning model. This strategy is designed to cope with the different dynamics of time series and quickly adapt the ensemble to regime changes. We validate our proposal using time series from several real world domains. Empirical results show the competitiveness of the method in comparison to state-of-the-art approaches for combining forecasters.


2017

TexRep: A Text Mining Framework for Online Reputation Monitoring

Authors
Saleiro, P; Rodrigues, EM; Soares, C; Oliveira, E;

Publication
NEW GENERATION COMPUTING

Abstract
This work aims to understand, formalize and explore the scientific challenges of using unstructured text data from different Web sources for Online Reputation Monitoring. We present TexRep, an adaptable text mining framework specifically tailored for Online Reputation Monitoring that can be reused in multiple application scenarios, from politics to finance. This framework is able to collect texts from online media, such as Twitter, identify entities of interest, and classify sentiment polarity and intensity. The framework supports multiple data aggregation methods, as well as visualization and modeling techniques that can be used for both descriptive analytics, such as analyzing how political polls evolve over time, and predictive analytics, such as predicting elections. We present case studies that illustrate and validate TexRep for Online Reputation Monitoring. In particular, we provide an evaluation of the TexRep Entity Filtering and Sentiment Analysis modules using well-known external benchmarks. We also present an illustrative example of TexRep applied in the political domain.


2013

Clustering for decision support in the fashion industry: A case study

Authors
Monte, A; Soares, C; Brito, P; Byvoet, M;

Publication
Lecture Notes in Mechanical Engineering

Abstract
The scope of this work is the segmentation of the orders of Bivolino, a Belgian company that sells custom tailored shirts. The segmentation is done based on clustering, following a Data Mining approach. We use the K-Medoids clustering method because it is less sensitive to outliers than other methods and it can handle nominal variables, which are the most common in the data used in this work. We interpret the results from both the design and marketing perspectives. The results of this analysis contain useful knowledge for the company regarding its business. This knowledge, as well as the continued usage of clustering to support both the design and marketing processes, is expected to allow Bivolino to make important business decisions and, thus, obtain competitive advantage over its competitors. © Springer International Publishing Switzerland 2013.


2014

MetaStream: A meta-learning based method for periodic algorithm selection in time-changing data

Authors
Debiaso Rossi, ALD; de Leon Ferreira de Carvalho, ACPDF; Soares, C; de Souza, BF;

Publication
NEUROCOMPUTING

Abstract
Dynamic real-world applications that generate data continuously have introduced new challenges for the machine learning community, since the concepts to be learned are likely to change over time. In such scenarios, an appropriate model at a time point may rapidly become obsolete, requiring updating or replacement. As there are several learning algorithms available, choosing one whose bias suits the current data best is not a trivial task. In this paper, we present a meta-learning based method for periodic algorithm selection in time-changing environments, named MetaStream. It works by mapping the characteristics extracted from the past and incoming data to the performance of regression models in order to choose between single learning algorithms or their combination. Experimental results for two real regression problems showed that MetaStream is able to improve the general performance of the learning system compared to a baseline method and an ensemble-based approach.


2013

CN2-SD for subgroup discovery in a highly customized textile industry: A case study

Authors
Almeida, S; Soares, C;

Publication
Lecture Notes in Mechanical Engineering

Abstract
The success of the textile industry largely depends on the products offered and on the speed of response to variations in demand that are induced by changes in consumer lifestyles. The study of behavioral habits and buying trends can provide models to be integrated into the decision support systems of companies. Data mining techniques can be used to develop models based on data. This approach has been used in the past to develop models to improve sales in the textile industry. However, the discovery of scientific models based on subgroup discovery algorithms, which characterize subgroups of observations with rare distributions, has not been attempted in this area. The goal of this work is to investigate whether these algorithms can extract knowledge that is useful for a particular kind of textile industry, which produces highly customized garments. We apply the CN2-SD subgroup discovery method to find rare and interesting subgroups of products in a database provided by a manufacturer of custom-made shirts. The results show that it is possible to obtain knowledge that is useful to understand customer preferences in highly customized textile industries using subgroup discovery techniques. © Springer International Publishing Switzerland 2013.
