Publications

Publications by LIAAD

2015

Customer segmentation in a large database of an online customized fashion business

Authors
Brito, PQ; Soares, C; Almeida, S; Monte, A; Byvoet, M;

Publication
ROBOTICS AND COMPUTER-INTEGRATED MANUFACTURING

Abstract
Data mining (DM) techniques have been used to solve marketing and manufacturing problems in the fashion industry. These approaches are expected to be particularly important for highly customized industries because the diversity of products sold makes it harder to find clear patterns of customer preferences. The goal of this project was to investigate two different data mining approaches for customer segmentation: clustering and subgroup discovery. The models obtained produced six market segments and 49 rules that allowed a better understanding of customer preferences in a highly customized fashion manufacturer/e-tailor. The scope and limitations of these clustering DM techniques will lead to further methodological refinements.


2015

Pruning Bagging Ensembles with Metalearning

Authors
Pinto, F; Soares, C; Mendes Moreira, J;

Publication
MULTIPLE CLASSIFIER SYSTEMS (MCS 2015)

Abstract
Ensemble learning algorithms often benefit from pruning strategies that reduce the number of individual models and improve performance. In this paper, we propose a Metalearning method for pruning bagging ensembles. Our proposal differs from other pruning strategies in that it allows the ensemble to be pruned before the individual models are actually generated. The method consists of generating a set of characteristics from the bootstrap samples and relating them to the impact of the predictive models in multiple tested combinations. We executed experiments with bagged ensembles of 20 and 100 decision trees for 53 UCI classification datasets. Results show that our method is competitive with a state-of-the-art pruning technique and bagging, while using only 25% of the models.


2015

TwitterJam: Identification of Mobility Patterns in Urban Centers Based on Tweets

Authors
Rebelo, F; Soares, C; Rossetti, RJF;

Publication
2015 IEEE FIRST INTERNATIONAL SMART CITIES CONFERENCE (ISC2)

Abstract
In the early twenty-first century, social networks served only to let the world know our tastes, share our photos and share some thoughts. A decade later, these services are filled with an enormous amount of information. Now, industry and academia are exploring this information in order to extract implicit patterns. TwitterJam is a tool that analyses the contents of the social network Twitter to extract events related to road traffic. To reach this goal, we started by analysing tweets to identify those that really contain road traffic information. The second step was to gather official information to confirm the extracted information. We then correlated these two types of information (official and general) in order to verify the credibility of public tweets. The correlation was computed separately in two ways: the first concerns the number of tweets at a certain time of day and the second concerns the location of these tweets. Two hypotheses were also devised concerning these correlations. The results were not perfect but were reasonable enough. We also analysed tools suitable for the visualization of data to decide on the best strategy to follow. Finally, we developed a web application that shows the results, to support their analysis.


2015

Using Metalearning for Prediction of Taxi Trip Duration Using Different Granularity Levels

Authors
Zarmehri, MN; Soares, C;

Publication
Advances in Intelligent Data Analysis XIV

Abstract
Trip duration is an important metric for the management of taxi companies, as it affects operational efficiency, driver satisfaction and, above all, customer satisfaction. In particular, the ability to predict trip duration in advance can be very useful for allocating taxis to stands and finding the best route for trips. A data mining approach can be used to generate models for trip time prediction. In fact, given the amount of data available, different models can be generated for different taxis. Given the difference between the data collected by different taxis, the best model for each one can be obtained with different algorithms and/or parameter settings. However, finding the configuration that generates the best model for each taxi is computationally very expensive. In this paper, we propose the use of metalearning to address the problem of selecting the algorithm that generates the model with the most accurate predictions for each taxi. The approach is tested on data collected in the Drive-In project. Our results show that metalearning can help to select the algorithm with the best accuracy.


2015

The weighted rank correlation coefficient in the case of ties

Authors
Da Costa, JP; Roque, LAC; Soares, C;

Publication
STATISTICS & PROBABILITY LETTERS

Abstract
A new weighted rank correlation coefficient r(W2) has been introduced in Pinto da Costa (2011), following the coefficient r(W) introduced in Pinto Da Costa and Soares (2005); Soares et al. (2001); Pinto Da Costa et al. (2001). We give the expression of r(W2) in the case of ties and also present some simulations to study the behaviour of the coefficient.


2015

POPmine: Tracking Political Opinion on the Web

Authors
Saleiro, P; Amir, S; Silva, M; Soares, C;

Publication
CIT/IUCC/DASC/PICOM 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY - UBIQUITOUS COMPUTING AND COMMUNICATIONS - DEPENDABLE, AUTONOMIC AND SECURE COMPUTING - PERVASIVE INTELLIGENCE AND COMPUTING

Abstract
The automatic content analysis of mass media in the social sciences has become necessary and possible with the rise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system, which is able to collect texts from web-based conventional media (news items in mainstream media sites) and social media (blogs and Twitter) and to process those texts, recognizing topics and political actors, analyzing relevant linguistic units, and generating indicators of both frequency of mention and polarity (positivity/negativity) of mentions of political actors across sources, types of sources, and across time.
