Publications

Publications by LIAAD

2016

TweeProfiles3: visualization of spatio-temporal patterns on Twitter

Authors
Maia, A; Cunha, T; Soares, C; Abreu, PH;

Publication
NEW ADVANCES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1

Abstract
With the advent of social networking, a lot of user-specific, voluntarily provided data has been generated. Researchers and companies noticed the value that lay within those enormous amounts of data and developed algorithms and tools to extract patterns in order to act on them. TweeProfiles is an offline clustering tool that analyses tweets over multiple dimensions: spatial, temporal, content and social. This project was extended in TweeProfiles2 by enabling the processing of real-time data. In this work, we developed a visualization tool suitable for data streaming, using multiple widgets to better represent all the information. The usefulness of the developed tool for journalism was evaluated based on a usability test, which, despite its reduced number of participants, yielded good results.

2016

Entropy-based discretization methods for ranking data

Authors
de Sa, CR; Soares, C; Knobbe, A;

Publication
INFORMATION SCIENCES

Abstract
Label Ranking (LR) problems are becoming increasingly important in Machine Learning. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, there are not many pre-processing methods for LR. Some methods, like Naive Bayes for LR and APRIORI-LR, cannot handle real-valued data directly. Conventional discretization methods used in classification are not suitable for LR problems, due to the different target variable. In this work, we make an extensive analysis of the existing methods using simple approaches. We also propose a new method called EDiRa (Entropy-based Discretization for Ranking) for the discretization of ranking data. We illustrate the advantages of the method using synthetic data and also on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and also improves the results and efficiency of the learning algorithms.

2016

Exceptional Preferences Mining

Authors
de Sa, CR; Duivesteijn, W; Soares, C; Knobbe, A;

Publication
DISCOVERY SCIENCE, (DS 2016)

Abstract
Exceptional Preferences Mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where the preference relations between subsets of the labels significantly deviate from the norm; a variant of Subgroup Discovery, with rankings as the (complex) target concept. We employ three quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: the first gauges exceptional overall ranking behavior, the second indicates whether a particular label stands out from the rest, and the third highlights subgroups featuring unusual pairwise label ranking behavior. As proof of concept, we explore five datasets. The results confirm that the new task EPM can deliver interesting knowledge. The results also illustrate how the visualization of the preferences in a Preference Matrix can aid in interpreting exceptional preference subgroups.
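
As an illustration only, the sketch below shows one plausible way to summarise pairwise label preferences in a matrix. It assumes each observation is a list of ranks (1 = most preferred) and that an entry is the average of +1, -1 or 0 over all observations. The function name `preference_matrix` and this exact definition are assumptions made here for illustration; the abstract does not specify how the authors' Preference Matrix is computed.

```python
def preference_matrix(rankings):
    """Average pairwise preference between labels.

    rankings: list of observations, each a list of ranks per label
    (rank 1 = most preferred). Entry [i][j] is the mean over all
    observations of +1 (label i ranked above j), -1 (below) or 0 (tie).
    This definition is an assumption, not taken from the paper.
    """
    n_labels = len(rankings[0])
    totals = [[0.0] * n_labels for _ in range(n_labels)]
    for ranks in rankings:
        for i in range(n_labels):
            for j in range(n_labels):
                if ranks[i] < ranks[j]:
                    totals[i][j] += 1
                elif ranks[i] > ranks[j]:
                    totals[i][j] -= 1
    n_obs = len(rankings)
    return [[value / n_obs for value in row] for row in totals]


# Two observations over three labels: both rank label 0 first,
# but they disagree on the order of labels 1 and 2.
example = preference_matrix([[1, 2, 3], [1, 3, 2]])
```

Comparing such a matrix for a subgroup with the one for the whole dataset is one way to see where the subgroup's preferences differ.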

2016

Selecting Collaborative Filtering Algorithms Using Metalearning

Authors
Cunha, T; Soares, C; Carvalho, ACPLFd;

Publication
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part II

Abstract
Recommender Systems are an important tool in e-business, for both companies and customers. Several algorithms are available to developers; however, there is little guidance concerning which is the best algorithm for a specific recommendation problem. In this study, a metalearning approach is proposed to address this issue. It consists of relating the characteristics of problems (metafeatures) to the performance of recommendation algorithms. We propose a set of metafeatures based on the application of a systematic procedure to develop metafeatures and by extending and generalizing the state-of-the-art metafeatures for recommender systems. The approach is tested on a set of Matrix Factorization algorithms and a collection of real-world Collaborative Filtering datasets. The performance of these algorithms on these datasets is evaluated using several standard metrics. The algorithm selection problem is formulated as a set of classification tasks, where the target attribute is the best Matrix Factorization algorithm according to each metric. The results show that the approach is viable and that the metafeatures used contain information that is useful to predict the best algorithm for a dataset. © Springer International Publishing AG 2016.
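
The formulation of algorithm selection as classification can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the metafeatures (sparsity, mean rating), the algorithm labels `mf_a` and `mf_b`, all numeric values, and the choice of a 1-nearest-neighbour classifier are assumptions made for the example.

```python
import math

# Hypothetical meta-dataset: each entry pairs the metafeatures of one
# collaborative filtering dataset (sparsity, mean rating) with the label
# of the algorithm that performed best on it. All values are invented.
META_DATASET = [
    ((0.95, 3.5), "mf_a"),
    ((0.99, 3.9), "mf_b"),
    ((0.93, 3.4), "mf_a"),
    ((0.98, 4.1), "mf_b"),
]


def predict_best_algorithm(metafeatures, meta_dataset=META_DATASET):
    """Predict the best algorithm for a new dataset with 1-nearest neighbour.

    The nearest meta-example (Euclidean distance in metafeature space)
    supplies the predicted label.
    """
    nearest = min(meta_dataset, key=lambda row: math.dist(row[0], metafeatures))
    return nearest[1]


# A new dataset described only by its metafeatures.
suggestion = predict_best_algorithm((0.985, 4.0))
```

In this framing, one such classifier would be trained per evaluation metric, since the best algorithm may differ from metric to metric.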

2016

AToMRS: A Tool to Monitor Recommender Systems

Authors
Costa, A; Cunha, T; Soares, C;

Publication
KDIR: PROCEEDINGS OF THE 8TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL. 1

Abstract
Recommender systems arose in response to the excess of available online information. These systems assign, to a given individual, suggestions of items that may be relevant. The monitoring and evaluation of these systems are fundamental to the proper functioning of many business-related services. It is the goal of this paper to create a tool capable of collecting, aggregating and supervising the results obtained from the evaluation of recommender systems. To achieve this goal, a multi-granularity approach is developed and implemented in order to organize the different levels of the problem. This tool also aims to tackle the lack of mechanisms to enable visual assessment of the performance of a recommender system's algorithm. A functional prototype of the application is presented, with the purpose of validating the solution's concept.

2016

Learning from the News: Predicting Entity Popularity on Twitter

Authors
Saleiro, P; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
In this work, we tackle the problem of predicting entity popularity on Twitter based on the news cycle. We apply a supervised learning approach and extract four types of features: (i) signal, (ii) textual, (iii) sentiment and (iv) semantic, which we use to predict whether the popularity of a given entity will be high or low in the following hours. We ran several experiments on six different entities in a dataset of over 150M tweets and 5M news and obtained F1 scores over 0.70. Error analysis indicates that news performs better at predicting entity popularity on Twitter when it is the primary information source of the event, as opposed to events such as live TV broadcasts, political debates or football matches.
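
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from counts for a binary high/low popularity prediction. The function name `f1_score` and the example counts are illustrative assumptions and are not taken from the paper's experiments.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * precision * recall / (precision + recall).

    Written in the equivalent count form 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no positive predictions or labels.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Invented counts: 7 correct "high" predictions, 3 false alarms, 3 misses.
score = f1_score(7, 3, 3)
```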
