Publications - INESC TEC

Publications

Publications by LIAAD

2013

Dimensions as Virtual Items: Improving the predictive ability of top-N recommender systems

Authors
Domingues, MA; Jorge, AM; Soares, C

Publication
INFORMATION PROCESSING & MANAGEMENT

Abstract
Traditionally, recommender systems for the web deal with applications that have two dimensions, users and items. Based on access data that relate these dimensions, a recommendation model can be built and used to identify a set of N items that will be of interest to a certain user. In this paper we propose a multidimensional approach, called DaVI (Dimensions as Virtual Items), that consists of inserting contextual and background information as new user-item pairs. The main advantage of this approach is that it can be applied in combination with several existing two-dimensional recommendation algorithms. To evaluate its effectiveness, we used the DaVI approach with two different top-N recommender algorithms, Item-based Collaborative Filtering and Association Rules-based, and ran an extensive set of experiments on three different real-world data sets. In addition, we compared our approach to the previously introduced combined reduction and weight post-filtering approaches. The empirical results strongly indicate that our approach enables the application of existing two-dimensional recommendation algorithms to multidimensional data, exploiting the useful information in these data to improve the predictive ability of top-N recommender systems.
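As a rough illustration of the idea in this abstract, the Python sketch below turns each contextual dimension value into a virtual item and adds it as an extra user-item pair. The function name `add_virtual_items`, the record layout, and the `dimension=value` naming are assumptions made for this example; the abstract does not specify how DaVI encodes virtual items.

```python
def add_virtual_items(accesses, dimensions):
    """Return user-item pairs, extended with one virtual item per dimension value.

    `accesses` is a list of dicts with "user", "item" and optional context keys.
    This record layout is hypothetical, chosen only for illustration.
    """
    pairs = []
    for access in accesses:
        user = access["user"]
        # Keep the original two-dimensional pair.
        pairs.append((user, access["item"]))
        # Add each context value as if it were another item accessed by the user.
        for dim in dimensions:
            if dim in access:
                pairs.append((user, f"{dim}={access[dim]}"))
    # Drop repeated pairs while preserving first-seen order.
    return list(dict.fromkeys(pairs))


accesses = [
    {"user": "u1", "item": "song_a", "day": "mon"},
    {"user": "u1", "item": "song_b", "day": "mon"},
    {"user": "u2", "item": "song_a", "day": "sat"},
]
pairs = add_virtual_items(accesses, ["day"])
```

Because the result is still a plain list of user-item pairs, it could be passed unchanged to a two-dimensional top-N algorithm, which is the property the abstract highlights.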

2013

Multi-interval Discretization of Continuous Attributes for Label Ranking

Authors
de Sa, CR; Soares, C; Knobbe, A; Azevedo, P; Jorge, AM

Publication
DISCOVERY SCIENCE

Abstract
Label Ranking (LR) problems, such as predicting rankings of financial analysts, are becoming increasingly important in data mining. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, pre-processing methods for LR are still very scarce. However, some methods, like Naive Bayes for LR and APRIORI-LR, cannot deal with real-valued data directly. As a makeshift solution, one could consider conventional discretization methods used in classification, by simply treating each unique ranking as a separate class. In this paper, we show that such an approach has several disadvantages. As an alternative, we propose an adaptation of an existing method, MDLP, specifically for LR problems. We illustrate the advantages of the new method using synthetic data. Additionally, we present results obtained on several benchmark datasets. The results clearly indicate that the discretization performs as expected and in some cases improves the results of the learning algorithms.
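The makeshift solution mentioned in this abstract, treating each unique ranking as a separate class, can be sketched in a few lines of Python. The function name `rankings_to_classes` and the toy rankings are assumptions made for this example; the sketch does not implement the adapted MDLP method, whose details the abstract does not give.

```python
def rankings_to_classes(rankings):
    """Map each distinct ranking to an integer class label (first seen = 0)."""
    class_of = {}
    labels = []
    for ranking in rankings:
        key = tuple(ranking)
        if key not in class_of:
            class_of[key] = len(class_of)
        labels.append(class_of[key])
    return labels, class_of


# Hypothetical rankings of three labels (position i holds the rank of label i).
rankings = [(1, 2, 3), (1, 3, 2), (1, 2, 3), (3, 2, 1)]
labels, class_of = rankings_to_classes(rankings)
```

In this toy data, (1, 2, 3) and (1, 3, 2) differ by a single swap yet receive class labels that are as unrelated as those of (1, 2, 3) and its reverse (3, 2, 1). A classification-oriented discretizer working on such labels has no access to the similarity between rankings; whether this is among the disadvantages the paper analyses is not stated in the abstract.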

2013

Classifying heart sounds using multiresolution time series motifs: an exploratory study

Autores
Gomes, EF; Jorge, AM; Azevedo, PJ;

Publicação
International C* Conference on Computer Science & Software Engineering, C3S2E13, Porto, Portugal - July 10 - 12, 2013

Abstract
The aim of this work is to describe an exploratory study on the use of a SAX-based Multiresolution Motif Discovery method for Heart Sound Classification. The idea of our work is to discover relevant frequent motifs in the audio signals and use the discovered motifs and their frequency as characterizing attributes. We also describe different configurations of motif discovery for defining attributes and compare the use of a decision-tree-based algorithm with random forests on this kind of data. Experiments were performed with a dataset obtained from a clinical trial in hospitals using the digital stethoscope DigiScope. This exploratory study suggests that motifs contain valuable information that can be further exploited for Heart Sound Classification. © 2013 ACM.
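To make the idea of motifs as attributes more concrete, the Python sketch below applies a basic single-resolution SAX transform (z-normalisation, piecewise aggregate approximation, then symbol assignment) to fixed-size windows and counts how often each resulting word occurs. It is a simplification under stated assumptions: it is not the multiresolution method used in the paper, and the names `sax_word` and `motif_counts`, the alphabet size of 4, and the non-overlapping windows are choices made for this example.

```python
from bisect import bisect
from collections import Counter
from statistics import mean, pstdev

# Standard-normal quartile breakpoints (rounded) for a 4-symbol alphabet.
BREAKPOINTS = (-0.67, 0.0, 0.67)
ALPHABET = "abcd"


def sax_word(window, n_segments):
    """Convert one window to a SAX word; len(window) must be divisible by n_segments."""
    mu, sigma = mean(window), pstdev(window)
    # z-normalise; a constant window maps to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    seg = len(z) // n_segments
    # Piecewise aggregate approximation: mean of each equal-length segment.
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    return "".join(ALPHABET[bisect(BREAKPOINTS, value)] for value in paa)


def motif_counts(signal, window_size, n_segments):
    """Count SAX words over non-overlapping windows of the signal."""
    words = [
        sax_word(signal[start:start + window_size], n_segments)
        for start in range(0, len(signal) - window_size + 1, window_size)
    ]
    return Counter(words)


# Toy signal: the same low-then-high pattern repeated three times.
counts = motif_counts([0, 0, 1, 1] * 3, window_size=4, n_segments=2)
```

The word counts could then serve as the attribute vector of one recording for a decision tree or random forest learner, in the spirit of the comparison the abstract describes.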

2013

Using statistics, visualization and data mining for monitoring the quality of meta-data in web portals

Authors
Domingues, MA; Soares, C; Jorge, AM

Publication
INFORMATION SYSTEMS AND E-BUSINESS MANAGEMENT

Abstract
The goal of many web portals is to select, organize and distribute content in order to satisfy their users/customers. This process is usually based on meta-data that represent and describe content. In this paper we describe a methodology and a system to monitor the quality of the meta-data used to describe content in web portals. The methodology is based on the analysis of the meta-data using statistics, visualization and data mining tools. The methodology enables the site's editor to detect and correct problems in the description of contents, thus improving the quality of the web portal and the satisfaction of its users. We also define a general architecture for a system to support the proposed methodology. We have implemented this system and tested it on a Portuguese portal for management executives. The results validate the proposed methodology.

2013

Binary recommender systems: Introduction, an application and outlook

Authors
Jorge, AM

Publication
ACM International Conference Proceeding Series

Abstract
Recommender Systems are a hot application area these days, made popular by well-known web sites. The problem of predicting user preferences is very demanding from the data mining algorithm design point of view, but it also poses challenges to evaluation and monitoring. Moreover, there is a lot of information that can be exploited, from clickstreams and background information to musical content and social interaction. As data grows and recommendation requests must be answered in a split second, online and agile solutions must be implemented. In this talk we will give a brief introduction to binary recommender systems, describe a particular hybrid application to music recommendation, from algorithm to online evaluation, and refer to context-aware and online recommender algorithms. © 2013 ACM.

2013

Comparing relational and non-relational algorithms for clustering propositional data

Authors
Motta, R; Nogueira, BM; Jorge, AM; De Andrade Lopes, A; Rezende, SO; De Oliveira, MCF

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Cluster detection methods are widely studied in Propositional Data Mining. In this context, each example is represented as a feature vector. Such data has a natural non-relational structure, but can be represented in a relational form through similarity-based network models. In these models, examples are represented by vertices and an edge connects two examples with high similarity. This relational representation allows network-based algorithms from Relational Data Mining to be employed. Specifically in clustering tasks, these models allow the use of community detection algorithms in networks in order to detect data clusters. In this work, we compared traditional non-relational data-based clustering algorithms with cluster detection algorithms based on relational data using measures for community detection in networks. We carried out an exploratory analysis over 23 numerical datasets and 10 textual datasets. Results show that network models can efficiently represent the data topology, allowing their application in cluster detection with higher precision when compared to non-relational methods. Copyright 2013 ACM.
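The similarity-based network model described in this abstract can be sketched in Python as follows: each feature vector becomes a vertex, and an edge joins two vertices whose similarity is high. The use of cosine similarity, the threshold of 0.9, and the function names are assumptions made for this example. Connected components are used here only as a deliberately simple stand-in for the community detection algorithms the paper evaluates.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_edges(vectors, threshold):
    """Connect every pair of examples whose similarity reaches the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


def connected_components(n_vertices, edges):
    """Group vertices reachable from one another (stand-in for community detection)."""
    neighbours = {i: set() for i in range(n_vertices)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, components = set(), []
    for start in range(n_vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


# Toy propositional data: two examples near each axis.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
edges = similarity_edges(vectors, threshold=0.9)
clusters = connected_components(len(vectors), edges)
```

On this toy data the network splits into two groups, one per axis. A real comparison would replace the component search with a proper community detection algorithm and choose the similarity measure and threshold per dataset.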
