Publications

Publications by Alípio Jorge

2013

Multi-interval Discretization of Continuous Attributes for Label Ranking

Authors
de Sa, CR; Soares, C; Knobbe, A; Azevedo, P; Jorge, AM;

Publication
DISCOVERY SCIENCE

Abstract
Label Ranking (LR) problems, such as predicting rankings of financial analysts, are becoming increasingly important in data mining. While there has been a significant amount of work on the development of learning algorithms for LR in recent years, pre-processing methods for LR are still very scarce. However, some methods, like Naive Bayes for LR and APRIORI-LR, cannot deal with real-valued data directly. As a make-shift solution, one could consider conventional discretization methods used in classification, by simply treating each unique ranking as a separate class. In this paper, we show that such an approach has several disadvantages. As an alternative, we propose an adaptation of an existing method, MDLP, specifically for LR problems. We illustrate the advantages of the new method using synthetic data. Additionally, we present results obtained on several benchmark datasets. The results clearly indicate that the discretization is performing as expected and in some cases improves the results of the learning algorithms.

CloseRead Abstract

2017

Scalable Online Top-N Recommender Systems

Authors
Jorge, AM; Vinagre, J; Domingues, M; Gama, J; Soares, C; Matuszyk, P; Spiliopoulou, M;

Publication
E-COMMERCE AND WEB TECHNOLOGIES, EC-WEB 2016

Abstract
Given the large volumes and dynamics of data that recommender systems currently have to deal with, we look at online stream based approaches that are able to cope with high throughput observations. In this paper we describe work on incremental neighborhood based and incremental matrix factorization approaches for binary ratings, starting with a general introduction, looking at various approaches and describing existing enhancements. We refer to recent work on forgetting techniques and multidimensional recommendation. We will also focus on adequate procedures for the evaluation of online recommender algorithms.

CloseRead Abstract

2014

A study of machine learning methods for detecting user interest during web sessions

Authors
Jorge, AM; Leal, JP; Anand, SS; Dias, H;

Publication
PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14)

Abstract
The ability to have an automated real time detection of user interest during a web session is very appealing and can be very useful for a number of web intelligence applications. Low level interaction events associated with user interest manifestations form the basis of user interest models. However such data sets present a number of challenges from a machine learning perspective, including the level of noise in the data and class imbalance (given that the majority of content will not be of interest to a user). In this paper we evaluate a large number of machine learning techniques aimed at learning from class imbalanced data using two data sets collected from a real user study. We use the AUC, recall, precision and model complexity to compare the relative merits of these techniques and conclude that useful models with AUC above 0.8 can be obtained using a mix of sampling and cost based methods. Ensemble models can provide further accuracy but make deployment more complex.

CloseRead Abstract

2017

Improving Incremental Recommenders with Online Bagging

Authors
Vinagre, J; Jorge, AM; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Online recommender systems often deal with continuous, potentially fast and unbounded flows of data. Ensemble methods for recommender systems have been used in the past in batch algorithms, however they have never been studied with incremental algorithms that learn from data streams. We evaluate online bagging with an incremental matrix factorization algorithm for top-N recommendation with positiveonly user feedback, often known as binary ratings. Our results show that online bagging is able to improve accuracy up to 35% over the baseline, with small computational overhead.

CloseRead Abstract

2017

Classifying Heart Sounds Using Images of MFCC and Temporal Features

Authors
Nogueira, DM; Ferreira, CA; Jorge, AM;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Phonocardiogram signals contain very useful information about the condition of the heart. It is a method of registration of heart sounds, which can be visually represented on a chart. By analyzing these signals, early detections and diagnosis of heart diseases can be done. Intelligent and automated analysis of the phonocardiogram is therefore very important, to determine whether the patient's heart works properly or should be referred to an expert for further evaluation. In this work, we use electrocardiograms and phonocardiograms collected simultaneously, from the Physionet challenge database, and we aim to determine whether a phonocardiogram corresponds to a "normal" or "abnormal" physiological state. The main idea is to translate a 1D phonocardiogram signal into a 2D image that represents temporal and Mel-frequency cepstral coefficients features. To do that, we develop a novel approach that uses both features. First we segment the phonocardiogram signals with an algorithm based on a logistic regression hidden semi-Markov model, which uses the electrocardiogram signals as reference. After that, we extract a group of features from the time and frequency domain (Mel-frequency cepstral coefficients) of the phonocardiogram. Then, we combine these features into a two-dimensional time-frequency heat map representation. Lastly, we run a binary classifier to learn a model that discriminates between normal and abnormal phonocardiogram signals. In the experiments, we study the contribution of temporal and Mel-frequency cepstral coefficients features and evaluate three classification algorithms: Support Vector Machines, Convolutional Neural Network, and Random Forest. The best results are achieved when we map both temporal and Mel-frequency cepstral coefficients features into a 2D image and use the Support Vector Machines with a radial basis function kernel. Indeed, by including both temporal and Mel-frequency cepstral coefficients features, we obtain sligthly better results than the ones reported by the challenge participants, which use large amounts of data and high computational power.

CloseRead Abstract

2013

Classifying heart sounds using multiresolution time series motifs: an exploratory study

Authors
Gomes, EF; Jorge, AM; Azevedo, PJ;

Publication
International C* Conference on Computer Science & Software Engineering, C3S2E13, Porto, Portugal - July 10 - 12, 2013

Abstract
The aim of this work is to describe an exploratory study on the use of a SAX-based Multiresolution Motif Discovery method for Heart Sound Classification. The idea of our work is to discover relevant frequent motifs in the audio signals and use the discovered motifs and their frequency as characterizing attributes. We also describe different configurations of motif discovery for defining attributes and compare the use of a decision tree based algorithm with random forests on this kind of data. Experiments were performed with a dataset obtained from a clinic trial in hospitals using the digital stethoscope DigiScope. This exploratory study suggests that motifs contain valuable information that can be further exploited for Heart Sound Classification. © 2013 ACM.

CloseRead Abstract