Publications

Publications by LIAAD

2014

A study of machine learning methods for detecting user interest during web sessions

Authors
Jorge, AM; Leal, JP; Anand, SS; Dias, H;

Publication
PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14)

Abstract
Automated real-time detection of user interest during a web session is very appealing and can be useful for a number of web intelligence applications. Low-level interaction events associated with manifestations of user interest form the basis of user interest models. However, such data sets present a number of challenges from a machine learning perspective, including the level of noise in the data and class imbalance (given that the majority of content will not be of interest to a user). In this paper we evaluate a large number of machine learning techniques aimed at learning from class-imbalanced data, using two data sets collected from a real user study. We use AUC, recall, precision and model complexity to compare the relative merits of these techniques and conclude that useful models with AUC above 0.8 can be obtained using a mix of sampling and cost-based methods. Ensemble models can provide further accuracy but make deployment more complex.
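As a rough illustration of the AUC metric mentioned in the abstract above, the following sketch computes it as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, counting ties as one half. The function name `auc` and the toy labels and scores are assumptions made for this example; they are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = "of interest", an assumption).
    scores: iterable of classifier scores, higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each (positive, negative) pair contributes 1 if ranked correctly,
    # 0.5 if tied, and 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Because it depends only on how positives are ranked relative to negatives, this measure does not change with the ratio of the two classes, which is one reason it is commonly reported for imbalanced data.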

2014

Classifying Heart Sounds using SAX Motifs, Random Forests and Text Mining techniques

Authors
Gomes, EF; Jorge, AM; Azevedo, PJ;

Publication
PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14)

Abstract
In this paper we describe an approach to classifying heart sounds (classes Normal, Murmur and Extra-systole) that is based on the discretization of sound signals using the SAX (Symbolic Aggregate Approximation) representation. The ability to classify heart sounds automatically, or at least to support human decisions in this task, is socially relevant to spread the reach of medical care using simple mobile devices or digital stethoscopes. In our approach, sounds are first pre-processed using signal processing techniques (decimate, low-pass filter, normalize, Shannon envelope). Then the pre-processed signals are transformed into sequences of discrete SAX symbols. These sequences are subject to a process of motif discovery. Frequent sequences of symbols (motifs) are adopted as features. Each sound is then characterized by the frequent motifs that occur in it and their respective frequencies. This is similar to the term frequency (TF) model used in text mining. In this paper we compare the TF model with the application of TFIDF (Term Frequency - Inverse Document Frequency) and the use of bi-grams (frequent sequences of two motifs). Results show the ability of the motif-based TF approach to separate classes and the relative value of the TFIDF and bi-gram variants. The separation of the Extra-systole class is overly difficult, and much better results are obtained for separating the Murmur class. Empirical validation is conducted using real data collected in noisy environments. We have also assessed the cost-reduction potential of the proposed methods by considering a fixed cost model and using a cost-sensitive meta algorithm.
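The discretization and term-frequency steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it skips the signal pre-processing, uses an alphabet of four symbols with the standard equiprobable Gaussian breakpoints, assumes the signal length is a multiple of the number of segments, and counts every fixed-length subsequence rather than only motifs found by a motif discovery algorithm. The names `sax` and `motif_tf` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean, pstdev

# Breakpoints that split a standard normal distribution into four
# equiprobable regions (alphabet size 4 is an assumption of this sketch).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(signal, n_segments):
    """Convert a numeric signal into a SAX word of n_segments symbols."""
    mu, sigma = mean(signal), pstdev(signal)
    # z-normalize; a constant signal maps to all zeros.
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in signal]
    # Piecewise Aggregate Approximation: average each equal-length segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Map each segment mean to the symbol of the region it falls into.
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa)


def motif_tf(word, motif_len):
    """Count occurrences of every subsequence of length motif_len."""
    return Counter(
        word[i:i + motif_len] for i in range(len(word) - motif_len + 1)
    )
```

For example, `sax([0, 0, 0, 0, 10, 10, 10, 10], 2)` yields the word `"ad"`, and `motif_tf("abab", 2)` counts `"ab"` twice and `"ba"` once. The resulting counts play the role of term frequencies, with each sound treated as a document.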

2014

GTE-Rank: Searching for Implicit Temporal Query Results

Authors
Campos, R; Dias, G; Jorge, AM; Nunes, C;

Publication
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014

Abstract
Temporal information retrieval has been a topic of great interest in recent years. Despite the efforts that have been conducted so far, most popular search engines remain underdeveloped when it comes to explicitly considering the use of temporal information in their search process. In this paper we present GTE-Rank, an online searching tool that takes time into account when ranking time-sensitive query web search results. GTE-Rank is defined as a linear combination of topical and temporal scores to reflect the relevance of any web page both in topical and temporal dimensions. The resulting system can be explored graphically through a search interface made available for research purposes.
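The idea of ranking by a linear combination of topical and temporal scores can be sketched as below. The abstract does not give the weights or the way each score is computed, so the single mixing weight `alpha`, the function names and the toy page tuples are all assumptions made for illustration.

```python
def combined_score(topical, temporal, alpha=0.5):
    """Weighted sum of a topical and a temporal relevance score.

    alpha is a hypothetical mixing weight in [0, 1]: 1.0 ranks purely
    by topic, 0.0 purely by temporal relevance.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * topical + (1.0 - alpha) * temporal


def rank(pages, alpha=0.5):
    """Sort (page_id, topical, temporal) tuples by combined score, best first."""
    return sorted(
        pages,
        key=lambda page: combined_score(page[1], page[2], alpha),
        reverse=True,
    )
```

With pages `("a", 0.9, 0.1)` and `("b", 0.5, 0.8)`, an equal weighting places `b` first, while `alpha=1.0` places `a` first, which shows how the temporal dimension can reorder results that look similar on topic alone.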

2014

Web mining for the integration of data mining with business intelligence in web-based decision support systems

Authors
Domingues, MA; Jorge, AM; Soares, C; Rezende, SO;

Publication
Integration of Data Mining in Business Intelligence Systems

Abstract
Web mining can be defined as the use of data mining techniques to automatically discover and extract information from web documents and services. A decision support system is a computer-based information system that supports business or organizational decision-making activities. Data mining and business intelligence techniques can be integrated in order to develop more advanced decision support systems. In this chapter, the authors propose to use web mining as a process to develop advanced decision support systems in order to support the management activities of a website. They describe the web mining process as a sequence of steps for the development of advanced decision support systems. By following such a sequence, the authors can develop advanced decision support systems, which integrate data mining with business intelligence, for websites. © 2015, IGI Global.

2014

A data warehouse to support web site automation

Authors
Domingues, MA; Soares, C; Jorge, AM; Rezende, SO;

Publication
Journal of the Brazilian Computer Society

Abstract
Background: Due to the constant demand for new information and timely updates of services and content in order to satisfy the user’s needs, web site automation has emerged as a solution to automate several personalization and management activities of a web site. One goal of automation is to reduce the editor’s effort and, consequently, the costs for the owner. The other goal is that the site can adapt to the behavior of the user in a more timely manner, improving the browsing experience and helping the user achieve his/her own goals. Methods: A database to store rich web data is an essential component for web site automation. In this paper, we propose a data warehouse that is developed to be a repository of information to support different web site automation and monitoring activities. We implemented our data warehouse and used it as a repository of information in three different case studies related to the areas of e-commerce, e-learning, and e-news. Result: The case studies showed that our data warehouse is appropriate for web site automation in different contexts. Conclusion: In all cases, the use of the data warehouse was quite simple and had a good response time, mainly because of the simplicity of its structure. © 2014, Domingues et al.; licensee Springer.

2014

Measuring the effectiveness of an e-commerce site through web and sales activity

Authors
Carneiro, AR; Jorge, AM; Brito, PQ; Domingues, MA;

Publication
Springer Proceedings in Mathematics and Statistics

Abstract