2012
Authors
Campos, R; Dias, G; Jorge, AM; Nunes, C;
Publication
ACM International Conference Proceeding Series
Abstract
In general, search engines fail to understand the user's temporal intents when these are expressed as implicit temporal queries. This causes the retrieval of less relevant information and prevents users from being aware of the possible temporal dimension of the query results. In this paper, we aim to develop a language-independent model that tackles the temporal dimensions of a query and identifies its most relevant time periods. For this purpose, we propose a temporal similarity measure capable of associating one or more relevant dates with a given query and filtering out irrelevant ones. Our approach is based on the exploitation of temporal information from web content, particularly within the set of top-k web snippets returned in response to a query. We focus on extracting years, which are a kind of temporal information that often appears in this type of collection. We evaluate our methodology using a set of real-world text temporal queries, which are clear concepts (i.e. queries which are non-ambiguous in concept and temporal in their purpose). Experiments show that, when compared to baseline methods, determining the most relevant dates for any given implicit temporal query can be improved with a new temporal similarity measure. © 2012 ACM.
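As a rough illustration of the year-extraction step described in this abstract, the sketch below pulls four-digit years out of a list of snippets and scores each year by the fraction of snippets that mention it. This is a simple frequency baseline, not the temporal similarity measure proposed in the paper; the regular expression, the function name `year_scores`, and the year range (1000 to 2099) are assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a "year" is any standalone four-digit number from 1000 to 2099.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def year_scores(snippets):
    """Return {year: fraction of snippets mentioning that year}."""
    counts = Counter()
    for text in snippets:
        # Count each year once per snippet so a repeated year is not over-weighted.
        counts.update(set(YEAR.findall(text)))
    return {int(year): n / len(snippets) for year, n in counts.items()}

snippets = [
    "Haiti earthquake 2010 killed thousands",
    "In 2010 the earthquake struck Haiti",
    "Founded 1804, Haiti was hit again in 2010",
]
scores = year_scores(snippets)
```

A frequency score like this cannot tell a relevant date from an incidental one that happens to appear in the snippets, which is the filtering problem the abstract says the proposed measure addresses.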
2009
Authors
Mendes Moreira, J; Jorge, AM; Soares, C; de Sousa, JF;
Publication
MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION
Abstract
Integration methods for ensemble learning can use two different approaches: combination or selection. The combination approach (also called fusion) consists of combining the predictions obtained by the different models in the ensemble to obtain the final ensemble prediction. The selection approach selects one (or more) models from the ensemble according to the prediction performance of these models on similar data from the validation set. Usually, the method used to select similar data is k-nearest neighbors with the Euclidean distance. In this paper we discuss other approaches to obtaining similar data for the regression problem. We show that using similarity measures based on the target values improves results. We also show that dynamically selecting several models for the prediction task increases prediction accuracy compared to selecting just one model.
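The selection approach described in this abstract can be sketched as follows. This is a minimal illustration of the usual setup only (k-nearest neighbors with Euclidean distance on the validation set), not the target-value-based similarity the paper argues for. The function name `dynamic_selection_predict`, its parameters, the use of mean absolute error, and the plain averaging of the selected models are all assumptions made for the example.

```python
import math

def dynamic_selection_predict(models, X_val, y_val, x_new, k=3, n_select=2):
    """Average the n_select models with the lowest mean absolute error
    on the k validation points closest to x_new (Euclidean distance)."""
    # Rank validation points by distance to the query point.
    order = sorted(range(len(X_val)), key=lambda i: math.dist(X_val[i], x_new))
    neighbours = order[:k]

    def local_error(model):
        # Mean absolute error of one model on the neighbouring validation points.
        return sum(abs(model(X_val[i]) - y_val[i]) for i in neighbours) / len(neighbours)

    chosen = sorted(models, key=local_error)[:n_select]
    return sum(model(x_new) for model in chosen) / len(chosen)

# Three toy regressors; the second one matches the validation targets near x = 2.
models = [lambda x: x[0], lambda x: 2 * x[0], lambda x: 10.0]
X_val = [[1.0], [2.0], [3.0], [10.0]]
y_val = [2.0, 4.0, 6.0, 10.0]

single = dynamic_selection_predict(models, X_val, y_val, [2.0], k=3, n_select=1)
several = dynamic_selection_predict(models, X_val, y_val, [2.0], k=3, n_select=2)
```

Setting `n_select` to 1 gives selection of a single model, while larger values give the multi-model dynamic selection that the abstract reports as more accurate.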
2011
Authors
de Sa, CR; Soares, C; Jorge, AM; Azevedo, P; Costa, J;
Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011
Abstract
Recently, a number of learning algorithms have been adapted for label ranking, including instance-based and tree-based methods. In this paper, we propose an adaptation of association rules for label ranking. The adaptation, which is illustrated in this work with the APRIORI algorithm, essentially consists of using variations of the support and confidence measures based on ranking similarity functions that are suitable for label ranking. We also adapt the method to make a prediction from the possibly conflicting consequents of the rules that apply to an example. Although our adaptation starts from a very simple variant of association rules for classification, the results clearly show that the method makes valid predictions. Additionally, they show that it competes well with state-of-the-art label ranking algorithms.
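One way to read "support and confidence measures based on ranking similarity functions" is sketched below: rather than counting only the examples whose ranking equals the rule's consequent, each covered example contributes its similarity to that ranking. The exact definitions used in the paper are not given in this abstract, so the Kendall tau similarity rescaled to [0, 1], the function names, and the data layout are all assumptions made for illustration.

```python
from itertools import combinations

def kendall_tau(r1, r2):
    """Kendall tau between two rank vectors without ties (value in [-1, 1])."""
    pairs = list(combinations(range(len(r1)), 2))
    score = 0
    for i, j in pairs:
        # A pair is concordant when both rankings order labels i and j the same way.
        concordant = (r1[i] - r1[j]) * (r2[i] - r2[j]) > 0
        score += 1 if concordant else -1
    return score / len(pairs)

def similarity(r1, r2):
    """Assumed similarity: Kendall tau rescaled to [0, 1]."""
    return (kendall_tau(r1, r2) + 1) / 2

def rule_measures(data, antecedent, ranking):
    """Similarity-weighted support and confidence of the rule antecedent -> ranking.

    data is a list of (item_set, rank_vector) pairs.
    """
    covered = [ranks for items, ranks in data if antecedent <= items]
    if not covered:
        return 0.0, 0.0
    sim_sum = sum(similarity(ranks, ranking) for ranks in covered)
    return sim_sum / len(data), sim_sum / len(covered)

data = [
    ({"a", "b"}, (1, 2, 3)),
    ({"a"}, (1, 3, 2)),
    ({"b"}, (3, 2, 1)),
]
support, confidence = rule_measures(data, {"a"}, (1, 2, 3))
```

With an exact-match similarity (1 for identical rankings, 0 otherwise), these measures reduce to the ordinary support and confidence used for classification rules.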
2000
Authors
Lopes, AD; Jorge, A;
Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE
Abstract
In this article we discuss in detail two techniques for rule and case integration. Case-based learning is used when the rule language is exhausted. Initially, all the examples are used to induce a set of rules with satisfactory quality. The examples that are not covered by these rules are then handled as cases. The case-based approach used also combines rules and cases internally. Instead of only storing the cases as provided, it has a learning phase where, for each case, it constructs and stores a set of explanations with support and confidence above given thresholds. These explanations have different levels of generality and the maximally specific one corresponds to the case itself. The same case may have different explanations representing different perspectives of the case. Therefore, to classify a new case, it looks for relevant stored explanations applicable to the new case. The different possible views of the case given by the explanations correspond to considering different sets of conditions/features to analyze the case. In other words, they lead to different ways to compute similarity between known cases/explanations and the new case to be classified (as opposed to the commonly used fixed metric).
2012
Authors
Domingues, MA; Gouyon, F; Jorge, AM; Leal, JP; Vinagre, J; Lemos, L; Sordo, M;
Publication
WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion
Abstract
In this paper we propose a hybrid music recommender system, which combines usage and content data. We describe an online evaluation experiment performed in real time on a commercial music web site that specialises in content from the very long tail of music content. We compare the hybrid system against two stand-alone recommenders, the first based on usage data and the second on content data. The results show that the proposed hybrid recommender has advantages over the usage- and content-based systems, namely a higher absolute user acceptance rate, a higher user activity rate and higher user loyalty. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
2003
Authors
Silva, ACE; Jorge, A; Torgo, L;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE
Abstract
The information contained in companies' financial statements is valuable to several users. Much of the relevant information in such documents is contained in tables and is currently mainly extracted by hand. We propose a method that accomplishes a prior step of the task of automatically extracting information from tables in documents: selecting the lines that are likely to belong to tables. Our method has been developed by empirically analyzing a set of Portuguese companies' financial statements using statistical and data mining techniques. Empirical evaluation indicates that more than 99% of table lines are selected after discarding at least 50% of all lines. The method can cope with the complexity of styles used in assembling information on paper and adapt its performance accordingly, thus maximizing its results.