2017
Authors
Brazdil, P; Vilalta, R; Giraud-Carrier, CG; Soares, C;
Publication
Encyclopedia of Machine Learning and Data Mining
Abstract
2016
Authors
Boström, Henrik; Knobbe, Arno J.; Soares, Carlos; Papapetrou, Panagiotis;
Publication
IDA
Abstract
2017
Authors
Saleiro, P; Frayling, NM; Rodrigues, EM; Soares, C;
Publication
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017
Abstract
Improvements of entity-relationship (E-R) search techniques have been hampered by a lack of test collections, particularly for complex queries involving multiple entities and relationships. In this paper we describe a method for generating E-R test queries to support comprehensive E-R search experiments. Queries and relevance judgments are created from content that exists in a tabular form, where columns represent entity types and the table structure implies one or more relationships among the entities. Editorial work involves creating natural language queries based on relationships represented by the entries in the table. We have publicly released the RELink test collection comprising 600 queries and relevance judgments obtained from a sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples of entities that are extracted from columns and labelled by the corresponding entity types and the relationships they represent. In order to facilitate research in complex E-R retrieval, we have created and released as open source the RELink Framework, which includes Apache Lucene indexing and search specifically tailored to E-R retrieval. RELink includes entity and relationship indexing based on the ClueWeb-09-B Web collection with FACC1 text span annotations linked to Wikipedia entities. With ready-to-use search resources and a comprehensive test collection, we support the community in pursuing E-R research at scale. © 2017 ACM.
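The sketch below illustrates, in Python, the table-driven construction described in this abstract: columns of a Wikipedia-style table are treated as entity types and each row as a tuple that instantiates the relationship, yielding query/relevance pairs. It is a minimal illustration of the idea, not the RELink Framework itself (which is Lucene-based); the toy table, the relationship label, and the function name are invented for the example.

```python
# Minimal sketch (not the RELink API): derive query/relevance tuples from a
# table whose columns are entity types and whose rows imply a relationship.

# Toy "List of ..." style table: the header gives entity types, each data
# row is a tuple of entities standing in the implied relationship.
table = {
    "types": ["company", "founder"],
    "rows": [
        ["Microsoft", "Bill Gates"],
        ["Apple Inc.", "Steve Jobs"],
    ],
}

def table_to_judgments(table, relationship="founded by"):
    """Build (query, relevant entity tuple) pairs from one table.

    In the described method the natural-language query is written by an
    editor; here a placeholder is assembled from the column types.
    """
    query = f"{table['types'][0]} {relationship} {table['types'][1]}"
    return [(query, tuple(row)) for row in table["rows"]]

for query, relevant_tuple in table_to_judgments(table):
    print(query, "->", relevant_tuple)
```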
2014
Authors
Domingues, MA; Soares, C; Jorge, AM; Rezende, SO;
Publication
Journal of the Brazilian Computer Society
Abstract
Background: Due to the constant demand for new information and timely updates of services and content in order to satisfy the user's needs, web site automation has emerged as a solution to automate several personalization and management activities of a web site. One goal of automation is the reduction of the editor's effort and, consequently, of the costs for the owner. The other is that the site can adapt more promptly to the behavior of the user, improving the browsing experience and helping the user achieve his/her own goals. Methods: A database to store rich web data is an essential component for web site automation. In this paper, we propose a data warehouse that is developed to be a repository of information to support different web site automation and monitoring activities. We implemented our data warehouse and used it as a repository of information in three different case studies related to the areas of e-commerce, e-learning, and e-news. Results: The case studies showed that our data warehouse is appropriate for web site automation in different contexts. Conclusion: In all cases, the use of the data warehouse was quite simple and provided good response times, mainly because of the simplicity of its structure. © 2014, Domingues et al.; licensee Springer.
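As a rough illustration of the kind of repository this abstract refers to, the hedged sketch below sets up a tiny relational store of web usage data and runs one monitoring aggregate over it. The schema (page and visit tables) is invented for the example and is far simpler than the data warehouse proposed in the paper.

```python
# Hypothetical, highly simplified stand-in for a web-data repository; the
# paper's data warehouse schema is considerably richer than this.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE visit (
        visit_id INTEGER PRIMARY KEY,
        page_id  INTEGER REFERENCES page(page_id),
        user_id  TEXT,
        ts       TEXT
    );
""")
conn.execute("INSERT INTO page VALUES (1, '/home')")
conn.execute("INSERT INTO visit VALUES (1, 1, 'u42', '2014-01-01T10:00:00')")

# One monitoring query: visits per page, the kind of aggregate a
# personalization or monitoring activity would read from the repository.
for url, n_visits in conn.execute("""
        SELECT p.url, COUNT(*)
        FROM visit v JOIN page p USING (page_id)
        GROUP BY p.url"""):
    print(url, n_visits)
```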
2017
Authors
Cunha, T; Soares, C; de Carvalho, ACPLF;
Publication
DISCOVERY SCIENCE, DS 2017
Abstract
Recommender Systems have become increasingly popular, propelling the emergence of several algorithms. As the number of algorithms grows, the selection of the most suitable algorithm for a new task becomes more complex. The development of new Recommender Systems would benefit from tools to support the selection of the most suitable algorithm. Metalearning has been used for similar purposes in other tasks, such as classification and regression. It learns predictive models that map characteristics of a dataset to the predictive performance obtained by a set of algorithms. For this purpose, different types of characteristics have been proposed: statistical and/or information-theoretical, model-based, and landmarkers. Recent studies argue that landmarkers are successful in selecting algorithms for different tasks. We propose a set of landmarkers for a Metalearning approach to the selection of Collaborative Filtering algorithms. Their performance is compared with a state-of-the-art systematic metafeatures approach that uses statistical and/or information-theoretical metafeatures. The results show that the metalevel accuracy obtained using landmarkers is not statistically significantly better than that obtained with the more traditional metafeatures. Furthermore, the baselevel results obtained with the algorithms recommended using landmarkers are worse than the ones obtained with the other metafeatures. In summary, our results show that, contrary to the results obtained in other tasks, these landmarkers are not necessarily the best metafeatures for algorithm selection in Collaborative Filtering.
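The sketch below conveys the landmarker idea in general terms: cheap rating predictors are scored on each dataset and those scores serve as metafeatures for a metamodel that recommends a Collaborative Filtering algorithm. The specific landmarkers, the toy data, and the algorithm labels are invented for the example and are not the ones evaluated in the paper.

```python
# Sketch of the landmarker idea (not the paper's exact metafeatures): score
# cheap rating predictors on each dataset and feed those scores to a
# metamodel that predicts the best-performing CF algorithm.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def landmarkers(ratings):
    """RMSE of two cheap predictors (global mean, item means) on observed entries."""
    mask = ~np.isnan(ratings)
    observed = ratings[mask]
    global_mean = observed.mean()
    item_means = np.nanmean(ratings, axis=0)
    item_means = np.where(np.isnan(item_means), global_mean, item_means)
    item_preds = np.broadcast_to(item_means, ratings.shape)[mask]
    rmse = lambda pred: float(np.sqrt(np.mean((observed - pred) ** 2)))
    return [rmse(global_mean), rmse(item_preds)]

# Toy metadataset: one landmarker vector per synthetic rating matrix, plus a
# made-up label naming the algorithm that performed best on it.
rng = np.random.default_rng(0)
datasets = [np.where(rng.random((20, 15)) < 0.5, rng.random((20, 15)) * 5, np.nan)
            for _ in range(30)]
X = np.array([landmarkers(d) for d in datasets])
y = rng.choice(["itemKNN", "SVD"], size=len(datasets))   # placeholder labels

meta_model = RandomForestClassifier(random_state=0).fit(X, y)
print(meta_model.predict(X[:3]))  # recommend an algorithm for new datasets
```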
2014
Authors
Miranda, PBC; Prudencio, RBC; de Carvalho, APLF; Soares, C;
Publication
NEUROCOMPUTING
Abstract
Support Vector Machines (SVMs) have attracted considerable attention due to their theoretical foundations and good empirical performance when compared to other learning algorithms in different applications. However, SVM performance strongly depends on the adequate calibration of its parameters. In this work, we propose a hybrid multi-objective architecture which combines meta-learning (ML) with multi-objective particle swarm optimization algorithms for the SVM parameter selection problem. Given an input problem, the proposed architecture uses an ML technique to suggest an initial Pareto front of SVM configurations based on previous similar learning problems; the suggested Pareto front is then refined by a multi-objective optimization algorithm. In this combination, the solutions provided by ML are likely to be located in good regions of the search space. Hence, starting from a reduced number of successful candidates, the search process converges faster and is less expensive. In the performed experiments, the proposed solution was compared to traditional multi-objective algorithms with random initialization, obtaining Pareto fronts of higher quality on a set of 100 classification problems.
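To make the initialization idea concrete, the hedged sketch below seeds part of a particle swarm over (log2 C, log2 gamma) with configurations assumed to come from a metalearning suggestion, fills the rest at random, and evaluates two objectives (cross-validation error and number of support vectors). It is a simplified illustration of the warm start, not the paper's hybrid architecture; the suggested configurations and search ranges are invented.

```python
# Simplified illustration of warm-starting a multi-objective swarm with
# meta-learned SVM configurations (not the paper's full hybrid architecture).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical meta-learning suggestions (log2 C, log2 gamma), e.g. a Pareto
# front found on previously seen, similar datasets.
suggested = np.array([[3.0, -5.0], [1.0, -3.0], [5.0, -7.0]])

n_particles = 10
random_part = rng.uniform(low=[-5.0, -15.0], high=[15.0, 3.0],
                          size=(n_particles - len(suggested), 2))
swarm = np.vstack([suggested, random_part])   # initial particle positions

def objectives(log2_c, log2_gamma):
    """Two objectives to minimize: CV error and model complexity (#SVs)."""
    clf = SVC(C=2.0 ** log2_c, gamma=2.0 ** log2_gamma)
    cv_error = 1.0 - cross_val_score(clf, X, y, cv=3).mean()
    n_sv = int(clf.fit(X, y).n_support_.sum())
    return cv_error, n_sv

for position in swarm[:3]:   # evaluate the seeded particles
    print(position, objectives(*position))
```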