Publications

Publications by Vitor Rocio

2010

Improving IdSay: A Characterization of Strengths and Weaknesses in Question Answering Systems for Portuguese

Authors
Carvalho, G; de Matos, DM; Rocio, V;

Publication
Computational Processing of the Portuguese Language, Proceedings

Abstract
IdSay is a Question Answering system for Portuguese that participated in QA@CLEF 2008 with a baseline version (IdSayBL). Despite the encouraging results, there was still much room for improvement. The participation of six systems in the Portuguese task, with very good results either individually or in a hypothetical combination run, provided a valuable source of information. We analysed all the answers submitted by all systems to identify their strengths and weaknesses. We used the conclusions of that analysis to guide our improvements, keeping in mind the two key characteristics we want for the system: efficiency in terms of response time and robustness in handling different types of data. As a result, an improved version of IdSay was developed, whose most important enhancement is the introduction of semantic information. We obtained significantly better results: first-answer accuracy rose from 32.5% in IdSayBL to 50.5% in IdSay, without degradation of response time.

2007

Document retrieval for question answering: a quantitative evaluation of text preprocessing

Authors
Carvalho, G; de Matos, DM; Rocio, V;

Publication
Proceedings of the First Ph.D. Workshop in CIKM, PIKM 2007, Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 9, 2007

Abstract
Question Answering (QA) has been an area of interest for researchers, motivated in part by the international QA evaluation forums, namely the Text REtrieval Conference (TREC) and, more recently, the Cross Language Evaluation Forum (CLEF) through QA@CLEF, which has included the Portuguese language since 2004. In these forums, a collection of written documents is provided, as well as a set of questions, which are to be answered by the participating systems. Each system is evaluated as a whole by its capacity to answer the questions, and relatively few published results focus on the performance of its individual components and their influence on overall system performance. That is the case for the Information Retrieval (IR) component, which is widely used in QA systems. Our work concentrates on the different options for preprocessing Portuguese text before feeding it to the IR component, evaluating their impact on IR performance in the specific context of QA, so that we can make a well-founded choice among them. From this work we conclude that the basic preprocessing techniques, case folding and removal of punctuation marks, offer a clear advantage. Among the other techniques considered, stop word removal improved the performance of the IR system, but stemming and lemmatization did not. © 2007 ACM.

2005

Introduction

Authors
Lopes, GP; da Silva, JF; Rocio, V; Quaresma, P;

Publication
Progress in Artificial Intelligence - Lecture Notes in Computer Science

Abstract

2005

Lecture Notes in Artificial Intelligence: Introduction

Authors
Lopes, GP; Ferreira Da Silva, J; Rocio, V; Quaresma, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

2005

TEMA'05: Workshop on Text Mining and Applications

Authors
Lopes, G; da Silva, J; Rocio, V; Quaresma, P;

Publication
2005 Portuguese Conference on Artificial Intelligence, Proceedings

Abstract

2005

Text Mining and Applications (TEMA 2005) - Introduction

Authors
Lopes, GP; da Silva, JF; Rocio, V; Quaresma, P;

Publication
Progress in Artificial Intelligence, Proceedings

Abstract
