About

Carla Teixeira Lopes is an Assistant Professor in the Department of Informatics Engineering, University of Porto, Portugal. She has also been a researcher at INESC TEC since 2014. She received a PhD in Informatics Engineering from the University of Porto in 2013. Her research interests lie at the intersection of information retrieval and human-computer interaction. She is interested in studying information search behaviour and in developing tools that help people search more successfully. Lately, she has focused on exploring how context can help improve the experience of health consumers searching the Web.

Details

  • Name

    Carla Lopes
  • Role

    Senior Researcher
  • Since

    1st May 2014
Publications

2024

Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review

Authors
Moas, PM; Lopes, CT;

Publication
ACM COMPUTING SURVEYS

Abstract
Wikipedia is the world's largest online encyclopedia, but maintaining article quality through collaboration is challenging. Wikipedia designed a quality scale, but with such a manual assessment process, many articles remain unassessed. We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality.

2024

Automated image label extraction from radiology reports - A review

Authors
Pereira, SC; Mendonca, AM; Campilho, A; Sousa, P; Lopes, CT;

Publication
ARTIFICIAL INTELLIGENCE IN MEDICINE

Abstract
Machine Learning models need large amounts of annotated data for training. In the field of medical imaging, labeled data is especially difficult to obtain because the annotations have to be performed by qualified physicians. Natural Language Processing (NLP) tools can be applied to radiology reports to extract labels for medical images automatically. Compared to manual labeling, this approach requires smaller annotation efforts and can therefore facilitate the creation of labeled medical image data sets. In this article, we summarize the literature on this topic spanning from 2013 to 2023, starting with a meta-analysis of the included articles, followed by a qualitative and quantitative systematization of the results. Overall, we found four types of studies on the extraction of labels from radiology reports: those describing systems based on symbolic NLP, statistical NLP, neural NLP, and those describing systems combining or comparing two or more of the latter. Despite the large variety of existing approaches, there is still room for further improvement. This work can contribute to the development of new techniques or the improvement of existing ones.

2024

Unveiling Health Literacy through Web Search Behavior: A Classification-Based Analysis of User Interactions

Authors
Lopes, CT; Henriques, M;

Publication
Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval, CHIIR 2024, Sheffield, United Kingdom, March 10-14, 2024

Abstract
More and more people are relying on the Web to find health information. Challenges faced by individuals with low health literacy in the real world likely persist in the virtual realm. To assist these users, our first step is to identify them. This study aims to uncover disparities in the information-seeking behavior of users with varying levels of health literacy. We utilized data gathered from a prior user experiment. Our approach involves a classification scheme encompassing events during web search sessions, spanning the browser, search engine, and web pages. Employing this scheme, we logged interactions from video recordings in the user study and subjected the event logs to descriptive and inferential analyses. Our data analysis unveils distinctive patterns within the low health literacy group. They exhibit a higher frequency of query reformulations with entirely new terms, engage in more left clicks, utilize the browser's backward functionality more frequently, and invest more time in interactions, including increased scrolling on results pages. Conversely, the high health literacy group demonstrates a greater propensity to click on universal results, extract text from URLs more often, and make more clicks with the mouse middle button. These findings offer valuable insights for inferring users' health literacy in a non-intrusive manner. The automatic inference of health literacy can pave the way for personalized services, enhancing accessibility to information and education for individuals with low health literacy, among other benefits.

2023

A Social Media Tool for Domain-Specific Information Retrieval - A Case Study in Human Trafficking

Authors
Grine, T; Lopes, CT;

Publication
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT I

Abstract
In a world increasingly present online, people are leaving a digital footprint, with valuable information scattered on the Web, in an unstructured manner, beholden to the websites that keep it. While there are potential harms in being able to access this information readily, such as enabling corporate surveillance, there are also significant benefits when used, for example, in journalism or investigations into Human Trafficking. This paper presents an approach for retrieving domain-specific information present on the Web using Social Media platforms as a gateway to other content existing on any website. It begins by identifying relevant profiles, then collecting links shared in posts to webpages related to them, and lastly, extracting and indexing the information gathered. The tool developed based on this approach was tested for a case study in the domain of Human Trafficking, more specifically in sexual exploitation, showing promising results and potential to be applied in a real-world scenario.

2023

Research Image Management Practices Reported by Scientific Literature: An Analysis by Research Domain

Authors
Rodrigues J.; Lopes C.T.;

Publication
Open Information Science

Abstract
Research data management is essential for safeguarding and prospecting data generated in a scientific context. Specific issues arise regarding data in image format, as this data typology poses particular challenges and opportunities; however, not much attention has been given to data as images. We reviewed 109 articles from several research domains where images were used either as data or metadata to understand how researchers specifically deal with this data format, and what their habits and behaviors are. We use the Web of Science (WoS), considering its five main areas of research. We included in the initial corpus the most relevant articles by research domain, selecting the ten most cited articles in WoS, by year, between 2010 and 2021. The selected articles should be in English and in open access. The results found that images have been used in scientific works numerous times, but, unfortunately, few are those in which they are the central element of the study. Photography is the type of image most used in most domains. In terms of the instruments used, the Technology and Life Sciences and Biomedicine domains use the microscope more, while the Arts and Humanities and Physical Sciences domains use the camera more. We found that the images are mostly produced in the context of the project, rather than reused by third parties. As for their collection scenario, these are mostly produced/used in a laboratory context. The overwhelming majority of the images present in the articles are digital, and only a small part is analog. We verify that Arts and Humanities are more likely to perform qualitative types of analyses, while Life Sciences and Biomedicine overwhelmingly use quantitative analyses. As for the issues of sharing and depositing, Life Sciences and Biomedicine is the domain that stands out the most in the tasks of depositing and sharing images. It was found that the licenses of a project are intrinsically related to the motivations for sharing results with third parties. Description, a fundamental step in the data management process, is neglected by a large number of researchers. The images are mostly not described or annotated and when this happens, researchers don't provide much detail about this.

Supervised theses

2023

Images as data and metadata: management practices to promote Findability, Accessibility, Interoperability and Reusability of research data

Author
Joana Patrícia de Sousa Rodrigues

Institution
UP-FEUP

2023

Archive users, their characteristics and motivations

Author
Luana Rodrigues Ponte

Institution
UP-FEUP

2023

ArchMine: Learning from non-machine-readable documents for additional insights

Author
Mariana Ferreira Dias

Institution
UP-FEUP

2023

Integration of models for linked data in cultural heritage and contributions to the FAIR principles

Author
Inês Dias Koch

Institution
UP-FEUP

2022

Digital images as data and metadata: description requirements for information retrieval and semantic interoperability

Author
Joana Patrícia de Sousa Rodrigues

Institution
UP-FEUP