Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por Carla Lopes

2023

Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents

Autores
Dias, M; Lopes, CT;

Publicação
ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE

Abstract
Linked data is used in various fields as a new way of structuring and connecting data. Cultural heritage institutions have been using linked data to improve archival descriptions and facilitate the discovery of information. Most archival records have digital representations of physical artifacts in the form of scanned images that are non-machine-readable. Optical Character Recognition (OCR) recognizes text in images and translates it into machine-encoded text. This article evaluates the impact of image processing methods and parameter tuning in OCR applied to typewritten cultural heritage documents. The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II) to tune the methods' parameters. Evaluation results show that parameterization by digital representation typology benefits the performance of image pre-processing algorithms in OCR. Furthermore, our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results. In particular, Adaptive Thresholding, Bilateral Filter, and Opening are the best-performing algorithms for the theater plays' covers, letters, and overall dataset, respectively, and should be applied before OCR to improve its performance.

2023

Moving from ISAD(G) to a CIDOC CRM-based Linked Data Model in the Portuguese Archives

Autores
Koch, I; Lopes, CT; Ribeiro, C;

Publicação
ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE

Abstract
Archives are facing numerous challenges. On the one hand, archival assets are evolving to encompass digitized documents and increasing quantities of born-digital information in diverse formats. On the other hand, the audience is changing along with how it wishes to access archival material. Moreover, the interoperability requirements of cultural heritage repositories are growing. In this context, the Portuguese Archives started an ambitious program aiming to evolve its data model, migrate existing records, and build a new archival management system appropriate to both archival tasks and public access. The overall goal is to have a fine-grained and flexible description, more machine-actionable than the current one. This work describes ArchOnto, a linked open data model for archives, and rules for its automatic population from existing records. ArchOnto adopts a semantic web approach and encompasses the CIDOC Conceptual Reference Model and additional ontologies, envisioning interoperability with datasets curated by multiple communities of practice. Existing ISAD(G)-conforming descriptions are being migrated to the new model using the direct mappings provided here. We used a sample of 25 records associated with different description levels to validate the completeness and conformity of ArchOnto to existing data. This work is in progress and is original in several respects: (1) it is one of the first approaches to use CIDOC CRM in the context of archives, identifying problems and questions that emerged during the process and pinpointing possible solutions; (2) it addresses the balance in the model between the migration of existing records and the construction of new ones by archive professionals; and (3) it adopts an open world view on linking archival data to global information sources.

2023

Unveiling Archive Users: Understanding Their Characteristics and Motivations

Autores
Ponte, L; Koch, I; Lopes, CT;

Publicação
LEVERAGING GENERATIVE INTELLIGENCE IN DIGITAL LIBRARIES: TOWARDS HUMAN-MACHINE COLLABORATION, ICADL 2023, PT II

Abstract
An institution must understand its users to provide quality services, and archives are no exception. Over the years, archives have adapted to the technological world, and their users have also changed. To understand archive users' characteristics and motivations, we conducted a study in the context of the Portuguese Archives. For this purpose, we analysed a survey and complemented this analysis with information gathered in interviews with archivists. Based on the most frequent reasons for visiting the archives, we defined six main archival profiles (genealogical research, historical research, legal purposes, academic work, institutional purposes and publication purposes), later characterised using the results of the previous analysis. For each profile, we created a persona for a more visual and realistic representation of users.

2023

Linking Theory and Practice of Digital Libraries: 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023, Zadar, Croatia, September 26-29, 2023, Proceedings

Autores
Alonso, O; Cousijn, H; Silvello, G; Marrero, M; Lopes, CT; Marchesin, S;

Publicação
TPDL

Abstract

2024

Automated image label extraction from radiology reports - A review

Autores
Pereira, SC; Mendonca, AM; Campilho, A; Sousa, P; Lopes, CT;

Publicação
ARTIFICIAL INTELLIGENCE IN MEDICINE

Abstract
Machine Learning models need large amounts of annotated data for training. In the field of medical imaging, labeled data is especially difficult to obtain because the annotations have to be performed by qualified physicians. Natural Language Processing (NLP) tools can be applied to radiology reports to extract labels for medical images automatically. Compared to manual labeling, this approach requires smaller annotation efforts and can therefore facilitate the creation of labeled medical image data sets. In this article, we summarize the literature on this topic spanning from 2013 to 2023, starting with a meta-analysis of the included articles, followed by a qualitative and quantitative systematization of the results. Overall, we found four types of studies on the extraction of labels from radiology reports: those describing systems based on symbolic NLP, statistical NLP, neural NLP, and those describing systems combining or comparing two or more of the latter. Despite the large variety of existing approaches, there is still room for further improvement. This work can contribute to the development of new techniques or the improvement of existing ones.

2024

Unveiling Health Literacy through Web Search Behavior: A Classification-Based Analysis of User Interactions

Autores
Lopes, CT; Henriques, M;

Publicação
Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval, CHIIR 2024, Sheffield, United Kingdom, March 10-14, 2024

Abstract
More and more people are relying on the Web to find health information. Challenges faced by individuals with low health literacy in the real world likely persist in the virtual realm. To assist these users, our first step is to identify them. This study aims to uncover disparities in the information-seeking behavior of users with varying levels of health literacy. We utilized data gathered from a prior user experiment. Our approach involves a classification scheme encompassing events during web search sessions, spanning the browser, search engine, and web pages. Employing this scheme, we logged interactions from video recordings in the user study and subjected the event logs to descriptive and inferential analyses. Our data analysis unveils distinctive patterns within the low health literacy group. They exhibit a higher frequency of query reformulations with entirely new terms, engage in more left clicks, utilize the browser's backward functionality more frequently, and invest more time in interactions, including increased scrolling on results pages. Conversely, the high health literacy group demonstrates a greater propensity to click on universal results, extract text from URLs more often, and make more clicks with the mouse middle button. These findings offer valuable insights for inferring users' health literacy in a non-intrusive manner. The automatic inference of health literacy can pave the way for personalized services, enhancing accessibility to information and education for individuals with low health literacy, among other benefits.

  • 13
  • 14