2024
Authors
Moas, PM; Lopes, CT;
Publication
ACM COMPUTING SURVEYS
Abstract
Wikipedia is the world's largest online encyclopedia, but maintaining article quality through collaboration is challenging. Wikipedia designed a quality scale, but with such a manual assessment process, many articles remain unassessed. We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality.
2024
Authors
Pereira, SC; Mendonca, AM; Campilho, A; Sousa, P; Lopes, CT;
Publication
ARTIFICIAL INTELLIGENCE IN MEDICINE
Abstract
Machine Learning models need large amounts of annotated data for training. In the field of medical imaging, labeled data is especially difficult to obtain because the annotations have to be performed by qualified physicians. Natural Language Processing (NLP) tools can be applied to radiology reports to extract labels for medical images automatically. Compared to manual labeling, this approach requires smaller annotation efforts and can therefore facilitate the creation of labeled medical image data sets. In this article, we summarize the literature on this topic spanning from 2013 to 2023, starting with a meta-analysis of the included articles, followed by a qualitative and quantitative systematization of the results. Overall, we found four types of studies on the extraction of labels from radiology reports: those describing systems based on symbolic NLP, statistical NLP, neural NLP, and those describing systems combining or comparing two or more of the latter. Despite the large variety of existing approaches, there is still room for further improvement. This work can contribute to the development of new techniques or the improvement of existing ones.
2024
Authors
Lopes, CT; Henriques, M;
Publication
PROCEEDINGS OF THE 2024 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL, CHIIR 2024
Abstract
More and more people are relying on the Web to find health information. Challenges faced by individuals with low health literacy in the real world likely persist in the virtual realm. To assist these users, our first step is to identify them. This study aims to uncover disparities in the information-seeking behavior of users with varying levels of health literacy. We utilized data gathered from a prior user experiment. Our approach involves a classification scheme encompassing events during web search sessions, spanning the browser, search engine, and web pages. Employing this scheme, we logged interactions from video recordings in the user study and subjected the event logs to descriptive and inferential analyses. Our data analysis unveils distinctive patterns within the low health literacy group. They exhibit a higher frequency of query reformulations with entirely new terms, engage in more left clicks, utilize the browser's backward functionality more frequently, and invest more time in interactions, including increased scrolling on results pages. Conversely, the high health literacy group demonstrates a greater propensity to click on universal results, extract text from URLs more often, and make more clicks with the mouse middle button. These findings offer valuable insights for inferring users' health literacy in a non-intrusive manner. The automatic inference of health literacy can pave the way for personalized services, enhancing accessibility to information and education for individuals with low health literacy, among other benefits.
2024
Authors
Koch, I; Ribero, C; Poveda-Villalon, M; Rico, M; Lopes, CT;
Publication
LINKING THEORY AND PRACTICE OF DIGITAL LIBRARIES, PT I, TPDL 2024
Abstract
Various sectors within the heritage domain have developed linked data models to describe their cultural artefacts comprehensively. Within the archival domain, ArchOnto, a data model rooted in CIDOC CRM, uses linked data to open archival information to new uses. This paper investigates the potential of using information in archival records in a larger context. It aims to leverage classes and properties sourced from repositories such as Wikidata and DBpedia, which are deemed informal because of their crowd-sourced nature and possible inconsistencies or imprecision in the data, but which are rich in content. The anticipated outcome is a more comprehensive and expressive archival description, fostering enhanced understanding and assimilation of archival information among domain specialists and lay users. To achieve this, we first analyse existing archive records currently described under the ISAD(G) standard to discern the typologies of entities involved. Subsequently, we map these entities within the ArchOnto ontology and establish correspondences with alternative models. We observed that entities associated with people, places, and events benefited the most from integrating properties sourced from Wikidata and DBpedia. This integration enhanced their comprehensibility and enriched them at a semantic level.
2024
Authors
Rodrigues, J; Lopes, CT;
Publication
METADATA AND SEMANTIC RESEARCH, MTSR 2023
Abstract
Data description is a fundamental step in Research Data Management (RDM). With images, the challenge increases, as they have characteristics that differentiate them from other typologies. We conducted a study in which we obtained a set of 27 images described according to their content by researchers from the projects to which the images belong. After obtaining this ground truth to support the analysis, we carried out two further stages of description: one using an automatic processing tool (Vision AI) and the other by researchers with no prior knowledge of the images. We concluded that the human description is more elucidative of the images' content, particularly at a semantic level, whereas the automatic tool yields a more literal description. This study allowed us to reflect on the description of images in a research context and to discuss the potential of formal analysis and analysis of the semantic expression of images.
2024
Authors
Rodrigues, J; Lopes, CT;
Publication
METADATA AND SEMANTIC RESEARCH, MTSR 2023
Abstract
Research data management includes activities that organize and manage the life of a research project and is crucial for consistent work performance. Some of these activities relate to description, a fundamental step that allows data to be properly documented and interpreted, promoting their subsequent reuse and sharing. Description is usually done through text, but other typologies, such as images, can also be used, taking advantage of their potential and particular characteristics. We used a qualitative method of investigation through an exploratory case study. We conducted 16 semi-structured interviews with researchers who have produced, described, and published research data, in order to understand how images can assume the role of metadata in data description. We found that all interviewees would like to have the possibility of describing data with images, but they consider that the publishing platforms have to be prepared for this. Most researchers were able to identify descriptors that could include images and also to describe what they consider to be the greatest advantages of the project. All researchers consider that images as metadata would be a more direct gateway to the data. The issue of data description through resources other than text has never been properly investigated. The existing literature does not develop the theme, although images have had an abrupt growth in society and science. This work aims to open new paths, raise new ideas, and raise awareness of new and original practices.