Publications

Publications by LIAAD

2022

NER in Archival Finding Aids: Extended

Authors
Cunha, LFD; Ramalho, JC;

Publication
MACHINE LEARNING AND KNOWLEDGE EXTRACTION

Abstract
The amount of information preserved in Portuguese archives has increased over the years. These documents represent a national heritage of high importance, as they portray the country's history. Currently, most Portuguese archives have made their finding aids available to the public in digital format; however, these data are not annotated, so it is not always easy to analyze their content. In this work, Named Entity Recognition solutions were created that identify and classify several named entities in archival finding aids. These named entities carry crucial information about their context and, when recognized with high confidence, can be used for several purposes, for example, the creation of smart browsing tools using entity linking and record linking techniques. To achieve high scores, we annotated several corpora to train our own Machine Learning algorithms in this domain. We also used different architectures, such as CNNs, LSTMs, and Maximum Entropy models. Finally, all the created datasets and ML models were made available to the public through a web platform, NER@DI.

2022

NER in Archival Finding Aids: Next Level

Authors
Cunha, LFD; Ramalho, JC;

Publication
INFORMATION SYSTEMS AND TECHNOLOGIES, WORLDCIST 2022, VOL 2

Abstract
Currently, there is a vast amount of archival finding aids in Portuguese archives; however, these documents lack structure (they are not annotated), making them hard to process and work with. We therefore intend to extract and classify entities of interest, such as geographical locations, people's names, and dates. For this, we use Transformers, an architecture that has been revolutionizing several NLP tasks, and present several models in order to achieve high results. We also aim to measure the degree of improvement this mechanism offers in comparison with previous architectures. Can Transformer-based models replace LSTMs in NER? We intend to answer this question throughout this paper.

2022

Fine-Tuning BERT Models to Extract Named Entities from Archival Finding Aids

Authors
Costa Cunha, LF; Ramalho, JC;

Publication
Proceedings of the 26th International Conference on Theory and Practice of Digital Libraries - Workshops and Doctoral Consortium, Padua, Italy, September 20, 2022.

Abstract
In recent work, several NER models were developed to extract named entities from Portuguese archival finding aids. In this paper, we complement that work by presenting a different NER model with a new architecture, Bidirectional Encoder Representations from Transformers (BERT). To do so, we used a BERT model that was pre-trained on Portuguese vocabulary and fine-tuned it to our concrete classification problem, NER. In the end, we compared the results obtained with those of previous architectures. In addition to this model, we also developed an annotation tool that uses ML models to speed up the corpora annotation process. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

2022

Reasoning with Portuguese Word Embeddings

Authors
Costa Cunha, LF; Almeida, JJ; Simões, A;

Publication
11th Symposium on Languages, Applications and Technologies, SLATE 2022, July 14-15, 2022, Universidade da Beira Interior, Covilhã, Portugal.

Abstract
Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language Processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models’ parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four-word analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and pointed out some limitations of the word embedding models’ evaluation methods. © Luís Filipe Cunha, J. João Almeida, and Alberto Simões.

2022

Classification of Dementia in Adults

Authors
Neto, C; Ferreira, D; Nunes, J; Braga, L; Martins, L; Cunha, L; Machado, J;

Publication
DEVELOPMENTS AND ADVANCES IN DEFENSE AND SECURITY, MICRADS 2021

Abstract
Dementia is a broad term for a large number of conditions, and it is often associated with Alzheimer's disease. A reliable diagnosis of this disease, especially in the early stages, may prevent further complications. As such, machine learning algorithms can be applied to validate and correctly classify cases of dementia or non-dementia in adults, assisting physicians in the diagnosis and management of this clinical condition. In this study, a dataset containing magnetic resonance imaging comparisons of demented/non-demented adults was used to conduct a Data Mining process, following the Cross Industry Standard Process for Data Mining methodology, with the main goal of classifying instances of dementia. Different machine learning algorithms were applied during this process, more specifically Support Vector Machines, Decision Trees, Logistic Regression, Neural Networks, Naive Bayes and Random Forest. A maximum accuracy of 95.41% was achieved with the Naive Bayes algorithm using Split Validation.

2022

Interpretability of Machine Intelligence in Medical Image Computing - 5th International Workshop, iMIMIC 2022, Held in Conjunction with MICCAI 2022, Singapore, Singapore, September 22, 2022, Proceedings

Authors
Reyes, M; Abreu, PH; Cardoso, JS;

Publication
iMIMIC@MICCAI

Abstract
