Publications

Publications by LIAAD

2022

The Usage of Data Augmentation Strategies on the Detection of Murmur Waves in a PCG Signal

Authors
Torres, J; Oliveira, J; Gomes, EF;

Publication
BIOSIGNALS: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 4: BIOSIGNALS

Abstract
Cardiac auscultation is a key screening tool used for cardiovascular evaluation. When used properly, it speeds up treatment and thus improves the patient's quality of life. However, the analysis and interpretation of heart sound signals is subjective and dependent on the physician's experience and domain knowledge. A computer assistant decision (CAD) system that automatically analyses heart sound signals can not only support physicians in their clinical decisions but also release human resources for other tasks. In this paper, to the best of our knowledge for the first time, a SMOTE strategy is used to boost the performance of a Convolutional Neural Network (CNN) on the detection of murmur waves. Using the SMOTE strategy, a CNN achieved an overall of 88.43%.

2022

Approaches to manage and understand student engagement in programming

Authors
Tavares, PC; Gomes, EF; Henriques, PR; Vieira, DM;

Publication
Open Education Studies

Abstract
Computer programming learners usually fail to pass introductory courses because solving problems using computers is a complex task. The most important reason for that failure is motivation, which strongly affects the learning process. In this paper we discuss how techniques like program animation and automatic evaluation can be combined to help the teacher in Computer Programming courses. The PEP system is introduced to explain how it supports teachers in the classroom and how it engages students in study sessions outside the classroom. To support that work, students' motivation was studied; to complement that study, a survey was conducted among students attending the first-year Algorithms and Programming course of an Engineering degree. A tool that analyses surveys using association rules is also presented. © 2022 Paula Correia Tavares et al., published by De Gruyter.

2022

Automatic Classification of Bird Sounds: Using MFCC and Mel Spectrogram Features with Deep Learning

Authors
Carvalho, S; Gomes, EF;

Publication
VIETNAM JOURNAL OF COMPUTER SCIENCE

Abstract
Bird species identification is a relevant and time-consuming task for ornithologists and ecologists. With growing amounts of audio-annotated data, automatic bird classification using machine learning techniques is an important trend in the scientific community. Analyzing bird behavior and population trends helps detect other organisms in the environment and is an important problem in ecology. Bird populations react quickly to environmental changes, which makes their real-time counting and tracking challenging and very useful. A reliable methodology that automatically identifies bird species from audio would therefore be a valuable tool for experts in different scientific and application domains. The goal of this work is to propose a methodology to identify bird sounds. In this paper, we explore deep learning techniques that are being used in this domain, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to classify the data. In deep learning, audio problems are commonly approached by converting them into images using audio feature extraction techniques such as Mel Spectrograms and Mel Frequency Cepstral Coefficients (MFCCs). We propose and test multiple deep learning and feature extraction combinations in order to find the most suitable approach to this problem.

2022

TimeLMs: Diachronic Language Models from Twitter

Authors
Loureiro, D; Barbieri, F; Neves, L; Anke, LE; Camacho-Collados, J;

Publication
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): PROCEEDINGS OF SYSTEM DEMONSTRATIONS

Abstract
Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models' capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift. TimeLMs is available at https://github.com/cardiffnlp/timelms.

2022

Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation

Authors
Abdulmumin, I; Dash, SR; Dawud, MA; Parida, S; Muhammad, SH; Ahmad, IS; Panda, S; Bojar, O; Galadanci, BS; Bello, BS;

Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations. The visual information can serve as a valuable piece of context information to decrease the ambiguity of input sentences. Despite the increasing popularity of such a technique, good and sizeable datasets are scarce, limiting the full extent of its potential. Hausa, a Chadic language, is a member of the Afro-Asiatic language family. It is estimated that about 100 to 150 million people speak the language, with more than 80 million indigenous speakers. This is more than any of the other Chadic languages. Despite a large number of speakers, the Hausa language is considered low-resource in natural language processing (NLP). This is due to the absence of sufficient resources to implement most NLP tasks. While some datasets exist, they are either scarce, machine-generated, or in the religious domain. Therefore, there is a need to create training and evaluation data for implementing machine learning tasks and bridging the research gap in the language. This work presents the Hausa Visual Genome (HaVG), a dataset that contains the description of an image or a section within the image in Hausa and its equivalent in English. To prepare the dataset, we started by translating the English description of the images in the Hindi Visual Genome (HVG) into Hausa automatically. Afterward, the synthetic Hausa data was carefully post-edited considering the respective images. The dataset comprises 32,923 images and their descriptions that are divided into training, development, test, and challenge test sets. The Hausa Visual Genome is the first dataset of its kind and can be used for Hausa-English machine translation, multi-modal research, and image description, among various other natural language processing and generation tasks.

2022

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Authors
Moreno, M; Vilaca, R; Ferreira, PG;

Publication
BMC BIOINFORMATICS

Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
