Publications

Publications by LIAAD

2020

Using autoencoders as a weight initialization method on deep neural networks for disease detection

Authors
Ferreira, MF; Camacho, R; Teixeira, LF;

Publication
BMC MEDICAL INFORMATICS AND DECISION MAKING

Abstract
Background: Cancer remains one of the most prevalent and high-mortality diseases, accounting for more than 9 million deaths in 2018. This has motivated researchers to study machine learning-based solutions for cancer detection, to accelerate diagnosis and support prevention. One approach among several is to classify tumor samples automatically by analyzing their gene expression.
Methods: In this work, we aim to distinguish five types of cancer using RNA-Seq datasets: thyroid, skin, stomach, breast, and lung. To do so, we adopt a previously described methodology, with which we compare the performance of three different autoencoders (AEs) used as a weight initialization technique for a deep neural network. Our experiments assess two approaches to training the classification model (fixing the weights after pre-training the AEs, or allowing fine-tuning of the entire network) and two strategies for embedding the AEs into the classification network (importing only the encoding layers, or inserting the complete AE). We then study how the network's overall classification performance is affected by the number of layers in the first strategy, the dimension of the AEs' latent vector, and the imputation technique used in the data preprocessing step. Finally, to assess how well this pipeline generalizes, we apply the same methodology to two additional datasets containing features extracted from images of malaria thin blood smears and of breast mass cell nuclei. We also rule out overfitting by using held-out test sets for the image datasets.
Results: The methodology attained good overall results for both RNA-Seq and image-extracted data. We outperformed the established baseline on all the considered datasets, achieving an average F1 score of 99.03, 89.95, and 98.84 and an MCC of 0.99, 0.84, and 0.98 for the RNA-Seq (when detecting thyroid cancer), Malaria, and Wisconsin Breast Cancer data, respectively.
Conclusions: Fine-tuning the weights of the top layers imported from the AE gave higher results across all the presented experiments and all the considered datasets. We outperformed all previously reported results when compared against the established baselines.
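The two metrics reported above, F1 score and Matthews correlation coefficient (MCC), can both be computed from the counts of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function names and the example counts are illustrative assumptions.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall, in [0, 1]."""
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when the score is undefined.
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up counts, for illustration only (not taken from the paper).
print(f1_score(tp=90, fp=10, fn=10))
print(mcc(tp=90, tn=90, fp=10, fn=10))
```

The abstract appears to report F1 as a percentage, so the value returned here would be multiplied by 100 to be on the same scale, while MCC is reported on its native scale.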

2020

A Study on Hyperparameter Configuration for Human Activity Recognition

Authors
Garcia, KD; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF;

Publication
14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019)

Abstract
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications of this task have been extensively researched in recent literature, especially because of their potential to improve quality of life. As wearable technologies and smartphones have become ubiquitous, a large amount of information about a person's life has become available. However, since each person has a unique way of performing physical activities, a Human Activity Recognition system needs to be adapted to the characteristics of each person in order to maintain or improve accuracy. Additionally, when smartphones are used to collect data, their limited resources must be managed so that the system can work efficiently for long periods of time. In this paper, we present a semi-supervised ensemble algorithm and an extensive study of the influence of hyperparameter configuration on classification accuracy. We also investigate how classification accuracy is affected by the person and the activities performed. Experimental results show that it is possible to maintain classification accuracy by adjusting hyperparameters, such as window size and window overlap, depending on the person and the activity performed. These results motivate the development of a system able to automatically adapt hyperparameter settings to the activity performed by each person.
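To make the two hyperparameters named above concrete, the sketch below segments a sensor stream into fixed-size windows with a fractional overlap. It is a generic illustration rather than the paper's pipeline; the function name `sliding_windows`, the convention that overlap is a fraction in [0, 1), and the rounding of the step are all assumptions.

```python
from typing import List, Sequence


def sliding_windows(
    samples: Sequence[float], window_size: int, overlap: float
) -> List[List[float]]:
    """Split a signal into windows of `window_size` samples.

    `overlap` is the fraction of each window shared with the next one
    (0.0 = no overlap, 0.5 = half of the window is reused).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Number of samples to advance between consecutive windows (at least 1).
    step = max(1, int(round(window_size * (1.0 - overlap))))
    last_start = len(samples) - window_size
    # Incomplete trailing windows are dropped in this sketch.
    return [list(samples[i:i + window_size]) for i in range(0, last_start + 1, step)]


signal = list(range(10))
for window in sliding_windows(signal, window_size=4, overlap=0.5):
    print(window)
```

A larger overlap yields more windows, and therefore more classifications per second of data, which is one reason these settings interact with the resource limits of a smartphone.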

2020

Reconciling Predictions in the Regression Setting: An Application to Bus Travel Time Prediction

Authors
Mendes Moreira, J; Baratchi, M;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XVIII, IDA 2020

Abstract
Many application areas require the prediction of values that are hierarchically related. As an example, consider predicting a company's revenue per month and per year, where the prediction for the year should equal the sum of the predictions for the months of that year. Reconciliation of predictions on grouped time series has previously been proposed to provide optimal forecasts based on such data. This method, in effect, models the time series collectively rather than providing a separate model for the time series at each level. While the idea of reconciliation was originally formulated for time-series data, it is not clear whether such an approach is also applicable to regression settings where multi-attribute data is available. In this paper, we address this problem by proposing Reconciliation for Regression (R4R), a two-step approach for prediction and reconciliation. To evaluate this method, we test its applicability in the context of Travel Time Prediction (TTP) of bus trips, where two levels of values need to be calculated: (i) travel times of the links between consecutive bus stops; and (ii) total trip travel time. The results show that R4R can improve the overall results in terms of both link TTP performance and reconciliation between the sum of the link TTPs and the total trip travel time. We compare these results with those obtained using group-based reconciliation methods and show that the proposed reconciliation approach in a regression setting can provide better results in some cases. This method can be generalized to other domains as well.
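The consistency constraint described above (lower-level predictions must sum to the upper-level prediction) can be illustrated with a deliberately naive baseline: rescale the link predictions so that they add up to the predicted total. This is not the R4R method, whose details the abstract does not give; the function name and the travel times below are illustrative assumptions.

```python
from typing import List


def reconcile_proportionally(link_preds: List[float], total_pred: float) -> List[float]:
    """Rescale link-level predictions so that they sum to `total_pred`.

    Naive proportional scaling, shown only to illustrate the constraint.
    It trusts the total completely, which a real method need not do.
    """
    current_sum = sum(link_preds)
    if current_sum == 0:
        raise ValueError("cannot rescale predictions that sum to zero")
    scale = total_pred / current_sum
    return [p * scale for p in link_preds]


# Hypothetical predicted travel times in seconds for three links and the trip.
links = [60.0, 90.0, 150.0]   # sums to 300
trip_total = 330.0            # separately predicted total
reconciled = reconcile_proportionally(links, trip_total)
print(reconciled, sum(reconciled))
```

After rescaling, the link predictions agree with the trip-level prediction up to floating-point rounding, at the cost of changing every link by the same relative amount.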

2020

UnFOOT: Unsupervised Football Analytics Tool

Authors
Coutinho, JC; Moreira, JM; de Sa, CR;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT III

Abstract
Labelled football (soccer) data is hard to acquire and usually requires humans to annotate the match events, which makes it costly for smaller clubs to obtain. UnFOOT (Unsupervised Football Analytics Tool) combines data mining techniques and basic statistics to measure the performance of players and teams from positional data. The tool's capabilities include preprocessing the match data, extracting features, and visualizing player and team performance. It also has built-in data mining techniques, such as association rule mining, subgroup discovery, and a proposed approach to look for frequent distributions.

2020

Comparing State-of-the-Art Neural Network Ensemble Methods in Soccer Predictions

Authors
Neves, TM; Moreira, JM;

Publication
Foundations of Intelligent Systems - 25th International Symposium, ISMIS 2020, Graz, Austria, September 23-25, 2020, Proceedings

Abstract
For many reasons, including sports being one of the main forms of entertainment in the world, online gambling is growing, and growing markets bring opportunities to explore. In this paper, neural network ensemble approaches, such as bagging, random subspace sampling, negative correlation learning, and the simple averaging of predictions, are compared. For each of these methods, several combinations of input parameters are evaluated. We used only the expected goals metric as a predictor, since it offers good predictive power while keeping the computational demands low. These models are compared in the soccer (also known as association football) betting context, where we have access to metrics, such as profitability, to analyze the results from multiple perspectives. The results show that the optimal solution is goal-dependent, with the ensemble methods able to increase accuracy by up to 3% over the best single model. The biggest improvement over the single model was obtained by averaging dropout networks. © 2020, Springer Nature Switzerland AG.
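Of the ensemble approaches listed above, simple averaging of predictions is the easiest to sketch: each member outputs a probability for every match outcome, and the ensemble prediction is their element-wise mean. The code below is a generic illustration under assumed names and made-up probabilities, not the authors' implementation.

```python
from typing import List, Sequence


def average_predictions(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the probability vectors from each ensemble member."""
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    if any(len(p) != n_classes for p in member_probs):
        raise ValueError("all members must predict the same number of classes")
    return [sum(p[j] for p in member_probs) / n_members for j in range(n_classes)]


# Hypothetical outputs of two networks for (home win, draw, away win).
members = [
    [0.5, 0.3, 0.2],
    [0.7, 0.2, 0.1],
]
ensemble = average_predictions(members)
predicted_class = max(range(len(ensemble)), key=ensemble.__getitem__)
print(ensemble, predicted_class)
```

If every member outputs a valid probability distribution, the average is one too, so it can be compared directly against bookmaker odds.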

2020

kNN Prototyping Schemes for Embedded Human Activity Recognition with Online Learning

Authors
Ferreira, PJS; Cardoso, JMP; Moreira, JM;

Publication
Comput.

Abstract
The kNN machine learning method is widely used as a classifier in Human Activity Recognition (HAR) systems. Although the kNN algorithm works similarly in online and offline modes, using all training instances is much more critical online than offline because of the time and memory restrictions of the online mode. Some methods propose decreasing the high computational costs of kNN by focusing, e.g., on approximate kNN solutions such as the ones relying on Locality-Sensitive Hashing (LSH). However, embedded kNN implementations also need to address the target device’s memory constraints, especially as online classification must cope with those constraints to be practical. This paper discusses online approaches to reduce the number of training instances stored in the kNN search space. To address practical implementations of HAR systems using kNN, this paper presents simple, energy- and computationally efficient, real-time feasible schemes that maintain, at runtime, a maximum number of training instances stored by kNN. The proposed schemes include policies for substituting training instances, keeping the search space within a maximum size. Experiments in the context of HAR datasets show the efficiency of our best schemes. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
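The idea of capping the number of stored training instances and substituting old ones at runtime can be sketched as follows. The substitution policy used here, dropping the oldest instance first, is only a placeholder assumption and is not one of the paper's schemes; the class name `BoundedKNN` and the activity labels are likewise hypothetical.

```python
import math
from collections import Counter, deque
from typing import Deque, Sequence, Tuple


class BoundedKNN:
    """kNN classifier whose search space never exceeds `max_instances`."""

    def __init__(self, k: int, max_instances: int) -> None:
        self.k = k
        # A deque with maxlen silently discards the oldest entry when full
        # (first-in, first-out substitution; a placeholder policy).
        self.store: Deque[Tuple[Tuple[float, ...], str]] = deque(maxlen=max_instances)

    def learn(self, features: Sequence[float], label: str) -> None:
        """Add one labelled instance online, substituting if the store is full."""
        self.store.append((tuple(features), label))

    def predict(self, features: Sequence[float]) -> str:
        """Majority vote among the k nearest stored instances (Euclidean)."""
        if not self.store:
            raise ValueError("no training instances stored")
        query = tuple(features)
        nearest = sorted(self.store, key=lambda item: math.dist(item[0], query))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


knn = BoundedKNN(k=1, max_instances=2)
knn.learn((0.0, 0.0), "walk")
knn.learn((10.0, 10.0), "run")
knn.learn((0.0, 1.0), "sit")   # store is full, so the "walk" instance is dropped
print(len(knn.store), knn.predict((0.0, 0.0)))
```

Because memory use is fixed by `max_instances`, both storage and the per-query distance computations stay bounded, which is the property an embedded deployment needs.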
