Publications

Publications by João Mendes Moreira

2019

Mining Frequent Distributions in Time Series

Authors
Coutinho, JC; Moreira, JM; de Sa, CR;

Publication
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2019), PT II

Abstract
Time series data is composed of observations of one or more variables over a period of time. By analyzing the variability of the variables we can reveal patterns that repeat or that are correlated, which helps to understand the behaviour of the variables over time. Our method finds frequent distributions of a target variable in time series data and discovers relationships between frequent distributions in consecutive time intervals. The frequent distributions are found using a new method, and relationships between them are found using association rule mining.
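As a rough illustration of the second step only, the sketch below mines rules between labels in consecutive time intervals using plain support and confidence counts. It is not the authors' method: the function name `consecutive_rules`, the thresholds, and the idea that each interval has already been assigned a distribution label (here "low" or "high") are all assumptions made for the example.

```python
from collections import Counter


def consecutive_rules(labels, min_support=0.2, min_confidence=0.6):
    """Return rules (a, b, support, confidence): label a is followed by b.

    `labels` holds one hypothetical distribution label per time interval.
    """
    # Pair each interval's label with the label of the next interval.
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return []
    pair_counts = Counter(pairs)
    antecedent_counts = Counter(a for a, _ in pairs)

    rules = []
    for (a, b), n in pair_counts.items():
        support = n / len(pairs)              # share of all transitions
        confidence = n / antecedent_counts[a]  # share of transitions leaving a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Most confident rules first, ties broken by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


labels = ["low", "high", "low", "high", "low", "low", "high"]
rules = consecutive_rules(labels)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

The thresholds control how many rules survive; how the frequent distributions themselves are found is not described in the abstract and is left out here.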


2020

A Study on Hyperparameter Configuration for Human Activity Recognition

Authors
Garcia, KD; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF;

Publication
14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019)

Abstract
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications for that task have been extensively researched in recent literature, especially due to the benefits of improving quality of life. Since wearable technologies and smartphones have become more ubiquitous, a large amount of information about a person's life has become available. However, since each person has a unique way of performing physical activities, a Human Activity Recognition system needs to be adapted to the characteristics of a person in order to maintain or improve accuracy. Additionally, when smartphone devices are used to collect data, it is necessary to manage their limited resources, so the system can work efficiently for long periods of time. In this paper, we present a semi-supervised ensemble algorithm and an extensive study of the influence of hyperparameter configuration on classification accuracy. We also investigate how the classification accuracy is affected by the person and the activities performed. Experimental results show that it is possible to maintain classification accuracy by adjusting hyperparameters, like window size and window overlap, depending on the person and activity performed. These results motivate the development of a system able to automatically adapt hyperparameter settings for the activity performed by each person.
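To make the two hyperparameters named above concrete, here is a minimal sketch of sliding-window segmentation of a sensor stream. The function name `sliding_windows`, the representation of overlap as a fraction in [0, 1), and the choice to drop a trailing partial window are assumptions for illustration; the paper's own segmentation details are not given in the abstract.

```python
def sliding_windows(samples, window_size, overlap):
    """Split `samples` into fixed-size windows that share a fraction of points.

    window_size: number of samples per window (assumed > 0).
    overlap: fraction of each window shared with the next, in [0, 1).
    A trailing partial window is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of two consecutive windows.
    step = max(1, int(round(window_size * (1 - overlap))))
    last_start = len(samples) - window_size
    return [samples[s:s + window_size] for s in range(0, last_start + 1, step)]


signal = list(range(10))  # stand-in for accelerometer readings
half_overlap = sliding_windows(signal, window_size=4, overlap=0.5)
no_overlap = sliding_windows(signal, window_size=4, overlap=0.0)
print(half_overlap)
print(no_overlap)
```

A larger overlap yields more windows from the same signal, which means more classifier calls, so the two settings trade accuracy against the device's limited resources.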


2019

Machine Learning predictive model of grapevine yield based on agroclimatic patterns

Authors
Sirsat, MS; Mendes Moreira, J; Ferreira, C; Cunha, M;

Publication
Engineering in Agriculture, Environment and Food

Abstract
Grapevine yield prediction during the phenological stages, and particularly before harvest, is highly significant, as advanced forecasting could be of great value for superior grapevine management. The main contribution of the current study is to develop a predictive model for each phenological stage that predicts yield during the growing stages of the grapevine, and to identify highly relevant predictive variables. The study uses climatic conditions, grapevine yield, phenological dates, fertilizer information, soil analysis and maturation index data to construct the relational dataset. Afterwards, we use several approaches to pre-process the data and put it into tabular format, for instance, generalization of climatic variables using phenological dates. Random Forest, LASSO and Elasticnet in generalized linear models, and Spikeslab are embedded feature selection methods that are used to overcome the dataset dimensionality issue. We used 10-fold cross validation to evaluate the predictive models, partitioning the dataset into a training set to train the model and a test set to evaluate it by calculating the Root Mean Squared Error (RMSE) and Relative Root Mean Squared Error (RRMSE). Results of the study show that rf_PF, rf_PC and rf_MH are the optimal models for the flowering (PF), colouring (PC) and harvest (MH) phenology, respectively, with lower RMSE (1484.5, 1504.2 and 1459.4 kg/ha) and RRMSE (24.6%, 24.9% and 24.2%) than the other models. These models also identify some derived climatic variables as major variables for grapevine yield prediction. The reliability and early-indication ability of these forecast models justify their use by institutions and economists in decision making, adoption of technical improvements, and fraud detection. © 2019 Asian Agricultural and Biological Engineering Association
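For readers unfamiliar with the two error measures, the sketch below computes them on made-up numbers. RMSE follows its standard definition; the abstract does not define RRMSE, so normalizing RMSE by the mean observed value and expressing it as a percentage is an assumption, as are the function names and the sample data.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Illustrative values only, not data from the study.
observed = [10.0, 10.0, 10.0, 10.0]
predicted = [8.0, 12.0, 8.0, 12.0]
print(rmse(observed, predicted), rrmse(observed, predicted))
```

In a k-fold setting these functions would be applied to each held-out fold and the results averaged; that loop is omitted here.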


2021

Embedding Traffic Network Characteristics Using Tensor for Improved Traffic Prediction

Authors
Bhanu, M; Mendes Moreira, J; Chandra, J;

Publication
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Techniques for using multi-way traffic patterns for traffic prediction are gaining importance. One of the possible techniques for representing multi-way traffic patterns is tensors. Tensor decomposition is used to generate low-rank approximations of the original tensor that are subsequently used for traffic volume prediction. However, the existing tensor-based approaches do not consider certain important mutual relationships among the locations, like temporal traffic reciprocity, that can improve the prediction accuracy. In this paper, we introduce TeDCaN, a "Tensor Decomposition method with Characteristic Network" constraints, which generates low-rank approximations of the original tensor considering the traffic reciprocity at different pairs of locations. Investigations using large traffic datasets from 2 different cities reveal that the prediction accuracy of TeDCaN considerably outperforms several state-of-the-art baselines for cases when complete traffic data is available as well as situations when a certain fraction of the data is missing - a likely scenario in many real datasets. We discover that TeDCaN achieves around 20% reduction in the RMSE scores as compared to the baselines. TeDCaN is applicable in many operations on such a big traffic network where the existing models would either be inapplicable or hard to perform. As one of the major yields, TeDCaN generates a "reduced dimensional network embedding" that captures the similarity of the nodes considering the traffic volume as well as the reciprocity of traffic between the nodes.


2020

Reconciling Predictions in the Regression Setting: An Application to Bus Travel Time Prediction

Authors
Mendes Moreira, J; Baratchi, M;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XVIII, IDA 2020

Abstract
In different application areas, the prediction of values that are hierarchically related is required. As an example, consider predicting the revenue per month and per year of a company, where the prediction for the year should be equal to the sum of the predictions for the months of that year. The idea of reconciling predictions on grouped time-series has been previously proposed to provide optimal forecasts based on such data. This method, in effect, models the time-series collectively rather than providing a separate model for the time-series at each level. While the idea of reconciliation was originally applied to data of a time-series nature, it is not clear if such an approach is also applicable to regression settings where multi-attribute data is available. In this paper, we address this problem by proposing Reconciliation for Regression (R4R), a two-step approach for prediction and reconciliation. In order to evaluate this method, we test its applicability in the context of Travel Time Prediction (TTP) of bus trips, where two levels of values need to be calculated: (i) travel times of the links between consecutive bus-stops; and (ii) total trip travel time. The results show that R4R can improve the overall results in terms of both link TTP performance and reconciliation between the sum of the link TTPs and the total trip travel time. We compare the results acquired when using group-based reconciliation methods and show that the proposed reconciliation approach in a regression setting can provide better results in some cases. This method can be generalized to other domains as well.
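The coherence requirement described above (lower-level predictions must add up to the higher-level one) can be shown with a deliberately naive example. The sketch below rescales link travel-time predictions proportionally so that they sum to the trip-level prediction. This is a simple top-down scaling chosen for illustration, not R4R, whose reconciliation step is not detailed in the abstract; the function name and the numbers are hypothetical.

```python
def reconcile_proportionally(link_predictions, trip_prediction):
    """Scale link predictions so their sum equals the trip prediction.

    Naive top-down reconciliation: the trip-level value is trusted and
    each link keeps its share of the original link total.
    """
    total = sum(link_predictions)
    if total <= 0:
        raise ValueError("link predictions must sum to a positive value")
    scale = trip_prediction / total
    return [p * scale for p in link_predictions]


# Hypothetical travel times in seconds.
links = [60.0, 90.0, 150.0]  # independent per-link predictions (sum: 300)
trip = 330.0                 # independent whole-trip prediction
reconciled = reconcile_proportionally(links, trip)
print(reconciled, sum(reconciled))
```

Proportional scaling trusts the trip-level model completely; approaches that combine information from both levels distribute the adjustment differently.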


2020

UnFOOT: Unsupervised Football Analytics Tool

Authors
Coutinho, JC; Moreira, JM; de Sa, CR;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT III

Abstract
Labelled football (soccer) data is hard to acquire and usually requires humans to annotate the match events. This process makes the data more expensive for smaller clubs to obtain. UnFOOT (Unsupervised Football Analytics Tool) combines data mining techniques and basic statistics to measure the performance of players and teams from positional data. The capabilities of the tool include preprocessing the match data, extracting features, and visualizing player and team performance. It also has built-in data mining techniques, such as association rule mining, subgroup discovery and a proposed approach to look for frequent distributions.
