2019
Authors
Silva, ME; Pereira, I; McCabe, B;
Publication
JOURNAL OF TIME SERIES ANALYSIS
Abstract
This work investigates outlier detection and modelling in non-Gaussian autoregressive time series models with margins in the class of a convolution-closed parametric family. This framework allows for a wide variety of models for count and positive data types. The article investigates additive outliers, which do not enter the dynamics of the process but whose presence may adversely influence statistical inference based on the data. The Bayesian approach proposed here allows one to estimate, at each time point, the probability of an outlier occurrence and its corresponding size, thus identifying the observations that require further investigation. The methodology is illustrated using simulated and observed data sets.
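To make the notion of an additive outlier concrete, the following minimal sketch simulates a count series and contaminates it. It assumes a Poisson INAR(1) model with binomial thinning as one example of a convolution-closed family; the function names, parameter values and outlier position are hypothetical, and the Bayesian estimation step of the article is not reproduced here.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_inar1(n, alpha, lam, outliers=None, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t and the observed Y_t = X_t + outlier_t.

    `outliers` maps a time index to an outlier size (illustrative assumption).
    The outlier is added to the observation only, so it never feeds back
    into the latent process X_t.
    """
    rng = random.Random(seed)
    outliers = outliers or {}
    x = poisson(rng, lam / (1.0 - alpha))  # start from the stationary Poisson margin
    latent, observed = [], []
    for t in range(n):
        # Binomial thinning: each of the x counts survives with probability alpha.
        survivors = sum(rng.random() < alpha for _ in range(x))
        x = survivors + poisson(rng, lam)
        latent.append(x)
        observed.append(x + outliers.get(t, 0))
    return latent, observed

latent, observed = simulate_inar1(50, alpha=0.5, lam=2.0, outliers={20: 15}, seed=1)
```

Because the outlier is added after the latent value is generated, the clean and contaminated series differ only at the contaminated time point.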
2019
Authors
Faes, L; Pereira, MA; Silva, ME; Pernice, R; Busacca, A; Javorka, M; Rocha, AP;
Publication
PHYSICAL REVIEW E
Abstract
Information storage, reflecting the capability of a dynamical system to keep predictable information during its evolution over time, is a key element of intrinsic distributed computation, useful for the description of the dynamical complexity of several physical and biological processes. Here we introduce a parametric approach which allows one to compute information storage across multiple timescales in stochastic processes displaying both short-term dynamics and long-range correlations (LRC). Our analysis is performed in the popular framework of multiscale entropy, whereby a time series is first "coarse grained" at the chosen timescale through low-pass filtering and downsampling, and then its complexity is evaluated in terms of conditional entropy. Within this framework, our approach makes use of linear fractionally integrated autoregressive (ARFI) models to derive analytical expressions for the information storage computed at multiple timescales. Specifically, we exploit state space models to provide the representation of lowpass filtered and downsampled ARFI processes, from which information storage is computed at any given timescale relating the process variance to the prediction error variance. This enhances the practical usability of multiscale information storage, as it enables a computationally reliable quantification of a complexity measure which incorporates the effects of LRC together with that of short-term dynamics. The proposed measure is first assessed in simulated ARFI processes reproducing different types of autoregressive dynamics and different degrees of LRC, studying both the theoretical values and the finite sample performance. We find that LRC alter substantially the complexity of ARFI processes even at short timescales, and that reliable estimation of complexity can be achieved at longer timescales only when LRC are properly modeled. 
Then, we assess multiscale information storage in physiological time series measured in humans during resting state and postural stress, revealing unprecedented responses to stress of the complexity of heart period and systolic arterial pressure variability, which are related to the different role played by LRC in the two conditions.
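As a rough illustration of the two ingredients named in this abstract, the sketch below coarse-grains a series and computes information storage from the ratio of process variance to prediction error variance. It assumes a Gaussian linear process, for which the storage is half the log of that ratio, and uses a stationary AR(1) process as the worked case. Plain window averaging stands in for the low-pass filter; the filter, the ARFI modelling and the state space representation used by the authors are not reproduced, and all function names are hypothetical.

```python
import math

def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale` (filter, then downsample)."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]

def information_storage(process_var, pred_err_var):
    """Information storage in nats for a Gaussian process.

    Relates the process variance to the variance of the error made when
    predicting the present from the past: S = 0.5 * ln(var_x / var_e).
    """
    if pred_err_var <= 0 or process_var < pred_err_var:
        raise ValueError("need 0 < pred_err_var <= process_var")
    return 0.5 * math.log(process_var / pred_err_var)

def ar1_information_storage(a, noise_var=1.0):
    """Closed form for x_t = a * x_{t-1} + e_t with |a| < 1."""
    if not -1.0 < a < 1.0:
        raise ValueError("stationarity requires |a| < 1")
    process_var = noise_var / (1.0 - a * a)  # stationary variance of AR(1)
    return information_storage(process_var, noise_var)

# A white-noise process (a = 0) stores no information; storage grows with |a|.
storage_values = [ar1_information_storage(a) for a in (0.0, 0.5, 0.9)]
```

For the AR(1) case the expression reduces to -0.5 ln(1 - a^2), so the stored information depends only on the autoregressive coefficient and not on the noise variance.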
2019
Authors
Sohan, MF; Rahman, SSMM; Munna, MTA; Allayear, SM; Rahman, MH; Rahman, MM;
Publication
Communications in Computer and Information Science - Next Generation Computing Technologies on Computational Intelligence
Abstract
2019
Authors
Younus, M; Munna, MTA; Alam, MM; Allayear, SM; Ara, SJF;
Publication
Studies in Big Data - Data Management and Analysis
Abstract
2019
Authors
Munna, MTA; Alam, MM; Allayear, SM; Sarker, K; Ara, SJF;
Publication
Advances in Intelligent Systems and Computing
Abstract
In today’s era, most people suffer from chronic diseases because of their lifestyle, food habits and reduced physical activity. Diabetes is one of the most common chronic diseases and affects people of all ages. Diabetes complications arise in the human body when the blood glucose (sugar) level rises above the normal level. Type-2 diabetes is considered one of the most prevalent endocrine disorders. In this context, we apply Machine Learning algorithms to create a statistical prediction model so that people with diabetes can be aware of the prevalence of these complications. The aim of this paper is to detect the prevalence of diabetes-related complications among patients with Type-2 diabetes mellitus. For processing and statistical analysis we used Scikit-Learn and Pandas for Python. We also used Machine Learning approaches, namely an Artificial Neural Network (ANN) and unsupervised K-means Clustering, to develop a classification-based prediction model for assessing Type-2 diabetes mellitus chronic diseases.
2019
Authors
Marques, F; Duarte, H; Santos, J; Domingues, I; Amorim, JP; Abreu, PH;
Publication
SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING
Abstract
The machine learning field has grown considerably in recent years. There are, however, some problems still to be solved. The characteristics of the training sets, for instance, are known to affect the classifiers' performance. Here, inspired by medical applications, we are interested in studying datasets that are both ordinal and imbalanced. Ordinal datasets present labels where only the relative ordering between different values is significant. Imbalanced datasets have very different numbers of examples per class. Building upon our previous work, we make three new contributions: (1) extend the number of classifiers, (2) evaluate two techniques to balance intermediate train sets in binary decomposition methods (often used in multi-class contexts and ordinal ones in particular), and (3) propose a new, iterative, classifier-based oversampling algorithm that we name InCuBAtE. Experiments were made on 6 private datasets, concerning the assessment of response to treatment in oncologic diseases, and 15 public datasets widely used in the literature. When compared with our previous work, results have improved (or remained the same) for 4 of the 6 private datasets and for 11 of the 15 public datasets.
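The interaction between ordinal binary decomposition and class imbalance can be sketched as follows. This assumes the common "is the label greater than class k" decomposition, which the abstract does not name explicitly, and uses made-up labels purely for illustration. It does not implement InCuBAtE or either balancing technique; it only shows why the intermediate training sets may need balancing.

```python
from collections import Counter

def ordinal_binary_targets(labels, classes):
    """Build the K-1 binary targets 'label > classes[k]' for ordered classes."""
    rank = {c: i for i, c in enumerate(classes)}
    return {
        classes[k]: [int(rank[y] > k) for y in labels]
        for k in range(len(classes) - 1)
    }

def imbalance_ratio(binary_target):
    """Majority count divided by minority count for one binary target."""
    counts = Counter(binary_target)
    minority = min(counts.get(0, 0), counts.get(1, 0))
    majority = max(counts.get(0, 0), counts.get(1, 0))
    return majority / minority if minority else float("inf")

# Hypothetical ordinal labels: 8 "low", 3 "mid", 1 "high".
classes = ["low", "mid", "high"]
labels = ["low"] * 8 + ["mid"] * 3 + ["high"] * 1

targets = ordinal_binary_targets(labels, classes)
ratios = {threshold: imbalance_ratio(t) for threshold, t in targets.items()}
```

In this toy example the sub-problem for the top threshold is far more imbalanced than the original three-class data suggests, which is the situation that balancing the intermediate sets is meant to address.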