Publications

Publications by CSE

2021

A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing

Authors
Ferreira, S; Antunes, M; Correia, ME;

Publication
DATA

Abstract
Deepfake and manipulated digital photos and videos are being increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, in which tampered multimedia content has been the primary vehicle of dissemination. Digital forensic analysis tools are being widely used in criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need to employ efficient digital forensics techniques grounded on state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, such methods have not yet been massively incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial to benchmark the ML models and to evaluate their appropriateness to be applied in real-world digital forensics applications. An example is the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos, which are part of state-of-the-art existing datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository, and the total number of photos and video frames is 40,588 and 12,400, respectively.
The dataset was validated and benchmarked with deep learning Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) methods; however, a plethora of other existing ones can be applied. In general, the results show a better F1-score for CNN than for SVM, for both photo and video processing. CNN achieved an F1-score of 0.9968 and 0.8415 for photos and videos, respectively. Regarding SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955 for photos and videos, respectively. A set of methods written in Python is available to researchers, namely to preprocess and extract the features from the original photo and video files and to build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, which gives ML researchers more flexibility to use the dataset with existing ML frameworks and tools.
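As a rough illustration of what "a label and a vector of numeric values extracted through a DFT" can look like, the following minimal sketch computes a naive 2D Discrete Fourier Transform of a tiny grayscale image and stores the log-scaled magnitudes as features. It is not the authors' extraction pipeline: the function names `dft2_magnitudes` and `make_entry`, the log scaling, and the dictionary layout are all assumptions made for this example.

```python
import cmath
import math


def dft2_magnitudes(image):
    """Return the magnitudes of the 2D DFT of a grayscale image.

    `image` is a list of rows of pixel intensities. This is the naive
    O(N^4) definition, fine for tiny examples but not for real photos.
    """
    rows, cols = len(image), len(image[0])
    magnitudes = []
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2 * math.pi * (u * x / rows + v * y / cols)
                    total += image[x][y] * cmath.exp(1j * angle)
            magnitudes.append(abs(total))
    return magnitudes


def make_entry(image, label):
    """Build one hypothetical dataset entry: a label plus a feature vector."""
    # Log scaling is an assumption; it compresses the large DC component.
    features = [math.log1p(m) for m in dft2_magnitudes(image)]
    return {"label": label, "features": features}


# A constant image has all of its energy in the zero-frequency (DC) term.
flat_image = [[10, 10], [10, 10]]
entry = make_entry(flat_image, label=0)
print(entry["label"], len(entry["features"]))
```

For real images, a fast transform from a numerical library would replace the nested loops; the sketch only shows how pixel data becomes a fixed-length numeric vector.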

2021

Detailed Black-Box Monitoring of Distributed Systems

Authors
Neves, F; Vilaca, R; Pereira, J;

Publication
APPLIED COMPUTING REVIEW

Abstract
Modern containerized distributed systems, such as big data storage and processing stacks or micro-service based applications, are inherently hard to monitor and optimize, as resource usage does not directly match hardware resources due to multiple virtualization layers. For instance, although inter-application traffic is an important factor, as it directly indicates how components interact, it has not been possible to accurately monitor it in an application-independent way and without severe overhead, thus putting it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for gathering detailed structural information of collaborating processes in a distributed system that can be queried for various purposes, as it includes information about processes, containers, and hosts, as well as resource usage and the amount of data exchanged. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided, event-driven strategy. We validate a prototype implementation by applying it to multi-platform microservice deployments, evaluate its performance with micro-benchmarks, and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).

2021

The Relationship Between Cybersickness, Sense of Presence, and the Users' Expectancy and Perceived Similarity Between Virtual and Real Places

Authors
Magalhaes, M; Melo, M; Bessa, M; Coelho, AF;

Publication
IEEE ACCESS

Abstract
This paper aims to explore the impact of sense of presence and cybersickness on the users' expectancy and perceived similarity between virtual and the corresponding real environments. Two virtual reality setups were tested (non-immersive and immersive) to achieve further conclusions. This research encompassed a quantitative analysis using data collection based on questionnaires, applied to a sample of 45 participants. A virtual experience was conducted (to explore users' cybersickness and sense of presence), followed by a visit to the actual real sites (to determine the degree of perceived similarity between the virtual and the corresponding real environment and whether their expectations were fulfilled). Our results show a positive correlation between the global sense of presence and both perceived similarity and users' expectancy for the non-immersive VR setup. A positive correlation was also found between global cybersickness and both perceived similarity and users' expectancy for the immersive VR setup. Implications of such results for virtual tourism are discussed.

2021

Multivariate Outlier Detection in Postprocessing of Multi-temporal PS-InSAR Results using Deep Learning

Authors
Aguiar, P; Cunha, A; Bakon, M; Ruiz Armenteros, AM; Sousa, JJ;

Publication
INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS / INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT / INTERNATIONAL CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES 2020 (CENTERIS/PROJMAN/HCIST 2020)

Abstract
Multi-temporal InSAR (MT-InSAR) techniques have proved to be very effective for deformation monitoring. However, decorrelation and other noise sources can be limiting factors in MT-InSAR. The obtained observations (PS - Persistent Scatterers) are usually very demanding from a computational perspective, as they can reach hundreds of thousands of observations. To simplify and speed up the classification process, in this study we present an approach based on Convolutional Neural Network (CNN) classification models for the detection of MT-InSAR outlying observations. For each PS, the corresponding MT-InSAR parameters, the parameters of its neighbouring scatterers, and its relative position are considered. Tests on two independent PS datasets, covering the regions of Bratislava city and the suburbs of Prievidza, Slovakia, were performed. The results showed that such models provide a robust method with reduced computation time for the evaluation of MT-InSAR outlying observations. However, the applicability of these models is limited by the deformation pattern on which such models were trained. (C) 2021 The Authors. Published by Elsevier B.V.

2021

Horus: Non-Intrusive Causal Analysis of Distributed Systems Logs

Authors
Neves, F; Machado, N; Vilaca, R; Pereira, J;

Publication
51ST ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2021)

Abstract
Logs are still the primary resource for debugging distributed systems executions. Complexity and heterogeneity of modern distributed systems, however, make log analysis extremely challenging. First, the sheer amount of messages means that the execution paths of distinct system components appear interleaved. Second, due to unsynchronized physical clocks, simply ordering the log messages by timestamp does not suffice to obtain a causal trace of the execution. To address these issues, we present Horus, a system that enables the refinement of distributed system logs in a causally-consistent and scalable fashion. Horus leverages kernel-level probing to capture events for tracking causality between application-level logs from multiple sources. The events are then encoded as a directed acyclic graph and stored in a graph database, thus allowing the use of rich query languages to reason about runtime behavior. Our case study with TrainTicket, a ticket booking application with 40+ microservices, shows that Horus surpasses current widely-adopted log analysis systems in pinpointing the root cause of anomalies in distributed executions. Also, we show that Horus builds a causally-consistent log of a distributed execution with much higher performance (up to 3 orders of magnitude) and scalability than prior state-of-the-art solutions. Finally, we show that Horus' approach to query causality is up to 30 times faster than graph database built-in traversal algorithms.
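The point about unsynchronized clocks can be illustrated with a small sketch. It is not Horus's implementation (which relies on kernel-level probing and a graph database); it only shows, with made-up events and a hypothetical `causal_order` helper, how encoding happens-before relations as a directed acyclic graph and sorting it topologically yields a causally consistent log even when timestamps disagree. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical log events from two processes whose clocks are skewed:
# process B's clock runs behind, so its events carry earlier timestamps.
events = {
    "a1": {"proc": "A", "ts": 105, "msg": "A sends request"},
    "b1": {"proc": "B", "ts": 100, "msg": "B receives request"},
    "b2": {"proc": "B", "ts": 101, "msg": "B sends reply"},
    "a2": {"proc": "A", "ts": 110, "msg": "A receives reply"},
}

# Happens-before edges (before, after): program order within each
# process, plus one edge per message from its send to its receive.
happens_before = [
    ("a1", "a2"),  # program order in A
    ("b1", "b2"),  # program order in B
    ("a1", "b1"),  # request: send -> receive
    ("b2", "a2"),  # reply: send -> receive
]


def causal_order(events, edges):
    """Return event ids in an order consistent with the happens-before DAG."""
    sorter = TopologicalSorter({event_id: set() for event_id in events})
    for before, after in edges:
        sorter.add(after, before)  # `after` depends on `before`
    return list(sorter.static_order())


by_timestamp = sorted(events, key=lambda e: events[e]["ts"])
by_causality = causal_order(events, happens_before)

print("timestamp order:", by_timestamp)  # receive appears before its send
print("causal order:   ", by_causality)
```

In this example, sorting by timestamp places the receipt of the request before its send, whereas the topological order respects every send/receive pair.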

2021

Time series analysis via network science: Concepts and algorithms

Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
There is nowadays a constant flux of data being generated and collected in all types of real world systems. These data sets are often indexed by time, space, or both, requiring appropriate approaches to analyze the data. In univariate settings, time series analysis is a mature field. However, in multivariate contexts, time series analysis still presents many limitations. In order to address these issues, the last decade has brought approaches based on network science. These methods involve transforming an initial time series data set into one or more networks, which can be analyzed in depth to provide insight into the original time series. This review provides a comprehensive overview of existing mapping methods for transforming time series into networks for a wide audience of researchers and practitioners in machine learning, data mining, and time series. Our main contribution is a structured review of existing methodologies, identifying their main characteristics and their differences. We describe the main conceptual approaches, provide authoritative references, and give insight into their advantages and limitations in a unified way and language. We first describe the case of univariate time series, which can be mapped to single layer networks, and we divide the current mappings based on the underlying concept: visibility, transition, and proximity. We then proceed with multivariate time series, discussing both single layer and multiple layer approaches. Although still very recent, this research area has much potential, and with this survey we intend to pave the way for future research on the topic. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Knowledge Representation
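To make the idea of mapping a time series to a network concrete, here is a minimal sketch of one well-known visibility-based mapping, the natural visibility graph: each observation becomes a node, and two nodes are linked when the straight line between them passes strictly above every observation in between. The function name and the example series are assumptions for illustration, and the quadratic-time loop is chosen for clarity rather than efficiency.

```python
def natural_visibility_graph(series):
    """Map a univariate time series to the edge set of its natural
    visibility graph, using the sample index as the time coordinate.

    Nodes i < j are connected if every intermediate point k satisfies
    series[k] < series[j] + (series[i] - series[j]) * (j - k) / (j - i),
    i.e. it lies strictly below the line joining points i and j.
    """
    n = len(series)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k]
                < series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


# The peak at index 1 blocks the view from index 0 to indices 2 and 3.
example = [1, 3, 2, 4]
print(sorted(natural_visibility_graph(example)))
```

Consecutive observations are always connected, so the resulting graph is connected; its degree distribution and other network measures can then be studied in place of the raw series.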
