Publicacoes - INESC TEC

Publicações

Publicações por CRACS

2021

Evaluating cybersecurity attitudes and behaviors in Portuguese healthcare institutions

Autores
Nunes, P; Antunes, M; Silva, C;

Publicação
INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS / INTERNATIONAL CONFERENCE ON PROJECT MANAGEMENT / INTERNATIONAL CONFERENCE ON HEALTH AND SOCIAL CARE INFORMATION SYSTEMS AND TECHNOLOGIES 2020 (CENTERIS/PROJMAN/HCIST 2020)

Abstract
The growing digitization of healthcare institutions and its increasing dependence on Internet infrastructure has boosted the concerns related to data privacy and confidentiality. These institutions have been challenged with specific issues, namely the sensitivity of data, the specificity of networked equipment, the heterogeneity of healthcare professionals (nurses, doctors, administrative staff and other) and the IT skills they have. In this paper we present the results obtained with a study made with healthcare professionals on evaluating their awareness level with the information security, namely by assessing their attitudes and behaviours in cybersecurity. The methodology consisted in translating, adjusting and applying two previously validated and already published Likert-type response scales, in a healthcare institution in Portugal, namely "Centro Hospitalar Barreiro Montijo" (CHBM). The scales used were cybersecurity risky behaviour (RScB) and cybersecurity and cybercrime in business attitudes (ATC-IB). Although there were no significant statistical differences between the sociodemographic factors and the scores obtained on both scales, the results showed a relationship between acquired behaviours and the attitudes of involvement with work and organizational commitment, establishing a bridge for the quantification in awareness.(C) 2021 The Authors. Published by Elsevier B. V.

FecharLer Abstract

2021

Information Security and Cybersecurity Management: A Case Study with SMEs in Portugal

Autores
Antunes, M; Maximiano, M; Gomes, R; Pinto, D;

Publicação
J. Cybersecur. Priv.

Abstract
Information security plays a key role in enterprises management, as it deals with the confidentiality, privacy, integrity, and availability of one of their most valuable resources: data and information. Small and Medium-sized enterprises (SME) are seen as a blind spot in information security and cybersecurity management, which is mainly due to their size, regional and familiar scope, and financial resources. This paper presents an information security and cybersecurity management project, in which a methodology based on the well-known ISO-27001:2013 standard was designed and implemented in fifty SMEs that were located in the center region of Portugal. The project was conducted by a business association located at the center of Portugal and mainly participated by SMEs. The Polytechnic of Leiria and an IT auditing/consulting team were the other two entities that participated on the project. The characterisation of the participating enterprises, the ISO-27001:2013 based methodology developed and implemented in SMEs, as well as the results obtained in this case study, are depicted and analysed in the paper. The attained results show a clear benefit to the audited and intervened SMEs, being mainly attested by the increasing of their information security management robustness and collaborators’ cyberawareness.

FecharLer Abstract

2021

An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing

Autores
Carnaz, G; Antunes, M; Nogueira, VB;

Publicação
DATA

Abstract
Criminal investigations collect and analyze the facts related to a crime, from which the investigators can deduce evidence to be used in court. It is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents is a helping hand to the criminal investigation, as it allows the automatic identification of entities and their relations, being some of which difficult to identify manually. There exists a wide set of dedicated tools, but they have a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained with the classification of the annotated named-entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.

FecharLer Abstract

2021

A Graph Database Representation of Portuguese Criminal-Related Documents

Autores
Carnaz, G; Nogueira, VB; Antunes, M;

Publicação
INFORMATICS-BASEL

Abstract
Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovering of names and entities relationships that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named-entities and relations in Portuguese criminal reports and documents, and represent the data retrieved into a graph database. A 5WH1 (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely a name-entity recognition with an F-Measure of 0.73, and a 5W1H information extraction performance with an F-Measure of 0.65.

FecharLer Abstract

2021

Exposing Manipulated Photos and Videos in Digital Forensics Analysis

Autores
Ferreira, S; Antunes, M; Correia, ME;

Publicação
JOURNAL OF IMAGING

Abstract
Tampered multimedia content is being increasingly used in a broad range of cybercrime activities. The spread of fake news, misinformation, digital kidnapping, and ransomware-related crimes are amongst the most recurrent crimes in which manipulated digital photos and videos are the perpetrating and disseminating medium. Criminal investigation has been challenged in applying machine learning techniques to automatically distinguish between fake and genuine seized photos and videos. Despite the pertinent need for manual validation, easy-to-use platforms for digital forensics are essential to automate and facilitate the detection of tampered content and to help criminal investigators with their work. This paper presents a machine learning Support Vector Machines (SVM) based method to distinguish between genuine and fake multimedia files, namely digital photos and videos, which may indicate the presence of deepfake content. The method was implemented in Python and integrated as new modules in the widely used digital forensics application Autopsy. The implemented approach extracts a set of simple features resulting from the application of a Discrete Fourier Transform (DFT) to digital photos and video frames. The model was evaluated with a large dataset of classified multimedia files containing both legitimate and fake photos and frames extracted from videos. Regarding deepfake detection in videos, the Celeb-DFv1 dataset was used, featuring 590 original videos collected from YouTube, and covering different subjects. The results obtained with the 5-fold cross-validation outperformed those SVM-based methods documented in the literature, by achieving an average F1-score of 99.53%, 79.55%, and 89.10%, respectively for photos, videos, and a mixture of both types of content. A benchmark with state-of-the-art methods was also done, by comparing the proposed SVM method with deep learning approaches, namely Convolutional Neural Networks (CNN). Despite CNN having outperformed the proposed DFT-SVM compound method, the competitiveness of the results attained by DFT-SVM and the substantially reduced processing time make it appropriate to be implemented and embedded into Autopsy modules, by predicting the level of fakeness calculated for each analyzed multimedia file.

FecharLer Abstract

2021

A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing

Autores
Ferreira, S; Antunes, M; Correia, ME;

Publicação
DATA

Abstract
Deepfake and manipulated digital photos and videos are being increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, in which tampered multimedia content has been the primordial disseminating vehicle. Digital forensic analysis tools are being widely used by criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need to employ efficient digital forensics techniques grounded on state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, the implementation of such methods have not yet been massively incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial to benchmark the ML models and to evaluate their appropriateness to be applied in real-world digital forensics applications. An example is the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos, which are part of state-of-the-art existing datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository, and the total amount of photos and video frames is 40,588 and 12,400, respectively. The dataset was validated and benchmarked with deep learning Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) methods; however, a plethora of other existing ones can be applied. Generically, the results show a better F1-score for CNN when comparing with SVM, both for photos and videos processing. CNN achieved an F1-score of 0.9968 and 0.8415 for photos and videos, respectively. Regarding SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955, respectively, for photos and videos processing. A set of methods written in Python is available for the researchers, namely to preprocess and extract the features from the original photos and videos files and to build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, which gives more flexibility for the ML researchers to use the dataset on existing ML frameworks and tools.

FecharLer Abstract