1994
Authors
BRITO, P;
Publication
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Abstract
We study assertion objects that constitute a particular class of symbolic objects. Symbolic objects constitute a data analysis driven formalism, which can be compared to propositional calculus, but which is oriented toward the duality intension (characteristic properties) versus extension (set of all individuals verifying a given set of properties). The set of assertion objects is endowed with a partial order and a quasi-order. We focus on the property of completeness, which precisely expresses the duality intension-extension. The order structure of complete assertion objects is studied, using notions of lattice theory and Galois connection, and extending Wille's work to multiple-valued data. Two results are then obtained for particular cases.
2011
Authors
Noirhomme Fraiture, M; Brito, P;
Publication
Statistical Analysis and Data Mining
Abstract
This paper introduces symbolic data analysis, explaining how it extends the classical data models to take into account more complete and complex information. Several examples motivate the approach, before the modeling of variables assuming new types of realizations is formally presented. Some methods for the (multivariate) analysis of symbolic data are presented and discussed. This overview is, however, far from exhaustive, given the present dynamic development of this new field of research. Copyright © 2011 Wiley Periodicals, Inc., A Wiley Company.
2006
Authors
Silva, HB; Brito, P; da Costa, JP;
Publication
PATTERN RECOGNITION
Abstract
Applying graph theory to clustering, we propose a partitional clustering method and a clustering tendency index. The method requires no initial assumptions about the data set. The number of clusters and the partition that best fits the data set are selected according to the optimal clustering tendency index value.
2007
Authors
Brito, P;
Publication
ADVANCES IN DATA ANALYSIS
Abstract
In this paper we discuss some issues which arise when applying classical data analysis techniques to interval data, focusing on the notions of dispersion, association and linear combinations of interval variables. We present some methods that have been proposed for analysing this kind of data, namely for clustering, discriminant analysis, linear regression and interval time series analysis.
1995
Authors
BRITO, P;
Publication
ANNALS OF OPERATIONS RESEARCH
Abstract
We recall a formalism based on the notion of symbolic object (Diday [15], Brito and Diday [8]), which makes it possible to generalize the classical tabular model of Data Analysis. We study assertion objects, a particular class of symbolic objects which is endowed with a partial order and a quasi-order. Operations are then defined on symbolic objects. We study the property of completeness, already considered in Brito and Diday [8], which expresses the duality extension/intension. We formalize this notion in the framework of the theory of Galois connections and study the order structure of complete assertion objects. We introduce the notion of c-connection, a pair of mappings (f, g) between two partially ordered sets which must fulfil given conditions. A complete assertion object is then defined as a fixed point of the composite mapping f ∘ g; this mapping is called a "completeness operator" because it "completes" a given assertion object. The set of complete assertion objects forms a lattice, and we state how suprema and infima are obtained. Because the lattice structure is too complex to allow a clustering study of a data set, we have proposed a pyramidal clustering approach [8]. The symbolic pyramidal clustering method builds a pyramid bottom-up, each cluster being described by a complete assertion object whose extension is the cluster itself. We thus obtain an inheritance structure on the data set, which in turn leads to the generation of rules.
2012
Authors
Brito, P; Duarte Silva, AP;
Publication
JOURNAL OF APPLIED STATISTICS
Abstract
A parametric modelling for interval data is proposed, assuming a multivariate Normal or Skew-Normal distribution for the midpoints and log-ranges of the interval variables. The intrinsic nature of the interval variables leads to special structures of the variance-covariance matrix, which is represented by five different possible configurations. Maximum likelihood estimation for both models under all considered configurations is studied. The proposed modelling is then considered in the context of analysis of variance and multivariate analysis of variance testing. To assess the behaviour of the proposed methodology, a simulation study is performed. The results show that, for medium or large sample sizes, tests have good power and their true significance level approaches nominal levels when the constraints assumed for the model are respected; however, for small samples, sizes close to nominal levels cannot be guaranteed. Applications to Chinese meteorological data in three different regions and to credit card usage variables for different card designations illustrate the proposed methodology.