Special issue of the International Journal of Approximate Reasoning (Elsevier, ISSN: 0888-613X) and a corresponding workshop

Organized by the University of Oviedo^{1}
and the European Centre for Soft Computing

A differentiated treatment of stochastic and epistemic uncertainties is an important issue in many modeling and decision-making problems. Different frameworks for representing imprecise, uncertain or vague information have been proposed (possibility theory, fuzzy sets, imprecise probabilities, etc.). The design and validation of computational intelligence systems that discover knowledge on the basis of incomplete and imprecise information, including learning algorithms and classification and regression models, are deeply influenced by these representations and by the different interpretations of mathematical tools such as fuzzy or random sets.

The special issue aims to encourage discussion of the theoretical and methodological aspects that impact practical applications in modeling and knowledge discovery. Articles reporting both theoretical and empirical research will be considered for inclusion, as well as surveys and position papers suggesting new research directions and critiquing current trends.

Selected authors of position papers will be invited as speakers to a two-day workshop that will take place on Wednesday, 16th May 2012 and Thursday, 17th May 2012 at the European Centre for Soft Computing, Mieres, Asturias (Spain).

Audience members are expected to put questions to the speakers. Each position paper will be accompanied by the most relevant discussions and a final rejoinder from its author.

- Collect the base papers by the speakers
- Collect significant comments and rejoinders
- Collect replies to the questions posed in the round tables
- Based on this material, write a joint position paper clarifying the various approaches to uncertainty in statistics and formalizing canonical problems

Chairman: Didier Dubois

- Over the last 40 years, several formalisms have emerged for handling uncertainty
and/or human-originated information that seem to compete with the
probabilistic tradition.
- Fuzzy set theory (Zadeh)
- Random sets (Kendall, Matheron)
- Belief functions (Shafer, Smets)
- Possibility theory (Shackle, Zadeh)
- Imprecise probabilities (Dempster, Walley)
- etc.

- All these formalisms put forward the use of sets as opposed to, or as complementary to, probability distributions.
- The aim of the round table is to better understand their impact on statistics and information processing.

- Wednesday, 16th
- (09:00-09:30) Presentation, discussion of the work plan
- (09:40-11:00) Ana Colubi: Statistical methods for random fuzzy sets: Theory and applications (50’ + 30’ discussion)
- (11:00-11:15) Coffee break
- (11:15-12:35) Eyke Hüllermeier: Learning from Imprecise Data: On the Notion of Data Disambiguation (50’ + 30’ discussion)
- (12:40-13:30) Presentation of the topics of the roundtable. Proposal of issues to be discussed the following day.
- (13:30-15:00) Lunch
- (15:10-16:30) Didier Dubois: Statistical Reasoning with Set-Valued Information: Ontic vs. Epistemic Views (50’ + 30’ discussion)
- (16:30-16:45) Coffee break
- (16:45-18:15) Thierry Denoeux: Statistical inference from uncertain data in the belief function framework (50’ + 30’ discussion)
- (20:30-22:30) Dinner

- Thursday, 17th
- (09:00-10:20) James Keller: Comparing Partitions from Clustering Algorithms (50’ + 30’ discussion)
- (10:20-10:35) Coffee break
- (10:35-11:55) Christian Borgelt: Approaches to Fault-Tolerant Item Set Mining (50’ + 30’ discussion)
- (11:55-12:10) Coffee break
- (12:10-13:30) Serafín Moral: Imprecise probability models for representing ignorance. Applications to learning credal networks (50’ + 30’ discussion)
- (13:30-15:00) Lunch
- (15:30-17:30) Roundtable
- (17:30-17:45) Closing. Discussion about the Special Issue

- Conjunctive and disjunctive sets; ontic and epistemic fuzzy sets.
- Joining different kinds of fuzzy data in the same model.
- Different frameworks for representing imprecise, uncertain or vague information.
- Different interpretations of fuzzy sets.
- Methodological aspects concerning the role of fuzzy sets in machine learning.
- Statistical inference techniques for random sets and random fuzzy sets.
- Classification and regression models using fuzzy data.
- Classification and regression models from low quality data.
- Performance assessment of classification and regression models with non-conventional output.
- Upper-lower probability models in the treatment of epistemic prior information about parameters in machine learning.

- Ana Colubi
Title: Statistical methods for random fuzzy sets: Theory and applications.

Abstract: A random fuzzy set is a model that formalizes the random generation of fuzzy data. Fuzzy data are often used to represent perceptions, ratings, subjective opinions, etc. In some cases they represent an imprecise perception of a precise quantity, while in others they represent an intrinsically non-precise characteristic. That is the case, for instance, of an expert assessment of the quality of an item. In this context, fuzzy data can be treated as elements of a conventional metric space. In any case, when the final aim is to obtain statistical conclusions that refer not to any (possibly existing) underlying quantity but to the fuzzy data themselves, the available results in probability and statistics for metric spaces may be applied.

Some statistical tools based on a family of intuitive and operative L2-type metrics inspired by the mid-spread decomposition of intervals will be recalled. It will be shown that the rich theory of statistics in Hilbert spaces can occasionally be used. However, the lack of linearity of the space of fuzzy sets endowed with the usual arithmetic entails some difficulties. Specifically, inferences on the fuzzy mean, the Fréchet variance, and regression problems will be discussed. Real-life examples will be used to illustrate the methods.
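As a rough illustration of the mid-spread idea in the simplest (interval) case, the sketch below computes a weighted L2-type distance from midpoints and spreads. The function names and the single weight `theta` are illustrative assumptions, not the exact family of metrics the talk covers (which extends level-wise to fuzzy sets):

```python
import math

def midspread(interval):
    """Decompose an interval [a, b] into its midpoint and spread (half-width)."""
    a, b = interval
    return (a + b) / 2, (b - a) / 2

def l2_distance(x, y, theta=1.0):
    """L2-type distance between intervals built on the mid-spread
    decomposition; theta weighs the spread (imprecision) term against
    the location term."""
    mx, sx = midspread(x)
    my, sy = midspread(y)
    return math.sqrt((mx - my) ** 2 + theta * (sx - sy) ** 2)

# A degenerate interval (a point) vs. an interval centred on it:
# midpoints coincide, so the whole distance comes from the spread term.
print(l2_distance((2.0, 2.0), (1.0, 3.0)))  # 1.0
```

Varying `theta` changes how much a difference in imprecision, as opposed to a difference in location, contributes to the distance.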

- Eyke Hüllermeier
Title: Learning from Imprecise Data: On the Notion of Data Disambiguation

Abstract: An increasing number of publications is currently devoted to the learning of models from imprecise data, such as interval data or, more generally, data modeled in terms of fuzzy subsets of an underlying reference space. Needless to say, this idea also requires the extension of the corresponding learning algorithms. Unfortunately, this is often done without clarifying the actual meaning of an interval or fuzzy observation and the interpretation of membership functions. Distinguishing between an "ontic" and an "epistemic" interpretation of (fuzzy) set-valued data, we argue that different interpretations call for different types of extensions of existing learning algorithms and methods for data analysis. Then, focusing on the epistemic view, we argue that, in model induction from imprecise data, one should try to find a model that "disambiguates" the data instead of reproducing it. More specifically, this leads to a learning procedure that performs model identification and data disambiguation simultaneously. This idea is illustrated by means of two concrete problems, namely regression analysis with fuzzy data and classifier learning from ambiguously labeled instances.

- Didier Dubois
Title: Statistical Reasoning with Set-Valued Information: Ontic vs. Epistemic Views

Abstract: Sets, hence fuzzy sets, may have a conjunctive or a disjunctive reading. In the conjunctive reading, a (fuzzy) set represents an object of interest for which a (gradual rather than Boolean) composite description makes sense. In contrast, disjunctive (fuzzy) sets refer to the use of sets as a representation of incomplete knowledge. They do not model objects or quantities, but partial information about an underlying object or a precise quantity. In this case the fuzzy set captures uncertainty, and its membership function is a possibility distribution. We call such fuzzy sets epistemic, since they represent states of incomplete knowledge. Distinguishing between ontic and epistemic fuzzy sets is important in information-processing tasks because there is a risk of misusing basic notions and tools, such as the distance between fuzzy sets, the variance of a fuzzy random variable, fuzzy regression, etc. We discuss several examples where the ontic and epistemic points of view yield different approaches to these concepts.
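The ontic/epistemic contrast can be made concrete for the crisp interval case with a minimal sketch (the function names and the particular choice of Hausdorff distance for the ontic view are illustrative assumptions, not necessarily the examples used in the talk). Under the ontic view an interval is the object, so a distance between two intervals is a number; under the epistemic view each interval stands for an ill-known point, so the distance is itself ill-known and is only constrained to lie in an interval:

```python
def ontic_distance(a, b):
    """Ontic view: intervals are objects; compare them directly.
    For intervals, the Hausdorff distance is the larger endpoint gap."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def epistemic_distance(a, b):
    """Epistemic view: a and b each contain one ill-known point, so
    return the interval of possible values of |x - y| for x in a, y in b."""
    lo = max(0.0, a[0] - b[1], b[0] - a[1])       # 0 if the intervals overlap
    hi = max(abs(a[1] - b[0]), abs(b[1] - a[0]))  # farthest pair of endpoints
    return (lo, hi)

# Two identical intervals: as objects they coincide (distance 0), yet the
# underlying points may still differ by as much as 1.
print(ontic_distance((0.0, 1.0), (0.0, 1.0)))      # 0.0
print(epistemic_distance((0.0, 1.0), (0.0, 1.0)))  # (0.0, 1.0)
```

The same divergence appears for notions such as the variance of a set-valued random variable, which is a number in the ontic reading and an ill-known quantity in the epistemic one.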

- Thierry Denoeux
Title: Statistical inference from uncertain data in the belief function framework

- James Keller
Title: Comparing Partitions from Clustering Algorithms

Abstract: Many of us participate in clustering research as a means of exploration aimed at understanding the structure and organization of vague and imprecise data. Most papers focus on the creation of new approaches to perform clustering. But just how good are the results of clustering algorithms? There are several well-known measures of cluster validity that are routinely utilized; most focus on balancing the criteria of compactness and separation. We present here a method for comparing crisp and soft partitions (i.e., probabilistic, fuzzy and possibilistic) to a known crisp reference partition. Many of the classical indices that have been used with the outputs of crisp clustering algorithms are generalized so that they are applicable to candidate partitions of any type. In particular, focus is placed on generalizations of the Rand index. Additionally, we extend these partition comparison methods by (1) investigating the behavior of the soft Rand index for comparing non-crisp, specifically possibilistic, partitions and (2) demonstrating how the possibilistic Rand index and the visual assessment of (cluster) tendency (VAT) algorithm can be used to discover the number of actual clusters and coincident clusters in outputs of the possibilistic c-means (PCM) algorithm.
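One common frequency-based way to generalize the Rand index to soft partitions, sketched below, applies the classical pair-counting formula to a soft contingency matrix built from membership matrices; this is an illustrative construction and not necessarily the specific generalization presented in the talk:

```python
def rand_from_memberships(U, V):
    """Rand-type index for two (possibly soft) partitions given as
    membership matrices: U[i][k] = membership of object k in cluster i.
    For crisp 0/1 memberships this reduces to the classical Rand index."""
    n = len(U[0])
    # Soft contingency matrix: N[i][j] = sum_k U[i][k] * V[j][k]
    N = [[sum(u[k] * v[k] for k in range(n)) for v in V] for u in U]
    sum_sq = sum(x * x for row in N for x in row)
    row_sq = sum(sum(row) ** 2 for row in N)
    col_sq = sum(sum(N[i][j] for i in range(len(U))) ** 2 for j in range(len(V)))
    pairs = n * (n - 1) / 2
    # Classical pair-counting formula expressed via the contingency sums.
    return (pairs + sum_sq - (row_sq + col_sq) / 2) / pairs

U = [[1, 1, 0, 0], [0, 0, 1, 1]]   # crisp partition {0,1} | {2,3}
V = [[1, 0, 0, 0], [0, 1, 1, 1]]   # crisp partition {0} | {1,2,3}
print(rand_from_memberships(U, V))  # 0.5 (3 of the 6 object pairs agree)
```

Replacing the 0/1 entries of `U` or `V` with fuzzy membership degrees leaves the formula unchanged, which is what makes a candidate partition "of any type" comparable to a crisp reference.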

- Christian Borgelt
Title: Approaches to Fault-Tolerant Item Set Mining

Abstract: In standard frequent item set mining a transaction supports an item set only if all items in the set are present. However, in many cases the transaction data to analyze is imperfect: items that are actually contained in a transaction are not recorded as such. The reasons can be manifold, ranging from noise through measurement errors to an underlying feature of the observed process. In such a case, full containment is too strict a requirement and can render it impossible to find certain relevant groups of items. By relaxing the support definition to allow some items of a given set to be missing from a transaction, this drawback can be remedied. The resulting item sets have been called approximate, fault-tolerant or fuzzy item sets. In this talk I present two cost-based approaches, and accompanying efficient algorithms, to find such item sets. The first works by inserting missing items into transactions, penalizing the transaction weight in each such case, while the second computes and evaluates subset-size occurrence distributions. I demonstrate the benefits of the algorithms by applying them to an artificial data set (as a proof of concept) and to a concept detection task on the 2008/2009 Wikipedia Selection for Schools.
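The core idea of the first, insertion-based approach can be sketched in a few lines: a transaction missing up to `max_missing` items of the set still supports it, but each inserted item discounts the transaction's weight. The parameter names and the simple multiplicative penalty are illustrative assumptions, not the cost model of the actual algorithms:

```python
def fault_tolerant_support(itemset, transactions, penalty=0.5, max_missing=1):
    """Weighted (fault-tolerant) support of an item set: a transaction may
    lack up to max_missing items of the set, and each missing (inserted)
    item multiplies that transaction's weight by the penalty factor."""
    itemset = frozenset(itemset)
    support = 0.0
    for t in transactions:
        missing = len(itemset - t)
        if missing <= max_missing:
            support += penalty ** missing
    return support

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
# {a, b} is fully contained in 2 transactions and misses one item in 2 more:
print(fault_tolerant_support({"a", "b"}, transactions))  # 2*1.0 + 2*0.5 = 3.0
```

With `max_missing=0` and any penalty this collapses to standard (crisp) support counting, which is the baseline the talk's relaxation starts from.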

- Serafín Moral
Title: Imprecise probability models for representing ignorance. Applications to learning credal networks

Abstract: This paper will investigate suitable imprecise prior probability models for learning with credal networks. The best-known imprecise model for inference about a multinomial distribution is the Imprecise Dirichlet Model (IDM), which assumes as prior information the set of all Dirichlet distributions with a fixed equivalent sample size S. However, the IDM has been shown to be too cautious in some situations and not useful for learning about independence relationships from data. To solve these problems, other alternative models have been proposed, such as the imprecise sample size Dirichlet model (ISSDM). We will consider the application of the ISSDM to learning credal networks, both for determining the graphical structure and for estimating the parameters. Special emphasis will be given to the principles, assumptions, and justification of the model. An important aspect to be studied is the distinction between global procedures (which divide the global sample size among the different conditional distributions) and local procedures (which assume a model for each conditional distribution), showing that this can be a source of imprecision in the equivalent sample size. An algorithm will be given to learn the structure of a credal network, and the suitability of different propagation procedures will be discussed. Finally, we will present some experiments to show the behaviour of the ISSDM in classification problems and when learning from databases.
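For readers unfamiliar with the IDM that the abstract takes as its starting point, the standard posterior probability bounds for a category are easy to state: with `count` observations of the category out of `total` trials and equivalent sample size `s`, the lower and upper probabilities are `count/(total+s)` and `(count+s)/(total+s)`. A minimal sketch (the ISSDM itself, which lets `s` vary, is beyond this snippet):

```python
def idm_interval(count, total, s=2.0):
    """Imprecise Dirichlet Model: lower and upper posterior probability
    of a category observed `count` times in `total` multinomial trials,
    for a fixed equivalent sample size s."""
    return count / (total + s), (count + s) / (total + s)

# 3 successes in 10 trials with s = 2:
lo, hi = idm_interval(3, 10, s=2.0)
print(lo, hi)  # 0.25 0.4166...
```

The width of the interval, `s/(total+s)`, shrinks as data accumulate; the ISSDM replaces the single fixed `s` with a set of sample sizes, which is one source of the imprecision discussed in the abstract.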

- ~~Open call for papers: February 10, 2011~~
- ~~Tentative submission (title and abstract): March 20, 2012~~
- Workshop: May 16 - May 17, 2012
- Paper submission: July 20, 2012
- First revision: September 20, 2012
- Updated versions: October 20, 2012
- Second revision: November 20, 2012
- Final version: December 20, 2012

Inés Couso

Dept. Statistics, Operational Research and M. T.

University of Oviedo

Gijón E-33002, Spain

Tel: +34 985181906

email: couso@uniovi.es

Luciano Sánchez

Dept. Computer Science

University of Oviedo

Gijón E-33002, Spain

Tel: +34 985182130

Fax: +34 985181986

email: luciano@uniovi.es

^{1}Under Research Projects TIN2008-06681-C06-04: “Knowledge Discovery based on
Evolutionary Learning: Current Trends and New Challenges (KEEL-CTNC) / Evolutionary
Learning with Low Quality Data and Genetic Fuzzy Systems. Distributed High-Dimensional
Data sets” and TIN2011-24302: “CI-LQD: Computational Intelligence Techniques for
Modeling and Decision Making with Low Quality Data: Theoretical, Methodological and
Practical Issues”