Abstracts of the papers published in 1998


Title: DISCRETIZATION METHODS FOR DATA MINING.

Authors: Hung Son Nguyen, Sinh Hoa Nguyen

Published in: L. Polkowski, A. Skowron (eds.): Rough Sets in Knowledge Discovery. Physica-Verlag, Heidelberg 1998, pp. 451-482.

Abstract: Analysis of information systems and, in particular, decision tables is among the most important tasks of Artificial Intelligence. A decision table describes a finite subset of a collection of objects, also referred to as a universe, belonging to different categories called decision classes. There exist many methods of analyzing decision tables, but because of the complexity of the problem, most of them are designed for tables whose attributes can take only a small number of possible values. Attributes with real values must first undergo a process called discretization (or quantization), which divides the attribute's value range into intervals. Such intervals form new values for the attribute and, in consequence, reduce the size of the attribute's value set. The main goal of the paper is to characterize the computational complexity of the discretization problem by defining some optimality criteria based on rough set theory. One known strategy for generating decision algorithms is based on the Minimal Description Length Principle (MDLP); algorithms obtained with this strategy usually have feasible time and space complexity. The second aim of the paper is to construct heuristics for decision algorithm generation from discretized features and to estimate their quality with respect to the MDLP. The last goal of the paper is to compare the presented methods with existing ones on different experimental data.
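As a rough illustration of the discretization step described in the abstract, the sketch below maps each real value of an attribute to the index of the interval it falls into. The function name `discretize`, the sample cuts, and the convention that each interval is closed on the left are assumptions made for this example, not details taken from the paper.

```python
from bisect import bisect_right

def discretize(value, cuts):
    """Return the index of the interval containing `value`.

    `cuts` must be sorted in ascending order. With k cuts there are k + 1
    intervals: (-inf, c1), [c1, c2), ..., [ck, +inf). This convention is
    an assumption of the sketch.
    """
    return bisect_right(cuts, value)

# Hypothetical cuts and attribute values, chosen only for illustration.
cuts = [1.0, 2.5]
values = [0.3, 1.0, 1.7, 2.5, 9.9]

# Five distinct real values collapse to three interval codes.
codes = [discretize(v, cuts) for v in values]
print(codes)  # [0, 1, 1, 2, 2]
```

The two cuts replace five distinct real values with three symbolic codes, which is the reduction of the attribute's value set that the abstract refers to.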
 

Back to List of publications


Title: DISCRETIZATION PROBLEMS FOR ROUGH SET METHODS

Authors: Hung Son Nguyen

Published in: L. Polkowski, A. Skowron (eds.), Proc. of the First International Conference on Rough Sets and Current Trends in Computing (RSCTC'98), June 1998, Warsaw, Poland, pp. 545-552.

Abstract: We study the relationship between the reduct problem in rough set theory and the problem of discretizing real-valued attributes. We consider the problem of searching for a minimal set of cuts on attribute domains that preserves discernibility of objects with respect to any chosen attribute subset of cardinality $s$ (where $s$ is a parameter given by the user). Such a discretization procedure ensures that one can keep all reducts consisting of at least $s$ attributes. We show that this optimization problem is NP-hard, which motivates the search for efficient heuristics for solving it.
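To make the notion of a set of cuts that preserves discernibility more concrete, here is a minimal sketch of a greedy, set-cover style heuristic. It is a simplification and not the paper's method: it considers only discernibility with respect to the full attribute set (not subsets of cardinality $s$), and all names (`candidate_cuts`, `discerns`, `greedy_cuts`) and the sample table are hypothetical.

```python
from itertools import combinations

def candidate_cuts(table):
    """Midpoints between consecutive distinct values of each attribute."""
    n_attrs = len(table[0][0])
    cuts = []
    for a in range(n_attrs):
        vals = sorted({row[a] for row, _ in table})
        cuts += [(a, (lo + hi) / 2) for lo, hi in zip(vals, vals[1:])]
    return cuts

def discerns(cut, x, y):
    """A cut (attribute, threshold) discerns x and y if it separates them."""
    a, c = cut
    return (x[a] < c) != (y[a] < c)

def greedy_cuts(table):
    """Repeatedly pick the cut that discerns the most remaining pairs."""
    rows = [x for x, _ in table]
    # Pairs of objects from different decision classes must stay discernible.
    pairs = {(i, j) for i, j in combinations(range(len(table)), 2)
             if table[i][1] != table[j][1]}
    cands = candidate_cuts(table)
    chosen = []
    while pairs and cands:
        covered = {c: {(i, j) for i, j in pairs if discerns(c, rows[i], rows[j])}
                   for c in cands}
        best = max(cands, key=lambda c: len(covered[c]))
        if not covered[best]:
            break  # remaining pairs have identical attribute values
        chosen.append(best)
        pairs -= covered[best]
    return chosen

# Hypothetical decision table: (attribute vector, decision class).
table = [((1.0, 2.0), 0), ((2.0, 1.0), 1), ((2.0, 3.0), 0), ((3.0, 1.0), 1)]
chosen = greedy_cuts(table)
print(chosen)  # [(1, 1.5)]
```

In this toy table a single cut on the second attribute separates every pair of objects with different decisions. Because the general problem is NP-hard, a greedy choice like this gives no guarantee of a minimal set of cuts.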



Title: PATTERN EXTRACTION FROM DATA

Authors: Sinh Hoa Nguyen, Hung Son Nguyen

Published in: Proceedings of the Conference of Information Processing and Management of Uncertainty in Knowledge-Based Systems IPMU'98, July 1998, Paris, France, pp. 1346-1353.

Abstract: Searching for patterns is one of the main goals in data mining. Patterns have important applications in many KDD domains such as rule extraction or classification. In this paper we present some methods of rule extraction by generalizing the existing approaches for the pattern problem. These methods, called partition of attribute values or grouping of attribute values, can be applied to decision tables with symbolic-valued attributes. If data tables contain both symbolic and numeric attributes, some of the proposed methods can be used jointly with discretization methods. Moreover, these methods are applicable to incomplete data. The optimization problems for grouping of attribute values are either NP-complete or NP-hard. Hence we propose some heuristics returning approximate solutions for such problems.

Keywords: Data Mining, patterns, decision rules, discretization, value grouping.



Title: THE DECOMPOSITION PROBLEM IN MULTI-AGENT SYSTEMS

Authors: Hung Son Nguyen, Sinh Hoa Nguyen

Published in: J. Komorowski, A. Skowron, I. Duntsch (Eds.): Proceedings of the ECAI'98 Workshop on Synthesis of Intelligent Agent Systems from Experimental Data, August 1998, Brighton, UK. The extended version was published in H.D. Burkhard, L. Czaja, P. Starke (Eds.): The Proceedings of the Workshop on Concurrency, Specification and Programming, September 1998, Humboldt-Universität zu Berlin, Germany.

Abstract: We consider the synthesis of complex objects by a multi-agent system based on rough mereology theory [PS96]. Any agent can produce complex objects from parts obtained from its sub-agents using some composition rules. Agents are equipped with decision tables describing partial specifications of their synthesis tasks. We investigate some problems of searching for optimal task specifications for sub-agents, given the task specification of a super-agent. We propose a decomposition scheme (based on rough set and rough mereology theory) consistent with the given composition rules. The computational complexity of the decomposition problems is discussed by showing that these problems are equivalent to some well-known graph theory problems. We also propose some heuristics for the considered problems. As an application of multi-agent systems, we show an effective decomposition and synthesis scheme for a production process of complex objects. We also show an upper bound on the error rate of the synthesis process in our system.



Title: PATTERN EXTRACTION FROM DATA

Authors: Sinh Hoa Nguyen, Hung Son Nguyen

Published in: A. Skowron (Ed.): Fundamenta Informaticae. Vol. 34 No. 1-2, pp. 129-144. 1998.

Abstract: Searching for patterns is one of the main goals in data mining. Patterns have important applications in many KDD domains such as rule extraction or classification. In this paper we present some methods of rule extraction by generalizing the existing approaches for the pattern problem. These methods, called partition of attribute values or grouping of attribute values, can be applied to decision tables with symbolic-valued attributes. If data tables contain both symbolic and numeric attributes, some of the proposed methods can be used jointly with discretization methods. Moreover, these methods are applicable to incomplete data. The optimization problems for grouping of attribute values are either NP-complete or NP-hard. Hence we propose some heuristics returning approximate solutions for such problems.

Keywords: Data Mining, patterns, decision rules, discretization, value grouping.



Title: FROM OPTIMAL HYPERPLANES TO OPTIMAL DECISION TREES

Authors: Hung Son Nguyen

Published in: A. Skowron (Ed.): Fundamenta Informaticae. Vol. 34 No. 1-2, pp. 145-174. 1998.

Abstract: We present an optimal hyperplane searching method for decision tables using Genetic Algorithms. This method can be used to construct a decision tree for a given decision table. We also present some properties of the set of hyperplanes determined by our methods and evaluate an upper bound on the depth of the constructed decision tree.

Keywords: Boolean reasoning, discretization, oblique hyperplanes, decision tree
