Agglomerative Hierarchical Clustering: An Introduction to Essentials. (3) Standardization, Normalization and Dimensionality Reduction of a Data Matrix

Authors

  • Refat Aljumily

Keywords:

corpus, vector, matrix, standardization, coefficient of variation, normalization, dimensionality reduction

Abstract

In a previous tutorial article, I looked at a proximity coefficient and, in the light of that proximity, created a vector distance matrix and used it to construct a hierarchical tree using different hierarchical clustering methods, which will be the basis for exploratory multivariate analysis. The present article deals with three topics: (i) standardization for variation in variable scales, (ii) normalization for variation in sample length, and (iii) dimensionality reduction, or minimization of the data space. The choice of these techniques reflects the author's academic background and particular area of interest, but they are not tied to a particular purpose and are straightforwardly applicable to other kinds of data, and thus to a wide range of analyses in linguistics. My treatment of these techniques is necessarily introductory and brief. I hope that this article will provide practitioners with an introductory overview of these techniques as used for cluster analysis of electronic corpora of linguistic data. The assumption is that the data is in the form of an m x n matrix D, which may need to be transformed in various ways prior to cluster analysis. A standardized data matrix enables practitioners to measure the variation between the n variables and to cluster the cases they describe on common scales and values, regardless of their original scales and values. A normalized data matrix enables practitioners to eliminate the effect of variation in length among the samples and to cluster them as if they were all about the same length, regardless of their original length. A dimensionality-reduced data matrix enables practitioners to select and/or extract the most interesting variables relevant to the research question and to visualize an existing pattern, regardless of the original space. A worked example is given to illustrate the effect each transformation technique has on a given data matrix. These transformation techniques have their own strengths and weaknesses, but these are beyond the scope of
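
To make the three transformations named in the abstract concrete, here is a minimal Python sketch. It is not the author's implementation, and the abstract does not give formulas, so every choice below is an assumption: standardization is taken to be column-wise z-scoring, length normalization is taken to be rescaling each row to the mean row total, and dimensionality reduction is taken to be simple selection of the highest-variance variables. The matrix `D` and all function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def columns(D):
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*D)]


def standardize(D):
    """Column-wise z-scores (assumed method): (x - mean) / standard deviation."""
    cols = columns(D)
    mus = [mean(c) for c in cols]
    # A constant column has zero deviation; use 1.0 to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - mu) / sd for x, mu, sd in zip(row, mus, sds)] for row in D]


def normalize_by_length(D):
    """Rescale each row to the mean row total (assumed method).

    Assumes every row has a nonzero total, i.e. no empty samples.
    """
    lengths = [sum(row) for row in D]
    target = mean(lengths)
    return [[x * target / length for x in row] for row, length in zip(D, lengths)]


def select_top_variance(D, k):
    """Keep the k columns with the highest variance (assumed selection rule)."""
    variances = [pvariance(c) for c in columns(D)]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve the original column order
    return [[row[j] for j in keep] for row in D], keep


# Hypothetical 3 x 3 frequency matrix: rows are samples, columns are variables.
D = [
    [10, 2, 0],
    [20, 4, 1],
    [40, 6, 5],
]

Z = standardize(D)                    # every column now has mean 0, deviation 1
N = normalize_by_length(D)            # every row now has the same total
R, kept = select_top_variance(D, 2)   # two highest-variance columns retained
```

In this toy matrix the rows differ greatly in total frequency, so clustering the raw rows would mostly reflect sample length; after `normalize_by_length` all rows share the same total, which is the effect the abstract attributes to normalization.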

How to Cite

Refat Aljumily. (2016). Agglomerative Hierarchical Clustering: An Introduction to Essentials. (3) Standardization, Normalization and Dimensionality Reduction of a Data Matrix. Global Journal of Human-Social Science, 16(G3), 55–63. Retrieved from https://socialscienceresearch.org/index.php/GJHSS/article/view/1711

Published

2016-03-15