A PATTERN RECOGNITION TECHNIQUE FOR THE RECOGNITION OF OBJECTS WITH CLOSED 2D CONTOUR
GIUSEPPE MANFREDI
Faculty of Political Science, LUSPIO University, Via C. Colombo 200
Rome, 00139, Italy
E-mail: giuseppe.manfredi@gmail.com
LUCIANO NIEDDU
Faculty of Economics, LUSPIO University, Via C. Colombo 200
Rome, 00139, Italy
E-mail: l.nieddu@gmail.com
In this paper we present a Pattern Recognition technique to recognize two dimensional objects with closed contours via the use of Fast Fourier Transform and a constrained k-means clustering technique. The basic assumption is that the objects can be classified according to the pattern in their contour, i.e. features such as size, position and orientation are only confounding factors which should not be used in the recognition process and which can result in spurious classification if not properly handled.
1. Introduction
Pattern Recognition and automatic classification techniques have become a very important and effective tool in many real problems where objects can be grouped according to some characteristics and an automated technique is necessary to determine which class an object belongs to (Hand, 1981; Vapnik, 1995).
The aim of this paper is to propose a supervised statistical patter recognition technique (Duda and Hart, 1973; Watanabe, 1985) suitable to discriminate between objects with respect to their two dimensional contour.
The proposed technique can be used on two dimensional objects or on the two dimensional projection on a plane of a 3D object. The pattern recognition algorithm presented in this paper is a non parametric statistical pattern recognition algorithm base on a constrained k-means algorithm (Gordon, 1999).
The proposed technique will be tested in a simulation study.
The main assumption of this paper is that only the contour is useful to classify objects onto a set of k mutually exclusive classes, i.e. features like size, orientation and position should be considered as confounding factors in the classification process. Therefore a feature extraction technique able to extract features invariants to those characteristics represents a fundamental step of the classification process. This assumption is valid in a variety of real classification problems such as, e.g. geometrical shapes, handwritten characters, OCR etc.(Khotanzad and Hong, 1990; McLachlan, 1992; Yokono and Poggio, 2004).
The outline of the paper is as follows: in Section 2 the proposed algorithm will be presented, while in Section 3 the feature extraction technique used to get invariant features will be shown. In Section 4 the results of a simulation study will be depicted and in Section 5 the due conclusions will be drawn.
2. The Algorithm
The construction of a supervised pattern recognition system is made up of three main steps:
(i) A pre-processing is applied to the data in order to make them comparable and to filter-out the part of information which is not directly related to the classification problem.
(ii) Feature selection and/or extraction is applied to the data in order to compact the dimension of the pattern vector trying to retain as much as possible of the information correlated to the classification problem. The features extracted should be invariant to incidental changes in confounding factors and sensing configuration (e.g., rotation, displacement etc.).
(iii) The pattern recognition algorithm is trained on a subset of the data set (training set) at hand and its performance is evaluated using a cross-validation scheme (see Duda and Hart, 1973; Hand, 1981; McLachlan, 1992; Yokono and Poggio, 2004).
The algorithm presented in this paper is a supervised learning algorithm and is made up of two phases: in the first phase the algorithm needs to be trained using a dataset that has been previously classified by an expert or in any other way deemed appropriate (training set). In the second phase the algorithm can be used to classify new entities for whom the class is unknown (query points, i.e. verification set).
The aim of this algorithm is that of finding subclasses in the dataset which can be used to classify new vectors of unknown class.
Usually the performance of the algorithm is assessed through a cross-validation or a jackknife procedure (Efron, 1982; Efron, 1983, Ripley, 1996). Thus the dataset is split into two independent subsets, both composed of elements of known classes, one called training set that is used to fine-tune the algorithm, and one called test set, that is used to assess the performance of the algorithm in classification.
Given a dataset of n pattern vectors in , assume a partition defined on the dataset, i.e. each pattern vector is assigned to one and only one of k known classes. Besides, let’s assume a Euclidean norm be defined on the dataset. Let be a function from onto the set which, for each pattern vector , j=1,…, n, gives the class it belongs to:
The algorithm presented begins computing the barycentre of each class as the arithmetic mean of all the pattern vectors belonging to a class, yielding an initial set of k barycentres .
Compute the Euclidean distance of each pattern vector from each barycentre . If each pattern vector is closer to the barycentre of its class the algorithm stops, otherwise there will be a non empty set M of pattern vectors which belong to a class and are closer to a barycentre of a different class. In M select the pattern vector that is farthest from the barycentre of its class. The pattern vector will be used as a seed for a new barycentre for class . A k-means algorithm will then be performed for all the pattern vectors in class using, as starting point, the set of barycentre vectors for class and the vector . The k-means algorithm will be iterated until there will be no change in the centroids computed for class .
Once the k-means algorithm converges the set of centroids is composed of k+1 elements. The centroids at the new iterations need not be computed for all classes, but only for class since the clusters in the other classes have remained unchanged. In the following step the distance of each pattern vector from all the centroids is computed anew, and so is the set M of pattern vectors which are closer to a centroid of other classes than to one of its own. If the set M is not empty then the pattern vector in M which is farthest from a centroid of its own class is once again selected to be a seed for a new centroid. Let this pattern vector be , the number of centroids for class is then increased considering the previous centroids and the new seed. For class a k-means algorithm is then performed until there is no change in the allocation of each element of class to a centroid. Once again then the distance of all the elements in the dataset from the updated set of centroid is then calculated and the subset of element closer to a centroid of a different class then to one of their own class is generated. This procedure iterates until the set M is empty. The convergence of the algorithm in a finite number of steps has been proved in various ways (see Nieddu and Patrizi, 2001; Nieddu and Patrizi, 2000; Patrizi, 1979).
Once converged, this algorithm yields a set of centroids which, in the worst case, are in a number equal to the number of elements in the dataset and which has a lower bound in the number of classes. It is worth noticing that if the partition defined on the dataset is consistent with the features considered, i.e. if the pattern vectors are linearly separable, then the algorithm generates a number of centroids equal to the number of classes. On the other hand, if the dataset is not linearly separable, then the algorithm continues splitting the classes until the subclasses obtained are linearly separable. It is obvious that it can continue splitting until all the subclasses are composed of only one vector (singleton). The flow chart for the algorithm is depicted in Figure 1.
The algorithm will not converge only if the dataset is not coherent, i.e.: there are two vectors in the dataset which belong to different classes and are represented by the same pattern vector (see Nieddu and Patrizi, 2000; Nieddu and Patrizi, 2001). In this case the algorithm will not be able to linearly separate these two vectors. This problem can be easily overcome increasing the dimension of the vector space until the two vectors occupy different point in .
Once the algorithm has converged, the sets of centroids can be used to classify new pattern vector of unknown class (query points) assigning the new element to the class of the centroid it is closer to. It is worth noticing that if pattern vectors from the training set are used as query points, then the algorithm always classify them correctly because, once converged, all pattern vectors in the training set are closer to a centroid of their own class.
Fig. 1. Flow-chart of the classification algorithm
The following proposition then immediately follows:
Proposition: (see Nieddu and Patrizi, 2001; Nieddu and Patrizi, 2000) if the dataset is coherent the apparent recognition (McLachlan, 1992) rate of the algorithm is 100%.
Therefore if the training set is large enough to represent all the possible prototypes of the objects under consideration and if the features considered for each objects are sufficient enough to assure coherence of the dataset then the algorithm should be able to correctly classify any new query points.
3. Feature Extraction
Once the objects that need to be classified have been represented into a pattern vector, a fundamental part of every Pattern Recognition algorithm consists in extracting features from the pattern vector, both to get rid of non important features and to reduce the dimension of the vectors the algorithm has to work with (Bellman, 1961).
The aim of this section is to describe the feature extraction technique used to map the pattern vector into a feature vector which is invariant to changes in position, size and rotation. Let’s consider the closed contour of a 2D object. Set a polar coordinate system with pole at the barycentre of the object. Considering a constant increment equal to for the angle of the polar coordinate system, n points are sampled from the contour going counterclockwise.
Therefore the contour of the object is represented by a set of n radiuses . On this sequence of n real numbers the Discrete Fourier Transform (Cooley et al., 1969) has been applied, obtaining another sequence , of n complex numbers describing the structure of the frequencies that make up the contour. To speed up the process the Fast Fourier Transform (FFT) (Cooley et al., 1977) has been used requiring only operations.
The feature vector obtained according to this procedure is invariant to change in position because the origin of the coordinate system is the barycentre of the object. To assure invariance to rotation, considering that a rotation in the object influences only the phase of each element of the FFT (Brigham, 1988; Cooley et al., 1969; Cooley et al., 1977) only the modulus of each complex element of the FFT will be considered in the recognition process.
Invariance to changes in dimension will be attained considering the equivalent radius for each object (Yokono and Poggio, 2004) which is the radius of a circle with the same area of the considered object. Invariance to changes in size will be attained dividing each modulus by the equivalent radius. Therefore the new patter vector will be:
The information in this pattern vector is then transformed using a Karhunen-Loéve (Watanabe, 1965) expansion in order to reduce the size of the vector retaining only the information contained in the first few components of the expansion. This is particularly useful when the data is affected by noise, which can be filtered out considering only the information in the first few components.
4. Results
In this Section the results of extensive experiments on simulated images will be presented. We have generated 2D images of circles, ellipses and dodecagons and we have tested our proposed technique on these images. To compare the performance of the proposed technique the results of Fisher Discriminant Analysis (Duda and Hart, 1973) on the same data have been used.
The dataset is composed of 2700 images, 900 circles, 900 ellipses and 900 dodecagons. The radiuses, the axes and the apothems have all been randomly generated sampling form a uniform distribution between 0 and 2000 with the only constraint that the two axes of the ellipse could not be equal.
The dataset has been split into two subsets: 2400 images have been used for training and 300 images randomly selected from the 2700 have been used for testing. The test dataset is the same throughout all the experiments to make results immediately comparable.
On each image 16 and 32 points have been sampled from the contour and then the 16 and 32 radiuses have been used as pattern vector to represent the image. The experiments have been conducted on this data and on the data perturbed by an additive random Gaussian noise with zero mean and standard deviation equal to 1, 3, 5, and 30 added directly to the sampled radiuses.
On the pattern vector the FFT has been applied and then data have been made invariant to rotation and size as described in the previous Section and the Karhunen-Loéve (KL) transform has then been applied. For the experiments only the first 3, 5, 7, 9 e 11 elements of the KL transform have been retained.
The results of the proposed technique for samples of 16 radiuses are depicted in Table 1. The proposed technique achieves 100% perfect recognition on the non perturbed data.
Table 1. Performance of the proposed technique on invariant features extracted sampling 16 points from the contour of each image.
Performance of the proposed algorithm on a sample of 16 points for each object (invariant data)
Dimension of the feature vector Original Pictures Original Pictures with Gaussian additive noise
No noise N(0,1) N(0,3) N(0,5) N(0,30)
3 100 97.3 94.0 84.3 63.3
5 100 96.7 93.0 88.3 63.3
7 100 97.3 93.3 90.0 64.0
9 100 97.3 93.7 90.0 61.3
11 100 97.3 93.3 89.7 62.7
As expected the performance of the algorithm decreases as the level of noise increases. With a noise with a standard deviation of 30 the algorithm performs quite well with a correct recognition around 60%. The performance does not seem to be affected by the number of features retained from the KL expansion.
In Table 2 the analogous results for samples of 32 radiuses have been displayed. The performance of the algorithm seems to be quite comparable to the one achieved with 16 radiuses, possibly meaning that we have reached Nyquist rate and then 16 radiuses are sufficient enough to capture the shape of the picture and therefore over sampling does not really effect the performance.
Table 2. Performance of the proposed technique on invariant features extracted sampling 32 points from the contour of each image
Performance of the proposed algorithm on a sample of 32 points for each object (invariant data)
Dimension of the feature vector Original Pictures Original Pictures with Gaussian additive noise
No noise N(0,1) N(0,3) N(0,5) N(0,30)
3 100 96.7 90.0 89.7 63.7
5 100 96.7 90.3 90.7 62.0
7 100 96.7 91.3 91.7 67.7
9 100 97.0 92.3 91.0 66.0
11 100 97.0 93.7 90.0 61.7
In Table 3 the analogous results on 16 and 32 radiuses for Fisher Discriminant Analysis applied to invariant feature vectors have been displayed. More precisely, the 16 and 32 sampled radiuses have been transformed using FFT and then have been made invariant to rotation and size. No KL expansion have been applied on this data, mainly because FDA is already based on a dimensional reduction principle (see e.g. Duda and Hart, 1973).
FDA never achieves a perfect recognition rate of 100% on the original data, and then the recognition rate decreased abruptly with the presence noise. It is nonetheless greater than the random recognition rate of 33% but is always worse than the performance of the proposed algorithm.
Table 3. Performance of FDA on invariant features extracted sampling 16 and 32 points from the contour of each image
Performance Fisher Discriminant Analysis sampling 16 and 32 points for each object
Dimension of the feature vector Original Pictures Original Pictures with Gaussian additive noise
No noise N(0,1) N(0,3) N(0,5) N(0,30)
16 89.7 57.0 55.7 54.7 50.7
32 87.3 53.7 55.7 56.0 52.0
To test the effect of the invariant feature extraction technique the algorithms have been applied on the FFT complex output. The feature vectors obtained are still invariant to positioning but are no longer invariant to rotation and size variations. In Table 4 the performance of the proposed algorithm for samples of 16 radiuses has been depicted while in Table 5 the analogous results for 32 radiuses have been displayed.
Table 4. Performance of the proposed technique on the output of the FFT applied on samples of 16 points from the contour of each image (no invariance)
Performance of the proposed algorithm on a sample of 16 points for each object (complex data)
Dimension of the feature vector Original Pictures Original Pictures with Gaussian additive noise
No noise N(0,1) N(0,3) N(0,5) N(0,30)
3 94.3 94.3 86.0 84.3 56.3
5 93.7 94.7 89.0 83.7 57.0
7 93.7 94.7 87.3 84.7 63.0
9 93.7 93.7 90.0 83.7 59.3
11 93.7 95.3 92.0 86.0 60.0
The algorithm performs quite well even on non invariant features, but now the perfect recognition rate of 100% is no longer achieved. This could possibly be due to the fact that the size of each picture could act as a confounding factor for a Euclidean distance based algorithm as is the one presented in this paper.
In Table 5 the results for 32 radiuses are comparable or worse than those obtained with 16 radiuses, except for a noise N(0,30) which requires a higher sampling frequency to capture the shape of the object.
Table 5. Performance of the proposed technique on the output of the FFT applied on samples of 32 points from the contour of each image (no invariance)
Performance of the proposed algorithm on a sample of 32 points for each object (complex data)
Dimension of the feature vector Original Pictures Original Pictures with Gaussian additive noise
No noise N(0,1) N(0,3) N(0,5) N(0,30)
3 93.0 90.7 90.3 82.0 62.3
5 93.7 93.7 88.3 83.0 62.3
7 91.3 94.0 91.0 84.3 60.3
9 91.3 92.7 90.3 84.0 64.7
11 91.3 93.0 90.3 85.3 62.7
In Table 6 the results for FDA applied on the output of FFT have been displayed. The performance of FDA is better than the one obtained on non invariant features but is still worse than the one obtained by the proposed algorithm on the same data. A notable exception is the performance of FDA under high noise (70.3%) with samples of 16 elements which is better than the best results obtained by the proposed algorithm on the same data (63.0%).
Table 6. Performance of FDA on the output of the FFT applied on features extracted sampling 16 and 32 points from the contour of each image. (no invariance)
Performance Fisher Discriminant Analysis sampling 16 and 32 points for each object
Dimension of the feature vector Original Pictures Original Pictures with Gaussian additive noise
No noise N(0,1) N(0,3) N(0,5) N(0,30)
16 93.7 92.7 90.0 85.0 70.3
32 94.3 93.0 93.0 91.7 59.0
5. Conclusions
The technique presented in this paper is a non parametric pattern recognition technique to recognize object with closed 2D contour. The main assumption that has been made is that the shape and pattern in the contour are sufficient characteristics to classify the objects. All other characteristics such as size, position and orientation of the object should then be regarded as confounding factors in the classification process and should be filtered out from the pattern vector in order to obtain a feature vector which contains information only on the shape of the object. This is the case for the proposed simulations study where objects with different contours have been recognized according to the information on the frequency structure of a number of radiuses systematically (Cochran, 1977) sampled from the contour of each image centering at the barycentre.
The technique with the invariant transformations allow to achieve perfect recognition of the object without noise, and more than holds its own when additive Gaussian noise is present.
References
Bellman, R. (1961) Adaptive Control Processes: A Guided Tour. Princeton University Press.
Brigham, E. O. (1988) The fast Fourier transform and its applications. Englewood Cliffs, N.J.: Prentice Hall.
Cochran, W. G. (1977) Sampling Techniques, Wiley New York.
Cooley, J. W., Lewis, P. A. and Welch, P. D. (1969) “The finite Fourier transform”. IEEE Trans. Audio Electroacoustics 17 (2): 77–85.
Cooley, J. W., Lewis, P. A. and Welch, P. D. (1977) “The Fast Fourier Transform and its Application to Time Series Analysis” in Statistical Methods for Digital Computers, Wiley, New York.
Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.
Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia: SIAM.
Efron, B. (1983), “Estimating the error rate of a prediction rule: Improvement on cross-validation”, Journal of the American Statistical Association, 78, 316-331.
Gordon, D. (1999) Classification. Chapman & Hall Ltd, London; New York.
Hand, D. J. (1981) Discrimination and Classification. John Wiley & Sons.
Yokono, J. J. and Poggio, T. (2004) “Oriented filters for Object Recognition: an empirical study”. Proceedings of the IEEE Conference on Face and Gesture Recognition (FG2004). Seoul, Korea.
Khotanzad, A. and Hong, Y. H. (1990) “Rotation invariant image recognition using features selected via a systematic method” Pattern Recognition, 23(10) pp. 1089-1101.
Kiryati, N. and Maydan, D. (1989) “Calculating geometric properties from Fourier representation”, Pattern Recognition, 22(5), pp 469-475.
Lowe, D. G. (1999) “Object recognition from local scale invariant features” in ICCV, pp. 1150-1157.
McLachlan, G. J. (1992) Discriminant analysis and statistical pattern recognition. John Wiley & Sons.
Nieddu, L. and Patrizi, G. (2000) “Formal properties of pattern recognition algorithms: A review”. European Journal of Operational Research, 120:459–495.
Nieddu, L. and Patrizi, G. (2001) “Optimization and algebraic techniques for image analysis”. In M. Lassonde, editor, Approximation, Optimization and Mathematical Economics, pages 235 –242. Physica-Verlag.
Patrizi, G. (1979) Optimal clustering properties. Ricerca Operativa, 10:37–60.
Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press.
Vapnik, V. N. (1995) The Nature of Statistical Learning Theory. Springer-Verlag, Berlin.
Watanabe, S. (1965) “Karhunen-Loéve expansion and factor analysis. Theoretical remarks and applications” Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp 635-660.
Watanabe, S. (1985) Pattern Recognition: human and mechanical. Wiley, New York.
訂閱:
張貼留言 (Atom)
沒有留言:
張貼留言