A CLUSTERING METHODOLOGY FOR INDUSTRY CATEGORIZATION BASED ON TIME SERIES ANGLE VARIATION
DABIN ZHANG
Department of Information Management, Huazhong Normal University, Wuhan 430079, P.R. China
Institute of Systems Science, Academy of Mathematics and Systems, Chinese Academy of Science, Beijing 100190, P.R.China
E-mail: zdbff@yahoo.com.cn
HAIBIN XIE and SHOUYANG WANG
Institute of Systems Science, Academy of Mathematics and Systems, Chinese Academy of Science, Beijing 100190, P.R.China
Analyzing the relationship among the business cycles of various industries is important not only for government decision-making but also for personal portfolio diversification. It is therefore of great importance to classify industrial categories and to analyze their relationships and linkage modes based on their business cycle indices. Based on time series angle variation, a clustering method is proposed to classify industries. The proposed method overcomes the shortcomings of traditional distance-based similarity under translation and fluctuation, and thus better reflects time series trends. For verification purposes, the business cycle indices of ten industries from the CEInet statistical database are used for empirical analysis. Experimental results reveal that the proposed angle variation based clustering approach not only obtains better categorization results than traditional clustering approaches, but also provides an important reference for investors and regulators in understanding the relations between industries.
1. Introduction
The business cycle of an industry reveals the principles of involvement, providing entrepreneurs and investors with a scientific tool to better understand economic trends and capture economic turning points (Zhang et al. 2009). Since all relevant information transmission mechanisms and sensitivities are reflected in the business cycle, the business cycle is a comprehensive indicator. Thus, mining linkage relations from historical business cycle data can help to classify industries. In this paper, we apply a data mining clustering method to industry business cycle indices. In order to overcome the shortcomings of traditional clustering under translation and fluctuation, a clustering method based on angle variation is proposed. For verification purposes, the business cycle indices of ten industries from the CEInet statistical database are used for empirical analysis.
2. Literature Review
Industry classification in China is conducted according to the characteristics of economic activities; on this basis, the China Statistical Bureau released an industry classification standard, the national economy industry classification, in May 2005. Besides, Shang et al. (2008) adopt the input-output method to divide the productive service industry in China. Niu et al. (2007) employ factor analysis and clustering to investigate the manufacturing industry in China. Guo et al. (2003) study the industry classification standard from both management and investment perspectives, proposing that some mandatory standards should be considered in compiling the unified index. Based on this theory, Yang et al. (2004) classify the listed companies in China from the investment perspective, demonstrating that their classification results are better than those based on the management perspective. There is also a voluminous literature on industry classification based on industry linkage principles (Yuan et al. 2008).
3. Clustering Method Based on Angle Variation
3.1. Classical Time Series Clustering Methods
The key points in time series clustering are the approximate representation of the series and the similarity measure. The classical methods include direct distance measures such as the Euclidean distance, the Manhattan distance, and the Minkowski distance (Goldin et al. 2004). Methods such as the Fourier transform (Wu et al. 2000), ARMA parameters, and the time warping model are also counted among the classical methods (Fraley et al. 1998, Lee et al. 2004, Vlachos et al. 2002). However, all these methods have drawbacks in time series data mining. For example, detailed fluctuations are a great disturbance when studying time series trends, and the Euclidean distance tends to be disturbed by such fluctuations. Although the time warping method can overcome this deficiency of the Euclidean distance, its algorithmic complexity limits its application.
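The sensitivity of direct distance measures to translation can be illustrated with a small sketch. The series values and the helper names `euclidean` and `first_differences` are illustrative assumptions, not taken from the paper: a series and a vertically shifted copy share exactly the same trend, yet their Euclidean distance is large, while the distance between their period-to-period changes is zero.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def first_differences(a):
    """Period-to-period changes, used here as a simple proxy for the trend."""
    return [a[i + 1] - a[i] for i in range(len(a) - 1)]

base = [100.0, 104.0, 103.0, 108.0]   # hypothetical index values
shifted = [x + 30.0 for x in base]    # same shape, translated upward

# Distance on the raw levels is dominated by the constant offset.
print(euclidean(base, shifted))       # 60.0

# Distance on the changes ignores the offset entirely.
print(euclidean(first_differences(base), first_differences(shifted)))  # 0.0
```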
3.2. Angle Series Representation
The original series X can be transformed into an angle form [formula missing], where [symbol missing] denotes the time [symbol missing]. It can be further transformed into [formula missing], where [formula missing]; see Figure 1.
The angle series representation maps [formula missing], where n is the length of the original time series. Therefore, the angle series representation not only maintains the trend of the original series but also reduces the dimension.
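Since the formulas of this subsection are not legible here, the following is only a minimal sketch of one common way to build an angle series. It assumes equally spaced observations, takes the angle of each segment as the arctangent of its slope, and takes the angle variation as the difference between consecutive angles. The function names `angle_series` and `angle_variation` and the parameter `dt` are hypothetical and may differ from the authors' definitions.

```python
import math

def angle_series(x, dt=1.0):
    """Angle (in radians) between each segment of the series and the time axis.

    Assumption: observations are equally spaced by dt, and the angle of the
    segment from x[i] to x[i + 1] is atan((x[i + 1] - x[i]) / dt).
    """
    return [math.atan((x[i + 1] - x[i]) / dt) for i in range(len(x) - 1)]

def angle_variation(theta):
    """Change in angle between consecutive segments (assumed definition)."""
    return [theta[i + 1] - theta[i] for i in range(len(theta) - 1)]

x = [0.0, 1.0, 1.0, 0.0]              # rise, flat, fall
theta = angle_series(x)
print([round(math.degrees(t), 1) for t in theta])        # [45.0, 0.0, -45.0]

# Each transformation shortens the series by one term.
print(len(x), len(theta), len(angle_variation(theta)))   # 4 3 2
```

Under these assumptions the angles depend on the relative scale of the value axis and the time axis, so the choice of `dt` (or a prior rescaling of the data) affects the result.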
3.3. Clustering Procedures Based on Angle Variation
The procedures can be described as follows:
Step 1: transform the original series X into its angle series representation.
Step 2: transform the angle series into standard form. Besides the standard deviation and range methods, there is another standardization method, called the shifting method: the angle series is shifted by adding a constant to each of its terms. Physically, this corresponds to the included angle with the time axis for every term of the series; see Figure 2.
Step 3: establish the similarity matrix.
Step 4: cluster by the transitive closure method, the net-making method, the maximal tree method, or K-means.
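The procedure above can be sketched as follows for Steps 1, 3 and 4, with the standardization of Step 2 omitted. The text does not specify the similarity measure, so the sketch assumes one minus the mean absolute angle gap scaled by pi. For Step 4 it uses the max-min transitive closure of the fuzzy similarity matrix followed by a cut at a chosen level. All function names, the data, and the cut level 0.9 are illustrative assumptions.

```python
import math

def angle_series(x):
    """Segment angles with the time axis, assuming unit time steps."""
    return [math.atan(x[i + 1] - x[i]) for i in range(len(x) - 1)]

def similarity(a, b):
    """Assumed measure: 1 minus the mean absolute angle gap, scaled by pi."""
    gap = sum(abs(p - q) for p, q in zip(a, b)) / len(a)
    return 1.0 - gap / math.pi

def transitive_closure(r):
    """Max-min transitive closure of a reflexive, symmetric fuzzy matrix."""
    n = len(r)
    while True:
        nxt = [[max(min(r[i][k], r[k][j]) for k in range(n))
                for j in range(n)] for i in range(n)]
        if nxt == r:          # composition no longer changes the matrix
            return r
        r = nxt

def clusters_at(closure, level):
    """Group indices whose closure similarity is at least `level`."""
    groups = []
    for i in range(len(closure)):
        for g in groups:
            if closure[i][g[0]] >= level:
                g.append(i)
                break
        else:                 # no existing group matched
            groups.append([i])
    return groups

series = [
    [100.0, 102.0, 101.0, 105.0],   # hypothetical data
    [140.0, 142.0, 141.0, 145.0],   # same trend, higher level
    [100.0, 98.0, 99.0, 95.0],      # opposite trend
]
angles = [angle_series(s) for s in series]                    # Step 1
sim = [[similarity(a, b) for b in angles] for a in angles]    # Step 3
print(clusters_at(transitive_closure(sim), 0.9))              # Step 4: [[0, 1], [2]]
```

Because the closure is an equivalence relation at any cut level, comparing each series with the first member of a group is enough to decide membership. Lowering the level merges groups, which gives the usual hierarchy of fuzzy clustering.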
4. Clustering Study on the Business Cycles of Ten Industries in China
4.1. Data Study
The data are quarterly observations from 2002-03 to 2009-03. An empirical business cycle study is conducted on ten major industries: the mining industry; the manufacturing industry; the electricity, gas and water production and supply industry (EGWPS); the building industry; the traffic, transportation, warehousing and postal industry (TTWP); the wholesale and retail industry (WR); the realty industry; the social service industry (SS); the computer information transmission and software industry (CITS); and the lodging and catering industry (LC). Table 1 presents the data we collected, and Figure 3 plots the time series curves.
Table 1. Business cycle indices of the ten industries
Date Mining Manufacturing EGWPS Building TTWP WR Realty SS CITS LC
2002-03 131.1 118.3 133.1 108.8 117.6 125.3 115.6 143.1 122.2 101.9
2002-06 146.3 130.1 136.8 123 109.6 127.2 118.4 140.3 116.2 108.7
2002-09 148.5 130.3 138.7 122.9 118.8 127.4 117 140.9 121.9 114.7
2002-12 156 134 134.9 123.9 116.2 129.5 117.2 141.7 126.1 114.4
2003-03 151.8 133 137.4 109.3 123.8 131.9 125.4 151.8 135.7 115
2003-06 150.5 123.2 141.2 118.8 62.4 122.6 65.7 137.4 112.7 23.2
2003-09 154.4 134.3 143.4 124.8 127 130.6 116.5 147.2 128.6 122.4
2003-12 157.2 137.6 142.8 128.7 126.3 131.3 112.8 152.9 135.9 121.5
2004-03 150.9 137.9 134 121.2 136.9 135.9 121.7 157.1 139.7 118.5
2004-06 157.3 132.9 136.5 130.6 131.2 133 125 152.1 138.8 129.3
2004-09 162 132.5 140.7 130.3 132.7 132.4 128 151.8 137.5 127.1
2004-12 162.9 134.4 138.1 131.2 127.4 132.9 122.8 156.5 133 126.9
2005-03 159.1 130.8 130.1 121.9 136.5 134.6 122.5 152.3 141.8 118.3
2005-06 162.8 128.4 137.2 128.7 130.7 126.6 126 154.2 135 125.7
2005-09 156.2 128.9 138.1 130.9 130.3 123.7 126.8 154.7 136.7 126.2
2005-12 162.5 128 139.6 132.3 125.9 127.6 121 153.7 138 123.6
2006-03 159.1 127.9 138.5 124.2 128.3 133 122.9 153.8 141.2 118.9
2006-06 162.6 134 138.5 135.2 122.5 135 128.1 153.9 141.6 127.2
2006-09 159.6 133.1 142.9 139.4 131.2 132.6 129.3 156.4 142 129.8
2006-12 161 138.3 145.6 138.1 125.3 133.7 127.6 157 144.6 130.3
2007-03 157.9 139.3 137 127.7 137.7 138.8 128.1 155.1 151.7 125.1
2007-06 162.6 145.8 148.3 142.4 137.7 141.1 136.9 158.5 152.1 131.4
2007-09 160.1 142.2 149.9 143.2 139.2 140.3 139 160 154.7 130.2
2007-12 163.9 141 147.7 146.4 133.4 140.3 135.5 160.3 152.2 129.5
2008-03 153.7 131.4 137.2 136.7 135.1 132.2 132.2 159 153.2 123.3
2008-06 163.9 134.1 134.1 144.2 129 131.8 124.8 162.9 147.5 121.3
2008-09 162.5 123.7 122.1 135.3 119.6 118.9 127 147.6 143.6 119
2008-12 113.8 96 108.8 134.3 95.2 101.7 112.9 143.8 127.2 111.3
2009-03 105.6 98.4 108.8 115.7 104.5 100.9 105.7 147.7 122.6 98.4
4.2. Business Cycle Clustering
The numbers 1, 2, 3, ..., 10 along the abscissa correspond, in order, to the mining industry; the manufacturing industry; the electricity, gas and water production and supply industry; the building industry; the traffic, transportation, warehousing and postal industry; the wholesale and retail industry; the realty industry; the social service industry; the computer information transmission and software industry; and the lodging and catering industry.
From the results in Figure 3 and Figure 4, it can be observed that time series classified into the same group, though not close in level, share similar increasing or decreasing trends. The clustering results based on the traditional method simply classify series that are close in value into the same group, ignoring the trend in the series. For example, the traditional method fails to classify series 2 (manufacturing industry), 6 (wholesale and retail industry) and 8 (social service industry) into the same group, although the three series have similar trends. This confirms that the angle variation approach overcomes this drawback of the traditional method and demonstrates its effectiveness in capturing the trend implied in the series.
It can be observed from Figure 4 that the manufacturing, wholesale and retail industries can be categorized into one group. This does not mean that the three industries are consistent in industry structure, industry properties or product characteristics; it only demonstrates that they have similar trends in their business cycles. Therefore, instead of providing an industry partition standard, business cycle clustering only helps us to better understand the interrelationships among different industries, offering useful guidance for portfolio choice.
5. Conclusion
The business cycle of industry is an important field in macroeconomic study. However, literature on industry interrelationships is still scarce. In this paper, a time series clustering method based on angle variation is proposed and applied to industry clustering in China. The clustering results confirm the effectiveness of the new method, providing useful guidance for both policy-making and portfolio choice. A drawback of this paper is that we do not give a proper explanation for the clustering results, which requires further study. Besides, with the fast development of the socialist market economy and the further adjustment of industrial structures, especially after the subprime crisis, many new industries are expected to emerge, which will lead to changes in industry interrelationships. Therefore, our knowledge of the correlation among industry business cycles should be renewed.
Acknowledgments
This work was partially supported by the China National Natural Science Foundation (Grant No. 70971052) and the China Post Doctor Foundation (Grant No. 20080440539).
References
Fraley, C., Raftery, A.E. (1998) “How many clusters? Which clustering method? Answers via model-based cluster analysis”, The Computer Journal, 41(8):578-588
Goldin, D.Q., Millstein, T.D., Kutlu, A. (2004) “Bounded similarity querying for time-series data”, Information and Computation, 194(2):203-241
Guo, P.F. (2003) “The preparation of the reunification of China must implement the Industry Classification Standard Index”, Development Research, (3):51-52
Lee, S.J., Kwon, D., Lee, S. (2004) “Minimum distance queries for time series data”, Journal of Systems and Software, 69(1/2):105-113
Niu, X.Q., Chen, L. (2007) “Research on Chinese partition of manufacture industry based on factor analysis”, Journal of Wuhan University of Technology (Social Sciences Edition), 20(6):792-795
Sang, Y.L., Shen, Y.M., Qiu, L. (2008) “The definition of China's Producer Services and Industrial Classification”, Journal of Capital Normal University (Natural Science Edition), 29(6):87-93
Vlachos, M., Kollios, G., Gunopulos, D. (2002) “Discovering similar multidimensional trajectories”, The 18th International Conference on Data Engineering, San Jose, USA:673-684
Wu, Y.L., Agrawal, D., Abbadi, E.A. (2000) “A comparison of DFT and DWT based similarity search in time-series database”, The 9th ACM CIKM International Conference on Information and Knowledge Management, McLean, VA, November:488-495
Yang, W.J. (2006) “Analyzing the growing and withering of steel industry from the period relationship among related industries”, Finance and Trade Research, (2):17-23
Yang, Z.J., Guo, P.F., Jiao, T. (2004) “China's Listed Companies Industry Classification Standard of the Theoretical and Empirical Study”, Scientology and Science Technology Manage, (1):124-127
Yuan, D., Zhang, Y.M., Du, N., Dong, Q.X. (2008) “The price linkage between coal and electricity based on the industrial linkage theory”, Reformation and Strategy, 24(4):115-117
Zhang, D.B., Yu, L., Wang, S.Y., Song, Y.W. (2009) “A novel PPGA-based clustering analysis method for business cycle indicator selection”, Frontiers of Computer Science in China, 3(2):217-225
Zhang, P., Li, X.R., Zhang, J.Y., Zhang, Z.L. (2008) “Included angle distance of time series and similarity search”, Pattern Recognition and Artificial Intelligence, 21(6):763-767