ANALYSIS OF DISCIPLINE-CROSSING BASED ON CONTENTS OF BASIC RESEARCH PROJECTS
ZHANG JING; DANG YANZHONG; XUAN ZHAOGUO
Institute of Systems Engineering, Dalian University of Technology
Dalian, 116024, P R China
jingyiqianli@gmail.com; yzhdang@dlut.edu.cn; xgz@dl.cn
Discipline-crossing is very important to the research and development of disciplines. This paper identifies discipline-crossing from the contents of basic research projects using text mining methods. Four variables are defined to measure the degree of discipline-crossing and the characteristics of a discipline when it crosses with others: discipline-crossing rate, traceability, simultaneity and propagation. A case study that analyzes the crossing of three disciplines of management science with twelve other disciplines is given to verify the method.
1. Introduction
The intersection of existing disciplines is a major trend in the development of modern science and technology. Because of its revolutionary influence, it contributes greatly to current scientific progress, to major research breakthroughs and to the generation of new and high technologies. Moreover, discipline-crossing plays an increasingly significant role in the development of science and technology. It is therefore a priority to determine which subjects cross with each other and which characteristics this exploration should rely on.
Discipline-crossing is an interdisciplinary research activity (Lu, 2005) that involves the cooperation of two or more subjects with the same goal. Here a subject is a relatively independent subject system (Chinese National Standard GB/T 13745-92) that includes three basic elements: research area or object of research, theory system and research methods. Based on this definition, the crossing of subjects is the crossing of their elements, that is, the subjects share a similar research area, theory system or research methods.
Recent studies on discipline-crossing mainly concentrate on the mechanism of discipline-crossing (Wang, Shi, 2002; Research Group of Subsidization of Crossed-disciplines, 1999), cross-discipline models (Li, Liu, 2004; Wu et al., 2005), discipline-crossing methods (Jin, 2006) and its quantitative analysis (Zhao, Liu, 2008). Most of these studies try to find the laws of discipline-crossing by analyzing established cross-disciplines. However, how to find discipline-crossing in the first place has drawn little attention. Discipline-crossing does not necessarily become a new cross-discipline unless it occurs in a proper environment. Because the development of cross-disciplines is of great significance to science, appropriate measures should be taken to facilitate it.
Discipline-crossing promotes the intersection of different subjects and merges them together. The different characteristics of each subject and the different degrees of cooperation between subjects lead to different degrees of crossing. The discipline-crossing rate is defined in this paper to measure the degree of crossing between different subjects.
Time delay must be considered in discipline-crossing. A theory may be applied in a different subject only after it has been invented, or a theory that originates in a different subject may appear later than the research field it serves. To capture the time features of discipline-crossing, three variables are defined in this paper, which measure the lagging, simultaneous and leading behavior of crossing subjects respectively.
The rest of this paper is organized as follows. Section 2 describes the method of discipline-crossing analysis based on basic research projects. Section 3 presents experimental results and Section 4 concludes the paper.
2. Method of discipline-crossing analysis based on the contents of basic research projects
The overall process of the method is summarized in Fig. 1; each step is then described in detail below.
Fig.1. Flow chart of discipline-crossing identification
2.1. Project representation based on Vector Space Model (VSM)
Vector Space Model (VSM) (Salton, et al., 1975) is one of the most important methods for text representation. In the model, each document is viewed as a collection of terms and represented by a vector $(t_1, w_1; t_2, w_2; \ldots; t_n, w_n)$, where $t_k$ is a specific term in the document and $w_k$ is the weight associated with $t_k$. Various methods have been proposed in the literature to obtain the weights. The most commonly used one is TF-IDF (Salton, 1988), in which $w_k$ is calculated by multiplying term frequency (TF) by inverse document frequency (IDF).
According to VSM, the projects are represented in two steps, as follows.
(1) Feature selection. The main purpose of feature selection is to find the best features to represent the proposals, which stand for the different projects. Each proposal consists of a title, an abstract, keywords and a body. Only the first three items (title, abstract and keywords) are analyzed in this study. The features used in this paper are taken from the Chinese Classified Subject Thesaurus, and stop words are removed from the feature collection.
(2) Document representation. Once the features of a proposal are chosen, the VSM of the corresponding project $P$ can be expressed as follows.
$$P = (T_1, W_1;\ T_2, W_2;\ \ldots;\ T_n, W_n) \tag{2.1}$$
where $W_i$ is the weight of term (feature) $T_i$, calculated by means of TF-IDF in its standard form:
$$W_i = V_{TF\text{-}i} \times \log\frac{N}{N_i} \tag{2.2}$$
where $V_{TF\text{-}i}$ is the frequency with which term $T_i$ appears in proposal $P$, $N$ is the total number of proposals and $N_i$ is the number of proposals that contain term $T_i$.
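The two steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the proposals are assumed to be already tokenized, the `thesaurus` set stands in for the Chinese Classified Subject Thesaurus, the natural logarithm without smoothing is assumed for the IDF factor, and the function name `tfidf_vectors` and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(proposals, thesaurus, stop_words=frozenset()):
    """Return one {term: weight} dict per proposal, with weight = TF * log(N / N_i).

    Assumptions (not from the paper): proposals are lists of tokens taken from
    title, abstract and keywords; natural log; no IDF smoothing.
    """
    # Step (1), feature selection: keep thesaurus terms only, drop stop words.
    docs = [
        [t for t in tokens if t in thesaurus and t not in stop_words]
        for tokens in proposals
    ]
    n_docs = len(docs)
    # N_i: number of proposals that contain term i.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Step (2), document representation: one sparse weight vector per proposal.
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return vectors


# Hypothetical toy data, for illustration only.
proposals = [
    ["supply", "chain", "the", "network", "supply"],
    ["network", "control", "of", "robot"],
    ["supply", "chain", "finance"],
]
thesaurus = {"supply", "chain", "network", "control", "robot", "finance"}
vectors = tfidf_vectors(proposals, thesaurus, stop_words={"the", "of"})
```

A sparse dictionary is used here instead of a dense vector because each proposal typically contains only a small fraction of the thesaurus terms.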
2.2. Determination of similar projects
Based on VSM, the distance between two documents can be computed easily. There are many ways to measure this distance, such as cosine similarity, Euclidean distance or maximum distance (Han, Kamber, 2001). Cosine similarity, an effective and widely used measure in text mining, is adopted in this paper.
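For a given collection of projects, each project can be represented by Eq. (2.1), and the cosine similarity of two projects $P_i$ and $P_j$ can be computed as follows.
$$Sim_{ij} = \frac{\sum_{k=1}^{n} W_{ik} W_{jk}}{\sqrt{\sum_{k=1}^{n} W_{ik}^{2}}\,\sqrt{\sum_{k=1}^{n} W_{jk}^{2}}} \tag{2.3}$$
where $W_{ik}$ and $W_{jk}$ are the weights of term $T_k$ in projects $P_i$ and $P_j$ respectively.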
The larger the value of $Sim_{ij}$, the more similar the projects $P_i$ and $P_j$ are. Given a similarity threshold thr(Sim) chosen according to experience, two projects are regarded as not similar if $Sim_{ij}$ is smaller than this threshold, and as similar otherwise.
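The similarity test described above can be sketched as follows. The sketch assumes that each project is stored as a sparse `{term: weight}` dictionary; the function names and the toy vectors are hypothetical, and the default threshold of 0.5 is the value used later in the experiment.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        # Convention assumed here: an empty vector is similar to nothing.
        return 0.0
    return dot / (norm_p * norm_q)


def are_similar(p, q, thr_sim=0.5):
    """Two projects are similar when their similarity reaches the threshold."""
    return cosine_similarity(p, q) >= thr_sim


# Hypothetical weight vectors, for illustration only.
p1 = {"supply": 3.0, "chain": 4.0}
p2 = {"supply": 4.0, "chain": 3.0}
p3 = {"robot": 1.0}
sim_12 = cosine_similarity(p1, p2)  # 24 / (5 * 5)
sim_13 = cosine_similarity(p1, p3)  # no shared terms
```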
2.3. Identification of discipline-crossing
Two disciplines cross if their research contents are similar. As an abstract system, a discipline has no clear boundary and is defined by the basic research projects that fall within its range. A proposal that involves a related field or subfield of another discipline expresses the crossing of these disciplines. If most of the projects in two different disciplines are similar, the two disciplines have a high degree of crossing. In that case they might be merged into one discipline, or a new research area covering both might emerge.
Based on the above analysis, the number of similar projects between two disciplines expresses the degree of crossing between them: the more similar projects there are, the higher the degree of discipline-crossing. However, the number of projects differs across disciplines, which also affects the number of similar projects. Taking both factors into consideration, the discipline-crossing rate is defined to identify discipline-crossing. It is the ratio of the number of similar projects to the total number of projects in the two disciplines.
$$CR_{ij} = \frac{Num_{i,j} + Num_{j,i}}{Num_i + Num_j} \tag{2.4}$$
where $CR_{ij}$ is the discipline-crossing rate of disciplines $i$ and $j$, $Num_i$ is the total number of projects in discipline $i$, and $Num_{i,j}$ is the number of projects in discipline $i$ that are similar to projects in discipline $j$.
The discipline-crossing rate reflects the degree to which two disciplines cross with each other. If the rate is large enough, the two disciplines belong to the set of discipline-crossings. A threshold thr(Cr) on the discipline-crossing rate is used to make this decision: if $CR_{ij}$ is greater than or equal to thr(Cr), the two disciplines cross; otherwise they do not.
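A minimal sketch of this identification step is given below. It assumes one particular reading of the rate: the projects on either side that have at least one similar counterpart are counted, and their sum is divided by the combined number of projects in the two disciplines. The function names, the pair representation and the toy identifiers are hypothetical; the default threshold of 0.02 is the value used later in the experiment.

```python
def crossing_rate(similar_pairs, num_i, num_j):
    """Discipline-crossing rate of disciplines i and j.

    similar_pairs: iterable of (project_in_i, project_in_j) identifiers whose
    similarity reached thr(Sim). num_i, num_j: total projects per discipline.
    Assumed reading: (Num_ij + Num_ji) / (Num_i + Num_j).
    """
    pairs = list(similar_pairs)
    num_ij = len({a for a, _ in pairs})  # projects in i with a similar project in j
    num_ji = len({b for _, b in pairs})  # projects in j with a similar project in i
    return (num_ij + num_ji) / (num_i + num_j)


def is_crossing(rate, thr_cr=0.02):
    """Two disciplines cross when the rate reaches the threshold."""
    return rate >= thr_cr


# Hypothetical similar-project pairs, for illustration only.
pairs = [("i1", "j1"), ("i1", "j2"), ("i2", "j2")]
rate = crossing_rate(pairs, num_i=10, num_j=30)  # (2 + 2) / 40
```

Counting distinct projects rather than pairs keeps the rate between 0 and 1 even when one project is similar to many projects in the other discipline.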
2.4. Characteristics of discipline-crossing
This paper mainly discusses how discipline-crossing evolves over time. When a discipline crosses with others, it plays different roles depending on its level of development and its research contents. For example, the theories of one discipline can be applied in other disciplines, which is called the spread of that discipline's knowledge. Conversely, the discipline that borrows the theory traces back to the source of the knowledge. Projects are the representatives of disciplines, so the time-delay feature of discipline-crossing is reflected in the time series of project establishment, on which the three variables defined below are based.
Traceability of a discipline reflects the extent to which it applies the theories, methods or research areas of other disciplines in its own research. Concretely, if a project in one discipline is established after a similar project in another discipline, this similar-project pair reflects the traceability of the former discipline. So, within the set of similar-project pairs of a given discipline-crossing, the ratio of such pairs to the total number of pairs reflects the traceability of a discipline when it crosses with another one. Denoted by ST, traceability is calculated as follows.
$$ST_{ij} = \frac{1}{m}\sum_{k=1}^{m} t_k, \qquad t_k = \begin{cases} 1, & Year(P_k^i) > Year(P_k^j) \\ 0, & \text{otherwise} \end{cases} \tag{2.5}$$
where $ST_{ij}$ is the degree to which discipline $i$ traces back to discipline $j$; $m$ is the number of similar-project pairs between disciplines $i$ and $j$; $(P_k^i, P_k^j)$ is the $k$th similar-project pair; and $Year(P_k^i)$ and $Year(P_k^j)$ are the years of establishment of the projects of disciplines $i$ and $j$ in the $k$th pair.
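Propagation of a discipline reflects the extent to which its theories, methods or research objects are applied in other disciplines. Concretely, within the set of similar-project pairs of a given discipline-crossing, the pairs in which the project of one discipline is established earlier reflect the propagation of that discipline. Denoted by SP, propagation is calculated as follows.
$$SP_{ij} = \frac{1}{m}\sum_{k=1}^{m} p_k, \qquad p_k = \begin{cases} 1, & Year(P_k^i) < Year(P_k^j) \\ 0, & \text{otherwise} \end{cases} \tag{2.6}$$
where $SP_{ij}$ is the degree of propagation of discipline $i$, that is, the degree to which it is referred to by discipline $j$; $m$, $P_k^i$, $P_k^j$ and $Year(\cdot)$ are as defined for Eq. (2.5).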
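Simultaneity expresses the level of common development of two disciplines. Specifically, if the two projects of a similar-project pair are established in the same year, the pair reflects the simultaneity of the two disciplines. Denoted by SS, simultaneity is calculated as follows.
$$SS_{ij} = \frac{1}{m}\sum_{k=1}^{m} s_k, \qquad s_k = \begin{cases} 1, & Year(P_k^i) = Year(P_k^j) \\ 0, & \text{otherwise} \end{cases} \tag{2.7}$$
where $SS_{ij}$ is the degree of simultaneity of disciplines $i$ and $j$, which reflects the degree of mutual learning between $i$ and $j$ within the same year; $m$, $P_k^i$, $P_k^j$ and $Year(\cdot)$ are as defined for Eq. (2.5).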
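The three time-delay variables can be computed together from the establishment years of the similar-project pairs, as in the sketch below. It assumes that each pair is given as a tuple (year of the project in discipline i, year of the project in discipline j); the function name and the sample years are hypothetical. Since every pair falls into exactly one of the three cases, the three values sum to 1.

```python
def time_characteristics(year_pairs):
    """Return (traceability, simultaneity, propagation) of discipline i w.r.t. j.

    year_pairs: iterable of (year_i, year_j), one tuple per similar-project pair.
    """
    pairs = list(year_pairs)
    m = len(pairs)
    if m == 0:
        raise ValueError("at least one similar-project pair is required")
    later = sum(1 for yi, yj in pairs if yi > yj)    # i follows j: traceability
    earlier = sum(1 for yi, yj in pairs if yi < yj)  # i precedes j: propagation
    same = m - later - earlier                       # same year: simultaneity
    return later / m, same / m, earlier / m


# Hypothetical establishment years, for illustration only.
year_pairs = [(2005, 2003), (2004, 2004), (2003, 2006), (2007, 2005)]
st, ss, sp = time_characteristics(year_pairs)
```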
3. Experimental results and analysis
3.1. Data
The proposed method is used to identify the disciplines, other than the given disciplines themselves, that cross with a set of given disciplines. A total of 9956 proposals from all the disciplines considered, established between 2003 and 2008, are analyzed as an illustrative example. Table 1 summarizes the data used in this paper.
Table 1. Description of the data
Discipline code   Name of discipline                                              Number of proposals
101               Mathematics                                                     526
206               Chemical Engineering and Industrial Chemistry                   161
303               Ecology                                                         26
304               Forestry                                                        77
401               Geography                                                       749
504               Metallurgical and Mining                                        115
506               Engineering Thermal Physics and Energy Utilization              8
508               Building Environment and Structure of Engineering disciplines   461
509               Water Science and Ocean Engineering                             303
601               Electronic and Information System                               380
602               Computer Science                                                2095
603               Automation                                                      1673
701               Management Science and Engineering                              1063
702               Business Administration                                         1049
703               Macroeconomic Management and Policy                             1270
Total             All disciplines                                                 9956
The data in Table 1 are divided into two groups. One group (denoted by G1) contains 701, 702 and 703, the three disciplines of management science. The other group (denoted by G2) contains the remaining disciplines from other sciences. This experiment aims to find the disciplines that cross with management science. Using the method of Section 2, the disciplines of G2 that cross with each discipline of G1 can be identified.
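The grouping can be checked with a few lines of Python. The proposal counts below are copied from Table 1; the variable names are hypothetical and the snippet is only a bookkeeping sketch, not part of the method itself.

```python
# Number of proposals per discipline code, copied from Table 1.
proposals_per_discipline = {
    101: 526, 206: 161, 303: 26, 304: 77, 401: 749,
    504: 115, 506: 8, 508: 461, 509: 303, 601: 380,
    602: 2095, 603: 1673, 701: 1063, 702: 1049, 703: 1270,
}

# G1: the three disciplines of management science; G2: all remaining disciplines.
g1_codes = {701, 702, 703}
g2_codes = sorted(set(proposals_per_discipline) - g1_codes)

g1_total = sum(proposals_per_discipline[c] for c in g1_codes)
g2_total = sum(proposals_per_discipline[c] for c in g2_codes)
total = g1_total + g2_total

# Every (G1, G2) combination is a candidate discipline-crossing to be tested.
candidate_pairs = [(i, j) for i in sorted(g1_codes) for j in g2_codes]
```

With three disciplines in G1 and twelve in G2, 36 discipline pairs have to be examined.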
3.2. Results
With a given threshold thr(Sim)=0.5, 161 projects in discipline 701 are similar to 235 projects in other disciplines; 78 projects in discipline 702 are similar to 100 projects, while 119 projects in discipline 703 are similar to 157 projects. From the numbers of similar projects, the discipline-crossing rate is calculated by Eq. (2.4); the resulting values are illustrated in Figures 2 to 4 for 701, 702 and 703 respectively.
Fig.2. discipline-crossing rate between 701 and other disciplines
Fig.3. discipline-crossing rate between 702 and other disciplines
Fig.4. discipline-crossing rate between 703 and other disciplines
With a given threshold thr(Cr)=0.02, discipline-crossing can be identified.
It is illustrated in Fig. 2 that discipline-crossing exists between 701 and each of 101, 508, 602 and 603. Discipline 701, management science and engineering, studies the general principles and special phenomena of management systems and economic systems by means of system science and systems engineering methodology. It combines system science, scientific management, economics, computer science, operational research, engineering theory, leadership science, etc. (http://bbs.kaoyan.com/thread-1845292-1-1.html). Disciplines 101, 508, 602 and 603 stand for mathematics, building environment and structural engineering, computer science and automation respectively. From this description of management science and engineering, it is clear that all four subjects are associated with it.
It is shown in Fig. 3 that the disciplines crossing with 702 are 401 and 602. Discipline 702 is business administration, whose research field of enterprise information management draws on computer science, so 602 (computer science) is expected. The appearance of 401 (geography) is unexpected, but on closer analysis we found that logistics and supply chain management in 702 may be associated with geography.
It can be seen in Fig. 4 that the disciplines crossing with 703 are 401 and 509. Discipline 703 is macroeconomic management and policy, in which agriculture and forestry management, resource and environmental policy and management, and regional development management may involve the subject contents of geography (401) as well as water science and ocean engineering (509).
The results demonstrate that it is practicable to identify discipline-crossing by means of the discipline-crossing rate. Note that this paper only identifies which disciplines cross with each other; the specific domains of the crossing are not pointed out. Our further research will focus on that problem.
3.3. Characteristics of discipline-crossing
Table 2 shows the traceability, simultaneity and propagation degrees of 701 when it crosses with 101, 508, 602 and 603. For 701 and 101, traceability is 0.35, simultaneity is 0.19 and propagation is 0.46. All three time-delay attributes appear when 701 crosses with 101, 508, 602 and 603. The traceability of 701 with respect to 101 is smaller than its propagation, which seems a confusing result at first. In general, 101 (mathematics) is regarded as a fundamental theoretical subject from which many other subjects originate. However, this overlooks the application-oriented research in mathematics, such as applied mathematics and operational research. Researchers in mathematics also take practical problems from other areas and look for theories to explain or solve them. Therefore, it is understandable that the degree to which 101 traces back to 701 is greater than the degree to which 701 traces back to 101. Likewise, the traceability of 701 with respect to 602 (0.34) is smaller than its propagation (0.40). For 508 and 603, by contrast, the traceability of 701 (0.52 and 0.40) is greater than its propagation (0.26 and 0.35). From this result, we can conclude that management science and engineering introduces the theories, methods or research objects of building environment and structural engineering and of automation into its research, and that discipline-crossing appears in this process. Besides, the other two disciplines sometimes refer to management science and engineering and apply its theories and methods in their research.
Table 2. Characteristics of discipline-crossing between 701 and 101, 508, 602, 603
Discipline   Traceability   Simultaneity   Propagation
101          0.35           0.19           0.46
508          0.52           0.21           0.26
602          0.34           0.26           0.40
603          0.40           0.25           0.35
Table 3 shows the traceability, simultaneity and propagation of 702 when it crosses with 401 and 602. The propagation of 702 with respect to 401 is 0.62, which greatly exceeds the other two features, both 0.19. This means that the crossing between 702 and 401 (geography) mainly consists of applying the theories and methods of 702 in 401, or of follow-up studies of 702 being continued in 401, while the crossing between 702 and 602 mainly consists of applying the theories and methods of computer science in 702.
Table 3. Characteristics of discipline-crossing between 702 and 401, 602
Discipline   Traceability   Simultaneity   Propagation
401          0.19           0.19           0.62
602          0.63           0.17           0.20
Table 4 shows the traceability, simultaneity and propagation of 703 when it crosses with 401 and 509. Propagation is higher than the other two features, so the crossing between 703 and 401, 509 mainly consists of applying the theories and methods of 703 in the other two disciplines. However, propagation is not overwhelmingly higher than traceability. Therefore, the application of the theories and methods of 401 or 509 in 703 also exists, not only where the disciplines cross but also in their general development.
Table 4. Characteristics of discipline-crossing between 703 and 401, 509
Discipline   Traceability   Simultaneity   Propagation
401          0.38           0.15           0.47
509          0.30           0.19           0.52
From Tables 2 to 4, we conclude that discipline-crossing mainly takes the form of one discipline tracing back to or propagating into another. The simultaneity degree is generally lower than the other two features. The crossing of two disciplines is a process of mutually introducing theories, methods and research objects. As this process continues, the degree of crossing grows, and eventually a new interdiscipline may emerge.
4. Conclusions
This paper focuses on discipline-crossing, which is identified by applying the vector space model and similarity calculation from text mining to basic research projects. The traceability, simultaneity and propagation of a given discipline are calculated to measure the time-delay properties of its crossing. Experimental results show that the method is effective and that it is helpful for finding discipline-crossing and the laws of its formation.
References
Lu, Y.X. (2005) “Significance of discipline-crossing and interdisciplinary”, Forum, 20(1), 58-60.
Wang, J.H., Shi, H.Y. (2002) “Analysis of discipline-crossing from the perspective of system science”, Science of Science and Management, 23(12), 5-8.
Research Group of Subsidization of Crossed-disciplines, NSFC. (1999) “Research on the Subsidization of crossed-disciplines”, China Basic Science, 2(4), 39-46.
Li, C.J., Liu, Z.L. (2004) “A Research on the mode of interdisciplinary for development in modern science”, Studies in Science of Science, 22(3), 244-248.
Wu, D.Q., Zhang, J., Zhao, H.L., Wu, G.H. (2005) “The cross-subjects’ models and the conditions to promote their development”, Science Research Management, 26(5), 157-160.
Jin, W.Y. (2006) “Analysis of the discipline-crossing method”, Studies in Science of Science, 24(5), 667-671.
Zhao, X.C., Liu, Z.L. (2008) “Quantitative Analysis on the Occurrence Status of Interdisciplinary in Natural Science”, Studies in Dialectics of Nature, 24(11), 101-105.
Salton, G., Wong, A., Yang, C.S. (1975) “A vector space model for automatic indexing”, Communications of the ACM, 18(11), 613-620.
Salton, G., Allan, J., Buckley, C. (1994) “Automatic structuring and retrieval of large text files”, Communications of the ACM, 37(2), 97-108.
Han, J., Kamber, M. (2001) Data mining: concepts and techniques, Morgan Kaufmann.