MULTI-VIEW OF PATENT KNOWLEDGE VISUALIZATION
WANG GUICAI
Institute of Systems Engineering, Dalian University of Technology
Dalian, 116024, China
lanmengcaizi@163.com
WU JIANGNING1 AND XUAN ZHAOGUO2
Institute of Systems Engineering, Dalian University of Technology
Dalian, 116024, China
jnwu@dlut.edu.cn1 and xzg@dl.cn2
Patent knowledge is a key driver of technology innovation and product innovation in enterprises. Presenting patent knowledge visually can facilitate many decision-making tasks, such as revealing business trends, inspiring novel industrial solutions, and making investment policies. Current visualization technologies for patent knowledge, however, each provide only one perspective, such as statistical graphs, citation networks, or topic maps. In this paper, we propose a combined visualization method for patent knowledge presentation that uses a thematic map, a domain ontology tree map and a geographic map. It visualizes patent content and the corresponding analysis results in different dimensions, and provides multiple views on patent knowledge together with cross-mapping between the views. As a result, the proposed visualization method offers a comprehensible overview of patent relationships in a particular domain, through which new business trends or threats and new technology innovation opportunities can be discovered more easily.
1. Introduction
Patents are the concentrated expression of technical innovation, which has become a crucial capability for the survival of enterprises. About 90% of research results are contained in patent documents, so patent documents are valuable to industry, business, law, and policy-making communities (Tseng et al., 2007). Patent knowledge is an important source of decision support for enterprises struggling to compete within their domain. Discovering patent knowledge such as new business trends and innovation opportunities in the patent literature can keep the product strategy highly competitive and aligned with markets.
However, extracting the right information from lengthy patent literature rich in technical and legal terminology is very difficult. Moreover, presenting the patent content has become more difficult (Kim et al., 2007). An increasingly used mechanism for presenting patent information is the visual 2D map, which leverages the capability of the human visual system to identify patterns and anomalies.
The advantages of patent knowledge visualization with the map are as follows:
(i) It provides an intuitive display making it easy to detect patterns and irregularities within the patent landscape.
(ii) It shows various kinds of relationships between patent documents in a comprehensible way, such as citation relationships and topic relationships.
(iii) It uses a 2D spatial landscape to represent the technology clusters. Therefore, users may benefit from the patents around the area they focus on.
Current map-based visualization methods for patent knowledge presentation, however, provide only one perspective, so the information carried by the other perspectives is lost.
To alleviate the drawbacks of current patent maps, we propose a combined visualization method for patent knowledge presentation that includes a thematic map, a domain ontology tree map and a geographic map. The positive effects of multiple views have been documented in several studies (e.g., Chimera and Shneiderman, 1994). Baldonado et al. (2000) list guidelines for using multiple views and emphasize their potential to reveal correlations and disparities.
Our multi-view presentation method visualizes contextualized patent knowledge in different dimensions, providing multiple perspectives on patent knowledge and a cross-mapping between them. With the help of these views and the mapping between them, an overview of the patents in a particular topic or field can be represented in a comprehensive way. In detail, the multi-view map not only shows the relationships between patent documents, but also displays the area in which the patents are applied and the level each patent occupies in the hierarchy of the International Patent Classification (IPC). Therefore, patent researchers or readers can discover new trends and innovation opportunities in the patent literature.
2. Related research
Patent analysis falls into two main streams. One focuses on the structured information in patent documents, such as publication number, application number, priority number and publication date, and represents the results in statistical graphs. The other focuses on textual or semantic analysis of the unstructured information, such as title, abstract and claims, and represents the results in various kinds of maps. The latter helps enterprises or patent readers grasp not only an overview of the technologies in a particular domain but also the relationships between technologies, without requiring specialized knowledge. It has therefore attracted more attention from researchers.
Visualization with a map is an effective way to recognize regularities in large amounts of text data. Researchers have developed several technologies for browsing different kinds of textual data, and many visualization systems have been developed in various fields.
So far, the patent citation map has been widely applied to represent the relationships in patent information. However, such a map uses only the structured information in patent documents and not the technology content itself, so without other views it is difficult to reveal the technology involved in the patents and the overall relationships among patent documents.
The Japan Patent Office has been producing and providing more than 50 types of expressions and more than 200 maps for several technology fields since 1997 (JIII, 2000). In addition, many other countries, such as Korea (Ryoo and Kim, 2005), Italy (Camus and Brancaleon, 2003; Fattori et al., 2003) and the USA (Steven et al., 2002), also provide many kinds of patent maps.
Artificial intelligence methods have been introduced into patent analysis as well. Lamirel et al. (2003) proposed a neural network for mapping scientific and technical information (articles, patents) in order to assist a user in carrying out the complex process of analyzing large quantities of such information. Tseng et al. (2007) used text mining techniques to create, experimentally, a real-world patent map for an important domain: US patents whose assignee is the National Science Council (NSC).
All of the related work above presents patent knowledge from a single perspective, ignoring the information from the other dimensions. For patent researchers or readers, an overview of patent knowledge from different perspectives is desirable.
3. Three Visual Perspectives
In this paper, we try to visualize the patent knowledge in multiple views and provide a cross-mapping among three different maps, namely thematic map, geographic map and ontology tree map.
In order to construct these three visual perspectives, the patent information extracted from the patent document set is analyzed in three dimensions.
(i) Semantic analysis: analyzing the topic of each patent document and the relationships between patent documents by the clustering algorithm.
(ii) Geographic analysis: analyzing the distribution of domain patents.
(iii) Ontology analysis: analyzing the hierarchy of IPC and forming the patent domain ontology.
Each patent document can be organized, indexed, searched and explored in these three dimensions.
3.1. Thematic Map
Once the patent user selects a topic or keyword of interest, the patent documents related to that domain are pre-analyzed with a textual analysis tool and a hierarchical agglomerative clustering algorithm (Jain et al., 1999), and the clustering results are then projected onto a 2D plane, like a landscape, by transformations.
The process of constructing the thematic map of patent documents in the Chinese context includes the following three main steps.
(i) Pre-analyzing
Constructing the thematic map mainly uses the unstructured information in patent documents, namely the title and the abstract. We first use a word segmentation method to tokenize the sentences in the title and the abstract respectively, and then filter the tokens with a stop word dictionary. Afterwards, every patent document can be represented by the set of words it contains.
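The tokenization and stop word filtering described above can be sketched as follows. The paper does not name its segmentation method, so forward maximum matching is used here purely as an assumption; the function names `segment` and `preprocess`, the vocabulary and the stop word set are all hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching (an assumed method, not the paper's):
    at each position take the longest substring found in the vocabulary,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(text, vocabulary, stop_words):
    """Segment a title or abstract and drop stop words and whitespace."""
    return [
        token
        for token in segment(text, vocabulary)
        if token not in stop_words and not token.isspace()
    ]


if __name__ == "__main__":
    vocabulary = {"专利", "知识", "可视化"}   # hypothetical dictionary
    stop_words = {"的"}                      # hypothetical stop word list
    print(preprocess("专利知识的可视化", vocabulary, stop_words))
```

A production system would normally rely on a full segmentation dictionary or a dedicated segmentation library; the greedy matcher only illustrates the shape of the step.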
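Hence a patent document vector can be obtained by using the traditional vector space model (VSM) (Salton and Buckley, 1988). Let tf(d, t) be the absolute frequency of term t ∈ T in document d ∈ D, where D is the set of documents and T = {t_1, ..., t_m} is the term set. A document vector can then be written as
\[ \vec{t}_d = \bigl(\mathrm{tf\_idf}(d, t_1), \ldots, \mathrm{tf\_idf}(d, t_m)\bigr), \tag{1} \]
where tf_idf(d, t_i) is defined, in its standard form, by
\[ \mathrm{tf\_idf}(d, t_i) = \mathrm{tf}(d, t_i) \cdot \log\frac{|D|}{\mathrm{df}(t_i)}, \tag{2} \]
with df(t_i) denoting the number of documents in D that contain term t_i.
Once tf_idf weighting is applied, we can calculate the similarity between two documents d_i, d_j ∈ D by computing the cosine of the angle between their vectors:
\[ \cos(\vec{t}_{d_i}, \vec{t}_{d_j}) = \frac{\vec{t}_{d_i} \cdot \vec{t}_{d_j}}{\lVert \vec{t}_{d_i} \rVert \, \lVert \vec{t}_{d_j} \rVert}. \tag{3} \]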
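As a minimal sketch of the weighting and similarity computation, the following Python code builds tf_idf vectors and compares them by cosine. It assumes the common tf(d, t) · log(|D| / df(t)) form of the weighting, where df(t) is the number of documents containing t; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return the sorted term list and one tf_idf vector per document.

    docs is a list of token lists. The weighting tf * log(|D| / df)
    is an assumed, commonly used variant."""
    terms = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in terms])
    return terms, vectors


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    docs = [["patent", "map"], ["patent", "map"], ["gis", "city"]]
    terms, vectors = tf_idf_vectors(docs)
    print(terms)
    print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Note that a term occurring in every document receives weight zero under this variant, so it contributes nothing to the similarity.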
After constructing the model of patent documents and computing the similarities between patent documents, we can group patent documents by means of the hierarchical agglomerative clustering algorithm.
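The grouping step can be illustrated with a small bottom-up clustering routine operating on a precomputed similarity matrix. The linkage criterion and the stopping rule are not specified in the text, so average linkage and a fixed target number of clusters are assumptions; `agglomerative` is a hypothetical name.

```python
def agglomerative(similarity, n_clusters):
    """Hierarchical agglomerative clustering on a similarity matrix.

    Starts with one cluster per document and repeatedly merges the two
    clusters with the highest average pairwise similarity (average
    linkage, assumed here) until n_clusters remain."""
    clusters = [[i] for i in range(len(similarity))]

    def average_similarity(a, b):
        total = sum(similarity[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the most similar pair of clusters.
        _, x, y = max(
            (average_similarity(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    similarity = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.0, 0.1, 0.8, 1.0],
    ]
    print(agglomerative(similarity, 2))
```

This naive version recomputes all pairwise averages in every round, which is acceptable for illustration but too slow for large patent sets.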
The process of pre-analyzing is illustrated in Figure 1.
Fig. 1. The process of pre-analyzing
(ii) Constructing 3D thematic map
The 3D thematic map is generated from the 2D document distribution computed by force-directed placement. First, the document clustering centroids are randomly distributed in the viewing rectangle, and the other patent documents are placed around each centroid with the help of the linear iteration time force-directed placement of Chalmers (1996). Second, each patent document in the layout is represented by a small peak in the shape of a spherical cap, and thus has a height. The patent documents in a cluster are placed close to each other in a small area according to the similarities between them, and their heights are accumulated to form a larger area. The larger area with different heights represents a cluster, and its peak indicates the core topic. As a result, many mountains are displayed in the viewing area, and each mountain shows a cluster containing many patent documents (see Figure 2).
Fig. 2. 3D thematic map
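To make the idea of force-directed placement concrete, the sketch below uses a plain spring model in which every pair of documents is pulled or pushed toward a target distance (for example, one minus the cosine similarity). This is a deliberately simplified stand-in and not Chalmers' linear iteration time algorithm, which samples neighbours instead of visiting all pairs; `spring_layout` and its parameters are hypothetical.

```python
import math
import random


def spring_layout(target, iterations=200, rate=0.05, initial=None, seed=0):
    """Place points in 2D so that pairwise distances approach `target`.

    target[i][j] is the desired distance between documents i and j.
    Positions start at `initial` if given, otherwise at random points
    in the unit square."""
    rng = random.Random(seed)
    n = len(target)
    if initial is None:
        pos = [[rng.random(), rng.random()] for _ in range(n)]
    else:
        pos = [list(p) for p in initial]
    for _ in range(iterations):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Spring force: positive pulls i toward j, negative pushes away.
                force = rate * (dist - target[i][j]) / dist
                pos[i][0] += force * dx
                pos[i][1] += force * dy
    return pos


if __name__ == "__main__":
    target = [[0.0, 0.1, 1.0], [0.1, 0.0, 1.0], [1.0, 1.0, 0.0]]
    for x, y in spring_layout(target):
        print(round(x, 3), round(y, 3))
```

The all-pairs loop costs quadratic time per iteration, which is exactly the cost that sampling-based schemes such as the one cited in the text are designed to avoid.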
(iii) Projecting onto 2D thematic map with contour lines
In order to project the thematic map onto the 2D plane, we need to record the height information and display it in a proper form.
First, several planes with different heights are used to divide each mountain into different parts, and each part has a range of height. The results are displayed in Figure 3.
Fig. 3. 3D thematic map divided by several planes with different heights
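The accumulation of spherical-cap heights and the slicing by horizontal planes can be sketched as follows. The cap dimensions, the plane heights and the function names are illustrative assumptions; the code only shows how a surface height and its height range could be computed for a point of the layout.

```python
import math


def cap_height(r, base_radius=1.0, peak=0.5):
    """Height of a spherical cap at horizontal distance r from its centre.

    The cap has the given base radius and peak height (peak <= base_radius);
    outside the base the height is zero."""
    if r >= base_radius:
        return 0.0
    sphere_radius = (base_radius ** 2 + peak ** 2) / (2 * peak)
    return math.sqrt(sphere_radius ** 2 - r ** 2) - (sphere_radius - peak)


def surface_height(x, y, documents, base_radius=1.0, peak=0.5):
    """Accumulated height at (x, y): the sum of the caps of all documents."""
    return sum(
        cap_height(math.hypot(x - dx, y - dy), base_radius, peak)
        for dx, dy in documents
    )


def band_index(height, plane_heights):
    """Number of cutting planes at or below the height, i.e. the height range."""
    return sum(1 for plane in plane_heights if height >= plane)


if __name__ == "__main__":
    documents = [(0.0, 0.0), (0.2, 0.0), (3.0, 3.0)]  # hypothetical layout
    planes = [0.25, 0.5, 0.75]                        # hypothetical cutting planes
    h = surface_height(0.1, 0.0, documents)
    print(h, band_index(h, planes))
```

Points that share a band index lie between the same two cutting planes, so the boundaries between bands correspond to the contour lines of the 2D map.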
A 2D thematic map is then formed by orthographic projection. The cross sections of the 3D map generate contour lines, which envelop the most similar patents. As shown in Figure 4, the points indicate the patent documents, and the distance between any two points reflects the semantic similarity of the two patent documents.
By assigning colors to different heights, the system renders a topography that resembles a contour map (see Figure 4); Figure 5 shows a refined map.
The thematic map allows massive amounts of patent documents to be visualized. The topography of the thematic map is determined by the textual similarity of the patent documents. The peaks of the visual landscape indicate abundant coverage of a particular topic, whereas valleys represent sparsely populated parts of the information space.
Fig. 4. 2D thematic map with contour lines
Fig. 5. A refined thematic map
3.2. Geographic Map
Thematic maps show the distribution of the patent documents of a particular domain, from which we can recognize which patents belong to the domain and which patents are related to each other. But where were the patents invented or applied for? The geographic map is motivated by this need, which arises when interacting with the thematic map.
By extracting and processing the address information contained in patent documents, we can obtain the city to which every patent belongs.
We then provide a geographic map based on the extracted geographical information. Open GIS (Geographic Information System) data from the National Dynamic Atlas (http://www.webmap.cn/index2.php) provides detailed geographic information, including place names, geographic positions (latitude and longitude), administrative divisions and populations, in an easily parseable format. We use SharpMap (http://www.codeplex.com/SharpMap), an open source GIS library, to display the geographic map.
Finally, the geographic information and the statistical information extracted from the patent documents are projected onto the geographic map generated by SharpMap. Every patent document can thus be placed at the geographic position where the inventor or the patent assignee is located, as shown in Figure 6.
Fig. 6. A geographic map for patent distribution information
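The extraction and aggregation behind the geographic view can be sketched in a few lines. The sketch does not use SharpMap or the National Dynamic Atlas data; the small gazetteer with approximate coordinates, the sample addresses and the function names are hypothetical stand-ins.

```python
from collections import Counter

# Illustrative gazetteer: approximate (latitude, longitude) per city.
GAZETTEER = {
    "Dalian": (38.91, 121.60),
    "Beijing": (39.90, 116.41),
    "Shanghai": (31.23, 121.47),
}


def extract_city(address, gazetteer=GAZETTEER):
    """Return the first known city name found in the address, or None."""
    for city in gazetteer:
        if city in address:
            return city
    return None


def patents_per_city(addresses, gazetteer=GAZETTEER):
    """Count patents per city and attach the position used for plotting."""
    counts = Counter()
    for address in addresses:
        city = extract_city(address, gazetteer)
        if city is not None:
            counts[city] += 1
    return {
        city: {"position": gazetteer[city], "patents": n}
        for city, n in counts.items()
    }


if __name__ == "__main__":
    addresses = [  # hypothetical assignee addresses
        "No. 2 Linggong Road, Dalian, Liaoning",
        "Haidian District, Beijing",
        "Ganjingzi District, Dalian",
    ]
    print(patents_per_city(addresses))
```

Plain substring matching is fragile for real addresses, so an actual system would need the administrative division data mentioned in the text to resolve districts and ambiguous names.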
3.3. Ontology Tree Map
The domain ontology is built on the IPC from the State Intellectual Property Office of the People’s Republic of China (http://www.sipo.gov.cn/sipo2008/). It contains hyponymy relationships, as partly shown in Figure 7.
Fig. 7. Ontology tree map
Figure 7 shows the ontology tree map, where the root of the ontology tree map is “H” in IPC. The node on the tree can be expanded if it has sub-nodes, and it can also be folded in order to hide its sub-nodes. In Figure 7, (1) is the initial tree with the three hidden layers; (2) expands into the middle layer; and (3) expands into the leaf nodes.
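The expand and fold behaviour of the tree can be modelled with a small recursive data structure. The class `IpcNode` and its methods are hypothetical, and only a handful of IPC entries under section H are included for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IpcNode:
    """One node of the IPC hierarchy with an expand/fold state."""
    code: str
    title: str
    children: list = field(default_factory=list)
    expanded: bool = False

    def toggle(self):
        """Expand a folded node or fold an expanded one; leaves do nothing."""
        if self.children:
            self.expanded = not self.expanded

    def visible(self, depth=0):
        """List (depth, code) for every node currently shown in the tree."""
        rows = [(depth, self.code)]
        if self.expanded:
            for child in self.children:
                rows.extend(child.visible(depth + 1))
        return rows


def build_sample_tree():
    """A tiny excerpt of IPC section H, used only as sample data."""
    h04l = IpcNode("H04L", "Transmission of digital information")
    h04 = IpcNode("H04", "Electric communication technique", [h04l])
    h01 = IpcNode("H01", "Basic electric elements")
    return IpcNode("H", "Electricity", [h01, h04])


if __name__ == "__main__":
    root = build_sample_tree()
    root.toggle()                # expand the root to show the middle layer
    root.children[1].toggle()    # expand H04 to show its leaf
    for depth, code in root.visible():
        print("  " * depth + code)
```

Folding a node hides its whole subtree but keeps the expand state of the descendants, so re-expanding restores the previous view.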
4. Mapping among three different perspectives
Up to now, an overall perspective composed of three different maps, i.e. the thematic map, the geographic map and the ontology tree map, has been obtained.
(i) The thematic map places patent documents on a 2D topography according to their similarities. It shows the relationships between patent documents, where a peak represents a cluster of documents about a topic and valleys indicate sparsely populated parts of the information space.
(ii) The geographic map indicates the distribution of domain patents according to the geographic information extracted from patent documents.
(iii) The domain ontology tree map summarizes the classification of domain patents and the relationships between the classes.
These three kinds of maps can be activated simultaneously by triggering the developed mapper (see Figure 8).
Fig. 8. Mechanism of synchronizing multiple views
As shown in Figure 8, after a particular patent document set is processed, the mapper maps the analysis results onto the thematic map, the geographic map and the ontology tree map, respectively. Any operation in one perspective triggers an immediate update of the context information provided by the other perspectives. This is the essence of multiple views: patent knowledge is kept synchronized across all views.
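One way to picture the synchronization mechanism is an observer-style mapper that forwards a selection made in one view to every registered view. The paper does not describe the mapper's implementation, so the classes `View` and `Mapper`, their methods and the patent identifiers below are assumptions made for illustration.

```python
class View:
    """A minimal stand-in for one of the three maps."""

    def __init__(self, name):
        self.name = name
        self.highlighted = set()

    def show(self, patent_ids):
        """Update the view so that it highlights the given patents."""
        self.highlighted = set(patent_ids)


class Mapper:
    """Keeps all registered views in step with the latest selection."""

    def __init__(self):
        self.views = []
        self.last_source = None

    def register(self, view):
        self.views.append(view)

    def select(self, source, patent_ids):
        """Called by the view in which the user acted; updates every view."""
        self.last_source = source.name
        for view in self.views:
            view.show(patent_ids)


if __name__ == "__main__":
    mapper = Mapper()
    thematic, geographic, ontology = View("thematic"), View("geographic"), View("ontology")
    for view in (thematic, geographic, ontology):
        mapper.register(view)
    mapper.select(thematic, {"P1", "P2"})  # hypothetical patent identifiers
    print(geographic.highlighted == ontology.highlighted == {"P1", "P2"})
```

Routing every interaction through a single mapper keeps the views independent of each other, so a further view could be added by registering it without changing the existing ones.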
5. Conclusions
In this paper, we present a combined method of patent knowledge visualization to overcome the drawbacks of current patent maps. Compared with related research, our method applies both the structured and the unstructured information in patent documents, and visualizes the patent knowledge and the corresponding analysis results in different dimensions spanning the thematic map, the geographic map and the domain ontology tree map. Moreover, it emphasizes the cross-mapping between different views and the synchronizing activities triggered by the mapper. As a result, the proposed visualization method offers a comprehensible overview of patent relations in a particular domain, which helps decision-makers obtain more complete patent knowledge from different perspectives simultaneously.
6. Acknowledgements
This work is sponsored by the National Natural Science Foundation of China (NSFC) under grant No. 70771019 and supported by the National High Technology Research and Development Program of China (No.2008AA04Z107).
References
Baldonado, M.Q.W., Woodruff, A. and Kuchinsky, A. (2000) “Guidelines for using multiple views in information visualization”, Proceedings of the working conference on Advanced visual interfaces, 110-119.
Camus, C. and Brancaleon, R. (2003) “Intellectual assets management: From patents to knowledge”, World Patent Information, 25(2): 155-159.
Chalmers, M. (1996) “A Linear Iteration Time Layout Algorithm for Visualizing High-Dimensional Data”, Proceedings of the 7th conference on Visualization, 127-132.
Chimera, R. and Shneiderman, B. (1994) “An exploratory evaluation of three interfaces for browsing large hierarchical tables of contents”, ACM Transactions on Information Systems, 12 (4): 383-406.
Fattori, M., Pedrazzi, G. and Turra, R. (2003) “Text mining applied to patent mapping: A practical business case”, World Patent Information, 25(4): 335-342.
Jain, A.K., Murty, M.N. and Flynn, P.J. (1999) “Data Clustering: A Review”, ACM Computing Surveys, 31(3): 264-323.
Japan Institute of Invention and Innovation (2000), Guide book for practical use of patent map for each technology field.
Kim, Y.G., Suh, K.H. and Park, S.C. (2007) “Visualization of patent analysis for emerging technology”, Expert Systems with Applications, 34(3): 1804-1812.
Lamirel, J.C., Shehabi, S.A., Hoffmann, M. and Francois, C. (2003) “Intelligent patent analysis through the use of a neural network: Experiment of multi-viewpoint analysis with the MultiSOM model”, Proceedings of the ACL-2003 workshop on Patent corpus processing, 7-23.
Ryoo, J.H. and Kim, I.G. (2005). Workshop H – What patent analysis can tell about companies in Korea, Far East Meets West in Vienna.
Salton, G. and Buckley, C. (1988) “Term-weighting approaches in automatic text retrieval”, Information Processing and Management, 24(5): 513-523.
Steven, M., Camille, D., We, Z., Sinan, S. and Yemenu, D. (2002) “DIVA: A visualization system for exploring documents databases for technology forecasting”, Computers & Industrial Engineering, 43(4): 841-862.
Tseng, Y.H., Lin, C.J. and Lin, Y.I. (2007) “Text Mining Techniques for Patent Analysis”, Information Processing and Management, 43(5): 1216-1247.
Wednesday, December 9, 2009