Thursday, December 10, 2009

MODELING EARLY WARNING SYSTEM

MODELING EARLY WARNING SYSTEM TO PREDICT TERRORIST THREATS: PRELIMINARY RESULTS
PIR ABDUL RASOOL QURESHI
Counterterrorism Research Lab
The Maersk Mc-Kinney Moller Institute
University of Southern Denmark, Campusvej 55, DK-5230, Odense, Denmark
E-mail: parq.jad@gmail.com

UFFE KOCK WIIL AND NASRULLAH MEMON
Counterterrorism Research Lab
The Maersk Mc-Kinney Moller Institute
University of Southern Denmark, Campusvej 55, DK-5230, Odense, Denmark
E-mail: ukwiil@mmmi.sdu.dk, memon@mmmi.sdu.dk

This paper presents a model for an early warning system to detect terrorist threats. The model is based on data collection using an open source intelligence methodology. We present a concept for generating early warnings to predict terrorist threats. Our ideas rely on investigative data mining and the study of complex covert networks to extract useful information for terrorist threat indication. The presented model can be used as a core framework for an early warning system.
1. Introduction

Intelligence agencies use many resources to collect information from various sources (Figure 1). Much of the valuable information is publicly available (one estimate is 80 % (Steele R.D., 2006)). However, much of the valuable open source information is missed due to the limited research and development in this area (Steele R.D., 2006). Knowledge about the structure and organization of terrorist networks is important for both terrorism investigation and the development of effective strategies to prevent terrorist attacks (Memon N., 2007). However, except for network visualization, terrorist network analysis remains primarily a manual process (Xu J., Chen H.; 2006). Existing open source intelligence analysis tools do not provide advanced structural analysis techniques that allow for the extraction of network knowledge from terrorist information. In addition, open source intelligence analysis is faced with several challenges. Relevant information must be found among the enormous amount of open source information available. The collected information may be noisy: it may not be complete; it may be wrong; it needs to be filtered; etc. Once the relevant information is found and filtered, it needs to be analyzed, verified, and visualized in a comprehensive manner.
It is noted that open source intelligence analysis is not a substitute for traditional classified work. However, analysis of open source information can help to augment the analysis results available from classified information, hence providing the intelligence analysts with better support for decision making.

Figure 1: The importance of open source information (Steele R.D., 2006)

In this perspective, we have established the Counterterrorism Research Lab at The Maersk Mc-Kinney Moller Institute. The overall objective of the Lab is to specify, develop, and evaluate novel tools and techniques for open source intelligence in close collaboration with intelligence analysts. The tool philosophy is that the intelligence analysts are in charge and the tools are there to assist them. Thus, the purpose of the tools is to support as many of the knowledge management processes as possible to assist the intelligence analysts in performing their work more efficiently.
In this context, efficient means that the analysts arrive at better analysis results much faster. In general, the tools fall into two categories: (1) semi-automatic tools that need to be configured by the intelligence analysts to perform a dedicated task, and (2) manual tools that support the intelligence analysts in performing specific tasks by providing dedicated features that enhance efficiency during manual intelligence analysis. In this paper, we discuss and demonstrate a model for generating early warnings to predict terrorist threats. Preliminary results of the proposed model are also presented.
The model is an extension of the iMiner prototype (Memon N., 2007). Considering the limitations of that prototype, we extend the new system based on feedback given by counterterrorism experts.
2. The State of the Art
Several knowledge management processes, tools, and techniques are relevant in the context of counterterrorism as shown in Figure 2 (Wiil, U. K., Memon, N., and Gniadek, J., 2009). Overall, the processes in the leftmost column involve acquiring data from various sources, the processes in the middle column involve processing data into relevant information, and the processes in the rightmost column involve further analysis and interpretation of the information into useful knowledge that the intelligence analysts can use to support their decision making.

Figure 2: Knowledge management processes for counterterrorism

The iMiner prototype (Memon N. et al., 2009) includes tools for data conversion, data mining, social network analysis, visualization, and for the knowledge base (Wiil, U. K., Memon, N., and Gniadek, J., 2009). iMiner incorporates several advanced mathematical models and techniques useful for counterterrorism like subgroup detection, network efficiency estimation, and destabilization strategies for terrorist networks including detection of hidden hierarchies (Memon N., 2007). In relation to iMiner, several collections of authenticated datasets of terrorist events that have occurred or were planned have been harvested from open source databases (i.e., www.trackingthethreats.com). iMiner’s models and algorithms have been validated using these datasets (Memon N. et al, 2009).
Counterterrorism research approaches can be divided into two categories: data collection and data modelling. The Dark Web Project conducted at the AI Lab, University of Arizona (Professor Chen) is a prominent example of the data collection approach (Chen H. et al., 2008; Chen H. et al., 2008). The Networks and Terrorism Project conducted at the CASOS Lab, Carnegie Mellon University (Professor Kathleen Carley) is a prominent example of the data modelling approach (Carley K. M. et al., 2007; Tsvetovat M. and Carley K. M., 2005). The proposed Early Warning System (EWaS) combines the data collection and data modelling approaches into a holistic prototype for open source intelligence (like iMiner). The proposed research involves techniques from disciplines such as data mining (e.g., Mena, J., 2003; Devlin, K., and Lorden, G., 2007), social network analysis (Carpenter, M. A., and Stajkovic, A. D., 2006; Gloor, P. A., and Zhao, Y., 2006), hypertext (Shipman, F. M., 2001; Engelbart, D. C., 1962), visualization (Thomas, J., and Cook, K., 2006; Xu J., Chen H., 2005), and many others. To our knowledge, no other approaches provide a similar comprehensive coverage of tools and techniques to support advanced terrorist domain models. Thus, the proposed holistic research approach to support intelligence analysis is considered unique.
3. Proposed System
Early warning, as the name implies, is a set of techniques used to alert intelligence agencies and law enforcement personnel that an unauthorized intrusion is about to occur or that a protected site is under surveillance. Identifying terrorist threats requires a broad spectrum of data which in many cases is collected from various sources. The process of unification, fusion, and interpretation of the collected data is crucial due to data redundancy and especially to enable accurate predictions (Najgebauer N. et al., 2008). In this paper we propose an early warning system which aims to generate early warnings against the possibility of terrorists carrying out any act of terrorism. This target can be achieved by closely monitoring information about terrorists and the entities associated with them. To achieve such objectives, besides knowing the existing terrorists and their connections, the identification of new entities and of new connections between entities is of prime importance. All of this information can be retrieved from a wide range of heterogeneous data sources: the information may exist within data buried in the servers of governmental or private organizations, in the text of web sites on the Internet, in news items, in RSS feeds, or in the minds of investigators working in the field. Thus, providing mechanisms for acquiring data from at least all of the discussed data sources is within the scope of the project.
Acquiring the data is very important, but it alone does not suffice to achieve the task; we also need to transform the acquired data into a data structure on which complex computations can be performed and detailed investigation can take place. The suitable data structure in this case is a graph. If we transform the data into graphs, then geodesic measures and terrorist network analysis tools may help us reach a solution. Thus, the visualization of graphs is also an implicit requirement of the project. Once we have the data, we can analyze it semantically to extract the information of interest and transform that information into graphs of entities and the relations among them. Since the extraction of such information (semantic analysis) from all of the data can be too expensive, we may filter the data on the basis of known entities, just to narrow down the scope of the semantic analysis operation. This operation may still yield entities and relations that have no importance in our context. We therefore need to filter out this unwanted information again and then publish the rest of the usable information, which will be further investigated to generate warnings.

Together with generating warnings, we provide mechanisms to support manual analysis and investigation of this information, so that it can be utilized fully. We provide investigators with an interface to perform link analysis, study the geodesic and terrorist network analytic properties of a network, smart-search any entity, identify different patterns in terrorist networks, and place triggers to generate warnings when those patterns appear in any future network.
3.1. Data Processing Cycle
There are five major steps to process the data from heterogeneous data sources and generate warnings (if any), as shown in Figure 3. The phases are discussed below:




Figure 3: Data Processing Cycle
3.1.1. Acquisition Phase
The first step is the acquisition phase, in which we ingest data from heterogeneous data sources and weight it for the presence of known keywords (keywords can be entities, their relations, their places of residence, etc.). The weight indicates the relevance of the data in our context: the higher the weight, the more likely the data is to contain information we need. If the data is worth analyzing, it is shortlisted for semantic analysis. If the information is in a language other than English, it is translated to English during the acquisition phase. This phase implements the processes shown in the leftmost column of Figure 2, where we acquire data from different data sources.
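To make the weighting step concrete, the following sketch (in Python) scores a document by the weighted frequency of known keywords and shortlists it when the score reaches a threshold. The keyword entries, weights, and threshold are illustrative assumptions, not values used by the actual system.

    import re

    # Hypothetical keyword database entries: known entities, plots, and domain terms.
    KNOWN_KEYWORDS = {
        "khalid shaikh muhammad": 3.0,
        "bojinka": 2.0,
        "explosives": 1.0,
    }
    SHORTLIST_THRESHOLD = 2.0  # assumed tuning parameter

    def weight_document(text: str) -> float:
        """Score a document by the weighted frequency of known keywords."""
        text = text.lower()
        return sum(weight * len(re.findall(re.escape(keyword), text))
                   for keyword, weight in KNOWN_KEYWORDS.items())

    def shortlist(documents):
        """Keep only documents whose weight reaches the shortlisting threshold."""
        return [doc for doc in documents if weight_document(doc) >= SHORTLIST_THRESHOLD]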
3.1.2. Extraction Phase
Once data has been shortlisted for semantic analysis, it is analyzed semantically to extract the information hidden in it. The extracted entities and their relations are kept in a data store. Since all entities and relations are identified during the extraction process, the information may contain some unwanted entities or relations; the data store in which this information is kept is therefore called the dirty database. It is here that we transform the information within the data into a graph data structure.
This phase, together with the information generation phase (defined in the following section), implements the processes shown in the middle column of Figure 2: thorough semantic analysis of the data, identification of patterns that can be used to transform the data into useful information, and filtering of unwanted data.
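As an illustration of how the extracted information can be stored as a graph, the sketch below assumes that an upstream semantic analysis step has already produced (subject, relation, object) triples and loads them into a labeled graph using the networkx library. The triples are hypothetical placeholders.

    import networkx as nx

    def build_dirty_graph(triples):
        """Store extracted entities and relations as a labeled, undirected graph."""
        graph = nx.Graph()
        for subject, relation, obj in triples:
            graph.add_edge(subject, obj, relation=relation)
        return graph

    # Hypothetical output of the semantic analysis step.
    extracted_triples = [
        ("Person A", "communicates_with", "Person B"),
        ("Person B", "member_of", "Cell X"),
    ]
    dirty_graph = build_dirty_graph(extracted_triples)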
3.1.3. Information Generation Phase
In this phase, the information in the dirty database is further filtered: only those entities and relations that are related to our domain are selected and then published using the publisher application.
3.1.4. Investigating Phase
In this phase, the different investigations are carried out. The information is evaluated with terrorist network analysis techniques such as dependence centrality and position role index, and geodesic measurements such as average path length, clustering coefficient, and density (Memon N., 2007) are applied. Role analysis (detecting the different roles in a network) also takes place in this phase, to identify the roles assigned to the different nodes of a graph (network/cell), which is a necessary step in estimating the outcome of a particular terrorist activity. In this step, we track and monitor changes in the different characteristics of a terrorist network over a particular span of time. The same data is also made available to investigators for manual investigation. Interfacing this information with investigation frameworks like iMiner (Memon N., 2007) could enable detailed investigation in this phase.
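A minimal sketch of the measurements tracked in this phase is given below, using the standard graph metrics named above (density, clustering coefficient, degree centrality, average path length) as computed by networkx; the iMiner-specific measures such as dependence centrality and position role index are not reimplemented here.

    import networkx as nx

    def network_snapshot(graph: nx.Graph) -> dict:
        """Compute geodesic and centrality measures for one point in time."""
        snapshot = {
            "density": nx.density(graph),
            "clustering_coefficient": nx.average_clustering(graph),
            "degree_centrality": nx.degree_centrality(graph),
        }
        # Average path length is only defined for connected, non-trivial graphs.
        if graph.number_of_nodes() > 1 and nx.is_connected(graph):
            snapshot["average_path_length"] = nx.average_shortest_path_length(graph)
        return snapshot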
3.1.5. Warning Generation Phase
If the warning generation engine encounters a shift in the measurements and characteristics of the graph, and the extent of the shift is large enough to reach a dangerous level, warnings are generated. The warning engine can also identify the presence of user-defined patterns within the graphs to generate warnings. Users are provided with access to the warning generation engine to input and test their researched patterns and theories, all of which are kept with the warning generation engine and used in the warning generation phase.
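The following sketch illustrates one way such a shift detector could work, comparing two consecutive network snapshots (as produced by the snapshot sketch in the investigating phase) against per-metric thresholds; the metric names and threshold values are assumed for illustration only.

    # Hypothetical per-metric shift thresholds.
    THRESHOLDS = {
        "density": 0.10,
        "clustering_coefficient": 0.15,
        "average_path_length": 1.0,
    }

    def detect_shifts(previous: dict, current: dict) -> list:
        """Return a warning message for every metric whose shift exceeds its threshold."""
        warnings = []
        for metric, threshold in THRESHOLDS.items():
            if metric in previous and metric in current:
                shift = abs(current[metric] - previous[metric])
                if shift >= threshold:
                    warnings.append(f"{metric} shifted by {shift:.3f} (threshold {threshold})")
        return warnings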
The cycle starts with the acquisition phase by retrieving a document of interest, i.e., one containing information about an entity in our domain; we extract the information lying in that document, investigate it using social network analysis techniques, and generate warnings if the information matches the criteria of any of the warning generation rules. Warnings are sent to users and the link to the data source is preserved in our database. We use these links, which are saved in the last stage, to acquire the same documents and run the rest of the cycle again after a definite time period, to accommodate changes from updates, if any.
In the investigation and warning generation phases, we interpret and investigate the information gathered so far and analyze the consequences to generate warnings. These processes are shown in the rightmost column of Figure 2.
The phases discussed above are managed through the system architecture discussed in the next section.
4. System Architecture of Early Warning System
The early warning generation system is actually a system of subsystems, all working together to assist the warning generation engine in doing its part, i.e., generating warnings. These subsystems range from network applications running in a clustered environment with parallel processing to simple web applications for user interfacing.


Figure 4: System Architecture
The main parts of system architecture shown in Figure 4 are discussed in the following sections.
4.1. Acquisition Cluster
The main application running on the acquisition cluster is the acquisition system, which is responsible for acquiring information from the Internet as well as from the many government and private databases available online. It is a system of wrappers and interfaces corresponding to the data sources, which can be extended to support any type of data source if necessary. The acquisition application also interfaces with search engines to search the web for information about any required entity and may contain specialized routines to carry out dark web analysis. Since time-oriented treatment of news web sites and digging through RSS feeds are within the scope of the acquisition system, it is equipped with a news-base (a framework to interface with news websites) and smart RSS readers. Information from data sources that publish in languages other than English is not only acquired but also translated to English by the acquisition system with the help of clients of online translation services such as Google Translate. The acquisition system also shortlists related information for further semantic analysis on the basis of known keywords kept in the keywords database, which is usually updated by the extraction system.
The acquisition system is a parallel computing system complying with a plug-and-play architecture: the number of acquisition terminals can be extended in real time depending on processing needs. The acquisition system interacts with the data source management system, the other system running in the acquisition cluster, to identify the data sources from which data needs to be acquired.
The data source management system is responsible overall for maintaining the list of registered data sources, their access details, the priority list expressing preference among data sources, etc. It also records accumulated experience about the data sources, such as their update frequency and structural information about each source (which part of a page contains the real information), which is necessary to wisely schedule and smartly control the acquisition system when acquiring information from that particular data source.
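A minimal sketch of the kind of record such a data source registry might keep is shown below; the field names and default values are illustrative assumptions rather than the actual schema of the proposed system.

    from dataclasses import dataclass, field

    @dataclass
    class DataSourceRecord:
        name: str                                 # e.g., a news site or online database
        url: str
        priority: int = 0                         # preference among registered sources
        update_frequency_hours: float = 24.0      # observed update frequency
        content_selector: str = "body"            # part of the page holding the real information
        access_details: dict = field(default_factory=dict)  # credentials, API keys, etc.

    registry = [
        DataSourceRecord(name="Example news site", url="https://news.example.org",
                         priority=1, update_frequency_hours=6.0),
    ]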
4.2. Extraction Cluster
The extraction application, which performs semantic analysis of the shortlisted data, runs on the extraction cluster. The extraction application is also a network of computers running in parallel to complete its task, and the number of terminals may be extended to meet processing needs. The extraction application processes the data, semantically identifies the entities and their relations, and transforms them into the graph data structure. It also adds newly identified entities and relations to the keywords database used by the acquisition system. The extraction application saves all of the extracted information, including information about unrelated entities, in a database, the so-called dirty database. The information from the dirty database is published into a final database containing terrorist information, which is used by the Terrorist Investigation Portal.
The publisher application is the other application that shares the extraction cluster with the extraction application. It runs in one of two modes, automatic or manual. In automatic mode, it evaluates the information in the dirty database against a set of rules, and information that complies with the rules is published automatically. In manual mode, it shows the information to the end user, who can be any domain expert, and lets the user publish the entities. The information filtered by the publisher application is kept in our production database, called the terrorist networks database.
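The sketch below illustrates what the automatic mode might look like, assuming each record in the dirty database carries the attributes checked by the rules; the rule set itself is a hypothetical example, not the system's actual rule base.

    def complies(record: dict) -> bool:
        """Assumed rule set: a record is publishable if it links at least one
        already-known entity and comes from a sufficiently reliable source."""
        return (record.get("known_entity_count", 0) >= 1
                and record.get("source_reliability", 0.0) >= 0.5)

    def publish_automatically(dirty_records):
        """Keep only rule-compliant records for the terrorist networks database."""
        return [record for record in dirty_records if complies(record)]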
The information in the terrorist networks database is made readily available to all manual investigators, scientists/researchers, and a variety of systems by the virtual Terrorist Investigation Portal, which is composed of a set of services and a portal application. These services can be consumed by any client; in our context, they could be consumed to interface this information with an investigation framework like iMiner to carry out geodesic investigation and terrorist network analysis. The Terrorist Investigation Portal will be an online system open to all registered users, where they can not only visualize the graphs, formed and updated automatically on a daily basis, but also view the geodesic and terrorist network analysis models and patterns present in the graphs. They can also input their investigated patterns, test their theories, and use many other facilities that are useful in terrorist network analysis.
4.3. Investigation System
The services exposed by the Terrorist Investigation Portal would be used to enable the proven technologies of an investigation system like iMiner (Memon N., 2007) to investigate the terrorist networks so formed. The investigation may encompass social network analysis techniques, i.e., determining the change in the dependence of a network on a particular node, role analysis by evaluating the nodes of the network with respect to position role centrality, determining the roles of different nodes in the network and identifying changes in roles as the network evolves, and measuring geodesic centralities such as degree and eigenvector centrality as well as the average path length, clustering coefficient, density, etc. (Memon N., 2007) of a network. All of this information can be used to define a trigger that is fired to generate a warning if a network evolves to the specified state.
4.4. Warning Generation System
The warning generation engine, with the help of the investigation system, keeps a close watch on changes in any of the geodesic measurements. A change in any measurement may cause the warning engine to generate warnings. The warning generation engine can be configured with thresholds on these changes for the generation of warnings. Warnings can also be generated when the engine witnesses the presence of a user-identified pattern in the network graph. Changes in the relationships of entities or the addition of a node to a graph may also result in the generation of alerts, as in ordinary social network sites. All of these warnings are sent to the subscribed users on their subscribed devices, such as an email client or mobile phone (SMS receiver). The implementation of a part of the system architecture is discussed in the next section.
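As a sketch of pattern-triggered warnings, a user-identified pattern can be expressed as a small graph and matched against the monitored network with subgraph isomorphism; dispatching the warning to email or SMS is only stubbed out. The example pattern and subscriber handling are hypothetical illustrations.

    import networkx as nx
    from networkx.algorithms import isomorphism

    def pattern_present(network: nx.Graph, pattern: nx.Graph) -> bool:
        """Check whether the monitored network contains a subgraph matching the pattern."""
        return isomorphism.GraphMatcher(network, pattern).subgraph_is_isomorphic()

    def dispatch_warning(message: str, subscribers) -> None:
        """Placeholder for sending the warning to subscribed devices (email, SMS)."""
        for address in subscribers:
            print(f"WARNING to {address}: {message}")

    # Hypothetical user-identified pattern: a hub node connected to three other nodes.
    star_pattern = nx.star_graph(3)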
5. Preliminary Results
We have started working on the proposed model to generate early warnings to detect terrorist threats. At present, we have a working prototype which is capable of analyzing links between entities and calculating the geodesic measures and various centralities of a network, although the identification and import of published entities still depends on manual effort. Initially, we worked with open source databases, for example, www.trackingthethreats.com, extracted the entities and their relations, and imported all of the information into the terrorist networks information database.


Figure 5: The Terrorist Investigation Portal

Using terrorist network analysis techniques (Memon N., 2007), we can easily draw a link chart and analyze a network from a social network analysis perspective. Figure 5 shows an example of the thwarted terrorist plot known as Bojinka.
During this study, we made possible extensions to the iMiner prototype; the new prototype is a web-based application. Users can now add, delete, edit, and modify entities and relationships in a network over time as the network grows or shrinks. In addition, all the mathematical models implemented in iMiner (Memon N., 2007) are available in the new system. Figure 6 shows the hidden hierarchy of the Bojinka plot, where it is clear that Khalid Shaikh Muhammad (KSM) is leading one of the large cells of terrorists.
We are working with counterterrorism experts and members of intelligence agencies to design and develop the early warning system shown in Figure 4.


Figure 6: Hierarchical chart of the Bojinka plot (Memon N., 2007)

6. Conclusion
There are quite a few implementations of early warning systems on the market, such as MediSys and EWM (Early Warning Monitor); for more details, readers can visit http://globesec.jrc.ec.europa.eu/publications/brochures/brochures/LB7606422ENC.pdf. These systems mainly work on the principles of acquiring data from online sources, usually news sites, and carrying out frequency-based analysis of entities, i.e., counting the number of appearances of an entity within a particular time span. The inclusion of social network analysis tools such as uncovering hidden hierarchies (Memon N., Larsen H.L., Hicks D., and Harkiolakis N., 2008) and detailed centrality-based analysis such as dependence centrality (Memon N., Hicks D., and Larsen H. L., 2006) may enhance the effectiveness of such systems.
In this paper, we presented a model and a prototype of an early warning system to detect terrorist threats, which can be developed into a professional analysis tool. Building the whole system will require a substantial amount of time and effort over the coming few years.
References
Carpenter, M. A., and Stajkovic, A. D. 2006. Social network theory and methods as tools for helping business confront global terrorism: Capturing the case and contingencies presented by dark social networks. Corporate strategies under international terrorism and adversity. Edward Elgar Publishing.
Carley K.M. et al. 2006. Toward an Interoperable Dynamic Analysis Toolkit. Decision Support Systems 43(4): 1324-1347.
Chen H. et al. 2008, Terrorism informatics: Knowledge management and data mining for homeland security, Springer.
Chen H. et al. 2008. IEDs in the Dark Web: Genre Classification of Improvised Explosive Device Web Pages. In proc. IEEE ISI 2008.
Devlin, K., and Lorden, G. 2007. The Numbers Behind NUMB3RS: Solving Crime with Mathematics. Plume.
Engelbart, D. C. 1962. Augmenting Human Intellect: A Conceptual Framework, Summary Report AFOSR-3233, Stanford Research Institute.
Gloor, P. A., and Zhao, Y. 2006. Analyzing Actors and Their Discussion Topics by Semantic Social Network Analysis. Information Visualization. IV 2006, pp. 130-135.
Memon, N., Wiil, U. K., Alhajj, R., Atzenbeck, C., and Harkiolakis, N. 2009. Harvesting Covert Networks: The Case Study of the iMiner Database. Accepted for the International Journal of Networking and Virtual Organizations (IJNVO). InderScience (to appear).
Memon N.; Larsen H.L.; Hicks D.; Harkiolakis N. 2008. Detecting Hidden Hierarchy in Terrorist Networks: Some Case Studies. Lecture Notes in Computer Science, vol. 5075/2008, pp. 477-489.
Memon N.; Hicks D.; Larsen H. L.; Uqaili M.A. 2007. Understanding the Structure of Terrorist Networks, In International Journal of Business Intelligence and Data Mining, vol. 2(4), pp. 401-425.
Memon N. 2007. Investigative Data Mining: Analyzing, Visualizing and Destabilizing Terrorist Networks. PhD dissertation, Aalborg University.
Memon N.; Hicks D.; Larsen H. L. 2006. How Investigative Data Mining Can Help Intelligence Agencies to Discover Dependence of Nodes in Terrorist Networks. Lecture Notes in Computer Science, vol. 4632/2006, pp. 430-441.
Steele, R. D. 2006. The Failure of 20th Century Intelligence. www.oss.net/FAILURE.
Shipman, F. M., Hsieh, H., Maloor, P., and Moore, J. M. 2001. The Visual Knowledge Builder: A Second Generation Spatial Hypertext. In Proc. of the ACM Hypertext Conference, pp. 113-122. ACM Press.
Thomas, J., and Cook, K. 2006. A Visual Analytics Agenda. IEEE Computer Graphics and Applications 26(1), 10–13.
Tsvetovat M., and Carley K. M. 2005. Structural Knowledge and Success of Anti-terrorist Activity: The Downside of Structural Equivalence. Journal of social structures 6(2).
Wiil, U. K., Memon, N., and Gniadek, J. 2009. Knowledge Management Processes, Tools and Techniques for Counterterrorism. International Conference on Knowledge Management and Information Sharing (KMIS 2009), (Funchal, Portugal, October). INSTICC Press, pp. 29-36.
Xu, J., and Chen, H. 2005. CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery. ACM Transactions on Information Systems 23(2), pp. 201-226.
