CONSTRUCTING A MULTI-PHASE NEURAL COMBINATORIAL PREDICTOR FOR TIME SERIES FORECASTING
LEAN YU † and SHOUYANG WANG
Institute of Systems Science, Academy of Mathematics and Systems Science,
Chinese Academy of Sciences, Beijing, 100190, China
†E-mail: yulean@amss.ac.cn
KIN KEUNG LAI
Department of Management Sciences, City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong, China
E-mail: mskklai@cityu.edu.hk
In this study, a four-phase neural network combinatorial procedure is proposed for time series forecasting. Some methods for formulating combinatorial predictors – including the methods of preprocessing time series data, generating a set of neural network predictors by varying training data and network type, selecting combination members from a set of individual neural predictors, and combining selected members into an aggregated predictor – are presented. For verification, some real-world experiments are conducted. The empirical results reveal that the proposed multi-phase procedure allows one to design an effective neural network combinatorial forecasting approach for time series prediction.
1. Introduction
Since the influential work of Bates and Granger (1969), combinatorial forecasting techniques have been widely used in many practical tasks with considerable success. Previous studies in various fields have shown that prediction performance can be enhanced when various forecasts are combined (sometimes even in a simple fashion). For a survey of research results on combinatorial forecasts, readers are referred to Clemen (1989). Approaches for combining various predictors have recently attracted major interest in the neural network community, due to the instability of neural networks (i.e., small changes in the training set and/or parameter selection can produce large changes in prediction outputs) and the efficiency of combiners (i.e., combinatorial predictors can effectively improve performance). Both theoretical results and empirical applications have been reported. Typically, these studies revealed that neural combinations are effective only if the networks forming them make different errors. Hansen and Salamon (1990) showed that neural network combinatorial classifiers combined using the “majority” rule could increase classification accuracy in pattern recognition problems only if the networks make independent errors. Tumer and Ghosh (1996) pointed out that increases in classification accuracy depend on error independence far more than on the particular combination method used. The review article by Sharkey (1996) also stressed the fundamental role of error diversity in determining the effectiveness of a neural network combination. Accordingly, many combination methods have been proposed, including linear methods (e.g., simple averaging (Hansen and Salamon, 1990) and weighted averaging (Breiman, 1996; Benediktsson et al., 1997)) and nonlinear methods (Huang et al., 1995; Yu et al., 2007; 2008). Meanwhile, experimental results reported in the literature have shown that the forecasting accuracy provided by a combination of neural networks can outperform that of the best single network.
Despite the considerable successes of combinatorial forecasts, most existing work has focused on combination methods and pattern classification problems. A complete neural network combinatorial forecasting procedure for time series prediction has not yet been formulated. For this purpose, this paper constructs a multi-phase time series neural combinatorial predictor to fill this gap. The methods used to formulate the time series neural network combination forecasting procedure include the following: preprocessing time series data (e.g., data reconciliation and data transformation); generating a set of individual neural predictors by varying training data or network type; selecting combinatorial members from the set of individual neural predictors; and combining the selected members into an aggregated predictor.
The rest of the paper is organized as follows. The next section describes the main procedure of constructing a four-phase neural combinatorial forecasting procedure for time series prediction. In Section 3, two real-world data experiments are provided to explain the proposed procedure. Finally, the conclusions are summarized in Section 4.
2. Main Procedure of Constructing a Neural Combinatorial Predictor
In this section, we first describe the procedure of our proposed multi-phase neural combinatorial predictor. Subsequently we review and propose some design methods for every phase in the process of formulating the neural combinatorial predictor in detail. In addition, some related key issues in every phase are also addressed in this section.
2.1. Overview of Main Procedure
In order to improve the performance of time series forecasting, we propose a multi-phase neural combinatorial procedure for time series forecasting, as illustrated in Figure 1.
From Figure 1, it is easy to see that the proposed procedure consists of four phases, i.e., (1) preprocessing time series data; (2) generating different neural predictors; (3) selecting the appropriate number of single neural predictors from the considerable number of candidate predictors generated by the previous phase; and (4) combining selected neural predictors into an aggregated neural predictor.
In the four phases, time series preprocessing includes data reconciliation and data transformation. Individual neural predictors with weak error correlation can be generated by varying the network type and network architecture, and by using different initial conditions and different training data. From these individual neural predictors, some representative ones are selected as combinatorial members based on an error-independence measurement. Finally, the selected members are combined into an aggregated predictor in order to predict the time series. Subsequently, we review and present some detailed approaches for formulating neural combinatorial predictors.

Figure 1 Main Process of Constructing Neural Combinatorial Predictor
2.2. Preprocessing Time Series Data
Usually, time series data preprocessing includes two tasks: data reconciliation and data transformation.
2.2.1. Data Reconciliation
In neural network learning, using data with different scales often leads to the instability of neural networks (Weigend and Gershenfeld, 1994). At the very least, data must be scaled into the range used by the input neurons in the neural network. This is typically -1 to 1 or zero to 1 (Lou, 1993). Many commercially available generic neural network development programs, such as BrainMaker (http://www.calsci.com), automatically scale each input. Moreover, neural networks always require that the data range is neither too small nor too large, so that the computer’s precision limits are not exceeded. Otherwise, the data should be scaled. Furthermore, data reconciliation helps to improve the performance of neural networks (Lou, 1993). Therefore, data reconciliation is necessary for treating multi-scale data. We propose using the sigmoidal function reconciliation approach in this study.
A sigmoidal function can be used as a data reconciliation method, depending on the characteristics of the data. Usually, a sigmoidal function is utilized, as shown in Eq. (1).
(1)
where ri constrains the range of the ith transformed element of the n-element data set, qi can be selected as the smallest value of the ith element of the data set, and pi decides the sharpness of the transfer function. Eq. (1) also compresses abnormal data into a specific range. Note that other continuous and differentiable transformation functions can also be selected.
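Since the body of Eq. (1) is not reproduced in this copy, the minimal sketch below assumes one standard sigmoidal parameterization that is consistent with the description of ri, qi and pi above; the exact form used in the original paper may differ, and the function and variable names are illustrative only.

```python
import numpy as np

def sigmoid_reconcile(x, r=1.0, p=1.0, q=None):
    """Sigmoidal data reconciliation in the spirit of Eq. (1) (assumed form).

    x : 1-D array holding one element (column) of the raw data set
    r : target range of the transformed data (output lies in (0, r))
    p : sharpness of the sigmoid
    q : shift; by default the smallest value of x, as suggested in the text
    """
    x = np.asarray(x, dtype=float)
    if q is None:
        q = x.min()
    return r / (1.0 + np.exp(-p * (x - q)))

# Example: rescale a small price series into (0, 1)
prices = np.array([1385.0, 1402.5, 1368.2, 1441.7, 1399.9])
print(sigmoid_reconcile(prices, r=1.0, p=0.05))
```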
It is advisable to scale the data so that the different input signals have approximately the same numerical range. This is not strictly necessary for feed-forward and Hopfield networks, but it is recommended for all other network models, because those models rely on Euclidean measures, so un-scaled data could bias or interfere with the training process. Reconciling the data so that all inputs have the same range often speeds up training and improves the performance of the resulting model.
2.2.2. Data Transformation
In this study, the main aim of data transformation is to transform time series data into a training matrix suitable for neural network learning. The main reason is that reconciled data is still a simple string of numbers, whereas neural networks are, in general, nonlinear dynamic systems with multiple inputs and outputs, and many parameters must be specified beforehand. Therefore, how to transform a string of pure numbers into an input matrix, or how to determine the size and structure of a training set, has been one of the key problems in neural network time series forecasting. In the existing literature, some studies (Azoff, 1994; Moody, 1995) presented simple transformation methods, for example the moving-window method (Azoff, 1994). However, these methods often require a large quantity of data, which limits the use of neural network models. In order to overcome the drawbacks of previous methods, we therefore propose a data transformation method with interval variation to create a neural network training matrix.
For a univariate time series with a given size of training set, we can use the data transformation method with variable intervals to obtain a training matrix, provided that an interval value is specified. Supposing that the size of the source data D is M, the size of the new training data set is N, and the sampling interval is K, we have the matrix construction algorithm shown in Figure 2:

Fig. 2 Matrix Construction Algorithms
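The matrix construction algorithm of Fig. 2 is not reproduced here, so the sketch below shows one plausible reading of the variable-interval transformation described above: each training row takes a fixed number of past observations sampled every K steps as inputs, and the next observation as the target. The function name and the default of six inputs (matching the networks used later in the experiments) are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def build_training_matrix(series, n_inputs=6, k=1, n_rows=None):
    """One plausible variable-interval windowing of a univariate series.

    Each row uses n_inputs past values sampled every k steps as inputs,
    and the next value after the window as the one-step-ahead target.
    """
    series = np.asarray(series, dtype=float)
    span = n_inputs * k                      # raw points consumed per row
    max_rows = len(series) - span            # how many (input, target) pairs exist
    if n_rows is None or n_rows > max_rows:
        n_rows = max_rows
    X = np.empty((n_rows, n_inputs))
    y = np.empty(n_rows)
    for i in range(n_rows):
        X[i] = series[i : i + span : k]      # lagged inputs at interval k
        y[i] = series[i + span]              # target following the window
    return X, y

# Example: M = 30 observations, 6 inputs, interval K = 2
X, y = build_training_matrix(np.arange(30.0), n_inputs=6, k=2)
print(X.shape, y.shape)   # (18, 6) (18,)
```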
2.3. Generating Individual Neural Predictors
There are two different approaches for generating individual neural predictors: homogeneous models using the same network type, and heterogeneous models using different network types. As mentioned earlier, we need to generate some error-independent neural predictors as the members of the combinatorial predictor in order to reduce error variance and improve network performance.
For homogeneous neural network model generation, several methods have been investigated for the generation of combinatorial members making different errors (Sharkey, 1996). Such methods basically rely on varying the parameters related to the design and to the training of neural network. In particular, some main methods are listed below:
(a) Different network architectures: by changing the number of hidden layers and the number of nodes in every layer, different neural networks with different architectures can be created.
(b) Different training data: by re-sampling and preprocessing time series data, we can obtain different training sets, thus generating different networks. There are six techniques that can be used to obtain diverse training data sets: bagging (Breiman, 1996), noise injection (Raviv and Intrator, 1996), cross-validation (Krogh and Vedelsby, 1995), stacking (Wolpert, 1992), boosting (Schapire, 1990) and input decimation (Tumer and Ghosh, 1996).
(c) Different learning algorithms: by selecting different core learning algorithms, different neural networks can also be generated. For example, a multi-layer feed-forward network can use the steepest-descent algorithm, the Levenberg-Marquardt algorithm, or other learning algorithms.
(d) Different initial conditions: combinatorial members can be created by varying the initial random weights, learning rate and momentum rate, from which each network is trained.
For heterogeneous neural network model generation, we can create neural network combinatorial members by using different network types. For example, multi-layer perceptrons (MLPs), back-propagation networks (BPNs), radial basis function (RBF) neural networks, and probabilistic neural networks (PNNs) can be used to create the combinatorial members. In addition, neural combinatorial members could be created using a hybridization of two or more of the above methods, e.g., different network types plus different training data (Sharkey, 1996). In our study we adopt such a hybridization method to create combinatorial members. Once some individual neural predictors are created, we need to select some representative members for combination purposes.
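As an illustration of this hybridization idea (varying architecture, initial weights and training data), the sketch below builds a pool of candidate members with scikit-learn's MLPRegressor and bagging-style resampling. The library, hyperparameters and helper name are illustrative choices for a minimal example, not the configuration used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def generate_members(X, y, architectures=((8,), (12,), (8, 4)),
                     n_per_arch=10, seed=0):
    """Create a pool of diverse neural predictors by varying the hidden-layer
    architecture, the random initial weights, and the (bootstrapped) training data."""
    rng = np.random.RandomState(seed)
    members = []
    for hidden in architectures:
        for _ in range(n_per_arch):
            idx = rng.randint(0, len(X), size=len(X))     # bagging-style resample
            net = MLPRegressor(hidden_layer_sizes=hidden,
                               max_iter=2000,
                               random_state=rng.randint(1_000_000))
            net.fit(X[idx], y[idx])
            members.append(net)
    return members

# members = generate_members(X_train, y_train)   # 3 architectures x 10 runs = 30 nets
```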
2.4. Selecting Appropriate Combinatorial Members
After training, each member of the combinatorial predictor has generated its own result. However, if there are a great number of individual members, we need to select a subset of representatives in order to improve combination efficiency. For heterogeneous neural models, we select a few typical representatives of each type by means of some measurement and then combine the selected models of different types. For homogeneous models, we select some models with weak error correlation. As mentioned earlier, we tend to use heterogeneous models, but for every model type we often generate a set of models with different initial conditions. Thus, within a model class of the same type, we select a representative model for combination purposes. In the literature, three methods have been used: principal component analysis (PCA) (Yu et al., 2007), choose the best (CTB) (Partridge and Yates, 1996), and choose from subspace (CFS) (Partridge and Yates, 1996). PCA selects a collection of members from the candidates using the maximum eigenvalue of the error matrix. CTB assumes a given size for the final combination C* and then selects the networks with the highest forecasting performance from the candidate members to form C*. CFS is based on the idea of choosing, for each network type, the network exhibiting the best performance; here the term “subspace” refers to the subset of network models of a given network type. In this study we adopt a new approach, the conditional generalized variance (CGV) minimization method (Yu et al., 2008), to select error-independent network models.
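The exact CGV algorithm is given in Yu et al. (2008); the sketch below is only one plausible reading of the idea, expressed as a greedy pruning rule: repeatedly drop the member whose validation error is best explained by (i.e., has the smallest conditional variance given) the errors of the remaining members. The function name, stopping rule and regularization term are assumptions made for illustration.

```python
import numpy as np

def select_members_cgv(errors, n_keep):
    """Greedy member selection loosely inspired by the CGV-minimization idea.

    errors : (n_samples, n_members) matrix of validation errors, one column per net
    n_keep : desired number of combinatorial members

    At each step we drop the member whose error is best explained by the others,
    i.e. whose conditional variance given the remaining errors is smallest.
    """
    keep = list(range(errors.shape[1]))
    while len(keep) > n_keep:
        cov = np.cov(errors[:, keep], rowvar=False)
        cov = cov + 1e-10 * np.eye(len(keep))      # small ridge for stability
        prec = np.linalg.inv(cov)                  # precision matrix
        cond_var = 1.0 / np.diag(prec)             # var(e_j | the other errors)
        keep.pop(int(np.argmin(cond_var)))         # most redundant member out
    return keep

# selected = select_members_cgv(validation_errors, n_keep=6)
```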
2.5. Combining the Selected Members
Based on the work done in the previous phases, a set of appropriate combinatorial members can be collected. The subsequent task is to combine these selected members into an aggregated predictor using an appropriate combination strategy (i.e., to decide how to combine the selected members into an aggregate model for final prediction purposes). Generally, there are two combination strategies: linear combination and nonlinear combination.
Typically, the linear combination strategy includes two approaches: the simple averaging approach and the weighted averaging approach. There are three types of weighted averaging: the simple mean squared error (MSE) approach, the stacked regression (modified MSE) approach, and the variance-based weighted approach.
The nonlinear combination strategy is a promising approach for determining the optimal weights of the neural combinatorial predictor. The existing literature mentions two main nonlinear combination approaches: the neural network-based nonlinear combination method (Huang et al., 1995) and the support vector machine (SVM) based nonlinear combination method (Yu et al., 2007). For further details, readers are referred to Huang et al. (1995) and Yu et al. (2007). For further verification, multiple combination approaches will be tested in the empirical study.
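For concreteness, the sketch below illustrates three of the combination strategies mentioned above: simple averaging, MSE-based weighted averaging, and a support vector regressor used as a nonlinear combiner trained on member forecasts (a stand-in for the SVMR method of Yu et al. (2007)). Kernel and hyperparameter choices are illustrative, not those of the original study.

```python
import numpy as np
from sklearn.svm import SVR

def simple_average(preds):
    """preds: (n_samples, n_members) matrix of member forecasts."""
    return preds.mean(axis=1)

def mse_weighted_average(preds, errors):
    """Weight each member inversely to its validation MSE (simple MSE approach)."""
    mse = (errors ** 2).mean(axis=0)
    w = (1.0 / mse) / (1.0 / mse).sum()
    return preds @ w

def fit_svr_combiner(val_preds, val_targets):
    """Learn a nonlinear mapping from member forecasts to the target (SVMR-style)."""
    return SVR(kernel='rbf', C=10.0, epsilon=0.001).fit(val_preds, val_targets)

# combiner = fit_svr_combiner(val_preds, y_val)
# final_forecast = combiner.predict(test_preds)
```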
3. Empirical Analysis
3.1. Experimental Data and Evaluation Criterion
In this study, two real data examples are presented to illustrate the proposed combinatorial forecasting model. The data set used for our experiments consists of two time series: the S&P 500 index series and the exchange rate of the British pound against the US dollar (GBP/USD for short). The data are daily, and the entire data set covers the period from January 1, 1991 to December 31, 2002, with a total of 3131 observations. We take the data from January 1, 1991 to December 31, 2000 as the in-sample data set (comprising the training set (January 1, 1991 to December 31, 1998) and the validation set (January 1, 1999 to December 31, 2000)) and the data from January 1, 2001 to December 31, 2002 as the out-of-sample data set (i.e., the testing set), which is used to evaluate prediction performance. In particular, the root mean square error (RMSE), one of the most widely used evaluation criteria, is used as the evaluation measurement.
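For reference, the RMSE criterion used throughout the experiments can be computed as follows (a minimal helper; the function name is ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over the testing set."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```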
3.2. Experimental Design
The motivations of our experiments were mainly: (i) to evaluate the effectiveness of the proposed combinatorial forecasting approach; and (ii) to compare our proposed approach with other design approaches proposed in the literature.
Considering the first aim, we need to create different initial combinations. For heterogeneous models, combinatorial members were created using different neural network types, namely multi-layer perceptrons (MLPs), back-propagation networks (BPNs) and radial basis function (RBF) networks. For homogeneous models (e.g., individual BPNs), combinatorial members were generated by varying the network architecture, initial random weights, training data, etc. For simplicity, we report the results for six initial combinations, referred to here as combinations C1, C2, …, C6 and created by the following generation procedures:
(a) Combinations C1, C2 and C3 comprised 30 MLPs, BPNs and RBFs respectively. Three architectures with one or two hidden layers and various numbers of neurons per layer were used in the generation phase. For each architecture, 10 training processes with different training data and initial weights were performed. All the networks had six input nodes and one output node corresponding to the data feature and forecasting purpose respectively.
(b) Combination C4 comprised 20 MLPs and 10 RBF networks (see Section 3.3.2 for details).
(c) Combination C5 comprised 10 BPNs and 20 RBFs. Ten different training processes with different initial random weights and training data were performed to create 10 BPNs. Twenty RBF networks were generated with 10 different training data and 10 different K-means clustering algorithms.
(d) Combination C6 consisted of 10 MLPs (the same as C4), 10 BPNs (the same as C5) and 10 RBF networks (the same as C4).
With regard to the second experimental aim, we compare our proposed approach with other approaches. Accordingly, comparisons for the selection phase and the combination phase are carried out. In particular, four selection strategies are considered in the selection phase: our proposed conditional generalized variance minimization (CGV for short), PCA, “choose the best” (CTB for short), and “choose from subspace” (CFS for short). Six combination strategies are used for comparative purposes in the combination phase: the simple averaging method (SAM), the simple MSE method (SMSE), the stacked regression method (SRM), the variance-based weighting (VBW) method, the artificial neural network-based combination method (ANN), and our proposed SVMR-based nonlinear combination (SVMR) method.
3.3. Experimental Results and Comparisons
3.3.1. Experiments with Combinations C1, C2 and C3
The main aim of these three experiments was to evaluate the effectiveness of our proposed approach in the design of neural network combinatorial predictors. It is worth noting that this is a difficult task, since networks of the same type are poorly independent according to the results of Partridge (1996). Our proposed approach selected combination C1*, made up of eight MLPs, C2*, made up of seven BPNs, and C3*, made up of 11 RBF networks, belonging to different network architectures. Table 1 shows the performance of the different neural network combinations selected using our proposed approach. For comparative purposes, the performance of different combinations with different selection strategies and combination strategies is also reported. Performance was evaluated in terms of RMSE, and all RMSE values reported in Table 1 are based on the testing set.
Table 1 Comparative performance for different neural network combinations C1 - C3
Time Series  Combination  Selection Strategy  SAM     SMSE    SRM     VBW     ANN     SVMR
S&P 500      C1*          CGV                 0.0205  0.0224  0.0213  0.0189  0.0156  0.0144
                          PCA                 0.0263  0.0255  0.0274  0.0256  0.0182  0.0165
                          CTB                 0.0237  0.0242  0.0264  0.0245  0.0178  0.0161
                          CFS                 0.0219  0.0239  0.0248  0.0228  0.0166  0.0155
             C2*          CGV                 0.0221  0.0208  0.0196  0.0177  0.0145  0.0123
                          PCA                 0.0274  0.0256  0.0228  0.0209  0.0187  0.0165
                          CTB                 0.0261  0.0251  0.0211  0.0198  0.0189  0.0158
                          CFS                 0.0253  0.0244  0.0202  0.0188  0.0172  0.0140
             C3*          CGV                 0.0211  0.0221  0.0209  0.0201  0.0186  0.0158
                          PCA                 0.0254  0.0265  0.0258  0.0269  0.0209  0.0187
                          CTB                 0.0232  0.0251  0.0247  0.0254  0.0189  0.0175
                          CFS                 0.0219  0.0237  0.0216  0.0232  0.0191  0.0166
GBP/USD      C1*          CGV                 0.0115  0.0123  0.0098  0.0089  0.0078  0.0071
                          PCA                 0.0125  0.0156  0.0117  0.0101  0.0092  0.0085
                          CTB                 0.0118  0.0145  0.0109  0.0099  0.0089  0.0081
                          CFS                 0.0121  0.0133  0.0101  0.0091  0.0084  0.0077
             C2*          CGV                 0.0105  0.0112  0.0101  0.0089  0.0081  0.0067
                          PCA                 0.0125  0.0133  0.0121  0.0106  0.0098  0.0081
                          CTB                 0.0118  0.0128  0.0118  0.0098  0.0091  0.0075
                          CFS                 0.0111  0.0119  0.0112  0.0091  0.0085  0.0071
             C3*          CGV                 0.0088  0.0075  0.0071  0.0068  0.0054  0.0019
                          PCA                 0.0103  0.0096  0.0098  0.0077  0.0068  0.0054
                          CTB                 0.0093  0.0091  0.0085  0.0075  0.0061  0.0039
                          CFS                 0.0101  0.0085  0.0089  0.0072  0.0059  0.0028
From Table 1 we can see that: (1) for a fixed combination strategy, our proposed selection approach (i.e., CGV) is slightly better, but the gap is small, indicating that a combination of networks of the same type does not contain enough error-independent networks for a selection method to improve performance substantially; (2) for a fixed selection strategy, our proposed combination strategy (the SVMR-based nonlinear combination method) outperforms the other combination strategies described in the literature, implying that our proposed combination approach is an effective and promising combination method.
3.3.2. Experiments with Combinations C4 and C5
These two experiments aimed to evaluate to what extent our design method can exploit error-independence to improve the performance of a set of “weak” neural networks (C4) and “strong” neural networks (C5). Combination C4 consists of 20 MLPs and 10 RBFs whose performances were good (e.g., RMSE < 0.02). The introduction of the RBF networks aims to increase the independence of the combinatorial members. In the two experiments, our proposed selection strategy extracted four MLPs (characterized by two different architectures and two different sets of initial weights) and two RBF networks with two different kernels to formulate an optimal combinatorial predictor C4* from the initial C4. Similarly, two BPNs with different architectures and four RBF networks with four different kernels were extracted from C5 to form another optimal combinatorial predictor C5*. Table 2 shows the performance of the different combinations for the S&P 500 index and the GBP/USD series. As before, all values reported in Table 2 are based on the testing data sets.
Table 2 Comparative performance for different neural network combinations C4 and C5
Time Series  Combination  Selection Strategy  SAM     SMSE    SRM     VBW     ANN     SVMR
S&P 500      C4*          CGV                 0.0179  0.0156  0.0165  0.0177  0.0154  0.0123
                          PCA                 0.0199  0.0189  0.0187  0.0201  0.0175  0.0148
                          CTB                 0.0188  0.0180  0.0172  0.0196  0.0171  0.0133
                          CFS                 0.0181  0.0176  0.0169  0.0182  0.0165  0.0128
             C5*          CGV                 0.0166  0.0145  0.0162  0.0172  0.0148  0.0119
                          PCA                 0.0185  0.0174  0.0189  0.0196  0.0169  0.0138
                          CTB                 0.0179  0.0168  0.0178  0.0185  0.0160  0.0127
                          CFS                 0.0171  0.0152  0.0171  0.0177  0.0155  0.0121
GBP/USD      C4*          CGV                 0.0081  0.0071  0.0068  0.0065  0.0056  0.0021
                          PCA                 0.0095  0.0088  0.0085  0.0081  0.0069  0.0048
                          CTB                 0.0089  0.0084  0.0078  0.0075  0.0065  0.0041
                          CFS                 0.0085  0.0077  0.0072  0.0077  0.0058  0.0035
             C5*          CGV                 0.0075  0.0070  0.0061  0.0058  0.0044  0.0017
                          PCA                 0.0093  0.0086  0.0078  0.0067  0.0058  0.0044
                          CTB                 0.0081  0.0081  0.0075  0.0062  0.0051  0.0036
                          CFS                 0.0078  0.0074  0.0069  0.0060  0.0047  0.0028
Comparing Tables 1 and 2, we can find that: (1) the performance of heterogeneous model combinations is generally better than that of homogeneous model combinations; (2) the “strong” network combination (C5*) is slightly better than the “weak” network combination (C4*), but the difference is small, as shown in Table 2; (3) our proposed selection strategy and combination strategy consistently outperform the other strategies when RMSE is used as the performance measure; (4) combination C4* also performs better than the same-type combinations in Table 1, implying that neural networks can be combined effectively even from a set of weak networks, but only through a detailed analysis of error independence; (5) the performance of the ANN and SVMR approaches is better than that of the conventional linear combination approaches. This also supports the finding of Granger and Ramanathan (1984) that an unconstrained least squares combination can yield better forecasting performance than constrained combination schemes.
3.3.3. Experiments with Combination C6
The goal of this experiment is basically similar to that of the previous subsection. Our proposed approach formulates an effective combination C6* by extracting one MLP, one BPN and one RBF network from C6. Table 3 shows the performance of the network combinations; all values refer to the test data sets. Since our proposed strategies clearly outperform the others, conclusions similar to those of the previous experiments can be drawn.
Table 3 Comparative performance for different neural network combinations C6
Time Series  Combination  Selection Strategy  SAM     SMSE    SRM     VBW     ANN     SVMR
S&P 500      C6*          CGV                 0.0105  0.0123  0.0133  0.0158  0.0118  0.0098
                          PCA                 0.0122  0.0158  0.0158  0.0177  0.0125  0.0117
                          CTB                 0.0118  0.0147  0.0149  0.0162  0.0119  0.0115
                          CFS                 0.0109  0.0135  0.0140  0.0165  0.0117  0.0111
GBP/USD      C6*          CGV                 0.0063  0.0055  0.0047  0.0051  0.0039  0.0009
                          PCA                 0.0075  0.0067  0.0058  0.0065  0.0051  0.0028
                          CTB                 0.0070  0.0061  0.0051  0.0058  0.0048  0.0017
                          CFS                 0.0066  0.0056  0.0052  0.0057  0.0038  0.0014
Comparing the results in Tables 1-3, we can conclude that: (1) among the four selection strategies, CGV is the best, followed by CFS, CTB and PCA, for all the combinations C1*-C6*; (2) among the six combination strategies, the nonlinear strategies are much better than the linear ones; furthermore, the SVMR strategy is slightly better than the ANN strategy, indicating that the SVMR strategy is a very promising approach for combination forecasts; (3) among the six optimal combinations, C6* shows the best results, the main reason being that the degree of error independence of the combinatorial members in C6* is higher than in the other combinations; (4) generally, the performance shown in Tables 2 and 3 is better than that in Table 1, implying that heterogeneous model combinations perform better than homogeneous ones; and (5) the performance for the GBP/USD series is better than that for the S&P 500 index series, one possible reason being that the S&P 500 index is more volatile than the GBP/USD rate.
4. Conclusions
Combining the predictions of several different neural predictors into an aggregated neural network prediction often gives improved performance over any individual prediction and is considered an effective technique for improving the generalization ability of single neural network predictors. In this study, we propose a novel four-phase nonlinear combinatorial predictor for time series forecasting. The experimental results reported in this paper demonstrate the effectiveness of the proposed design approach. The comparisons show that with our proposed approach it is possible to formulate an effective combinatorial predictor by selecting error-independent networks with the conditional generalized variance minimization algorithm and combining them with an SVMR-based nonlinear combination approach. A review of the experimental results yields the final conclusion: the proposed four-phase nonlinear combination procedure can be used as an alternative tool for time series combinatorial prediction.
References
Azoff, M.E. (1994) Neural Network Time Series Forecasting of Financial Markets. New York: John Wiley & Sons.
Bates, J.M. and Granger, C.W.J. (1969) “The combination of forecasts”, Operational Research Quarterly, 20: 451-468.
Benediktsson, J.A., Sveinsson, J.R., Ersoy, O.K., Swain, P.H. (1997) “Parallel consensual neural networks”, IEEE Transactions on Neural Networks, 8(1): 54-64.
Breiman, L. (1996) “Bagging predictors”, Machine Learning, 24: 123-140.
Breiman, L. (1996) “Stacked regressions”, Machine Learning, 24: 49-64.
Clemen, R.T. (1989) “Combining forecasts: A review and annotated bibliography”, International Journal of Forecasting, 5: 559-583.
Granger, C.W.J. and Ramanathan, R. (1984) “Improved methods of forecasting”, Journal of Forecasting, 3: 197-204.
Hansen, L.K. and Salamon, P. (1990) “Neural network ensembles”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10): 993-1001.
Huang, Y.S., Liu, K., Suen, C.Y. (1995) “The combination of multiple classifiers by a neural network approach”, International Journal of Pattern Recognition and Artificial Intelligence, 9: 579-597.
Krogh, A. and Vedelsby, J. (1995) “Neural network ensembles, cross validation, and active learning”, In: Tesauro, G., Touretzky, D., and Leen, T. (Eds.) Advances in Neural Information Processing Systems: 231-238, Cambridge, Massachusetts: MIT Press.
Lou, M. (1993) “Preprocessing data for neural networks”, Technical Analysis of Stocks & Commodities Magazine, October, 1993.
Moody, J. (1995) “Economic forecasting: challenges and neural network solutions”, Proceedings of the International Symposium on Artificial Neural Networks, 1995.
Partridge, D. and Yates, W.B. (1996) “Engineering multiversion neural-net systems”, Neural Computation, 8: 869-893.
Partridge, D. (1996) “Network generalization differences quantified”, Neural Networks, 9: 263-271.
Raviv, Y. and Intrator, N. (1996) “Bootstrapping with noise: an effective regularization technique”, Connection Science, 8: 355-372.
Schapire, R.E. (1990) “The strength of weak learnability”, Machine Learning, 5: 197-227.
Sharkey, A.J.C. (1996) “On combining artificial neural nets”, Connection Science, 8: 299-314.
Tumer, K. and Ghosh, J. (1996) “Error correlations and error reduction in ensemble classifiers”, Connection Science, 8: 385-404.
Weigend, A.S. and Gershenfeld, N.A. (1994) Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, USA.
Wolpert, D. (1992) “Stacked generalization”, Neural Networks, 5: 241-259.
Yu, L., Wang, S.Y., Lai, K.K. (2007) Foreign-Exchange-Rate Forecasting With Artificial Neural Networks. New York: Springer.
Yu, L., Wang, S.Y., Lai, K.K., Zhou, L.G. (2008). Bio-Inspired Credit Risk Analysis - Computational Intelligence with Support Vector Machines. Berlin: Springer-Verlag.
