Multi-pretrained Deep Neural Network

Reading time: 5 minutes
...

📝 Original Info

  • Title: Multi-pretrained Deep Neural Network
  • ArXiv ID: 1606.00540
  • Date: 2016-06-03
  • Authors: Zhen Hu, Zhuyin Xue, Tong Cui, Shiqiang Zong, Chenglong He

📝 Abstract

Pretraining is widely used in deep neural networks, and one of the most famous pretraining models is the Deep Belief Network (DBN). Different pretraining models optimize different objectives during the pretraining process. In this paper, we pretrain deep neural networks with different pretraining models and investigate the difference between the DBN and the Stacked Denoising Autoencoder (SDA) when used as pretraining models. The experimental results show that the DBN yields a better initial model; however, after finetuning it converges to a relatively worse model. Yet when the network is pretrained a second time by the SDA, it converges to a better model after finetuning.

💡 Deep Analysis

[Figure 1]

📄 Full Content

Neural networks have long been widely used models in machine learning. In 1982, Hopfield proposed the Hopfield Network [8] and proved that neural networks can be used to simulate the XOR function. In 1986, Hinton et al. proposed the backpropagation algorithm (BP algorithm) to train multi-layer neural networks [16], and since then neural networks have become widely used in machine learning areas such as image processing [5], control [9], and optimization [14].

However, training a neural network is a non-convex problem, while the BP algorithm is intrinsically a gradient-descent method that is only guaranteed to converge to the global optimum on convex problems [1]. As a result, the BP algorithm converges to a local optimum that depends heavily on the initial state of the network. In practice, researchers usually train the network from several randomly chosen initial states and keep the best-performing model. This trick is very inefficient and can become unbearable when training large-scale networks. Some researchers proposed improved algorithms such as simulated annealing [4] and genetic algorithms [19] to reduce the number of training epochs, but the improvement was not as large as hoped. Moreover, because of the powerful expressive ability of neural networks, a network at the global optimum may be severely over-fitted [17] and even perform worse than a network at a local optimum. With the rapid development of efficiently trainable models such as the Support Vector Machine [6], researchers paid more and more attention to the training data [3]. As the scale of the training data grew, more complicated neural networks were proposed, and these models became harder to train with the BP algorithm.
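For illustration, here is a minimal sketch (not from the paper) of the random-restart trick described above, using a toy one-dimensional non-convex loss in place of a real network's training objective: gradient descent from each random initial state converges to a nearby local minimum, and the best resulting model is kept.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # A toy non-convex objective with several local minima,
    # standing in for the training loss of a neural network.
    return np.sin(3 * w) + 0.1 * w ** 2

def grad(w):
    return 3 * np.cos(3 * w) + 0.2 * w

def train(w0, lr=0.01, steps=500):
    # Plain gradient descent (the BP algorithm in the paper's setting):
    # it converges to whichever local minimum the initial state w0 leads to.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# The random-restart trick: train from several random initial states
# and keep the best-performing model.
candidates = [train(w0) for w0 in rng.uniform(-3, 3, size=10)]
best = min(candidates, key=loss)
print(f"best w = {best:.3f}, loss = {loss(best):.3f}")
```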

To relieve the training burden, LeCun et al. introduced a parameter-bonding (weight-sharing) strategy into the training of neural networks [12]. In their model, the parameters are bonded so as to encode prior knowledge and restrict the expressive power of the network. By doing so, the number of local optima is reduced, which made training deep neural networks feasible. In this setting the global optimum is no longer important; in fact, by bonding the parameters, the global optimum of the unconstrained network becomes unreachable. The work of LeCun et al. inspired researchers to divide training into two stages: one for feature extraction, known as feature learning, and one for classification. LeCun et al. proposed LeNet-5 [12], an 8-layer neural network in which parameter bonding is realized by replacing the product of a weight matrix and the input vector with the convolution of a kernel and the input. The prior being encoded is the shift invariance of images. The proposed model solved the MNIST problem well.
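As a rough illustration of this weight-sharing idea (not code from the paper), the sketch below contrasts a fully connected layer, whose weight matrix is unconstrained, with a 1-D convolutional layer whose single small kernel is shared across all positions, drastically reducing the number of free parameters while encoding shift invariance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)          # a tiny 1-D "image"

# Fully connected layer: an 8x8 weight matrix, 64 free parameters.
W = rng.normal(size=(8, 8))
dense_out = W @ x

# Convolutional layer: a single length-3 kernel shared across positions,
# only 3 free parameters. The same kernel is applied at every shift,
# encoding the shift-invariance prior mentioned above.
k = rng.normal(size=3)
conv_out = np.convolve(x, k, mode="valid")

print("dense params:", W.size, "-> output shape", dense_out.shape)
print("conv  params:", k.size, "-> output shape", conv_out.shape)
```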

Another strategy for solving large-scale learning problems focuses on the initial state. In 2006, Hinton et al. proposed the pre-training strategy [7]. Pre-training seeks a well-performing initial state of the neural network, which is then fine-tuned with the BP algorithm; the fine-tuning is conducted only once. Since pre-training is layer-wise, its computational complexity grows only linearly with the number of layers. The pre-training process proposed by Hinton et al. treats each pair of adjacent layers as a Restricted Boltzmann Machine (RBM) and pre-trains them by maximizing the RBM likelihood. Other researchers have used models other than the RBM to pre-train the network: Honglak Lee et al. proposed the convolutional RBM [13], while Larochelle et al. used the auto-encoder (AE) [11]. In the convolutional RBM the computation between layers is a convolution rather than a matrix product, and in the AE the optimization objective is minimizing the reconstruction error rather than maximizing the likelihood. The AE has been further improved, for example with the Denoising Autoencoder (DAE) [18] and the Contractive Autoencoder (CAE) [15].
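In standard notation (not reproduced from the paper), the two pre-training objectives can be contrasted as follows: the RBM is trained to maximize the likelihood of the visible data, while the (denoising) autoencoder is trained to minimize the reconstruction error of a possibly corrupted input, where f is the encoder, g the decoder, and x̃ a corrupted copy of x (with x̃ = x for a plain AE):

```latex
% RBM pre-training: maximize the log-likelihood of the visible data
\max_{\theta} \; \sum_{i} \log p\!\left(v^{(i)};\theta\right)

% (D)AE pre-training: minimize the reconstruction error
\min_{W,b,c} \; \sum_{i} \left\| x^{(i)} - g\!\left(f\!\left(\tilde{x}^{(i)}\right)\right) \right\|^{2}
```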

In our work, we propose a new pre-training process to obtain a better initial state, aiming to combine the advantages of different pre-training models as far as possible. We call the result the Multi-Pre-trained Deep Neural Network (MPDNN). In our model, the network is pre-trained by the RBM and the DAE multiple times, and the experimental results show that the network performs best when pre-trained first by the RBM and then by the DAE.
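The following is a minimal, self-contained sketch of this multi-pre-training schedule under our own assumptions (a single layer, toy binary data, one-step contrastive divergence for the RBM pass, and a tied-weight denoising-autoencoder pass); it is not the authors' code, and the final BP fine-tuning pass on labeled data is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def rbm_cd1_pass(X, W, b_h, b_v, lr=0.05, epochs=50):
    """First pre-training pass: treat the layer pair as an RBM and run
    one-step contrastive divergence to (approximately) raise the likelihood."""
    for _ in range(epochs):
        ph = sigmoid(X @ W + b_h)                       # P(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)   # sample hidden units
        pv = sigmoid(h @ W.T + b_v)                     # reconstruction P(v = 1 | h)
        ph2 = sigmoid(pv @ W + b_h)
        W += lr * (X.T @ ph - pv.T @ ph2) / len(X)      # CD-1 gradient estimate
        b_h += lr * (ph - ph2).mean(axis=0)
        b_v += lr * (X - pv).mean(axis=0)
    return W, b_h, b_v

def dae_pass(X, W, b_h, b_v, noise=0.3, lr=0.1, epochs=50):
    """Second pre-training pass over the *same* weights: a tied-weight
    denoising autoencoder minimizing squared reconstruction error."""
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)          # masking corruption
        H = sigmoid(Xn @ W + b_h)                       # encode
        R = sigmoid(H @ W.T + b_v)                      # decode
        dA = (R - X) * R * (1 - R)                      # grad at decoder pre-activation
        dB = (dA @ W) * H * (1 - H)                     # grad at encoder pre-activation
        W -= lr * (dA.T @ H + Xn.T @ dB) / len(X)       # tied-weight gradient
        b_h -= lr * dB.mean(axis=0)
        b_v -= lr * dA.mean(axis=0)
    return W, b_h, b_v

# Multi-pre-training schedule for one layer: RBM first, then DAE.
# A single BP fine-tuning pass on labeled data would follow.
X = (rng.random((200, 32)) > 0.5).astype(float)         # toy binary data
W = 0.01 * rng.normal(size=(32, 16))
b_h, b_v = np.zeros(16), np.zeros(32)
W, b_h, b_v = rbm_cd1_pass(X, W, b_h, b_v)
W, b_h, b_v = dae_pass(X, W, b_h, b_v)
```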

The paper is organized as follows. Section 2 introduces the proposed model, Section 3 reports the experimental results and compares different pre-training strategies, and Section 4 concludes the work.

In this section, we introduce the proposed model. Since it is based on the RBM and the DAE, we first introduce these two models.

In an RBM, the nodes are divided into visible nodes and hidden nodes. The visible nodes take the original data as input, while the hidden nodes are not directly connected to the input. The values of the visible and hidden nodes are denoted by v and h, respectively.
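For reference, the standard RBM energy function and joint distribution over the visible units v and hidden units h (standard formulation, with W the weight matrix and b_v, b_h the visible and hidden biases) are:

```latex
E(v,h) = -\,b_v^{\top} v \;-\; b_h^{\top} h \;-\; v^{\top} W h,
\qquad
p(v,h) = \frac{1}{Z}\, e^{-E(v,h)},
\qquad
Z = \sum_{v,h} e^{-E(v,h)}
```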

Reference

This content is AI-processed based on open access ArXiv data.
