
Is anyone here familiar with echo state networks? I created an echo state network in C#. The aim was just to classify inputs into GOOD and NOT GOOD ones. The input is an array of double values. I know that an echo state network may not be the best choice for this classification task, but I have to do it with this method.

My problem is that after training, the network cannot generalize. When I run the network on unseen data (not the training input), I get only around 50-60% correct results.

More details: my echo state network must work like a function approximator. The input of the function is an array of 17 double values, and the output is 0 or 1 (I have to classify the input as bad or good).

So I created a network. It contains an input layer with 17 neurons, a reservoir layer whose neuron count is adjustable, and an output layer containing 1 neuron for the required 0 or 1 output. In this simpler setup, no output feedback is used (I tried output feedback as well, but nothing changed).
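For reference, a minimal sketch of the reservoir update this describes, assuming the standard ESN state equation x(n+1) = f(Win u(n) + W x(n)) without output feedback; the names here are illustrative, not the actual code:

```csharp
using System;

// One reservoir step: x(n+1) = f(Win * u(n) + W * x(n)).
// f is the reservoir activation function (tanh or a sigmoid).
static double[] UpdateState(
    double[,] win,            // reservoir x input weights
    double[,] w,              // reservoir x reservoir weights
    double[] input,           // u(n), 17 values in this case
    double[] state,           // x(n)
    Func<double, double> f)   // activation function
{
    int n = state.Length;
    var next = new double[n];
    for (int i = 0; i < n; i++)
    {
        double sum = 0.0;
        for (int j = 0; j < input.Length; j++) sum += win[i, j] * input[j];
        for (int j = 0; j < n; j++) sum += w[i, j] * state[j];
        next[i] = f(sum);
    }
    return next;
}
```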

The inner weight matrix of the reservoir layer is adjustable too. I generate weights between two double values (min, max) with an adjustable sparseness ratio. If the values are too large, the matrix is rescaled so that its spectral radius is below 1. The reservoir layer can use sigmoid or tanh activation functions.
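A sketch of that initialization, using a rough power-iteration estimate of the spectral radius; the helper names and the estimator are my assumptions, not necessarily what the actual code does:

```csharp
using System;

static class ReservoirInit
{
    // Build a sparse random reservoir matrix with weights drawn uniformly
    // from [min, max], keeping each weight with probability `sparseness`,
    // then rescale so the spectral radius stays below `targetRadius`.
    public static double[,] CreateReservoir(
        int size, double min, double max, double sparseness,
        double targetRadius, Random rng)
    {
        var w = new double[size, size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                if (rng.NextDouble() < sparseness)
                    w[i, j] = min + rng.NextDouble() * (max - min);

        double radius = EstimateSpectralRadius(w, size);
        if (radius > targetRadius && radius > 0.0)
        {
            double scale = targetRadius / radius;  // shrink toward the echo state property
            for (int i = 0; i < size; i++)
                for (int j = 0; j < size; j++)
                    w[i, j] *= scale;
        }
        return w;
    }

    // Rough power-iteration estimate of the dominant eigenvalue magnitude.
    // For non-symmetric matrices with complex eigenvalues this is only an
    // approximation, but it is usually good enough for rescaling.
    static double EstimateSpectralRadius(double[,] w, int n, int iterations = 100)
    {
        var v = new double[n];
        for (int i = 0; i < n; i++) v[i] = 1.0 / n;
        double norm = 0.0;
        for (int it = 0; it < iterations; it++)
        {
            var next = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    next[i] += w[i, j] * v[j];
            norm = 0.0;
            for (int i = 0; i < n; i++) norm += next[i] * next[i];
            norm = Math.Sqrt(norm);
            if (norm == 0.0) return 0.0;           // zero matrix, nothing to scale
            for (int i = 0; i < n; i++) v[i] = next[i] / norm;
        }
        return norm;  // ||W v|| with ||v|| = 1 approximates |lambda_max|
    }
}
```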

The input layer is fully connected to the reservoir layer with random weights. In the training phase I compute the internal reservoir activations X(n) on the training data, collecting them row-wise into a state matrix. Using the desired output matrix (which here is a vector of 1 and 0 values), I calculate the output weights (from reservoir to output); the reservoir is fully connected to the output. Anyone who has used echo state networks knows what I'm talking about. I use the pseudoinverse method for this.
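For concreteness, a minimal sketch of that readout computation, assuming the Math.NET Numerics library (the actual code may well differ):

```csharp
using MathNet.Numerics.LinearAlgebra;

// x: one row of reservoir activations per training sample (T x N).
// y: desired outputs as a T x 1 column of 0/1 values.
// Wout = pinv(X) * Y is the least-squares readout.
static Matrix<double> TrainReadout(Matrix<double> x, Matrix<double> y)
{
    return x.PseudoInverse() * y;   // N x 1 output weights
}

// Classify a new sample: threshold the linear readout at 0.5.
static int Classify(Vector<double> state, Matrix<double> wout)
{
    double y = state.ToRowMatrix().Multiply(wout)[0, 0];
    return y >= 0.5 ? 1 : 0;
}
```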

The question is: how can I adjust the network so that it generalizes better, i.e. hits more than 50-60% of the desired outputs on an unseen dataset (not the training one)? If I run the network again on the training dataset, it gives very good results, 80-90%, but what I want is better generalization.

I hope someone else has run into this issue with echo state networks.

Zs.
  • "If someone can, please respond so i can explain my problem more thoroughly." how about you explain more throughly, then we respond? – Rivasa May 04 '12 at 23:40

3 Answers


If I understand correctly, you have a set of known, classified data that you train on, and some unknown data which you subsequently classify. You find that after training you can reclassify the known data well, but do poorly on the unknown data. This is called overfitting: you might want to think about making your network less flexible, reducing the number of nodes, and/or validating against a held-out dataset.

The usual approach is to have a training set A, a validation set B, and a test set C. You know the correct classification of A and B but not C (you split your known data into A and B, and C is the data whose labels you want the network to find). During training, you only show the network A, but at each iteration you measure success on both A and B. So while training, the network tries to capture a relationship present in both A and B by looking only at A. Because it never sees the actual input-output pairs in B, only whether its current state describes B accurately, this helps reduce overfitting.

Usually people seem to split 4/5 of the data into A and 1/5 of it into B, but of course you can try different ratios.

In the end, you finish training and see what the network says about your unknown set C.
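A minimal sketch of such a split, with illustrative names (using the 4/5 - 1/5 ratio from above):

```csharp
using System;
using System.Linq;

// Shuffle the labelled data, then split it into a training part (A)
// and a validation part (B); e.g. trainFraction = 0.8 for a 4/5 split.
static (double[][] trainX, int[] trainY, double[][] valX, int[] valY)
    Split(double[][] x, int[] y, double trainFraction, Random rng)
{
    int[] order = Enumerable.Range(0, x.Length)
                            .OrderBy(_ => rng.Next())
                            .ToArray();
    int nTrain = (int)(x.Length * trainFraction);
    int[] trainIdx = order.Take(nTrain).ToArray();
    int[] valIdx = order.Skip(nTrain).ToArray();
    return (trainIdx.Select(i => x[i]).ToArray(),
            trainIdx.Select(i => y[i]).ToArray(),
            valIdx.Select(i => x[i]).ToArray(),
            valIdx.Select(i => y[i]).ToArray());
}
```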

Sorry for the very general and basic answer, but perhaps it will help describe the problem better.

Superbest
  • Thanks for your answer, but I'm not sure I understand it correctly. So you're saying that after I train the network with A and calculate the reservoir-to-output connections, I should run the network on A and B and compute a success rate from that? This way the internal reservoir state X(n) will change so that it is "good" for dataset C? I will try that, thank you! :) Hope it helps. – Zs. May 05 '12 at 13:09
  • You're welcome. Perhaps this would be helpful: http://stackoverflow.com/questions/2976452/whats-the-diference-between-train-validation-and-test-set-in-neural-networks – Superbest May 05 '12 at 13:18
  • Hmm, the problem is I don't use backpropagation. My method is different: I use an offline algorithm to calculate the output weights. – Zs. May 05 '12 at 13:44

If your network doesn't generalize, that means it's overfitting.

To reduce overfitting on a neural network, there are two ways:

  • get more training data
  • decrease the number of neurons

You also might think about the features you are feeding the network. For example, if it is a time series that repeats every week, then a useful feature is something like the 'day of the week', the 'hour of the week', or the 'minute of the week'.
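A trivial sketch of what such derived features could look like (purely illustrative; the asker's 17 inputs may not be time-based at all):

```csharp
using System;

// Hypothetical derived features for a weekly-periodic time series.
static double DayOfWeekFeature(DateTime t) => (double)t.DayOfWeek;              // 0..6
static double HourOfWeekFeature(DateTime t) => (double)t.DayOfWeek * 24 + t.Hour; // 0..167
```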

Neural networks need lots of data. Lots and lots of examples. Thousands. If you don't have thousands, you should choose a network with just a handful of neurons, or else use something else, like regression, that has fewer parameters, and is therefore less prone to overfitting.

Hugh Perkins

Like the other answers here have suggested, this is a classic case of overfitting: your model performs well on your training data, but it does not generalize well to new test data.

Hugh's answer has a good suggestion, which is to reduce the number of parameters in your model (i.e., by shrinking the size of the reservoir), but I'm not sure how effective it would be for an ESN, because the problem complexity an ESN can solve grows only in proportion to the logarithm of the reservoir size. Reducing the size of your model might make it work less well, though that may be necessary to avoid overfitting for this type of model.

Superbest's solution is to use a validation set to stop training as soon as performance on the validation set stops improving, a technique called early stopping. But, as you noted, because you use offline regression to compute the output weights of your ESN, you cannot use a validation set to determine when to stop updating your model parameters: early stopping only works for online training algorithms.

However, you can use a validation set in another way: to regularize the coefficients of your regression! Here's how it works:

  • Split your training data into a "training" part (usually 80-90% of the data you have available) and a "validation" part (the remaining 10-20%).
  • When you compute your regression, instead of using vanilla linear regression, use a regularized technique like ridge regression, lasso regression, or elastic net regression. Use only the "training" part of your dataset for computing the regression.
  • All of these regularized regression techniques have one or more "hyperparameters" that balance the model fit against its complexity. The "validation" dataset is used to set these parameter values: you can do this using grid search, evolutionary methods, or any other hyperparameter optimization technique. Generally speaking, these methods work by choosing values for the hyperparameters, fitting the model using the "training" dataset, and measuring the fitted model's performance on the "validation" dataset. Repeat N times and choose the model that performs best on the "validation" set (a sketch of this procedure follows the list).
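Here is a minimal sketch of ridge regression with a grid search over the regularization strength, assuming the Math.NET Numerics library; the class and method names are mine, not a standard API. Typical grids for lambda span several orders of magnitude, e.g. 1e-8 up to 1.

```csharp
using MathNet.Numerics.LinearAlgebra;

static class RidgeReadout
{
    // Wout = (X'X + lambda*I)^-1 X'Y, the ridge regression solution.
    public static Matrix<double> Fit(Matrix<double> x, Matrix<double> y, double lambda)
    {
        var xt = x.Transpose();
        var identity = Matrix<double>.Build.DenseIdentity(x.ColumnCount);
        return (xt * x + lambda * identity).Solve(xt * y);
    }

    // Fit on the "training" part for each lambda, score on the
    // "validation" part, and keep the weights that classify best.
    public static Matrix<double> FitWithGridSearch(
        Matrix<double> trainX, Matrix<double> trainY,
        Matrix<double> valX, Matrix<double> valY,
        double[] lambdas)
    {
        Matrix<double> best = null;
        double bestAccuracy = -1.0;
        foreach (double lambda in lambdas)
        {
            var wout = Fit(trainX, trainY, lambda);
            var predictions = valX * wout;
            int correct = 0;
            for (int i = 0; i < valX.RowCount; i++)
            {
                int label = predictions[i, 0] >= 0.5 ? 1 : 0;
                if (label == (int)valY[i, 0]) correct++;
            }
            double accuracy = (double)correct / valX.RowCount;
            if (accuracy > bestAccuracy) { bestAccuracy = accuracy; best = wout; }
        }
        return best;
    }
}
```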

You can learn more about regularization and regression at http://en.wikipedia.org/wiki/Least_squares#Regularized_versions, or by looking it up in a machine learning or statistics textbook.

Also, read more about cross-validation techniques at http://en.wikipedia.org/wiki/Cross-validation_(statistics).

lmjohns3