
3quarksdaily

An Eclectic Digest of Science, Art and Literature


January 10, 2011

Reservoir computing: A New Hope?

Artificial neural networks are computational models inspired by the organization of neurons in the brain. They are used to model and analyze data, to implement algorithms and in attempts to understand the computational principles used by the brain. The popularity of neural networks in computer science, machine learning and cognitive science has varied wildly, both over time and between people. To enthusiasts, neural networks are a revolutionary way of conceiving of computation; the entry point to robust, distributed, easily parallelizable processing; the means to build artificial intelligence systems that replicate the complexity of the brain; and a way to understand the computations that the brain carries out. To skeptics, they are poorly understood and over-hyped, offering little insight into general computational principles in either computer science or cognition. Neural networks are often called "the second best solution to any problem". Depending on where you stand, this means either that they are often promising but never actually useful, or that they apply to a wide range of problems and do almost as well as solutions explicitly tailored to the details of a particular problem (solutions that only work for that one problem).

Neural networks typically consist of a number of simple information-processing units (the "neurons"). Each neuron combines a number of inputs (some or all of which come from other neurons) to give an output, which is then typically used as input to other neurons in the network. The connections between neurons normally have weights, which determine how strongly each neuron affects the others. For example, a simple neuron could sum all its inputs, each multiplied by the strength of its connection, and output 0 or 1 depending on whether this sum is below or above some threshold. This output then serves as an input to other neurons, with appropriate weights for each connection.
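The threshold neuron just described fits in a few lines of Python. The function name `threshold_neuron` and the example weights are illustrative choices, not taken from any particular library, and treating a sum exactly equal to the threshold as "below" is an arbitrary convention assumed here.

```python
def threshold_neuron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    # Multiply each input by the strength of its connection and add them up.
    total = sum(x * w for x, w in zip(inputs, weights))
    # A sum exactly at the threshold counts as "below" (an arbitrary convention).
    return 1 if total > threshold else 0


# Illustrative weights: two connections of strength 1.0 and a threshold of 1.5
# make the neuron fire only when both inputs are active.
print(threshold_neuron([1, 1], [1.0, 1.0], 1.5))  # prints 1
print(threshold_neuron([1, 0], [1.0, 1.0], 1.5))  # prints 0
```

Lowering the threshold to 0.5 turns the same unit into one that fires when either input is active: the behavior lives entirely in the weights and the threshold, not in the unit itself.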

A computation involves transforming some stream of input into some stream of output. For example, the input stream might be a list of numbers that come into the network one by one, and the desired output stream might be the squares of those numbers. Some or all of the neurons receive the input through connections just like those between neurons. The output stream is taken to be the output of some particular set of neurons in the network. The network can be programmed to do a particular transformation ("trained") by adjusting the strengths of connections between different neurons and between the inputs and the neurons. Typically this is done before the network is used to process the desired input, but sometimes the connection weights are changed according to some pre-determined rule as the network processes input.
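Here is a minimal sketch of a network turning an input stream into an output stream, one value per step, assuming threshold neurons that all update together at each step. The function name `run_network` and the layout of the weight matrix (`W[i, j]` is the connection from neuron j to neuron i) are illustrative assumptions. The weights in the example are picked by hand to wire two neurons into a chain, so the output neuron reproduces the input stream delayed by one step; producing squares would require trained weights, which this sketch does not attempt.

```python
import numpy as np

def run_network(W, w_in, out_idx, inputs, threshold=0.5):
    """Feed an input stream into a network of threshold neurons, one value per step.

    W[i, j] is the weight of the connection from neuron j to neuron i,
    w_in[i] is the weight from the input to neuron i, and out_idx lists
    the neurons whose activity is read as the output stream.
    """
    W, w_in = np.asarray(W, dtype=float), np.asarray(w_in, dtype=float)
    state = np.zeros(W.shape[0])
    outputs = []
    for u in inputs:
        # Each neuron sums its weighted inputs from other neurons and from the
        # external input, then fires (1.0) if the total exceeds the threshold.
        state = (W @ state + w_in * u > threshold).astype(float)
        outputs.append(state[out_idx])
    return np.array(outputs)


# Neuron 0 receives the input; neuron 1 receives only neuron 0's output.
W = [[0.0, 0.0],
     [1.0, 0.0]]
w_in = [1.0, 0.0]
print(run_network(W, w_in, [1], [1, 0, 1, 1, 0])[:, 0])  # [0. 1. 0. 1. 1.]
```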

Training a network by modifying the connection weights between neurons is quite different from programming a computer, which involves combining a series of simple operations on data to get a desired output. In general, training a network is hard. Changing a particular connection strength changes the behavior of the whole network in a way that can't easily be deduced from just the two neurons the connection joins. Similarly, once the network has been trained, it's hard to interpret the role of an individual weight in the overall computation. Indeed, the popularity of neural networks has waxed and waned with advances in algorithms for training networks and interpreting their connection weights.

[Figure: feedforward network]

One way to make this problem tractable is to use a simpler class of network. A "feedforward" network is one in which the connections between neurons form no loops, so information flows in only one direction. For example, if neuron A connects to neuron B, and neuron B connects to neuron C, then neuron C can't connect to neuron A. Feedforward networks have the advantage that a change in a weight only affects a certain subset of neurons (those downstream of the connection), and the effect of the change on the output is relatively easy to work out. This makes training feedforward networks comparatively simple, and algorithms to do this, such as backpropagation, have been around for decades.
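Whether a network is feedforward depends only on its wiring, so it can be checked mechanically. The sketch below, with a hypothetical function name `is_feedforward`, uses Kahn's topological-sort algorithm, a standard graph technique chosen here for illustration: it repeatedly removes neurons that have no remaining incoming connections, and any neurons that can never be removed must sit on a loop.

```python
from collections import defaultdict, deque

def is_feedforward(connections):
    """Return True if the directed connections (pre, post) contain no loops."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    neurons = set()
    for pre, post in connections:
        successors[pre].append(post)
        in_degree[post] += 1
        neurons.update((pre, post))
    # Kahn's algorithm: start from neurons that receive no connections at all.
    ready = deque(n for n in neurons if in_degree[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for m in successors[n]:
            in_degree[m] -= 1  # n is gone, so m has one fewer incoming connection
            if in_degree[m] == 0:
                ready.append(m)
    # Neurons that were never removed are part of (or downstream of) a loop.
    return removed == len(neurons)


print(is_feedforward([("A", "B"), ("B", "C")]))              # True
print(is_feedforward([("A", "B"), ("B", "C"), ("C", "A")]))  # False
```

When the check succeeds, the order in which neurons were removed is also an order in which the network can be evaluated, each neuron after all of its inputs, which is part of what makes feedforward networks easier to reason about.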

[Figure: recurrent network]

However, using just feedforward networks is quite restrictive. "Recurrent" networks (networks that allow loops) are able to maintain memories, make it easier to combine information that entered the network at different times[1] and make it easier for an input signal to be transformed in different ways[2]; they are also more like the networks seen in the cortex, which makes them better candidates for understanding the brain. Overcoming the barriers to training recurrent networks is thus quite valuable.
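A toy comparison between a unit with a loop and one without shows what "maintaining memories" means here. This is a minimal sketch assuming a single linear neuron (used instead of a threshold neuron for readability); the hypothetical `self_weight` parameter is the strength of a connection from the neuron back onto itself, and 0.0 means there is no loop.

```python
def pulse_response(self_weight, pulse_steps=1, total_steps=6):
    """Activity of one linear neuron after a brief input pulse.

    self_weight is the strength of a loop from the neuron back onto itself;
    0.0 means there is no loop, as in a feedforward unit.
    """
    activity, trace = 0.0, []
    for t in range(total_steps):
        u = 1.0 if t < pulse_steps else 0.0  # the input is on only briefly
        # New activity = feedback from the previous step + current input.
        activity = self_weight * activity + u
        trace.append(round(activity, 3))
    return trace


print(pulse_response(0.0))  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]: no trace once input stops
print(pulse_response(0.8))  # [1.0, 0.8, 0.64, 0.512, 0.41, 0.328]: a fading memory
```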

Reservoir computing is a new approach to constructing neural networks that tries to combine several of the useful features of recurrent networks with feedforward-like ease of training[3]. The typical network consists of a large recurrent network (the "reservoir"), which receives the input, and a smaller "readout" circuit, which receives connections from some or all of the neurons in the recurrent network but doesn't connect back to them. The connection weights in the recurrent network are fixed when the network is constructed and never change afterward. Reservoir computing rests on a pair of insights, each of which matters more in some contexts than in others.
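The architecture can be sketched as a small Python class, assuming NumPy and tanh neurons. The details are illustrative assumptions rather than the construction used in the original papers: the class name `Reservoir`, the Gaussian random weights, and the rescaling of the recurrent weights to a chosen spectral radius (largest eigenvalue magnitude), a widely used heuristic for echo-state networks that keeps echoes from growing. What matters is the structure: `W` and `w_in` are drawn once and never modified, and the readout only receives activity from the reservoir.

```python
import numpy as np

class Reservoir:
    """Minimal echo-state-style reservoir (a sketch, not a reference implementation).

    The recurrent weights W and input weights w_in are drawn randomly once and
    never changed; only the readout weights are meant to be trained.
    """

    def __init__(self, n_neurons=100, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(n_neurons, n_neurons))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius
        # (an assumed, commonly used heuristic for keeping echoes fading).
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.w_in = rng.uniform(-1, 1, size=n_neurons)
        # Feedforward only: reservoir -> readout, nothing flows back.
        self.readout = np.zeros(n_neurons)

    def states(self, inputs):
        """Run the fixed reservoir over an input stream; return one state per step."""
        x = np.zeros(len(self.w_in))
        collected = []
        for u in inputs:
            x = np.tanh(self.W @ x + self.w_in * u)
            collected.append(x)
        return np.array(collected)

    def output(self, inputs):
        """Readout: a weighted sum of reservoir activity at each step."""
        return self.states(inputs) @ self.readout
```

Training, in this picture, would only ever touch `self.readout`.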

[Figure: reservoir network]

To understand the first, imagine a memory task. You give a recurrent network an input signal for a while and then stop, requiring the network to retain a memory of that signal so that it can be used later. For example, you might want to give your network a number and have it output the square of that number at some point in the future; more realistically, imagine a circuit in your brain trying to keep track of a phone number until you can find pen and paper.

The standard way to do this is to have the signal push the network into a state where the neurons keep each other active, leading to an unchanging steady state (for example, a representation of the sequence of digits in the phone number). However, even if the network has no stable states (besides the one where no neuron is active), it still has some memory. An input will excite certain neurons and then bounce around the loops in the network before fading away. A simple example would be neuron A activating neuron B, which activates neuron C, which activates neuron A again, and so on, with the activity losing strength on each pass. As long as these traces fade slowly enough, and as long as they can be distinguished from the traces of other signals, the system has a useful memory. This idea gave rise to the names under which reservoir networks were first introduced: "echo-state networks" and "liquid-state machines". The signal "echoes" around the reservoir as it slowly dies out, and these echoes are used for computation; alternatively, think of the traces as the ripples on the surface of a pond after a stone has been dropped in. At least for a while, you can reconstruct where the stone fell.
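The A → B → C → A example can be simulated directly. This sketch assumes linear neurons and a weight of 0.7 on each connection of the loop (both hypothetical choices) so the arithmetic is easy to follow: the activity moves one neuron along the loop per step and, after one full lap, returns to A at 0.7³ = 0.343 of its original strength.

```python
import numpy as np

def ring_echo(n_steps, loop_weight=0.7):
    """Activity of a three-neuron loop A -> B -> C -> A after A is briefly excited."""
    W = np.zeros((3, 3))  # W[post, pre]: connection strength from pre to post
    W[1, 0] = W[2, 1] = W[0, 2] = loop_weight  # A->B, B->C, C->A
    x = np.array([1.0, 0.0, 0.0])  # an input pulse has just excited neuron A
    history = [x]
    for _ in range(n_steps):
        x = W @ x  # linear update: activity hops to the next neuron and shrinks
        history.append(x)
    return np.array(history)


for step, (a, b, c) in enumerate(ring_echo(6)):
    print(f"step {step}: A={a:.3f} B={b:.3f} C={c:.3f}")
```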

More generally, instead of giving the network an input signal, stopping it and then having the network reproduce it, you could give the network a continuous stream of input and require it to convert this into a stream of output, where the output at any given time depends on the previous history of the input. In this case, the echoes of the signal from previous times combine with the current signal in the network. For many networks, the history of the input can still be extracted (i.e., the echoes can be separated). Building a network with useful echoes is much easier than building one with a particular set of stable states: you can pick connection weights randomly from a large class of distributions, and the network will typically have this property.
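One way to check that the echoes can be separated is to drive a random reservoir with a continuous stream and ask a linear readout to recover the input from some number of steps ago. This is a sketch under stated assumptions: the function name `delay_recall`, the network size, the spectral-radius rescaling, the small input scale (which keeps the tanh neurons nearly linear) and the least-squares readout are all illustrative choices. The score is the squared correlation on held-out data; exact values depend on the seed and sizes, but recent inputs should come back well and distant ones hardly at all.

```python
import numpy as np

def delay_recall(delay, n_neurons=50, n_steps=2000, spectral_radius=0.9, seed=0):
    """Score in [0, 1] for recovering the input from `delay` steps ago.

    A random, fixed reservoir is driven by a random input stream; a linear
    readout is fitted by least squares on the first half of the data and
    scored (squared correlation) on the second half.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_neurons, n_neurons))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.uniform(-1, 1, size=n_neurons)
    u = rng.uniform(-1, 1, size=n_steps)  # a continuous stream of input

    x, states = np.zeros(n_neurons), []
    for u_t in u:
        # Echoes of earlier inputs (W @ x) mix with the current input.
        x = np.tanh(W @ x + 0.1 * w_in * u_t)
        states.append(x)
    X = np.array(states)[delay:]  # the state at time t ...
    y = u[: n_steps - delay]      # ... paired with the input at time t - delay

    half = len(y) // 2
    weights, *_ = np.linalg.lstsq(X[:half], y[:half], rcond=None)
    prediction = X[half:] @ weights
    return np.corrcoef(prediction, y[half:])[0, 1] ** 2


for d in (1, 5, 20, 100):
    print(d, round(delay_recall(d), 3))
```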

The other insight exploits the fact that recurrent networks are more complex than feedforward ones and can more easily transform an input signal into a diversity of output signals. Traditionally the weights in a recurrent network would be chosen to implement a particular transformation but, as previously discussed, this is often hard to do. Instead, you can use the fact that having a large (but not absurdly large) number of neurons and choosing the weights randomly gives you a diverse set of transformations[4]. Each neuron is effectively computing some transformation of the input signal (depending on how it's connected) and, if the echoes are strong enough, some of these transformations depend on the history of the input as well as the current input state. In general, none of these is the transformation you want to compute but, if you've created a diverse reservoir of transformations, some of them can be combined to give you the transformation you want[5].
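The second insight can be illustrated without any memory at all. The sketch below is a static simplification, closely related to what machine learning calls "random features": tanh units with randomly drawn gains and offsets, none of which computes the square of its input, combined by a least-squares fit into an approximation of squaring (the post's running example). The unit count, the distributions and the name `random_units` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 100
# Each hypothetical unit computes tanh(gain * u + offset) with its own random
# gain and offset, so together they form a diverse set of transformations of u.
gains = rng.normal(size=n_units)
offsets = rng.normal(size=n_units)

def random_units(u):
    """Activity of every unit (columns) for every input value (rows)."""
    return np.tanh(np.outer(u, gains) + offsets)

u_train = np.linspace(-1, 1, 1000)
# Fit one weight per unit so the weighted sum approximates the square of u.
weights, *_ = np.linalg.lstsq(random_units(u_train), u_train ** 2, rcond=None)

u_test = rng.uniform(-1, 1, size=50)  # inputs the fit has not seen
max_error = np.max(np.abs(random_units(u_test) @ weights - u_test ** 2))
print(f"largest error on new inputs: {max_error:.4f}")
```

In a real reservoir the units also carry echoes of earlier inputs, so the same kind of combination can produce transformations that depend on the input's history.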

This is where the readout comes in. Since the readout does not connect back into the reservoir, the connections between the reservoir and the readout are feedforward. The weights of these connections are adjusted so that the readout approximates the desired transformation by combining the outputs of the reservoir. The calculation done by the readout is simple, but it uses the complex transformations computed by the reservoir as building blocks. Finding the readout weights is straightforward, since it involves no loops (remember that the weights in the reservoir, which do involve loops, do not change), and as long as the reservoir has enough memory and implements a diverse set of transformations, the readout should do well. Note also that the same recurrent network can compute multiple transformations in parallel, as long as each has its own readout.
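A sketch of one fixed reservoir serving two readouts at once. The post does not say how the readout weights are found; ridge-regularized linear regression, used here, is one common choice and should be read as an assumption, as should the sizes, the small input scale and the two target transformations (both chosen because they depend on the input's recent history). Each readout comes from a single linear solve, and the reservoir is never touched.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 100, 3000
# Fixed, random reservoir: these weights are never trained.
W = rng.normal(size=(n, n))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-1, 1, size=n)

u = rng.uniform(-1, 1, size=T)  # input stream
x, states = np.zeros(n), []
for u_t in u:
    x = np.tanh(W @ x + 0.1 * w_in * u_t)
    states.append(x)
X = np.array(states)  # one row of reservoir activity per time step

def train_readout(X, y, ridge=1e-6):
    """Readout weights from one ridge-regularized linear solve; no loops involved."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

# Two different transformations computed in parallel from the same reservoir.
# np.roll(u, k)[t] is u[t - k]; the wrapped-around start is skipped below.
targets = {
    "input two steps ago": np.roll(u, 2),
    "average of last three inputs": (u + np.roll(u, 1) + np.roll(u, 2)) / 3,
}
washout, split = 100, 2000  # skip the start-up transient; test on unseen steps
errors = {}
for name, y in targets.items():
    w = train_readout(X[washout:split], y[washout:split])
    prediction = X[split:] @ w
    errors[name] = np.mean((prediction - y[split:]) ** 2) / np.var(y[split:])
    print(f"{name}: normalized error {errors[name]:.4f}")
```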

The reservoir needs a few different properties. If you give it an input signal and then stop, the input needs to bounce around long enough to make memory-based computation feasible. On the other hand, if signals bounce around for too long, the echoes from different signals will be hard to distinguish from each other. At the same time, the recurrent network needs to compute a diverse set of transformations of the input signal. The initial fixed weights given to the recurrent network affect all of these properties, and how to choose these weights for particular applications is an active area of research. But the field is driven by the initial discovery that choosing the weights randomly is often quite effective and that, in general, choosing a good reservoir is much simpler than explicitly training the recurrent network to do particular transformations.
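How long echoes last is something you can tune when choosing the fixed weights. This sketch shows one such knob, not the full trade-off: it assumes a random tanh reservoir whose weights are rescaled to a given spectral radius (an assumption, as in common echo-state practice) and counts the steps until a pulse of activity fades to a thousandth of its initial size. The function name `echo_duration` and all sizes are hypothetical. Larger radii give longer echoes; above 1, the silent state becomes unstable and a pulse typically does not die out at all.

```python
import numpy as np

def echo_duration(spectral_radius, n_neurons=100, tolerance=1e-3, max_steps=1000, seed=0):
    """Steps until an input pulse has faded to `tolerance` of its initial size."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_neurons, n_neurons))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    x = rng.uniform(-1, 1, size=n_neurons)  # the state right after a pulse of input
    start = np.linalg.norm(x)
    for step in range(1, max_steps + 1):
        x = np.tanh(W @ x)  # no further input: only the echo remains
        if np.linalg.norm(x) < tolerance * start:
            return step
    return max_steps  # the echo did not fade within max_steps


for rho in (0.5, 0.8, 0.95):
    print(rho, echo_duration(rho))
```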

Reservoir computing is only a few years old, but has created a lot of interest as a promising way to exploit recurrent networks for computational purposes while still preserving the tractability of a feedforward network. Mathematicians and computer scientists are interested in it for its possible applications to problems in pattern recognition and prediction, and are working on characterizing what makes a good reservoir and when a reservoir-based approach is most likely to succeed. Neuroscientists are looking for evidence that the brain uses these sorts of networks. Given the boom-and-bust history of fads in neural networks, it's probably too early to make any grand pronouncements. Still, at the very least, reservoir computing provides an interesting new way of thinking about how to compute with networks and how to compute in parallel, both of which are long-standing scientific problems.


[1] Feedforward networks can be specifically designed to give you some limited memory, but this isn't a general property.

[2] The architecture of a feedforward network doesn't have to be very complex in order to approximate all reasonable transformations, but this usually requires an extremely large number of neurons.

[3] Reservoir computing was introduced by Jaeger H. (2001) and Maass W., Natschlaeger T. and Markram H. (2002)

[4] The weights do not need to be chosen randomly. You just need a diversity of transformations and it turns out that this can often be achieved without any special structure in the weights. This is linked to the observation that, in high dimensions, randomly chosen sets often do nearly as well as optimally chosen sets for certain tasks (and certain meanings of "random").

[5] To some extent this property includes the previous one: having a diversity of transformations often means having some with long memory. This is equivalent to having the "echoes" of the signal lasting for a long time in the network.
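Footnote [4]'s remark about randomness in high dimensions has many precise versions. One simple instance, chosen here purely for illustration, is that randomly drawn directions become nearly orthogonal as the dimension grows, so a random set of directions ends up almost as well spread out as a carefully constructed orthogonal one. The function name and sizes below are hypothetical choices.

```python
import numpy as np

def mean_abs_cosine(dim, n_vectors=50, seed=0):
    """Average |cosine similarity| over all pairs of random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(n_vectors, dim))
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # make every vector unit length
    cosines = V @ V.T
    off_diagonal = cosines[~np.eye(n_vectors, dtype=bool)]  # skip self-pairs
    return np.mean(np.abs(off_diagonal))


for dim in (3, 30, 3000):
    # An orthogonal set (possible only when dim >= n_vectors) would score exactly 0.
    print(dim, round(mean_abs_cosine(dim), 3))
```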

Posted by Rishidev Chaudhuri at 12:45 AM

Comments

Thanks Rishidev. That is a very clear, balanced and (for a layman like myself) accessible description of what goes on in reservoir computing. Artificial intelligence is not dead; it's just learning, yes?

Posted by: Pete Chapman | Jan 10, 2011 8:25:39 AM

Fascinating stuff and brilliantly presented, Rishi! Thanks very much.

Posted by: Abbas Raza | Jan 10, 2011 8:38:09 AM

Hey, neat explanation! Could you also suggest books or papers where I can learn more about reservoir computing neural networks?

Posted by: Vikash | Jan 10, 2011 8:43:47 AM

Fascinating read, but above all it left me wanting to read and know more.

I agree with the above: you should link to some more material.

Posted by: Descartes | Jan 10, 2011 9:28:55 AM

I'm with Vikash and Descartes!

Posted by: Abbas Raza | Jan 10, 2011 9:34:08 AM

Near [4], you mention in parentheses that the number of neurons should not be absurdly large. Why not? Does it actually harm how long the echoes/memory last?

Nice Article, btw :)

Posted by: Sumiran | Jan 10, 2011 6:14:50 PM

Excellent article, Rishidev. As per other comments - references for further reading?

Posted by: Nick Maley | Jan 10, 2011 8:05:01 PM

Thanks for the comments. There are a few references below. A couple of things, though. Where you look for more information depends a little bit on how much math and how much neuroscience you're happy with: there are papers that are more abstract and others that are attempting to be more biologically plausible. But either way, remember that you don't need to follow every step of an argument to see what the author is talking about. Also, different reviews have different audiences and some are more general than others; if the one you're reading isn't to your taste, there's probably another one you'd like.

If you don't have access to journals, most of these will be available from the authors' websites or elsewhere, so just do a search for the title and the authors' names.

The original papers are quite interesting:
Wolfgang Maass, Thomas Natschläger, Henry Markram: "Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations", Neural Computation 2002

Herbert Jaeger and Harald Haas: "Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication", Science 2004

This is a general review:
Benjamin Schrauwen, David Verstraeten, Jan Van Campenhout: "An overview of reservoir computing: theory, applications and implementations"

Neural Networks Volume 20, Issue 3 (April 2007) is a special issue on echo state networks and liquid state machines and has a number of articles, including an introduction. You can see the table of contents just by searching.

A more biological review:
Dean Buonomano and Wolfgang Maass: "State-dependent computations: spatiotemporal processing in cortical networks", Nature Reviews Neuroscience, February 2009

There are many more papers, but this seems like a good starting point.

Posted by: Rishidev Chaudhuri | Jan 12, 2011 2:48:32 PM

Hey Sumiran,
Having more neurons generally helps you. I was trying to contrast this approach with purely feedforward networks. You can approximate all reasonable functions with feedforward networks, but you typically need very large numbers of neurons, and the network you build is only useful for that function. By contrast, at least empirically, you don't need as many neurons here to get a decent approximation, and the circuit can be used to compute other things as well. So the "not absurdly large" is a practical benefit rather than a requirement.

Posted by: Rishidev Chaudhuri | Jan 12, 2011 2:52:15 PM

What is the purpose of reservoir computing?

Neural networks were initially presented as "learners": essentially a form of nonlinear regression. But they seem to have lost favour (for this purpose, at least) because they are hard to analyze, hard to interpret and not efficient (statistically, not computationally). The theory of neural networks seems to be an interesting exercise in approximating functions, but rarely deals with statistical properties like bias and variance.

Do reservoirs have some advantages compared to nonlinear regression? Or are they used for entirely different purposes?

Posted by: Armchair Guy | Feb 7, 2011 9:04:07 AM
