© 2026 Theodore P. Pavlic
MIT License

Echo State Network / Reservoir Computing Explorer

A frozen random dynamical system converts temporal patterns into trainable spatial features

input x(t) — driven for t = 0 … 149, then silenced
6 (out of N=40) reservoir neuron traces hᵢ(t) — drag bar to select snapshot
Center — Spatial fingerprint at selected t  ·  gray dots: N=40 neurons  ·  colored dots: 6 traced neurons from above
Corners — Gallery of all 4 signal families sampled at 10 time slices  ·  highlight: single best L² match to fingerprint
Classifier vote — best-match family at each of 80 time slices across the driven phase (t = 20 … 99) · ★ = plurality winner
The reservoir is working memory. See below for explanation.
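A minimal sketch of the classifier vote described above, assuming the snapshot-vs-gallery comparison works roughly as the captions suggest; the array shapes and family names are placeholders, not the explorer's actual data:

```python
import numpy as np

def classifier_vote(snapshots, gallery, families):
    """Best-L2-match vote: snapshots is (T, N) reservoir states at the voted
    time slices; gallery is (F, S, N) template states for F signal families
    at S reference slices; families is a list of F family names."""
    votes = []
    for h in snapshots:
        d = np.linalg.norm(gallery - h, axis=2)                 # L2 distance to every template
        votes.append(np.unravel_index(d.argmin(), d.shape)[0])  # family of the closest template
    counts = np.bincount(votes, minlength=len(families))
    return families[int(counts.argmax())], counts               # plurality winner (the star)
```

For example, with four families, ten gallery slices, and the N = 40 reservoir shown above, `gallery.shape` would be `(4, 10, 40)`.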
Deeper connections and references
Takens' embedding theorem (1981) guarantees that a scalar time series contains enough information to reconstruct the full attractor of the underlying dynamical system via delay embedding. The reservoir is effectively computing a nonlinear generalization of this: each neuron integrates the input history with a different effective time constant and nonlinearity, producing a set of overlapping delay-like projections. Jaeger & Haas (2004, Science) and, more precisely, Miao, Narayanan & Li (2023, IEEE Transactions on Neural Networks and Learning Systems) formalize this: training a Reservoir Computing Network (RCN) is equivalent to learning a map between a window of historical data and the future — a map whose existence Takens' theorem guarantees for generic dynamical systems. Recent work by Hart (2025, Chaos) strengthens this further, proving that a generic reservoir map produces an isometric embedding of the input attractor — not just a topological one — so the reservoir represents the system without metric distortion.
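As a concrete illustration of the delay-embedding idea (not code from the explorer), a scalar series can be lifted into ℝᵈ by stacking lagged copies; the dimension and lag below are arbitrary choices:

```python
import numpy as np

def delay_embed(x, dim=3, lag=10):
    """Stack lagged copies of a scalar series into R^dim (Takens-style embedding).

    Row t is [x(t), x(t - lag), ..., x(t - (dim-1)*lag)].
    """
    start = (dim - 1) * lag
    return np.column_stack([x[start - k * lag : len(x) - k * lag] for k in range(dim)])

# Example: embed a scalar sine; the rows trace a closed loop in R^3,
# reconstructing the circle-like attractor of the underlying oscillator.
x = np.sin(0.07 * np.arange(500))
emb = delay_embed(x, dim=3, lag=10)
print(emb.shape)  # (480, 3)
```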

Cover's theorem (1965) states that a classification problem cast into a sufficiently high-dimensional space via a nonlinear mapping is more likely to be linearly separable than in the original low-dimensional space. That is precisely what the reservoir does: it maps a scalar time series into ℝᴺ, and the linear classifier exploits the resulting separability. Gauthier et al. (2021, Nature Communications, Next Generation Reservoir Computing) make this explicit: traditional RC exploits Cover's theorem via the high-dimensional reservoir state, while their "next-generation" variant achieves the same end using polynomial features of time-shifted data — exploiting Takens' theorem directly without a recurrent network. Both approaches work for the same deep reason.
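A rough sketch of that next-generation flavor, assuming (as described above) that the linear features are time-shifted copies of the input and the nonlinear features are their polynomial products; the window length, lag, and degree here are illustrative:

```python
import numpy as np

def ngrc_features(x, k=3, lag=2):
    """Next-generation-RC-style features: a window of k time-shifted inputs
    plus their unique quadratic products (a finite Takens window with a
    polynomial nonlinearity), suitable for a linear readout."""
    start = (k - 1) * lag
    lin = np.column_stack([x[start - j * lag : len(x) - j * lag] for j in range(k)])
    quad = np.column_stack([lin[:, i] * lin[:, j]
                            for i in range(k) for j in range(i, k)])
    return np.hstack([lin, quad])

x = np.sin(0.05 * np.arange(300))
Z = ngrc_features(x, k=3, lag=2)
print(Z.shape)  # (296, 9): 3 linear + 6 quadratic features per time step
```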

The unification: reservoir computing combines both theorems in one mechanism. Takens says the input history is recoverable from a scalar stream; Cover says high-dimensional nonlinear projection makes it separable. The reservoir does both simultaneously — no explicit delay construction, no kernel design, no training of the recurrent weights. The extension to feed-forward "time-delay neural networks" (TDNN) as reservoirs follows naturally: a window of past inputs with nonlinear features is a finite-dimensional Takens embedding, and its dimensionality provides the Cover-style expansion that enables linear readout.

References
  1. Takens, F. (1981). Detecting strange attractors in turbulence. In D. Rand & L.-S. Young (Eds.), Lecture Notes in Mathematics (Vol. 898, pp. 366–381). Springer. https://doi.org/10.1007/BFb0091924
  2. Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, EC-14(3), 326–334. https://doi.org/10.1109/PGEC.1965.264137
  3. Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78–80. https://doi.org/10.1126/science.1091277
  4. Miao, W., Narayanan, V., & Li, J.-S. (2023). Interpretable design of reservoir computing networks using realization theory. IEEE Transactions on Neural Networks and Learning Systems, 34(9), 6379–6389. https://doi.org/10.1109/TNNLS.2021.3136495
  5. Gauthier, D. J., Bollt, E., Griffith, A., & Barbosa, W. A. S. (2021). Next generation reservoir computing. Nature Communications, 12, 5564. https://doi.org/10.1038/s41467-021-25801-2
  6. Hart, A. G. (2025). Generic and isometric embeddings in reservoir computers. Chaos: An Interdisciplinary Journal of Nonlinear Science, 35(11), 111103. https://doi.org/10.1063/5.0301957
[Pipeline diagram] input x(t) time series → FROZEN reservoir (random fixed weights, N neurons; ρ controls memory decay) → h(T) ∈ ℝᴺ, the instantaneous reservoir state → TRAINED linear classifier (one layer, w · h(T)) → class. The reservoir maps the signal into a separable ℝᴺ, so a linear readout works.
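In code, that pipeline is roughly the standard ESN recipe sketched below; the tanh update, Gaussian weights, and spectral-radius rescaling are assumptions about the implementation, not values extracted from the explorer:

```python
import numpy as np

rng = np.random.default_rng(0)
N, rho = 40, 0.90                        # reservoir size and spectral radius

# FROZEN: random recurrent and input weights; W is rescaled so that its largest
# eigenvalue magnitude equals rho, the knob that controls memory decay.
W = rng.normal(size=(N, N))
W *= rho / max(abs(np.linalg.eigvals(W)))
W_in = rng.normal(size=N)

def reservoir_states(x):
    """Drive the reservoir with the scalar series x(t); return h(t) for every t."""
    h, states = np.zeros(N), []
    for xt in x:
        h = np.tanh(W @ h + W_in * xt)   # the only dynamics; never trained
        states.append(h.copy())
    return np.array(states)

# TRAINED: only a linear readout on the instantaneous state h(T) is learned,
# e.g. logistic or ridge regression fit to reservoir_states(x)[-1] per trial.
```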
① Input signals — 100 steps each
↓  drives reservoir (N = 20 neurons, ρ = 0.9)  ↓
② Reservoir neuron traces hᵢ(t) — 8 of 20 shown · shaded = averaging window (t = 50 … 99)
↓  √(mean(hᵢ(t)²)) over shaded window  ↓
③ Response-amplitude fingerprint — one bar per neuron · height = RMS over averaging window
How the fingerprint forms. Each reservoir neuron responds to the input with its own oscillation. During the steady-state window (shaded) the pattern is stable and repeating — it encodes the input's character, not its phase. Taking the RMS of each neuron's output collapses the time dimension into a single number per neuron: the oscillation amplitude. Different input signals produce distinct bar-height signatures.

Try the phase-shift slider — drag Signal B's phase from 0° to 360°. Watch the waveform slide in panel ①, the traces shift in panel ②, but the orange bars in panel ③ stay locked in place.

Why is this? For periodic signals (sine, square wave), the averaging window spans many complete cycles, and mean(x²) over any integer number of cycles equals amplitude²/2 exactly — the phase cancels algebraically. For the chirp, the instantaneous frequency at each time step is fixed by t (not by the starting phase), so different phases trace the same frequency sweep in the window; the RMS averages over many frequencies and nearly cancels the phase. The slight jitter you see with the chirp is the residual from this approximate (not exact) cancellation — the chirp never completes full cycles at any single frequency, so there is no perfect algebraic cancellation, only statistical averaging.
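The phase-invariance claim is easy to check numerically. A self-contained sketch follows; it mirrors this panel's N = 20, ρ = 0.9, t = 50 … 99 setup, but the exact weight distribution and test frequency are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, rho, T = 20, 0.9, 100
W = rng.normal(size=(N, N))
W *= rho / max(abs(np.linalg.eigvals(W)))
W_in = rng.normal(size=N)

def fingerprint(x, window=slice(50, 100)):
    """Per-neuron RMS over the steady-state window: one amplitude per neuron."""
    h, states = np.zeros(N), []
    for xt in x:
        h = np.tanh(W @ h + W_in * xt)
        states.append(h.copy())
    H = np.array(states)[window]             # keep only the shaded window
    return np.sqrt((H ** 2).mean(axis=0))    # collapse time into an amplitude

t = np.arange(T)
fp_a = fingerprint(np.sin(2 * np.pi * 0.1 * t))         # phase 0
fp_b = fingerprint(np.sin(2 * np.pi * 0.1 * t + 2.3))   # shifted phase
print(np.abs(fp_a - fp_b).max())  # small: the bars barely move when the phase slides
```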
Raw x(T=end)
LDA of ESN fingerprint
Each dot = one noisy trial (random phase + noise). Left — raw x(T=end): uniformly distributed on [−1, 1] regardless of frequency — zero information. Right — LDA of response-amplitude fingerprint: for each trial the ESN is driven for 100 steps; then for each of the N neurons we compute sqrt(mean(hᵢ(t)²)) over the steady-state window. This per-neuron oscillation amplitude is phase-invariant (mean(x²) over a full cycle of any sinusoid equals amplitude²/2, regardless of starting phase), so random phases no longer scatter the feature vectors. The two classes form compact, well-separated clusters along the Fisher discriminant axis; the vertical line is the LDA decision boundary. This is why Tab 3 achieves high accuracy with a simple Linear Classifier: the reservoir converts a hard temporal classification problem into an easy spatial one.
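Continuing the fingerprint sketch above, the right-hand panel can be reproduced in spirit with scikit-learn's LDA; the trial generator, frequencies, and noise level below are illustrative stand-ins for the explorer's two classes:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def make_trial(freq, T=100, noise=0.1, rng=np.random.default_rng(2)):
    """One noisy trial: a sinusoid of the given frequency with a random phase."""
    t = np.arange(T)
    return np.sin(2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi)) + noise * rng.normal(size=T)

# fingerprint() is the per-neuron RMS feature from the sketch above.
X = np.array([fingerprint(make_trial(f)) for f in [0.05] * 100 + [0.12] * 100])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis(n_components=1)
z = lda.fit_transform(X, y)   # 1-D Fisher-discriminant coordinate per trial
print(lda.score(X, y))        # near 1.0 when the two clusters are well separated
```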
[Pipeline diagram] input x(t) time series → FROZEN reservoir (random fixed weights, N neurons; ρ controls memory decay) → amplitude vector ∈ ℝᴺ: √(time-avg of hᵢ(t)²) over the steady-state window, phase-invariant → TRAINED linear classifier (one layer) → class. The reservoir maps the signal into a separable ℝᴺ, so a linear readout works.
Raw — x(T=end) only
single instantaneous value; no pattern info
random guessing = 50%
ESN — response-amplitude fingerprint ∈ ℝᴺ
per-neuron oscillation amplitude over steady state (N=35)
random guessing = 50%
Test accuracy vs. reservoir size N  ·  dashed line = 50% chance level
Why raw fails: x(T=end) is uniformly distributed regardless of frequency — no better than a coin flip. Why linear classification of the ESN fingerprint works at ~100%: the per-neuron response amplitude — √(mean(hᵢ(t)²)) over the steady-state window — is phase-invariant: for a periodic input, mean(x²) over a full cycle equals amplitude²/2 regardless of starting phase. Each neuron's response amplitude is frequency-specific, landing different signal types in distinct regions of ℝᴺ. In that high-dimensional space, a single hyperplane separates the classes. This is a general property of reservoir computing: the reservoir's random nonlinear dynamics expand the input into ℝᴺ, making the problem linearly separable. High accuracy from a simple one-layer readout is expected — the reservoir did the hard work.
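A sketch of that comparison, reusing make_trial() and fingerprint() from the earlier sketches and using logistic regression as a stand-in for the one-layer linear readout:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

trials = [(make_trial(f), c) for c, f in [(0, 0.05), (1, 0.12)] for _ in range(200)]
X_raw = np.array([[x[-1]] for x, _ in trials])          # raw feature: x(T=end), one scalar
X_fp  = np.array([fingerprint(x) for x, _ in trials])   # ESN fingerprint in R^N
y     = np.array([c for _, c in trials])

for name, X in [("raw x(T=end)", X_raw), ("ESN fingerprint", X_fp)]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
    acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: test accuracy ~ {acc:.2f}")  # raw near chance, fingerprint near 1.0
```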
[Diagram: without reservoir] x(t) input → x(T=end), 1 scalar, no history → TRAINED linear classifier (1 layer) → ≈50% (chance): random phase makes x(T=end) uniform, so it carries no information.
[Diagram: with reservoir] x(t) input → FROZEN reservoir (N neurons, random fixed weights) → TRAINED linear classifier (1 layer) → ≫50%: the classifier sees the time-average of hᵢ(t)² over the steady-state window, and that response amplitude is phase-invariant, so the signal is classifiable.