Nate Sutton, Ph.D.

Reinforcement Learning Using Spiking Neural Networks

Latest Results

The chart below is based on trained neuron responses when each letter of the alphabet is presented three times in a row before the next letter is presented. Neurons are represented as y-axis values; a neuron whose points on the graph are specific to only one letter (three points in a row) shows the model has learned well, and anything else shows an area where learning can improve. Points represent neurons responding (being activated) through firing spikes. Work has been done to auto-optimize parameters with 90 runs using different parameter settings. Parameters selected from amongst the top-scoring ones for further testing are in this training and testing script; see also the parameter auto-optimization code. Average performance using the optimized parameters over 8 runs has been measured as 97.3% overall accuracy ((TP+TN)/(TP+FP+TN+FN)) and 62.8% precision (TP/(TP+FP)), with a high of 78% and a low of 46%.

Realistic Neuron Property Modeling. Key features: Spike Timing Dependent Plasticity * Active Dendrites * Direct to Soma Signaling * Lateral Inhibition * Learning Rate * Neuroscience Toolkit

26 Char Test Results with Spike Occurrences of Neurons

Letters used as stimuli were represented as 15-pixel (3×5) images and presented sequentially, 100 ms at a time, three times in a row. Spikes occur when a neuron's voltage reaches 10 mV. Once a spike occurs, the voltages of the other neurons are inhibited, which is visible as large drops on the plot. The inhibition and excitation are greater than wanted (−65 mV at some points), but work is being done to scale them better. An area of active work is improving the discrimination of characters close to each other in pixel presentation; for example, 'B' and 'D' share the same neuron in the video below. Nate will work further on the performance after his current work on joining a lab or other position, as will Ignacio if he gets the chance, and we welcome contributions to this open-source project. The video below shows a close-up view of voltage response measurements of four neurons trained to respond to example letters (A, B, C, and D). We are trying to improve the simulation's eyesight the way a human would when trying to read an eye chart! Overall the neurons specialized reasonably well to the input, but there is room for greater performance.
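As an illustration of the stimulus format, a 3×5 letter can be encoded as 15 binary pixels. The bitmap below is a hypothetical example; the project's actual pixel patterns may differ.

```python
# Illustrative 3x5 (15-pixel) bitmap for the letter 'A'; an assumed pattern,
# not necessarily the one used by the project.
LETTER_A = [
    0, 1, 0,   # .#.
    1, 0, 1,   # #.#
    1, 1, 1,   # ###
    1, 0, 1,   # #.#
    1, 0, 1,   # #.#
]
```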

Intro material: ReadMe and Intro. Authors: Nate and Ignacio. Our code.
Based on: Character Recognition using Spiking Neural Networks.

Animation | Video Download

Training

The model's values are trained by displaying each character one after another, each for one spike interval (100 ms).
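A minimal sketch of this presentation schedule, assuming a 100 ms spike interval and a generator-style loop (names are illustrative, not the project's actual code):

```python
SPIKE_INTERVAL_MS = 100  # one spike interval per character presentation

def training_schedule(characters, epochs):
    """Yield (start_time_ms, character) pairs, showing each character
    one after another for one spike interval per presentation."""
    t = 0
    for _ in range(epochs):
        for ch in characters:
            yield t, ch
            t += SPIKE_INTERVAL_MS

# Example: one epoch over four characters covers 400 ms.
for start, ch in training_schedule("ABCD", epochs=1):
    print(start, ch)
```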

Time and Refractory Period

Time is incremented first, and then the refractory-period status variables are processed. Time&Refrac
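A hypothetical sketch of this step, assuming a global clock and a per-neuron refractory countdown (the real Time&Refrac code may differ):

```python
import numpy as np

DT_MS = 1.0  # simulation timestep (assumed value)

def advance_time(t_ms, refractory_ms):
    """Increment simulation time, then decrement each neuron's remaining
    refractory period; neurons with a positive value cannot spike yet."""
    t_ms += DT_MS
    refractory_ms = np.maximum(refractory_ms - DT_MS, 0.0)
    return t_ms, refractory_ms
```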

First layer and Dirac function

Spikes from the first layer are represented as presynaptic spike times across an 11-epoch (30,000 ms) timeframe. They are encoded into a variable that is looked up to determine when spikes have occurred. Each spike found triggers a Dirac function, which is one of the cofactors in the dendrite and somaDirect equations; those equations build the signal that leads to the soma potential. Dirac Function
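In discrete time, the Dirac function can be approximated as an indicator that is 1 on the timestep of a presynaptic spike and 0 otherwise. A minimal sketch, with the timestep tolerance as an assumed parameter:

```python
def dirac(t_ms, presynaptic_spike_times_ms, dt_ms=1.0):
    """Return 1.0 if a presynaptic spike occurred within the current
    timestep, else 0.0 (a discrete-time stand-in for the Dirac delta)."""
    return 1.0 if any(abs(t_ms - s) < dt_ms
                      for s in presynaptic_spike_times_ms) else 0.0
```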

Weight

The weights are initially set at random, but within a range of values that allows Dirac-activated dendrite and somaDirect signals to combine to create postsynaptic spikes (from output neurons) from the beginning, or soon enough afterward. After initial spiking has occurred, the reinforcement learning comes into effect: character input that causes a presynaptic spike followed by a postsynaptic spike causes a weight increase. The increase is defined by the weighting equations. weight change · returnDeltaW · returnNewW
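A hedged sketch of this pre-before-post weight increase, assuming a simple additive rule with a cap; the actual increase is defined by the project's returnDeltaW/returnNewW equations:

```python
W_MAX = 1.0      # upper bound on a weight (assumed)
DELTA_W = 0.05   # increment per pre-then-post pairing (assumed)

def return_new_w(w, pre_spiked, post_spiked_after):
    """Increase the weight when a presynaptic spike is followed by a
    postsynaptic spike, clamped to W_MAX; otherwise leave it unchanged."""
    if pre_spiked and post_spiked_after:
        return min(w + DELTA_W, W_MAX)
    return w
```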

Weights of All Neurons During the Training Process with 4 Characters

3dBarChartAnim

The video below shows the final weights produced after training 26 neurons with the full alphabet for 30,000 ms of total time. Each neuron is intended to have weights specialized in accordance with only one distinct character.

Weights of All Neurons After Training with 26 Characters

3dBarChartRotatingAnim

Weights of the Neurons After Training with 4 Characters:

3dBarChartGenerator

Tau

Tau, the learning rate for the dendrite, depends on the result of the weight calculation: it creates faster learning when a stronger weight is present. Tau

Resistance

Resistance depends on tau and helps the voltage reach a spiking level even with lower weights and tau. Resistance
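The chain of dependencies could be sketched as below; the functional forms, directions, and constants are assumptions for illustration only, not the project's actual Tau and Resistance equations:

```python
def tau_from_weight(w, tau_min_ms=5.0, tau_max_ms=30.0, w_max=1.0):
    """One possible monotonic mapping from weight to tau; the direction
    and constants here are assumptions."""
    return tau_min_ms + (tau_max_ms - tau_min_ms) * (w / w_max)

def resistance_from_tau(tau_ms, scale=2.0):
    """Resistance tracks tau so spiking stays reachable even with lower
    weights and tau; the proportionality is an assumption."""
    return scale * tau_ms
```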

Relationship Between Weight, Tau, and Resistance:

3dBarWTauRAnim

Dendrite Input

This is one receptor of presynaptic input that translates signals into other values that are passed to the soma. The equation below specifies how. Cofactors in the equation are divided by their units to normalize the values.
Dendrite eq. (eq. 1) in the paper is: τ · dv/dt = −v + R · w · δ(t)

Eq. in Brian2 (an open-source program): dv/dt = (((-v/mV)+((r/mV)*(w/volt)*(dirac/volt)))/(tau))*mV : volt Dendrite Equations
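A self-contained Brian2 sketch of the dendrite equation above; all parameter values here are illustrative assumptions, not the trained values:

```python
from brian2 import *

tau = 10*ms        # dendrite time constant (assumed)
r = 80*mV          # resistance term (assumed)
w = 0.5*volt       # synaptic weight (assumed)
dirac = 1.0*volt   # constant stand-in for the Dirac spike indicator

eqs = 'dv/dt = (((-v/mV)+((r/mV)*(w/volt)*(dirac/volt)))/(tau))*mV : volt'
dendrite = NeuronGroup(1, eqs)
mon = StateMonitor(dendrite, 'v', record=True)
run(100*ms)
print(mon.v[0][-1])  # dendrite potential after 100 ms
```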

Soma Direct Input

Another receptor of presynaptic input.
SomaDirect eq. (eq. 4) in the paper is: τs · dv/dt = −v + Σ(w · δ(t))

In Brian2: dv/dt = (((-v/mV)+(summedWandDirac/volt))/(tauS))*mV : volt SomaDirect Equations
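The same style of Brian2 sketch applies to the somaDirect equation, here with summedWandDirac held constant as an assumed input:

```python
from brian2 import *

tauS = 5*ms                 # somaDirect time constant (assumed)
summedWandDirac = 0.8*volt  # summed weight * dirac input (assumed)

eqs = 'dv/dt = (((-v/mV)+(summedWandDirac/volt))/(tauS))*mV : volt'
soma_direct = NeuronGroup(1, eqs)
mon = StateMonitor(soma_direct, 'v', record=True)
run(50*ms)
print(mon.v[0][-1])  # somaDirect potential after 50 ms
```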

Lateral Inhibition

Due to competition amongst neurons, inhibition signals are sent from one neuron to another when they receive input. A winner-take-all type of implementation was created where inhibition is auto-tuned: the neuron with the greatest soma membrane potential change is the only one allowed to create a postsynaptic spike. The membrane potential is scaled based on the inhibition. Lateral Inhibition
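A minimal NumPy sketch of this winner-take-all step; the inhibition scaling factor here is an assumed fixed value, not the auto-tuned one:

```python
import numpy as np

def lateral_inhibition(delta_um, inhibition=0.6):
    """Allow only the neuron with the greatest membrane potential change
    to spike, and scale the others down by the inhibition factor."""
    winner = int(np.argmax(delta_um))
    scaled = delta_um * (1.0 - inhibition)  # inhibit every neuron...
    scaled[winner] = delta_um[winner]       # ...except the winner
    can_spike = np.zeros(delta_um.shape, dtype=bool)
    can_spike[winner] = True
    return scaled, can_spike

# Example: neuron 2 wins and is the only one allowed to spike.
scaled, can_spike = lateral_inhibition(np.array([1.2, 0.4, 3.1, 0.9]))
```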

Soma Membrane Potential Charge (Um)

Um (eq. 5) in the paper is: τm · dUm/dt = −Um + Rm · (Isyn + Idend)

Um prior to lat. inh.: dprelimV/dt = (-prelimV+((Rm/mV)*(SynI+DendI*1.0)))/(tauM) : volt (unless refractory)
Um after lat. inh.: dv/dt = v/(1*second): volt
Soma Membrane Potential Equations
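A Brian2 sketch of the pre-inhibition Um equation with a threshold, reset, and refractory period; the parameter values are assumptions, and the 10 mV threshold comes from the spiking description earlier in this document:

```python
from brian2 import *

tauM = 20*ms    # membrane time constant (assumed)
Rm = 15*mV      # membrane resistance term (assumed)
SynI = 0.6*mV   # somaDirect input, held constant (assumed)
DendI = 0.4*mV  # dendrite input, held constant (assumed)

eqs = 'dprelimV/dt = (-prelimV+((Rm/mV)*(SynI+DendI*1.0)))/(tauM) : volt (unless refractory)'
soma = NeuronGroup(1, eqs, threshold='prelimV >= 10*mV',
                   reset='prelimV = 0*mV', refractory=2*ms)
spikes = SpikeMonitor(soma)
run(200*ms)
print(spikes.count[0])  # number of spikes in 200 ms
```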

Spikes of Neurons During Training

The dots represent spikes fired for an input character stimulus. Notice how, over time, the neurons specialize by designating themselves to only one character each. That is the reinforcement learning causing the neurons to specialize; in this example it takes close to 10,000 ms. In some training simulations not all neurons become correctly designated to one character.

The spikes generated during training with the full alphabet are shown below. Greater specialization toward the end shows the degree of effectiveness of the training.

26 Char Training with Neuron Spikes


Check for resets

After each spike interval has passed, logic resets values upon spike occurrences. Resets
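A hypothetical sketch of this reset step, assuming per-neuron voltage and spike-flag arrays (the actual Resets code may differ):

```python
import numpy as np

def check_for_resets(voltages_mv, spiked, resting_mv=0.0):
    """After a spike interval, reset the voltage of any neuron that
    spiked and clear its spike flag for the next interval."""
    voltages_mv[spiked] = resting_mv
    spiked[:] = False
    return voltages_mv, spiked
```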

Testing

The weights, and subsequently the tau and resistance generated from them, are used as the trained model values, and tests are run to evaluate performance. Input characters are presented three times in a row for three spike intervals (300 ms total). Observed spikes (fired or not fired) are compared to expected values and performance is reported. intitializeTrainedModelParameters · evaluateClassifierPerf · OutputEvaluationResults
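A small sketch of how the reported metrics follow from the observed-versus-expected spike comparison; the function names are illustrative, not the project's actual evaluateClassifierPerf code:

```python
import numpy as np

def confusion_counts(observed, expected):
    """observed/expected: boolean arrays of spike occurrence per test."""
    tp = int(np.sum(observed & expected))
    fp = int(np.sum(observed & ~expected))
    tn = int(np.sum(~observed & ~expected))
    fn = int(np.sum(~observed & expected))
    return tp, fp, tn, fn

def accuracy_and_precision(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # (TP+TN)/(TP+FP+TN+FN)
    precision = tp / (tp + fp)                  # TP/(TP+FP)
    return accuracy, precision
```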

Further descriptions of the simulation results are here

Main code for the simulation