Sparse Distributed Memory (SDM) is a fascinating way of managing the exponential growth in memory. As opposed to the traditional random-access memory (RAM) where there is a one to one correspondence between an address and a *hard location, *SDM can map multiple addresses to a multiple of *hard location*. I won’t delve into why this is useful, I suggest reading Mr. Kanerva’s paper, *A Cerebellar-Model Associative Memory As A Generalized Random-Access Memory*, which I believe is available in IEEE. The purpose of this article is to discuss the shortcoming of writing data in SDM and an amendment from Reinforcement Learning community.

# SDM Write/Read operations

Before we delve on addressing the shortcoming, let us quickly review on how the SDM *write/read* operation works. The figure below is taken from Mr. Kanerva’s paper mentioned above. Again, this is just a quick summary of the *write/read* operations, thus if lost, consult the paper.

In the paper, it describes the writing of *1011001010* and then *0101010110 *to the SDM. Prior to any operations in SDM, the* Up/Down Counters* or the data storage itself are all zeros. For the first insertion, *1011001010* activated 3 address register in which its Hamming Distance is less than or equal 3. This then updates each cell in activated/selected rows in *Up/Down Counters*. For instance, all columns aligned to the *1s* in *1011001010* are all incremented by one, and all columns aligned to *0s* are decremented.

Similar operations occurred for second insertion, *0101010110* but different rows due to the different hamming distance less than or equal three are activated.

Reading is by summing all the activated columns. For instance, reading the data addressed by *0101010110* is through summing all the activated columns, resulting in (*-3, -5, -3, 5, 5, 3, -3, 3, -3, 3). *The sum is finally converted to bits, assigning 0 if the sum of a column is negative, whilst 1 if otherwise. So in our case, this is *0001110101*.

# Problem with SDM Writing

Now suppose each cell in *Up/Down Counter* is represented by a 64bit signed integer (to allow for negatives) in a 64bit machine, thus occupying exactly one memory cell. Then there is a scenario in which after 264. Although this may seem like a super worst-case, it’s an overflow we’d rather not have. I have been reading unmentioned implementation and one of it has a *try/catch* or an *if/else* guard, considering these write operations happens a lot, this is a significant overhead.

# Solution 1: Geometric Series

So our problem is integer overflow. How do we solve this? We want something that is something with a horizontal asymptote, something that slows growing as it approaches a value, all to avoid the overflow.

Geometric series have the following form:

Where *a *is the initial value, I usually set this to 1 and *r *is the common ratio. As you can see above, the sum of geometric series is:

If *r* is set to |r| < 1, then the series converges or “hits” a horizontal asymptote. So how is this useful as a replacement for integers in *Up/Down Counters*?

Imagine each cell in *Up/Down Counters* is a double (64bit floating point), *a=1, and |r| < 1 *(again so we converge), how do we utilized Geometric Series’ converging nature?

- Let
**R**be the current value of a cell. - Let
**R’**be the next value of the same cell.

So we acquired **R** in our nth update. Since it is a geometric series where *a=1, and |r| < 1*.

R = 1 + r + … + rn

This is not very useful since it assumes we keep track of integer *n* which defeats the purpose. How can we make a recurrence relationship, that is, how do we get **R’ **from **R**? Continuing from formula above:

R = 1 + r + … + rn

R’ = 1 + r + … + rn + rn+1

R’ = 1 + (r + … + rn) x r

**R’ = 1 + R x r**

To generalize *a=1*,

**R’ = a + R x r**

That’s it. That is how we increment our cell in *Up/Down Counter.* Are we done? How do we decrement?

# Solution 2: From Reinforcement Learning

To solve the decrement problem above, we’ll introduce an equation from Reinforcement Learning, specifically, SARSA.

To quickly summarize (you can skip if you want) SARSA, (st, at) is the state-action pair. Q(st, at) is what we are trying to update base on the current observed reward of the next state Q(st+1, at+1). Step-size α controls how fast Q(st, at) learns or converge while 𝛄 dictates how influential future states are. Basically, by shifting rt+1 + 𝛄Q(st+1, at+1) to something greater than Q(st, at), Q(st, at) will be incremented and vice-versa. There lies the idea behind our final solution. If you decompose the equation above, you’ll also unravel a geometric series.

If you don’t care about the SARSA equation above, it’s fine. Basically it allows us to increment/decrement. To make build our final solution, simply replace Q(st, at) as **R**, or the floating point we are trying to update in our *Up/Down counter *cell.

**R <- R + r[1 – R], ** *for increment*

**R <- R + r[-1 – R], ** *for decrement*

Done! All I can say is shift your common-ratio/step-size to a really small value so you won’t converge quickly. Maybe change **1** to something bigger for the same reason. Other than that, NO MORE OVERFLOW!

# Conclusion

Truth be told that the update formula from Reinforcement Learning is not unique. It is everywhere in machine learning literature such as *Perceptron Learning Rule* and the ever more popular *Linear Regression.* I just really love Reinforcement Learning that’s why I want to nudge it in.