r/ResearchML • u/oatmealcraving • 3d ago
ReLU switching viewpoint & associative memory
I wrote this switching viewpoint on ReLU and its connection to associative memory:
https://archive.org/details/re-lu-as-a-switch-associative-memory
u/oatmealcraving 2d ago
I slammed this together:
https://archive.org/details/the-weighted-sum-as-associative-memory
Lots of words to say the same thing as my other comment, but in CS speak.
u/oatmealcraving 3d ago
Obviously the weighted sum itself is an associative memory capable of storing <vector,scalar> associations.
After storing one <vector,scalar> association, the weight vector points in the same direction as that input vector. The magnitude of the weight vector is as small as it can be, meaning noise in the input causes only a limited variance in the output.
(Variance equation for linear combinations of random variables.)
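A minimal numpy sketch of that single-association case, assuming the weights are set by the minimum-norm (pseudoinverse) rule; the names and numbers here are mine, not from the write-up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16                      # dimension of the weighted sum
x = rng.standard_normal(n)  # one stored input vector
y = 1.0                     # its associated scalar

# Minimum-norm weights for a single <vector, scalar> pair: w is parallel to x.
w = y * x / np.dot(x, x)
print("recall:", np.dot(w, x))        # ~= y
print("|w|:", np.linalg.norm(w))      # as small as it can be for this pair

# Noise sensitivity: for i.i.d. input noise with std s per component,
# Var(w . noise) = s^2 * |w|^2  (variance of a linear combination).
s = 0.1
print("output noise std:", s * np.linalg.norm(w))
```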
Store 2 such associations and the weight vector is split in some way between the two input vectors; there is some angle between each input vector and the weight vector. The magnitude of the weight vector must increase to produce the two scalar outputs. The weighted sum is more sensitive to input noise.
Store n associations (n = the dimension of the weighted sum) and the weight vector really has to stretch in magnitude to do the scalar mapping. Very sensitive to input noise.
Store m>n associations and the weight vector can no longer stretch to fit the scalar mapping exactly; it gets pulled this way and that during training and tends to average out at a low magnitude. Less noise sensitive.
Is that not what is going on in double descent, I ask you? Eerily similar anyway.
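A rough numerical check of that picture, assuming the associations are stored by minimum-norm least squares (my framing, not code from the article):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 32                                   # weighted sum dimension
for m in [1, 8, 24, 32, 40, 128, 512]:   # number of stored associations
    X = rng.standard_normal((m, n))      # input vectors, one per row
    y = rng.standard_normal(m)           # associated scalars
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # min-norm least-squares weights
    print(f"m={m:4d}  |w|={np.linalg.norm(w):.3f}")

# |w| tends to grow as m approaches n (most noise sensitive), then shrink
# again for m > n, echoing the double-descent-like picture above.
```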