Saturday, November 30, 2013

Introduction to Reinforcement Learning Models

Someone very near and dear to me just sent me a picture of herself cuddled up on the couch in her pajamas with an Argentinian Tegu. That's right lady, I said Tegu. The second coming of Sodom and Gomorrah - you heard it here first, folks! I mean, I know it's the twenty-first century and all, but what the heck.

Looks like I'll be pushing her to buy that lucrative life insurance policy much earlier than planned!

Anyway, I think that little paroxysm of righteous anger provides an appropriate transition into our discussion of reinforcement learning. Previously we talked about how a simple model can simulate an organism processing a stimulus, such as a tone, and begin to associate that with rewards or lack of rewards, which in turn leads to either greater levels of dopamine firing, or depressed levels of dopamine firing. Over time, dopamine firing begins to respond to the conditioned stimulus itself instead of the reward as it becomes more tightly linked to receiving the reward in the near future. This phenomenon is so strong and reliable across all species, it can even be observed in the humble sea slug Aplysia, which is one ugly sucker if I've ever seen one. Probably wouldn't stop her from cuddling up with that monstrosity, though!

Anyway, that only describes one form of learning - to wit, classical conditioning. (Do you think I am putting on airs when I use a phrase like "to wit"? She thinks that I do; but then again, she also has passionate, perverted predilections for cold-blooded wildlife.) Obviously, any animal in the food chain - even the ones she associates with - can be classically conditioned to almost anything. Much more interesting is operant conditioning, in which an individual has to make certain choices, or actions, and then evaluate the consequences of those choices. Kind of like hugging reptiles! Oh hey, she probably thinks, let's see if hugging this lizard - this pebbly-skinned, fork-tongued, unblinking beast - results in some kind of reward, like gold coins shooting out of my mouth. In operant conditioning parlance, the rush of gold coins flowing out of one's orifice would be a reinforcer, which increases the probability of that action in the future; while a negative event, such as being fatally bitten by the reptile - which pretty much any sane person would expect to happen - would be a punisher, which decreases the probability of that action in the future.

The classically conditioned responses, in other words, serve the function of a critic which monitors for stimuli and reliably-predicted reinforcers or punishers following those stimuli, while operant conditioning can be thought of as an actor role, where choices are made and the results evaluated against what was expected. Sutton and Barto, a pair of researchers considerably less sanguinary than Hodgkin and Huxley, were among the first to propose and refine this model, assigning the critic role to the ventral striatum and the actor role to the dorsal striatum. So, that's where they are; if you want to find the actor component of reinforcement learning, for example, just grab a flashlight and examine the dorsal striatum inside someone's skull, and, hey presto! there it is. I won't tell you what it looks like.

However, we can form some abstract idea about what the actor component looks like by simulating it in Matlab. No, just in case you were wondering, this won't help you hook up with Komodo Dragons! It will, however, refine our understanding of how reinforcement learning works, by building upon the classical conditioning architecture we discussed previously. In this case, weights are still updated, but now we have two actions to choose from, which results in four combinations: either one or the other, both at the same time, or neither. In this example, only doing action 1 will lead to a reward, and this gets learned right quick by the simulation. As before, a surface map of delta shows the reward signal being transferred from the actual reward itself to the action associated with that reward, and a plot of the vectors shows action 1 clearly dominating over action 2. The following code will help you visualize these plots, and see how tweaking parameters such as the discount factor and learning rate affect delta and the action weights. But it won't help you get those gold coins, will it?

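close all

numTrials = 200;
numSteps = 100;
weights = zeros(100,200); %Array of weights from steps 1-100, initialized to zero

discFactor = 0.995; %Discounting factor
learnRate = 0.3; %Learning Rate
delta = zeros(100,200); %Prediction error at each time step (rows) and trial (columns)
V = zeros(100,200); %Predicted value at each time step and trial
x = [zeros(1,19) ones(1,81)]; %Stimulus is present from time step 20 onward

r = zeros(100,200); %Reward vector, which will be populated with 1's whenever a reward occurs (in this case, when action1 == 1 and action2 == 0)

W1 = 0; %Action weights, initialized to zero
W2 = 0;
a1 = zeros(1,numTrials); %Whether each action was taken on each trial
a2 = zeros(1,numTrials);
w1Vect = zeros(1,numTrials); %Action weights at the end of each trial
w2Vect = zeros(1,numTrials);

for idx = 1:numTrials
    for t = 1:numSteps-1
        if t==20
            as1=x(t)*W1; %Compute action signals at time step 20 within each trial
            as2=x(t)*W2;
            ap1 =  exp(as1)/(exp(as1)+exp(as2)); %Softmax function to calculate probability associated with each action
            ap2 =  exp(as2)/(exp(as1)+exp(as2));
            n=rand; %Separate random draw for each action (assumed), so both or neither can occur
            if n<ap1
                a1(idx)=1;
            end
            n=rand;
            if n<ap2
                a2(idx)=1;
            end
            if a1(idx)==1 && a2(idx)==0 %Only deliver reward when action1 ==1 and action2 ==0
                r(50,idx)=1; %Reward time (step 50) is an assumed value
            end
        end

        V(t,idx) = x(t).*weights(t, idx);
        V(t+1,idx) = x(t+1).*weights(t+1, idx);
        delta(t+1,idx) = r(t+1,idx) + discFactor.*V(t+1,idx) - V(t,idx);
        weights(t, idx+1) = weights(t, idx)+learnRate.*x(t).*delta(t+1,idx);
        W1 = W1 + learnRate*delta(t+1,idx)*a1(idx);
        W2 = W2 + learnRate*delta(t+1,idx)*a2(idx);
    end
    w1Vect(idx) = W1;
    w2Vect(idx) = W2;
end

figure
set(gcf, 'renderer', 'zbuffer') %Can prevent crashes associated with surf command
surf(delta) %Surface map of the prediction error

figure
hold on
plot(w1Vect) %Action 1 weight across trials
plot(w2Vect, 'r') %Action 2 weight across trials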
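
To get a feel for what the softmax step is doing, here is a minimal sketch that plugs a few action weights into the same formula used for ap1 and ap2. The weight values are made up for illustration, not taken from the simulation, and the reward probability on the last line assumes that each action is decided by its own independent random draw.

```matlab
% Two-action softmax, using the same formula as ap1 and ap2 in the simulation.
% The weights below are illustrative assumptions, not simulation output.
W = [0 0; 1 0; 3 0];                      % each row is one case: [W1 W2]
expW = exp(W);
p = expW ./ repmat(sum(expW, 2), 1, 2);   % each row: [ap1 ap2], rows sum to 1

% Chance of "action 1 only" (the rewarded combination), assuming the two
% actions are sampled with independent random draws
pReward = p(:,1) .* (1 - p(:,2));

disp(p)
disp(pReward)
```

With equal weights each action is chosen half the time, so the rewarded combination comes up on only a quarter of trials; as W1 pulls ahead of W2, that fraction climbs toward one.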



Oh, and one more thing that gets my running tights in a twist - people who don't like Bach. Who the Heiligenstadt Testament doesn't like Bach? Philistines, pederasts, and pompous, nattering, Miley Cyrus-cunnilating nitwits, that's who! I get the impression that most people have this image of Bach as some bewigged fogey dithering around in a musty church somewhere improvising fugues on an organ, when in fact he wrote some of the most hot-blooded, sphincter-tightening, spiritually liberating music ever composed. He was also, clearly, one of the godfathers of modern metal; listen, for example, to the guitar riffs starting at 6:38.

...Now excuse me while I clean up some of the coins off the floor...

Friday, November 29, 2013

Master's Recital Music Videos

Since there are several haters out there who doubt that I can play piano, here, finally, is video evidence from a recent recital. In case you're confused, I'm the tall guy at the keyboard wearing all black.

Any mistakes, ensemble slipups, or counting errors are solely my fault, and in no way reflect on Sonja. (I said I could play; I didn't say anything about playing well.) Make sure to buy a bunch of hand lotion and Kleenex before listening to these masterpieces.

Thursday, November 28, 2013

Reminiscences of SFN 2013

In the two weeks since I returned from San Diego, I still find myself struck with lucid recollections during my hypnopompic states, the memories of my travels twinned with hallucinations no less beautiful and no less real, were I to try to distinguish the two.

When I close my eyes I can still feel the pitch-perfect weather against my skin; I can hear the purl of fountainwater underneath the lush gardens and exquisite statuary of Balboa Park; I can see those silvery, sunwarmed beaches, speckled with families and surfers and oceangoers of all stripes and ages, the terns overhead lazily riding the wind thermals and the surf below gently lapping at the shores. The man is not to be envied who does not find his spirit refreshed and invigorated by the scintillating waves of the ocean, his eros not aroused by the sight of brown-skinned beauties emerging from the sea with beads of saltwater clinging to their skin and their delicate pink toes sinking into the argentate sand; no, he is not to be envied who does not find some spark of religious awe kindled by the sight of the sun bleeding slowly into the horizon and replaced by the pale disc of the moon, pasted in that inky firmament like some ghostly wafer, overlooking the dark abyss of water out there past men's knowing, where stars are drowning and whales ferry their vast souls through the black and seamless sea.

It is well to be surrounded by such sights and sounds, as, set against the backdrop of a conference devoted to science, the conscious mind is all the more appreciative of the particulars and the practicalities of what he believes, secretly or openly, to be the healthiest, the most rigorous, the most downright of human endeavors - that of scientific inquiry. Aristotle once claimed that the twin peaks of human pleasure consist of one, sexual intercourse, and two, thinking; and once one has felt the slow-burning satisfaction of scientific experimentation, of hypotheses proposed and tested, of results surprising one in the most unexpected of ways, what right man would believe otherwise? And after a day of lively discussion and heated debate, after filling one's cup with as many poster sessions and workshops and talks as one can handle and drinking it to the lees, then one encounters the night; and, the mind still reeling from the heady fumes of science, the senses attuned to all the nuances that weren't there before, walking past the garish lighting of the restaurants and pubs of the Gaslamp district, brazen hussies with their sultry strolls and minimalist vesture calling out to each other in the darkness, steroid-inflated bouncers guarding the doors of nightclubs exuding faint rumbles of bass punctuated by shrieks and laughter - and it is here that one becomes aware of certain beauties and lusts and terrors and menace that until now were only thinly hidden. The juxtaposition of such different modes of experience makes each of them in turn that much more powerful, more savory, more piquant.

During my days at the conference inside the convention center, therefore, I expected all of the hobgoblins and ecstasies of the nights before to melt away like snow in sunshine; but even here there is an element of the surreal. Within the bowels of the convention center, several football fields long, were rows upon rows of posters, almost beyond reckoning; here is one person surrounded by intrigued colleagues, gesturing expressively with his hands, his face beaming; there is another over there who could not be more different, all alone, head down and sullenly gazing at the floor, one hand holding the opposite forearm, a perfect picture of dejection. Wind your way through the exhibition section where companies are hawking their wares, photos showing how results look before and after the application of their device, mechanical contraptions demonstrating how the latest stereotaxic equipment drills into any location without any error, no fuss, no muss. You would expect the vendors to be much more animated, to act like some sort of scientific carnival barker; but unfortunately, they sit around, this one checking her phone, that one with a saturnine expression pasted on his face and a toothpick affixed to the side of his mouth.

Upstairs through the pavilion, and enter the ballroom, where chairs are stacked in rows as neatly and ceremoniously as gravestones in a cemetery. Far away at the front of the ballroom a speaker is at the podium, a minuscule dot at this distance, but whose person is projected on several large screens hanging from the ceiling, the amount of exposure beyond the most egomaniacal totalitarian's wet dream. After a couple of hours of talks and results and diagrams of models, out the door again into the hallway, past several smaller conference rooms packed with listeners. Along my way I reach down to pick up a discarded pamphlet off the floor from the American Association for the Advancement of Science; inside, it laments that more than half of the United States population still believes in psychic phenomena such as ESP and seances. "Fifty-seven percent of American adults believe in phenomena unsupported by any evidence whatsoever," it says. "It would be better to get that number closer to zero." It then lists several resources and initiatives to educate the population to think scientifically. The younger the age at which they can stage an intervention, it seems, the better.

While I can appreciate the sentiment, part of me thinks that this feeling is misguided. I have several close colleagues who would be horrified to wake up to a world denuded of superstitions and myths; so satisfying is the sense of superiority they feel in mocking those who still hold groundless beliefs, and the repercussions so minimal, that to take that away from them would be to take away their chief joy. Conversations would dry up, bereft of the usual potent feelings of solidarity and indignation, and the only ties that used to bind them to other like-minded individuals would dissolve. No, clearly a world free of people believing crazy shit would be a catastrophe. I think that a more reasonable goal would be to get the number of persons down to about five or ten percent; that way, the Association can still claim no small measure of success in their crusade, and there will still be plenty of eccentrics left over to insult, belittle, and marginalize.

After making the rounds at all of the talks, I go back to the poster session, where new posters have been pinned up on the boards, some of them still reeking of hot ink fresh off the printer. Wander around, and you begin to notice how some individuals tend to dominate the conversation surrounding a poster and point out all the experimental flaws with a minimum of decency. To counteract this, I usually leave in one or two glaringly obvious errors in any poster I present or any paper I submit; that way, one can more easily comment on it and feel as though they have done something useful, usually leaving all of the other material alone.

That being said, however, still be aware that there is much research out there which is smoke and mirrors; having been in the game for quite some time, I can provide a short list of words and phrases that should immediately set off alarm bells in your head: neuroscience; significant; brain; rat; human; monkey; hypothesis; anterior cingulate; activation; voxels; cake; "game-changer"; default poop network. Beware the siren song of these words that charm the ear and bewitch the mind; they are beautiful but treacherous ondines who, given the chance, will wrap their briny arms around you, dragging you down to your death in the bottomless sea.

Saturday, November 9, 2013

Society for Neuroscience 2013!

Hey guys,

This afternoon, I will be leaving Indiana for temperate, sunwarmed San Diego! If you get a chance, be sure to stop by my poster and say hi!

Poster time: Monday, November 11th, 1:00pm-5:00pm
Poster board number: KKK25, Halls B-H

Friday, November 8, 2013

Master's Recital: Debussy, Bach, and Shostakovich

For those of you in Bloomington, I will be accompanying for a cellist's master's recital here at the Jacobs School of Music. The program features a sonata by Debussy so unbelievably colorful and vivid, your synesthesia will go haywire; some of the clearest, most soul-refreshing chamber music by Bach; and everyone's favorite, Shostakovich's colossal cello concerto no. 1, which, in classical music terms, is known as a bodice-ripper.

To bring you this, we've put in countless hours of arduous, painstaking practice, endured invective, censorship, and misunderstanding from the most trenchant of critics, and gone through long nights of rehearsal punctuated by hours of bickering and quarreling, torn sheet music and thrown metronomes - but always followed by tearful reconciliation. Has it been worth it? Heck yes. But then again, I'll let you all be the judge of that.

When: 7:00pm, Friday, November 8th
Where: Recital Hall, ground floor of Merrill Hall (1201 E 10th St)
Who: Andy Jahn, Piano; Sonja Kraus, Cello

===== Program =====

Debussy: Sonata for Cello and Piano
I. Prologue: Sostenuto e molto risoluto
II. Sérénade
III. Finale: Animé: Léger et nerveux

Bach: Gamba Sonata No. 1 in G Major for Cello and Piano
I. Adagio non troppo
II. Allegro, ma non tanto
III. Andante quasi Lento
IV. Allegro moderato

Shostakovich: Cello Concerto No. 1 in E-Flat Major, Op. 107
I. Allegretto
II. Moderato
III. Cadenza
IV. Allegro con moto

Thursday, November 7, 2013

Computational Models of Reinforcement Learning: An Introduction

The process of learning what is good for us, and what is bad for us, is incredibly complex; but the rudiments have been outlined, and we can gain some insight by starting with the basic building blocks of what is known as reinforcement learning. During reinforcement learning, we come to associate certain actions with specific outcomes - push one button and get a piece of cake; push another button, and receive a blast of voltage to your nipples. Through experience we begin to flesh out a mental picture of what decisions are likely to lead to certain events; and, through merely observing someone else, we can learn about what to do, or what not to do, even in the absence of reinforcers or punishers.

Before we get there, however, let's approach our subject from an even more basic form of learning - classical conditioning. In this case, no actions are needed; one merely observes a stimulus, such as a tone of a certain frequency, and learns that it predicts a specific outcome, such as the arrival of food. In this case, the tone is the conditioned stimulus, the food is the unconditioned stimulus, and salivating in response to the tone, after enough pairings between the tone and the food, becomes the conditioned response.

Let me give an example from my dating history. You may find this particular story I am about to relate to be way, way too much information; but if you've been reading for this long, I assume that we're on close enough terms that divulging such graphic details of my personal life will, far from driving us apart, bring us closer together by allowing us to bond over our shared humanity.

So. Onions. I am - or I used to be - indifferent to them. All I could say about them was that they had a smooth, eely texture when fried in oil; that they released a pungent aroma when sliced, diced, and crushed; and that their flavor was particularly sharp. Other than that, I had nothing else to say about them. Onions were onions.

But one day - never mind when - I began to see a girl who absolutely loved onions. Onions were inseparable from any dish she made; and so close was the association between her mood and the amount of onions she put into her cooking - casseroles, curries, tartlets, you name it - that, were I to witness her eating an entire onion in the raw, I would assume her to be in the seventh heaven.

My little onion, I used to call her, as a sign of my undying affection; and whenever we made love, we would first scatter onion shavings upon the bed, or the grass, or the movie theater seat, as a ritual to consecrate the beautiful, sacred act that was about to be made manifest. And when she would part the pillowy gates of her mouth and cleave her lips to mine, that pregnant moment filled with an anticipation so poignant you could hardly bear it, I would inhale deeply, feeling the overpowering, acrid smell of onions run over every single one of my nose hairs, driving my olfactory bulbs insensate with desire.

"Darling, do you love me?" she would ask, breathing heavily, the odorous waves of onion wafting across the thin slit of air between us and mooring within my nostrils.

"Yes, my little onion," I would reply. "Yes; yes; a thousand times yes!"

Such was our love, then; and you can hardly imagine my shock and desolation when, several years into this relationship of onions and unadulterated bliss, some knave, jealous of our happiness no doubt, took it upon himself to poison one of her onions, and kill her! My bereavement was only slightly assuaged by the fact that she had, only a few days before, taken out an extremely lucrative life insurance policy, having named me as the sole beneficiary.

After three painful, soul-searching days of mourning, however, I eventually gained the strength to renew my courtships with several other desirable young ladies. Yet, while throughout this period I continued to seduce innumerable women and live a Byronesque lifestyle of aristocratic excess, I couldn't help feeling some conspicuous lack, some defect in any affair, any tryst I willingly thrust myself into. At first I blamed the girls themselves: this one with long, lithe arms, but perhaps a shade too willowy; this one with a bold, intriguing personality, but perhaps a bit too pert for my taste; and yet another, sloe-eyed, with beautiful brown irises, but which, upon closer inspection, revealed the slightest of discrepancies in the size of one pupil compared to the other. Not having taken myself for a very discriminating fellow before my relationship with Mary, that light of my life, that fire of my loins - in other words, that onion chick I was talking about earlier, in case you couldn't tell - I found myself at a total loss.

While ruminating over my sudden change in amorous tastes, one day I found myself absentmindedly skimming the menu at a local bistro; and then - mirabile dictu! - I saw the item French onion soup inconspicuously nestled under the Appetizers section. Feeling my pulse quicken, I followed my instincts and ordered the soup, aware that I had hit upon the answer to my problem. Soon after, a disembodied hand placed the soup in front of me; and, slowly, meaningfully, I gazed down into the thick brown liquid. I braced myself, inhaled deeply, and somewhere in my brain a key unlocked the overflowing warehouse of my desire. Memories came flooding back; memories of Mary; memories of onion; and, most of all, memories of that pungent, acidic smell crushed out from the shavings underneath our bodies.

Having solved the puzzle, I now embark on a new chapter of my life; and nowhere do I go now without my peeler, and without my paring knife!

This story wonderfully illustrates some of the key components of classical conditioning. First, an unconditioned stimulus - Mary - elicited an unconditioned response from me - feelings of arousal. Because of Mary's repeated pairings with onions, the onions became a conditioned stimulus that signified an upcoming session of especially gratifying hanky-panky, and eventually by themselves elicited the conditioned response of arousal.

In psychological terms, this process of learning is called the "critic" part of learning; a stimulus signifies some kind of upcoming reward, and over time a person learns this association, eventually beginning to shift their usual feelings of pleasure and excitement from the reward itself to the stimulus signifying the reward. The critic evaluates how reliable the association is, and, depending on the individual, associations can be learned relatively slowly, or relatively quickly.

Let's focus on a landmark Science paper by Schultz, Dayan, & Montague (1997). This paper mathematically modeled different phases of reinforcement learning, and outlined several equations that can simulate how much an organism will respond to the conditioned stimulus and to the reward itself. The following Matlab code implements equations 3 and 4 from the paper, using 200 trials and 100 timesteps within each trial. The weights are updated on each trial, and the prediction error, represented by delta, will shift progressively earlier over trials, moving away from the time of the reward and toward the time of the presentation of the conditioned stimulus. Note in the following figure from the Schultz et al. paper that when an organism has been conditioned to expect a reward at a certain time, the omission of that reward will lead to a large negative deflection in the prediction error signal.

Similar surface maps can be generated using the following code; I suggest adjusting the learning rate and discount factor parameters, as well as the timing of the reward, to see how they affect the prediction error signal. Building up this intuition will be critical in understanding more advanced models of reinforcement learning, in which outcomes are contingent upon particular actions. And don't forget to keep eating those onions!
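
For reference, the update carried out inside the loop of the code below can be written compactly. The notation is an assumption chosen to mirror the variable names in the code (gamma for discFactor, alpha for learnRate, n for the trial index idx) and is not quoted from the paper:

```latex
\begin{aligned}
V_n(t) &= x(t)\, w_n(t) \\
\delta_n(t+1) &= r_n(t+1) + \gamma\, V_n(t+1) - V_n(t) \\
w_{n+1}(t) &= w_n(t) + \alpha\, x(t)\, \delta_n(t+1)
\end{aligned}
```

Read top to bottom: the prediction at each time step is the stimulus times its weight; delta is the reward plus the discounted next prediction minus the current prediction; and each weight is nudged in the direction of delta, but only at time steps where the stimulus is present.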

numTrials = 200;
numSteps = 100;
weights = zeros(100,200); %Array of weights from steps 1-100, initialized to zero

discFactor = 0.995; %Discounting factor
learnRate = 0.3; %Learning Rate
delta = zeros(100,200); %Prediction error at each time step (rows) and trial (columns)
V = zeros(100,200); %Predicted value, i.e. discounted sum of all future rewards
x = [zeros(1,19) ones(1,81)]; %Presentation of conditioning stimulus
r = zeros(100,200); %Reward
r(50,1:190) = 1; %Assumed reward schedule: reward at step 50, omitted on the last 10 trials

for idx = 1:numTrials
    for t = 1:numSteps-1
        V(t,idx) = x(t).*weights(t, idx);
        V(t+1,idx) = x(t+1).*weights(t+1, idx);
        delta(t+1,idx) = r(t+1,idx) + discFactor.*V(t+1,idx) - V(t,idx);
        weights(t, idx+1) = weights(t, idx)+learnRate.*x(t).*delta(t+1,idx);
    end
end

figure
surf(delta) %Surface map of the prediction error
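
The point about reward omission can be checked with a small variation on the same update rule. What follows is a minimal sketch, not the code from the paper: it trains on rewarded trials, then runs one probe trial with the reward withheld. The number of training trials (numTrain), the reward time (rewardAt), and the per-trial vectorization are my own assumptions for illustration; the stimulus, discount factor, and learning rate match the values used above.

```matlab
% Reward omission sketch: train with reward, then probe without it.
numSteps   = 100;
numTrain   = 150;                 % assumed number of rewarded training trials
rewardAt   = 50;                  % assumed time step of reward delivery
discFactor = 0.995;               % discounting factor
learnRate  = 0.3;                 % learning rate
x = [zeros(1,19) ones(1,81)];     % stimulus present from step 20 onward
w = zeros(1, numSteps);           % one weight per time step

rTrain = zeros(1, numSteps);
rTrain(rewardAt) = 1;

for idx = 1:numTrain
    V = x .* w;                   % predictions from this trial's weights
    % delta(t+1) = r(t+1) + discFactor*V(t+1) - V(t), for t = 1..numSteps-1
    delta = [0, rTrain(2:end) + discFactor .* V(2:end) - V(1:end-1)];
    % w(t) is updated by delta(t+1), only where the stimulus is present
    w(1:end-1) = w(1:end-1) + learnRate .* x(1:end-1) .* delta(2:end);
end

% Probe trial: same stimulus, no reward delivered
V = x .* w;
deltaProbe = [0, discFactor .* V(2:end) - V(1:end-1)];

fprintf('delta at stimulus onset (step 20): %.3f\n', deltaProbe(20));
fprintf('delta at omitted reward (step %d): %.3f\n', rewardAt, deltaProbe(rewardAt));
```

After training, the probe trial should show a positive delta at stimulus onset and a strongly negative delta at the moment the expected reward fails to arrive.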


Friday, November 1, 2013

Indianapolis Monumental Marathon: Participant Tracking

Fellow brainbloggers, cognitive neuroscientists, and endurance sport enthusiasts,

Tomorrow I will be competing in the Indianapolis Monumental Marathon. Conditions at start time are low forties and partly cloudy, with a high chance of lactic acid, extreme muscle fatigue, and reduced sperm count. My goals: break the two-hours-and-thirty-minutes barrier; place in the top ten; and bring the heat on all these pasty-faced, weekend-warrior poltroons and send y'all back to Ireland.

If you would like to get updates on my race, simply sign up here and enter my name (First name: Andrew; Last name: Jahn). In a way, it's just like watching a horror film, or witnessing a particularly nasty breakup - all of the thrill, none of the danger. The things I do for you guys.