A decades-old model of learning is under fire, with implications for AI.
The buzz of a notification or the ding of an email might inspire excitement - or dread.
In a famous experiment, Ivan Pavlov showed that dogs can be taught to salivate at the tick of a metronome or the sound of a harmonium.
This connection of cause to effect - known as associative, or reinforcement, learning - is central to how most animals deal with the world.
Since the early 1970s the dominant theory of what is going on has been that animals learn by trial and error.
Associating a cue (a metronome) with a reward (food) happens as follows.
When a cue comes, the animal predicts when the reward will occur.
Then, it waits to see what arrives.
After that, it computes the difference between prediction and result - the error.
Finally, it uses that error estimate to update its expectations, so as to make better predictions in future.
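In computational terms, this is the classic reward-prediction-error update. Below is a minimal sketch in Python of one such rule (a Rescorla-Wagner-style update); the function names, learning rate and reward values are illustrative assumptions, not code from any of the studies discussed.

```python
# A toy sketch of prediction-error learning (a Rescorla-Wagner-style rule).
# All names and numbers here are illustrative assumptions.

def update(prediction, reward, learning_rate=0.1):
    """Return an updated reward prediction after one cue-reward trial."""
    error = reward - prediction                # the prediction error
    return prediction + learning_rate * error  # use the error to improve the guess

prediction = 0.0  # at first, the cue (a metronome tick) predicts nothing
for trial in range(1, 21):  # repeated metronome-then-food pairings
    prediction = update(prediction, reward=1.0)
    print(f"trial {trial:2d}: prediction = {prediction:.3f}")
```

Run repeatedly, the prediction climbs towards the true value of the reward and the error shrinks towards zero - which is what learning by trial and error amounts to in this framework.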
Belief in this approach was itself reinforced in the late 20th century by two things.
One of these was the discovery that it is also good at solving engineering problems related to artificial intelligence (AI).
Deep neural networks learn by minimizing the error in their predictions.
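The parallel can be made concrete with a deliberately tiny stand-in for a deep network: a one-weight model trained by gradient descent on its squared prediction error. The data and parameters below are assumptions chosen purely for illustration.

```python
# A minimal sketch of learning by error minimization, as in neural networks.
# A one-weight linear model stands in for a deep network; values are illustrative.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs x and targets y (here y = 2x)
weight = 0.0
learning_rate = 0.05

for epoch in range(50):
    for x, y in data:
        error = weight * x - y               # the prediction error
        weight -= learning_rate * error * x  # gradient step on the squared error

print(f"learned weight = {weight:.3f}")      # approaches 2.0 as the error shrinks
```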
The other reinforcing observation was a paper published in Science in 1997.
It noted that fluctuations in the brain's levels of dopamine, a chemical which carries signals between some nerve cells and which was already known to be associated with the experience of reward, looked like prediction-error signals.
Dopamine-generating cells are more active when the reward comes sooner than expected or is not expected at all, and are inhibited when the reward comes later or not at all - precisely what would happen if they were indeed such signals.
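The correspondence can be shown with a toy calculation (the scenarios and numbers below are assumed for illustration; this is not the 1997 paper's analysis). The sign of the error matches the direction of the dopamine response:

```python
# A toy illustration of why dopamine fluctuations resemble prediction errors:
# the error is simply the reward received minus the reward predicted.

def prediction_error(predicted, received):
    return received - predicted

scenarios = {
    "unexpected reward":       prediction_error(predicted=0.0, received=1.0),  # positive: cells fire more
    "fully expected reward":   prediction_error(predicted=1.0, received=1.0),  # zero: no change
    "expected reward omitted": prediction_error(predicted=1.0, received=0.0),  # negative: cells are inhibited
}
for name, delta in scenarios.items():
    print(f"{name}: error = {delta:+.1f}")
```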
A nice story, then, of how science works.
But if a new paper, also published in Science, turns out to be correct, that story is wrong.
Researchers have known for a while that some aspects of dopamine activity are inconsistent with the prediction-error model.
But, in part because it works so well for training artificial agents, these problems have been swept under the carpet.
Until now.
The new study, by Huijeong Jeong and Vijay Namboodiri of the University of California, San Francisco, and a team of collaborators, has turned the world of neuroscience on its head.
It proposes a model of associative learning which suggests that researchers have got things backwards.
Their suggestion, moreover, is supported by an array of experiments.
The old model looks forward, associating cause with effect. The new one does the opposite: it associates effect with cause.
They think that when an animal receives a reward (or punishment), it looks back through its memory to work out what might have prompted this event.
Dopamine's role in the model is to flag events meaningful enough to act as causes for possible future rewards or punishments.
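To make the contrast concrete, here is a toy sketch of retrospective, effect-to-cause credit assignment: when a reward arrives, the agent looks back through a short memory of recent events and strengthens each one's standing as a candidate cause, while events that lead nowhere lose standing. Every name, window size and rate below is an assumption for illustration; the paper's actual model is considerably more sophisticated.

```python
# A toy sketch of retrospective (effect-to-cause) learning. When a reward
# arrives, look back through recent memory and credit candidate causes;
# when nothing arrives, discount whatever is in memory. Illustrative only.

import random
from collections import deque

random.seed(1)
RATE = 0.3
memory = deque(maxlen=3)  # the recent events the animal can look back over
strength = {}             # event -> learned causal weight

def observe(event):
    """A moment passes without reward: events in memory lose standing."""
    memory.append(event)
    for e in memory:
        strength[e] = strength.get(e, 0.0) * (1 - RATE)

def reward():
    """A reward arrives: look back and credit every event still in memory."""
    for e in memory:
        strength[e] = strength.get(e, 0.0) + RATE * (1 - strength.get(e, 0.0))

# The metronome is reliably followed by food; incidental events are not.
for _ in range(200):
    observe(random.choice(["grooming", "sniffing", "silence"]))
    if random.random() < 0.3:
        observe("metronome")
        reward()  # food arrives just after the metronome

for event, s in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(f"{event}: {s:.2f}")  # expected: the metronome tops the list
```

Because the metronome is always in memory when food arrives, while grooming and the rest are there only by coincidence, the metronome accumulates the largest causal weight - the effect has been traced back to its cause.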