Distributional value coding and reinforcement learning in the brain
Basic Information
- Grant number: 10539251
- Principal investigator:
- Amount: $34,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2021
- Funding country: United States
- Project period: 2021-08-01 to 2024-07-31
- Project status: Completed
- Source:
- Keywords: Affect, Algorithms, Anatomy, Animals, Anxiety, Architecture, Behavior, Behavioral, Bernoulli Distribution, Bipolar Disorder, Brain, Brain region, Calcium, Cells, Code, Collaborations, Color, Complex, Corpus striatum structure, Dopamine, Dopamine D2 Receptor, Environment, Event, Future, Gambling, Grain, Heterogeneity, Image, Individual, Laboratories, Lead, Learning, Logic, Machine Learning, Measures, Mental Depression, Mental disorders, Modification, Molecular, Mus, Nature, Neurons, Odors, Outcome, Pattern, Performance, Population, Population Theory, Predictive Value, Probability, Psychological reinforcement, RL13, Resolution, Rewards, Role, Scheme, Shapes, Silicon, Stimulus, Structure, System, Tail, Testing, Ventral Striatum, Ventral Tegmental Area, Water, Work, addiction, burden of illness, cell type, classical conditioning, density, design, dopamine system, dopaminergic neuron, drug of abuse, improved, in vivo, in vivo two-photon imaging, maladaptive behavior, neural circuit, neuromechanism, optimism, receptor, relating to nervous system, theories, two photon microscopy, two-photon
Project Summary
ABSTRACT
Making predictions about future rewards in the environment, and taking actions to obtain those rewards, is critical
for survival. When these predictions are overly optimistic — for example, in the case of gambling addiction — or
overly pessimistic — as in anxiety and depression — maladaptive behavior can result and present a significant
disease burden. A fundamental challenge for making reward predictions is that the world is inherently stochastic,
and events on the tails of a distribution need not reflect the average. Therefore, it may be useful to predict not
only the mean, but also the complete probability distribution of upcoming rewards. Indeed, recent advances in
machine learning have demonstrated that making this shift from the average reward to the complete reward
distribution can dramatically improve performance in complex task domains. Despite its apparent complexity,
such “distributional reinforcement learning” can be achieved computationally with a remarkably simple and
biologically plausible learning rule. A recent study found that the structure of dopamine neuron activity may be
consistent with distributional reinforcement learning, but it is unknown whether additional neuronal circuitry is
involved — most notably the ventral striatum (VS) and orbitofrontal cortex (OFC), both of which receive dopamine
input and are thought to represent anticipated reward, also called “value”. Here, we propose to investigate
whether value coding in these downstream regions is consistent with distributional reinforcement learning. In
particular, we will record from these brain regions while mice perform classical conditioning with odors and water
rewards. In the first task, we will hold the mean reward constant while changing the reward variance or higher-
order moments, and ask whether neurons in the VS and OFC represent information over and above the mean,
consistent with distributional reinforcement learning. In principle, this should enable us to decode the complete
reward distribution purely from neural activity. In the second task, we will present mice with a panel of odors
predicting the same reward amount with differing probabilities. The simplicity of these Bernoulli distributions will
allow us to compare longstanding theories of population coding in the brain — that is, how probability distributions
can be instantiated in neural activity to guide behavior. In addition to high-density silicon probe recordings, we
will perform two-photon calcium imaging in these tasks to assess whether genetically and molecularly distinct
subpopulations of neurons in the striatum contribute differentially to distributional reinforcement learning. Finally,
we will combine these recordings with simultaneous imaging of dopamine dynamics in the striatum to ask how
dopamine affects striatal activity in vivo. Together, these studies will help clarify dopamine’s role in learning
distributions of reward, as well as its dysregulation in addiction, anxiety, depression, and bipolar disorder.
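The "remarkably simple and biologically plausible learning rule" referenced in the abstract can be illustrated with a minimal sketch of distributional TD learning with asymmetric learning rates, under which each unit converges to a different expectile of the reward distribution rather than its mean. This is a hypothetical toy model for illustration (the function and parameter names are not from the project itself):

```python
import random

def distributional_td(rewards, n_units=5, lr=0.02, n_steps=50000, seed=0):
    """Toy distributional TD rule: each unit i has its own value estimate,
    updated with asymmetric learning rates set by tau_i, so "optimistic"
    units (large tau) weight positive prediction errors more heavily."""
    rng = random.Random(seed)
    # Asymmetry parameters tau in (0, 1), one per simulated unit.
    taus = [(i + 1) / (n_units + 1) for i in range(n_units)]
    values = [0.0] * n_units
    for _ in range(n_steps):
        r = rng.choice(rewards)          # sample a reward outcome
        for i, tau in enumerate(taus):
            delta = r - values[i]        # reward prediction error
            alpha = tau if delta > 0 else (1.0 - tau)
            values[i] += lr * alpha * delta
    return values

# Example: a Bernoulli cue delivering reward 1 with probability 0.25,
# as in the second proposed task. Pessimistic units settle below the
# mean of 0.25 and optimistic units above it, so the population as a
# whole carries the shape of the distribution, not just its mean.
estimates = distributional_td([1, 0, 0, 0])
```

Because the fan-out of these estimates depends on the spread of the reward distribution, holding the mean fixed while varying higher-order moments (as in the first proposed task) changes the population pattern, which is exactly the signature the recordings would test for.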
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Adam Stanley Lowet
Other Grants by Adam Stanley Lowet
Distributional Value Coding and Reinforcement Learning in the Brain
- Grant number: 10668487
- Fiscal year: 2021
- Funding amount: $34,500
- Project category:
Distributional value coding and reinforcement learning in the brain
- Grant number: 10311130
- Fiscal year: 2021
- Funding amount: $34,500
- Project category:
Similar National Natural Science Foundation of China (NSFC) Grants
Convex relaxation and high- and low-order accelerated algorithms for distributed nonconvex, nonsmooth optimization problems
- Grant number: 12371308
- Year approved: 2023
- Funding amount: ¥435,000
- Project category: General Program
Design and hardware implementation of ensemble learning algorithms under resource constraints
- Grant number: 62372198
- Year approved: 2023
- Funding amount: ¥500,000
- Project category: General Program
Fast electromagnetic field algorithms based on physics-informed neural networks
- Grant number: 52377005
- Year approved: 2023
- Funding amount: ¥520,000
- Project category: General Program
SPH models and efficient algorithms for deformation and flow of saturated sand considering pile-soil-water coupling effects
- Grant number: 12302257
- Year approved: 2023
- Funding amount: ¥300,000
- Project category: Young Scientists Fund
Ensemble classification algorithms for high-dimensional imbalanced data
- Grant number: 62306119
- Year approved: 2023
- Funding amount: ¥300,000
- Project category: Young Scientists Fund
Similar Overseas Grants
Fluency from Flesh to Filament: Collation, Representation, and Analysis of Multi-Scale Neuroimaging data to Characterize and Diagnose Alzheimer's Disease
- Grant number: 10462257
- Fiscal year: 2023
- Funding amount: $34,500
- Project category:
Charge-Based Brain Modeling Engine with Boundary Element Fast Multipole Method
- Grant number: 10735946
- Fiscal year: 2023
- Funding amount: $34,500
- Project category:
Exploratory Analysis Tools for Developmental Studies of Brain Microstructure with Diffusion MRI
- Grant number: 10645844
- Fiscal year: 2023
- Funding amount: $34,500
- Project category:
Dynamic neural coding of spectro-temporal sound features during free movement
- Grant number: 10656110
- Fiscal year: 2023
- Funding amount: $34,500
- Project category:
In vivo feasibility of a smart needle ablation treatment for liver cancer
- Grant number: 10699190
- Fiscal year: 2023
- Funding amount: $34,500
- Project category: