西西河

Topic: AlphaGo and F-35 -- 晨枫

A Different View

You said: "Since the new approach (policy net + value net) is fundamentally different from the traditional purely Monte-Carlo-based approach..."

In fact, today's strongest Go programs (such as Zen) are not "purely Monte-Carlo-based". As the abstract of the AlphaGo paper puts it,

The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves. These policies are used to narrow the search to a beam of high-probability actions, and to sample actions during rollouts. This approach has achieved strong amateur play. However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.

...

We use these neural networks to reduce the effective depth and breadth of the search tree: evaluating positions using a value network, and sampling actions using a policy network.

So AlphaGo can be seen as a filtering and pruning optimization of this MCTS-based approach. From the MCTS point of view, the "policy and value networks" AlphaGo uses play the same role as the "shallow policies or value functions based on a linear combination of input features" used by current Go programs; there is no fundamental difference.
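To make the point concrete, here is a minimal sketch of how an MCTS search can be guided by a policy prior (narrowing breadth) and a leaf value estimate (cutting depth), in the spirit of the quoted abstract. The toy game, the uniform `policy_stub`, and the random `value_stub` are hypothetical stand-ins for illustration only; they are not AlphaGo's networks, and the selection rule is a generic PUCT-style formula rather than AlphaGo's exact one.

```python
import math
import random

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a): prior from the policy (stub here)
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def policy_stub(state, actions):
    # Stand-in for a policy network: uniform prior over legal actions.
    # A learned policy would concentrate mass on a few good moves,
    # which is what narrows the search to a "beam" of actions.
    p = 1.0 / len(actions)
    return {a: p for a in actions}

def value_stub(state):
    # Stand-in for a value network: evaluates a leaf directly instead of
    # running a full rollout, reducing the effective search depth.
    return random.uniform(-1.0, 1.0)

def select_child(node, c_puct=1.0):
    # Generic PUCT-style rule: the prior term steers exploration toward
    # high-probability actions; the value term exploits good averages.
    total = math.sqrt(node.visits)
    def score(item):
        _, child = item
        u = c_puct * child.prior * total / (1 + child.visits)
        return child.value() + u
    return max(node.children.items(), key=score)

def mcts(root_state, legal_actions, step, num_simulations=200):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # Selection: descend until a leaf, following the PUCT rule.
        while node.children:
            a, node = select_child(node)
            state = step(state, a)
            path.append(node)
        # Expansion: attach children with policy priors.
        for a, p in policy_stub(state, legal_actions).items():
            node.children[a] = Node(prior=p)
        # Evaluation: value estimate replaces the random rollout.
        v = value_stub(state)
        # Backup: propagate the value along the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += v
    # Play the most-visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Swapping `policy_stub`/`value_stub` for shallow linear models or for deep networks changes the quality of the priors and evaluations, but not the structure of the search, which is the point being made above.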

You said: "So when facing an opponent like AlphaGo that doesn't make mistakes, 'ko fights' may be pseudo-moves that bring humans no real benefit, and instead only increase one's own chance of blundering."

The question is: who can be sure AlphaGo doesn't make mistakes? In fact, Fan Hui won two of the five "informal" games he played against it; Google simply never released those game records, so we don't know what mistakes AlphaGo made.


Copyright © cchere 西西河