Error when using the Baselines library on a custom Gym environment (gym-torcs)

Time: 2018-08-11 14:28:29

Tags: python-3.5 reinforcement-learning openai-gym

I am quite new to OpenAI environments. I am using https://github.com/ugo-nama-kun/gym_torcs/tree/master/vtorcs-RL-color to try out different reinforcement learning agents.

I wrote my own REINFORCE and GPOMDP agents, in which I first create the environment with

 env = TorcsEnv(vision=vision, throttle=False)

and then call env.reset(), env.step(), and so on. Everything works, and training starts without problems.
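
The reset/step pattern referred to here can be sketched roughly as follows. This is only an illustration: `StubEnv` and `run_episode` are hypothetical names, the stub stands in for the real `TorcsEnv` (which needs a running TORCS instance), and the four-value return of `step` is assumed to follow the classic Gym convention.

```python
import random


class StubEnv:
    """Hypothetical stand-in exposing the classic Gym reset/step interface."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        # Start a new episode and return a dummy observation.
        self.t = 0
        return [0.0, 0.0]

    def step(self, action):
        # Advance one time step; the episode ends after episode_length steps.
        self.t += 1
        ob = [float(self.t), float(action)]
        reward = 1.0
        done = self.t >= self.episode_length
        return ob, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the undiscounted sum of rewards."""
    ob = env.reset()
    total_reward, done = 0.0, False
    while not done:
        ob, reward, done, _ = env.step(policy(ob))
        total_reward += reward
    return total_reward


# Random steering command in [-1, 1] as a placeholder policy.
episode_return = run_episode(StubEnv(), lambda ob: random.uniform(-1.0, 1.0))
print(episode_return)
```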

Now I would like to try the Baselines library (https://github.com/openai/baselines) with this gym-torcs environment. Taking https://github.com/openai/baselines/blob/master/baselines/trpo_mpi/run_mujoco.py as a starting point, I replaced

 env = make_mujoco_env(env_id, workerseed)

with

 env = TorcsEnv(vision=vision, throttle=False)

TORCS starts correctly, but at the point where the car should start driving I get the following error:

Traceback (most recent call last):
  File "myAgent.py", line 39, in <module>
    main()
  File "myAgent.py", line 35, in main
    train(args.env, num_timesteps=args.num_timesteps, seed=args.seed)
  File "myAgent.py", line 30, in train
    max_timesteps=1000, gamma=0.99, lam=0.98, vf_iters=5, vf_stepsize=1e-3)
  File "/usr/src/baselines/baselines/trpo_mpi/trpo_mpi.py", line 199, in learn
    seg = seg_gen.__next__()
  File "/usr/src/baselines/baselines/trpo_mpi/trpo_mpi.py", line 36, in traj_segment_generator
    ac, vpred = pi.act(stochastic, ob)
  File "/usr/src/baselines/baselines/ppo1/mlp_policy.py", line 54, in act
    ac1, vpred1 =  self._act(stochastic, ob[None])
  File "/usr/src/baselines/baselines/common/tf_util.py", line 194, in __call__
    results = tf.get_default_session().run(self.outputs_update, feed_dict=feed_dict)[:-1]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1104, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/home/nicolobrunello/.local/lib/python3.5/site-packages/numpy/core/numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
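
The last frame shows the error coming from np.asarray being asked to build a numeric array from the fed value. The sketch below reproduces the same ValueError outside TensorFlow. It rests on an assumption the traceback does not confirm: that the observation reaching pi.act mixes scalars and vectors of different lengths. The sample fields and the helpers `reproduces_error` and `flatten_observation` are hypothetical, not part of Baselines or gym-torcs.

```python
import numpy as np

# Hypothetical observation: fields of different lengths, as a sensor-style
# environment might return. Names and sizes are illustrative only.
ragged_ob = [
    np.zeros(19, dtype=np.float32),  # e.g. a vector of range-finder readings
    0.5,                             # e.g. a single scalar reading
    np.zeros(4, dtype=np.float32),   # e.g. one value per wheel
]


def reproduces_error(ob):
    """Return True if converting ob to a float array raises ValueError."""
    try:
        np.asarray(ob, dtype=np.float32)
    except ValueError:
        return True
    return False


def flatten_observation(ob):
    """Concatenate every field into one flat float32 vector."""
    parts = [
        np.atleast_1d(np.asarray(field, dtype=np.float32)).ravel()
        for field in ob
    ]
    return np.concatenate(parts)


flat_ob = flatten_observation(ragged_ob)
print(reproduces_error(ragged_ob), flat_ob.shape)
```

If that assumption holds, one possible direction would be to flatten each observation into a single fixed-length vector before it reaches the policy, and to make the environment's declared observation space match that length.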

Does anyone know how I should integrate Baselines with gym-torcs?

P.S.: I am using Python 3.5.2 on 64-bit Ubuntu 16.04.4.

0 answers:

No answers