I trained the model using only 1 reward (horizontal_position, vertical_position) each. Circle Shape is the agent. And encoder-decoders are trained on the rewards based on target's position. As you see ...
一些您可能无法访问的结果已被隐去。
显示无法访问的结果一些您可能无法访问的结果已被隐去。
显示无法访问的结果