Reinforcement Learning Toolbox in MATLAB - multiple discrete actions

Time: 2019-05-23 15:26:33

Tags: matlab simulink reinforcement-learning

I would like to use a DQN agent where I have multiple continuous states (or observations) and two action signals, each of which has three possible values, for 9 combinations in total. See the following lines for what I mean:

a = [-2,0,2];
b = [-3,0,3];
[A,B]   = meshgrid(a,b);
% Each row of the resulting 9x2 matrix is one (a,b) combination:
actions = reshape(cat(2,A',B'),[],2);

To create the discrete action space, the matrix then has to be converted into a cell array and the following command run:

actionInfo = rlFiniteSetSpec(num2cell(actions,2));
actionInfo.Name = 'actions';
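For reference, the resulting spec can be inspected as follows (Elements is a property of rlFiniteSetSpec, and each element here is a 1x2 row vector):

% Sanity check of the action space:
numel(actionInfo.Elements)     % 9 discrete actions
size(actionInfo.Elements{1})   % [1 2], one row of the actions matrix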

Furthermore, a DQN agent has a critic, which contains a deep neural network. I created the critic as follows:

% Create a DNN for the critic:
hiddenLayerSize = 48; 
observationPath = [
    imageInputLayer([numObs 1 1],'Normalization','none',...
    'Name','observation')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticStateFC1')
    reluLayer('Name','CriticReLu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticStateFC2')
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonReLu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticCommonFC1')
    reluLayer('Name','CriticCommonReLu2')
    fullyConnectedLayer(1,'Name','CriticOutput')];
actionPath = [
    imageInputLayer([value 1 1],'Normalization','none','Name','action')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC1')];
% Create the layerGraph:
criticNetwork = layerGraph(observationPath);
criticNetwork = addLayers(criticNetwork,actionPath);
% Connect actionPath to observationPath:
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% Specify options for the critic representation:
criticOpts = rlRepresentationOptions('LearnRate',1e-03,...
    'GradientThreshold',1,'UseDevice','gpu');
% Create the critic representation using the specified DNN and options:
critic = rlRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOpts);
% Set the desired options for the agent:
agentOptions = rlDQNAgentOptions(...
    'SampleTime',dt,...
    'UseDoubleDQN',true,...
    'TargetSmoothFactor',1e-3,...
    'DiscountFactor',0.99,...
    'ExperienceBufferLength',1e7,...
    'MiniBatchSize',128);

My problem is the first image input layer of the action path, imageInputLayer([value 1 1],'Normalization','none','Name','action'). I have tried 1, 2, 9, and 18 for value, but every one of them leads to an error when I run

agent = rlDQNAgent(critic,agentOptions);

This is because actionInfo contains a cell array of 9 elements, each of which is a double vector of dimension [1,2], whereas the imageInputLayer has dimension [value,1,1].
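Given that, one thing I considered (an untested sketch; I am not sure whether the toolbox expects the element dimension as [1 2 1] or [2 1 1] in this release) is matching the action input layer to the 1x2 element dimension instead of a scalar value:

% Untested sketch: size the action input layer to the [1,2] action elements
actionPath = [
    imageInputLayer([1 2 1],'Normalization','none','Name','action')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticActionFC1')];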

So how can I set up a DQN agent in MATLAB with two main discrete action signals, each with three possible values? The agent is meant to be used with a Simulink environment, so I am also not sure how the Simulink RL Agent block reacts to two outputs.
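One alternative I am aware of (an untested sketch; it assumes a release with multi-output Q-value critics, i.e. rlQValueRepresentation from R2019b on) is to drop the action path entirely and let the critic output one Q-value per discrete action:

% Untested sketch: multi-output critic, one Q-value per discrete action
numActions = numel(actionInfo.Elements);   % 9
qNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticFC1')
    reluLayer('Name','CriticReLu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','CriticFC2')
    reluLayer('Name','CriticReLu2')
    fullyConnectedLayer(numActions,'Name','CriticOutput')];
critic = rlQValueRepresentation(layerGraph(qNetwork),observationInfo,...
    actionInfo,'Observation',{'observation'},criticOpts);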

Do I need to return a single index vector instead and map it to the correct action matrix with a separate function?
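If so, a minimal sketch of what I have in mind (the function name mapAction and the port names are hypothetical) would be to define actionInfo = rlFiniteSetSpec(1:9) and place a MATLAB Function block between the RL Agent block and the plant:

function [u1,u2] = mapAction(idx)
% Hypothetical MATLAB Function block: maps the scalar action index (1..9)
% from the RL Agent block to the two physical action signals.
actions = [-2 -3; 0 -3; 2 -3; -2 0; 0 0; 2 0; -2 3; 0 3; 2 3];
u1 = actions(idx,1);
u2 = actions(idx,2);
end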

Thanks in advance for any help!

0 Answers:

There are no answers yet.