Parsing a text file in MATLAB

Asked: 2012-09-10 13:27:04

Tags: regex parsing matlab text-parsing

I have this txt file:

BLOCK_START_DATASET

dlcdata                 L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Parameterfiles\Bladed4.2\DLC-Files\DLCDataFile.txt
simulationdata          L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Parameterfiles\Bladed4.2\DLC-Files\BladedFile.txt

outputfolder    Pfadangabe\runs_test
windfolder      L:\loads2\WEC\1002_50-2\_calc\50-2_D135_HH95_RB-AB66-0O_GL2005_towerdesign_Bladed_v4-2_revA01\_wind

referenzfile_servesea       L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\Referencefiles\Bladed4.2\DLC\dlc1-1_04a1.$PJ
referenzfile_generalsea     L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\Referencefiles\Bladed4.2\DLC\dlc6-1_000_a_50a_022.$PJ

externalcontrollerdll           L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\external_Controller\DisCon_V3_2_22.dll
externalcontrollerparameter     L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\external_Controller\ext_Ctrl_Data_V3_2_22.txt

BLOCK_END_DATASET

% ------------------------------------

BLOCK_START_WAVE
% a6*x^6 + a5*x^5 + a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
factor_hs           0.008105;0.029055;0.153752
factor_tz               -0.029956;1.050777;2.731063
factor_tp               -0.118161;1.809956;3.452903
spectrum_gamma  3.3

BLOCK_END_WAVE

% ------------------------------------

BLOCK_START_EXTREMEWAVE

height_hs1  7.9
period_hs1  11.8

height_hs50 10.8
period_hs50 13.8

height_hred1    10.43
period_hred1    9.9

height_hred50   14.26
period_hred50   11.60

height_hmax1    14.8
period_hmax1    9.9

height_hmax50   20.1
period_hmax50   11.60

BLOCK_END_EXTREMEWAVE

% ------------------------------------

BLOCK_START_TIDE

normal  0.85
yr1 1.7
yr50    2.4

BLOCK_END_TIDE

% ------------------------------------

BLOCK_START_CURRENT

velocity_normal 1.09
velocity_yr1    1.09
velocity_yr50   1.38

BLOCK_END_CURRENT

% ------------------------------------

BLOCK_START_EXTREMEWIND

velocity_v1 29.7
velocity_v50    44.8

velocity_vred1  32.67
velocity_vred50 49.28

velocity_ve1    37.9
velocity_ve50   57

velocity_Vref   50

BLOCK_END_EXTREMEWIND

% ------------------------------------

Currently I am parsing it this way:

clc, clear all, close all

%Find all row headers
fid = fopen('test_struct.txt','r');
row_headers = textscan(fid,'%s %*[^\n]','CommentStyle','%','CollectOutput',1);
row_headers = row_headers{1};
fclose(fid);

%Find all attributes
fid1 = fopen('test_struct.txt','r');
attributes = textscan(fid1,'%*s %s','CommentStyle','%','CollectOutput',1);
attributes = attributes{1};
fclose(fid1);

%Collect row headers and attributes in a single cell
parameters = [row_headers,attributes];


%Find all the blocks
startIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_START_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_END_', 'match')));
assert(all(size(startIdx) == size(endIdx)))


%Extract fields between BLOCK_START_ and BLOCK_END_
extract_fields = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,1));
struct_fields = arrayfun(extract_fields, 1:numel(startIdx), 'UniformOutput', false);

%Extract attributes between BLOCK_START_ and BLOCK_END_
extract_attributes = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,2));
struct_attributes = arrayfun(extract_attributes, 1:numel(startIdx), 'UniformOutput', false);

%Get structure names stored after each BLOCK_START_
structures_name = @(n) strrep(parameters{startIdx(n)},'BLOCK_START_','');
structure_names = genvarname(arrayfun(structures_name,1:numel(startIdx),'UniformOutput',false));

%Generate structures
for i=1:numel(structure_names)
    eval([structure_names{i} '=cell2struct(struct_attributes{i},struct_fields{i},1);'])
end

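It works, but not the way I would like. The overall idea is to read the file into a single structure, with one field per BLOCK_START / BLOCK_END block. In addition, I would like numbers to be read as double rather than char, and delimiters such as whitespace, "," or ";" to be treated as array separators (e.g. 3;4;5 = [3;4;5] and similar).

To clarify, take the block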

BLOCK_START_WAVE
% a6*x^6 + a5*x^5 + a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
factor_hs           0.008105;0.029055;0.153752
factor_tz               -0.029956;1.050777;2.731063
factor_tp               -0.118161;1.809956;3.452903
spectrum_gamma  3.3

BLOCK_END_WAVE

The structure would be called WAVE, with

WAVE.factor_hs = [0.008105;0.029055;0.153752]
WAVE.factor_tz = [-0.029956;1.050777;2.731063]
WAVE.factor_tp = [-0.118161;1.809956;3.452903]
WAVE.spectrum.gamma = 3.3

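Any suggestion would be highly appreciated.

Best regards.

2 Answers:

Answer 0 (score: 1)

The answer to this question (which is also yours) is a good starting point! To extract everything into a cell array, do the following: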

%# Read data from input file
fd = fopen('test_struct.txt', 'rt');
C = textscan(fd, '%s', 'Delimiter', '\r\n', 'CommentStyle', '%');
fclose(fd);

%# Extract indices of start and end lines of each block
start_idx = find(~cellfun(@isempty, regexp(C{1}, 'BLOCK_START', 'match')));
end_idx = find(~cellfun(@isempty, regexp(C{1}, 'BLOCK_END', 'match')));
assert(all(size(start_idx) == size(end_idx)))

%# Extract blocks into a cell array
extract_block = @(n)({C{1}{start_idx(n):end_idx(n) - 1}});
cell_blocks = arrayfun(extract_block, 1:numel(start_idx), 'Uniform', false);

Now, to convert each block into a corresponding struct, do the following:

%# Iterate over each block and convert it into a struct
for i = 1:length(cell_blocks)

    %# Extract the block
    C = strtrim(cell_blocks{i});
    C(cellfun(@(x)isempty(x), C)) = [];         %# Ignore empty lines

    %# Parse the names and values
    params = cellfun(@(s)textscan(s, '%s%s'), {C{2:end}}, 'Uniform', false);
    name = strrep(C{1}, 'BLOCK_START_', '');    %# Struct name
    fields = cellfun(@(x)x{1}{:}, params, 'Uniform', false);
    values = cellfun(@(x)x{2}{:}, params, 'Uniform', false);

    %# Create a struct (values and fields are 1-by-N cell arrays,
    %# so dimension 2 maps each value to its field name)
    eval([name, ' = cell2struct(values, fields, 2);'])
end
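
The loop above creates one variable per block through eval and leaves every value as a string. The following is a minimal sketch of an alternative that addresses the two remaining wishes in the question: a single struct with one field per block, and numeric values converted to double column vectors. The variable names (lines, S, current) and the sample lines are illustrative assumptions, not part of the original answer, and the sketch assumes block and parameter names are valid MATLAB field names.

```matlab
% Sample lines standing in for the file contents (an assumption for illustration)
lines = {'BLOCK_START_WAVE'
         '% a comment line'
         'factor_hs           0.008105;0.029055;0.153752'
         'spectrum_gamma  3.3'
         ''
         'BLOCK_END_WAVE'
         'BLOCK_START_DATASET'
         'outputfolder    Pfadangabe\runs_test'
         'BLOCK_END_DATASET'};

S = struct();      % result: one field per block
current = '';      % name of the block currently being read
for k = 1:numel(lines)
    ln = strtrim(lines{k});
    if isempty(ln) || ln(1) == '%'
        continue                       % skip blank lines and comments
    end
    tok = regexp(ln, '^BLOCK_START_(\w+)$', 'tokens', 'once');
    if ~isempty(tok)
        current = tok{1};              % a new block begins
        S.(current) = struct();
        continue
    end
    if ~isempty(regexp(ln, '^BLOCK_END_', 'once'))
        current = '';                  % the block ends
        continue
    end
    if isempty(current)
        continue                       % ignore lines outside any block
    end
    kv = regexp(ln, '^(\S+)\s+(.*)$', 'tokens', 'once');
    if isempty(kv)
        continue                       % a name without a value
    end
    % Split the value on ';', ',' or whitespace and try a numeric conversion
    parts = regexp(kv{2}, '[;,\s]+', 'split');
    nums = str2double(parts);
    if all(~isnan(nums))
        S.(current).(kv{1}) = nums(:); % doubles, stored as a column vector
    else
        S.(current).(kv{1}) = kv{2};   % anything else (e.g. a path) stays char
    end
end
```

With this layout, S.WAVE.factor_hs is a 3-by-1 double and S.DATASET.outputfolder stays a char array. Dynamic field names (S.(name)) avoid eval, at the cost of requiring names that are legal identifiers.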

Answer 1 (score: 0)

Well, I have never used MATLAB, but you can find a block with the following regular expression:

/BLOCK_START_(\w+).*?BLOCK_END_\1/s

Then, for each block, find all the attributes:

/^(?!BLOCK_END_)(\w+)\s+((?:-?\d+\.?\d*)(?:;(?:-?\d+\.?\d*))*)/m

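Then, depending on whether the second submatch contains semicolons, you can assign it as a single-valued or multi-valued variable. I don't know how to translate this into MATLAB, but I hope it helps!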
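
One possible MATLAB translation of the two patterns above is sketched here, as an illustration rather than a tested solution for the full file. It relies on two properties of MATLAB's regexp: the dot matches newlines by default, so no /s flag is needed, and the 'lineanchors' option plays the role of /m. The extra capture group around the block body, the variable names, and the sample text are assumptions added for the example.

```matlab
% Sample text standing in for the file contents (an assumption for illustration)
txt = sprintf('%s\n', ...
    'BLOCK_START_WAVE', ...
    'factor_hs   0.008105;0.029055;0.153752', ...
    'spectrum_gamma  3.3', ...
    'BLOCK_END_WAVE', ...
    'BLOCK_START_TIDE', ...
    'normal  0.85', ...
    'BLOCK_END_TIDE');

% Block pattern: token 1 is the block name, token 2 the block body
blockExpr = 'BLOCK_START_(\w+)(.*?)BLOCK_END_\1';
% Attribute pattern: token 1 is the name, token 2 the ';'-separated numbers
attrExpr = '^(?!BLOCK_END_)(\w+)\s+((?:-?\d+\.?\d*)(?:;(?:-?\d+\.?\d*))*)';

blocks = regexp(txt, blockExpr, 'tokens');
S = struct();
for b = 1:numel(blocks)
    name = blocks{b}{1};
    body = blocks{b}{2};
    attrs = regexp(body, attrExpr, 'tokens', 'lineanchors');
    for a = 1:numel(attrs)
        % Splitting on ';' gives a scalar for one value, a column vector otherwise
        values = str2double(regexp(attrs{a}{2}, ';', 'split'));
        S.(name).(attrs{a}{1}) = values(:);
    end
end
```

Because the attribute pattern only accepts numeric values, lines whose value is a path (as in the DATASET block) are skipped by this sketch and would need a separate pattern.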