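Extracting data (specific words) from a text file using Matlab

Time: 2017-12-10 01:41:45

Tags: matlab extraction

I am trying to get some specific information from a text file, but my code does not produce the results I need. A sample of the file I have: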

2017-10-02T15:29:47.18Z 'I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1234|WDTimeout: 300'
2017-10-02T15:29:47.18Z 'D|Beat:1256|sd:0x6564: e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1276|sprts'
2017-10-02T15:29:47.18Z 'D|Beat:5460|GetPckt:0x3901'
2017-10-02T15:29:47.18Z 'D|Beat:7085|Prtns->'
2017-10-02T15:29:47.18Z 'D|Beat:1975|sevt:72'
2017-10-02T15:29:47.18Z 'D|Beat:1780|snd:0x3901'
2017-10-02T15:29:47.18Z 'I|PSnd:  61|snd[B0]:FFFF m:0x3901 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x3901 e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1833|sd:0x3901:0'
2017-10-02T15:29:47.18Z 'D|Beat:1200|Rcv<-RP, s:1402'
2017-10-02T15:29:47.18Z 'D|Beat:1220|FrMsg:0x467b QMsg:0x5840'
2017-10-02T15:29:47.18Z 'I|Beat:13031|n:1402 rssi:-91, lqi:255, q:61'
2017-10-02T15:29:47.18Z 'D|Beat:8868|sameRP'
2017-10-02T15:29:47.18Z 'D|Beat:5460|GetPckt:0x41a1'
2017-10-02T15:29:47.18Z 'D|Beat:1975|sevt:40'
2017-10-02T15:29:47.22Z 'D|Beat:13282|PR->:1402 LRPID:C1402'
2017-10-02T15:29:47.22Z 'D|Beat:1780|snd:0x41a1'
2017-10-02T15:29:47.22Z 'D|Beat:1791|evtT:3498847'
2017-10-02T15:29:47.22Z 'I|PSnd:  61|snd[3D]:1402 m:0x41a1 e:0'
2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a1 e:0'
2017-10-02T15:29:47.22Z 'D|Beat:1234|WDTimeout: 300'
2017-10-02T15:29:47.22Z 'D|Beat:1256|sd:0x41a1: e:0'
2017-10-02T15:29:47.22Z 'D|Beat:1200|Rcv<-RP, s:1202'
2017-10-02T15:29:47.22Z 'D|Beat:1220|FrMsg:0x502a QMsg:0x3eef'
2017-10-02T15:29:47.22Z 'I|Beat:13031|n:1202 rssi:-94, lqi:255, q:60'
2017-10-02T15:29:47.22Z 'D|Beat:8868|sameRP'
2017-10-02T15:29:47.22Z 'D|Beat:5460|GetPckt:0x51c8'
2017-10-02T15:29:47.22Z 'D|Beat:1975|sevt:40'
2017-10-02T15:29:47.22Z 'D|Beat:13282|PR->:1202 LRPID:61202'
2017-10-02T15:29:47.22Z 'D|Beat:1780|snd:0x51c8'
2017-10-02T15:29:47.22Z 'D|Beat:1791|evtT:3498847'
2017-10-02T15:29:47.22Z 'I|PSnd:  61|snd[3D]:1202 m:0x51c8 e:0'
2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c8 e:0'

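In the file above, I am trying to extract every line that contains 'sD', but only when the previous line contains 'snd'. I want to get the date and the value in brackets (e.g. [3D]) as output columns, and possibly all the extracted lines in a separate array.

What I did: I tried using PSnd as the query line, as can be seen in the script below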

queryline = 'PSnd';
fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);
C = C{1};
[temp,matchedLines] = regexp(C,['(?<date>^[0-9,-:T]*)Z.*' queryline ':(?<Num>[0-9A-Z|A-Z[0-9A-Z:]]*)'] ,'tokens','match');
matchedLines = [matchedLines{:}]';
temp = [temp{:}];
temp = reshape([temp{:}],2,[])';
outTime  = datetime(temp(:,1),'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SSS');
[h,m,s]= hms(outTime);
time = {h; m; s};
time_in_hrs = [time{:}];
t = [time{1:3}];

nodes_in_clus = temp(:,2);

I get some very strange results that I do not quite understand. My initial error was

Error using datetime (line 556)
Numeric input data must be a matrix with three or six columns, or else three or six separate numeric arrays. You can also create datetimes from a single numeric array using the
'ConvertFrom' parameter.

Error in get_cluster (line 10)
outTime2= datetime(temp2(:,1), 'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SSS');

But after making some changes, I got a result like this

'2017-10-02T23:58:26.62Z 'I|PSnd:'
'2017-10-02T23:58:26.77Z 'I|PSnd:'
'2017-10-02T23:58:26.77Z 'I|PSnd:'
'2017-10-02T23:58:26.91Z 'I|PSnd:'
'2017-10-02T23:58:26.91Z 'I|PSnd:'
'2017-10-02T23:58:27.06Z 'I|PSnd:'
'2017-10-02T23:58:27.06Z 'I|PSnd:'
'2017-10-02T23:58:27.20Z 'I|PSnd:'
'2017-10-02T23:58:27.20Z 'I|PSnd:'
'2017-10-02T23:58:27.35Z 'I|PSnd:'
'2017-10-02T23:58:27.35Z 'I|PSnd:'
'2017-10-02T23:58:27.49Z 'I|PSnd:'
'2017-10-02T23:58:27.49Z 'I|PSnd:'
'2017-10-02T23:58:27.64Z 'I|PSnd:'
'2017-10-02T23:58:27.64Z 'I|PSnd:'
'2017-10-02T23:58:27.79Z 'I|PSnd:'
'2017-10-02T23:58:27.79Z 'I|PSnd:'
'2017-10-02T23:58:27.93Z 'I|PSnd:'
'2017-10-02T23:58:27.93Z 'I|PSnd:'
'2017-10-02T23:58:28.06Z 'I|PSnd:'
'2017-10-02T23:58:28.06Z 'I|PSnd:'
'2017-10-02T23:58:28.21Z 'I|PSnd:'
'2017-10-02T23:58:28.21Z 'I|PSnd:'
'2017-10-02T23:58:28.36Z 'I|PSnd:'
'2017-10-02T23:58:28.36Z 'I|PSnd:'
'2017-10-02T23:58:28.51Z 'I|PSnd:'
'2017-10-02T23:58:28.51Z 'I|PSnd:'
'2017-10-02T23:58:28.65Z 'I|PSnd:'
'2017-10-02T23:58:28.65Z 'I|PSnd:'
'2017-10-02T23:58:28.79Z 'I|PSnd:'
'2017-10-02T23:58:28.79Z 'I|PSnd:'
'2017-10-02T23:58:28.94Z 'I|PSnd:'
'2017-10-02T23:58:28.94Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.87Z 'I|PSnd:'
'2017-10-02T23:58:51.87Z 'I|PSnd:'
'2017-10-02T23:58:51.92Z 'I|PSnd:'
'2017-10-02T23:58:51.92Z 'I|PSnd:'
'2017-10-02T23:58:52.02Z 'I|PSnd:'
'2017-10-02T23:58:52.02Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:59:14.29Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:31.26Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:42.64Z 'I|PSnd:'
'2017-10-02T23:59:42.66Z 'I|PSnd:'
'2017-10-02T23:59:42.79Z 'I|PSnd:'
'2017-10-02T23:59:42.79Z 'I|PSnd:'
'2017-10-02T23:59:42.94Z 'I|PSnd:'
'2017-10-02T23:59:42.94Z 'I|PSnd:'
'2017-10-02T23:59:48.24Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'

I do not get anything after PSnd, and the second column is empty.

2 Answers:

Answer 0 (score: 0)

You can try the following approach:

  • read the file as it is
  • use a combination of cellfun and strfind to find the rows containing snd
  • do the same for sD

The two steps above give the logical indices of the rows containing each of the two tokens.
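As a small illustration of these first steps, the sketch below builds the two logical indices on three lines copied from the sample log. The variable names has_snd and has_sD are hypothetical, and the lines are held in memory rather than read from log1.txt.

```matlab
% Three sample lines taken from the log in the question
x = {'2017-10-02T15:29:47.18Z ''I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0''';
     '2017-10-02T15:29:47.18Z ''I|PSnd: 233|sD[3D]m:0x6564 e:0''';
     '2017-10-02T15:29:47.18Z ''D|Beat:1234|WDTimeout: 300'''};

% strfind returns, for each cell, the start positions of the pattern
% (empty when the pattern is absent); ~isempty turns that into a logical.
% The search is case-sensitive, so 'PSnd' does not count as 'snd'.
has_snd = ~cellfun('isempty', strfind(x, 'snd'));   % [1; 0; 0]
has_sD  = ~cellfun('isempty', strfind(x, 'sD['));   % [0; 1; 0]
```

Searching for 'sD[' rather than 'sD' keeps the match tied to the bracketed value that is extracted later.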

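  • create a matrix of logical values by pairing the two sets of indices this way: all the indices but the last one from the first set, and all the indices but the first one from the second set
  • find the rows of the matrix that contain two 1s
  • add 1 to the resulting row numbers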

Now you have the indices of the rows you are looking for.
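The pairing step can be checked on made-up logical vectors before applying it to the real file. This is only a sketch: the values of idx_1 and idx_2 below are invented for illustration and do not come from the log.

```matlab
% Hypothetical logical indices for a six-row file
idx_1 = logical([1 0 0 1 1 0]');   % rows containing "snd"
idx_2 = logical([0 1 0 0 0 1]');   % rows containing "sD["

% Row i of "pairs" holds [snd in row i, sD in row i+1]
pairs = [idx_1(1:end-1) idx_2(2:end)];

% Keep the rows where both are true; adding 1 moves from the "snd" row
% to the "sD" row that follows it
k = find(all(pairs, 2)) + 1;       % k = [2; 6]
```

Row 4 contains "snd" but is not followed by an "sD" row, so it contributes nothing to k.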

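In a loop:

  • split the i-th row on the blank: the first token is the date
  • your conversion format does not seem correct: you should remove the last S and add the Z (see below)
  • extract the date into a cellarray
  • use strfind to find the position of [ in the row
  • do the same for ]
  • the value you are looking for (e.g. 3D) lies between the two
  • store the value in a cellarray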

Now you have the dates, the values, and the full rows in three cellarrays.
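The per-row steps of the loop can be tried on a single line first. The sketch below uses one line copied from the sample log; the names row, parts, d and value are hypothetical, and it assumes the line contains at least one [ and one ].

```matlab
row = '2017-10-02T15:29:47.18Z ''I|PSnd: 233|sD[3D]m:0x6564 e:0''';

% Split on blanks; the first token is the timestamp, including the trailing Z
parts = strsplit(row, ' ');

% Two fractional-second digits, and a literal Z at the end
d = datetime(parts{1}, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SS''Z''');

% Positions of the brackets; (1) guards against more than one occurrence
start_idx = strfind(row, '[');
end_idx   = strfind(row, ']');
value = row(start_idx(1)+1 : end_idx(1)-1);   % '3D'
```

If a selected row had no brackets, start_idx would be empty and the indexing would fail, which is the kind of case the extra checks would need to cover.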

Note that additional checks might be required for input files with a different set of rows.

A possible implementation could be:

fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);

x = C{1};
% Logical index of the rows containing "snd"
idx_1 = ~cellfun('isempty',strfind(x,'snd'));
% Logical index of the rows containing "sD["
idx_2 = ~cellfun('isempty',strfind(x,'sD['));
% Pair the two indices, shifting the second one by 1, find the rows
% of the matrix with two "1" and add 1 to get the "sD" rows
k = find(all([idx_1(1:end-1) idx_2(2:end)],2))+1
% Display the selected rows
x{k}
% Preallocate the outputs
the_date = cell(1,length(k));
val = cell(1,length(k));
% Loop over the identified rows
for i=1:length(k)
   % Split the selected row (x{k(i)}) wrt ' ', the first element is the date
   a = strsplit(x{k(i)},' ');
   % Convert the date
   the_date{i} = datetime(a{1},'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SS''Z''');
   % Look for the position of the "["
   start_idx = strfind(x{k(i)},'[');
   % Look for the position of the "]"
   end_idx = strfind(x{k(i)},']');
   % Extract the value between the "[]"
   val{i} = x{k(i)}(start_idx+1:end_idx-1);
end

With your input file:

Selected rows

2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x3901 e:0'
2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a1 e:0'
2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c8 e:0'

Indices of the selected rows:

 2
11
23
36

Corresponding dates

the_date =

  Columns 1 through 2

    [02-Oct-2017 15:29:47]    [02-Oct-2017 15:29:47]

  Columns 3 through 4

    [02-Oct-2017 15:29:47]    [02-Oct-2017 15:29:47]

Corresponding values:

val = 

    '3D'    'B0'    '3D'    '3D'

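Answer 1 (score: 0)

Here is a solution without any for loops.

Basically, first search for the lines with "snd". Then check the next line for "sD". Return the matched lines and the tokens captured by the regular expression on those lines.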

fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);
C = C{1};
%Find all lines with snd
initMatchIdx = ~cellfun(@isempty,regexp(C,'^[0-9,-:T]*Z.*PSnd.*snd'));
%Check the lines 1 row down ... 
checkIdx = [false; initMatchIdx(1:end-1)];
%If it matches return the entire line and the tokens..
[temp, matchedLines] = regexp(C(checkIdx),'(?<date>^[0-9,-:T]*)Z.*PSnd.*sD\[(?<otherVal>\w*)\].*' ,'tokens','match');
%Do some reshaping and un-celling.
matchedLines = [matchedLines{:}]';
temp = [temp{:}];
temp = reshape([temp{:}],2,[])';
%Convert to Date
outTime  = datetime(temp(:,1),'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SS');
otherVal = temp(:,2);

The output is as follows:

>> outTime
outTime = 
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47

>> otherVal    
otherVal = 
    '3D'
    'B0'
    '3D'
    '3D'

>> matchedLines
matchedLines = 
'2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x656...'
'2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x390...'
'2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a...'
'2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c...'
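To see what the token pattern in this answer captures, it can be run on a single line. The sketch below is a simplified variant, not the answer's exact expression: it uses unnamed tokens, an explicit character class for the timestamp, and the 'once' option so that regexp returns a flat cell of strings. The names logLine, tok and t are hypothetical.

```matlab
logLine = '2017-10-02T15:29:47.22Z ''I|PSnd: 233|sD[3D]m:0x41a1 e:0''';

% Token 1: everything before the Z (digits, T, colon, dot, hyphen)
% Token 2: the word characters between the brackets after sD
tok = regexp(logLine, '^([0-9T:.-]*)Z.*PSnd.*sD\[(\w*)\]', 'tokens', 'once');
% tok is a 1x2 cell: {'2017-10-02T15:29:47.22', '3D'}

% The Z is outside the first token, so the format has no Z either
t = datetime(tok{1}, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SS');
```

Because the Z is left out of the captured timestamp, the InputFormat here ends at the fractional seconds, which matches the format used in the answer above.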