Using a regular expression to detect a specific number

Date: 2014-04-28 10:12:18

Tags: regex string matlab

How can I detect the 3 in >3< rather than the 3 in rank_value_3_months?

"<span data-bind-domain="rank_value_3_months">3</span>" 



rank(i) = str2double(regexp(CharData7,'>(\d)<','match','once'))
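The pattern '>(\d)<' already skips the 3 inside rank_value_3_months, because that digit is preceded by "_" rather than ">". The NaN comes from the 'match' option, which returns the whole matched text including the angle brackets. A minimal sketch on the sample string from the question (variable names are illustrative only):

```matlab
% Sample input taken from the question.
str = '<span data-bind-domain="rank_value_3_months">3</span>';

% 'match' returns the whole matched text, including the angle brackets,
% so str2double cannot parse it and yields NaN.
m = regexp(str, '>(\d)<', 'match', 'once');    % the char array '>3<'
asMatch = str2double(m);                        % NaN

% 'tokens' returns only the part captured by the parentheses.
% With 'once' it is a 1x1 cell such as {'3'}, which str2double accepts.
t = regexp(str, '>(\d)<', 'tokens', 'once');   % {'3'}
asToken = str2double(t);                        % 3
```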


Here is the complete code for this part. I want to detect the number that appears as >number< in the pre-processed file:

%function [feature7] = f7(data)

for i = 1:1

    % read the HTML file
    data2 = fopen(strcat('DATA\WHOIS\TR\', int2str(i), '.htm'), 'r');
    CharData = fread(data2, '*char')';  % read the text file and store it in CharData
    fclose(data2);

    % extract the <span> element that holds the 3-month rank
    register_date = regexp(CharData, '<span data-bind-domain="rank_value_3_months">.*?/span>', 'match');

    % write the matched element(s) to the pre-processed file
    fid = fopen(strcat('DATA\PRE-PROCESS_DATA\F23_TR\f23_TR_pdata_', int2str(i)), 'w');
    for col = 1:numel(register_date)
        fprintf(fid, '%s\n', register_date{col});
    end
    fclose(fid);

    s = dir(strcat('DATA\PRE-PROCESS_DATA\F23_TR\', 'f23_TR_pdata_', int2str(i)));
    disp(s.bytes);

    % assumed default: without it, rank(i) is never assigned when the file is empty
    rank(i) = 0;
    if s.bytes ~= 0
        data7 = fopen(strcat('DATA\PRE-PROCESS_DATA\F23_TR\f23_TR_pdata_', int2str(i)), 'r');
        CharData7 = fread(data7, '*char')';  % read the pre-processed file into CharData7
        fclose(data7);

        rank(i) = str2double(regexp(CharData7, '>(\d)<', 'tokens', 'once'));
    end

    if rank(i) ~= 0
        feature23(i) = -1;
    else
        feature23(i) = 1;
    end
end
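Writing the matched span to a file and reading it back is not needed just to get the number. A minimal sketch, assuming the HTML is already in memory as a char row vector: it anchors the pattern on the attribute value and captures the digits in a single regexp call. The html sample string is hypothetical, \d+ (instead of \d) is an added assumption that ranks may have several digits, and treating a missing rank as 0 is also an assumption.

```matlab
% Hypothetical HTML snippet standing in for CharData (the file contents).
html = ['<div><span data-bind-domain="rank_value_1_months">12</span>' ...
        '<span data-bind-domain="rank_value_3_months"> 3 </span></div>'];

% Anchor on the attribute value so only the 3-month rank is captured;
% \s* tolerates whitespace around the number, \d+ allows multi-digit ranks.
pattern = 'data-bind-domain="rank_value_3_months"\s*>\s*(\d+)\s*<';
tok = regexp(html, pattern, 'tokens', 'once');

if isempty(tok)
    rankValue = 0;                    % assumption: a missing rank counts as 0
else
    rankValue = str2double(tok{1});   % tok is a 1x1 cell, e.g. {'3'}
end

% Same rule as the loop above: a non-zero rank gives -1, otherwise 1.
if rankValue ~= 0
    feature23 = -1;
else
    feature23 = 1;
end
```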

1 answer:

Answer 0 (score: 2)

Assuming CharData7 is a cell array, you can try this:

%// Find the digit between '>' and '<':
%// - use 'tokens' to return just the part in brackets
%// - use \s* to make spacing flexible (which is also valid XML/HTML)
tok = regexp(CharData7, '>\s*(\d)\s*<', 'tokens', 'once');

%// Re-format into flat cells
%// ('tokens' returns ALL tokens, which is therefore a cell, regardless
%// of the 'once' setting)
tok = [tok{:}];

%// and convert everything to double
rank(i) = str2double(tok);

So, as a nicely incomprehensible one-liner:

rank(i) = str2double([builtin('_brace', regexp(CharData7,'>\s*(\d)\s*<','tokens','once'), :)]);
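To see what the flattening step does when the input is a cell array, here is a small sketch; the two-element cellData below is a hypothetical stand-in for CharData7. With a cell input, regexp returns one 'once' result per element, so the output is a cell of 1x1 token cells, and brace-expanding it inside [ ] yields a flat cell of strings that str2double converts in one call.

```matlab
% Hypothetical cell array standing in for CharData7 (one string per element).
cellData = {'<span data-bind-domain="rank_value_3_months">3</span>', ...
            '<span data-bind-domain="rank_value_3_months"> 7 </span>'};

% One result per element: {{'3'}, {'7'}}, a cell of 1x1 token cells.
tok = regexp(cellData, '>\s*(\d)\s*<', 'tokens', 'once');

% Flatten to {'3', '7'} and convert all entries at once.
flat = [tok{:}];
ranks = str2double(flat);   % [3 7]
```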

If CharData7 is just a string (a char array), you can skip the cell-flattening step:

rank(i) = str2double( regexp(CharData7,'>\s*(\d)\s*<','tokens','once') )