Question

I have a file containing strings of known length, but with no separators between them.

% What should be the result
vals = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);

% what the file looks like when read in
strs = cell2mat(vals);
strlens = cellfun(@length, vals);

The most straightforward approach is slow:

out = cell(1, length(strlens));
for i=1:length(strlens)
    out{i} = fread(f, strlens(i), '*char');
end % 5.7s

Reading everything at once and splitting it afterwards is faster:

strs = fread(f, sum(strlens), '*char');
out = cell(1, length(strlens));
slices = [0, cumsum(strlens)];
for i=1:length(strlens)
    out{i} = strs(slices(i)+1:slices(i+1));
end % 1.6s

Using a MEX function I can get this down to 0.6s, so there is still a lot of room for improvement. Can I get comparable performance with pure Matlab (R2016a)?

Edit: The seemingly perfect mat2cell function doesn't help:

out = mat2cell(strs, 1, strlens); % 2.49s

Answer 1

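Your last approach (reading everything at once, then splitting it) looks optimal to me, and it's how I do this kind of thing.

For me, it runs in about 80 ms when the file is on a local SSD, on both R2016b and R2019a on a Mac.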

function out = scratch_split_strings(strlens)
% SCRATCH_SPLIT_STRINGS Read text.txt in one call, then split it by the given lengths.
%
% Example:
% in_strs = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);
% strlens = cellfun(@length, in_strs);
% big_str = cat(2, in_strs{:});
% fid = fopen('text.txt', 'w'); fprintf(fid, '%s', big_str); fclose(fid);
% scratch_split_strings(strlens);

t0 = tic;
fid = fopen('text.txt');
txt = fread(fid, sum(strlens), '*char');
fclose(fid);
fprintf('Read time: %0.3f s\n', toc(t0));

str = txt;
t0 = tic;
out = cell(1, length(strlens));
slices = [0, cumsum(strlens)];
for i = 1:length(strlens)
    out{i} = str(slices(i)+1:slices(i+1))';
end
fprintf('Munge time: %0.3f s\n', toc(t0));

end

>> scratch_split_strings(strlens);
Read time: 0.002 s
Munge time: 0.075 s

Have you run it through the profiler to see what is taking up your time here?
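In case it helps, here is a rough sketch of how that profiling run could look. It assumes scratch_split_strings from above is on the path and that the current folder is writable; the five-row summary loop is just one illustrative way to read the results, and `profile viewer` would open the full report instead.

```matlab
% Build the test data and write it to text.txt (the file scratch_split_strings reads)
in_strs = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);
strlens = cellfun(@length, in_strs);
fid = fopen('text.txt', 'w');
fprintf(fid, '%s', [in_strs{:}]);
fclose(fid);

% Profile only the read-and-split call
profile on
out = scratch_split_strings(strlens);
profile off

% Print the five most expensive functions by total time
p = profile('info');
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');
for k = order(1:min(5, numel(order)))
    fprintf('%-40s %8.3f s  (%d calls)\n', p.FunctionTable(k).FunctionName, ...
        p.FunctionTable(k).TotalTime, p.FunctionTable(k).NumCalls);
end
```

If the loop dominates on your machine the way the read dominates in your first version, the report should make that visible; note that the profiler itself adds overhead, so the absolute times will be higher than with tic/toc.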

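As far as I know, there is no faster way to split a single primitive array into variable-length subarrays using native M-code. You're doing it right.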

Split a string into a cell array by position
