MATLAB importdata with tolerance

Time: 2013-06-02 00:47:47

Tags: matlab

I want to look up the name in column A from two approximate input values for columns B and C.

Data.csv

A;       B;        C
ALGOL;3.13614789;40.95564610
ALIOTH;12.90050072;55.95981118
ALKAID;13.79233003;49.31324779

The following code works for exact values:

fid = fopen('Data.csv');
C = textscan(fid, '%s %s %s', 'Delimiter', ';');
fclose(fid);

val1 = input('Enter the first input: ', 's');
val2 = input('Enter the second input: ', 's');

if(find(ismember(C{2},val1)) == find(ismember(C{3},val2)))
    output = C{1}{find(ismember(C{2},val1))}
else
    disp('No match found!');
end

Result:

Enter the first input: 12.90050072
Enter the second input: 55.95981118

output =

ALIOTH

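But how can I get the same result with approximate values for val1 and val2? For example, val1 = 13.001 and val2 = 57.210 should give => "ALIOTH"

Maybe I have to use importdata and then check against a tolerance, but I don't know how. Is there a way to do this?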

2 Answers:

Answer 0: (score: 4)

Use floating-point numbers!

I suggest reading the data as floating-point numbers rather than as strings:

C = textscan(fid, '%s %f %f', 'Delimiter', ';', 'HeaderLines', 1);
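
As a quick check of this format string, the following sketch writes the three sample rows to a temporary file and reads them back with numeric columns. The temporary file and the variable names tmp, names, and coords are illustrative assumptions, not part of the original answer.

```matlab
% Write the sample data to a temporary file so the example is self-contained.
tmp = [tempname '.csv'];
fid = fopen(tmp, 'w');
fprintf(fid, 'A;B;C\n');
fprintf(fid, 'ALGOL;3.13614789;40.95564610\n');
fprintf(fid, 'ALIOTH;12.90050072;55.95981118\n');
fprintf(fid, 'ALKAID;13.79233003;49.31324779\n');
fclose(fid);

% Read column A as text and columns B and C as doubles, skipping the header.
fid = fopen(tmp, 'r');
C = textscan(fid, '%s %f %f', 'Delimiter', ';', 'HeaderLines', 1);
fclose(fid);
delete(tmp);

names  = C{1};        % cell array of names, one per data row
coords = [C{2:3}];    % numeric matrix, one row per data row
```

With the columns stored as doubles, expressions such as coords(2, 1) - 12.9 are ordinary arithmetic, which is what the distance computation relies on.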

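This lets you make numerical comparisons. With val1 and val2 as numbers (for example, read with input without the 's' option), you can then compute the distance (say, the Euclidean distance) between the search values and each row of the data matrix: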

v = [val1, val2];
dist = sqrt(sum(bsxfun(@minus, [C{2:3}], v) .^ 2, 2));

You can then pick the minimum value of dist (this always yields a match, namely the nearest row, however far away it is):

tf = (dist - min(dist) < eps);

Or pick the values below a certain threshold:

tol = 2; %// Tolerance of your choice
tf = (dist < tol);

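The resulting logical (boolean) vector tf should have a "1" at the positions of the matching rows.

You can convert it into the actual values of the first column by writing: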

result = C{1}(tf)
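
The two selection rules can also be combined: take the nearest row, but accept it only if it lies within the tolerance. The following is a minimal sketch under that assumption. The data is typed in directly instead of being read from Data.csv, and the names names, coords, dmin, and idx are illustrative.

```matlab
% Sample data entered directly (same values as Data.csv).
names  = {'ALGOL'; 'ALIOTH'; 'ALKAID'};
coords = [ 3.13614789, 40.95564610;
          12.90050072, 55.95981118;
          13.79233003, 49.31324779];

v   = [13.001, 57.210];   % approximate search values from the question
tol = 2;                  % tolerance of your choice

% Euclidean distance from v to every data row.
dist = sqrt(sum(bsxfun(@minus, coords, v) .^ 2, 2));

% Nearest row, accepted only if it lies within the tolerance.
[dmin, idx] = min(dist);
if dmin < tol
    result = names{idx};
else
    result = '';          % no row is close enough
end
disp(result)
```

For the values from the question, the nearest row is about 1.25 away, so ALIOTH is returned. With a tolerance below that distance the result would be empty.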

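Generalization

This solution generalizes to any number of columns P in the data. Suppose also that you want to search the data for several different instances of v (assume v is an M×P matrix in which each row is a separate instance to match):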

vv = permute(v, [3 2 1]);
dist = permute(sqrt(sum(bsxfun(@minus, [C{2:end}], vv) .^ 2, 2)), [1 3 2]);

As before, you can pick the minimum value, which guarantees a match:

tf = (abs(bsxfun(@minus, dist, min(dist))) < eps);

Or set a threshold:

tf = (dist < tol);

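Here tf is a logical N×M matrix (N is the number of data rows and M is the number of rows in v), where each column marks the data rows that match the corresponding row of v.

To convert this into values of the first column, you have to store the output in a cell array: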

result = arrayfun(@(x)C{1}(tf(:, x)), 1:size(tf, 2), 'UniformOutput', false);

Example

v = [13, 57.2; 13, 47]; %// Entries to search

vv = permute(v, [3 2 1]);
dist = permute(sqrt(sum(bsxfun(@minus, [C{2:end}], vv) .^ 2, 2)), [1 3 2])
tf = bsxfun(@minus, dist, min(dist)) < eps;

This results in:

tf =
     0     0
     1     0
     0     1

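which means that the first row of v matches the second data row, and the second row of v matches the third data row. To find the matching values in the first data column, we do the following: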

result = arrayfun(@(x)C{1}(tf(:, x)), 1:size(tf, 2), 'UniformOutput', false);

This produces the following cell array:

result =
    { 'ALIOTH' }
    { 'ALKAID' }
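
When exactly one nearest row per query is wanted, the second output of min gives the row indices directly, without building the logical matrix. This is a sketch of that alternative, not part of the original answer. The data is typed in directly, and the names coords, idx, and nearest are illustrative.

```matlab
% Sample data entered directly (same values as Data.csv).
names  = {'ALGOL'; 'ALIOTH'; 'ALKAID'};
coords = [ 3.13614789, 40.95564610;
          12.90050072, 55.95981118;
          13.79233003, 49.31324779];

v = [13, 57.2; 13, 47];   % entries to search, one per row

% Distance from every data row (rows of dist) to every query (columns of dist).
vv   = permute(v, [3 2 1]);
dist = permute(sqrt(sum(bsxfun(@minus, coords, vv) .^ 2, 2)), [1 3 2]);

% For each query, the index of the nearest data row. The explicit
% dimension argument keeps this correct even with a single data row.
[~, idx] = min(dist, [], 1);
nearest = names(idx(:));  % one name per query
```

If several rows are equally close, min returns only the first of them, whereas the tf matrix marks all of them.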

Answer 1: (score: 1)

Assuming you have some tolerance on the distance from any one of the numbers in the target, here is one way to do it:

function testApproximate
    % define tolerance
    tolerance = 1;
    % open file
    fid = fopen('Data.csv');
    % read headers and discard
    textscan(fid, '%s %s %s', 1, 'delimiter', ';');
    % read rest of the data, combine columns 2 and 3 into a single matrix
    C = textscan(fid, '%s %f %f', 'delimiter', ';', 'CollectOutput', 1);
    % close file
    fclose(fid);

    % ask user for values
    val1 = input('Enter the first input: ');
    val2 = input('Enter the second input: ');

    % use Euclidean distance to find the closest point within tolerance 
    x = isApproximatelyEqual(C{2}, [val1, val2], tolerance);
    if x > 0
        output = C{1}{x}
    else
        disp('No match found!');
    end
end

function x = isApproximatelyEqual(vectors, member, tol)
    % set default tolerance if it is not provided
    if nargin < 3, tol = Inf; end
    % v is the difference between all points in vectors and our single
    % point in member
    v = vectors - repmat(member, size(vectors,1), 1);
    % find the minimum value and index of the Euclidean norm of each
    % difference vector (one norm per row of v)
    [mn, x] = min(sqrt(sum(v .^ 2, 2)));
    % if minimum value does not meet tolerance, reset x
    if mn > tol
        x = 0;
    end
    % return x
    return
end

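This method uses the Euclidean distance to find the nearest point. If you need to check each value individually to see whether it is within the tolerance, replace the isApproximatelyEqual function above with: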

function x = isApproximatelyEqual(vectors, member, tol)
    % set default tolerance if it is not provided
    if nargin < 3, tol = Inf; end
    % v is the difference between all points in vectors and our single
    % point in member
    v = vectors - repmat(member, size(vectors,1), 1);
    % return the first pair of points that matches the tolerance
    x = find(all(abs(v') < tol), 1);
    return
end
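
The two criteria can disagree. The following sketch applies both to a query that is within the tolerance in each column separately but farther than the tolerance in Euclidean distance. The query values and the names xEuclid and xColumn are chosen for illustration only, and the data is typed in directly.

```matlab
% Sample data entered directly (same values as Data.csv).
names  = {'ALGOL'; 'ALIOTH'; 'ALKAID'};
coords = [ 3.13614789, 40.95564610;
          12.90050072, 55.95981118;
          13.79233003, 49.31324779];

query = [13.7, 56.76];    % about 0.8 away from ALIOTH in each column
tol   = 1;

v = bsxfun(@minus, coords, query);   % per-row differences

% Euclidean criterion: nearest row, rejected if farther than tol.
[mn, xEuclid] = min(sqrt(sum(v .^ 2, 2)));
if mn > tol
    xEuclid = 0;
end

% Per-column criterion: first row whose every coordinate is within tol.
xColumn = find(all(abs(v) < tol, 2), 1);
if isempty(xColumn)
    xColumn = 0;          % use 0 for "no match", like the Euclidean version
end
```

Here the per-column check accepts row 2 (ALIOTH), while the Euclidean check rejects it because the combined distance is about 1.13. The per-column test accepts a square of half-width tol around the query, and the Euclidean test accepts only the circle of radius tol inside that square.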