Question

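I am having trouble looking up values from another tensor.

It is similar to the following question: (URL: How to find a value in tensor from other tensor in Tensorflow)

The previous question asked whether the input tensors x[i], y[i] are contained in the input tensors label_x, label_y.

Here is an example of the previous question: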

Input Tensor
s_idx = (1, 3, 5, 7)
e_idx = (3, 4, 5, 8)

label_s_idx = (2, 2, 3, 6)
label_e_idx = (2, 3, 4, 8)

The problem is to give output[i] the value 1 if some j satisfies s_idx[i] == label_s_idx[j] and e_idx[i] == label_e_idx[j].

So, in the example above, the output tensor is

output = (0, 1, 0, 0)

because (s_idx[1] = 3, e_idx[1] = 4) is the same as (label_s_idx[2] = 3, label_e_idx[2] = 4).

(s_idx, e_idx) has no duplicated values, while (label_s_idx, label_e_idx) does have duplicated values.

Therefore, the following input example is assumed to be impossible:

s_idx = (2, 2, 3, 3)
e_idx = (2, 3, 3, 3)

because (s_idx[2] = 3, e_idx[2] = 3) is the same as (s_idx[3] = 3, e_idx[3] = 3).

What I want to change in this problem is to add another value to the input tensors:

Input Tensor
s_idx = (1, 3, 5, 7)
e_idx = (3, 4, 5, 8)

label_s_idx = (2, 2, 3, 6)
label_e_idx = (2, 3, 4, 8)
label_score = (1, 3, 2, 3)

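* There are no 0 values in the label_score tensor

The task in the changed problem is defined as follows:

The problem is to give output_2[i] the value label_score[j] if some j satisfies s_idx[i] == label_s_idx[j] and e_idx[i] == label_e_idx[j].

So output_2 should look like this: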

output = (0, 1, 0, 0)  // same as in the previous problem
output_2 = (0, 2, 0, 0)
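
For reference, the task definition above can be written as a plain-Python loop. This is only a sketch to pin down the expected results, not a TensorFlow solution. The function name match_scores is made up for illustration, and taking the first matching label when several match is an assumption that the question does not state.

```python
def match_scores(s_idx, e_idx, label_s_idx, label_e_idx, label_score):
    """Return (output, output_2) as plain lists, following the task definition."""
    output, output_2 = [], []
    for s, e in zip(s_idx, e_idx):
        # Scores of every label j whose (start, end) pair equals (s, e).
        hits = [score
                for ls, le, score in zip(label_s_idx, label_e_idx, label_score)
                if ls == s and le == e]
        output.append(1 if hits else 0)
        # Assumption: if several labels match, take the first one.
        output_2.append(hits[0] if hits else 0)
    return output, output_2


output, output_2 = match_scores(
    [1, 3, 5, 7], [3, 4, 5, 8],
    [2, 2, 3, 6], [2, 3, 4, 8], [1, 3, 2, 3])
print(output)    # [0, 1, 0, 0]
print(output_2)  # [0, 2, 0, 0]
```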

How can I code this with TensorFlow in Python?

Answer 1

This might work. Since it is a complicated task, please try more examples to see whether you get the expected results.

import tensorflow as tf

s_idx = [1, 3, 5, 7]
e_idx = [3, 4, 5, 8]
label_s_idx = [2, 2, 3, 6]
label_e_idx = [2, 3, 4, 8]
label_score = [1, 3, 2, 3]

# convert to one-hot vectors.
# make sure all have the same shape; the depth must be max index + 1,
# otherwise the largest index is out of range and becomes an all-zero row
max_idx = tf.reduce_max([s_idx, label_s_idx, e_idx, label_e_idx])
depth = max_idx + 1
s_oh = tf.one_hot(s_idx, depth)
label_s_oh = tf.one_hot(label_s_idx, depth)
e_oh = tf.one_hot(e_idx, depth)
label_e_oh = tf.one_hot(label_e_idx, depth)

# make a matrix such that (i,j) element equals one if
# idx(i) = label(j)
s_mult = tf.matmul(s_oh, label_s_oh, transpose_b=True)
e_mult = tf.matmul(e_oh, label_e_oh, transpose_b=True)

# find i such that idx(i) = label(j) for s and e, with some j
# there is at most one such j by the uniqueness condition.
output = tf.reduce_max(s_mult * e_mult, axis=1)

with tf.Session() as sess:
    print(sess.run(output))
    # [0. 1. 0. 0.]

# extract the label score at the corresponding j index
# and store it at index i,
# then remove the redundant trailing dimension
output_2 = tf.matmul(
    s_mult * e_mult,
    tf.cast(tf.expand_dims(label_score, -1), tf.float32))
output_2 = tf.squeeze(output_2, axis=-1)

with tf.Session() as sess:
    print(sess.run(output_2))
    # [0. 2. 0. 0.]
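
To see what the product s_mult * e_mult is doing, here is a plain-Python mirror of the same arithmetic. It is only a sketch and not TensorFlow code, and the helper names match_matrix, row_max and mat_vec are made up for illustration. It builds the 0/1 match matrix directly, then takes the row maximum for output and a matrix-vector product for output_2. The last lines use a hypothetical input with a repeated label pair to show that the product then adds the scores together, which is why this approach relies on the uniqueness condition mentioned in the code comments.

```python
def match_matrix(s_idx, e_idx, label_s_idx, label_e_idx):
    """M[i][j] is 1 when pair i equals label pair j, else 0.

    This plays the role of s_mult * e_mult in the answer.
    """
    return [[int(s == ls and e == le)
             for ls, le in zip(label_s_idx, label_e_idx)]
            for s, e in zip(s_idx, e_idx)]


def row_max(m):
    """Largest entry of each row (mirrors tf.reduce_max(..., axis=1))."""
    return [max(row, default=0) for row in m]


def mat_vec(m, v):
    """Matrix-vector product (mirrors the tf.matmul with label_score)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


m = match_matrix([1, 3, 5, 7], [3, 4, 5, 8], [2, 2, 3, 6], [2, 3, 4, 8])
print(row_max(m))                # [0, 1, 0, 0]
print(mat_vec(m, [1, 3, 2, 3]))  # [0, 2, 0, 0]

# Hypothetical input: the label pair (3, 4) appears twice with scores 2 and 5.
m_dup = match_matrix([3], [4], [3, 3], [4, 4])
print(mat_vec(m_dup, [2, 5]))    # [7], the two scores are summed
```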

Answer 2

Here is a possible solution:

import tensorflow as tf

s_idx = tf.placeholder(tf.int32, [None])
e_idx = tf.placeholder(tf.int32, [None])
label_s_idx = tf.placeholder(tf.int32, [None])
label_e_idx = tf.placeholder(tf.int32, [None])
label_score = tf.placeholder(tf.int32, [None])

# Stack inputs for comparison
se_idx = tf.stack([s_idx, e_idx], axis=1)
label_se_idx = tf.stack([label_s_idx, label_e_idx], axis=1)
# Compare every pair to each other and find matches
cmp = tf.equal(se_idx[:, tf.newaxis, :], label_se_idx[tf.newaxis, :, :])
matches = tf.reduce_all(cmp, axis=2)
# Find the position of the matches
match_pos = tf.argmax(tf.cast(matches, tf.int8), axis=1)
# For those positions where a match was found take the corresponding score
output = tf.where(tf.reduce_any(matches, axis=1),
                  tf.gather(label_score, match_pos),
                  tf.zeros_like(label_score))

# Test
with tf.Session() as sess:
    print(sess.run(output, feed_dict={s_idx: [1, 3, 5, 7],
                                      e_idx: [3, 4, 5, 8],
                                      label_s_idx: [2, 2, 3, 6],
                                      label_e_idx: [2, 3, 4, 8],
                                      label_score: [1, 3, 2, 3]}))
# >>> [0 2 0 0]

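It compares every pair of values with each other, so the cost is quadratic in the input size. Also, tf.argmax is used to find the index of the matching position, and if more than one matching index exists, it may return any of them non-deterministically.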
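
If the quadratic cost or the tie-breaking matters and the inputs are available as ordinary Python sequences before they are fed to the graph, one alternative is a dictionary lookup. This is a minimal sketch outside TensorFlow, not part of the solution above. The function name lookup_scores is made up for illustration, and keeping the first score for a repeated label pair is an assumed tie-breaking rule.

```python
def lookup_scores(s_idx, e_idx, label_s_idx, label_e_idx, label_score):
    """Return output_2 as a list, using one dict lookup per (s, e) pair."""
    table = {}
    for ls, le, score in zip(label_s_idx, label_e_idx, label_score):
        # Assumption: for a repeated label pair, keep the first score seen.
        table.setdefault((ls, le), score)
    # Pairs without a matching label get 0, as in the question.
    return [table.get((s, e), 0) for s, e in zip(s_idx, e_idx)]


print(lookup_scores([1, 3, 5, 7], [3, 4, 5, 8],
                    [2, 2, 3, 6], [2, 3, 4, 8], [1, 3, 2, 3]))
# [0, 2, 0, 0]
```

Building the table takes one pass over the labels and each query is a single hash lookup, so the expected cost grows linearly with the combined input size instead of with the product of the two sizes.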
