Question

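I am given a long sentence and a set of words to search for in it. I have to find the smallest part of the sentence that contains all of the search words, and print that part.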

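Here is what I tried: 1. First, get all positions (indices) of every search word in the given sentence. 2. Then try to find the smallest part from those word indices.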

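But I am having trouble implementing the second part. So I would like some advice, or a suggestion of any other algorithm that would make this fast.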

import java.util.*;
import java.io.*;

public class ShotestSubSegment2
{
    static SearchStr[] search;   // one entry per search word (SearchStr class not shown)
    static String copystr;       // lower-cased copy of the sentence

    public static void main(String s[])
    {
        try
        {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String str = in.readLine();
            copystr = str.toLowerCase();
            int k = Integer.parseInt(in.readLine());
            search = new SearchStr[k];
            for (int i = 0; i < k; i++)
            {
                search[i] = new SearchStr(in.readLine().toLowerCase());
                getIndicesOf(search[i]);
                if (search[i].noOfElements() == 0)
                {
                    System.out.println("No Segments Found");
                    return;
                }
            }
            searchSmallestPart(); // this is the part I don't know how to implement
        }
        catch (Exception x)
        {
            x.printStackTrace(); // report the error instead of silently swallowing it
        }
    }

    // Records every (non-overlapping) index at which the word occurs in copystr.
    public static void getIndicesOf(SearchStr searchS)
    {
        String searchStr = searchS.getName().toLowerCase();
        int searchStrLen = searchStr.length();
        if (searchStrLen == 0) return; // an empty word would make the loop below run forever
        int startIndex = 0;
        int index;
        while ((index = copystr.indexOf(searchStr, startIndex)) > -1)
        {
            searchS.add(index);
            startIndex = index + searchStrLen;
        }
    }
}
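
The question does not show the SearchStr class. Judging only from the calls made on it (getName, add, noOfElements), it might look roughly like the sketch below. This is a guess at its shape, not the asker's actual class, and getIndices is an extra accessor added here on the assumption that the second step needs to read the stored positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction: the original SearchStr class is not shown in the question.
class SearchStr {
    private final String name;                                // the word to search for
    private final List<Integer> indices = new ArrayList<>();  // start index of each occurrence

    SearchStr(String name) { this.name = name; }

    String getName() { return name; }

    // Records one more position at which the word was found.
    void add(int index) { indices.add(index); }

    // Number of occurrences recorded so far.
    int noOfElements() { return indices.size(); }

    // Assumed accessor (not used in the question's code): all recorded positions.
    List<Integer> getIndices() { return indices; }
}
```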

Answer 1

Use this class:

class FoundToken  {
   int start;
   int end;
   String word;
   int endOfCompleteSequence;
}

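1) Store all found tokens, with their start and end indices, in a list.

2) For each list item, take the first complete sequence built from that token and the tokens that follow it in the list, i.e. the first run that contains all required tokens, and record where it ends in endOfCompleteSequence.

3) Take the shortest sequence (based on endOfCompleteSequence - start).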
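
The three steps above could be sketched roughly as follows. This is only one possible reading of them: the class name FirstCompleteSequence, the method shortestSegment and the constructor on FoundToken are made up for illustration, matching is done by substring with String.indexOf as in the question, and it assumes lower-casing does not change the length of the sentence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FirstCompleteSequence {

    // Same fields as the FoundToken class above; the constructor is added for convenience.
    static class FoundToken {
        int start;                       // index of the first character of the match
        int end;                         // index just past the last character of the match
        String word;                     // the search word that matched
        int endOfCompleteSequence = -1;  // -1 means no complete sequence starts at this token

        FoundToken(int start, String word) {
            this.start = start;
            this.end = start + word.length();
            this.word = word;
        }
    }

    /** Returns the shortest part of the sentence containing every word, or null if there is none. */
    static String shortestSegment(String sentence, List<String> words) {
        String text = sentence.toLowerCase();
        Set<String> wanted = new HashSet<>();
        List<FoundToken> tokens = new ArrayList<>();

        // Step 1: store every occurrence of every word as a token, ordered by start index.
        for (String w : words) {
            String word = w.toLowerCase();
            if (word.isEmpty() || !wanted.add(word)) continue;   // skip empty and duplicate words
            for (int i = text.indexOf(word); i >= 0; i = text.indexOf(word, i + 1)) {
                tokens.add(new FoundToken(i, word));
            }
        }
        tokens.sort(Comparator.comparingInt((FoundToken t) -> t.start));

        // Steps 2 and 3: for each token find the first complete sequence, keep the shortest.
        FoundToken best = null;
        for (int i = 0; i < tokens.size(); i++) {
            FoundToken first = tokens.get(i);
            Set<String> seen = new HashSet<>();
            int maxEnd = first.end;
            for (int j = i; j < tokens.size(); j++) {
                FoundToken t = tokens.get(j);
                // Only the first occurrence of each word can extend the sequence.
                if (seen.add(t.word)) maxEnd = Math.max(maxEnd, t.end);
                if (seen.size() == wanted.size()) {
                    first.endOfCompleteSequence = maxEnd;
                    break;
                }
            }
            if (first.endOfCompleteSequence < 0) break;  // later tokens cannot be complete either
            if (best == null || first.endOfCompleteSequence - first.start
                    < best.endOfCompleteSequence - best.start) {
                best = first;
            }
        }
        return best == null ? null : sentence.substring(best.start, best.endOfCompleteSequence);
    }
}
```

For example, shortestSegment("the quick brown fox jumps over the lazy dog the fox", Arrays.asList("the", "fox")) returns "the fox". The inner scan restarts for every token, so the worst case is quadratic in the number of tokens.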

Answer 2

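For each word, store the positions at which it occurs in a list.

word1 - list 1 of positions where word1 was found
word2 - list 2 of positions where word2 was found
...

You have to minimize (Pend - Pstart), where Pstart is the smallest position and Pend the largest position in a valid combination that takes one position from each word's list. To generate the combinations for all the words found in the text, use backtracking.

I hope I made myself clear.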
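
A minimal sketch of the backtracking described above, assuming the position lists have already been built and hold non-negative indices. The class name BacktrackingSegment and the method minimize are made up for illustration, and the pruning step (abandoning a partial combination once it is already as wide as the best one found) is an addition the answer does not mention.

```java
import java.util.List;

class BacktrackingSegment {
    private final List<List<Integer>> positions;  // positions.get(i) = positions of word i
    private int bestStart = -1;                   // Pstart of the best combination so far
    private int bestEnd = -1;                     // Pend of the best combination so far

    private BacktrackingSegment(List<List<Integer>> positions) {
        this.positions = positions;
    }

    /** Returns {Pstart, Pend} minimizing Pend - Pstart, or null if some word has no position. */
    static int[] minimize(List<List<Integer>> positions) {
        if (positions.isEmpty()) return null;
        BacktrackingSegment b = new BacktrackingSegment(positions);
        b.choose(0, Integer.MAX_VALUE, Integer.MIN_VALUE);
        return b.bestStart < 0 ? null : new int[] {b.bestStart, b.bestEnd};
    }

    // Picks one position for word number `word`, given the min (lo) and max (hi) chosen so far.
    private void choose(int word, int lo, int hi) {
        if (word == positions.size()) {
            // Every word has a position; pruning guarantees this beats the previous best.
            bestStart = lo;
            bestEnd = hi;
            return;
        }
        for (int p : positions.get(word)) {
            int newLo = Math.min(lo, p);
            int newHi = Math.max(hi, p);
            // Prune: this partial combination is already no better than the best one.
            if (bestStart >= 0 && newHi - newLo >= bestEnd - bestStart) continue;
            choose(word + 1, newLo, newHi);
        }
    }
}
```

Note that Pend here is the start position of the last word, so to print the part you would extend it by the length of the word found there. Without pruning, the number of combinations is the product of the list sizes, so this can get slow when words occur often.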

Answer 3

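Here is my algorithm. It may be a bit amateurish, but it is the most basic approach I could come up with.

After reading the input, loop through the words of the sentence and check which of them match the listed words. Use an array that stores the listed words for this.
Once a match is found, mark that position and start another scan from it, checking for further matches. As you go, remove each matched word from the list, and keep checking until every listed word has been found. Until then, append every word you pass (including the words in between) to a string. This inner loop continues until all elements of the listed-words array are empty.
When the innermost scan completes, store the string found in another array (say String sol_array) and continue with the outer loop. (The outer loop runs (original_string.length() - listed_word_array.length) times.)
After the outermost loop finishes, scan sol_array for the string with the smallest length; that string is the answer.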
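
Roughly, the nested scan described above might look like this. It is a sketch under a few assumptions: the sentence is split on whitespace, words are compared as whole words ignoring case (so "test." does not match "test"), and the names BruteForceSegment and shortestSegment are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BruteForceSegment {
    /** Returns the shortest run of words containing every listed word, or null if there is none. */
    static String shortestSegment(String sentence, List<String> listedWords) {
        String[] words = sentence.trim().split("\\s+");
        Set<String> listed = new HashSet<>();
        for (String w : listedWords) listed.add(w.toLowerCase());
        if (listed.isEmpty()) return null;

        List<String> solArray = new ArrayList<>();  // every complete candidate found
        // Outer loop: a candidate needs at least listed.size() words, so stop early.
        for (int start = 0; start + listed.size() <= words.length; start++) {
            if (!listed.contains(words[start].toLowerCase())) continue;

            // Inner scan: remove listed words as they are matched, collecting all words passed.
            Set<String> remaining = new HashSet<>(listed);
            StringBuilder candidate = new StringBuilder();
            for (int i = start; i < words.length && !remaining.isEmpty(); i++) {
                if (i > start) candidate.append(' ');
                candidate.append(words[i]);
                remaining.remove(words[i].toLowerCase());
            }
            if (remaining.isEmpty()) solArray.add(candidate.toString());
        }

        // Final scan: the candidate with the smallest length is the answer.
        String best = null;
        for (String s : solArray) {
            if (best == null || s.length() < best.length()) best = s;
        }
        return best;
    }
}
```

Keeping every candidate in solArray mirrors the description; tracking only the shortest candidate seen so far would give the same result with less memory.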

Answer 4

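Temporary variables:

bestseq: an object/collection holding the start/end of the current best sequence (initially null)
currently_closest: HashMap<word, index> (replace word and index with appropriate types; all entries initially a special value, e.g. -1)
current_start, current_end (indices, initially -1)

"Algorithm":

Run through the string.
If the current word is one of the search words, store the current index in currently_closest[word], and adjust current_start and current_end to reflect the new minimum and maximum indices in currently_closest.
If (current_end - current_start < bestseq.end - bestseq.start, or bestseq is still null) and all words have a non-special index (i.e. not -1) => set bestseq to the current_start, current_end sequence.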

I think this should run in O(length_of_sentence * number_of_words) time.
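
A minimal sketch of this single pass, assuming the sentence has already been split into words and that positions are word indices. The class name SinglePassSegment and the method bestSequence are made up for illustration; the variables follow the names used above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SinglePassSegment {
    /** Returns {start, end} word indices (inclusive) of the best sequence, or null if none. */
    static int[] bestSequence(String[] sentenceWords, List<String> searchWords) {
        // Most recent index of each search word; -1 means "not seen yet".
        Map<String, Integer> currentlyClosest = new HashMap<>();
        for (String w : searchWords) currentlyClosest.put(w.toLowerCase(), -1);
        if (currentlyClosest.isEmpty()) return null;

        int[] bestSeq = null;
        for (int i = 0; i < sentenceWords.length; i++) {
            String word = sentenceWords[i].toLowerCase();
            if (!currentlyClosest.containsKey(word)) continue;
            currentlyClosest.put(word, i);

            // The index just stored is the maximum; the minimum takes one pass over the map.
            int currentStart = Collections.min(currentlyClosest.values());
            int currentEnd = i;
            if (currentStart == -1) continue;  // some search word has not been seen yet
            if (bestSeq == null || currentEnd - currentStart < bestSeq[1] - bestSeq[0]) {
                bestSeq = new int[] {currentStart, currentEnd};
            }
        }
        return bestSeq;
    }
}
```

Recomputing the minimum over the map on every match is what gives the length_of_sentence * number_of_words factor mentioned above.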

Find the smallest sub-segment
