Question

I have a large number of IDs that I can store either in a HashSet or in a String, i.e.

String strIds=",1,2,3,4,5,6,7,8,.,.,.,.,.,.,.,1000,";
    Or
HashSet<String> setOfids = new HashSet<String>();
setOfids.add("1");
setOfids.add("2");
.
.
.
setOfids.add("1000");

Later on I want to search for an ID.

Which approach should I use for better performance (faster and more memory-efficient)?

1) strIds.indexOf("someId");
    or
2) setOfids.contains("someId");

Please also tell me about any other way to do this. Thanks for looking :)

Answer 1

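A hash table lookup is "constant time", i.e. it does not grow with the number of IDs.

But a compact String holding all the IDs needs the least memory.

So make up your mind: fastest retrieval or least storage space!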

Answer 2

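Performance aside, you should not use a String like this. It is creative, but a String is not meant to be used as an index that way. What happens if the format of the IDs changes?

To improve performance and save memory with the HashSet, you could of course use

HashSet<Integer> instead of HashSet<String>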
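
A minimal sketch of that suggestion, assuming the IDs are purely numeric and fit in an int (the question does not confirm this). The class and method names are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

class IntegerIdLookup {
    // Hypothetical helper: builds a set holding the IDs 1..maxId as boxed Integers.
    static Set<Integer> buildIds(int maxId) {
        Set<Integer> ids = new HashSet<>();
        for (int i = 1; i <= maxId; i++) {
            ids.add(i); // the int is autoboxed to Integer
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Integer> ids = buildIds(1000);
        System.out.println(ids.contains(100));  // true
        System.out.println(ids.contains(1001)); // false
    }
}
```

An ID that arrives as text would first need Integer.parseInt (or similar) before the lookup, so this only pays off if the IDs really are numbers.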

Answer 3

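A Set would be the better choice. Reasons:

Lookup is O(1) with a Set. With a String it is O(N).
Performance does not degrade as the data grows.
If you do any kind of data manipulation (adding or removing IDs), the String will use more memory.
indexOf can also give you wrong results (false positives).

Suppose 1000 is present but 100 is not: indexOf("100") will still return a position inside 1000, because 100 is a substring of 1000.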
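
The substring problem can be reproduced with a short sketch. The class and method names are illustrative only, and the delimiter-aware variant assumes the String starts and ends with a comma, as in the question:

```java
class SubstringPitfall {
    // Naive lookup: matches the characters anywhere, even inside a longer ID.
    static boolean naiveContains(String ids, String id) {
        return ids.indexOf(id) >= 0;
    }

    // Delimiter-aware lookup: assumes ids has the form ",1,2,1000,"
    // (leading and trailing comma), so every ID is wrapped in commas.
    static boolean delimitedContains(String ids, String id) {
        return ids.contains("," + id + ",");
    }

    public static void main(String[] args) {
        String ids = ",1,2,1000,"; // 100 is deliberately absent
        System.out.println(naiveContains(ids, "100"));      // true (false positive)
        System.out.println(delimitedContains(ids, "100"));  // false
        System.out.println(delimitedContains(ids, "1000")); // true
    }
}
```

Wrapping the search term in delimiters fixes the correctness issue, but the scan is still linear in the length of the String.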

A simple proof-of-concept for the performance:

import java.util.HashSet;
import java.util.Set;

public class TimeComputationTest {

  public static void main(String[] args) {
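    Set<String> setOfids = new HashSet<String>();
    // StringBuilder is enough here; nothing else touches it concurrently.
    StringBuilder sb = new StringBuilder();

    // Store the IDs 1..1000 both in the set and in a comma-separated string.
    for (int i = 1; i <= 1000; i++) {
      setOfids.add(String.valueOf(i));
      if (sb.length() != 0) {
        sb.append(",");
      }
      sb.append(i);
    }
    String strIds = sb.toString();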

    testTime(strIds, setOfids, "1");
    testTime(strIds, setOfids, "100");
    testTime(strIds, setOfids, "500");
    testTime(strIds, setOfids, "1000");
  }

  // Times one indexOf call and one contains call for the given ID.
  // A single nanoTime measurement is only a rough indication; JIT warm-up skews it.
  private static void testTime(String strIds, Set<String> setOfids, String id) {
    long startTime = System.nanoTime();
    strIds.indexOf(id);
    long endTime = System.nanoTime();

    System.out.println("String search time for (" + id + ") is " + (endTime - startTime));

    startTime = System.nanoTime();
    setOfids.contains(id);
    endTime = System.nanoTime();

    System.out.println("HashSet search time for (" + id + ") is " + (endTime - startTime));
  }
}

The output will be roughly (times in nanoseconds):

String search time for (1) is 3000
HashSet search time for (1) is 7000
String search time for (100) is 6000
HashSet search time for (100) is 2000
String search time for (500) is 33000
HashSet search time for (500) is 2000
String search time for (1000) is 71000
HashSet search time for (1000) is 1000

Answer 4

I think HashSet is the better choice. It has two benefits:

Duplicates are not allowed.
HashSet uses a HashMap internally, so retrieval is faster.
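
As a small illustration of the no-duplicates point: add returns false when the element is already present, so the set keeps a single copy. The class and method names below are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateIds {
    // Hypothetical helper: counts how many distinct IDs the input contains.
    static int countDistinct(String[] rawIds) {
        Set<String> ids = new HashSet<>();
        for (String id : rawIds) {
            ids.add(id); // returns false and changes nothing if id is already stored
        }
        return ids.size();
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>();
        System.out.println(ids.add("1")); // true: newly added
        System.out.println(ids.add("1")); // false: duplicate ignored
        System.out.println(countDistinct(new String[] {"1", "2", "2", "1000"})); // 3
    }
}
```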

Answer 5

This will work faster:

String strIds=",1,2,3,4,5,6,7,8,.,.,.,.,.,.,.,1000,";
String searchStr = "9";
boolean searchFound = strIds.contains(","+searchStr +",");

Java: searching for an ID in a HashSet or a String

5 answers: