Question

I am looking for a way to pass the following test cases:

 assertEquals(0, indexOfIgnoreCase("ss", "ß"));
 assertEquals(0, indexOfIgnoreCase("ß", "ss"));
 assertEquals(1, indexOfIgnoreCase("ßa", "a"));

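The interesting character (the German "sharp s") is not really exotic (U+00DF, located in the Latin-1 Supplement Unicode block), except when you uppercase it: "ß".toUpperCase() returns "SS" (independent of the locale).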
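A small sketch of that behaviour, using only standard JDK calls. The class name and the helper uppercasingChangesLength are made up for illustration:

```java
import java.util.Locale;

final class SharpSCaseMapping {
    // Hypothetical helper: true when full uppercasing changes the number of chars in s.
    static boolean uppercasingChangesLength(String s) {
        return s.toUpperCase(Locale.ROOT).length() != s.length();
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF"; // the sharp s
        System.out.println(sharpS.toUpperCase(Locale.ROOT));             // prints SS
        System.out.println(uppercasingChangesLength(sharpS));            // prints true
        // Character.toUpperCase maps one char to one char, so the sharp s stays as it is.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF'); // prints true
    }
}
```

The length change is what makes the problem awkward: an index found in the uppercased text no longer has to be a valid index into the original text.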

My search for a solution that works at least for the first 256 Unicode characters turned up only ICU4J, which I don't want to use.

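This question (indirectly) asks for a case-insensitive version of String.contains, but note that most of its answers work for ASCII only. The accepted answer can be adapted like this: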

final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
Pattern pattern = Pattern.compile(Pattern.quote(needle), flags);
final Matcher matcher = pattern.matcher(hay);
return matcher.find() ? matcher.start() : -1;

That way it also works for non-ASCII characters and returns the position instead of a boolean. However, it does not pass the tests above.
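For reference, here is the same snippet wrapped into a self-contained method so the behaviour can be checked. The class name is an assumption. The regex engine compares case-insensitively one character at a time, so it can never match the one-character ß against the two-character "ss":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexIndexOf {
    static int indexOfIgnoreCase(String hay, String needle) {
        final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        // Pattern.quote makes the needle a literal, so regex metacharacters are harmless.
        final Pattern pattern = Pattern.compile(Pattern.quote(needle), flags);
        final Matcher matcher = pattern.matcher(hay);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("ss", "\u00DF"));  // prints -1, test expects 0
        System.out.println(indexOfIgnoreCase("\u00DF", "ss"));  // prints -1, test expects 0
        System.out.println(indexOfIgnoreCase("\u00DFa", "a"));  // prints 1
    }
}
```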

Apache's org.apache.commons.lang3.StringUtils does not pass either. This nice answer offers a fast solution based on String.regionMatches, but it does not pass either.

Converting to lowercase is not enough. Converting to uppercase sort of works, but then the last test case returns 2 instead of 1.
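A quick way to see both effects, with hypothetical helper names viaLowerCase and viaUpperCase that simply convert both strings and then call String.indexOf:

```java
import java.util.Locale;

final class NaiveCaseConversion {
    static int viaLowerCase(String hay, String needle) {
        return hay.toLowerCase(Locale.ROOT).indexOf(needle.toLowerCase(Locale.ROOT));
    }

    static int viaUpperCase(String hay, String needle) {
        return hay.toUpperCase(Locale.ROOT).indexOf(needle.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Lowercasing leaves the sharp s (\u00DF) alone, so it never equals "ss".
        System.out.println(viaLowerCase("ss", "\u00DF"));  // prints -1
        // Uppercasing turns both sides into "SS", so the first two tests pass.
        System.out.println(viaUpperCase("ss", "\u00DF"));  // prints 0
        // The hay becomes "SSA", so the index refers to the expanded text.
        System.out.println(viaUpperCase("\u00DFa", "a"));  // prints 2
    }
}
```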

I am a bit unsure what the result of

indexOfIgnoreCase("ßa", "sa")

should be. 0.5, since the "needle" starts at the second S of the uppercased ß?

Answer 1

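Convert the original text and the needle to char arrays.
Convert each character to uppercase.
Find the position of the needle subarray in the original text array.

For example: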

char[] text = convertToUpperCase("...".toCharArray());
char[] needle = convertToUpperCase("...".toCharArray());

// Use <= so that a match ending at the last character of the text is also checked.
for (int i = 0; i <= text.length - needle.length; i++)
    if (arraysEqual(needle, 0, text, i, needle.length)) // The same signature as System.arraycopy
        return i;

return -1;
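The steps above can be sketched as a runnable method. This is a variation on the answer, not its exact code: it uppercases each char with String.toUpperCase so that ß can expand to "SS", and it remembers where each original char starts in the expanded text so a match can be mapped back to an original index. The class name and the rule that a match must start and end on an original-character boundary are assumptions. Under that rule the ambiguous indexOfIgnoreCase("ßa", "sa") yields -1. Supplementary characters (surrogate pairs) are not case-mapped by this sketch.

```java
import java.util.Arrays;
import java.util.Locale;

final class ExpandingIndexOf {
    static int indexOfIgnoreCase(String hay, String needle) {
        int n = hay.length();
        StringBuilder upper = new StringBuilder(n);
        // start[i] = offset in 'upper' where the expansion of hay.charAt(i) begins;
        // start[n] = total length of the expanded text.
        int[] start = new int[n + 1];
        for (int i = 0; i < n; i++) {
            start[i] = upper.length();
            upper.append(String.valueOf(hay.charAt(i)).toUpperCase(Locale.ROOT));
        }
        start[n] = upper.length();

        String upperNeedle = needle.toUpperCase(Locale.ROOT);
        int from = 0;
        while (true) {
            int p = upper.indexOf(upperNeedle, from);
            if (p < 0) return -1;
            // Accept only matches that begin and end on an original-character boundary.
            int src = Arrays.binarySearch(start, p);
            int end = Arrays.binarySearch(start, p + upperNeedle.length());
            if (src >= 0 && end >= 0) return src;
            from = p + 1; // the match cut through an expansion; keep searching
        }
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("ss", "\u00DF"));   // prints 0
        System.out.println(indexOfIgnoreCase("\u00DF", "ss"));   // prints 0
        System.out.println(indexOfIgnoreCase("\u00DFa", "a"));   // prints 1
        System.out.println(indexOfIgnoreCase("\u00DFa", "sa"));  // prints -1
    }
}
```

Uppercasing is not the same as Unicode case folding, so this should be read as a Latin-1 oriented sketch rather than a general solution.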

Case-insensitive indexOf for Latin-1 characters
