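Fastest way to convert uppercase to lowercase and lowercase to uppercase in Java

Time: 2018-03-08 21:03:10

Tags: java performance ascii uppercase lowercase

This is a question about performance. I can convert uppercase to lowercase, and vice versa, with the following code:

From lowercase to uppercase: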

// Uppercase letters. 
class UpperCase {  
  public static void main(String args[]) { 
    char ch;
    for(int i=0; i < 10; i++) { 
      ch = (char) ('a' + i);
      System.out.print(ch); 

      // This statement turns off the 6th bit.   
      ch = (char) ((int) ch & 65503); // ch is now uppercase
      System.out.print(ch + " ");  
    } 
  } 
}

From uppercase to lowercase:

// Lowercase letters. 
class LowerCase {  
  public static void main(String args[]) { 
    char ch;
    for(int i=0; i < 10; i++) { 
      ch = (char) ('A' + i);
      System.out.print(ch);
      ch = (char) ((int) ch | 32); // ch is now lowercase
      System.out.print(ch + " ");  
    } 
  } 
}

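I know that Java provides the methods .toUpperCase() and .toLowerCase(). With performance in mind, what is the fastest way to do this conversion: bitwise operations, as in the code above, or the .toUpperCase() and .toLowerCase() methods? Thank you.

Edit 1: Note how I am using decimal 65503, which is binary 1111111111011111. I am using 16 bits, not 8 bits. According to the answer that currently has the most votes on "How many bits in a character?":

A Unicode character in UTF-16 encoding is between 16 bits (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding used by Windows internally.

The code in my question assumes UTF-16.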
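
For reference, both masks come down to the single bit that separates the two ASCII cases. The following is only a minimal sketch of that relationship; the class name CaseBit and its helper methods are made up for illustration and are not part of the question's code.

```java
// Illustrative sketch: CaseBit and its method names are hypothetical.
class CaseBit {
    // In ASCII, 'a' (0x61) and 'A' (0x41) differ only in bit 0x20 (decimal 32).
    static final int CASE_BIT = 0x20;

    // Clearing the bit is the same as masking with 65503 (0xFFDF) on a 16-bit char.
    static char clearCaseBit(char ch) {
        return (char) (ch & ~CASE_BIT);
    }

    // Setting the bit maps 'A'..'Z' to 'a'..'z'.
    static char setCaseBit(char ch) {
        return (char) (ch | CASE_BIT);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(65503)); // 1111111111011111
        System.out.println((~CASE_BIT & 0xFFFF) == 65503);  // true
        System.out.println(clearCaseBit('q'));              // Q
        System.out.println(setCaseBit('Q'));                // q
    }
}
```

Writing the mask as ~0x20 avoids depending on the width of the type, since the cast back to char discards the upper bits anyway.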

4 answers:

Answer 0: (score: 5)

Yes, a method you write yourself will be slightly faster if it performs the case conversion with a simple bitwise operation, because Java's methods carry more complex logic to support Unicode characters and not just the ASCII charset.

If you look at String.toLowerCase() you'll notice that there's a lot of logic in there, so if you were working with software that needed to process huge amounts of ASCII only, and nothing else, you might actually see some benefit from using a more direct approach.

But unless you are writing a program that spends most of its time converting ASCII, you won't be able to notice any difference even with a profiler (and if you are writing that kind of a program...you should look for another job).
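
As a rough sketch of what such a "more direct approach" could look like for ASCII-only text, the code below converts a whole string in one pass. The names AsciiCase, toUpperAscii and toLowerAscii are hypothetical, and the range check is an addition that the question's code does not have; it is there so that digits and punctuation pass through unchanged.

```java
// Hypothetical helper: ASCII-only case conversion, everything else is left as-is.
class AsciiCase {
    static String toUpperAscii(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Only touch 'a'..'z'; clearing bit 0x20 turns them into 'A'..'Z'.
            if (c >= 'a' && c <= 'z') {
                chars[i] = (char) (c & ~0x20);
            }
        }
        return new String(chars);
    }

    static String toLowerAscii(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Only touch 'A'..'Z'; setting bit 0x20 turns them into 'a'..'z'.
            if (c >= 'A' && c <= 'Z') {
                chars[i] = (char) (c | 0x20);
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toUpperAscii("hello, world 42{")); // HELLO, WORLD 42{
        System.out.println(toLowerAscii("HELLO [JAVA]"));     // hello [java]
    }
}
```

Whether this is measurably faster than String.toUpperCase() on real input would still have to be confirmed with a benchmark.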

Answer 1: (score: 3)

Stick with the provided methods .toUpperCase() and .toLowerCase(). Adding two separate classes to do what these two existing methods already do is overkill, and it will make your program slower (by a slight margin).

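Answer 2: (score: 3)

Your code only works for ASCII characters. What about languages where there is no clear conversion between lowercase and uppercase, e.g. the German ß (please correct me if I'm wrong, my German is terrible), or a letter/symbol that is written with a multi-byte UTF-8 code point? If you have to handle UTF-8, correctness comes before performance and the problem is not that simple, as the String.toLowerCase(Locale) method shows.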
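
A few concrete cases where the bitwise approach goes wrong are sketched below. The class BitwiseLimits and its two helper methods are hypothetical wrappers around the masks from the question; the library calls are the standard String methods.

```java
import java.util.Locale;

// Hypothetical demo class wrapping the question's two bitwise operations.
class BitwiseLimits {
    static char bitwiseUpper(char ch) { return (char) (ch & 65503); }
    static char bitwiseLower(char ch) { return (char) (ch | 32); }

    public static void main(String[] args) {
        // Non-letters are silently changed into other characters.
        System.out.println(bitwiseUpper('{')); // [
        System.out.println(bitwiseLower('@')); // `

        // The sharp s (U+00DF) is already lowercase, yet setting bit 32
        // turns it into a different letter (U+00FF).
        System.out.println(bitwiseLower('\u00DF') == '\u00FF'); // true

        // The library maps the sharp s to two characters when uppercasing.
        System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT)); // STRASSE

        // Locale matters: in Turkish, 'I' lowercases to the dotless i (U+0131).
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("I".toLowerCase(turkish).equals("\u0131")); // true
    }
}
```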

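Answer 3: (score: 3)

As promised, here are two JMH benchmarks: one compares Character#toUpperCase with your bitwise method, and the other compares Character#toLowerCase with your other bitwise method. Note that only characters of the English alphabet were tested.

First benchmark (to uppercase):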

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {

    @Param({"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
            "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"})
    public char c;

    @Benchmark
    public char toUpperCaseNormal() {
        return Character.toUpperCase(c);
    }

    @Benchmark
    public char toUpperCaseBitwise() {
        return (char) (c & 65503);
    }
}

Output:

Benchmark                (c)  Mode  Cnt  Score   Error  Units
Test.toUpperCaseNormal     a  avgt   30  2.447 ± 0.028  ns/op
Test.toUpperCaseNormal     b  avgt   30  2.438 ± 0.035  ns/op
Test.toUpperCaseNormal     c  avgt   30  2.506 ± 0.083  ns/op
Test.toUpperCaseNormal     d  avgt   30  2.411 ± 0.010  ns/op
Test.toUpperCaseNormal     e  avgt   30  2.417 ± 0.010  ns/op
Test.toUpperCaseNormal     f  avgt   30  2.412 ± 0.005  ns/op
Test.toUpperCaseNormal     g  avgt   30  2.410 ± 0.004  ns/op

Test.toUpperCaseBitwise    a  avgt   30  1.758 ± 0.007  ns/op
Test.toUpperCaseBitwise    b  avgt   30  1.789 ± 0.032  ns/op
Test.toUpperCaseBitwise    c  avgt   30  1.763 ± 0.005  ns/op
Test.toUpperCaseBitwise    d  avgt   30  1.763 ± 0.012  ns/op
Test.toUpperCaseBitwise    e  avgt   30  1.757 ± 0.003  ns/op
Test.toUpperCaseBitwise    f  avgt   30  1.755 ± 0.003  ns/op
Test.toUpperCaseBitwise    g  avgt   30  1.759 ± 0.003  ns/op

Second benchmark (to lowercase):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {

    @Param({"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
            "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"})
    public char c;

    @Benchmark
    public char toLowerCaseNormal() {
        return Character.toLowerCase(c);
    }

    @Benchmark
    public char toLowerCaseBitwise() {
        return (char) (c | 32);
    }
}

Output:

Benchmark                (c)  Mode  Cnt  Score   Error  Units
Test.toLowerCaseNormal     A  avgt   30  2.084 ± 0.007  ns/op
Test.toLowerCaseNormal     B  avgt   30  2.079 ± 0.006  ns/op
Test.toLowerCaseNormal     C  avgt   30  2.081 ± 0.005  ns/op
Test.toLowerCaseNormal     D  avgt   30  2.083 ± 0.010  ns/op
Test.toLowerCaseNormal     E  avgt   30  2.080 ± 0.005  ns/op
Test.toLowerCaseNormal     F  avgt   30  2.091 ± 0.020  ns/op
Test.toLowerCaseNormal     G  avgt   30  2.116 ± 0.061  ns/op

Test.toLowerCaseBitwise    A  avgt   30  1.708 ± 0.006  ns/op
Test.toLowerCaseBitwise    B  avgt   30  1.705 ± 0.018  ns/op
Test.toLowerCaseBitwise    C  avgt   30  1.721 ± 0.022  ns/op
Test.toLowerCaseBitwise    D  avgt   30  1.718 ± 0.010  ns/op
Test.toLowerCaseBitwise    E  avgt   30  1.706 ± 0.009  ns/op
Test.toLowerCaseBitwise    F  avgt   30  1.704 ± 0.004  ns/op
Test.toLowerCaseBitwise    G  avgt   30  1.711 ± 0.007  ns/op

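I've only included a few of the letters (even though all of them were tested), because they all produce similar results.

It's evident that your bitwise methods are faster, mainly because Character#toUpperCase and Character#toLowerCase perform additional logical checks (as I mentioned in my comment earlier today).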
