Efficient character processing of NSString

Time: 2016-10-15 17:09:49

Tags: objective-c swift performance cocoa nsstring

I need to do some character processing of huge strings in Cocoa (from Objective-C or Swift), where:

  • The input string passed in an NSString has n characters
  • The result should be returned in an NSString
  • It's OK to make simplifying assumptions about the characters in the string. I mean we can assume they are all ASCII, or all single-code-unit UTF-8, or even that each one fits in a single unichar (so as to make indexing and length computation O(1))

For the sake of the example, let's say the processing is a rot13 obfuscation.

I want to do it space and time efficiently:

  • I want to get a mutable buffer of characters from the source string (probably a copy)
  • I want to alter that buffer in place
  • I want to return the altered buffer in a constructed NSString without doing another copy.

I want space complexity ≤ 2*n+ O(1).

I want time complexity O(n) - with as small a constant as possible.

The NSString API makes this kind of processing easy, but too inefficient, with plenty of back-and-forth conversion between characters and strings. I am aiming for C-level efficiency in character processing here.

The NSString API also offers ways to get a buffer of characters, with methods such as dataUsingEncoding: or UTF8String. But I can't find a way to use the API that copies the characters for processing no more than once.

1 answer:

Answer 0 (score: 1)

Allocate a buffer of unichar. Copy into the buffer with getCharacters(_:range:). Manipulate. Convert back with init(charactersNoCopy:length:freeWhenDone:).
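A minimal Swift sketch of those four steps, using rot13 from the question as the processing. The function name `rot13` is just an illustration, and the sketch assumes only ASCII letters need changing; every other code unit is left alone. The buffer comes from `malloc` because `freeWhenDone: true` releases it with `free`.

```swift
import Foundation

// Hypothetical helper: rot13 over the UTF-16 code units of an NSString.
func rot13(_ input: NSString) -> NSString {
    let length = input.length                 // count of UTF-16 code units
    guard length > 0 else { return NSString() }

    // One allocation of `length` unichars. It must come from malloc,
    // since the resulting NSString will release it with free().
    guard let raw = malloc(length * MemoryLayout<unichar>.stride) else {
        fatalError("out of memory")
    }
    let buffer = raw.bindMemory(to: unichar.self, capacity: length)

    // The single copy out of the source string.
    input.getCharacters(buffer, range: NSRange(location: 0, length: length))

    // In-place processing, one pass over the buffer.
    for i in 0..<length {
        let c = buffer[i]
        switch c {
        case 0x41...0x5A: buffer[i] = (c - 0x41 + 13) % 26 + 0x41   // A-Z
        case 0x61...0x7A: buffer[i] = (c - 0x61 + 13) % 26 + 0x61   // a-z
        default: break                                              // untouched
        }
    }

    // Hand the buffer over; the new string owns it and frees it later.
    return NSString(charactersNoCopy: buffer, length: length, freeWhenDone: true)
}
```

Called as `rot13(NSString(string: "Hello, World!"))`, this allocates one buffer of n unichars next to the source string, which should fit the 2*n + O(1) budget when space is counted in unichars.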

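unichar is a UTF-16 code unit. If you're willing to assume that nothing requires surrogate pairs (for example, if you believe the string is ASCII), then you can size your buffer from length (it will be 2 * length bytes). If you want to be more flexible, but still need an O(1) size computation at the cost of 2-3x the memory, then use maximumLengthOfBytes(using:). If you want to be even more flexible, but are willing to accept an O(n) step (I assume you're not), then use lengthOfBytes(using:).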
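To make the three sizing options concrete, here is a small Swift sketch that computes each of them for a sample string. UTF-8 is chosen as the target encoding purely for illustration, and the variable names are assumptions. Note that `length` is documented as the number of UTF-16 code units, so a buffer of `length` unichars is what `getCharacters(_:range:)` fills.

```swift
import Foundation

// Sample string: "héllo" with a precomposed é (U+00E9).
let sample = NSString(string: "h\u{E9}llo")
let utf8 = String.Encoding.utf8.rawValue

// Option 1: size from length. Cheap, counts UTF-16 code units.
let codeUnits = sample.length
let utf16Bytes = codeUnits * MemoryLayout<unichar>.size

// Option 2: cheap upper bound for another encoding; may overestimate.
let upperBound = sample.maximumLengthOfBytes(using: utf8)

// Option 3: exact byte count for that encoding; may need a full scan.
let exactBytes = sample.lengthOfBytes(using: utf8)

print(codeUnits, utf16Bytes, upperBound, exactBytes)
```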

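It's fairly common for NSString to be stored internally as UTF-16, so this tends to be a pretty fast conversion. That said, if you know enough about your string and are willing to write the extra code to manipulate the encoding directly, then take a look at fastestEncoding.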
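As a rough sketch of how fastestEncoding might steer that choice, the hypothetical helper below returns a code unit width hint: 1 when the string reports an 8-bit encoding as fastest, and 2 otherwise. The helper name, the list of 8-bit encodings, and the fallback to 2 are all assumptions made for illustration; which encoding a given string reports is an implementation detail of Foundation.

```swift
import Foundation

// Hypothetical helper: suggests a byte buffer (1) or a unichar buffer (2).
func codeUnitWidthHint(for s: NSString) -> Int {
    let fastest = s.fastestEncoding
    // Assumed set of single-byte encodings worth a byte-oriented fast path.
    let oneByte: [UInt] = [
        String.Encoding.ascii.rawValue,
        String.Encoding.isoLatin1.rawValue,
        String.Encoding.macOSRoman.rawValue,
    ]
    // Assumption: anything else is handled through the unichar path.
    return oneByte.contains(fastest) ? 1 : 2
}

let hint = codeUnitWidthHint(for: NSString(string: "Hello"))
print("suggested code unit width:", hint)
```

A byte-oriented path would still have to confirm that the string converts losslessly to the chosen encoding, for example with canBeConverted(to:), before relying on the hint.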