Question

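I have a string in a variable, and that string comes from the core part of the project. Now I want to convert it to a Unicode string. How can I do that? Adding L, _T() or TEXT() is not an option. To make it clearer, see below: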

void foo(char* string) {
    //Here the contents of the variable STRING should be converted to Unicode
    //The soln should be possible to use in C code.
}

TIA, Naveen

Answer 1

The L prefix is only for creating wchar_t literals.
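A minimal C illustration of why L cannot help here: the compiler applies the prefix to a literal at compile time, producing a wchar_t array, so there is no way to apply it to a runtime variable such as `string` in `foo`. The names `greeting` and `greeting_length` are hypothetical, chosen only for this sketch.

```c
#include <stddef.h>
#include <wchar.h>

/* L works only on literals: the compiler turns L"hello" into a
   wchar_t array (5 characters plus a terminating L'\0') at compile time. */
static const wchar_t greeting[] = L"hello";

/* A char* that only exists at run time (like `string` in foo) has no
   L form; its bytes have to be converted by a function call instead. */
size_t greeting_length(void) {
    return wcslen(greeting); /* counts wchar_t elements, excluding the null */
}
```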

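From your comment about SafeArrayPutElement and what you said about the word "Unicode", it's clear you're on Windows. Assuming that char* string is in the legacy encoding Windows uses (the ANSI code page) and not UTF-8 or something else (a safe assumption on Windows), you can get a wchar_t string like this: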

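// typical Win32 conversion in C
#include <windows.h> // MultiByteToWideChar, CP_ACP
#include <stdlib.h>  // malloc, free
#include <assert.h>  // assert

// First call with a NULL buffer: returns the required size in wchar_t units, including the terminating null
int output_size = MultiByteToWideChar(CP_ACP,0,string,-1,NULL,0);
wchar_t *wstring = malloc(output_size * sizeof(wchar_t));
// (check wstring for NULL here in real code)
// Second call: performs the actual conversion into wstring
int size = MultiByteToWideChar(CP_ACP,0,string,-1,wstring,output_size);
assert(output_size==size);

// make use of wstring here

free(wstring);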
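If the code also has to build outside Windows, a portable standard-C sketch swaps in `mbsrtowcs` for `MultiByteToWideChar`. This is not what the answer above uses. It assumes the input is encoded in the current C locale's multibyte encoding (a program starts in the "C" locale; call `setlocale(LC_ALL, "")` to use the user's environment encoding). `to_wide` is a hypothetical helper name.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memset */
#include <wchar.h>   /* mbsrtowcs, mbstate_t, wchar_t */

/* Hypothetical helper: converts a null-terminated multibyte string, assumed
   to be in the current locale's encoding (LC_CTYPE), into a newly allocated
   wide string. Returns NULL on an invalid byte sequence or allocation
   failure. The caller must free() the result. */
wchar_t *to_wide(const char *s) {
    mbstate_t state;
    const char *src = s;

    /* First pass: with a NULL destination, mbsrtowcs ignores the length
       argument and returns the number of wide characters needed, excluding
       the terminator, or (size_t)-1 if the input is invalid. */
    memset(&state, 0, sizeof state);
    size_t n = mbsrtowcs(NULL, &src, 0, &state);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *ws = malloc((n + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;

    /* Second pass: convert for real; n + 1 leaves room for L'\0'. */
    memset(&state, 0, sizeof state);
    src = s;
    mbsrtowcs(ws, &src, n + 1, &state);
    return ws;
}
```

The two-pass shape mirrors the Win32 version: ask for the size, allocate, then convert.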

If you're using C++, you may want to make this exception-safe by using std::wstring (this relies on a small part of C++11, so it may require VS2010 or later):

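int output_size = MultiByteToWideChar(CP_ACP,0,string,-1,NULL,0);
std::wstring ws(output_size,L'\0');
// &ws[0] is a writable buffer because C++11 guarantees std::wstring storage is contiguous
// (ws.data() returns a const pointer before C++17, so it can't be written through)
int size = MultiByteToWideChar(CP_ACP,0,string,-1,&ws[0],static_cast<int>(ws.size()));
// MultiByteToWideChar tacks on a null character to mark the end of the string, but this isn't needed when using std::wstring.
ws.resize(ws.size() -1);

// make use of ws here. You can pass a wchar_t pointer to a function by using ws.c_str()

//std::wstring handles freeing the memory so no need to clean up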

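Here is another approach that uses more of the C++ standard library (and relies on VS2010 not being fully standard-conforming: std::codecvt has a protected destructor, so a conforming compiler rejects using it directly with std::wstring_convert; std::wstring_convert itself was later deprecated in C++17):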

#include <locale> // for wstring_convert and codecvt
#include <string> // for std::wstring

std::wstring ws = std::wstring_convert<std::codecvt<wchar_t,char,std::mbstate_t>,wchar_t>().from_bytes(string);

// use ws.c_str() as before

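You also implied in a comment that you tried converting to wchar_t and got the same error. If you still get that error when converting to wchar_t with these methods, then the problem is somewhere else, probably in the actual contents of your string. Perhaps it isn't properly null-terminated?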

Answer 2

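You can't just say "convert to Unicode". You need to specify an encoding. Unicode is not an encoding; it is (roughly) a character set plus a set of encodings for representing those characters as byte sequences.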
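To make the character-set vs. encoding distinction concrete: the single code point U+00E5 ("å") is the 16-bit unit 0x00E5 in UTF-16 and the 32-bit value 0x000000E5 in UTF-32, but the two bytes C3 A5 in UTF-8. The sketch below is a plain UTF-8 encoder written for illustration (`utf8_encode` is a hypothetical name, not a library function).

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a Unicode
   scalar value (a surrogate in D800..DFFF, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```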

You also have to specify the input encoding: for example, how is a character such as "å" encoded in string?
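Why the input encoding matters can be shown with the same two bytes, C3 A5. Read as ISO-8859-1 (Latin-1) they are two characters (U+00C3 "Ã" and U+00A5 "¥"); read as UTF-8 they are the one character U+00E5 "å". A sketch with two hypothetical decoders; the UTF-8 one handles only 1- and 2-byte sequences and is deliberately not a validating decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* ISO-8859-1 (Latin-1): every byte value is its own code point. */
size_t decode_latin1(const unsigned char *in, size_t len, uint32_t *out) {
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];
    return len;
}

/* Minimal UTF-8 decoder for U+0000..U+07FF (1- and 2-byte sequences) only.
   Returns the number of code points written, or (size_t)-1 on any byte
   sequence it does not handle. Does not reject overlong forms. */
size_t decode_utf8_small(const unsigned char *in, size_t len, uint32_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < len; ) {
        if (in[i] < 0x80) {                          /* ASCII byte */
            out[n++] = in[i];
            i += 1;
        } else if ((in[i] & 0xE0) == 0xC0 && i + 1 < len
                   && (in[i + 1] & 0xC0) == 0x80) {  /* 110xxxxx 10xxxxxx */
            out[n++] = ((uint32_t)(in[i] & 0x1F) << 6) | (in[i + 1] & 0x3F);
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

Neither reading is "wrong" on its own; only knowing which encoding the producer used tells you which result is correct.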

Convert a string to Unicode in C
