String from_char = "This is a sample string."; // from char
Similarly, the charset and encoding of wchar_t are also environment dependent. On Windows, wchar_t strings use UTF-16, and you can convert a wchar_t string into a String instance as in the following code:
String from_wchart = L"This is a sample string."; // from wchar_t
Since wchar_t is not identical to UTF-16 in general, you should not use wchar_t in non-Windows environments. Use UChar2 for UTF-16 (of UCS-4) strings and UChar4 for UTF-32 (of UCS-4) strings. You can assume that UChar2 is always 16 bits and UChar4 is always 32 bits (regardless of endianness). The code below illustrates how to use them:
const UChar2 sample_1[] = {
    0xfeff, 0x30bb, 0x30e9, 0x30fc, 0x30c6, 0x30e0, 0x306f, 0x65e5,
    0x672c, 0x306e, 0x4f1a, 0x793e, 0x3067, 0x3059, 0x3002, 0x000d,
    0x000a, 0x0000 // terminating null so the constructor can find the end
};
String str(sample_1); // from UTF-16
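To make the difference between 16-bit and 32-bit code units concrete, here is a minimal sketch in standard C++ only. It assumes char16_t and char32_t as stand-ins for UChar2 and UChar4 (the library's own typedefs may be defined differently), and utf16_to_utf32 is a hypothetical helper, not part of the String API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 code units into UTF-32 code points.
// A valid surrogate pair becomes one code point; an unpaired surrogate
// is replaced with U+FFFD.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        bool has_low = i + 1 < in.size() &&
                       in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_high && has_low) {
            char32_t low = in[i + 1];
            out.push_back(static_cast<char32_t>(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)));
            ++i; // the low surrogate has been consumed
        } else if (unit >= 0xD800 && unit <= 0xDFFF) {
            out.push_back(static_cast<char32_t>(0xFFFD));
        } else {
            out.push_back(unit);
        }
    }
    return out;
}

void utf16_example() {
    std::u16string text;
    text.push_back(static_cast<char16_t>(0x30bb)); // one BMP character
    text.push_back(static_cast<char16_t>(0xD83D)); // high surrogate
    text.push_back(static_cast<char16_t>(0xDE00)); // low surrogate
    std::u32string decoded = utf16_to_utf32(text);
    assert(decoded.size() == 2); // three 16-bit units, two characters
}
```

The point of the sketch is that a UTF-16 string may need two 16-bit units for one character, while a UTF-32 string always uses one 32-bit unit per character.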
There is also a String constructor that receives a UTF-8 string. Since a UTF-8 string is also expressed as a char string, the UTF-8 constructor must be called explicitly using the utf8s proxy object:
String from_UTF8 = utf8s("This is a sample UTF-8 string.");
We define the NULL_STRING enumeration type and its NullString value to call a String constructor that initializes the instance with an empty string. The following code shows how to use them:
String str(NullString);
assert(str.isEmpty()); // str is empty.
String str2 = NullString; // This is also valid.
str2 = "Test test test";
str2 = NullString; // the result is identical to String::clear().
You can also use NullString as the default value for function parameters:
void someFunction(const String& inString = NullString)
{
    ...
}
someFunction(); // call the function without any explicit parameter.
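One common way to get this behavior is a single-value enumeration used as a constructor tag. The sketch below is only an illustration of that idea, not the library's actual implementation: MiniString and receivedEmpty are hypothetical, and only the names NULL_STRING, NullString and isEmpty come from the text above.

```cpp
#include <cassert>
#include <string>

// Single-value enumeration used purely as a constructor tag.
enum NULL_STRING { NullString };

// Hypothetical stand-in for String; it only wraps std::string.
class MiniString {
public:
    MiniString(NULL_STRING) {}                 // implicit on purpose: builds an empty string
    MiniString(const char* s) : data_(s) {}    // builds from a char string
    MiniString& operator=(NULL_STRING) {       // assigning the tag clears the string
        data_.clear();
        return *this;
    }
    bool isEmpty() const { return data_.empty(); }
private:
    std::string data_;
};

// The tag converts implicitly, so it can serve as a default argument.
bool receivedEmpty(const MiniString& inString = NullString) {
    return inString.isEmpty();
}

void null_string_example() {
    MiniString s = "Test test test";
    assert(!s.isEmpty());
    s = NullString;            // same effect as clearing the string
    assert(s.isEmpty());
    assert(receivedEmpty());   // the default argument is the empty string
}
```

Using a dedicated enumeration rather than a null pointer or an empty literal keeps the overload unambiguous: no other argument type converts to NULL_STRING by accident.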
You can also initialize a String from a string that is not terminated by '\0', or from a portion of a terminated string, by passing the length explicitly. The following code illustrates this:
String str1("This is test!", 4); // The string is initialized with "This".
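The same pointer-plus-length idea can be tried out with the standard library. This is a sketch using std::string as an analogy, not the String class itself; from_counted is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a string from the first `count` chars of a
// buffer. The buffer does not need to end with '\0'.
std::string from_counted(const char* buffer, std::size_t count) {
    return std::string(buffer, count);
}

void counted_example() {
    const char raw[4] = { 'T', 'h', 'i', 's' };  // no terminator at all
    assert(from_counted(raw, sizeof raw) == "This");
    // A prefix of a terminated string works the same way.
    assert(from_counted("This is test!", 4) == "This");
}
```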
To initialize a String from a portion of another String, use the String::substring or String::substringByChar function:
String str1 = "This is test string";
String str2 = str1.substring(8, 4); // str2 is "test".
The difference between String::substring and String::substringByChar is discussed in Character Count vs. String Length.
+ operator:
String str1 = "This";
String str2 = "sample code";
String str3 = str1 + " is a " + str2 + "."; // "This is a sample code."
You can also use traditional printf syntax with String through the format or format_utf8 function. format treats its string parameters as normal char strings, and format_utf8 treats them as UTF-8 strings.
String str = format("A is %u\n", a);
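For readers curious how a printf-style function can return a string object, the following is a minimal sketch built on std::vsnprintf. It is an assumption about the general technique, not the library's implementation; format_to_string is a hypothetical name and it returns std::string rather than String.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: printf-style formatting into a std::string.
std::string format_to_string(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);

    // First pass on a copy of the argument list: measure the output.
    va_list measure;
    va_copy(measure, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);
    if (needed < 0) {          // encoding error reported by vsnprintf
        va_end(args);
        return std::string();
    }

    // Second pass: write into a buffer with room for the terminator.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), fmt, args);
    va_end(args);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}

void format_example() {
    unsigned a = 42;
    assert(format_to_string("A is %u\n", a) == "A is 42\n");
}
```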
"\0" character.
String str = "This is test string!";
size_t length = str.getLength();
The difference between the length and the character count is discussed in Character Count vs. String Length.
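As a rough illustration of why a length and a character count can differ, the sketch below counts code points in a UTF-8 byte string with the standard library. It assumes UTF-8 storage purely for the sake of the example; the library's exact definitions are given in the section referenced above, and utf8_char_count is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a well-formed UTF-8 string
// by counting every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_char_count(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

void char_count_example() {
    // Two Japanese characters (U+65E5, U+672C), three bytes each in UTF-8.
    std::string text = "\xE6\x97\xA5\xE6\x9C\xAC";
    assert(text.size() == 6);            // length in bytes
    assert(utf8_char_count(text) == 2);  // count in characters
}
```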
String str1 = "test", str2 = "TEST"; if(str1 < str2) ...
There are also String::compare (case sensitive) and String::compareI (case insensitive) functions.
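A case-insensitive comparison can be sketched as follows with the standard library. This is not the implementation of String::compareI: the helper name compare_ignore_case, the negative/zero/positive return convention, and the ASCII-only case folding are all assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns -1, 0 or 1, ignoring ASCII case.
int compare_ignore_case(const std::string& a, const std::string& b) {
    std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i) {
        // Cast to unsigned char first: tolower is undefined for negative values.
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
    }
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;  // the shorter string sorts first
}

void compare_example() {
    std::string str1 = "test", str2 = "TEST";
    assert(str1.compare(str2) != 0);               // case sensitive: different
    assert(compare_ignore_case(str1, str2) == 0);  // case insensitive: equal
}
```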
String str = "help me!";
UChar1 chr = str[1]; // returns 'e'. (value)
str[0] = 'H'; // str becomes "Help me!". (reference)
You can also get the character code at a given character position. operator() returns the UCS-4 character code at the specified UCS-4 character index:
String str = "help me!"; UChar4 chr = str(5); // get character code of 'm'.
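Looking up a character by character index, rather than by byte index, means walking the encoded sequence from the start. The sketch below shows that walk for UTF-8 using only the standard library. It assumes well-formed UTF-8 input, and utf8_code_point_at together with its 0xFFFFFFFF "out of range" result are hypothetical choices, not the behavior of the String class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the code point at character index `index`,
// or 0xFFFFFFFF if the index is past the end. Assumes well-formed UTF-8.
char32_t utf8_code_point_at(const std::string& s, std::size_t index) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > s.size()) {
            break;  // truncated sequence at the end of the string
        }
        if (index == 0) {
            // Keep only the payload bits of the lead byte.
            char32_t cp = static_cast<char32_t>(
                len == 1 ? lead : (lead & (0xFF >> (len + 1))));
            for (std::size_t k = 1; k < len; ++k) {
                cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
            }
            return cp;
        }
        --index;
        i += len;
    }
    return static_cast<char32_t>(0xFFFFFFFF);
}

void code_point_example() {
    std::string str = "help me!";
    assert(utf8_code_point_at(str, 5) == U'm');
}
```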
String str = "This is sample."; const UChar1 *p = str.c_str();
The pointer is valid until you call any function that accesses the String instance.
String str;
// Allocate the buffer; 255 means the actual size allocated is 256 bytes.
// The function automatically makes room for the null terminator and
// initializes it with '\0'.
UChar1 *pstr = str.allocate(255);
// Copy a string to the buffer.
std::strcpy(pstr, "This code works!");
const char*.
const wchar_t*.
const UChar2*.
const UChar4*.
If you include tchar.h to use TCHAR, you can use TO_TCS to deal with the String::toMbs and String::toWcs methods indirectly. It behaves as String::toWcs if the UNICODE macro is defined, otherwise as String::toMbs.
void some_function(const char *string);
String str = "This is sample!";
// This is good use of toMbs
some_function(str.toMbs());
// With Win32 API, it works correctly with generic char mapping
LPCTSTR pstr = TO_TCS(str);
// temp will point to some invalid block
const char *temp = str.toMbs();
// This makes temp invalid...
str = "Modify the original string";
// Something wrong happens...
some_function(temp);
If you want to convert a string somewhere other than in a function argument, you can use mbs, wcs, utf16 and utf32. These are just typedefs of the UtfConverter class. They are useful with structures that require a pointer to a string. The following code illustrates how to use them:
struct STRUCT_FOR_SOME_FUNC
{
    wchar_t *wszString; // We want to deal with this member!
    ...
};
String str = "This is sample!";
wcs wcsStr(str);
STRUCT_FOR_SOME_FUNC sfsf;
sfsf.wszString = wcsStr.c_str();
...
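The reason a named converter object helps is ownership: the object holds the converted buffer, so the pointer stays valid for as long as the object lives, whatever happens to the source string. The sketch below illustrates that pattern with the standard library only. WideHolder and StructForSomeFunc are hypothetical, the widening is ASCII-only, and none of this is the actual UtfConverter code.

```cpp
#include <cassert>
#include <string>

// Hypothetical converter: owns a wide copy of the source string, so
// c_str() stays valid for the lifetime of the holder.
class WideHolder {
public:
    explicit WideHolder(const std::string& ascii) {
        // ASCII-only widening, enough for the example; a real converter
        // would perform a proper charset conversion here.
        for (unsigned char c : ascii) {
            buffer_.push_back(static_cast<wchar_t>(c));
        }
    }
    const wchar_t* c_str() const { return buffer_.c_str(); }
private:
    std::wstring buffer_;
};

// Hypothetical structure that needs a pointer to a wide string.
struct StructForSomeFunc {
    const wchar_t* wszString;
};

void holder_example() {
    std::string source = "This is sample!";
    WideHolder wide(source);          // the holder owns the converted copy
    StructForSomeFunc s;
    s.wszString = wide.c_str();
    source = "Modify the original string";  // does not touch the holder
    assert(std::wstring(s.wszString) == L"This is sample!");
}
```

The pointer stored in the structure must not outlive the holder; keeping both in the same scope, as here, is the simplest way to guarantee that.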
'\0').