Converting Unicode escapes to Chinese characters

2019-11-14 19:18:10
Source: reposted
Contributed by: a site user
+ (NSString *)replaceUnicode:(NSString *)unicodeStr {
    // Turn \uXXXX into \UXXXX, the escape form that old-style property list strings understand.
    NSString *tempStr1 = [unicodeStr stringByReplacingOccurrencesOfString:@"\\u" withString:@"\\U"];
    // Escape embedded double quotes, then wrap the text in quotes so it parses as one plist string.
    NSString *tempStr2 = [tempStr1 stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
    NSString *tempStr3 = [[@"\"" stringByAppendingString:tempStr2] stringByAppendingString:@"\""];
    NSData *tempData = [tempStr3 dataUsingEncoding:NSUTF8StringEncoding];
    // propertyListWithData:options:format:error: replaces the deprecated propertyListFromData:mutabilityOption:format:errorDescription:.
    NSString *returnStr = [NSPropertyListSerialization propertyListWithData:tempData options:NSPropertyListImmutable format:NULL error:NULL];
    return [returnStr stringByReplacingOccurrencesOfString:@"\\r\\n" withString:@"\n"];
}
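For comparison, the same conversion can be done by hand instead of going through the property list parser. The following is a minimal sketch in plain C (which also compiles inside an Objective-C file), not the method used above. The names utf8_encode, parse_hex4 and decode_u_escapes are hypothetical, and the sketch assumes the input contains only \uXXXX escapes with four hex digits; anything else is copied through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Write code point cp as UTF-8 into out; returns the number of bytes (1 to 4). */
static size_t utf8_encode(unsigned long cp, char *out) {
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Returns 1 and stores the value if s starts with four hex digits, else 0. */
static int parse_hex4(const char *s, unsigned long *value) {
    unsigned long v = 0;
    for (int i = 0; i < 4; i++) {
        unsigned char ch = (unsigned char)s[i];
        if (!isxdigit(ch)) return 0; /* also stops at the terminating NUL */
        v = v * 16 + (unsigned long)(isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10);
    }
    *value = v;
    return 1;
}

/* Decode \uXXXX escapes in src into a newly allocated UTF-8 string.
   The caller frees the result. Hypothetical helper, not a library API. */
char *decode_u_escapes(const char *src) {
    assert(src != NULL);
    /* A 6-byte escape yields at most 3 bytes, so the output never outgrows the input. */
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL) return NULL;
    size_t n = 0;
    while (*src) {
        unsigned long cp, lo;
        if (src[0] == '\\' && src[1] == 'u' && parse_hex4(src + 2, &cp)) {
            src += 6;
            /* Combine a UTF-16 surrogate pair into one supplementary code point. */
            if (cp >= 0xD800 && cp <= 0xDBFF &&
                src[0] == '\\' && src[1] == 'u' && parse_hex4(src + 2, &lo) &&
                lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                src += 6;
            }
            n += utf8_encode(cp, dst + n);
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return dst;
}
```

Calling decode_u_escapes on the text \u4e2d\u56fd gives the six UTF-8 bytes E4 B8 AD E5 9B BD, which is 中国. From Objective-C the result can be wrapped with [NSString stringWithUTF8String:] and then freed.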

 


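Converting between Chinese characters and UTF-8 (percent-encoded)

// Decode: strA is @"中国".
NSString *strA = [@"%E4%B8%AD%E5%9B%BD" stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
// Encode: every UTF-8 byte of the Chinese text becomes a %XX escape.
NSString *strB = [@"中國" stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
// Both methods are deprecated since iOS 9; the replacements are stringByRemovingPercentEncoding
// and stringByAddingPercentEncodingWithAllowedCharacters:.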
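What these two calls do at the byte level can be sketched in plain C. This is an illustration under assumptions, not how Foundation implements it: the names hex_value, percent_encode and percent_decode are hypothetical, and the encoder escapes every byte except the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~"), which is stricter than the NSString method.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Percent-encode every byte of a UTF-8 string except unreserved ASCII.
   The caller frees the result. Hypothetical helper. */
char *percent_encode(const char *utf8) {
    static const char hex[] = "0123456789ABCDEF";
    assert(utf8 != NULL);
    char *out = malloc(strlen(utf8) * 3 + 1); /* each byte grows to at most "%XX" */
    if (out == NULL) return NULL;
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)utf8; *p; p++) {
        if (*p < 0x80 && (isalnum(*p) || strchr("-._~", *p) != NULL)) {
            out[n++] = (char)*p;
        } else {
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0F];
        }
    }
    out[n] = '\0';
    return out;
}

/* Replace each %XX escape with the byte it names; other text is copied as-is.
   The caller frees the result. Hypothetical helper. */
char *percent_decode(const char *s) {
    assert(s != NULL);
    char *out = malloc(strlen(s) + 1); /* decoding never makes the text longer */
    if (out == NULL) return NULL;
    size_t n = 0;
    while (*s) {
        int hi, lo;
        if (s[0] == '%' &&
            (hi = hex_value((unsigned char)s[1])) >= 0 &&
            (lo = hex_value((unsigned char)s[2])) >= 0) {
            out[n++] = (char)(hi * 16 + lo);
            s += 3;
        } else {
            out[n++] = *s++;
        }
    }
    out[n] = '\0';
    return out;
}
```

Decoding %E4%B8%AD%E5%9B%BD yields the bytes E4 B8 AD E5 9B BD, and encoding those bytes again gives back the same escaped text.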

Converting an NSString to UTF-8 (percent-encoded)

NSString *strings = [NSString stringWithFormat:@"abc"];
NSLog(@"strings : %@", strings);

// Declaration from CFURL.h, for reference:
// CF_EXPORT CFStringRef CFURLCreateStringByAddingPercentEscapes(CFAllocatorRef allocator, CFStringRef originalString, CFStringRef charactersToLeaveUnescaped, CFStringRef legalURLCharactersToBeEscaped, CFStringEncoding encoding);

// The function follows the Create rule, so ownership is transferred to ARC to avoid a leak.
NSString *encodedValue = (__bridge_transfer NSString *)CFURLCreateStringByAddingPercentEscapes(NULL, (__bridge CFStringRef)strings, NULL, (__bridge CFStringRef)@"!*'();:@&=+$,/?%#[]", kCFStringEncodingUTF8);

Converting ISO 8859-1 to Unicode

+ (NSString *)changeISO88591StringToUnicodeString:(NSString *)iso88591String {
    NSMutableString *srcString = [NSMutableString stringWithString:iso88591String];
    // Undo the double escaping ("&amp;#x4E2D;" becomes "&#x4E2D;"), then drop the "&#x" prefixes.
    [srcString replaceOccurrencesOfString:@"&amp;" withString:@"&" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    [srcString replaceOccurrencesOfString:@"&#x" withString:@"" options:NSLiteralSearch range:NSMakeRange(0, [srcString length])];
    NSMutableString *desString = [NSMutableString string];
    // What remains is a list of hex values separated by ";"; the piece after the last ";" is skipped.
    NSArray *arr = [srcString componentsSeparatedByString:@";"];
    for (NSUInteger i = 0; i + 1 < [arr count]; i++) {
        NSString *v = [arr objectAtIndex:i];
        // changeHexStringToDecimal: is a StringUtil helper that is not shown in this article.
        unichar c = (unichar)[StringUtil changeHexStringToDecimal:v];
        // Append the UTF-16 code unit directly; the original C-string approach broke whenever one of the two bytes was 0x00.
        [desString appendString:[NSString stringWithCharacters:&c length:1]];
    }
    return desString;
}
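The method above only handles the hexadecimal form and relies on a helper that is not shown. As a rough sketch of the parsing step on its own, the plain C function below reads both &#xHHHH; and &#DDDDD; references and returns the code points. The names digit_value and parse_numeric_refs are hypothetical, and bytes outside a reference are assumed to be ASCII and are passed through as their own code points.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a decimal or hex digit, or -1 if c is neither. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse numeric character references in s into out (at most max entries).
   Returns the number of code points written. Hypothetical helper. */
size_t parse_numeric_refs(const char *s, unsigned long *out, size_t max) {
    assert(s != NULL && out != NULL);
    size_t n = 0;
    while (*s && n < max) {
        if (s[0] == '&' && s[1] == '#') {
            const char *p = s + 2;
            int base = 10;
            if (*p == 'x' || *p == 'X') {
                base = 16;
                p++;
            }
            unsigned long cp = 0;
            int ndigits = 0;
            for (;;) {
                int d = digit_value(*p);
                /* Stop on a non-digit, a digit invalid for this base, or an out-of-range value. */
                if (d < 0 || d >= base || cp > 0x10FFFF) break;
                cp = cp * (unsigned long)base + (unsigned long)d;
                ndigits++;
                p++;
            }
            if (ndigits > 0 && *p == ';' && cp <= 0x10FFFF) {
                out[n++] = cp;
                s = p + 1;
                continue;
            }
        }
        /* Not a well-formed reference: keep the byte as its own code point. */
        out[n++] = (unsigned char)*s++;
    }
    return n;
}
```

For the input &#x4E2D;&#22269; this produces the two code points 0x4E2D and 0x56FD. Turning them into an NSString or into UTF-8 bytes is a separate step.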


Q: Is there a standard method to package a Unicode character so it fits an 8-Bit ASCII stream?

A: There are three or four options for making Unicode fit into an 8-bit format.

a) Use UTF-8. This preserves ASCII, but not Latin-1, because the characters >127 are different from Latin-1. UTF-8 uses the bytes in the ASCII range only for ASCII characters. Therefore, it works well in any environment where ASCII characters have a significance as syntax characters, e.g. file name syntaxes, markup languages, etc., but where all other characters may use arbitrary bytes.
Example: “Latin Small Letter s with Acute” (015B) would be encoded as two bytes: C5 9B.

b) Use Java or C style escapes, of the form \uXXXXX or \xXXXXX. This format is not standard for text files, but well defined in the framework of the languages in question, primarily for source files.
Example: The Polish word “wyjście” with character “Latin Small Letter s with Acute” (015B) in the middle (ś is one character) would look like: “wyj\u015Bcie”.

c) Use the &#xXXXX; or &#DDDDD; numeric character escapes as in HTML or XML. Again, these are not standard for plain text files, but well defined within the framework of these markup languages.
Example: “wyjście” would look like “wyj&#x015B;cie”.

d) Use SCSU. This format compresses Unicode into 8-bit format, preserving most of ASCII, but using some of the control codes as commands for the decoder. However, while ASCII text will look like ASCII text after being encoded in SCSU, other characters may occasionally be encoded with the same byte values, making SCSU unsuitable for 8-bit channels that blindly interpret any of the bytes as ASCII characters.
Example: “<SC2> wyjÛcie” where <SC2> indicates the byte 0x12 and “Û” corresponds to byte 0xDB. [AF] & [KW]


As option (c) describes, this is a "non-standard" but widely used practice; you could even call it a knock-off encoding :-)

So the encoding process is:

string -> Unicode code points -> &#xXXXX; or &#DDDDD;

Decoding is simply the reverse.
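The encoding direction can be sketched in plain C as follows, assuming the string is held as UTF-8 bytes: each multi-byte sequence is decoded to its code point and written out in the &#xXXXX; form, while ASCII passes through unchanged. The names utf8_decode and encode_numeric_refs are hypothetical, and the sketch simply gives up (returns NULL) on malformed UTF-8 rather than trying to repair it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp.
   Returns the number of bytes consumed, or 0 if the sequence is malformed. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp) {
    size_t len;
    if (s[0] < 0x80) {
        *cp = s[0];
        len = 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        *cp = s[0] & 0x1F;
        len = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        *cp = s[0] & 0x0F;
        len = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        *cp = s[0] & 0x07;
        len = 4;
    } else {
        return 0;
    }
    for (size_t i = 1; i < len; i++) {
        /* Continuation bytes look like 10xxxxxx; this also rejects an early NUL. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    return len;
}

/* Rewrite every non-ASCII character of a UTF-8 string as &#xXXXX;.
   The caller frees the result; NULL means malformed input or no memory.
   Hypothetical helper, not a library API. */
char *encode_numeric_refs(const char *utf8) {
    assert(utf8 != NULL);
    /* A 2-byte sequence grows the most (to 8 bytes such as "&#x015B;"), so 4x is enough. */
    char *out = malloc(strlen(utf8) * 4 + 1);
    if (out == NULL) return NULL;
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;
    while (*p) {
        unsigned long cp;
        size_t len = utf8_decode(p, &cp);
        if (len == 0) {
            free(out);
            return NULL;
        }
        if (cp < 0x80) {
            out[n++] = (char)cp;
        } else {
            n += (size_t)sprintf(out + n, "&#x%04lX;", cp);
        }
        p += len;
    }
    out[n] = '\0';
    return out;
}
```

Applied to the UTF-8 bytes of “wyjście” this gives wyj&#x015B;cie, matching the example in option (c) of the FAQ.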

http://unicode.org/faq/utf_bom.html#General

