takwu - 2006-01-28 1:50 PM
Snappy! - 2006-01-27 4:41 PM
. . . a simple class to convert, read, and save text files between Big5 and Unicode.
I'm not exactly sure what you mean. So if you have a Unicode text file, it can convert the Chinese characters to Big5 code and save it as a normal (single-byte) text file? And vice versa? Hmm, interesting...
Yeah ... Big5 is a multi-byte, or double-byte, character set (MBCS/DBCS). So standard ANSI characters like A-Z are represented as one byte, while Chinese characters are represented as 2 bytes: a lead byte and a second byte (usually called the trail byte).
UNICODE represents *everything* in 2 bytes, so an A (ASCII 65) is [65][0], with a null trailing byte. A NUL is represented as [0][0] in UNICODE.
In any case, the class I wrote allows you to convert from Big5 to and from UNICODE.
This is needed because the display portion in wince is in UNICODE. In fact, the whole wince thingie is in UNICODE. So if you throw a Big5 string at the system, you get strange characters. The MBCS-to-UNICODE conversion APIs provided by eVC would have worked fine ... if only the code pages were available! The Big5 code page (950) is invalid by default, so calls to all those MBCS/UNICODE APIs with code page 950 will basically give you rubbish or an error.
I tried to find some means to import the code page support, i.e. to have the conversion table supported by wince, but gave up and decided it's faster to just write my own.
Anyhow, it's a very simple class doing a table lookup. There are ways to optimize the search, such as:
1. Adding a primary Big5 lead-byte index in front of the actual lookup table. The conversion table is already sorted by Big5 code, so this would speed up the Big5->UNICODE conversion.
2. Having a separate UNICODE->Big5 table, sorted by UNICODE value, and applying #1 to it, which would optimize the UNICODE->Big5 conversion.
I'm right now fiddling with the hpcmemo app so am not quite bothered with optimization as yet.
That will be for the beta phase where I clean up and optimize the code.
I've already "optimized" it a little by scanning the table in 4-byte jumps instead of the earlier 1-byte sequential lookup.
In case anyone has an easier built-in solution, post it here too. This class would still be an interesting exercise for those who want to implement their own code page conversion, or to enable it on platforms that don't have native support.