The libUnihan project provides a normalized SQLite Unihan database and a corresponding C library. All tables in this database are in fifth normal form.
The database and its corresponding library are useful in many areas, such as Chinese character (Hanzi) standard queries, variant character research, and input method development. A Hanzi can be searched by its Unicode code point (decimal or hexadecimal), pronunciation, radical-stroke index, major standard, and so on.
There are many similar projects that convert Unihan.txt to SQLite format; some even claim to be "normalized". To test whether a database is normalized, query the tag kSemanticVariant of character U+5275 (創). If it returns:
U+5205<kMatthews U+5231<kMeyerWempe,kHanYu U+6227<kMatthews
then it is not normalized, as it violates the "no multiple values in one cell" requirement of 1NF.
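To make the 1NF requirement concrete, the following minimal sketch splits such a multi-valued cell into one row per (variant, source) pair, which is roughly the shape a normalized table would store. The names `VariantRow`, `copy_span`, and `split_semantic_variant` are hypothetical and are not part of the libUnihan API; the sketch assumes entries are separated by spaces, with sources listed after `<` and separated by commas, as in the sample output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ROWS 16
#define MAX_TEXT 32

/* One row of a hypothetical 1NF table: a single variant with a single source. */
typedef struct {
    char variant[MAX_TEXT]; /* e.g. "U+5205" */
    char source[MAX_TEXT];  /* e.g. "kMatthews"; empty if no source is given */
} VariantRow;

/* Copy len characters into dst, truncating to fit, and NUL-terminate. */
static void copy_span(char *dst, const char *src, size_t len)
{
    if (len >= MAX_TEXT)
        len = MAX_TEXT - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split a raw kSemanticVariant cell into one row per (variant, source) pair.
 * Returns the number of rows written, at most max_rows. */
int split_semantic_variant(const char *cell, VariantRow *rows, int max_rows)
{
    int count = 0;
    const char *p = cell;

    assert(cell != NULL && rows != NULL);

    while (*p != '\0' && count < max_rows) {
        p += strspn(p, " ");                 /* skip spaces between entries */
        if (*p == '\0')
            break;

        size_t entry_len = strcspn(p, " ");  /* whole entry: variant[<src,src] */
        size_t var_len = strcspn(p, "< ");   /* code point part before '<' */
        const char *end = p + entry_len;
        const char *src = p + var_len;

        if (src == end) {
            /* Entry without any source: emit one row with an empty source. */
            copy_span(rows[count].variant, p, var_len);
            rows[count].source[0] = '\0';
            count++;
        } else {
            src++;                           /* skip '<' */
            while (src < end && count < max_rows) {
                size_t src_len = 0;
                while (src + src_len < end && src[src_len] != ',')
                    src_len++;
                copy_span(rows[count].variant, p, var_len);
                copy_span(rows[count].source, src, src_len);
                count++;
                src += src_len;
                if (src < end)
                    src++;                   /* skip ',' */
            }
        }
        p = end;
    }
    return count;
}
```

Calling `split_semantic_variant` on the sample cell yields four rows, with U+5231 appearing twice (once for kMeyerWempe and once for kHanYu), so each cell holds exactly one value.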
Unihan.h deals not only with built-in DBs, but with custom DBs as well.
Unihan.txt are identical as in Unihan.txt.
The project has two parts: the Unihan character database, and the C library that produces and operates on the database.
U+3400..U+4DB5: CJK Unified Ideographs Extension A
U+4E00..U+9FA5: CJK Unified Ideographs
U+9FA6..U+9FBB: CJK Unified Ideographs (4.1)
U+9FBC..U+9FC3: CJK Unified Ideographs (5.1)
U+F900..U+FA2D: CJK Compatibility Ideographs (a)
U+FA30..U+FA6A: CJK Compatibility Ideographs (b)
U+FA70..U+FAD9: CJK Compatibility Ideographs (4.1)
U+20000..U+2A6D6: CJK Unified Ideographs Extension B
U+2F800..U+2FA1D: CJK Compatibility Supplement
This release fixes the missing API documents and also corrects some functions in collection.[ch] and file_functions.[ch] in preparation for libUnihan 0.6.
This release provides further support for ZhuYin and PinYin, such as the ZhuYin pseudo field and new unihan_query options: -Z, -z, -P, -p.
unihan_query is now capable of showing not only the result fields, but also given fields with the -O flag. This makes result checking more convenient, especially for SQL-like queries.
A test suite has now been introduced into libUnihan. Many bugs have been found with it. :-)
Add kMandarin frequency rank support.
Initial public release