August 3rd, 2006
Microsoft has released a new combined version of its terminology translations. From the website:
To provide users with more up-to-date terminology, Microsoft has replaced the glossary content that was previously posted to the Microsoft ftp site with a more concise document that is easier to use. We have consolidated and moved the data from the ftp site to the Microsoft download center in an effort to significantly increase reliability and accessibility for users.
This new CSV file contains over 9,000 English terms plus the translations of the terms for up to 45 different languages. Microsoft provides the Microsoft terminology data to allow our customers, ISVs, and partners to have a more consistent user experience across the products they are using and developing.
These languages include Chinese (Hong Kong S.A.R.), Chinese (People’s Republic of China), and Chinese (Taiwan). Programmers looking to use terminology already familiar to their users will find this useful.
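For programmers, a lookup over such a file might be sketched as follows. This is only an illustration: the column names and the sample rows are assumptions made for the example, and the layout of the actual download may differ.

```python
import csv
import io

# Hypothetical sample in an assumed layout: one row per English term and
# one column per language. The real file's column names may differ.
SAMPLE = """English,Chinese (Taiwan),Chinese (People's Republic of China)
File,檔案,文件
Print,列印,打印
"""


def load_glossary(stream, source_column="English"):
    """Map each source term to a dict of {language: translation}."""
    glossary = {}
    for row in csv.DictReader(stream):
        term = row.pop(source_column)
        # Skip languages for which this term has no translation.
        glossary[term] = {lang: text for lang, text in row.items() if text}
    return glossary


glossary = load_glossary(io.StringIO(SAMPLE))
print(glossary["Print"]["Chinese (Taiwan)"])
```

To read a real file, open it with `open(path, newline="", encoding=...)` and pass the file object in place of the `io.StringIO` sample; the right encoding depends on how the file was saved.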
Posted in Software | Comments Off
March 23rd, 2006
From the announcement:
This is the official announcement for the Third International Chinese Language Processing Bakeoff, sponsored by the Special Interest Group for Chinese Language Processing (SIGHAN) of the Association for Computational Linguistics. The bakeoff will occur over the late spring of 2006 and the results will be presented at the 5th SIGHAN Workshop, to be held at ACL-COLING 2006 in Sydney, Australia, July 22-23, 2006.
The first bakeoff, held in 2003 and presented at the 2nd SIGHAN Workshop at ACL 2003 in Sapporo, has become the pre-eminent measure for Chinese word segmentation evaluation and has been cited in numerous papers. The second bakeoff, held in 2005 and presented at the 4th SIGHAN Workshop at IJCNLP-05 on Jeju Island, Korea, demonstrated further progress in this task. In a change from the first two evaluations, the third bakeoff will augment the classic Word Segmentation task with a new Named Entity Recognition task.
For more information visit the Bake-off website.
Posted in NLP | No Comments »
February 21st, 2006
Slate has a readable article covering the basics of typing Chinese on a QWERTY-style keyboard.
Posted in Software | No Comments »
January 16th, 2006
I’ve recently learned of a great tool for learning Chinese: SentBase. It has a Google-like interface where you can type in a few words of Chinese or English and find sentences that contain those words. In addition, each sentence is paired with its corresponding translation. This can be particularly useful for learning Chinese, since you can see how idiomatic language is translated. The search interface allows users to restrict searches to British or American English, or simplified character Chinese. Parallel sentence examples are also useful in developing machine translation systems.
Posted in NLP | No Comments »
January 6th, 2006
Gong, a real-time Internet-based voice communication tool, has recently released version 4. Notably, Gong includes special support for Chinese and Japanese, including pinyin. It can also be used to create Chinese audio lessons.
More on Gong from the website:
Gong is a tool that supports Internet-based text and audio communication. It allows groups of people such as students and teachers to participate in discussion groups using their computers. Participants can leave text and voice messages on voice boards. They can listen to and reply to other text and voice messages left by other people. A group of people can join a real-time text/voice chat which can be recorded on voice boards. In addition, there are some powerful features such as support for multiple languages, styled text editing, voice editing, voice speed up/slow down, selective word/phrase playback and support for multilingual interface.
Posted in Software, Internet | No Comments »
November 18th, 2005
Following the 2005 Chinese Word Segmentation Bake-off, the training, testing, and gold-standard data sets have been released. These data sets, available for research purposes, provide a rich resource for developing and testing new segmentation methods. The various corpora were supplied by CKIP, Academia Sinica, Taiwan; City University of Hong Kong, Hong Kong SAR; Beijing University, China; and Microsoft Research, China.
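As an illustration of how a gold-standard segmentation can be used to test a new method, here is a minimal sketch of word-level precision, recall, and F-measure. It is not the bakeoff's official scoring script; the function names and the sample sentence are made up for the example.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def score(gold_words, predicted_words):
    """Return (precision, recall, f1) for one segmented sentence."""
    gold = word_spans(gold_words)
    predicted = word_spans(predicted_words)
    # A predicted word is correct only if both of its boundaries match.
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: the system wrongly splits one two-character word.
gold = ["我", "喜欢", "北京"]
predicted = ["我", "喜", "欢", "北京"]
precision, recall, f1 = score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

Comparing character spans rather than the words themselves keeps a repeated word from being counted as correct when it appears at the wrong position.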
Posted in NLP | No Comments »
November 17th, 2005
A recent survey by Guo Liang of the Chinese Academy of Social Sciences sheds light on internet usage in China. Among the interesting findings: the typical internet user is “young, male, richer and more highly educated”, relatively few people buy products on the internet, and more users prefer instant messaging to e-mail.
Posted in Regional, Internet | No Comments »
October 31st, 2005
An article in Techworld details some of China's plans to help visitors during the upcoming 2008 Olympics. Among the aids will be a phone with a built-in phrase translator that can also read the Chinese phrases out loud.
Posted in Regional, NLP | No Comments »
October 7th, 2005
Following the approval of the Ideographic Variation Database for Unicode, a new draft is now available describing the operation of the database.
Posted in Character Sets | Comments Off
July 13th, 2005
Recently I’ve been looking around for resources that describe the components that make up Chinese characters. Here are links to the most useful:
http://mousai.as.wakwak.ne.jp/projects/chise/ids/index.html.ja.iso-2022-jp
http://rt.openfoundry.org/Foundry/Project/Wiki/60/index.html
http://www.sinica.edu.tw/~cdp/zip/hanzi/hzmanual.zip
http://glyph.iso10646hk.net/doc/normal_char.txt
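Character decompositions of this kind are often written as Unicode Ideographic Description Sequences (IDS), in which an operator character such as ⿰ (left-right) or ⿱ (top-bottom) is followed by its components. The sketch below assumes that notation and handles only the operators U+2FF0 through U+2FFB; each of the resources above has its own file format, so this is an illustration rather than a reader for any of them.

```python
# Ideographic Description Characters U+2FF0..U+2FFB and their operand counts.
TERNARY = {"\u2ff2", "\u2ff3"}  # ⿲ and ⿳ take three components


def arity(ch):
    """Return how many components an operator takes, or 0 for a component."""
    if "\u2ff0" <= ch <= "\u2ffb":
        return 3 if ch in TERNARY else 2
    return 0


def parse_ids(ids):
    """Parse an IDS string into a nested tuple (operator, child, ...)."""
    def parse(pos):
        if pos >= len(ids):
            raise ValueError("sequence ends before all components are given")
        ch = ids[pos]
        pos += 1
        count = arity(ch)
        if count == 0:
            return ch, pos
        children = []
        for _ in range(count):
            child, pos = parse(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete sequence")
    return tree


def components(tree):
    """Flatten a parsed tree into its leaf components, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in components(child)]


print(parse_ids("⿰女子"))                  # decomposition of 好
print(components(parse_ids("⿱艹⿰日月")))  # decomposition of 萌
```

Because the notation is prefix-ordered, a simple recursive descent is enough: each operator says how many sub-sequences follow it.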
Posted in NLP | No Comments »