
The Java 2 JDK now comes with a Chinese localized supplement.
Java's site also has a
Chinese glossary of Java terms.
Java works internally with Unicode, so when compiling source code
files that use a Chinese encoding such as Big5 or GB2312, you need to
specify the encoding to the compiler so that it can properly convert the source to Unicode.
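To see why the compiler needs to be told the encoding, compare the bytes that the same character occupies in each encoding. A minimal sketch (EncodingBytes and hex are hypothetical names) that prints the bytes of the character 中 (U+4E2D) in GB2312, Big5 and UTF-8:

```java
import java.io.UnsupportedEncodingException;

public class EncodingBytes {
    // Format bytes as two-digit lowercase hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            String h = Integer.toHexString(bytes[i] & 0xff);
            if (h.length() < 2) sb.append('0');
            sb.append(h);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String zhong = "\u4e2d"; // the character for "middle"
        System.out.println("GB2312: " + hex(zhong.getBytes("GB2312")));
        System.out.println("Big5:   " + hex(zhong.getBytes("Big5")));
        System.out.println("UTF-8:  " + hex(zhong.getBytes("UTF-8")));
    }
}
```

The output shows d6 d0 for GB2312, a4 a4 for Big5 and e4 b8 ad for UTF-8, so a GB2312 source file compiled as if it were Big5 (or as the platform default) decodes into the wrong characters.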
Java comes with classes called InputStreamReader and OutputStreamWriter that
translate from local encodings into Unicode and back. Fortunately, two of the
supported encodings are GB2312 (used in mainland China and Singapore) and Big5
(used in Hong Kong and Taiwan). Below is a sample program that converts a GB2312
file to UTF-8. It is derived from a sample program in Java in a Nutshell.
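The same decode-then-encode pipeline can also be tried without files. A minimal sketch, assuming the GB2312 data is already in memory as a byte array (GbToUtf8 and recode are hypothetical names); the four input bytes are 中文 ("Chinese") in GB2312:

```java
import java.io.*;

public class GbToUtf8 {
    // Decode bytes from one encoding to Unicode, then re-encode them in another,
    // using the same InputStreamReader/OutputStreamWriter pair as the file sample.
    static byte[] recode(byte[] input, String from, String to) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(input), from);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(bytes, to);
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1) {
            w.write(buffer, 0, len);
        }
        r.close();
        w.close(); // flushes the encoder before the result is read
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gb = { (byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4 };
        byte[] utf8 = recode(gb, "GB2312", "UTF-8");
        System.out.println(new String(utf8, "UTF-8").equals("\u4e2d\u6587"));
    }
}
```

Here recode returns the six UTF-8 bytes e4 b8 ad e6 96 87, and the program prints true.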
Java 2 allows the programmer to directly access the fonts on the
machine. The code sample below gets a list of all the fonts on the
system, and then checks each font to see if it can display a sample
Chinese string. Matching fonts are printed. Variations of this
code can be used to automatically find Chinese fonts and set the font
of the Swing components accordingly. The behavior of canDisplayUpTo seems
to have changed since 1.2; this sample works for Java 1.4.
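As a sketch of the "set the font of the Swing components" variation just mentioned: pick the first font that can display a sample string and install it in the Swing defaults before any component is created. ChineseFontSetup, findFont and useFont are hypothetical names, and only a few of Swing's per-component font keys are shown; this is an illustration under those assumptions, not a complete solution.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import javax.swing.UIManager;
import javax.swing.plaf.FontUIResource;

public class ChineseFontSetup {
    // Return the first font that can display every character of the sample,
    // or null if none can. canDisplayUpTo returns -1 when the whole string fits.
    static Font findFont(Font[] fonts, String sample) {
        for (int i = 0; i < fonts.length; i++) {
            if (fonts[i].canDisplayUpTo(sample) == -1) {
                return fonts[i];
            }
        }
        return null;
    }

    // Make the font the default for a few Swing component types.
    // Must run before the components are created to take effect.
    static void useFont(Font font, int size) {
        FontUIResource f = new FontUIResource(font.deriveFont(Font.PLAIN, (float) size));
        String[] keys = { "Label.font", "TextArea.font", "TextField.font", "Button.font" };
        for (int i = 0; i < keys.length; i++) {
            UIManager.put(keys[i], f);
        }
    }

    public static void main(String[] args) {
        Font[] all = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        Font chinese = findFont(all, "\u4e2d\u6587");
        if (chinese == null) {
            System.out.println("No Chinese font found");
        } else {
            useFont(chinese, 12);
            System.out.println("Using " + chinese.getFontName());
        }
    }
}
```

getAllFonts returns fonts with a size of 1, which is why useFont derives a new size rather than using the font as-is.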
Before the introduction of Swing, Java's set of peerless (lightweight)
components built on top of AWT, Java could not display Chinese except on
Chinese operating systems. With Swing, you can display Chinese in any component,
provided that you have fonts that support Chinese on your system.
Previously, to find a font that could display Chinese, you needed to modify a file called
font.properties to list the available Chinese fonts. This is
not a simple process for the average user, and I think the font-finding
code is easier to use. But in case you do need to modify font.properties, here is an
excerpt from my font.properties file, where Bitstream Cyberbit
is the Unicode font. A list of Unicode
fonts supporting Chinese can be found
here.
Java 1.2 includes Unicode fonts. Unfortunately, these fonts do not
support Chinese, Japanese, or Korean yet. For more general information on
fonts and Java, visit this
programmer's page.
Swing components can display any Unicode character that you have the
font for. Here is a sample program that reads a GB file and displays
it in a JTextArea.
If you want to use locale-specific resource files for Chinese-speaking
areas (e.g. zh, zh_CN, zh_TW), you can't simply write them in
GB, Big5, or some other common Chinese encoding. You can create the
resource file(s) using GB, Big5, etc., but you must then convert the
file to use the \uXXXX Unicode escape notation, because property resource
files are read as ISO-8859-1. This is easily done
with the native2ascii tool included with the JDK.
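What native2ascii does to each character can be sketched in a few lines. This is a simplified, hypothetical re-implementation (EscapeDemo and escape are not part of the JDK) that assumes the text has already been decoded into a Java String; the real tool also handles the -encoding option and file I/O. The sketch then loads the escaped text with java.util.PropertyResourceBundle to show that the escapes turn back into Chinese characters. Since Java 9, PropertyResourceBundle reads UTF-8 by default, so this conversion matters mainly on older JDKs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.PropertyResourceBundle;

public class EscapeDemo {
    // Replace every non-ASCII character with a \\uXXXX escape, roughly what
    // native2ascii does after decoding the input file.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                String hex = Integer.toHexString(c);
                sb.append("\\u");
                for (int k = hex.length(); k < 4; k++) sb.append('0');
                sb.append(hex);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String line = "greeting=\u4f60\u597d"; // "greeting=" followed by ni hao
        String escaped = escape(line);
        System.out.println(escaped); // pure ASCII text
        // PropertyResourceBundle decodes the escapes back into Unicode.
        PropertyResourceBundle bundle = new PropertyResourceBundle(
            new ByteArrayInputStream(escaped.getBytes("ISO-8859-1")));
        System.out.println(bundle.getString("greeting").equals("\u4f60\u597d"));
    }
}
```

With the real tool, a GB2312 resource file would be converted with native2ascii -encoding GB2312 followed by the input and output file names.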
Java 1.2 comes with a set of classes for interacting with the
operating system's built-in input methods. Also, as of version 1.3
Java supports input methods that are independent of the OS. For
more information on this, visit Sun's
manual on using input methods.
Using the tutorial on the JavaSoft website as a guide, I've created
six Chinese input methods that any program running on
JDK 1.3 can use. After downloading the jar file with the input method,
copy it into the lib/ext directory of your JDK 1.3 or JRE 1.3. Then
you will be able to use them in any Java application. You will need
to set up your font.properties file to include a Chinese font for the
characters to appear properly in the selection box.
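Input method jars installed as extensions register themselves through a service provider file, META-INF/services/java.awt.im.spi.InputMethodDescriptor, which names the class implementing java.awt.im.spi.InputMethodDescriptor. If an input method does not show up, one quick check is to list the descriptors visible to the class loader. A minimal sketch, assuming Java 6 or later for java.util.ServiceLoader (it will not compile on the JDK 1.3 discussed here); ListInputMethods and countInputMethods are hypothetical names, and this lookup may differ from how the AWT itself locates input methods. Note also that the lib/ext extension directory was removed in Java 9.

```java
import java.awt.AWTException;
import java.awt.im.spi.InputMethodDescriptor;
import java.util.Arrays;
import java.util.Locale;
import java.util.ServiceLoader;

public class ListInputMethods {
    // Print every input method descriptor registered through
    // META-INF/services/java.awt.im.spi.InputMethodDescriptor; return the count.
    static int countInputMethods() {
        int count = 0;
        Locale ui = Locale.getDefault();
        for (InputMethodDescriptor d : ServiceLoader.load(InputMethodDescriptor.class)) {
            count++;
            String locales;
            try {
                locales = Arrays.toString(d.getAvailableLocales());
            } catch (AWTException e) {
                locales = "(locales unavailable: " + e.getMessage() + ")";
            }
            System.out.println(d.getInputMethodDisplayName(ui, ui) + " " + locales);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInputMethods() + " input method(s) found");
    }
}
```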
Another possible way to input Chinese on Microsoft Windows is to
use Microsoft's own Chinese input methods. However, depending on the
computer, the characters may appear as question marks in the text
field. This is a bug in current Java implementations, but the next
release of Java, 1.3.1, should fix it. The pure Java input methods
below do not have this problem. Another solution for users of
Windows 2000 is to switch your default locale to Traditional or
Simplified Chinese.
To activate these input methods in Windows, click on the control
box in the upper left-hand corner. One of the options will be to
select input methods. You can then choose which of the installed
input methods you want to use. Start typing pinyin. A box will
appear beneath the current position with ten matching characters. To
select one, either type its number, press the space bar for the
top character, or start typing the next pinyin sequence and the top
character will automatically be selected. If the desired character is
not in the list, use the period "." to move forward and the comma ","
to move back.
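The key handling described above can be modelled as a small state machine. The class below is a hypothetical sketch (CandidateWindow and its methods are not taken from the input method source in the jar files); it assumes the digit keys 1 to 9 pick the first nine candidates and 0 picks the tenth, since the exact numbering is not stated here, and the candidate list for "zhong" in main is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the candidate box (not the jar's source).
public class CandidateWindow {
    static final int PAGE_SIZE = 10; // ten candidates are shown at a time
    private final List<String> candidates;
    private int page = 0;

    public CandidateWindow(List<String> candidates) {
        this.candidates = candidates;
    }

    // Candidates on the current page.
    public List<String> visible() {
        int start = page * PAGE_SIZE;
        int end = Math.min(start + PAGE_SIZE, candidates.size());
        return candidates.subList(start, end);
    }

    // Handle one key press. Returns the selected character, or null if the
    // key only moved between pages or selected nothing.
    public String key(char c) {
        List<String> shown = visible();
        String top = shown.isEmpty() ? null : shown.get(0);
        if (c == '.') {                       // next page
            if ((page + 1) * PAGE_SIZE < candidates.size()) page++;
            return null;
        }
        if (c == ',') {                       // previous page
            if (page > 0) page--;
            return null;
        }
        if (c == ' ') {                       // space takes the top candidate
            return top;
        }
        if (c >= '0' && c <= '9') {
            // Assumption: 1-9 pick the first nine candidates, 0 picks the tenth.
            int index = (c == '0') ? 9 : c - '1';
            return index < shown.size() ? shown.get(index) : null;
        }
        if (Character.isLetter(c)) {
            // Typing the next pinyin syllable commits the top candidate;
            // the caller would then start a new lookup with this letter.
            return top;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> zhong = new ArrayList<String>();
        zhong.add("\u4e2d"); // 中
        zhong.add("\u949f"); // 钟
        zhong.add("\u79cd"); // 种
        CandidateWindow w = new CandidateWindow(zhong);
        System.out.println(w.key('2')); // picks the second candidate
    }
}
```

Keeping the paging state inside the window object means the '.' and ',' keys never have to re-run the pinyin lookup; they only change which slice of the list is shown.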
These input methods are a work in progress and I plan to improve
them and add new ones. Along those lines, I have included the source
code for the input methods in each jar file and am putting them out as
free, open source programs. Please improve and use them in your own
programs and send back the improvements to incorporate in future versions.
Compiling Java Source Files Containing Chinese
javac -encoding big5 sourcefile.java
or
javac -encoding gb2312 sourcefile.java
Loading GB or Big5 files
import java.io.*;

public class inputtest {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: java inputtest infile outfile");
            System.exit(1);
        }
        try {
            convert(args[0], args[1], "GB2312", "UTF8"); // or "BIG5"
        }
        catch (IOException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }

    public static void convert(String infile, String outfile, String from, String to)
        throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams; fall back to standard input/output if no file is named
        InputStream in;
        if (infile != null) in = new FileInputStream(infile);
        else in = System.in;
        OutputStream out;
        if (outfile != null) out = new FileOutputStream(outfile);
        else out = System.out;
        // Use the platform default encoding if no encoding is specified.
        if (from == null) from = System.getProperty("file.encoding");
        if (to == null) to = System.getProperty("file.encoding");
        // Set up character streams
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
        // Copy characters from input to output. The InputStreamReader
        // converts from the input encoding to Unicode, and the OutputStreamWriter
        // converts from Unicode to the output encoding. Characters that cannot be
        // represented in the output encoding are output as '?'
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1)
            w.write(buffer, 0, len);
        r.close();
        w.flush();
        w.close();
    }
}
Displaying Chinese
Finding Chinese Fonts
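import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.Vector;

public class fontsample {
    public static void main(String[] args) {
        // Determine which fonts support Chinese here ...
        Vector chinesefonts = new Vector();
        Font[] allfonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        String chinesesample = "\u4e00"; // the character for "one"
        for (int j = 0; j < allfonts.length; j++) {
            // canDisplayUpTo returns -1 if the font can display the whole string
            if (allfonts[j].canDisplayUpTo(chinesesample) == -1) {
                chinesefonts.add(allfonts[j].getFontName());
            }
        }
        // Print the matching fonts
        for (int j = 0; j < chinesefonts.size(); j++)
            System.out.println(chinesefonts.get(j));
    }
}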
font.properties
dialog.0=Arial,ANSI_CHARSET
dialog.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
dialog.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
dialog.3=Bitstream Cyberbit
dialoginput.0=Courier New,ANSI_CHARSET
dialoginput.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
dialoginput.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
dialoginput.3=Bitstream Cyberbit
serif.0=Times New Roman,ANSI_CHARSET
serif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
serif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
serif.3=Bitstream Cyberbit
sansserif.0=Arial,ANSI_CHARSET
sansserif.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
sansserif.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
sansserif.3=Bitstream Cyberbit
monospaced.0=Courier New,ANSI_CHARSET
monospaced.1=WingDings,SYMBOL_CHARSET,NEED_CONVERTED
monospaced.2=Symbol,SYMBOL_CHARSET,NEED_CONVERTED
monospaced.3=Bitstream Cyberbit
Chinese and Swing
import java.io.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

public class swingsample extends JFrame {
    private JTextArea mTextArea;

    public swingsample(String filename) {
        super("GB File Viewer");
        createUI();
        try {
            loadfile(filename, "GB2312"); // or "BIG5"
        }
        catch (IOException loadexc) {
            // Report the problem instead of silently showing an empty window
            mTextArea.setText("Could not load " + filename + ": " + loadexc.getMessage());
        }
        setVisible(true);
    }

    // Read the file in the given encoding and append it to the text area.
    // InputStreamReader converts the GB (or Big5) bytes to Unicode.
    public void loadfile(String filename, String enc)
        throws IOException, UnsupportedEncodingException
    {
        String buffer;
        InputStream in = new FileInputStream(filename);
        // Set up character stream
        BufferedReader r = new BufferedReader(new InputStreamReader(in, enc));
        while ((buffer = r.readLine()) != null) {
            // Swing text components use "\n" as the line separator internally
            mTextArea.append(buffer + "\n");
        }
        r.close();
    }

    protected void createUI() {
        setSize(500, 500);
        Container content = getContentPane();
        content.setLayout(new BorderLayout());
        mTextArea = new JTextArea();
        // If Chinese characters appear as boxes, set a font that supports them:
        //mTextArea.setFont(new Font("Bitstream Cyberbit", Font.PLAIN, 12));
        JScrollPane scrollPane = new JScrollPane(mTextArea,
            JScrollPane.VERTICAL_SCROLLBAR_ALWAYS,
            JScrollPane.HORIZONTAL_SCROLLBAR_ALWAYS);
        content.add(scrollPane, BorderLayout.CENTER);
        // Exit the application when the window is closed.
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                System.exit(0);
            }
        });
    }

    public static void main(final String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java swingsample file");
            System.exit(1);
        }
        // Build the GUI on the event dispatch thread
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                new swingsample(args[0]);
            }
        });
    }
} // swingsample
Chinese Resource Files
Inputting Chinese
Pinyin w/o Tones - Simplified and Traditional Characters
Pinyin with Tones - Simplified and Traditional Characters
Pinyin w/o Tones - Simplified Characters
Pinyin with Tones - Simplified Characters
Pinyin w/o Tones - Traditional Characters
Pinyin with Tones - Traditional Characters
Chinese Java Links