Editing the Dictionaries

From Wenlin Guide

Chapter 9 of the Wenlin User’s Guide

This chapter describes how to edit the dictionary entries of Chinese characters and words, and English words. To understand this chapter, you should already have read (at least) the following chapters: Chapter 1 (Basic Operations), Chapter 5 (Looking Up Vocabulary), Chapter 6 (Lists), and Chapter 8 (Editing Documents).

You can easily edit Wenlin’s dictionaries: you can add new vocabulary, and add your own notes, examples, and explanations. You can delete information – even the entries themselves – and customize the dictionaries to suit your needs.

Editing Dictionary Entries in General

Please Note: It is only possible to edit the dictionaries when they have been installed on a writable hard disk – there is no way to change the files on a CD-ROM, since it is read-only. Therefore, if Wenlin hasn’t already been installed on a hard disk and you are running Wenlin from the CD-ROM, you need to perform an installation, as described in the section on Installation earlier in this Guide.

Wenlin’s cross-referencing capabilities are based on notations in the dictionary entries. For example, a Chinese character’s entry indicates the correspondence between simple and full forms, the character’s pronunciation(s), and the components that form the character. Wenlin maintains indexes that keep track of all this information. In turn, these indexes enable you to see lists such as Characters by Pinyin and Characters Containing Components.

To expand the scope of Wenlin’s lists (and preserve their integrity), you’ll need to follow the required order and structure of the special notations in the dictionary entry.

On the other hand, if you simply want to add your own definitions and notes, you needn’t concern yourself with such issues. With few restrictions, you can place anything in, or delete anything from, a dictionary entry. Just leave the structure and special notations “as-is” and do your editing on the non-critical areas.

Specific instructions for editing each kind of dictionary entry are given later in this chapter. The following points apply when editing any dictionary entry:

  • Changes in Appearance: When you choose Enable Editing, the entry is displayed differently: triangle buttons vanish, and only “raw” information appears, including, possibly, some notations and abbreviations that are normally invisible. When you save the entry, the buttons reappear, and the abbreviations and notations are expanded. Button commands are not stored in the entries, but are automatically inserted when Wenlin assembles and displays each entry.
  • Spaces: When a notation calls for enclosing elements within parentheses or brackets, do not include spaces. For example, do this: [hǎo] and not this: [  huài   ]. Regarding the presence or absence of spaces in general, try to imitate notations in existing dictionary entries.
  • Parts of Speech: These should be abbreviated according to the documentation available by choosing Abbreviations from the Help menu. Notations for parts of speech do not affect indexes.

Saving a Dictionary Entry

Changes you make to a dictionary entry are not permanent until you save them. To save an entry, choose Save from the File menu. If you are modifying an existing entry, Wenlin asks you to confirm that you want to replace it. After you choose Save, the dictionary entry’s appearance changes: the triangle buttons reappear and the abbreviations and notations are expanded. Editing is no longer enabled; if you want to make further changes, simply choose Enable Editing again.

If you try to close the window of a dictionary entry that you have edited but not saved, Wenlin asks whether you want to save your changes, discard them, or cancel the process of closing (leave the window open). Normally you’ll want to save; but if you make a mistake while editing a dictionary entry, you can close the window and discard the mistaken changes.

Wenlin will not save an entry that has serious errors in the arrangement of its special notations; instead it will issue one or more brief messages stating the problem. Fix the problem and choose Save again.

Deleting a Dictionary Entry

All dictionary entries, except Zìdiǎn (character dictionary) entries, can be deleted. To delete an entry:

• Open the entry, and make sure it is the active window
• Choose Enable Editing from the File menu
• Choose Select All from the Edit menu
• Choose Clear from the Edit menu
The window becomes empty
• Choose Save from the File menu
A dialog box asks if you want to delete the entry
• Choose OK to delete the entry

Although you can’t completely delete a Zìdiǎn entry, you can delete everything in it except for the character itself.

Editing the Zìdiǎn (Chinese Character Dictionary)

For a Zìdiǎn entry, the only absolute requirement is that the character be the first thing on the entry’s top line. Since this is the only requirement, you are free to add or delete virtually anything.

Most Zìdiǎn entries, however, contain a lot of useful information, some of it organized in a particular structure. For example, by precisely specifying pronunciations, simple/full forms, and components, you can allow Wenlin to maintain indexes that keep track of a vast network of associations. These indexes, in turn, pave the way for all sorts of features. Therefore, the information that you add or delete can affect the contents of various lists, and the options you have for adding new Cídiǎn entries.

If you just want to do simple editing – if you only want to add notes and explanations – then avoid the top line, at least any notations in parentheses or in brackets, such as (S马) and (F馬) and [mǎ]; and avoid any line that starts with #, such as the components line. Advanced editing, on the other hand, involves the areas just mentioned. Certain notations, in order to be effective, must be on the top line of a Zìdiǎn entry. We define the top line as either all text up to the first newline character, or the first 1023 bytes of text, whichever comes first. If you save an entry with a longer top line, the excess is ignored when it comes to indexing. How much is 1023 bytes? It’s enough to fill between 2 and 4 displayed lines, depending on the text type and the current text size as given in the Size menu. In practice, we never approach this bound. We edit the Zìdiǎn so that, at a medium text size, most entries fit on a single line. By following this practice, you’ll keep the lists that are based on these top lines looking neat and clean.
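The top-line rule (everything up to the first newline, or the first 1023 bytes, whichever comes first) can be made concrete with a short Python sketch. It is not Wenlin’s own code: the function name top_line is hypothetical, and the sketch assumes the entry text is encoded as UTF-8 when bytes are counted, which may not match how Wenlin stores entries internally.

```python
TOP_LINE_LIMIT = 1023  # bytes, per the definition of the top line above


def top_line(entry_text: str, encoding: str = "utf-8") -> str:
    """Return the part of a Zidian entry that counts as its top line.

    Assumption: bytes are counted in UTF-8; Wenlin's own storage may differ.
    """
    data = entry_text.encode(encoding)
    newline = data.find(b"\n")  # -1 if the entry has no newline at all
    # Cut at the newline if it comes before the byte limit, else at the limit.
    end = newline if 0 <= newline < TOP_LINE_LIMIT else TOP_LINE_LIMIT
    # A cut at the byte limit may split a multi-byte character; drop the fragment.
    return data[:end].decode(encoding, errors="ignore")
```

Because the cut is measured in bytes, a top line made only of Chinese characters (three bytes each in UTF-8) would reach the limit after 341 characters, far more than the single display line we aim for.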

Creating a New Zidian Entry

There are thousands of rare characters that Wenlin can display, and many don’t have dictionary entries yet. To add a new character to the dictionary, you must first locate its shape displayed somewhere in Wenlin. The list of characters by Unicode (described in Chapter 6) shows all the characters that Wenlin is capable of displaying, and shows a blank square in place of any (extremely rare) character that Wenlin can’t display. The lists of characters by radical (described in Chapter 6) are a more convenient way of finding rare characters, and include all the Hanzi in Unicode 6.X, provided that the Hanzi filter option (described in Chapter 2) is set to All.

Wenlin can be used to design new character shapes, though the process is naturally somewhat technical; see Appendix G about CDL for more information. Nearly always, however, it will turn out that the character’s shape is already in Wenlin; it was just hard to find in the absence of a complete dictionary entry. Only when you know the character’s Unicode number, and you see that Wenlin displays a blank square for that Unicode number, can you be certain that the shape doesn’t already exist in Wenlin.

Once you locate the character, e.g., in a document or in the list of characters by Unicode,

• Try to look up the character by clicking on it
A “dummy” dictionary entry opens, including the character, a “Not found” message, and triangle buttons
• Choose Enable Editing from the File menu
The word Editing appears in the title, and everything but the character disappears

You can now type in any information you have about the character, preferably (but not necessarily) conforming to the structure described below (Section 9.4.3).

Modifying an Existing Zidian Entry

To modify a Chinese character entry:

• Open up the entry
• Choose Enable Editing from the File menu
The word Editing appears in the window’s title, and the entry’s appearance changes: the buttons vanish as well as the single-syllable words and the frequency rank (if any). Various references and notations are replaced by abbreviations.

You can now edit the entry as you would edit a document, maintaining the structure and notations described in the following section.

Structure of Zidian Entries

The character, its pronunciation(s), and its corresponding simple or full form(s) must be specified on the top line of the Zìdiǎn entry. Since space on the top line is limited, keep in mind that you can put detailed descriptions of compound words and single-syllable words in their own separate Cídiǎn entries.

Pronunciation

On the top line of the Zìdiǎn, indicate the character’s pronunciation(s) using pinyin enclosed in square brackets. Each syllable has its own pair of brackets; e.g., [hǎo] [hào]. There is no way for Wenlin to verify that the pinyin matches the character – you are editing the dictionary, after all! Wenlin does, however, verify that the pinyin is a valid Mandarin syllable. (The file “Bopomofo.wenlin” in the “Text” folder is a list of pinyin syllables and their bopomofo equivalents.) The pronunciations in the top line definition are automatically indexed when you save the dictionary entry, and consequently affect both phonetic conversion and lists by pinyin.
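As a rough illustration of how bracketed pronunciations can be pulled out of a top line and checked, here is a Python sketch. The syllable table is a hypothetical four-syllable stand-in for a complete list of Mandarin syllables (such as the one in “Bopomofo.wenlin”), and the names pronunciations, strip_tone and invalid_pronunciations are invented for this example; Wenlin’s actual validation may work differently. The pattern deliberately ignores brackets that contain spaces, matching the rule about spaces given earlier.

```python
import re
import unicodedata

# One syllable in square brackets with no spaces inside, e.g. [hǎo].
PINYIN_BRACKET = re.compile(r"\[([^\[\]\s]+)\]")

# Hypothetical stand-in for a full table of toneless Mandarin syllables.
VALID_SYLLABLES = {"hao", "ma", "li", "gan"}

# Combining macron, acute, caron and grave: the four tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone(syllable: str) -> str:
    """Remove tone marks but keep the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def pronunciations(top_line: str) -> list[str]:
    """All bracketed pinyin syllables on a top line, in order."""
    return PINYIN_BRACKET.findall(top_line)


def invalid_pronunciations(top_line: str) -> list[str]:
    """Bracketed syllables that are not in the syllable table."""
    return [p for p in pronunciations(top_line) if strip_tone(p) not in VALID_SYLLABLES]
```

For example, pronunciations("好 [hǎo] [hào]") returns ["hǎo", "hào"], while a badly spaced [  huài   ] is not recognized as a pronunciation at all.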

Frequency ranks

The character frequency ranks in Wenlin are already pre-assigned and not editable. For a detailed explanation of frequency ranks, see Appendix A.

Specifying Simple/Full Form Correspondences

When a character has different simple and full forms, each form has its own Zìdiǎn entry, and the two forms are cross-referenced to each other by a notation on the top line of each entry.

Follow the examples in Chapter 5 for specifying the alternate forms. Place the appropriate notations, like (S马) and (F馬), in their proper places. Make sure that you edit both the simple and full form entries so they are cross-referenced to each other.

Full and simple form entries can each have their own explanation sections. However, to avoid repeating the same explanation in both entries, it is also possible for one explanation to be no more than a pointer to the other. For example, for 妈(F媽) ‘mother’, the entry for 妈 may have this explanation:

From 女 nǚ ‘woman’ and 马(馬) mǎ phonetic.

while the entry for 媽 has this explanation:

(Explanation from the entry for the simple form 妈:) From 女 nǚ ‘woman’ and 马(馬) mǎ phonetic.

When editing is enabled, that last explanation is abbreviated as

#s

(a tic-tac-toe character and a lowercase “s”), on a line by itself.

Notations like (S马) and (F馬), and their exact placement in the top line definition, affect the automatic mappings between simple and full forms in operations such as Make Transformed Copy in the Edit menu. They also affect phonetic conversion and lists by pinyin. For example, when you list characters pronounced mǎ, either 马 or 馬 is listed (but never both), depending on whether Simple Form Characters is turned on or off in the Options menu. That is because of the (S马) and (F馬) notations.

For a more subtle example: when you list characters pronounced lǐ, both 里 and 裡 are listed if Simple Form Characters is off, but only 里 is listed if Simple Form Characters is on. This is correct behavior, since both characters can occur among full form characters, but only one can occur among simple form characters. Again, the behavior is a consequence of the simple/full form notations, in this case those in the entries for 里 and 裡.

For a very complex example, you can study the dictionary entries for 干 gān and its associated forms.

References

Following the explanation of a character’s shape(s), references can be indicated, on a line that begins with the notation #r.

For example, the entry for 好 has this #r reference line:

#rG.303.08,419.01 K.AD1089,GSR1044a D.2.1028.1 L.45 M.2062

In this list, K. and L. (for example) are abbreviations for the names of Bernhard Karlgren and Cecilia Lindqvist; the references AD1089 and 45 refer to section or page numbers. (We’ll explain in a moment how to see a list of these abbreviations and the books they refer to.)

The abbreviations all belong on the same line as #r, separated by spaces. After you save the entry, the #r notation is replaced by a triangle button and the word references, and many abbreviations are spelled out in verbose human-readable form. For example, the above #r line displays as follows, when the entry is saved and/or editing is not enabled:

▷references: Guǎngyùn:303.08,419.01; Karlgren:AD1089,GSR1044a; Hànyǔ Dà Zìdiǎn:2.1028.1; Lindqvist:45; Mathews:2062

By clicking on the triangle button, you can open a document whose file name is “references.wenlin”, in Wenlin’s “Text” folder, which lists the books and abbreviations. If you add new references to other books, you can edit that file as well.
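The expansion from a #r line to the displayed references line can be sketched in Python. The abbreviation table below holds only the five abbreviations from the 好 example; the complete list is in “references.wenlin”. Leaving an unknown abbreviation unexpanded is an assumption of this sketch, and expand_references is a hypothetical name, not part of Wenlin.

```python
# Only the abbreviations from the 好 example; "references.wenlin" has the full list.
REFERENCE_NAMES = {
    "G": "Guǎngyùn",
    "K": "Karlgren",
    "D": "Hànyǔ Dà Zìdiǎn",
    "L": "Lindqvist",
    "M": "Mathews",
}


def expand_references(line: str) -> str:
    """Turn a '#r' line into the verbose form shown when editing is not enabled."""
    if not line.startswith("#r"):
        raise ValueError("not a references line")
    parts = []
    for item in line[2:].split():  # references are separated by spaces
        abbrev, _, locators = item.partition(".")  # "K.AD1089,GSR1044a" -> "K", "AD1089,GSR1044a"
        name = REFERENCE_NAMES.get(abbrev, abbrev)  # unknown abbreviations stay as they are
        parts.append(f"{name}:{locators}")
    return "▷references: " + "; ".join(parts)
```

Given the #r line for 好 shown above, expand_references returns exactly the ▷references line that Wenlin displays for it.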

The Components Line

To cause a character to be indexed by certain components, list the components all on one line, preceded by the symbols #c (a tic-tac-toe sign and a small c).

For example, for the character 好, indicate the components 女 and 子, as follows:

#c女子

You can include up to nineteen components. (In practice, that’s more than enough.) After you save the entry, the #c symbols vanish, and Wenlin displays the line as follows:

▷components: 女子

Indexes are automatically updated so that lists of characters containing components will reflect your changes.
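To show how components lines can feed a list of characters containing components, here is a Python sketch that builds an inverted index from #c lines and intersects it. It assumes each component is a single Unicode character and that spaces on the line are ignored; the function names are hypothetical, and Wenlin’s real indexes are kept in its own index files, not built this way on the fly.

```python
from collections import defaultdict

MAX_COMPONENTS = 19  # the limit stated above


def parse_components(line: str) -> list[str]:
    """Split a '#c' line into components (assumed: one character each)."""
    if not line.startswith("#c"):
        raise ValueError("not a components line")
    components = [ch for ch in line[2:] if not ch.isspace()]
    if len(components) > MAX_COMPONENTS:
        raise ValueError(f"{len(components)} components; at most {MAX_COMPONENTS} allowed")
    return components


def build_component_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Map each component to the set of characters whose '#c' line lists it."""
    index: dict[str, set[str]] = defaultdict(set)
    for character, components_line in entries.items():
        for component in parse_components(components_line):
            index[component].add(character)
    return index


def characters_containing(index: dict[str, set[str]], *components: str) -> set[str]:
    """Characters whose components line lists every one of the given components."""
    if not components:
        return set()
    return set.intersection(*(index.get(c, set()) for c in components))
```

For example, with entries = {"好": "#c女子", "章": "#c音十立早"}, asking for 立 returns {"章"}: listing 立 and 早 on the components line is what lets someone who guesses 立 find 章.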

What to include on the components line is sometimes a matter of choice. The main idea is to help anyone who might try to look up the character by choosing Characters Containing Components from the List menu. You can follow this procedure:

• Look at the whole character
• Try to divide it in two

Most characters can be divided in two from left to right, and many others from top to bottom. If there is more than one way of dividing a character in two – if it isn’t obvious which way is historically relevant – then try to do it in all the ways that one might imagine.

For example, there are two obvious ways to divide 章 (zhāng ‘badge’) in two:

1: 音 (yīn ‘sound’) over 十 (shí ‘ten’)
2: 立 (lì ‘stand’) over 早 (zǎo ‘early’)

音 over 十 is the historically accurate analysis. On the other hand, if one didn’t know better, one might imagine it to be composed of 立 and 早. Including 音 and 十 and 立 and 早 on the components line provides a way for anyone to locate 章 if they recognize any of its apparent major components.

Although 章 could be divided in three, with 日 (rì ‘sun’) in the middle, we try to divide it in two, so we don’t regard 日 as a major or essential component. Nevertheless, you could optionally include 日 on the components line as well.

3: 立 over 日 over 十

This approach is more flexible than the traditional classification by radicals, where you are forced to guess what the radical is (and waste a lot of time if you guess wrong). In Wenlin, any reasonable guess based on the above procedure can work. The goal is to help people find a character, not to hide it.

The components line allows for an efficient dictionary look-up method, and some useful lists; but it’s not a theoretical statement. Remember, the historical analysis of a character belongs in the explanation section, not on the components line.

One might reasonably ask, “If you want to help people look up characters, why stop with a division by two; why not keep dividing until you get all the components?” Consider the horizontal line, which is the character 一 ‘one’; and consider 口 kǒu ‘mouth’. Horizontal lines, and boxes, appear almost everywhere in Chinese characters. If 一 and 口 were included on the components line every time they appeared in a character, it would make the list of characters containing 一 and 口 very long indeed. This might be self-defeating, and nothing helpful would be likely to come of it (for many common purposes).

Nevertheless, Wenlin's CDL font technology does provide users with access to extremely detailed stroke- and component-level indexing. For example, radical/stroke indexes are arranged by means of CDL component (dictionary radical) and stroke-type information (the five basic stroke types). And there are even more advanced indexing options available that make use of Wenlin's CDL font technology.


Editing the Cidian (Dictionary of Chinese Words)

As explained in Chapter 5, there is a single Cídiǎn dictionary entry for both the simple and full form versions of a word. Both forms always appear together, but which appears first, and which appears in brackets, depends on the setting of the Simple Form Characters option at the time of look-up.

When you save a Cídiǎn entry, Wenlin takes an active role in ensuring the accuracy and completeness of the entry. For example, since the characters in the word already have unique Zìdiǎn entries, Wenlin checks that the pinyin for the word matches the pinyin in the Zìdiǎn entries of the individual characters. Furthermore, if you don’t supply an alternate full or simple form version of the word, Wenlin automatically includes it in the Cídiǎn entry, basing the correspondences on those already specified in the Zìdiǎn entries.
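The idea of deriving a missing full form from the Zìdiǎn correspondences can be sketched in Python. The three-entry mapping is a hypothetical excerpt of what (F…) notations would declare, and the sketch assumes each simple form maps to exactly one full form; real correspondences can be one-to-many (as the entries for 干 gān show), which this sketch does not handle. The output imitates the 气化[氣-] display form that appears in the Yīng-Hàn section of this chapter, where a hyphen stands for a character that does not change. The function names are invented for this example.

```python
# Hypothetical excerpt of simple -> full correspondences from (F...) notations.
SIMPLE_TO_FULL = {"气": "氣", "马": "馬", "妈": "媽"}


def full_form(word: str) -> str:
    """Replace each simple form character by its full form, if it has one."""
    return "".join(SIMPLE_TO_FULL.get(ch, ch) for ch in word)


def with_full_form_bracket(word: str) -> str:
    """Write a word as simple[full], using '-' for characters that don't change."""
    full = full_form(word)
    if full == word:
        return word  # no distinct full form, so no brackets
    marks = "".join(f if f != s else "-" for s, f in zip(word, full))
    return f"{word}[{marks}]"
```

With this mapping, with_full_form_bracket("气化") gives 气化[氣-], while 例子, which has no distinct full form, is returned unchanged.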

When editing a Cídiǎn entry, you may want to have the Zìdiǎn entries of the characters in the word open on the screen for easy reference.

Creating a New Cidian Entry

Before you can create a Cídiǎn entry, each of the individual characters in the word must already have its own Zìdiǎn entry. Assuming that this is so, to create a new Cídiǎn entry:

• Try to look up the word (select it on the screen or use the Look Up Word command)
A new window opens whose title is the word and whose contents include the message Zero Entries, a search button, and a button that says create new entry written...
• Click on the ▷create new entry written... button
A new window appears with the title Editing new ci entry, containing a partial entry with question marks in place of the part of speech and definition.


You can now edit the window as you would edit any document, according to the structure described below (Structure of Cidian Entries).

Modifying an Existing Cidian Entry

To modify a Cídiǎn entry:

• Look up the word (select it on the screen or use the Look Up Word command)
A new window opens with the word’s dictionary entry
• Choose Enable Editing from the File menu
The word Editing appears in the title, and the search button vanishes

Structure of Cidian Entries

The structure of Cídiǎn entries has become considerably more complex since Wenlin version 2.0, due to the merger of Wenlin with the ABC Dictionary. The additional complexity is unavoidable given the much higher quality and sophistication of the ABC Dictionary, compared with Wenlin’s original smaller, simpler, and inferior Cídiǎn.

If you wish simply to append some notes to an existing entry, the simplest method is to add a line at the very bottom with nothing but the letter h on it. Then, on subsequent lines, you can add whatever text you would like. After saving the entry, the h will be replaced by a horizontal rule, i.e., a line, and your additions will appear below the line.

If, instead, you want to add a new entry or modify the contents of an existing entry, then you will need to pay some attention to what is known as band notation.

Here is an example of a simple entry as it appears in band notation, when editing is enabled:


Several bands are shown, among which the following five are the most important: pinyin, characters, serial-number, part-of-speech, and definition.

pinyin    ¹lìzi*
characters    例子
grade    B
serial-number    1008006443
reference    28962
part-of-speech    n.
definition    example; case; instance
example@    gěi yị̄ ge ∼
hanzi    给一个∼
translation    give an example
frequency    22.7 [XHPC:41]
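To make the band-notation layout concrete, here is a Python sketch that splits an entry into (mode number, band name, value) triples and stops at an h line, treating everything after it as free-form notes. The pattern for band names (letters, hyphens and @) is inferred from the example above rather than taken from Appendix E, and Band and parse_entry are hypothetical names.

```python
import re
from typing import NamedTuple

# Optional mode digits, then a band name (letters, hyphens, '@'), then an optional value.
BAND_LINE = re.compile(r"^(\d*)([A-Za-z][A-Za-z@-]*)(?:\s+(.*))?$")


class Band(NamedTuple):
    mode: str   # mode number digits such as "1" or "12"; "" when absent
    name: str   # band name such as "definition" or "example@"
    value: str  # "" for a band with no value, such as "automatic"


def parse_entry(text: str) -> tuple[list[Band], str]:
    """Return the entry's bands, plus any free-form notes after an 'h' line."""
    bands: list[Band] = []
    lines = text.splitlines()
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line == "h":
            # Everything below the h line is free text, not band notation.
            return bands, "\n".join(lines[i + 1:])
        if not line:
            continue
        match = BAND_LINE.match(line)
        if match is None:
            raise ValueError(f"not a band line: {raw!r}")
        mode, name, value = match.groups()
        bands.append(Band(mode, name, value or ""))
    return bands, ""
```

Any number of spaces may separate a band name from its value, which is why the pattern uses \s+ there.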

When editing is not enabled, the band names disappear, and the entry looks like this:

[Figure: the 例子 entry as displayed when editing is not enabled]

The pinyin band must precede the characters band, but when editing is not enabled, the characters normally precede the pinyin (except when you choose Words by Pinyin from the List menu).

Imagine that you knew this word also had another meaning, “cherry tree” (this is imaginary). You could add another definition band like this:

2definition    cherry tree (not really!)

(You could put any number of spaces between “definition” and “cherry”; it would make no difference.) You would also want to insert the digit “1” in front of the first definition, like this:

1definition    example; case; instance

After you saved the entry, it would be displayed like this:

例子 ¹lìzi* {B} n. ① example; case; instance ② cherry tree (not really!)

(notice the circled numbers).

Imagine, further, that 例子 were also a verb meaning “swim like a frog”. (Warning: it doesn’t really mean that!) Then you could add another part-of-speech band and another definition band. You would also need to insert another level of band numbers, as follows:

1part-of-speech   n.
11definition   example; case; instance
12definition   cherry tree (not really!)
2part-of-speech   v.
2definition   swim like a frog (not really!)

All the initial digits (called mode numbers) refer to parts of speech. The 11 in front of the first definition means “first part of speech, first definition.” (It does not mean “eleven.”) After you saved the entry, it would be displayed like this:

例子 ¹lìzi* {B} n. ① example; case; instance ② cherry tree (not really!) ◆ v. swim like a frog (not really!)

Numbers for parts of speech are not displayed except while editing, but a diamond symbol (◆) is automatically displayed preceding all but the first part of speech.
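The display rules just described (circled numbers when a part of speech has more than one definition, and a ◆ before every part of speech after the first) can be sketched in Python. It renders only the part-of-speech and definition bands, not the headword, pinyin or grade. Requiring each definition’s mode number to start with the mode number of its part of speech is an assumption based on the 1/11/12/2 example, and the names are hypothetical.

```python
from typing import NamedTuple


class Band(NamedTuple):
    mode: str   # mode number digits, "" if none
    name: str
    value: str


def circled(n: int) -> str:
    """Circled digits ①, ②, ... (Unicode has them for 1-20 starting at U+2460)."""
    if not 1 <= n <= 20:
        raise ValueError("no circled digit for this number")
    return chr(0x2460 + n - 1)


def render_senses(bands: list[Band]) -> str:
    groups: list[tuple[str, list[str]]] = []  # (part of speech, its definitions)
    pos_mode = ""
    for band in bands:
        if band.name == "part-of-speech":
            pos_mode = band.mode
            groups.append((band.value, []))
        elif band.name == "definition":
            if not groups:
                raise ValueError("definition before any part-of-speech band")
            if not band.mode.startswith(pos_mode):
                raise ValueError(f"mode {band.mode} does not belong to part of speech {pos_mode}")
            groups[-1][1].append(band.value)
    rendered = []
    for pos, definitions in groups:
        if len(definitions) == 1:
            senses = definitions[0]  # a single definition gets no circled number
        else:
            senses = " ".join(f"{circled(i)} {d}" for i, d in enumerate(definitions, 1))
        rendered.append(f"{pos} {senses}")
    return " ◆ ".join(rendered)  # diamond before all but the first part of speech
```

Given the five bands of the “frog” example, render_senses produces the part of the displayed entry that follows {B}.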

The description of band notation above is incomplete, but may be enough for making simple changes and additions. For all of the gnarly details of band notation, including a list of all the band names, see Appendix E.

Editing the Ying-Han (English-Chinese) Dictionary

Creating a New Ying-Han Entry

To create a new entry in the Yīng-Hàn dictionary:

• Try to look up the word (select it on the screen or use the Look Up Word... command)
A new window opens whose title is the word and whose contents include the message Zero Entries, a search button, and a button that says create new entry written...
• Click on the ▷create new entry written... button
A new window appears with the title Editing new English entry, containing a partial entry with question marks in place of the part of speech and definition.

Modifying an Existing Yīng-Hàn Entry

To modify an existing Yīng-Hàn dictionary entry:

• Open the entry (click on the word or use the Look Up Word command)
A new window opens with the word’s dictionary entry
• Choose Enable Editing from the File menu
The word Editing appears in the title, and the search button vanishes

Structure of Yīng-Hàn Entries

The Yīng-Hàn dictionary, like the Cídiǎn, uses band notation, which was introduced above and is documented in detail in Appendix E. Here we focus on those aspects of band notation that are specific to Yīng-Hàn entries.

Wenlin places these restrictions on the top line of a Yīng-Hàn entry:

  • The band name is headword or hw (or you can omit the band name). The rest of the top line is the English word itself.
  • The word itself can’t be more than sixty (60) letters long (antidisestablishmentarianistically is okay).
  •  After the top line, the entry must be in band notation. As with Cídiǎn entries, if you wish simply to append some notes to an existing Yīng-Hàn entry, the simplest method is to add a line at the very bottom with nothing but the letter h on it. Then, on subsequent lines, you can add whatever text you would like without needing to use band notation. After saving the entry, the h will be replaced by a horizontal rule, i.e., a line, and your additions will appear below the line.
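The top-line restrictions can be expressed as a small Python check. It is a sketch, not Wenlin’s validator: parse_headword is a hypothetical name, and it counts characters (including any spaces in a multi-word headword) as a stand-in for “letters”.

```python
MAX_HEADWORD_LENGTH = 60  # per the restriction above


def parse_headword(top_line: str) -> str:
    """Return the English word from a Ying-Han entry's top line.

    Accepts 'headword word', 'hw word', or just 'word' (band name omitted).
    """
    parts = top_line.split(maxsplit=1)
    if not parts:
        raise ValueError("empty top line")
    if parts[0] in ("headword", "hw") and len(parts) == 2:
        word = parts[1].strip()
    else:
        word = top_line.strip()  # band name omitted: the whole line is the word
    if len(word) > MAX_HEADWORD_LENGTH:
        raise ValueError(f"headword is {len(word)} characters; the limit is {MAX_HEADWORD_LENGTH}")
    return word
```

At 34 letters, antidisestablishmentarianistically passes comfortably.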

English-Chinese dictionary entries were quite scarce in Wenlin 1.0, and hardly followed any format. The merger of Wenlin with the ABC Dictionary resulted in Wenlin’s English-Chinese dictionary looking even worse by comparison. To compensate for this deficiency, we turned the Chinese-English dictionary inside-out to produce an improved English-Chinese dictionary.

For example, the Chinese-English (汉英 Hàn-Yīng) entry

气化[氣-] ²qìhuà n. gasification

was turned inside-out automatically to produce the English-Chinese (英汉 Yīng-Hàn) entry

gasification ∾n. qìhuà 气化

The results of this procedure are much better than nothing, but they aren’t always perfect either. For example, the part of speech may describe the Chinese word better than the English word. To indicate this fact, the symbol ∾ appears before the part of speech.

When editing is enabled, the entry looks something like this:

headword    gasification
serial-number    2004497652
automatic
grade    *
part-of-speech    n.
definition    qìhuà [气化]

Notice that the parts of speech and definitions derived from the ABC Chinese-English Dictionary are preceded by the automatic band, indicating that the entry was generated by the inside-out process. Since Wenlin 4.0 incorporates the new ABC English-Chinese Dictionary, with much higher-quality Yīng-Hàn entries than previous editions of Wenlin, only less common vocabulary (such as gasification) still uses the automatic band.

Notice also the structure of the definition band value:

definition    qìhuà [气化]

Each definition band value starts with the pīnyīn, followed by a space and the Simplified Hànzì in square brackets.

definition    Hànzì [汉字]

(Note that this square-bracketing convention is different from that used for marking Simple[Full] relations in character bands in Cídiǎn entries.)
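The pinyin-then-bracketed-Hànzì layout of a Yīng-Hàn definition value can be read and written with a pair of Python functions. This is a sketch under the assumption that the value ends with exactly one bracketed group of Hànzì; the function names are hypothetical.

```python
import re

# A definition value: pinyin, one space, then simplified Hanzi in square brackets.
DEFINITION_VALUE = re.compile(r"^(?P<pinyin>\S.*?) \[(?P<hanzi>[^\[\]\s]+)\]$")


def parse_definition_value(value: str) -> tuple[str, str]:
    """Split 'qìhuà [气化]' into ('qìhuà', '气化')."""
    match = DEFINITION_VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"expected 'pinyin [Hanzi]', got {value!r}")
    return match.group("pinyin"), match.group("hanzi")


def format_definition_value(pinyin: str, hanzi: str) -> str:
    """Build a definition value in the same layout."""
    return f"{pinyin} [{hanzi}]"
```

The two functions are inverses for well-formed values, so a value read from one entry can be written back unchanged.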

Dictionaries are Really Database and Index Files

You may not need to know how the dictionaries are stored as files on disk. However, if you modify the dictionaries, you might want to make backup copies of the files. Also, if you’re interested in technicalities, you might like to have a general idea of what and where Wenlin’s dictionaries really are.

The following three files (located in the “W4DB” folder) contain all the dictionary entries:

“zidian.wenlindb” (dictionary of Chinese characters)
“cidian.wenlindb” (dictionary of Chinese words and phrases)
“yinghan.wenlindb” (dictionary of English words)

(In versions of Wenlin before 4.1, the file extension was “.db” instead of “.wenlindb”.)

These are database files, not text files; you never open them directly. When you make changes to a single dictionary entry, such as the Cídiǎn entry for 你好 nǐ hǎo ‘hello’, Wenlin modifies the corresponding database record in the large file “cidian.wenlindb”. It’s more efficient to have one large file per dictionary than thousands of smaller files.

When you add or change a dictionary entry, Wenlin updates one or more of these index files:

“zidian.wenlintree”
“cidian.wenlintree”
“yinghan.wenlintree”

(In versions of Wenlin before 4.1, the file extension was “.tre” instead of “.wenlintree”.)

Every time Wenlin runs, it checks to see that the database and index files properly match. If you combine different versions of the database and index files, or edit the files outside of the Wenlin environment, you risk introducing a mismatch. In this event, Wenlin won’t run, an error message will appear on the screen, and you’ll need to re-install, or contact Wenlin Institute for help.
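Since Wenlin checks that the database and index files match, a backup is safest when each .wenlindb file is copied together with its .wenlintree file. Here is a minimal Python sketch of that idea. The location of the “W4DB” folder depends on your installation, so it is passed in as a parameter; the backup folder naming and the function name are hypothetical, and the sketch assumes Wenlin is not running while the files are copied and that the 4.1+ file extensions are in use.

```python
import shutil
from datetime import datetime
from pathlib import Path

DICTIONARIES = ("zidian", "cidian", "yinghan")
EXTENSIONS = (".wenlindb", ".wenlintree")  # names used from Wenlin 4.1 on


def backup_dictionaries(w4db: Path, backup_root: Path) -> Path:
    """Copy every dictionary database and its index file into one new folder.

    Assumes Wenlin is not running, so the files are not changing during the copy.
    """
    sources = [w4db / f"{name}{ext}" for name in DICTIONARIES for ext in EXTENSIONS]
    missing = [str(path) for path in sources if not path.is_file()]
    if missing:
        # Refuse to make a partial backup: a database without its index is not useful.
        raise FileNotFoundError("missing: " + ", ".join(missing))
    target = backup_root / datetime.now().strftime("wenlin-backup-%Y%m%d-%H%M%S")
    target.mkdir(parents=True)  # fails rather than overwrite an existing backup
    for path in sources:
        shutil.copy2(path, target / path.name)  # copy2 also preserves timestamps
    return target
```

To restore, copy all six files back together, again with Wenlin closed, rather than mixing database and index files from different backups.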

Note: some information about Chinese characters is stored, not in the dictionaries, but in other files. In particular, for some advanced features, see Appendix G about Character Description Language.

[Figure: 蜻蜓 qīngtíng ‘dragonfly’]

