Difference between revisions of "Character Description Language"

From Wenlin Guide

Revision as of 19:00, 18 January 2015

Appendix G of the Wenlin User’s Guide

This appendix documents features relating to Wenlin Institute’s Character Description Language (CDL), a powerful font and character description technology.


文林研究所 ‧ 字形描述语言 (字描语)

Wenlin CDL Feature Overview

◦ Wenlin’s CDL font technology is the powerhouse behind numerous commonly used features of Wenlin Software for Learning Chinese, including:

After selecting Song Hanzi (CDL) or Plain Hanzi (CDL) in the Wenlin Font Menu, all Chinese text you see in Wenlin is rendered using CDL font technology. Likewise, after you choose Monospace Pinyin in the Wenlin Font Menu, all Pinyin, English and much other text you see in Wenlin is also rendered using CDL.

◦ Wenlin’s CDL font technology also provides numerous advanced features of Wenlin Software for Learning Chinese, some of which are available when the Advanced CDL Options are enabled, including:

  • CDL glyph editing and export functions
  • CDL character variant and component analyses
  • Advanced CDL indexing features
  • Advanced Shuowen character variant analyses

The following image shows some CDL variants of the character 𤁉[U+24049] / 漢[U+6F22] / 汉[U+6C49] Hàn rendered in the CDL Stroking Box.

Han4s.png

CDL has always been a part of Wenlin, but the underlying language was invisible until Wenlin version 4.0. Now it is possible for end-users to view and manipulate the CDL description for any character-variant that can be viewed in the Stroking Box.


Wenlin Stroking Box: Advanced CDL Features

Basic features of Wenlin’s Stroking Box are described in Chapter 7. To explore advanced CDL features, users need only choose Advanced Options from the Options menu, and turn on the option labeled Enable advanced CDL (Character Description Language) features. Then, when you are viewing any character in the Stroking Box, there will be a checkbox labeled advanced, and when it is checked, additional buttons (at the lower right of the image below) will be available.

Wenlin-strokingbox adv 24049.png

Advanced Stroking Box buttons include:

  • CDL: to display the character's description in XML format.
  • Points: to show the control points for manipulating the arrangement of strokes and components.
  • EPS: to convert the character glyph into Encapsulated PostScript, an outline usable in graphics programs.
  • Strokes: to convert the description into one that uses only <stroke> elements, not <comp> elements.
  • SVG: to convert the character glyph into Scalable Vector Graphics, an outline usable in web browsers and other programs.
  • Scale: to ensure that the coordinates fit the entire grid, when editing.

In addition to the buttons listed above, if you are a Wenlin Developer, the Advanced Stroking Box may include any number of unpublished or experimental features documented only in Wenlin Source Code.

The CDL Button

  • After pushing the CDL button, the underlying CDL description appears in XML form in a new Editing CDL window (illustration below).
Wenlin-strokingbox adv 24049 cdl.png

In the above illustration, there is one top-level cdl element, with char and uni attributes associating this CDL description with a Unicode code point [U+24049]. This CDL description serves as the default representation of [U+24049](V=0): it is variant='0' and so has no explicit CDL variant attribute. Note that an explicit Unicode code point assignment in the top-level cdl element is optional: a CDL description can be associated with zero or more Unicode code point values.

Below the top-level cdl element in the above illustration, there are two indented CDL comp (component) elements. There are no CDL stroke elements at this level of the CDL description.

Each comp element has char, uni and points attributes. The char and uni attributes identify the specific variant form of the component for use in this context. The points attribute determines the position and scale of the component, here in the default 128x128 CDL grid-space (em-square).
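As a rough illustration of the structure described above, the following Python sketch parses a small CDL-like description using only the standard library. The sample is hypothetical: it describes 汉[U+6C49] as the components 氵 and 又, and both the coordinate values and the `x,y x,y` layout of the points attribute are assumptions made for illustration, not data copied from the CDL Database or the CDL Specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical component-level description; the coordinates are illustrative only.
SAMPLE_CDL = """\
<cdl char="汉" uni="6C49">
  <comp char="氵" uni="6C35" points="0,0 36,128" />
  <comp char="又" uni="53C8" points="40,8 128,128" />
</cdl>"""


def parse_points(text):
    """Split an assumed 'x,y x,y' points string into (x, y) integer pairs."""
    return [tuple(int(n) for n in pair.split(",")) for pair in text.split()]


def list_components(cdl_text):
    """Return (char, code point, upper-left, lower-right) for each comp element."""
    root = ET.fromstring(cdl_text)
    rows = []
    for comp in root.findall("comp"):
        upper_left, lower_right = parse_points(comp.get("points"))
        rows.append((comp.get("char"), int(comp.get("uni"), 16), upper_left, lower_right))
    return rows


for char, code_point, upper_left, lower_right in list_components(SAMPLE_CDL):
    print(f"{char} U+{code_point:04X} box {upper_left} to {lower_right}")
```

Each row pairs a component with the two corners of its box in the 128x128 grid, which is the information the Stroking Box exposes as control points.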

The Points Button

  • After pushing the Points button, the underlying CDL description also appears in a separate window (bottom of the illustration below) if it was not displayed already (but the Stroking Box remains in the foreground).

In the Stroking Box itself the control points for positioning the components appear at the upper-left and lower-right corner of each component.

Dragging any control point of a component will change the proportions of that component, and update the points attribute value accordingly, in the Editing CDL window.
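The effect of dragging a component's control point can be sketched in Python as a rewrite of the points attribute. This is not Wenlin's implementation: the function name `move_corner`, the `x,y x,y` attribute layout, and the clamping of coordinates to the grid are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

GRID = 128  # default CDL grid size mentioned in this appendix


def move_corner(comp, corner, x, y):
    """Move one corner of a comp element's box and rewrite its points attribute.

    corner is 0 for the upper-left point or 1 for the lower-right point.
    The 'x,y x,y' attribute layout is an assumption of this sketch.
    """
    pairs = [[int(n) for n in p.split(",")] for p in comp.get("points").split()]
    # Clamp the dragged point so that it stays inside the grid (an assumption).
    pairs[corner] = [max(0, min(GRID, x)), max(0, min(GRID, y))]
    comp.set("points", " ".join(f"{px},{py}" for px, py in pairs))
    return comp


# Hypothetical component with illustrative coordinates.
comp = ET.fromstring('<comp char="又" uni="53C8" points="40,8 128,128" />')
move_corner(comp, 0, 48, 16)  # drag the upper-left corner inward
print(comp.get("points"))     # prints 48,16 128,128
```

Only the dragged corner changes, so the component is stretched or squeezed within its box while the other corner stays fixed.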

Wenlin-strokingbox adv 24049 p.png

The Strokes Button

  • After pushing the Strokes button, the underlying CDL description appearing in the separate window has been converted to Stroke-Level CDL: it is now composed of stroke elements only, with components inserted as XML comments recursively at each depth (indentation level). This is an extremely powerful feature for advanced CDL editing: new CDL descriptions with custom components can be easily created by mingling and tweaking various elements of pre-existing CDL descriptions.

If the resulting CDL description is re-loaded into the Stroking Box (by pushing the ▷cdl button), then in the Stroking Box itself after pushing the Points button again, the control points for positioning the individual strokes appear.

Dragging any control point of a stroke will change the features of that stroke instance.

Wenlin-strokingbox adv 24049 p s.png

After pushing the Strokes button, if the resulting stroke-level CDL description (with comp elements interspersed as XML comments) is again re-loaded into the Stroking Box (by pushing the ▷cdl button), pushing the Strokes button again will strip the XML comments, leaving only stroke elements.

Wenlin-strokingbox adv 24049 c s.png

The image above shows stroke-level CDL, with one stroke element for each of the 17 strokes of [U+24049](V=0). This form of CDL is considerably more compact than the version with XML comments interspersed. It is also completely self-contained and portable: rendering it requires only the CDL Engine, not the CDL Database. (Various attributes of the stroke, cdl and comp elements appearing in the above illustrations are as yet undiscussed. These are introduced in the CDL Specification.)
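The conversion from component-level to stroke-level CDL can be pictured as a recursive expansion. The Python sketch below is one plausible way to do it, not the CDL Engine's actual algorithm. The miniature `DATABASE`, the character names "A" and "B", the stroke type labels and all coordinates are made up, and the sketch assumes that each component description fills its own 128x128 grid so that it can be mapped linearly into the box given by the parent's points attribute.

```python
import xml.etree.ElementTree as ET

GRID = 128

# Hypothetical miniature "database"; stroke types and coordinates are invented.
DATABASE = {
    "A": '<cdl char="A"><stroke type="h" points="0,64 128,64" /></cdl>',
    "B": '<cdl char="B"><comp char="A" points="0,0 64,128" />'
         '<stroke type="s" points="96,0 96,128" /></cdl>',
}


def parse_points(text):
    """Split an assumed 'x,y x,y' points string into (x, y) integer pairs."""
    return [tuple(int(n) for n in pair.split(",")) for pair in text.split()]


def flatten(char, box=((0, 0), (GRID, GRID))):
    """Recursively expand comp elements into a flat list of (type, points)."""
    (left, top), (right, bottom) = box
    scale_x, scale_y = (right - left) / GRID, (bottom - top) / GRID

    def place(point):
        # Map a point from this description's own grid into the enclosing box.
        return (left + point[0] * scale_x, top + point[1] * scale_y)

    strokes = []
    for element in ET.fromstring(DATABASE[char]):
        points = parse_points(element.get("points"))
        if element.tag == "stroke":
            strokes.append((element.get("type"), [place(p) for p in points]))
        elif element.tag == "comp":
            # The component's corners are given in this grid; map them outward,
            # then expand the component's own description inside that box.
            inner_box = (place(points[0]), place(points[1]))
            strokes.extend(flatten(element.get("char"), inner_box))
    return strokes


for stroke_type, points in flatten("B"):
    print(stroke_type, points)
```

Once flattened, the result no longer refers to any other description, which is why stroke-level CDL can be rendered without the database.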

Shift-clicking on the ▷cdl button in the above window converts the multi-line CDL description into the in-line version: all newlines are stripped, and a new window opens showing the CDL description rendered as a single character. In-line CDL is suitable for use in your documents when you do not want to (or cannot) store the CDL in the CDL database. An in-line description may be associated with zero or more Unicode code points: if there is no suitable Unicode code point, the description cannot be stored in the CDL database except in the Private Use Area. Such anonymous CDL descriptions can feed into the Unicode encoding process.
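Outside Wenlin, the same multi-line to in-line conversion could be approximated with a few lines of Python. The text above states only that newlines are stripped; dropping the leading indentation as well is an assumption of this sketch, and the sample description uses illustrative coordinates.

```python
def to_inline(cdl_text):
    """Join a multi-line description into one line, dropping newlines and indentation."""
    return "".join(line.strip() for line in cdl_text.splitlines())


# Hypothetical multi-line description with illustrative coordinates.
multi_line = """\
<cdl char="汉" uni="6C49">
  <comp char="氵" uni="6C35" points="0,0 36,128" />
  <comp char="又" uni="53C8" points="40,8 128,128" />
</cdl>"""

inline = to_inline(multi_line)
print(inline)
```

Whitespace inside a tag is left untouched, so attribute values such as the points pairs survive unchanged.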

The Scale Button

  • After editing a CDL Hanzi description, you should always push the Scale button in the Stroking Box, to ensure that the CDL description completely fills the em-square.

Because CDL descriptions are built-up recursively, it is important for proper scaling and positioning of components that each sub-component completely fill its em-square. The Scale button takes care of this for you.
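To make the idea of filling the em-square concrete, here is a minimal Python sketch that stretches a set of stroke points so that their bounding box spans the whole grid. It is an assumption-laden illustration rather than a description of what the Scale button does internally. In particular, scaling the two axes independently and rounding to integers are choices made for this example.

```python
def scale_to_grid(strokes, grid=128):
    """Rescale stroke points so that their bounding box spans the full grid.

    strokes is a list of point lists; each point is an (x, y) pair.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    left, top = min(xs), min(ys)
    # Fall back to 1 to avoid dividing by zero when all points share a coordinate.
    width = (max(xs) - left) or 1
    height = (max(ys) - top) or 1
    return [
        [(round((x - left) * grid / width), round((y - top) * grid / height))
         for x, y in stroke]
        for stroke in strokes
    ]


# A made-up cross shape that occupies only the middle of the grid.
cramped = [[(32, 32), (96, 32)], [(64, 32), (64, 96)]]
print(scale_to_grid(cramped))
```

After scaling, the outermost points touch the grid edges, so a parent description that places this shape as a component gets exactly the box it asked for.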

The EPS Button

  • Pushing the EPS button in the Advanced Stroking Box generates an Encapsulated PostScript version of the glyph, suitable for use in any application that renders EPS. Save the resulting text to a file, and open it in your EPS application.

The SVG Button

  • Pushing the SVG button in the Advanced Stroking Box generates a Scalable Vector Graphics version of the glyph, suitable for use in any application that renders SVG. Save the resulting text to a file, and open it in your SVG application.
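For readers unfamiliar with SVG, the Python sketch below shows roughly what such a file contains. It is much simpler than Wenlin's export: it draws each stroke as a plain centre line instead of a filled outline, and it assumes that the grid origin is at the upper left with y increasing downward, as in SVG. The function name and the sample strokes are hypothetical.

```python
def strokes_to_svg(strokes, grid=128):
    """Build a minimal SVG document that draws each stroke as a polyline path."""
    paths = []
    for stroke in strokes:
        # "M" moves to the first point; "L" draws a line to each later point.
        commands = " ".join(
            f"{'M' if i == 0 else 'L'}{x} {y}" for i, (x, y) in enumerate(stroke)
        )
        paths.append(
            f'<path d="{commands}" fill="none" stroke="black" stroke-width="4" />'
        )
    body = "\n  ".join(paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {grid} {grid}">\n'
        f"  {body}\n</svg>"
    )


# Made-up strokes: one horizontal line and one vertical line.
svg = strokes_to_svg([[(0, 0), (128, 0)], [(64, 0), (64, 128)]])
print(svg)
```

Saving the printed text to a file with an .svg extension should be enough for a web browser to display it.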


Wenlin CDL Screenshots and Videos

CDL Screenshots:

CDL Videos:


For Wenlin CDL Developers

Wenlin CDL Technology Overview

For advanced CDL users and Wenlin Developers, the following list summarizes some of the unique features of Wenlin's CDL font technology. These features make possible all of the CDL-related functions of the Wenlin Software for Learning Chinese application, and much more.

  • CDL is the engine (C source code) behind CJK Unicode megafonts, breaking the 64K glyph barrier! (A CDL font can contain an unlimited number of glyphs.)
  • CDL is an XML application, a standards-based font and encoding technology designed for precise and compact description, rendering, and indexing of all 漢/汉 Han (Chinese, Japanese, Korean, and Vietnamese = CJKV) characters, encoded and unencoded.
  • CDL is a font database containing (to date) XML/Unicode descriptions of nearly 100,000 characters, complete Unicode 7.1 CJK character support, and more.
  • CDL adds a third dimension to the code space, with a variant mechanism for associating an unlimited number of CDL descriptions with any Unicode codepoint.
  • Each CDL description can be associated with zero or more Unicode code points, making CDL the ideal tool for extending The Unicode Standard.
  • CDL means consistent stroke/component analyses, built-in indexing and variant mappings, and high-quality graphic images as outlines convertible to SVG, PostScript, MetaFont, and more.
  • CDL is a compressed binary with an incredibly small memory footprint (~1.5 MB!), suitable for use in limited-memory mobile devices that want full Unicode CJK support.
  • CDL technology has applications for machine learning, for handwriting recognition and input methods, for optical character recognition (OCR), and most importantly for human language-learning.
  • The basic elements of CDL are a two-dimensional coordinate space, and a set of basic stroke types. Using these simple elements, CDL provides a framework for describing characters and components, and for (recursive) reuse of character and component descriptions in the descriptions of other characters and components.
  • CDL has applications beyond CJK, for organizing information underlying the rendering of any complex script.
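The variant mechanism and the "zero or more code points" rule in the list above can be modelled with a small lookup table. The Python sketch below is a toy data structure for illustration only. The class name `VariantTable`, its methods, and the placeholder description strings are hypothetical and say nothing about how the CDL Database is actually organized.

```python
from collections import defaultdict


class VariantTable:
    """Toy lookup table keyed by (code point, variant number)."""

    def __init__(self):
        self._by_key = {}
        self._variants = defaultdict(list)
        self.anonymous = []  # descriptions associated with no code point

    def add(self, description, code_points=(), variant=0):
        """Register one description under zero or more code points."""
        if not code_points:
            self.anonymous.append(description)
        for code_point in code_points:
            self._by_key[(code_point, variant)] = description
            self._variants[code_point].append(variant)

    def get(self, code_point, variant=0):
        """Return the description for a code point; variant 0 is the default."""
        return self._by_key[(code_point, variant)]

    def variants_of(self, code_point):
        """List the variant numbers registered for a code point."""
        return sorted(self._variants.get(code_point, []))


table = VariantTable()
table.add("default form", [0x24049])               # variant 0
table.add("alternate form", [0x24049], variant=1)  # second description, same code point
table.add("shared form", [0xE000, 0xE001])         # one description, two Private Use code points
table.add("unencoded form")                        # no code point at all
print(table.variants_of(0x24049))
```

Keying on the pair of code point and variant number is what adds the "third dimension": the same code point can resolve to any number of descriptions.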


Core CDL Resources

  • A draft CDL DTD (Document Type Definition) defining the CDL tags (elements and attributes).
  • The Unicode Standard Version 6.1 – Core Specification: Appendix F: “CJK Strokes Documentation” (all CJK glyphs in this appendix were created by the CDL team using CDL, and all text derives from the CDL Spec. and from CJK Strokes work in WG2/IRG:N3063) [2012-01-31].


Contact the CDL Development Team

If you are interested in building CDL font technology into your application, want to build cutting-edge CJK fonts, or need to digitize difficult CJK texts, don't try to reinvent the wheel: contact the CDL Development Team. We can help build a solution to meet your programming needs.

All CDL descriptions provided with Wenlin Software for Learning Chinese are copyright © 2012 Wenlin Institute, Inc., All Rights Reserved. To use CDL in your applications and publications, please contact Wenlin Institute. Conventional fonts can be exported from CDL by various methods, and the CDL Engine and Database are available for licensing.


