There are currently two techniques to access an alternate graphic character set in MARC 21 records using the MARC-8 encoding.  One way is a MARC-specific technique for accessing a small number of characters; the other involves using standard escape sequences to access various registered character sets.  Below is an explanation of these two techniques.

Technique 1:  Using custom sets for Greek Symbols, Subscript, and Superscript characters

Three Greek Symbols (alpha, beta, and gamma), fourteen Subscript characters, and fourteen Superscript characters have been placed in three custom graphic sets, created exclusively for MARC, that are accessed by a locking escape sequence.  The technique for accessing these characters is outside the framework specified in ANSI X3.41 or ISO 2022.  These three custom sets are designated as sets in code point ranges 21(hex) through 7E(hex) by means of a two-character sequence consisting of the escape character and an ASCII graphic character.  The specific escape sequences for the three custom sets are:

ESCg (ASCII 1B(hex) 67(hex)) for the Greek symbol set

ESCb (ASCII 1B(hex) 62(hex)) for the Subscript set

ESCp (ASCII 1B(hex) 70(hex)) for the Superscript set

When one of these character sets is invoked using the escape sequence, the escape is locking, which means that all characters following the escape sequence are interpreted as being part of the newly designated character set until another escape sequence is encountered.  This follow-on escape sequence may redesignate ASCII or designate another custom character set.  To redesignate ASCII, the following two-character escape sequence is used:

ESCs (ASCII 1B(hex) 73(hex)) for ASCII default character set

The use of the three characters in this Greek Symbol set is discouraged as they present mapping difficulties.  (See Part 4, Special Mapping Issues.)
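
As an illustration of Technique 1, the following Python sketch brackets a run of custom-set code points with the locking escape sequence and the return to ASCII.  The names CUSTOM_SET_ESCAPES, ASCII_RETURN, and wrap_in_custom_set are hypothetical, and the usage note assumes that subscript 2 occupies code point 32(hex) in the Subscript set.

```python
ESC = b"\x1b"

# Two-byte locking escape sequences for the three MARC-specific custom sets,
# using the values listed above.
CUSTOM_SET_ESCAPES = {
    "greek": ESC + b"g",        # 1B 67
    "subscript": ESC + b"b",    # 1B 62
    "superscript": ESC + b"p",  # 1B 70
}
ASCII_RETURN = ESC + b"s"       # 1B 73, redesignates ASCII


def wrap_in_custom_set(set_name: str, code_points: bytes) -> bytes:
    """Bracket code points with a custom-set escape and a return to ASCII."""
    # The custom sets occupy code points 21(hex) through 7E(hex) only.
    if any(not 0x21 <= b <= 0x7E for b in code_points):
        raise ValueError("custom-set code points must lie in 21(hex)-7E(hex)")
    return CUSTOM_SET_ESCAPES[set_name] + code_points + ASCII_RETURN
```

Under that assumption, b"H" + wrap_in_custom_set("subscript", b"2") + b"O" produces the bytes 48 1B 62 32 1B 73 4F.  Because the escape is locking, omitting the final ESCs would leave the trailing "O" to be interpreted in the Subscript set.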

Technique 2:  Using standard alternate graphic character sets

All other alternate graphic character sets should be designated and invoked in accordance with ANSI X3.41, Code Extension Techniques for Use with 7-bit and 8-bit Character Sets or its international counterpart ISO 2022.  Consult these standards for a complete specification of the techniques.  The following discussion is simplified, but covers all cases pertaining to MARC-8 practice.

At the present time, additional sets are accessed through designation as either G0 (codes 21(hex) through 7E(hex)) or G1 (codes A1(hex) through FE(hex)).  Alternate graphic character sets are designated and invoked by means of a multiple-character escape sequence consisting of the ESCAPE character, an Intermediate character sequence, and a Final character, in the form ESC I F, where:

ESC is the ESCAPE character (ASCII 1B(hex))

I is the Intermediate character sequence, which may be one or more characters in length.  It indicates whether the set is designated as the G0 set or the G1 set and whether the set has one byte or multiple bytes per character; if the sequence contains more than one character, it also carries additional information.  The following values may be used for the Intermediate character sequence:

To designate as the G0 set:

    For a set with one byte per character:
        I = 28(hex) [ASCII graphic: ( ]
        or I = 2C(hex) [ASCII graphic: , ]

    For a set with multiple bytes per character:
        I = 24(hex) [ASCII graphic: $ ]
        or I = 24(hex) 2C(hex) [ASCII graphics: $ , ]

To designate as the G1 set:

    For a set with one byte per character:
        I = 29(hex) [ASCII graphic: ) ]
        or I = 2D(hex) [ASCII graphic: - ]

    For a set with multiple bytes per character:
        I = 24(hex) 29(hex) [ASCII graphics: $ ) ]
        or I = 24(hex) 2D(hex) [ASCII graphics: $ - ]

F is the Final character in the escape sequence, which identifies the graphic character set being designated.  The codes for Final characters are assigned by the registration authority of the International Organization for Standardization (ISO) for many sets.  These sets are assigned codes in the range 40(hex) through 7E(hex); other character sets intended for local use may be assigned a code outside this range.  The Final characters for alternate graphic character sets approved for use in MARC-8-encoded MARC 21 are the following:

33(hex) [ASCII graphic: 3] = Basic Arabic

34(hex) [ASCII graphic: 4] = Extended Arabic

42(hex) [ASCII graphic: B] = Basic Latin (ASCII)

21(hex) 45(hex) [ASCII graphics: !E] = Extended Latin (ANSEL) (the 21(hex) is technically a second character of the Intermediate sequence of this escape sequence)

31(hex) [ASCII graphic: 1] = Chinese, Japanese, Korean (EACC)

4E(hex) [ASCII graphic: N] = Basic Cyrillic

51(hex) [ASCII graphic: Q] = Extended Cyrillic

53(hex) [ASCII graphic: S] = Basic Greek

32(hex) [ASCII graphic: 2] = Basic Hebrew
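
The ESC I F form described above can be sketched in Python as a lookup of Intermediate and Final values.  This is a minimal illustration, not a complete implementation: the names INTERMEDIATES, FINALS, MULTIBYTE_SETS, and designate are hypothetical, only the first Intermediate alternative from each pair is used, and the sketch assumes that EACC is the only set in the list with multiple bytes per character.

```python
ESC = b"\x1b"

# First-listed Intermediate value for each (target, multibyte) combination.
INTERMEDIATES = {
    ("G0", False): b"(",   # 28(hex); 2C(hex) is the alternative
    ("G0", True): b"$",    # 24(hex); 24(hex) 2C(hex) is the alternative
    ("G1", False): b")",   # 29(hex); 2D(hex) is the alternative
    ("G1", True): b"$)",   # 24(hex) 29(hex); 24(hex) 2D(hex) is the alternative
}

# Final characters approved for MARC-8, as listed above.
FINALS = {
    "basic_arabic": b"3",
    "extended_arabic": b"4",
    "basic_latin": b"B",
    "extended_latin": b"!E",  # "!" is formally part of the Intermediate sequence
    "eacc": b"1",
    "basic_cyrillic": b"N",
    "extended_cyrillic": b"Q",
    "basic_greek": b"S",
    "basic_hebrew": b"2",
}

# Assumption: EACC is the only multiple-byte set among those listed.
MULTIBYTE_SETS = {"eacc"}


def designate(set_name: str, target: str = "G0") -> bytes:
    """Build the escape sequence that designates set_name as G0 or G1."""
    multibyte = set_name in MULTIBYTE_SETS
    return ESC + INTERMEDIATES[(target, multibyte)] + FINALS[set_name]
```

For instance, designate("basic_cyrillic") gives ESC ( N, designate("eacc") gives ESC $ 1, and designate("extended_latin", "G1") gives ESC ) ! E.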

Use and placement of escape sequences

Escape sequences to designate alternate graphic character sets may occur wherever the alternate characters are needed, e.g., within a word, at the beginning of a subfield, or in the middle of a subfield.  However, the escape sequence never replaces a space.

Escape sequences are locking.  The alternate graphic character set remains designated as the Gn set until another graphic character set is designated.  If the ASCII graphics have been displaced as the G0 set within a subfield, ASCII graphics must be designated as the G0 set before a subfield delimiter or field terminator.  Some alternate character sets include separately defined marks of punctuation that duplicate those defined in ASCII.  They may be used when the alternate graphics are used.  (See Part 4, Special Mapping Issues.)
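
The requirement that ASCII be restored as the G0 set before a subfield delimiter or field terminator can be checked mechanically.  The Python sketch below is one possible approach under stated assumptions: the function name g0_is_ascii_at_boundaries is hypothetical, the field is assumed to start with ASCII as G0, and the values 1F(hex) for the subfield delimiter and 1E(hex) for the field terminator come from the MARC 21 record structure rather than from this section.

```python
ESC = 0x1B
SUBFIELD_DELIMITER = 0x1F  # MARC 21 subfield delimiter (not defined in this section)
FIELD_TERMINATOR = 0x1E    # MARC 21 field terminator (not defined in this section)
ASCII_SET = b"B"           # Final character for Basic Latin (ASCII)


def g0_is_ascii_at_boundaries(field: bytes) -> bool:
    """Return True if ASCII is the G0 set at every delimiter and terminator."""
    g0 = ASCII_SET  # assumption: the field starts with ASCII designated as G0
    i = 0
    while i < len(field):
        byte = field[i]
        if byte == ESC:
            # Collect Intermediate bytes (20(hex)-2F(hex)), then the Final byte.
            j = i + 1
            while j < len(field) and 0x20 <= field[j] <= 0x2F:
                j += 1
            if j >= len(field):
                return False  # truncated escape sequence
            intermediates = field[i + 1:j]
            final = field[j:j + 1]
            if not intermediates:
                # Technique 1: ESC s restores ASCII; ESC g/b/p select a custom set.
                g0 = ASCII_SET if final == b"s" else b"custom-" + final
            elif b")" not in intermediates and b"-" not in intermediates:
                # Technique 2, G0 designation; keep "!" so ANSEL differs from others.
                g0 = (b"!" if intermediates.endswith(b"!") else b"") + final
            # G1 designations leave the G0 set unchanged.
            i = j + 1
            continue
        if byte in (SUBFIELD_DELIMITER, FIELD_TERMINATOR) and g0 != ASCII_SET:
            return False
        i += 1
    return True
```

A subfield that designates Basic Cyrillic with ESC ( N and then reaches the next delimiter without ESC ( B fails the check, while the same data followed by ESC ( B passes.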

Example:

E = ESCAPE control function code (1B(hex))

( = set is designated as the G0 set and has one byte per character

N = Basic Cyrillic character set

B = ASCII default character set

When the text of a field which has an indicator for nonfiling characters begins with an escape sequence, the bytes in the escape sequence are not included in the count of nonfiling characters.
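
The counting rule for nonfiling characters can be sketched as follows.  This Python example is illustrative only: the names escape_length and filing_offset are hypothetical, one byte per character is assumed, and skipping escape sequences that occur later within the nonfiling span (rather than only at the start of the text) is an extension of the rule stated above.

```python
ESC = 0x1B


def escape_length(data: bytes, start: int) -> int:
    """Length in bytes of the escape sequence that begins at data[start]."""
    j = start + 1
    # Intermediate bytes lie in 20(hex)-2F(hex); the next byte is the Final.
    while j < len(data) and 0x20 <= data[j] <= 0x2F:
        j += 1
    return j + 1 - start


def filing_offset(text: bytes, nonfiling_count: int) -> int:
    """Byte offset at which filing begins, not counting escape sequence bytes."""
    i = 0
    remaining = nonfiling_count
    while i < len(text) and remaining > 0:
        if text[i] == ESC:
            i += escape_length(text, i)  # skipped, not counted
        else:
            i += 1
            remaining -= 1
    return min(i, len(text))
```

With a nonfiling count of 4 and the text ESC ( N followed by the placeholder bytes "ABC DEFG", the three escape bytes are skipped and filing starts at byte offset 7, on "D".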

Example of a field containing bidirectional script data:

Order of data as it might be displayed

Order of data in MARC record

E = ESCAPE control function code (1B(hex))

( = set is designated as the G0 set and has one byte per character

2 = Hebrew character set

B = ASCII default character set
