The MARC-8 character repertoire is composed of the combined repertoires of several character sets, either standard or custom.  These are identified in the section Accessing Alternate Graphic Character Sets, and their encoding is specified in Part 5.

The character sets used to encode the MARC-8 repertoire were developed at a time when character set design and specification were conceived in terms of fitting sets into the 128 code points available in a 7-bit matrix.  Such a set is frequently referred to as a code page.  Employing the eighth bit available in most computer architectures allows two code pages to be used at the same time.  An 8-bit environment is assumed throughout the following discussion.

Graphic character sets encoding a larger character repertoire need values of greater length than 8 bits.  The standard sets specified for use in MARC-8-encoded MARC 21 records are all of the code page type except for the East Asian Character Code, which uses a fixed length of 24 bits per code point.
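The fixed 24-bit length can be illustrated with a minimal Python sketch that groups a byte string into three-byte units, one unit per code point.  The function name and the sample byte values are illustrative assumptions, not part of the specification, and no claim is made about which characters those values represent.

```python
def split_24bit_code_points(data: bytes) -> list:
    """Group a byte string into 24-bit values, three bytes per code point.

    Hypothetical helper for illustration only; it performs no lookup
    against any real character set table.
    """
    if len(data) % 3 != 0:
        raise ValueError("length is not a multiple of 3 bytes (24 bits)")
    # Each consecutive group of three bytes forms one fixed-length code point.
    return [int.from_bytes(data[i:i + 3], "big") for i in range(0, len(data), 3)]


# Arbitrary sample bytes, chosen only to show the grouping.
sample = bytes([0x21, 0x30, 0x21, 0x21, 0x30, 0x22])
code_points = split_24bit_code_points(sample)
```

A fixed length means the reader never has to inspect a byte to learn how many bytes follow, which is why simple slicing suffices here.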

An 8-bit working environment accommodates two sets of 32 control functions (C0 and C1), two code pages of 94 graphic characters apiece (G0 and G1), a space character, a delete character, and two reserved character positions (see the figure below).  The C0 and C1 control functions and the space character can be accessed at any time, as they are not affected by the designation and invocation of different graphic sets.

[Figure: 8-bit Code Matrix]
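As a rough sketch of the matrix layout, the following Python function assigns each of the 256 byte values to one of the areas named above.  The function name and the area labels are illustrative assumptions; the boundaries follow the conventional 8-bit layout (C0 at 00-1F, G0 at 21-7E, C1 at 80-9F, G1 at A1-FE).

```python
def classify_byte(value: int) -> str:
    """Return the area of the 8-bit code matrix that a byte value falls in.

    Labels are hypothetical names chosen for this sketch.
    """
    if not 0x00 <= value <= 0xFF:
        raise ValueError("not an 8-bit value")
    if value <= 0x1F:
        return "C0"        # first set of 32 control functions
    if value == 0x20:
        return "SPACE"     # space character
    if value <= 0x7E:
        return "G0"        # first code page of 94 graphic characters
    if value == 0x7F:
        return "DELETE"    # delete character
    if value <= 0x9F:
        return "C1"        # second set of 32 control functions
    if value in (0xA0, 0xFF):
        return "RESERVED"  # the two reserved positions
    return "G1"            # second code page of 94 graphic characters


# Tally how many byte values fall in each area.
area_sizes = {}
for v in range(256):
    area = classify_byte(v)
    area_sizes[area] = area_sizes.get(area, 0) + 1
```

Tallying the areas this way reproduces the counts given in the text: 32 positions for each control set, 94 for each graphic set, one space, one delete, and two reserved positions.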

According to Code Extension Techniques for Use with 7-bit and 8-bit Character Sets (ANSI X3.41) and its international counterpart Character Code Structure and Extension Techniques (ISO/IEC 2022), the general technique for the use of code sets requires first the designation of the sets, then the invocation of a designated set as the working set.  For 8-bit codes, two sets of control functions and four graphic character sets may be designated at any given time.  The designated sets of control functions are called the C0 and C1 sets.  The designated sets of graphic characters are called the G0, G1, G2, and G3 sets.  Two Cn and two Gn sets may be in an invoked, working set status at any given time.  If, for example, a specific character set is designated as the G0 set and invoked as the working set, the working set can be changed in one of two ways: either another character set is designated as the G0 set, or another character set is designated as the G1, G2, or G3 set and that set is then invoked as a working set.  The following sections specify the designation and invocation of code sets in MARC-8-encoded MARC 21 records.
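The two-step technique can be sketched as a small state model in Python.  This is a toy illustration of the general designate-then-invoke idea, not the MARC-8 rules specified in the following sections.  The class and method names are hypothetical, the labels GL and GR for the two working areas are borrowed from ISO/IEC 2022 terminology rather than from the text above, and SET_X and SET_Y are placeholder set names.

```python
class CodeExtensionState:
    """Toy model of designation and invocation for graphic character sets."""

    GRAPHIC_SLOTS = ("G0", "G1", "G2", "G3")  # up to four designated sets
    WORKING_AREAS = ("GL", "GR")              # up to two invoked sets

    def __init__(self):
        self.designated = {slot: None for slot in self.GRAPHIC_SLOTS}
        self.invoked = {area: None for area in self.WORKING_AREAS}

    def designate(self, slot, charset):
        """Step 1: make a character set available as G0, G1, G2, or G3."""
        if slot not in self.designated:
            raise ValueError("unknown slot: " + slot)
        self.designated[slot] = charset

    def invoke(self, slot, area):
        """Step 2: make a designated set the working set for one area."""
        if slot not in self.designated:
            raise ValueError("unknown slot: " + slot)
        if area not in self.invoked:
            raise ValueError("unknown working area: " + area)
        if self.designated[slot] is None:
            raise ValueError("no set has been designated as " + slot)
        self.invoked[area] = slot

    def working_set(self, area):
        """Return the character set currently in working status for an area."""
        slot = self.invoked[area]
        return None if slot is None else self.designated[slot]


state = CodeExtensionState()
state.designate("G0", "ASCII")
state.invoke("G0", "GL")          # ASCII is now the working set

# Route 1: designate a different set as G0; the working set follows.
state.designate("G0", "SET_X")
after_route_1 = state.working_set("GL")

# Route 2: designate a set as G1 and invoke G1 as the working set.
state.designate("G1", "SET_Y")
state.invoke("G1", "GL")
after_route_2 = state.working_set("GL")
```

The model stores the invoked slot rather than the invoked set, so that redesignating a slot that is already in working status changes the working set, which matches the first of the two routes described above.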

Graphic character sets in MARC-8 encoding

ASCII graphics are the default G0 set and ANSEL graphics are the default G1 set for MARC 21 records.  ASCII graphics are invoked as the working set for codes 21(hex) through 7E(hex).  ANSEL, a graphic character set of letters, symbols, and combining marks complementing ASCII, is designated as the G1 set and invoked as the working set for codes A1(hex) through FE(hex).  These are the default working sets for data transcribed in the fields and subfields unless other default sets are specified in field 066 (Character Sets Present) of the record.  Additional graphic character sets may also be accessed using techniques described below.  Upon exit from a subfield, ASCII must be designated the G0 set.
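The default arrangement can be summarized with a short Python sketch that reports which default working set governs a given byte value.  The function name is a hypothetical choice, and the sketch deliberately ignores both field 066 and any technique for accessing additional sets, so it describes only the defaults stated above.

```python
DEFAULT_G0 = "ASCII"  # default working set for codes 21(hex) through 7E(hex)
DEFAULT_G1 = "ANSEL"  # default working set for codes A1(hex) through FE(hex)


def default_set_for(value: int):
    """Return the default graphic set for a byte value, or None.

    None is returned for values outside both graphic ranges
    (control functions, space, delete, and the reserved positions).
    """
    if 0x21 <= value <= 0x7E:
        return DEFAULT_G0
    if 0xA1 <= value <= 0xFE:
        return DEFAULT_G1
    return None


# Label each byte of a sample string with its default working set.
sample = bytes([0x41, 0x20, 0xE2, 0x65])
labels = [default_set_for(b) for b in sample]
```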

There are two special character positions in every "G" code block: one at the beginning (20(hex) in a G0 set or A0(hex) in a G1 set) and one at the end (7F(hex) in a G0 set or FF(hex) in a G1 set).  The space character occupies code point 20(hex).  The space is used as a graphic character in all parts of a MARC 21 record and is universally recognized by the standard MARC-8 character sets.  This character is also referred to as "blank" in MARC 21 documentation.  The delete character occupies the second of the two special character positions (7F(hex)) in a G0 set.  It is a control character that is not used in MARC 21 records.  A0(hex) and FF(hex) are reserved values and must not be used in MARC-8-encoded records.
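A simple check for these special positions might look like the following Python sketch.  The function name and the wording of the messages are assumptions made for illustration; the sketch only flags the delete character and the two reserved values and does not otherwise validate a record.

```python
# Special positions that should not appear in MARC-8-encoded data.
UNUSABLE_BYTES = {
    0x7F: "delete character, not used in MARC 21 records",
    0xA0: "reserved value",
    0xFF: "reserved value",
}


def find_unusable_bytes(data: bytes) -> list:
    """Return (offset, byte value, reason) for each unusable byte found."""
    return [
        (offset, value, UNUSABLE_BYTES[value])
        for offset, value in enumerate(data)
        if value in UNUSABLE_BYTES
    ]


# The space (20 hex) is a legitimate graphic character and is not flagged.
problems = find_unusable_bytes(b"A B\xa0C\x7f")
```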

Control function code sets

The C0 and C1 control function code sets are fixed for MARC 21 records using the MARC-8 encoding.  They are thus designated and invoked by default and need not be designated and invoked in the record.  More information about control function code use in MARC 21 records is given in Part 1.
