MARC 21 encoding marker

In a MARC-8-encoded MARC 21 record, Leader character position 9 (Character coding scheme) must contain a space character (20(hex)).
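
As a rough illustration, the Leader test can be sketched in Python. The function name is hypothetical, and the 24-character Leader length is assumed from the MARC 21 record structure rather than stated in this section.

```python
LEADER_LENGTH = 24  # assumption: fixed Leader length in MARC 21 records


def leader_declares_marc8(leader: str) -> bool:
    """Return True if Leader character position 9 is a space (20 hex)."""
    if len(leader) != LEADER_LENGTH:
        raise ValueError("a MARC 21 Leader is expected to be 24 characters long")
    # Character positions are counted from 0, so position 9 is index 9.
    return leader[9] == "\x20"
```

A caller would typically run this check before choosing a decoder for the rest of the record.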

MARC field 066

Field 066 (Character Sets Present) indicates which MARC-8 character sets, other than the default sets, are invoked in the record.  (See MARC 21 Format for Bibliographic Data.)  Field 066 must appear in a MARC 21 bibliographic record whenever standard alternate graphic character sets are accessed in it using Technique 2 (see Accessing Alternate Graphic Character Sets).  The alternate graphic character sets are identified in subfield $c of field 066 to assist machine processing.  However, a record should not contain field 066 if it uses only Technique 1 escape sequences.
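
The rule above can be sketched as a scan for escape sequences. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes that a Technique 2 escape sequence is ESC followed by one or more intermediate characters and a final character, that a Technique 1 sequence is ESC followed by a single final character, that `(B` and `)E` designate the default sets, and that subfield $c carries the intermediate and final characters. None of these details is given in this section, and the function name is hypothetical.

```python
ESC = 0x1B
SINGLE_BYTE_INTERMEDIATES = b"(,)-"  # assumed intermediates for G0/G1 designation
MULTI_BYTE_MARKER = ord("$")         # assumed marker for multibyte sets
DEFAULT_SETS = {"(B", ")E"}          # assumed defaults: ASCII as G0, ANSEL as G1


def technique2_sets(data: bytes) -> list[str]:
    """Collect candidate 066 $c values from the escape sequences in data."""
    found: list[str] = []
    i = 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        j = i + 1
        if j < len(data) and data[j] == MULTI_BYTE_MARKER:
            j += 1
            # An optional second intermediate may follow the "$" marker.
            if j < len(data) and data[j] in SINGLE_BYTE_INTERMEDIATES:
                j += 1
        elif j < len(data) and data[j] in SINGLE_BYTE_INTERMEDIATES:
            j += 1
        else:
            # ESC plus a single final character: treated as Technique 1.
            i = j + 1
            continue
        if j >= len(data):
            break  # truncated escape sequence, nothing more to read
        code = data[i + 1 : j + 1].decode("ascii", "replace")
        if code not in DEFAULT_SETS and code not in found:
            found.append(code)
        i = j + 1
    return found
```

Under these assumptions, an empty result means that the data needs no field 066, which matches the case where only Technique 1 escape sequences are used.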

Combining characters (diacritics)

Graphic combining characters are always used in conjunction with other graphic characters, functionally referred to as base characters.  More than one combining character may be associated with one base character.  As noted in Part 1, MARC 21 encodes the diacritics associated with alphabetic characters using ANSEL combining characters rather than similar-appearing ASCII characters, which are not combining characters.  The combining marks that are used in conjunction with base characters appear in the ANSEL character set in code point range E0-FE(hex) (G1 set).  The Greek, the Hebrew, and the Basic and Extended Arabic character sets also include some MARC-8 combining marks.  In a MARC-8-encoded character string, these combining characters precede the base character that they modify.  When a graphic character in MARC-8 encoding requires multiple combining characters, they are entered in the order in which they appear, reading left to right (or right to left with right-to-left scripts) and top to bottom.
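
Because the combining characters come first in MARC-8, a program that converts to an encoding in which combining marks follow the base character (Unicode, for example) has to hold the marks until the base character arrives. The following Python sketch shows that reordering for the default Latin sets only. The mapping table is a small illustrative subset taken as an assumption (grave, acute, circumflex, tilde, and umlaut), and the function name is hypothetical.

```python
import unicodedata

# Partial, illustrative mapping from ANSEL combining code points to Unicode.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # umlaut (diaeresis)
}


def reorder_combining(data: bytes) -> str:
    """Convert ASCII plus ANSEL combining marks to base-then-mark order."""
    out: list[str] = []
    pending: list[str] = []  # marks seen but not yet attached to a base
    for byte in data:
        if 0xE0 <= byte <= 0xFE:
            if byte not in ANSEL_COMBINING:
                raise ValueError(f"combining mark {byte:#04x} not in sketch table")
            pending.append(ANSEL_COMBINING[byte])
        elif 0x20 <= byte <= 0x7E:
            out.append(chr(byte))  # the base character goes first
            out.extend(pending)    # then its marks, in their original order
            pending.clear()
        else:
            raise ValueError(f"byte {byte:#04x} is outside the scope of this sketch")
    if pending:
        raise ValueError("combining mark without a following base character")
    return "".join(out)


# Usage: E2(hex) precedes "e" in the input and follows it in the output.
text = unicodedata.normalize("NFC", reorder_combining(b"caf\xe2e"))
```

The relative order of multiple marks on one base character is kept as entered; only their position relative to the base character changes.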

Directionality of text

The contents of a field in a MARC 21 record using the MARC-8 encoding are always recorded in their logical order, from the first character to the last, regardless of the directionality of the text being recorded.  When data in a subfield are written in a bidirectional script (such as Arabic or Hebrew), the subfield delimiter/code pair (always a left-to-right sequence) is followed by the escape sequence which invokes the character set for the script and then immediately by the logically first character of the text (i.e., exactly the same as for text in a left-to-right script).  The first character of text in a bidirectional script does not occur at the end of the field just before the field terminator.  An example of a field with bidirectional script data is given at the bottom of the following section.
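
The byte order described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the text bytes in the usage line are placeholders rather than real encoded text, and the escape sequences (ESC ( 2 to invoke Basic Hebrew, ESC ( B to return to ASCII at the end of the subfield) are assumptions that do not come from this section.

```python
SUBFIELD_DELIMITER = b"\x1f"
RETURN_TO_ASCII = b"\x1b(B"  # assumption: restores the default G0 set


def build_subfield(code: bytes, escape: bytes, logical_text: bytes) -> bytes:
    """Assemble one subfield whose text is already MARC-8 encoded.

    logical_text must be in logical order (first character first),
    whatever the display direction of the script.
    """
    if len(code) != 1:
        raise ValueError("a subfield code is a single character")
    return SUBFIELD_DELIMITER + code + escape + logical_text + RETURN_TO_ASCII


# Usage with placeholder text bytes; ESC ( 2 is assumed to invoke Basic Hebrew.
subfield = build_subfield(b"a", b"\x1b(2", b"\x60\x61")
```

The logically first text byte sits immediately after the escape sequence, just as it would for a left-to-right script.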
