Introduction (Character Sets)

MARC 21 records intended for broad, standard interchange must use one of two character encoding schemes, and only one of them may be used within a single record. The encoding now known as MARC-8 was introduced in 1968, when the MARC format first came into use. Over the years it has grown to include code points for a large repertoire of characters, including the Latin, Cyrillic, Arabic, Hebrew, and Greek scripts and over 15,000 characters used in writing Chinese, Japanese, and Korean. The MARC-8 encoding is derived primarily from a collection of international standard character sets, which are identified in Part 2. The total collection of characters that can be represented in MARC-8 encoding is called the MARC-8 character repertoire. This extensive repertoire is adequate for many library environments, and no further additions will be made to it.

Alternatively, the Universal Character Set (UCS or ISO/IEC 10646) encoding may be used. Its first version was published in 1993. As the name implies, the UCS aims to provide, in a single system, code points for the characters of all written languages. At present it includes over 100,000 characters used in dozens of scripts. ISO/IEC 10646 was developed in conjunction with the Unicode Consortium, an international group of industries, educational institutions, government agencies, and other organizations. The consortium is the main driving force behind the maintenance and expansion of the UCS, and for that reason the UCS is frequently called Unicode. In this specification the terms UCS/Unicode, UCS, and Unicode may be considered synonymous when referring to the standard, either as encoding or as repertoire.

With the steadily growing adoption of the UCS/Unicode standard, it will also become a preferred option for libraries. Conversions to Unicode have already taken place in many large library systems. When UCS/Unicode encoding is used in MARC 21, characters are expressed in the UCS transformation format, UTF-8. More information is given in Part 3.
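
As a rough illustration of what expressing characters in UTF-8 means, the following Python sketch prints the UTF-8 byte sequences for a few characters from scripts mentioned above. The helper name utf8_hex and the sample characters are chosen here for illustration only and are not part of this specification. The sketch shows generic UTF-8 encoding and does not cover any MARC 21 specific rules, which are the subject of Part 3.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


# Illustrative sample characters (an assumption of this sketch, not a list
# taken from the specification).
samples = {
    "Latin capital letter A": "\u0041",
    "Cyrillic capital letter Zhe": "\u0416",
    "Hebrew letter Alef": "\u05D0",
    "CJK ideograph U+4E2D": "\u4E2D",
}

for name, ch in samples.items():
    # Show the UCS code point, then the bytes that represent it in UTF-8.
    print(f"U+{ord(ch):04X}  {utf8_hex(ch):<10}  {name}")
```

The same code point occupies one, two, or three bytes depending on its value, so the byte length of UTF-8 data generally differs from its character count.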

Part 1 provides character set handling guidelines for MARC 21 records that apply to both the MARC-8 and UCS/Unicode encoding environments.

Part 2 specifies the handling of character sets within the MARC-8 environment.

Part 3 describes encoding in the UCS/Unicode environment.

Part 4 describes the issues involved in converting in either direction between the MARC-8 environment and repertoire and the UCS/Unicode environment and repertoire.

Part 5 specifies, in the form of code tables, the MARC-8 repertoire and its encodings.
