Northwestern University Library

Humanities Computing Testbed

The future can be best predicted by inventing it.
October 2004 Archives

October 28, 2004

Introductory LaTeX Literature

As I am preparing a LaTeX seminar, I am evaluating several free introductory texts. I will have to pick one of them, but here is a list of several good candidates:



  • The Not So Short Introduction To LaTeX2e is a classic that has grown from a document that was indeed short. This excellent text is freely available in no fewer than 17 languages.

  • A Beginner's Introduction To Typesetting With LaTeX is another excellent text, and it is better structured for use in a course. It also goes into more detail on installation, complete with useful extras such as screenshots of various editors.

  • Essential LaTeX is a very concise overview of LaTeX's most important features.

  • Making TeX Work is a slightly outdated book originally published by O'Reilly that deals with the details of getting a (La)TeX installation to work.

  • sampler.pdf shows a number of fonts commonly used with LaTeX.

October 26, 2004

LaTeX Seminar Announcement

Download the Announcement

October 05, 2004

The Unicode in the Humanities FAQ

Here are some common questions about Unicode. I will expand this list as I get new questions.

This FAQ doesn’t answer my question! What do I do?

Visit the Humanities Computing Testbed or email Oliver your question, and the answer will be included here.

What is Unicode?

Unicode is a standard that allows us to encode many, and in theory all, scripts of the world using the same encoding method. Historically, there have been many different character encodings, even for the same language. English and other languages written in the Latin script are often encoded in ASCII, but some large IBM computers use the EBCDIC encoding. Ten different mutually incompatible encodings are necessary for the languages of Europe written in the Latin script alone; the last addition was a new encoding that contains the Euro sign. In addition to the multitude of standard encodings, many computer programs use their own encodings, incompatible with anything else. Whenever you want to move a file from one computer to another that expects a different encoding, you have to re-encode it, and you may lose data if the source encoding does not map unambiguously to the target encoding. Unicode promises a lasting solution to this problem:

“Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.”
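The data loss that re-encoding can cause is easy to reproduce. The following is a minimal Python sketch, not part of the original FAQ; the sample string and the choice of Latin-1 and ISO 8859-15 as legacy encodings are assumptions made for illustration.

```python
# A short text containing the Euro sign (sample text chosen for illustration).
text = "Preis: 5 €"

# UTF-8 can represent every Unicode character, so a round trip is lossless.
utf8_bytes = text.encode("utf-8")
round_trip_ok = utf8_bytes.decode("utf-8") == text

# Latin-1 (ISO 8859-1) has no Euro sign, so a strict conversion fails ...
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# ... and a lossy conversion silently replaces the character with "?".
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

# ISO 8859-15 does include the Euro sign, at byte value 0xA4.
latin9_bytes = text.encode("iso8859-15")

# The same byte means different things in the two legacy encodings.
as_latin1 = b"\xa4".decode("latin-1")     # generic currency sign
as_latin9 = b"\xa4".decode("iso8859-15")  # Euro sign

print(round_trip_ok, latin1_ok, lossy, as_latin1, as_latin9)
```

The last two lines show why a file must be labeled with its encoding: without that information, the byte 0xA4 cannot be interpreted reliably.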

Where can I find more information on Unicode?

Here are some documents you might find interesting:

  • What is Unicode? A very short non-technical introduction to Unicode. On this page you'll also find links to translations of this document in many different languages, such as Swedish or Thai, that you can use to check whether your web browser or another application you're using displays a certain character correctly.

  • The Unicode® Standard: A Technical Introduction is a very short technical introduction to Unicode and the UTF formats used to represent it.

  • Where is my Character? contains useful information if you want to represent a certain character on your computer with Unicode (and Unicode almost certainly will have a way to represent your character) but don't know how that character is encoded.

  • And finally there's the Unicode standard itself. This is not a document you will want to consult every day, but it is the authoritative source on all things Unicode.

What are some common difficulties of working with Unicode?

  • The 8-bit encoding (or the Y2K) problem all over again: Unicode in its full glory uses a number between 0 and 1114111 (hexadecimal 10FFFF) for each character. Now 1114111 is a fairly large number, and most makers of supposedly Unicode-compliant software don’t think that you really need that many different characters. Therefore they only support the first 65536 characters of the Unicode encoding, that is, those whose numbers fit into 16 bits. These first 65536 characters make up what is called the Basic Multilingual Plane (BMP), which contains all the characters necessary for the world’s common languages. However, the characters for some rare scripts of mainly scholarly interest, such as Gothic or Shavian, are assigned numbers outside of the Basic Multilingual Plane, and programs that call themselves ‘Unicode-compliant’ but really support only the BMP won’t be able to handle them.

  • There may be more than one representation for a given character. This makes it difficult to search for a given text. For example, the character ‘Ä’ may be represented by the number 196 (Unicode inherited its first 256 characters, numbered 0 to 255, from Latin-1), or it may be represented by the character ‘A’ with the number 65 followed by the combining diaeresis character with the number 776. If you want to search and replace in Unicode-encoded texts, you might want to consider ‘normalizing’ your Unicode, i.e. making sure that each appearance of a character is represented in the same way. The unicode.org FAQ has a section on Unicode normalization.

  • One glyph, as in the example with the letter ‘Ä,’ might be represented by a sequence of characters. Therefore you cannot simply cut and paste arbitrary pieces of Unicode text; if you cut and pasted the combining diaeresis without the letter ‘A,’ it would combine with whatever character precedes it at its new location. In addition, if your Unicode text is encoded in UTF-8 or UTF-16, you cannot rely on byte or word boundaries being character boundaries. To avoid these problems, only manipulate Unicode-encoded text with programs or libraries that can take care of all of the Unicode handling for you.
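The three difficulties above can be sketched in a few lines of Python using the standard unicodedata module. This is an illustrative sketch, not part of the original FAQ; the variable names and the choice of U+10330 (a code point in the Gothic block) as the example outside the BMP are assumptions.

```python
import unicodedata

# Two representations of 'Ä': precomposed (196) and 'A' (65) + combining diaeresis (776).
precomposed = chr(196)
decomposed = chr(65) + chr(776)

# They look the same on screen but do not compare equal, so a naive search fails.
different_before = precomposed != decomposed

# Normalizing both sides to the same form (NFC = composed, NFD = decomposed) fixes that.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

# A character outside the Basic Multilingual Plane.
gothic = chr(0x10330)
outside_bmp = ord(gothic) > 0xFFFF

# It needs two 16-bit units in UTF-16 (a surrogate pair) and four bytes in UTF-8.
utf16_units = len(gothic.encode("utf-16-be")) // 2
utf8_len = len(gothic.encode("utf-8"))

# Cutting encoded text at an arbitrary byte boundary can split a character.
data = precomposed.encode("utf-8")  # two bytes for a single character
try:
    data[:1].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(different_before, nfc_equal, nfd_equal, outside_bmp, utf16_units, utf8_len, split_ok)
```

Software that treats every 16-bit unit as one character would count the Gothic letter as two characters, which is exactly the BMP-only limitation described in the first item.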

How can I enter strange characters on my computer?

  • If you want to enter a rare character every once in a while and you’re using one of the usual office suites, the menu item ‘Insert/Special Character’ will show you a list of all the odd characters your font can display and let you select the character you want.

  • If you often want to enter text in a language not covered by the U.S. keyboard, you might want to switch keyboard layouts. All modern graphical user environments allow you to switch from one keyboard layout to another, usually by pressing a certain key combination or by clicking on a little flag icon somewhere on the screen. If you don’t like the idea of having to type Greek on an American keyboard, you might consider plugging a Greek USB keyboard into your computer in addition to the American one.

  • There’s also a clever way of entering unusual characters that originated in the Unix world, but is also available for other graphical user environments: You have a key called ‘Compose’ on your keyboard; or, since most likely you don’t really have that key, you define another one, such as the right ‘Ctrl’ key, to be your ‘Compose’ key. In order to enter a character that looks like a combination of two other characters, you press ‘Compose’ and then the keys of the characters of which your desired character is composed. Thus, ‘Compose’ + ‘A’ + ‘'’ gives the letter ‘Á,’ ‘Compose’ + ‘e’ + ‘`’ gives the letter ‘è,’ ‘Compose’ + ‘s’ + ‘s’ gives the letter ‘ß,’ and so on.