Monday, 17 January 2011

LaTeX documents with UTF-8 foreign accents, Word quotation marks, etc.

And then, all of a sudden, this post got a lot easier...

I was having some trouble with LaTeX documents. Many of the documents I make have several writers, who hand in their articles as Word documents. They use foreign characters, accents and quotation marks that LaTeX does not understand. When you paste such content into a .tex file (encoded as UTF-8), the source looks all right. However, those characters do not appear in the generated PDF, DVI or PS. LaTeX does not give an error; it simply skips over the unrecognized characters. To overcome this problem I set out to make a preprocessor (which would simply scan the LaTeX document and replace all acute accents with \'{} and graves with \`{}, etc.) and started looking for a list of all foreign characters in LaTeX.
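
To give an idea of what such a preprocessor would have had to produce, here is a small sketch of the same words written with LaTeX's standard accent commands and ASCII quote ligatures. The sample words are my own illustration, not taken from any of the articles.

```latex
\documentclass{article}
\begin{document}
% Each accented letter is spelled out with an accent command:
caf\'{e}        % acute accent  (é)
voil\`{a}       % grave accent  (à)
na\"{\i}ve      % diaeresis on a dotless i (ï)
gar\c{c}on      % cedilla       (ç)
se\~{n}or       % tilde         (ñ)

% Word's curly quotation marks become ASCII ligatures:
``quoted text''
\end{document}
```

This works, but every character has to be mapped by hand, which is why a complete list of foreign characters would have been needed.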

Then I found that LaTeX can read UTF-8 input directly, through the inputenc package. It was here that this post got easy. Simply put
\usepackage[utf8]{inputenc}
in the preamble of your document and you're done.
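
For context, a minimal complete document using that line might look like the sketch below. The sample text is made up, and the fontenc line is my own assumption rather than part of the fix above: it selects the T1 font encoding, which generally gives better output for accented letters. The .tex file itself must really be saved as UTF-8. On LaTeX releases from 2018 onward UTF-8 is already the default input encoding, so the inputenc line should be harmless there but no longer required.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % read the source file as UTF-8
\usepackage[T1]{fontenc}    % assumption: not in the original post; T1 font encoding

\begin{document}
% Accented characters typed or pasted directly into the source:
Café, voilà, naïve, garçon, señor.

% Curly quotation marks as pasted from Word:
“quoted text”
\end{document}
```

Characters that the loaded font encoding does not provide should now produce an explicit error about the Unicode character, instead of vanishing silently from the output.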