[macemacsjp-english 626] Opening UTF-8 text files

Mahn-Soo Choi mahn-****@uniba*****
Wed Aug 16 21:06:48 JST 2006


Hi, All.

I just started learning Spanish and, naturally, writing Spanish sentences.
Then I found a strange (to me) behavior of Emacs when it opens UTF-8
text files.  I want to understand what it actually does.

I have three files, named "tmp.txt", "tmp.utf", and "tmp-more.txt",
respectively.  They are all encoded in UTF-8.  "tmp.txt" and "tmp.utf"
are exactly the same. (The unix "file" utility correctly reports that they
are encoded in UTF-8. The unix "diff" utility finds no difference
between the two files.)  "tmp-more.txt" is almost the same, but has
additional non-Latin characters.

1. When Emacs opens (finds) the file "tmp.txt", it looks like:

     Español - ¡Todo para la Educación Hispana Primaria y ..

   (all broken) because it opens the file in the Latin-1 coding system.


2. On the other hand, when Emacs opens the file "tmp.utf", it looks
   quite correct:

     Español - ¡Todo para la Educación Hispana Primaria y ...

Okay, apparently, Emacs does something depending on the
`file-coding-system-alist'.  The question is: Why does Emacs ignore the
coding system (UTF-8) of the file contents?
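
One way to test this guess is to ask Emacs which coding system its
file-name rules would select for each of the three names.  This is only
a minimal sketch: it assumes the files sit in the current directory, and
I have not confirmed that `file-coding-system-alist' is really the
mechanism at work here.

```elisp
;; Ask Emacs which coding system the file-name based rules select when
;; reading each file.  The result is a cons (DECODING . ENCODING), or
;; nil when no rule matches the name.
(dolist (name '("tmp.txt" "tmp.utf" "tmp-more.txt"))
  (message "%s -> %S"
           name
           (find-operation-coding-system 'insert-file-contents name)))
```

Evaluating this (for example in the *scratch* buffer) prints one line
per file in the *Messages* buffer.  After a file is actually visited,
the value of `buffer-file-coding-system' in that buffer shows which
coding system Emacs ended up using.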

3. Even more surprising, Emacs opens the file "tmp-more.txt" also
   correctly:

     Español - ¡Todo para la Educación Hispana Primaria y ... Plus 한글

Now, there are several things I don't understand here.  I thought
(naively) that Latin-1 is just a subset of UTF-8, so a Latin-1 encoded
file should look the same in either UTF-8 mode or Latin-1 mode.
Apparently, I'm wrong?  If I'm wrong, then why does Emacs ignore the
coding system information of the file and open it in the Latin-1 coding
system?
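
The broken display in case 1 can be reproduced without any file at all,
which suggests that the two encodings differ at the byte level for
accented characters.  A minimal sketch, assuming the coding system
names `utf-8' and `iso-latin-1' as found in a standard Emacs:

```elisp
;; Encode the text as UTF-8 bytes, then decode those same bytes as
;; Latin-1.  In UTF-8, "ñ" is the two bytes C3 B1; Latin-1 reads each
;; byte as a separate character, so one letter turns into two.
(decode-coding-string
 (encode-coding-string "Español" 'utf-8)
 'iso-latin-1)
```

If this is right, only the ASCII range is byte-for-byte identical in
Latin-1 and UTF-8, and a UTF-8 file read as Latin-1 breaks exactly at
the accented letters, as in case 1.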

FYI, I always set

(set-language-environment 'English)
(set-default-coding-systems 'utf-8-unix)
(mac-setup-inline-input-method)

My Carbon Emacs Package version is:
GNU Emacs 22.0.50.1 (i386-apple-darwin8.6.1) of 2006-06-16 on petit.local

I would greatly appreciate your help.

Best regards,

mahn-soo

