Mahn-Soo Choi
mahn-****@uniba*****
Wed Aug 16 21:06:48 JST 2006
Hi, All.

I just started to learn Spanish and, naturally, to write Spanish sentences. Then I found a strange (to me) behavior of Emacs when it opens UTF-8 text files. I want to understand what it actually does.

I have three files, named "tmp.txt", "tmp.utf", and "tmp-more.txt". They are all encoded in UTF-8. "tmp.txt" and "tmp.utf" are exactly the same. (The unix "file" utility correctly reports that they are encoded in UTF-8, and the unix "diff" utility finds no difference between the two files.) "tmp-more.txt" is almost the same, but has additional non-Latin characters.

1. When Emacs opens (finds) the file "tmp.txt", it looks like:

    Español - ¡Todo para la Educación Hispana Primaria y ..

(all broken) because it opens the file in the Latin-1 coding system.

2. On the other hand, when Emacs opens the file "tmp.utf", it looks quite correct:

    Español - ¡Todo para la Educación Hispana Primaria y ...

Okay, apparently Emacs does something depending on `file-coding-system-alist'. The question is: why does Emacs ignore the coding system (UTF-8) of the file contents?

3. Even more surprising, Emacs also opens the file "tmp-more.txt" correctly:

    Español - ¡Todo para la Educación Hispana Primaria y ... Plus 한글

Now, there are several things I don't understand here. I thought (naively) that Latin-1 is just a subset of UTF-8, so a Latin-1 encoded file should look the same in either UTF-8 mode or Latin-1 mode. Apparently, I'm wrong? And if I'm wrong, why does Emacs ignore the coding system information of the file and open it in the Latin-1 coding system?

FYI, I always set

    (set-language-environment 'English)
    (set-default-coding-systems 'utf-8-unix)
    (mac-setup-inline-input-method)

My Carbon Emacs Package version is:

    GNU Emacs 22.0.50.1 (i386-apple-darwin8.6.1) of 2006-06-16 on petit.local

I will greatly appreciate your help.

Best regards,
mahn-soo
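
P.S. In case it helps anyone reproduce this, here is a minimal sketch of how the "Latin-1 is a subset of UTF-8" assumption could be checked at the byte level. It only uses the standard `encode-coding-string' and `decode-coding-string' functions. The choice of "ñ" as the test character is mine, and I am assuming the forms are evaluated one at a time (for example with C-x C-e in the *scratch* buffer).

```elisp
;; Sketch only: evaluate each form separately, e.g. with C-x C-e.

;; Number of bytes "ñ" occupies in each coding system.
;; If Latin-1 were a byte-for-byte subset of UTF-8, these would match.
(length (encode-coding-string "ñ" 'iso-latin-1))
(length (encode-coding-string "ñ" 'utf-8))

;; Take the UTF-8 bytes of "ñ" and decode them as if they were Latin-1.
;; This imitates opening a UTF-8 file in the Latin-1 coding system.
(decode-coding-string (encode-coding-string "ñ" 'utf-8) 'iso-latin-1)
```

If the two lengths differ, then the two encodings agree only on the ASCII range, and the last form should show the same kind of garbling I see in "tmp.txt".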