Beena Emerson
memis****@gmail*****
Mon, Oct 28, 2013 16:06:51 JST
On Mon, Oct 28, 2013 at 8:54 AM, Amit Langote <amitl****@gmail*****> wrote:

> Just as an example,
>
> testdb=# create table ja (a text);
>
> testdb=# insert into ja values ('Z');
> INSERT 0 1
>
> testdb=# insert into ja values ('ぁ');
> INSERT 0 1
>
> testdb=# select * from ja order by 1 asc;
>  a
> ----
>  ぁ
>  Z
> (2 rows)
>
> Whereas in locale "/usr/share/i18n/ja_JP":
>
> LC_COLLATE
> ...
> ...
> <UFF58>
> <UFF59>
> <UFF5A> -> 'Z'
> <U3041> -> 'ぁ'
> <U3042>
> ...
> ...
>
> So, as per ja_JP locale, the order given by C locale in the last
> select query is incorrect (localewise).

The locale defines the sort order of characters, right? So can we really
call this C locale behavior "incorrect"?

> There is still another question - do we require pg_bigm to provide
> strictly "correct" sorting order in any of its functionality?

I have the same question. I feel that the way the index is ordered will
not be locale dependent. Since all bigm functions show the same comparison
behavior, I guess there should not be any problems: the final output will
not be much different, except that the sort order will vary.

Encoding: UTF8, locale C:

bi1=# SELECT show_bigm('上検');
     show_bigm
--------------------
 {上検,"検 "," 上"}
(1 row)

Encoding: EUC_JP, locale C:

You are now connected to database "jp_c" as user "Beena".
jp_c=# SELECT show_bigm('上検');
     show_bigm
--------------------
 {"検 ",上検," 上"}
(1 row)

Encoding: EUC_JP, locale ja_JP:

You are now connected to database "jp_jp" as user "Beena".
jp_jp=# SELECT show_bigm('上検');
     show_bigm
--------------------
 {"検 ",上検," 上"}
(1 row)

Here the C locale gives different output under different encodings. I
guess this is because the in-memory representation of the characters (the
encoding) is different.

--
Beena Emerson
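
The guess about in-memory representation can be checked outside the database. The following is a minimal Python sketch, not pg_bigm code: it encodes the three bigrams from the show_bigm('上検') output in UTF-8 and EUC_JP and sorts them by their raw bytes. The helper names (unsigned_key, signed_key, order) are hypothetical. The signed variant is only an assumption about how the bytes might be compared (as signed C chars); nothing in this thread confirms that this is what pg_bigm does, but it happens to reproduce both orderings shown above.

```python
# Bigrams reported by show_bigm('上検') in the examples above.
BIGRAMS = ["上検", "検 ", " 上"]


def unsigned_key(s, encoding):
    """Sort key: encoded bytes compared as unsigned values (memcmp-style)."""
    return tuple(s.encode(encoding))


def signed_key(s, encoding):
    """Sort key: encoded bytes compared as signed 8-bit values.

    ASSUMPTION: this mimics comparing plain `char` on a platform where
    char is signed. It is an illustration, not pg_bigm's confirmed logic.
    """
    return tuple(b - 256 if b > 127 else b for b in s.encode(encoding))


def order(encoding, key):
    """Return the bigrams sorted by the given byte-level key."""
    return sorted(BIGRAMS, key=lambda s: key(s, encoding))


if __name__ == "__main__":
    for enc in ("utf-8", "euc_jp"):
        for s in BIGRAMS:
            print(enc, repr(s), s.encode(enc).hex())
        print(enc, "unsigned order:", order(enc, unsigned_key))
        print(enc, "signed order:  ", order(enc, signed_key))
```

With either key, the relative order of 上検 and "検 " flips between UTF-8 and EUC_JP, because the lead byte of 上 is smaller than that of 検 in UTF-8 but larger in EUC_JP. That is independent of the locale setting, which fits the observation that locale C and locale ja_JP give the same result under EUC_JP.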