[Pgbigm-hackers] Regarding C locale restriction

Beena Emerson memis****@gmail*****
Mon Oct 28 16:06:51 JST 2013


On Mon, Oct 28, 2013 at 8:54 AM, Amit Langote <amitl****@gmail*****> wrote:

>
> Just as an example,
>
>
> testdb=# create table ja (a text);
>
> testdb=# insert into ja values ('Z');
> INSERT 0 1
>
> testdb=# insert into ja values ('ぁ');
> INSERT 0 1
>
> testdb=# select * from ja order by 1 asc;
>  a
> ----
> (2 rows)
>
> Whereas in locale "/usr/share/i18n/ja_JP":
>
> LC_COLLATE
> ...
> ...
> <UFF58>
> <UFF59>
> <UFF5A>   -> 'Z'
> <U3041>    -> 'ぁ'
> <U3042>
> ...
> ...
>
> So, as per ja_JP locale, the order given by C locale in the last
> select query is incorrect (localewise).
>
The locale defines the sort order of characters, right?
So can we really call this C locale behavior "incorrect"?
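
To make the comparison concrete: under the C locale, strings are compared by
their encoded bytes, so the result depends on the encoding rather than on any
LC_COLLATE table. Below is a minimal sketch in Python, assuming a UTF8
database and assuming the first inserted character is the fullwidth U+FF5A
listed in the locale excerpt (neither is stated in the quoted example). The
helper name c_locale_order is hypothetical.

```python
# Sketch: emulate C-locale ordering by comparing the encoded bytes.
# Assumptions (not from the thread): UTF-8 encoding, and the first
# character is the fullwidth U+FF5A shown in the LC_COLLATE excerpt.

FULLWIDTH_Z = "\uff5a"      # encodes to EF BD 9A in UTF-8
HIRAGANA_SMALL_A = "\u3041"  # encodes to E3 81 81 in UTF-8


def c_locale_order(strings, encoding="utf-8"):
    """Sort strings by their raw bytes (unsigned, lexicographic)."""
    return sorted(strings, key=lambda s: s.encode(encoding))


if __name__ == "__main__":
    ordered = c_locale_order([FULLWIDTH_Z, HIRAGANA_SMALL_A])
    # U+3041 comes first because 0xE3 < 0xEF, which is the reverse of
    # the order listed in the ja_JP LC_COLLATE excerpt.
    print([f"U+{ord(s):04X}" for s in ordered])
```

So the byte order is simply a different order from the ja_JP one; whether that
counts as "incorrect" depends on which order the application expects.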


> There is still another question - do we require pg_bigm to provide
> strictly "correct" sorting order in any of its functionality?
>

I have the same question. I feel that the way the index is ordered will not
be locale dependent. Since all bigm functions show the same comparison
behavior, I guess there should not be any problems, as the final output
will not differ much except in sort order.

Encoding: UTF8, locale C:
bi1=# SELECT show_bigm('上検');
     show_bigm
--------------------
 {上検,"検 "," 上"}
(1 row)

Encoding: EUC_JP, locale C:
You are now connected to database "jp_c" as user "Beena".
jp_c=# SELECT show_bigm('上検');
     show_bigm
--------------------
 {"検 ",上検," 上"}
(1 row)

Encoding: EUC_JP, locale ja_JP:
You are now connected to database "jp_jp" as user "Beena".
jp_jp=# SELECT show_bigm('上検');
     show_bigm
--------------------
 {"検 ",上検," 上"}
(1 row)

Here the C locale gives different output in different encoding environments.
I guess this is because the in-memory representation (encoding) of the
characters is different.
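
The effect can be reproduced outside the database. The sketch below is in
Python and rests on two assumptions that are not confirmed anywhere in this
thread: that the input is padded with one space on each side before bigrams
are taken, and that bigrams are ordered byte by byte with each byte treated
as a signed value. That ordering happens to match both outputs above; it is
an illustration, not a description of pg_bigm's actual comparison code. The
function names are hypothetical.

```python
# Sketch: order the bigrams of a string by their encoded bytes and see how
# the order changes with the encoding.
# Assumptions (not from the thread): one space of padding on each side,
# and bytes compared as signed values (as a plain C "char" is on x86).


def bigrams(text):
    """Return all 2-character windows of the space-padded text."""
    padded = " " + text + " "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def signed_byte_key(s, encoding):
    """Encode s and map each byte to its signed 8-bit value."""
    return [b - 256 if b > 127 else b for b in s.encode(encoding)]


def ordered_bigrams(text, encoding):
    """Distinct bigrams sorted by their signed encoded bytes."""
    return sorted(set(bigrams(text)),
                  key=lambda g: signed_byte_key(g, encoding))


if __name__ == "__main__":
    for enc in ("utf-8", "euc_jp"):
        print(enc, ordered_bigrams("上検", enc))
```

In UTF-8 the lead bytes are 0xE4 for 上 and 0xE6 for 検, while in EUC_JP they
are 0xBE and 0xB8, so the relative order of the two characters flips between
the encodings. The same set of bigrams is produced either way.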
--
Beena Emerson