File name: skf_1.93_man.txt
Last updated: 2005-04-13 22:59
Type: Plain Text
Editor: Seiji Kaneko
Description: Man page for skf (formatted plain text version)

SKF(1)                                                     SKF(1)



NAME
       skf - simple Kanji Filter (v1.93)

SYNOPSIS
       skf  [-AEIJKNQRSXZabdehjknqrsuvxz] [ long_format_options ]
       [infiles..]

DESCRIPTION
       skf is yet another i18n-capable kanji filter, designed for
       reading various CJK-coded files on the Net.  It converts
       input kanji texts or streams into a character stream in a
       designated codeset and writes the result to standard
       output.  Specifically, skf is designed to be a versatile
       filter for reading documents in various code sets, and has
       no fancy features that are not directly related to code
       conversion.

       Like nkf, skf automatically recognizes the input file code
       when it is a kind of ISO-2022 compliant code, and also
       detects EUC-variant codes if the input file is Japanese
       text without X0201 kana.  skf 1.9x can read various
       iso-2022 compliant charsets, including JIS Kanji code
       (X0208, X0212 and X0213), EUC encoding (euc-jp (with
       x-0213 support), euc-cn, euc-kr and euc-tw), ISO European
       latins (ISO-8859-1 to 11, 13/14/15/16), BS 4730, NF Z
       62-010 and X0201 kana with ESC-(-I, SS0, Locking shift.
       skf also supports some non-iso2022 compliant sets,
       including Microsoft Shift-JIS code, KOI-8-R/U, GB2312
       (HZ), big5, VISCII (rfc1456, including VIQR), Unicode
       standard (UCS2/UTF-16, UTF7 and UTF8), some MS codesets
       (cp1250 etc.) and some other vendor-specific codes
       (KEIS83, JEF etc.).
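
       The automatic recognition of ISO-2022 input described above
       is possible because such streams announce their character
       sets with escape sequences.  The Python sketch below is not
       skf's detection code; it only illustrates the idea by
       scanning for a few well-known ISO-2022-JP designation
       sequences.  The function name and the table of sequences
       are assumptions made for this example.

```python
# Illustrative only: a few ISO-2022-JP designation sequences and the
# character sets they announce. skf's real detection logic is not
# described in this manual and is certainly more elaborate.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 roman",
    b"\x1b(I": "JIS X 0201 kana",
    b"\x1b$@": "JIS X 0208 (1978)",
    b"\x1b$B": "JIS X 0208 (1983/90)",
}


def find_designations(data: bytes) -> list[str]:
    """Return the character sets designated in data, in order of appearance."""
    found = []
    pos = data.find(b"\x1b")
    while pos != -1:
        name = DESIGNATIONS.get(data[pos:pos + 3])
        if name is not None:
            found.append(name)
        pos = data.find(b"\x1b", pos + 1)
    return found


# Python's own iso2022_jp codec produces the escape sequences for us.
sample = "漢字 text".encode("iso2022_jp")
print(sample)
print(find_designations(sample))
```

       A stream that contains none of these sequences is not
       necessarily plain ASCII, which is one reason an explicit
       --ic= setting remains useful.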

       Supported  output  codesets  include  X-0208/X-0212/X-0213
       JIS,  X-0201  JIS,  ASCII,   Microsoft   Shift-JIS,   EUC-
       jp/-kr/-cn,  HZ, iso-2022-jp/kr, big5, VISCII and Unicode.

       skf also provides some basic decoding features for some
       common encodings (MIME, Punycode and URI codepoint).

       Unlike  nkf,  skf  is  designed to convert input code into
       some kind of human-readable form under a local environment
       (i.e.  codeset), and has several extra conversion features
       like GNU recode.  Such conversions include  Windows/Macin-
       tosh  specific  code  swap  and  old-new jis glyph change,
       html-format/TeX format  conversion  and  variant  unifica-
       tions.

       If file names are given, skf reads the files and writes the
       converted stream to stdout.  If no file names are given,
       input is taken from stdin and written to stdout.  Options
       are taken from the environment variables SKFENV and skfenv
       and then from the command line, in that order.  Environment
       variables are not used when skf is run by a privileged
       user.  skf does not use locale-related environment
       variables for conversion, but error message output is
       controlled by the given locale.
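
       The order in which options are collected can be pictured
       with the following Python sketch.  It is an illustration
       under stated assumptions, not skf's source: the function
       name is hypothetical, and splitting the environment values
       with shlex is a guess at how the strings are tokenized.

```python
import shlex


def gather_options(environ, argv, privileged=False):
    """Collect options from SKFENV, then skfenv, then the command line."""
    options = []
    if not privileged:
        # Environment variables are skipped for a privileged user.
        for name in ("SKFENV", "skfenv"):
            # Assumption: values are split like a shell command line.
            options.extend(shlex.split(environ.get(name, "")))
    options.extend(argv)
    return options


env = {"SKFENV": "--oc=utf8", "skfenv": "-u"}
print(gather_options(env, ["--ic=sjis", "in.txt"]))
print(gather_options(env, ["--ic=sjis", "in.txt"], privileged=True))
```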

OPTIONS
       skf-1.9 is written from scratch, and inherits no code from
       nkf. However, skf is intended to be a drop-in replacement
       for nkf (v1.4) and supports a similar set of commonly used
       nkf options.
       skf 1.9x recognizes the following options.  All are off by
       default unless explicitly stated otherwise.

   buffering control
       -b     use buffered output. This is default.

       -u     use unbuffered output.  This option impairs the code
              detection feature.

   Input/Output codeset options
       --ic=  input_code_set
              Specify input_code_set as the input codeset.  Possible
              values are listed below.

       --oc=  output_code_set
              Specify output_code_set as the output codeset.
              Possible values are listed below.  The default
              codeset in the distribution package is euc-jp, but
              this depends on a compile option.  Default codeset is
              shown by

     Supported codeset
       skf recognizes the following codesets as input/output
       codesets.  Codeset names are case insensitive.  Note that
       iso-2022 escape-based input codesets (registered with
       IANA) are recognized automatically, and for this reason
       some codesets are treated as the same when specified as
       input.  An o in the in column means the named codeset can
       be specified as input; an x means it cannot.  The out
       column has the same meaning for output.

       in out  name            description
       o  o    iso8859-1       ascii + iso-8859-1 (latin-1)
       o  o    iso8859-2       ascii + iso-8859-2 (latin-2)
       o  o    iso8859-3       ascii + iso-8859-3 (latin-3)
       o  o    iso8859-4       ascii + iso-8859-4 (latin-4)
       o  o    iso8859-5       ascii + iso-8859-5 (Cyrillic)
       o  o    iso8859-6       ascii + iso-8859-6 (Arabic)
       o  o    iso8859-7       ascii + iso-8859-7 (Greek)
       o  o    iso8859-8       ascii + iso-8859-8 (Hebrew)
       o  o    iso8859-9       ascii + iso-8859-9 (latin-5)
       o  o    iso8859-10      ascii + iso-8859-10 (latin-6)
       o  o    iso8859-11      ascii + iso-8859-11 (Thai)
       o  o    iso8859-13      ascii + iso-8859-13 (Baltic Rim)
       o  o    iso8859-14      ascii + iso-8859-14 (Celtic)
       o  o    iso8859-15      ascii + iso-8859-15 (Latin-9)
       o  o    iso8859-16      ascii + iso-8859-16
       o  o    koi-8r          koi-8r (Russian)
       o  o    cp1251          Cyrillic latin MS cp1251
       o  o    jis             iso-2022-jp (rfc1468 7bit JIS)
       o  o    jis-x0213       iso-2022-jp-3 (JIS X-0213(2000))
       o  o    jis-x0213-strict iso-2022-jp-3-strict
       o  o    jis-x0213-2004  iso-2022-jp-2004(JIS X-0213(2004))
       o  o    oldjis          iso-2022-jp-1978(JIS X-0208(1978))
       o  o    euc-jp          EUC-encoded JIS X-0208(1997)
       o  o    euc-x0213       EUC-encoded JIS X-0213(2000)
       o  o    euc-jis-2004    EUC-encoded JIS X-0213(2004)
       o  o    euc-kr          EUC-encoded KS X-1001 Korean
       o  o    euc7-kr         7bit EUC-encoded KS X-1001 Korean
       o  o    johab           KS X-1001-johab Korean
       o  o    euc-cn          EUC-encoded GB2312 Chinese
       o  o    euc7-cn         7bit EUC-encoded GB2312 Chinese
       o  o    hz              HZ-encoded GB2312 Chinese
       o  o    euc-tw          EUC-encoded CNS 11643 Chinese
       o  o    gb12345         EUC-encoded GB12345 Chinese
       o  o    gbk             GB2312 Extension (cp936)
       o  o    big5            BIG5 (with Eten extension + EURO)
       o  o    big5-cp950      BIG5 (Microsoft cp950 + EURO)
       o  o    sjis            Shift-jis (Microsoft cp943)
       o  o    sjis-x0213      Shift-jis-encoded JIS X-0213(2000)
       o  o    sjis-x0213-2004 Shift-jis-encoded JIS X-0213(2004)
       o  x    sjis-cellular   Shift-jis-encoded JIS X-0208
                        with NTT Docomo, Vodafone phone glyph
       o  o    cp932           Shift-jis-encoded MS cp932
       o  o    viscii          VISCII (rfc1456) Vietnamese
       o  o    viqr            VISCII (rfc1456-VIQR) Vietnamese
       o  o    keis            Hitachi KEIS83/90
       o  x    jef             Fujitsu JEF (basic support only)
       o  o    ucs2            Unicode(TM) UCS-2/UTF-32LE
       o  o    utf7            Unicode(TM) UTF-7
       o  o    utf8            Unicode(TM) UTF-8
       o  x    transparent     Transparent mode (see below)


     Codeset explanations
       iso-8859-*
              a.k.a. latin*. When specified as output, G0 = GL is
              ascii and G1 = GR is iso-8859-*. 8bit  encoding  is
              used.

       iso-2022-jp, jis
              Encoding is iso-2022-jp-2 (RFC1554). G0 = GL is JIS
              x0201 roman, G1 = GR  is  JIS  x0201  kana,  G2  is
              iso-8859-1 and G3 is JIS x0212 Supplementary Kanji.

       jis-x0213
              Encoding is iso-2022-jp-3. G0 = GL is JIS x0201
              roman. For output, G1 = GR is JIS x0201 kana, G2 is
              iso-8859-1 and G3 is JIS x0213 plane2 Kanji.

       jis-x0213-strict
              Encoding is a subset of iso-2022-jp-3-strict (uses
              Plane 1 only). For output, G0 = GL is JIS x0201
              roman, G1 = GR is JIS x0201 kana, G2 is iso-8859-1
              and G3 is not set. Characters are output as JIS
              x0208 whenever possible. JIS X-0213 input is
              automatically recognized.

       jis-x0213-2004
              Encoding  is iso-2022-jp-2003(2004). For output, G0
              = GL is JIS x0201 roman, G1 = GR is JIS x0201 kana,
              G2  is iso-8859-1 and G3 is JIS x0213 plane2 Kanji.

       oldjis Encoding is iso-2022-jp (JIS X-0208(1978)). G0 = GL
              is  JIS  x0201 roman, G1 = GR is JIS x0201 kana, G2
              is iso-8859-1 and G3  is  JIS  x0212  Supplementary
              Kanji.

       euc-jp, euc
              Encoding is 8-bit EUC using JIS X0208(1997) charac-
              ter set.  G0 = GL is ascii, G1 = GR is  JIS  x0208,
              G2 is JIS x0201 kana and G3 is JIS x0212 Supplemen-
              tary Kanji.

       euc-x0213
              Encoding is 8-bit EUC-based JIS X0213(2000).  G0  =
              GL  is  ascii,  G1  =  GR  is  X0213 plane 1, G2 is
              iso-8859-1 and G3 is JIS x0213 plane2 Kanji.

       euc-jis-2004
              Encoding is 8-bit EUC-based JIS X0213(2004).  G0  =
              GL  is ascii, G1 = GR is X0213(2004) plane 1, G2 is
              iso-8859-1 and G3 is JIS x0213 plane2 Kanji.

       euc-kr Encoding is 8-bit EUC using KS X-1001 Wansung
              character set.  G0 = GL is KS X1003, G1 = GR is KS
              X1001, G2 and G3 are not set.

       euc7-kr iso-2022-kr
              Encoding is iso-2022-kr (rfc1557). 7-bit EUC using
              KS X-1001 Wansung character set.  G0 = GL is KS
              X1003, G1 is KS X1001, G2 and G3 are not set.

       euc-cn Encoding is 8-bit EUC using GB 2312 character set.
              G0 = GL is GB1988, G1 = GR is GB2312, G2 and G3 are
              not set.

       euc7-cn
              Encoding is 7-bit EUC using GB 2312 character set.
              G0 = GL is GB1988, G1 is GB2312, G2 and G3 are not
              set.

       hz     Encoding is HZ encoded (rfc1842) GB 2312 character
              set.  G0 = GL is GB1988, G1 = GR is GB2312, G2 and
              G3 are not set.

       euc-tw Encoding is EUC encoded CNS11643 Plane1/2. Subset
              of iso-2022-cn.  G0 = GL is ascii, G1 = GR is
              CNS11643 plane 1, G2 is CNS11643 plane 2 and G3 is
              not set.

       gb12345
              Encoding is 8-bit EUC using GB 12345 (GBF)
              character set.  G0 = GL is GB1988, G1 = GR is
              GB12345, G2 and G3 are not set.

       gbk    Encoding is GBK (a.k.a. cp936).  G0 = GL is GB1988
              and G1 = GR is GBK. G2 and G3 are not set.

       big5   Encoding is Big5 with ETen extension. Include  Euro
              mapping.  Uses ascii as latin part.

       big5-cp950
              Encoding is Big5 (cp950) character set.  Uses ascii
              as latin part.

       VISCII (experimental)
              Vietnamese VISCII (rfc1456). Not TCVN-5712.

       VIQR (experimental)
              Vietnamese VISCII with VIQR encoding (rfc1456).

       sjis   Encoding is Shift-encoded JIS X0208(1997) character
              set.  Note this is not cp932. Uses JIS x-0201 latin
              as latin(GL) part.

       sjis-x0213
              Encoding is Microsoft  JIS  using  JIS  X0213(2000)
              character set.

       sjis-x0213-2004
              Encoding is Microsoft JIS using JIS X0213(2004)
              character set.  10 newly defined characters are
              added, but the Unicode mapping is the same as for
              JIS X0213(2000).  Uses JIS x-0201 latin as latin(GL)
              part.

       sjis-cellular (experimental)
              Encoding is Shift-encoded JIS X0208(1997) character
              set  with  NTT Docomo/Vodafone cellular phone glyph
              mapping.

       cp932  Encoding is Microsoft SJIS cp932 with NEC/IBM gaiji
              area.  Uses JIS x-0201 latin as latin(GL) part.

       johab  Encoding is KS X1001(Johab). Uses KS X1003 latin as
              latin(GL) part.

       ucs2   Encoding is Unicode UTF-16 (v4.0). The default byte
              order for input and output is little endian, and an
              input byte order mark is recognized.  Output
              includes an endian mark by default unless
              --disable-endian-mark is specified.  Output range is
              within UTF-32 with surrogate pairs unless
              --limit-to-ucs2 is specified.

       utf8   Encoding  is  UTF-8  encoded Unicode (v4.0). Output
              doesn't   include   byte    order    mark    unless
              --enable-endian-mark is specified.  Output range is
              within UTF-32 unless --limit-to-ucs2 is  specified.

       utf7   Encoding  is  UTF-7  encoded Unicode (v4.0). Output
              range is limited to UTF-16, and value above U+10000
              is regarded as undefined.

       keis (experimental)
              Encoding is Hitachi KEIS83/90. Output range is lim-
              ited to EBCDIK and JIS X-0208 area.

       jef (experimental)
              Encoding is Fujitsu JEF. Only basic  part  is  sup-
              ported.

       koi8r  Russian KOI-8R code.

       cp1250 Central European latin MS cp1250 code.

       cp1251 Eastern European cyrillic MS cp1251 code.

       transparent
              Transparent mode. Various code control features,
              including folding and line end code conversion, are
              ignored.
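
       The ucs2 entry above describes little-endian UTF-16 output
       with an endian mark and surrogate pairs.  The Python sketch
       below shows what such a byte stream looks like.  It uses
       Python's standard codecs rather than anything from skf, and
       the function and parameter names are hypothetical stand-ins
       for --big-endian and --disable-endian-mark.

```python
import codecs


def encode_utf16(text, little_endian=True, endian_mark=True):
    """Encode text as UTF-16, optionally prefixed with a byte order mark."""
    codec = "utf-16-le" if little_endian else "utf-16-be"
    bom = codecs.BOM_UTF16_LE if little_endian else codecs.BOM_UTF16_BE
    body = text.encode(codec)
    return (bom + body) if endian_mark else body


# U+20000 lies above U+FFFF, so it is written as the surrogate pair
# D840 DC00 (bytes 40 d8 00 dc in little-endian order).
print(encode_utf16("A\U00020000").hex(" "))   # ff fe 41 00 40 d8 00 dc
print(encode_utf16("A", little_endian=False, endian_mark=False).hex(" "))
```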


     Shortcuts
       -n -j  same as --oc=jis

       -s -x  same as --oc=sjis

       -a -e  same as --oc=euc-jp

       -q     same as --oc=ucs2

       -z     same as --oc=sjis

       -y     same as --oc=utf7

       -k     same as --oc=keis


       -A, -E same  as --ic=euc-jp. Assume input code set is EUC-
              JP.

       -N     same  as  --ic=jis.  Assume  input  code   set   is
              iso-2022-jp.

       -S, -X same   as  --ic=sjis.  Assume  input  code  set  is
              Microsoft JIS.

       -Q     same as --ic=ucs2.

       -Y     same as --ic=utf7.

       -Z     same as --ic=utf8.

       -K     same as --ic=keis.


     ISO-2022 Specific controls
       These options replace G0 to G3 with the assigned character
       sets after they have been set up according to the
       specified input codeset.

       --set-g0=`charset name'
              Predefine  specified code set to plane 0 (G0). Also
              set to GL at initial state.

       --set-g1=`charset name'
              Predefine specified code set to right  plane  (G1).
              Also set to GR at initial state.

       --set-g2=`charset name'
              Predefine specified code set to right plane (G2).

       --set-g3=`charset name'
              Predefine specified code set to right plane (G3).


       Supported values of `charset name' are as follows. 'o'
       means the codeset can be assigned to that plane; 'x' means
       it cannot.


       g0 g1 g2 g3    codeset name   description
       o  o  o  o     ascii          ANSI X3.4 ASCII
       o  o  o  o     x0201          JIS X 0201 (latin part)
       x  o  o  o     iso8859-1      ISO 8859-1 latin
       x  o  o  o     iso8859-2      ISO 8859-2 latin
       x  o  o  o     iso8859-3      ISO 8859-3 latin
       x  o  o  o     iso8859-4      ISO 8859-4 latin
       x  o  o  o     iso8859-5      ISO 8859-5 Cyrillic
       x  o  o  o     iso8859-6      ISO 8859-6 Arabic
       x  o  o  o     iso8859-7      ISO 8859-7 Greek-latin
       x  o  o  o     iso8859-8      ISO 8859-8 Hebrew
       x  o  o  o     iso8859-9      ISO 8859-9 latin
       x  o  o  o     iso8859-10     ISO 8859-10 latin
       x  o  o  o     iso8859-11     ISO 8859-11 Thai
       x  o  o  o     iso8859-13     ISO 8859-13 latin
       x  o  o  o     iso8859-14     ISO 8859-14 latin
       x  o  o  o     iso8859-15     ISO 8859-15 latin
       x  o  o  o     iso8859-16     ISO 8859-16 latin
       x  o  o  o     tcvn5712       TCVN 5712 (Vietnamese)
       x  o  o  o     ecma113        ECMA 113 Cyrillic
       o  o  o  o     x0212          JIS X-0212(1990)
       o  o  o  o     x0208          JIS X-0208(1990)
       o  o  o  o     x0213          JIS X-0213 Plane 1(2000)
       o  o  o  o     x0213-2        JIS X-0213 Plane 2(2000)
       o  o  o  o     x0213n         JIS X-0213 Plane 1(2004)
       o  o  o  o     gb2312         Simplified Chinese GB2312
       o  o  o  o     gb1988         Chinese GB1988(latin)
       o  o  o  o     gb12345        Traditional Chinese GB12345
       o  o  o  o     ksx1003        Korean KS X 1003(latin)
       o  o  o  o     ksx1001        Korean KS X 1001
       x  o  o  o     koi8-r         Cyrillic KOI-8R
       x  o  o  o     koi8-u         Ukrainian Cyrillic KOI-8U
       o  o  o  o     cns11643       Traditional Chinese CNS11643
       x  o  o  o     viscii-r       RFC1456 VISCII (right plane)
       o  o  o  o     viscii-l       RFC1456 VISCII (left plane)
       o  o  o  o     vni            Vietnamese VNI
       x  o  o  o     cp437          Microsoft cp437 (US latin)
       x  o  o  o     cp737          Microsoft cp737
       x  o  o  o     cp775          Microsoft cp775
       x  o  o  o     cp850          Microsoft cp850
       x  o  o  o     cp852          Microsoft cp852
       x  o  o  o     cp855          Microsoft cp855
       x  o  o  o     cp857          Microsoft cp857
       x  o  o  o     cp860          Microsoft cp860
       x  o  o  o     cp861          Microsoft cp861
       x  o  o  o     cp862          Microsoft cp862
       x  o  o  o     cp863          Microsoft cp863
       x  o  o  o     cp864          Microsoft cp864
       x  o  o  o     cp865          Microsoft cp865
       x  o  o  o     cp866          Microsoft cp866
       x  o  o  o     cp869          Microsoft cp869
       x  o  o  o     cp874          Microsoft cp874
       x  o  o  o     cp932          Microsoft cp932 (Japanese)
       x  o  o  o     cp1250         Microsoft cp1250 (Central Europe)
       x  o  o  o     cp1251         Microsoft cp1251 (Cyrillic)
       x  o  o  o     cp1252         Microsoft cp1252 (Latin-1)
       x  o  o  o     cp1253         Microsoft cp1253 (Greek)
       x  o  o  o     cp1254         Microsoft cp1254 (Turkish)
       x  o  o  o     cp1255         Microsoft cp1255
       x  o  o  o     cp1258         Microsoft cp1258

       --euc-protect-g1
              In EUC input mode,  suppress  sequences  to  set  a
              charset to G1. Such sequences are discarded.

       --add-annon
              Add announcer for JIS X-0208(1990) to X-0208 desig-
              nate  sequence.  This  option   works   only   with
              iso-2022-based output.

       --disable-jis90
              Disable  2 added characters of JIS X-0208(1990). If
              this option is specified, these two characters  are
              replaced  by Kanji variants.  This option is off by
              default.

       --input-detect-jis78
              Distinguish the JIS X-0208(1978) codeset from the
              JIS X-0208(1983/90) codeset.  By default, both
              charsets are regarded as X-0208(1983/90). This
              option is valid only when the input encoding is JIS
              (ISO-2022).


     JIS X-0212(Supplement Kanji code) Support
       --x0212-enable
              By default, skf does not output JIS X-0212 code.
              This option enables use of the JIS X-0212 part.  The
              output code set must be neither a Microsoft code nor
              KEIS.  For Unicode variant encodings, this option is
              on by default.  This option is supported for
              backward compatibility and may not be supported in
              future versions.


     Unicode coding specific control options
       --use-compat
              When output is one of the transformation formats of
              the Unicode standard, enable characters in the
              compatibility plane (0xfxxx).  If disabled, these
              characters are converted to variants or treated as
              undefined.

       --use-ms-compat
              When output is Unicode, make the translation
              Microsoft Windows compatible (i.e. cp932). This only
              affects some symbols in JIS Kanji, and adding the
              --use-compat option is recommended.

       --use-cde-compat
              When output is Unicode, make translation CDE  stan-
              dard codeset compatible.

       --little-endian
              When  output  is  Unicode,  use little endian byte-
              order. This is default.

       --big-endian
              When output is Unicode, use big endian  byte-order.

       --disable-endian-mark
              When output is UTF-16, do not use byte order
              marking. To make UTF-16N, use this option with
              --little-endian. This is off by default.

       --enable-endian-mark
              When  output  is  UTF-8, output byte order marking.
              This is off by default.

       --input-little-endian
              When input  is  Unicode,  assume  input  is  little
              endian  byte-ordered.   This  is  default,  but skf
              respects byte-order mark.

       --input-big-endian
              When input is Unicode, assume input is  big  endian
              byte-ordered.   Note  that  skf respects byte-order
              mark.

       --endian-protect
              Do not use endian mark in the input stream.  Endian
              mark is just discarded.  This is off by default.

       --use-replace-char
              skf  by  default  converts undefined (except 0x2xxx
              part)  characters  into  "geta  (U+3013)"  code  in
              Japanese codeset.  This option specifies skf to use
              replacement char (U-fffc) instead.

       --limit-to-ucs2
              Do not use > 0x10000 area  code  in  Unicode  (i.e.
              limit  code to ucs2 area).  This is off by default.

       --disable-cjk-extension
              Treat CJK extension A/B area as undefined. This  is
              off (i.e. these areas are enabled) by default.

       --old-hangul-location
              Treat input U-3400 area as hangul (Unicode 1.0 com-
              patibility).  This is off by default.
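
       The input-side byte order options above can be sketched in
       Python as follows.  This is one reading of the option
       descriptions, not skf's implementation: the function and
       parameter names are hypothetical, and treating
       --endian-protect as "drop the mark but keep the assumed
       byte order" is an interpretation of the text.

```python
import codecs


def decode_utf16(data, assume_little_endian=True, endian_protect=False):
    """Decode UTF-16 input, honoring a leading byte order mark if present."""
    little = assume_little_endian
    marks = ((codecs.BOM_UTF16_LE, True), (codecs.BOM_UTF16_BE, False))
    for mark, mark_is_little in marks:
        if data.startswith(mark):
            data = data[len(mark):]      # the mark is never part of the text
            if not endian_protect:
                little = mark_is_little  # the mark overrides the assumption
            break
    return data.decode("utf-16-le" if little else "utf-16-be")


print(decode_utf16(b"\xfe\xff\x00A"))                      # mark says big endian
print(decode_utf16(b"A\x00"))                              # no mark, default
print(decode_utf16(b"\x00A", assume_little_endian=False))  # no mark, big endian
```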


     Codeset/Vendor Specific codeset handling flags
       By default, skf assumes that machine-specific parts of
       kanji code are Microsoft Windows compatible. The options
       below control this behavior.  Options in this category are
       valid only when the output codeset is a Japanese codeset,
       except --disable-chart.

       --use-apple-gaiji
              Assume machine specific part in input file is  Mac-
              intosh (System 7,8,9 or OS X) compatible.

       --disable-ibm-gaiji
              Disable machine specific part in input file.

       --disable-chart
              Do  not use Moji-keisen characters. This is for old
              Macintosh system (System 6.x or older)  compatibil-
              ity.


     Miscellaneous codeset related options
       --old-nec-compat
              Enable old NEC kanji sequence (ESC-K,H). Needs com-
              pile option --enable-oldnec at configuration.

       --no-utf7
              Assume input code set is *NOT* UTF-7  encoded  Uni-
              code. This option disables input utf7 testing.

       --no-kana
              Assume  input code set does *NOT* include JIS x0201
              kana. Also suppresses Unicode half width  variants.


   OUTPUT Conversions options
       skf has various features to fit output file to local envi-
       ronment, and many of these are controlled by extended con-
       trol switch described in this section.

       --use-g0-ascii
              set  G0(=GL) for output encoding to ASCII, ignoring
              codeset designation.

     X-0201 Kana/latin conversions
       By default, skf converts X-0201 kana to X-0208 kana.  To
       output X-0201 kana as it is, use one of the following
       options.  When output is EUC or SJIS, these options enable
       X-0201 kana output in the way each code set provides.
       When Unicode output is specified, output of the
       (equivalent) kana part is controlled by --use-compat, not
       by the following switches.  They are valid only when the
       output codeset is a non-Unicode Japanese codeset.

       --kana-jis7
              use  SI/SO  locking  shift  sequence  to  designate
              X-0201 kana.

       --kana-jis8
              output X-0201 kana using 8-bit code right plane.

       --kana-esci --kana-call
              use ESC-(-I to designate X-0201 kana.

       --kana-enable
              use X-0201 kana when EUC (with G2) or  SJIS  output
              code  is  used.  When  JIS  output,  it  is same as
              --kana-call.
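
       The default conversion of X-0201 kana to X-0208 kana
       described at the start of this section corresponds, in
       Unicode terms, to replacing halfwidth katakana with their
       fullwidth forms.  The Python sketch below approximates it
       with NFKC normalization restricted to the halfwidth
       katakana range.  This is a substitute technique chosen for
       the example; skf uses its own conversion tables, and the
       function name is hypothetical.

```python
import re
import unicodedata

# Halfwidth katakana block, including the halfwidth sound marks.
HALFWIDTH_KANA = re.compile("[\uff61-\uff9f]+")


def widen_kana(text):
    """Replace runs of halfwidth katakana with fullwidth equivalents."""
    # NFKC is applied only to the matched runs so that other
    # compatibility characters in the text are left untouched.
    return HALFWIDTH_KANA.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


# A halfwidth kana followed by a halfwidth voiced sound mark is
# composed into a single fullwidth character.
print(widen_kana("\uff76\uff9e\uff72\uff7c\uff9e ABC"))
```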


     URI/TeX conversion feature options
       With Unicode(tm) family output codings, skf outputs
       non-ascii latin characters as they are. With other output
       codings, skf converts these characters using the following
       rules:

       (1) If the code is defined in the specified output codeset,
       it is output in that codeset.
       (2) If one of the following html convert modes is enabled
       and the code is defined in the html/sgml codeset, it is
       converted to an entity reference or code point reference.
       (3) If tex convert mode is enabled and the code is defined
       in the tex codeset, it is converted to tex format.
       (4) If the code is a kind of combined ligature, it is shown
       as a set of characters.
       (5) Otherwise, a kind of replacement character is shown,
       with a warning.

       --convert-html --convert-sgml
              Enable  html  convert mode. This mode is cleared by
              --reset. These two options are  synonyms,  and  are
              treated as same option.

       --convert-html-decimal
              Enable  html  code-point decimal convert mode. This
              mode is cleared by --reset.

       --convert-html-hexadecimal
              Enable html code-point  hexadecimal  convert  mode.
              This mode is cleared by --reset.

       --convert-tex
              Enable  TeX  convert  mode. This mode is cleared by
              --reset.

       --use-iso8859-1
              Enable iso-8859-1 output. Iso-8859-1 is invoked  to
              G1 and set to GR plane.

       --use-iso8859-1-right
              Enable   7-bit  iso-8859-1  output.  Iso-8859-1  is
              invoked to G1 plane.
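
       The decimal and hexadecimal code-point modes above can be
       pictured with the following Python sketch, which keeps
       characters the output codeset can represent and writes the
       rest as numeric character references.  It covers only
       rules (1) and (2) in simplified form; the function name,
       its parameters and the use of Python codec names for the
       output codeset are assumptions of this example.

```python
def to_html_refs(text, codeset="ascii", hexadecimal=False):
    """Write characters missing from codeset as numeric references."""
    out = []
    for ch in text:
        try:
            ch.encode(codeset)       # representable: keep as it is
            out.append(ch)
        except UnicodeEncodeError:   # not representable: use a reference
            if hexadecimal:
                out.append(f"&#x{ord(ch):x};")
            else:
                out.append(f"&#{ord(ch)};")
    return "".join(out)


print(to_html_refs("caf\u00e9 \u6f22"))                    # caf&#233; &#28450;
print(to_html_refs("caf\u00e9 \u6f22", hexadecimal=True))  # caf&#xe9; &#x6f22;
print(to_html_refs("caf\u00e9 \u6f22", codeset="latin-1"))
```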

   Encoding control options
       --decode=`encoding scheme'
              Specify the encoding scheme of the input stream.
              Supported encoding schemes include 'hex' (CAP
              hex-code), 'mime', 'mime_q' (mime Q-encoding),
              'mime_b' (mime B-encoding), 'uri_encode' (uri
              character reference), 'puny' (ACE punycode) and
              'hex_perc_encode' (uri percent notation).  Decoding
              of base64, Q-encoding, rfc2231 and rot13/47 is also
              supported.  Only one decode option is valid; if more
              than one is specified, the last one is used.  When
              mime decoding is specified, the base text is assumed
              to be EUC encoded unless specified otherwise.  Except
              for rot, which assumes the input stream is Shift_JIS,
              EUC or iso-2022-jp, these encodings assume the input
              stream is ascii (as defined in RFC2045).  Some
              encodings may co-exist with encoding, but this is not
              guaranteed.  In particular, if the input is
              UTF-16/UCS2 code, these encodings are ignored by skf.
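
       For readers unfamiliar with the encodings named above, the
       Python sketch below decodes one sample each of MIME
       B-encoding, punycode and URI percent notation using only
       the Python standard library.  It shows what the encoded
       forms look like and does not reproduce skf's decoder; the
       helper name decode_mime is hypothetical.

```python
from email.header import decode_header
from urllib.parse import unquote


def decode_mime(header):
    """Decode RFC 2047 encoded words in a header value."""
    parts = []
    for data, charset in decode_header(header):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


# MIME B-encoding wrapping ISO-2022-JP text.
print(decode_mime("=?ISO-2022-JP?B?GyRCNEE7ehsoQg==?="))

# Punycode as used in internationalized domain labels (without "xn--").
print(b"bcher-kva".decode("punycode"))

# URI percent notation carrying UTF-8 bytes.
print(unquote("%E6%BC%A2%E5%AD%97"))
```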

   End of line control options
       --lineend-thru
              Output end of line code as it is.  Also  output  ^Z
              code as it is.  This is default.

       --lineend-cr --lineend-mac
              Use  CR  as  end  of line code. Also delete ^Z code
              from input stream.

       --lineend-lf --lineend-unix
              Use LF as end of line code.  Also  delete  ^Z  code
              from input stream.

       --lineend-crlf --lineend-windows
              Use  CRLF  as end of line code. Also delete ^Z code
              from input stream.

       -F[line_length[-kinsoku]]

       -f[line_length[-kinsoku]]
              Wrap input lines at line_length columns.  The f
              option deletes CR/LFs in the input; the F option
              does not delete them. Following Japanese convention,
              both gyoutou-kinsoku (by burasage-gumi) and
              gyoumatsu-kinsoku (oidasi-gumi) are supported.  The
              burasage length is controlled by the kinsoku value.
              The default value for line_length is 60, and it must
              be < 1000. The default value for kinsoku is 5, and
              it must be < 10.
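
       The --lineend-* options earlier in this section can be
       sketched in Python as follows.  This is a minimal
       illustration of the described behavior (line end
       replacement plus ^Z removal), not skf's code; the function
       name and the mode strings are hypothetical.

```python
import re

LINE_ENDS = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}
ANY_LINE_END = re.compile(r"\r\n|\r|\n")


def convert_line_ends(text, mode="thru"):
    """Rewrite line ends according to mode; 'thru' passes text unchanged."""
    if mode == "thru":
        return text                    # line ends and ^Z are kept as they are
    text = text.replace("\x1a", "")    # delete ^Z from the stream
    return ANY_LINE_END.sub(LINE_ENDS[mode], text)


mixed = "a\r\nb\rc\n\x1a"
print(repr(convert_line_ends(mixed, "lf")))
print(repr(convert_line_ends(mixed, "crlf")))
```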

   File control options
       --filewise-detect --force-reset
              Reset  and re-detect input code set at the start of
              each file.

       --linewise-detect
              Reset and re-detect input code set at the start  of
              each  line. This option needs -DKUNIMOTO at compile
              time.


   Compatibility options
       --nkf-compat
              Interpret the options that follow in an
              nkf-compatible manner.

       --skf-compat
              Interpret the options that follow in the skf-native
              manner.


   Misc. Control options
       --disable-space-convert
              By default, skf converts an ideographic space into
              two ascii spaces.  This option disables this
              behavior.

       --html-sanitize
              Convert several  characters  in  HTML  document  to
              entity    reference    expression.    Specifically,
              "!#$&%()/<>:;?' is escaped by entity expression.

       --filewise-detect --force-reset
              If multiple input files  are  given,  detect  input
              code for each file.

       --linewise-detect
              Detect input code line-wise. Note that this option
              weakens the code detection feature.  Needs the
              compile option (at configure) --enable-kunimoto.

       --reset
              Reset  all flags specified by extended controls and
              given input code.

       --inquiry --guess
              skf detects the input code and writes the detection
              result to stdout.  No filtering output is performed.
              If multiple input files are given, --show-filename
              is automatically enabled.

       --suppress-filename
              When inquiry(--inquiry) is on, this option disables
              file   name   output.    This   option    overrides
              --show-filename.

       --show-filename
              When  inquiry(--inquiry)  is  on,  this option adds
              each file name to output.

       --invis-strip
              Delete  all  escape  sequences  not  belonging   to
              ISO-2022   code  extension.  This  is  intended  to
              replace invisstrip command bundled in  inews  pack-
              age.

       -I     Warn if input has unassigned code points.

       -v     print version and exit.

       -h --help
              print brief help.

       --show-supported-codeset
              Display supported codeset (input) and exit.

       --show-supported-charset
              Display  supported character set (output) and exit.

       -%[debug_level]
              Enable skf debugging.  The debug level is a single
              digit: 0 is the least verbose, and -%9 produces
              full traces within skf.  This option requires skf
              to be built with the configure option
              --enable-debug.


FILES
       /usr/(local/)share/skf/lib/   (Unices)

       /Program Files/skf/share/lib (MS Windows)
              These directories are where external codeset
              conversion tables go.  The location that the skf
              in use assumes is shown by the -h option.


AUTHOR
       skf is written by Seiji Kaneko (skaneko@a2.mbn.or.jp),
       based on an idea from nkf, written by Itaru Ichikawa
       (ichikawa@flab.fujitsu.co.jp).  The X-0213 code table is
       derived from the work of earthian@tama.or.jp.


ACKNOWLEDGEMENT
       skf is inspired by the works or requests of
       shinoda@cs.titech, kato@cs.titech, uematsu@cs.titech,
       void@global, ohta@ricoh, Hinata (HKE), Ashizawa (CRL),
       Kunimoto (SDL) and Oohara (Univ of Kyoto).  Thanks.


BUGS AND LIMITATIONS
       1. skf can handle mixed coding with some limitations.
       However, code detection tends to fail for mixed code, so
       giving an explicit input codeset is strongly encouraged
       if the codeset is known beforehand.
       If needed, the --linewise-detect option may help, but it
       is more likely to fail to detect codes.
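
       Why detection can fail is easy to see with a small Python
       sketch.  It uses Python's own codecs, not skf, and only
       shows that one byte sequence can be valid in two Japanese
       encodings at once.

```python
# The same two bytes are valid in both encodings but mean different things.
data = "漢".encode("euc_jp")        # one kanji, encoded as EUC-JP

as_euc = data.decode("euc_jp")      # the original kanji
as_sjis = data.decode("shift_jis")  # read as Shift-JIS: half-width katakana

print(data.hex(), as_euc, as_sjis)
```

       Because both readings decode without error, a detector has
       to guess from context, which is why an explicit input
       codeset is the safer choice.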

       2. When using UCS2, UTF-16, UTF-8 and UTF-7, skf tries to
       detect the input code, but giving an explicit codeset is
       encouraged.  skf doesn't support UCS4, but does support
       UTF-16/UTF-32 (i.e. surrogate pairs).  skf just passes
       composite characters to output; no further normalization
       is performed.
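
       For readers unfamiliar with surrogate pairs, the following
       Python sketch shows the standard UTF-16 arithmetic for a
       code point above U+FFFF.  The function name is
       hypothetical and the code is not taken from skf.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


# U+20000 is the first code point of CJK Unified Ideographs Extension B.
high, low = to_surrogate_pair(0x20000)
utf16 = "\U00020000".encode("utf-16-be")
print(hex(high), hex(low), utf16.hex())
```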

       3. skf implements ISO-2022 with the following exceptions:
        i) GL 0x20 is always space, even when a 96-character
       codeset is invoked to GL.
        ii) Sequences for setting codes to C1 and C2 are always
       ignored.
        iii) If an unknown sequence is given to G0, G0 is set to
       ASCII, and locking/single shift is cleared.  An unknown
       sequence call to G1-G3 is just ignored.
        iv) Sequences for 96-character multibyte coding are
       ignored (currently, no such codeset is registered).
        v) Calling the UTF-8 or UTF-16 coding system from
       ISO-2022 is supported, and the standard return sequence
       returns to the previous coding system.  Calls and returns
       in other cases are ignored.
        vi) Because of cellular phone glyph support, several
       private (not registered) codesets are defined in skf, and
       can be called by the appropriate sequences.

       4. Since skf by default tests input stream to detect  utf7
       coding,  skf sometimes misdetects pure ascii text as utf7.
       If this occurs, use --no-utf7 option.
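
       The ambiguity behind this misdetection can be reproduced
       with Python's utf-7 codec (an illustration only, not skf's
       detector): UTF-7 text consists entirely of ASCII bytes, so
       the bytes alone do not say which reading was intended.

```python
plain = b"Hello, world"  # pure ASCII, no '+' characters
shifted = b"+ZeVnLIqe-"  # pure ASCII bytes that are also a UTF-7 shift run

# ASCII without '+' decodes identically under both codecs.
same = plain.decode("ascii") == plain.decode("utf-7")

# A '+...-' run is ordinary ASCII text, yet as UTF-7 it is three kanji.
as_ascii = shifted.decode("ascii")
as_utf7 = shifted.decode("utf-7")

print(same, as_ascii, as_utf7)
```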

       5. Error output coding is controlled by the locale
       environment variables on UN*X systems.  Since skf doesn't
       check whether stdout and stderr are redirected into the
       same stream, this case should be handled by the user.

       6. skf-1.9x converts KEIS/JIS X-0213 code using the CJK
       extension B and CJK compatibility areas.  For this
       reason, X-0213 and KEIS conversion results vary depending
       on the --use-compat and --limit-to-ucs2 switches.

       7. JIS X-0207(1979) is not supported. JIS X-0211(1987)  is
       designed  to  be  supported  (i.e. common terminal control
       sequence will be transparently passed to output).

       8. Even if the unbuffer option (-u) is specified, some
       buffering related to code translation is still performed
       (in MIME, kana, VIQR etc.).

       9.  skf-1.9x   recognizes   and   handles   languages   in
       iso639-1(alpha  2).   iso639-2 is not supported as a valid
       language set.


NOTES
       1. Extended options have changed extensively since
       skf-1.9.  Some archaic options (e.g. -B, -@ and -r) have
       been deleted from this version.

       2. skf is a project derived from nkf, but doesn't contain
       nkf code.  The copyright notice is retained as a courtesy.

       3.  From  version  1.9,  default  Japanese  character  set
       assumed by  skf  has  changed  to  JIS  X-0208(1990)  with
       Microsoft Japanese Windows gaiji (i.e. CP932).

       4. Code autodetection is not perfect by design.  If it
       fails to detect the input code properly, please specify
       the input code explicitly.

       5. Some ligatures in Unicode, cp932 gaiji and KEIS83 are
       converted using JIS X-0124 and other conventions.  The
       byte length is not preserved during this conversion.

       6.  skf  is intended to pass ANSI compatible terminal con-
       trol code transparently, but this is not guaranteed.

       7. nkf's -i and -o options still work, but they are valid
       only for iso-2022-jp and are independent of codeset
       specifications.  Using these options is strongly
       discouraged.

       8. For characters that cannot be converted, skf outputs a
       geta mark, or the undefined character set by the
       --use-replace-char option.  If the output codeset doesn't
       contain a geta code, skf prefers the 'black square
       character', and otherwise uses '.'.

       9.  There  are  some  undocumented  options. These options
       should be considered as highly experimental.


NOTICE
       Unicode(TM) is a trademark of Unicode, Inc.  Microsoft and
       Windows are registered trademarks of Microsoft
       Corporation.  Macintosh is a registered trademark of Apple
       Computer Inc.  Vodafone is a trademark of Vodafone K.K.
       Other names and terms may be trademarks or registered
       trademarks of their respective owners.  The trademark
       symbol (TM) is omitted in this manual page.



                           09/MAY/2004                     SKF(1)