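Working repository for a patch that makes Anthy support input and conversion in historical kana orthography (seikana-zukai). Forked from the master branch of <git://git.debian.org/git/collab-maint/anthy.git>.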
Revision | 33f494486f19f71b87af834157b2c5b2112c09b1 (tree) |
---|---|
Time | 2012-06-27 20:46:48 |
Author | |
Committer | MORIYAMA Hiroshi |
Add a new script depgraph/anthy-depgraph-gendai-to-seikana.rb
A Ruby script that modifies Anthy's dependent-word (fuzokugo) dictionary so that it supports historical kana orthography.
It was used for commit a800bf8c71a688c747e72e4fc848b1436ad3306b.
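The commit message does not say what "supporting historical kana orthography" means for a single dictionary entry. Judging from the script in the diff, the core idea is that each modern spelling is kept and a variant with the small kana written full-size is added next to it. The following is a minimal sketch of that idea only, not the script itself: the helper name `with_large_kana_variants` and the sample words are hypothetical, and it assumes UTF-8 source rather than the EUC-JP used by the original.

```ruby
# encoding: utf-8
# Sketch only: the original script is EUC-JP and uses chained gsub calls.

SMALL = 'ぁぃぅぇぉっゃゅょゎ'
LARGE = 'あいうえおつやゆよわ'

# Hypothetical helper: return the given spellings followed by their
# full-size-kana variants, with duplicates removed.
def with_large_kana_variants(conditions)
  (conditions + conditions.map { |c| c.tr(SMALL, LARGE) }).uniq
end

variants = with_large_kana_variants(['だった', 'ない'])
puts variants.join(' ')
```

For the sample input the helper returns `['だった', 'ない', 'だつた']`: the word without small kana is unchanged, so `uniq` drops its duplicate. `String#tr` is used here as a compact stand-in for the ten `gsub` calls in the script.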
@@ -0,0 +1,108 @@ | ||
1 | +#! ruby -Eeuc-jp | |
2 | +# Encoding: EUC-JP | |
3 | +# | |
4 | +# Script that modifies Anthy's dependent-word dictionary (depgraph) to support historical kana orthography (seikana-zukai). | |
5 | +# | |
6 | +# $Id$ | |
7 | +# | |
8 | +# Copyright (C) 2012 MORIYAMA Hiroshi | |
9 | +# | |
10 | +# This library is free software; you can redistribute it and/or | |
11 | +# modify it under the terms of the GNU Lesser General Public | |
12 | +# License as published by the Free Software Foundation; either | |
13 | +# version 2 of the License, or (at your option) any later version. | |
14 | +# | |
15 | +# This library is distributed in the hope that it will be useful, | |
16 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of | |
17 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
18 | +# Lesser General Public License for more details. | |
19 | +# | |
20 | +# You should have received a copy of the GNU Lesser General Public | |
21 | +# License along with this library; if not, write to the Free Software | |
22 | +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA | |
23 | + | |
24 | +## Usage: | |
25 | + | |
26 | +# % for f in *.depword *.txt *.table; do \ | |
27 | +# ./anthy-depgraph-gendai-to-seikana.rb "$f" >"$f".tmp && /bin/mv "$f".tmp "$f"; \ | |
28 | +# done | |
29 | + | |
30 | +## Code: | |
31 | + | |
32 | +def parse_anthy_depgraph (file_or_string) | |
33 | + entries = [] | |
34 | + | |
35 | + file_or_string.each_line do |line| | |
36 | + items = line.strip.split(/\s/) | |
37 | + | |
38 | + if line[0] == ?# # comment line | |
39 | + entries << [line] | |
40 | + next | |
41 | + else | |
42 | + entry = [items.shift, trans_conditions = [], trans_nodes = []] | |
43 | + end | |
44 | + | |
45 | + items.each do |item| | |
46 | + if item.match(/"/) | |
47 | + trans_conditions << item.gsub(/\A"|"\z/, '') | |
48 | + else | |
49 | + trans_nodes << item | |
50 | + end | |
51 | + end | |
52 | + | |
53 | + entries << entry | |
54 | + end | |
55 | + | |
56 | + entries | |
57 | +end | |
58 | + | |
59 | +if __FILE__ == $PROGRAM_NAME | |
60 | + depgraph_entries = parse_anthy_depgraph(ARGF) | |
61 | + | |
62 | + depgraph_entries.each do |ent| | |
63 | + if ent.length == 1 # comment line | |
64 | + puts ent | |
65 | + next | |
66 | + end | |
67 | + | |
68 | + head_node = ent[0] | |
69 | + trans_conds = ent[1] | |
70 | + new_trans_conds = trans_conds.map{|s| s.dup } | |
71 | + | |
72 | + trans_conds.each do |s| | |
73 | + s = s. | |
74 | + gsub(/ぁ/, 'あ').gsub(/ぃ/, 'い').gsub(/ぅ/, 'う').gsub(/ぇ/, 'え'). | |
75 | + gsub(/ぉ/, 'お').gsub(/っ/, 'つ').gsub(/ゃ/, 'や').gsub(/ゅ/, 'ゆ'). | |
76 | + gsub(/ょ/, 'よ').gsub(/ゎ/, 'わ') | |
77 | + | |
78 | + # | |
79 | + # Fix conversion mistakes. | |
80 | + # | |
81 | + s = s.gsub(/じや/, 'ぢや') # じゃ -> ぢや | |
82 | + s = s.gsub(/([ちぢ])やう/, '\1やふ') # ちゃう -> ちやふ | |
83 | + s = s.gsub(/([ちぢ])やわ/, '\1やは') # きちゃわない -> きちやはない | |
84 | + s = s.gsub(/\Aそうだつ\z/, 'さうだつ') # そうだっ-た -> さうだつ-た | |
85 | + s = s.gsub(/でしよう/, 'でせう') # でしょ-う -> でせ-う | |
86 | + s = s.gsub(/でしよ\z/, 'でしょ') # でしょ。-> でしょ。 | |
87 | + | |
88 | + s = 'う' if head_node == '@形容詞語幹' && s == 'ゆう' # 美しう | |
89 | + s = 'にょ' if head_node == '@よ' && s == 'によ' # 良いにょ(良いよ) | |
90 | + | |
91 | + if head_node == '@ます' | |
92 | + s = 'ましょ' if s == 'ましよ' | |
93 | + s = 'ましぇん' if s == 'ましえん' | |
94 | + end | |
95 | + | |
96 | + if head_node == '@ます(かもの後)' | |
97 | + s = 'ましぇん' if s == 'ましえん' | |
98 | + end | |
99 | + | |
100 | + new_trans_conds << s | |
101 | + end | |
102 | + | |
103 | + ent[1] = new_trans_conds.uniq.map{|s| '"' + s + '"' } | |
104 | + puts ent.flatten.join(' ') | |
105 | + end | |
106 | +end | |
107 | + | |
108 | +## anthy-depgraph-gendai-to-seikana.rb ends here. |
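The parsing step in `parse_anthy_depgraph` can be illustrated on a single line. The sketch below is an assumption-laden restatement, not the committed code: the helper names `parse_depgraph_line` and `format_depgraph_line` and the sample line are hypothetical, and UTF-8 is assumed instead of EUC-JP. It splits a line into the head node, the quoted transition conditions, and the remaining node names, then reassembles it the way the main loop does.

```ruby
# encoding: utf-8
# Sketch only: helper names and the sample line are hypothetical.

# Split one depgraph line into [head_node, conditions, nodes].
# Items containing a double quote are transition conditions; their
# surrounding quotes are removed before they are stored.
def parse_depgraph_line(line)
  head, *items = line.strip.split(/\s/)
  conditions, nodes = items.partition { |item| item.include?('"') }
  [head, conditions.map { |c| c.gsub(/\A"|"\z/, '') }, nodes]
end

# Re-quote the conditions and join everything back into one line.
def format_depgraph_line(head, conditions, nodes)
  [head, *conditions.map { |c| %("#{c}") }, *nodes].join(' ')
end

head, conditions, nodes = parse_depgraph_line('@sample "じゃ" "ない" @next')
puts format_depgraph_line(head, conditions, nodes)
```

Because the stored conditions no longer carry their quotes, a later pattern that needs to match the end of a condition has to anchor on `\z`; a pattern that looks for a literal `"` cannot match at that point.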