Custom canonical transliteration schemes

Google Input Tools supports custom transliteration schemes. A scheme is defined in a text file with the .scm extension. Once the scheme file is created and placed in the Schemes directory, it automatically appears in the menu of available schemes the next time the input tools are enabled.

Defining Schemes

Scheme files (.scm files) are plain text files that can be edited with any text editor. Save the scheme file in UTF-8 or UTF-16 text encoding. Every scheme file consists of two parts: the header and the mapping rules section.

Header

The scheme file header specifies the attributes of the scheme. A sample header with all the attributes looks like this:

version: 1.0
name: ITRANS
using classes
class-delimiters: [ ]
wildcard: #
stop-char: _
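
As a rough illustration of how such a header could be read, here is a minimal Python sketch. It is not the parser used by Google Input Tools. The function name parse_header is hypothetical, and the sketch assumes that every header line is either a "key: value" pair using one of the attribute names shown above or the bare line "using classes", and that the first line matching neither ends the header.

```python
def parse_header(lines):
    """Collect header attributes from the leading lines of a scheme file.

    Assumption (not confirmed by the documentation): a header line is
    either 'key: value' with a known key, or the bare flag 'using classes'.
    The first line that is neither ends the header.
    """
    known_keys = {"version", "name", "class-delimiters", "wildcard", "stop-char"}
    header = {"using-classes": False}
    consumed = 0
    for line in lines:
        text = line.strip()
        key, sep, value = text.partition(":")
        if text == "using classes":
            header["using-classes"] = True
        elif sep and key.strip() in known_keys:
            header[key.strip()] = value.strip()
        else:
            break  # first non-header line: the mapping rules start here
        consumed += 1
    return header, consumed


sample = [
    "version: 1.0",
    "name: ITRANS",
    "using classes",
    "class-delimiters: [ ]",
    "wildcard: #",
    "stop-char: _",
]
header, consumed = parse_header(sample)
print(header["name"], header["class-delimiters"].split())
```

Returning the number of consumed lines lets a caller continue with the mapping rules section from the right position.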

Class specification

You can define multiple classes in your scheme file if you have specified the using classes line in the header. A class definition looks like this:

class <class-name> <class-begin-delimiter>
... rules
<class-end-delimiter>

The class-begin-delimiter and class-end-delimiter are the ones specified in the header:

class-delimiters: <class-begin-delimiter> <class-end-delimiter>

If you have not specified class-delimiters in the header (but using classes is defined), then '{' and '}' are used as the default class delimiters. Rules inside a class are specified in the same way as rules outside a class.

Rule specification

Rules are specified as:

<rule-prefix> <rule-target>

The rule prefix consists of a sequence of ASCII characters and, optionally, a class specifier. A class specifier inside a rule has the form:

<class-begin-delimiter><class-name><class-end-delimiter>

If the rule prefix has a class specifier, the rule target should use the wildcard. The rule target consists of a sequence of characters in the target language of the user's choice and, when the rule prefix has a class specifier, a wildcard. The wildcard character is the one specified in the header as:

wildcard: <wildcard-character>

If no wildcard is specified in the header (but using classes is defined), '*' is used as the wildcard by default. Here is a sample scheme file with a class (headers omitted):

class sample {
  a 1
  b 2
  c 3
}
a{sample} 1*
{sample}x *2
ad 14
vd 24

When the rules are processed, each occurrence of the class in the rule prefix is replaced by the prefixes defined in the class, and the '*' (wildcard) in the target is replaced by the corresponding target in the class. So in the above example the rules are expanded into:

aa 11
ab 12
ac 13
ax 12
bx 22
cx 32
ad 14
vd 24

The first three rules correspond to the first rule in the scheme file, the next three to the second rule, and the last two are the same as the last two rules in the file because they do not use any class specifiers. (Note that the rules inside the class are not themselves added to the set of rules for the scheme.) If you use this scheme file and type "abaxcxad", the suggestion "12123214" appears in the input tool edit window. If you are not using classes, the scheme file is just a set of rules with the version and name specified in the header.
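
The expansion described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual Google Input Tools implementation: the function names expand_rules and transliterate are hypothetical, each rule is assumed to contain at most one class specifier, and typed text is assumed to be matched greedily against the longest applicable rule prefix (the documentation does not state the matching strategy).

```python
def expand_rules(classes, rules, open_delim="{", close_delim="}", wildcard="*"):
    """Expand class specifiers in rule prefixes into plain rules.

    Assumption: a rule prefix holds at most one class specifier.
    """
    expanded = []
    for prefix, target in rules:
        start = prefix.find(open_delim)
        end = prefix.find(close_delim, start + 1)
        if start == -1 or end == -1:
            expanded.append((prefix, target))  # no class specifier: keep as-is
            continue
        class_name = prefix[start + 1:end]
        for member_prefix, member_target in classes[class_name]:
            new_prefix = prefix[:start] + member_prefix + prefix[end + 1:]
            new_target = target.replace(wildcard, member_target, 1)
            expanded.append((new_prefix, new_target))
    return expanded


def transliterate(text, rules):
    """Apply rules left to right, preferring the longest matching prefix.

    Assumption: greedy longest-match; unmatched characters pass through.
    """
    table = dict(rules)
    longest = max(len(prefix) for prefix in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no rule matches at this position
            i += 1
    return "".join(out)


# The sample scheme from the text, using the default delimiters and wildcard.
classes = {"sample": [("a", "1"), ("b", "2"), ("c", "3")]}
rules = [("a{sample}", "1*"), ("{sample}x", "*2"), ("ad", "14"), ("vd", "24")]

expanded = expand_rules(classes, rules)
print(transliterate("abaxcxad", expanded))  # 12123214
```

The class members themselves are only used for substitution and never appended to the expanded list, which mirrors the note that rules inside a class are not added to the scheme's rule set.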

Here is an example of a scheme file with no classes:

version: 1.0
name: Sample
r1 target1
r2 target2
r3 target3

When you use this scheme file and type the sequence "r1r2r3", the word "target1target2target3" appears as the suggested option.
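
For a scheme without classes, loading and applying the rules can be sketched in a few lines of Python. Again, this is only an illustration: load_plain_scheme and apply_rules are hypothetical names, the sketch assumes the header contains only the version and name lines, and it assumes that longer rule prefixes take priority over shorter ones when both could match.

```python
import re

SCHEME_TEXT = """\
version: 1.0
name: Sample
r1 target1
r2 target2
r3 target3
"""


def load_plain_scheme(text):
    """Split a class-free scheme into header attributes and a rule table."""
    header, rules = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key in ("version", "name"):
            header[key] = value.strip()
        else:
            # A rule line: prefix, whitespace, then the target text.
            prefix, target = line.split(None, 1)
            rules[prefix] = target
    return header, rules


def apply_rules(text, rules):
    """Replace every rule prefix found in text with its target.

    Assumption: longer prefixes are tried first.
    """
    ordered = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(prefix) for prefix in ordered))
    return pattern.sub(lambda match: rules[match.group(0)], text)


header, rules = load_plain_scheme(SCHEME_TEXT)
print(header["name"], apply_rules("r1r2r3", rules))  # Sample target1target2target3
```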

Integrating Scheme files with Google Input Tools

To integrate a scheme file with Google Input Tools, place the scheme file (with the extension .scm) in the Schemes directory, which is located under the Google Input Tools installation directory (typically C:/Program Files/Google/Google [Language] Input/). After placing the scheme file, restart Google Input Tools; it picks up the scheme file while loading. Any errors in the scheme file are displayed in a dialog so that they can be fixed. If there are no errors, the scheme appears under the Schemes menu in the input tools. Schemes can be activated in several ways: