mozdev.org

russkey


Design

Typing with Russ Key.

At this point I have discovered various things about JavaScript that I wasn't familiar with before. I found a way to cancel the default action executed upon a keypress, so the Cyrillic letters now appear right away. Previously the English letters appeared first and were then replaced by the Russian letters.

This fixed the synchronization problem that I had before I could cancel the default action.
There is also a change in how scrolling within a text area is handled: scrolling is now much smoother, and the text no longer jumps once certain boundaries are reached.

The normal forward translation method that is used in the first release of the extension works as follows:
  1. To switch to the Russian keyboard the user must either right-click on the TextArea or Text Input field and use the 'Russ Key' context menu item, or focus on the text input area and use either the status bar 'Russ Key' button or the ctrl-shift-k key sequence.
  2. Once the user selects the Cyrillic keyboard, an event listener is attached to the selected TextArea or Text Input field. From then on the 'forwardTranslate()' function is executed on every 'keypress' event and the 'rkCapsLockPressed()' function is executed on every 'keyup' event fired by the TextArea or the Text Input field in question.
  3. The 'rkCapsLockPressed()' function toggles the script's internal caps lock variable. There is no way to find out the actual current status of the CapsLock button, so CapsLock=off is assumed initially.
  4. Once a 'keypress' event is fired, the 'forwardTranslate()' function executes.
    1. If the translator is off, the function will exit.
    2. If the key pressed is 'return', the translate function exits. This is done because a keyup event will propagate from an active context menu opened with the right-click.
    3. If the key pressed is one of the special function keys (including the spacebar), the translate function exits.
    4. The capsState global variable and the event.shiftKey modifier are used to figure out whether the typed key is capital or lower case.
    5. The newly typed character is translated. If the translation cannot be done (-1 is returned from the keyboard map instead of a valid character), the function returns.
    6. Default action upon the 'keypress' event is cancelled (prevented).
    7. The current X and Y positions of the cursor within the text input area are temporarily stored.
    8. The text found before the newly typed character is temporarily stored.
    9. The text input area value is set to the new value (text before new character + new character + the rest of the string).
    10. The text before the new character + the new character is temporarily stored.
    11. The length of the text up to and including the new character is used to set the cursor to the new position.
    12. The height of the text up to and including the new character is temporarily stored. It is computed by counting the line breaks before the new character and multiplying the number of lines by a hardcoded (for now) character height (in pixels).
    13. The width of the line where the new character was typed is temporarily stored (in pixels).
    14. The total height of the text input area is temporarily stored (in pixels).
    15. The total width of the text input area is temporarily stored (in pixels).
    16. Finally, some logic scrolls the text in the text input area to the correct position. This uses all of the temporarily stored, size-related values (more comments in the code).
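
Steps 1 to 11 above can be sketched roughly as follows. This is not the extension's actual source: the map contents, the 'state' object, the 'translateChar()' helper and the two-argument signature are assumptions made for illustration, and the scrolling logic of steps 12 to 16 is left out. The sketch also assumes that special keys report a charCode of 0.

```javascript
// Hypothetical keyboard maps: ASCII code of the lower case key -> Cyrillic code point.
const LOWER_MAP = { 113: 1081, 119: 1094, 101: 1091 }; // q->й, w->ц, e->у
const UPPER_MAP = { 113: 1049, 119: 1062, 101: 1059 }; // q->Й, w->Ц, e->У

// Hypothetical script state; capsState would be toggled by rkCapsLockPressed().
const state = { enabled: true, capsState: false };

// Returns the mapped code point, or -1 when the key has no mapping.
function translateChar(charCode, upper) {
  const lowerCode = String.fromCharCode(charCode).toLowerCase().charCodeAt(0);
  const mapped = (upper ? UPPER_MAP : LOWER_MAP)[lowerCode];
  return mapped === undefined ? -1 : mapped;
}

// 'event' is a keypress event, 'field' a textarea or text input.
// Returns true when a translated character was inserted.
function forwardTranslate(event, field) {
  if (!state.enabled) return false;                       // step 1
  if (event.keyCode === 13) return false;                 // step 2: return key
  if (event.charCode === 0 || event.charCode === 32) return false; // step 3
  const upper = state.capsState !== event.shiftKey;       // step 4: caps XOR shift
  const code = translateChar(event.charCode, upper);      // step 5
  if (code === -1) return false;
  event.preventDefault();                                 // step 6
  const before = field.value.substring(0, field.selectionStart); // step 8
  const after = field.value.substring(field.selectionEnd);
  const inserted = before + String.fromCharCode(code);    // step 10
  field.value = inserted + after;                         // step 9
  field.selectionStart = field.selectionEnd = inserted.length;   // step 11
  return true;
}
```

Because the default action is cancelled before the field value is rewritten, the Latin letter never reaches the field.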

The forward translation method used for the Thunderbird host application within the compose window works differently, partly because I do not know enough about how to manipulate this editor element properly. If someone has a better idea, feel free to make suggestions. All the other TB text elements behave like normal TextAreas and are translated using the FF method described above. The description below applies only to the text editor area of the TB compose window.
  1. To switch to the Russian keyboard the user must focus on the text editor area and use either the status bar 'Russ Key' button or the ctrl-shift-k key sequence.
  2. Once the user selects the Cyrillic keyboard, an event listener is attached to the text editor in the focused window. From then on the 'forwardTranslateEditor()' function is executed on every 'keypress' event and the 'rkCapsLockPressed()' function is executed on every 'keyup' event fired by the text editor.
  3. The 'rkCapsLockPressed()' function toggles the script's internal caps lock variable. There is no way to find out the actual current status of the CapsLock button, so CapsLock=off is assumed initially.
  4. Once a 'keypress' event is fired, the 'forwardTranslateEditor()' function executes.
    1. If the translator is off, the function will exit.
    2. If the key pressed is 'return', the translate function exits. This is done because a keyup event will propagate from an active context menu opened with the right-click.
    3. If the key pressed is one of the special function keys (including the spacebar), the translate function exits.
    4. The capsState global variable and the event.shiftKey modifier are used to figure out whether the typed key is capital or lower case.
    5. The newly typed character is translated. If the translation cannot be done (-1 is returned from the keyboard map instead of a valid character), the function returns.
    6. Default action upon the 'keypress' event is cancelled (prevented).
    7. The 'selection' collection of ranges is retrieved from the focused window.
    8. A new node is created and populated with the translated character string.
    9. The new node is inserted into the 0th range in the 'selection' object. Here I reuse the same method used to insert a new node with text transformed from translit into Cyrillic.
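
Steps 7 to 9 might look roughly like the sketch below, which uses the standard DOM Selection and Range calls. The function name and its parameters are hypothetical. Deleting the range contents first and moving the caret behind the new node are assumptions about the desired behaviour, not something the description above states.

```javascript
// Inserts 'text' as a new text node into the first range of 'selection'.
// 'doc' is the editor document, 'selection' the object returned by
// window.getSelection() for the focused window.
function insertTextAtSelection(doc, selection, text) {
  if (selection.rangeCount === 0) return false; // nothing focused or selected
  const range = selection.getRangeAt(0);        // the 0th range
  range.deleteContents();                       // assumption: replace selected text
  const node = doc.createTextNode(text);        // new node with the translated string
  range.insertNode(node);
  // Assumption: place the caret right after the inserted node.
  range.setStartAfter(node);
  range.collapse(true);
  selection.removeAllRanges();
  selection.addRange(range);
  return true;
}
```

In a keypress handler this would be called after the default action has been cancelled, for example as insertTextAtSelection(document, window.getSelection(), "й").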

Transforming Translit with Russ Key.

The current version of Russ Key lets the user transform translit, that is, Russian text typed with Latin (English) letters.

To transform translit the user only needs to select any text on the page (either text in a text input field or static HTML text) and then either select the 'Russ Key' option in the context menu or click on the 'Russ Key' button on the status bar.

Technically, a transformation of this kind requires two proofs of concept.
  1. It must be possible to transform Latin into Cyrillic and achieve some coherency in the resulting text.
  2. It must be possible to substitute the text from the HTML page with some other text.

The first problem turned out not to be too difficult to solve, but it became clear that transforming translit will not always produce a syntactically correct result. There are too many inconsistencies in the writing found on the web: some people write a character one way and others use a different combination of characters for the same letter. This is obviously one of the biggest problems.
The other problem is that it is impossible to transform translit into syntactically correct Russian because of the ambiguities in the Russian language. For example, in some cases 'sch' means one letter 'щ' and in other cases it means two letters 'сч'. To transform translit correctly the code would need the entire set of rules of the Russian language programmed into it, and I don't think I am up to this gigantic task just yet. So the translit transformation produces an approximation of the Russian text, but it appears to be good enough and is definitely better than reading translit. To actually transform the text, a number of simple regular expressions are used to match combinations of Latin letters that map to one or more Russian letters.
The text is first run through all of these combinations, and then a simple translation loop transforms the remaining letters that can be mapped one-to-one. In the loop, each letter is read from the text one by one and its character code is compared to a limit: the code must be lower than 127 for the letter to be translated from ASCII to Unicode. The code is then used as an index into a translation array, where the position within the array is the ASCII code of the letter and the element is a Unicode string holding a Cyrillic letter.
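
A minimal sketch of this two-pass approach is shown below. The rule list and the one-to-one table are small hypothetical samples that cover lower case letters only; they are not the mappings shipped with the extension. Note that 'sch' is always turned into 'щ' here, which is exactly the kind of approximation described above.

```javascript
// Pass 1: hypothetical multi-letter combinations, longest first.
const REGEX_RULES = [
  [/shch/g, "щ"], [/sch/g, "щ"], [/zh/g, "ж"], [/ch/g, "ч"],
  [/sh/g, "ш"], [/ya/g, "я"], [/yu/g, "ю"]
];

// Pass 2: translation array indexed by ASCII code (hypothetical sample).
const LATIN = "abvgdezijklmnoprstufhcy";
const CYRILLIC = "абвгдезийклмнопрстуфхцы";
const CHAR_TABLE = [];
for (let i = 0; i < LATIN.length; i++) {
  CHAR_TABLE[LATIN.charCodeAt(i)] = CYRILLIC[i];
}

function transformTranslit(text) {
  // Run the text through every regular expression first.
  let result = text;
  for (const [pattern, replacement] of REGEX_RULES) {
    result = result.replace(pattern, replacement);
  }
  // Then map the remaining letters one by one.
  let out = "";
  for (let i = 0; i < result.length; i++) {
    const code = result.charCodeAt(i);
    // Only codes below 127 are looked up; everything else is copied as-is,
    // including the Cyrillic letters produced by pass 1.
    const mapped = code < 127 ? CHAR_TABLE[code] : undefined;
    out += mapped !== undefined ? mapped : result[i];
  }
  return out;
}
```

The regular expressions have to run before the loop; otherwise the loop would consume the individual letters of a combination such as 'zh' before the combination could be matched.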

The second problem - substituting text from the HTML page with some other text - is split into two subproblems. If the selected text to be transformed is a value within a text input field or a textarea, the solution is simple: read the value from the text field, find the selection range, use substring on the value to extract the text for transformation, transform the text, join it back together with the rest of the original string, and write the value back into the text field.
If the selected text is just some static HTML text, the problem is more difficult. The current solution is to take the selected range from the document, delete it from the document, and insert a new range with the translated text. This presents a problem: the newly inserted text node will not have any of the original HTML formatting in it.
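
For the text field case, the substring handling could look like the sketch below. The function name is hypothetical and the transformation is passed in as a parameter so that the example stands on its own. Re-selecting the transformed text at the end is an assumption about what is convenient for the user.

```javascript
// Replaces the selected part of a text field with its transformed version.
// 'field' is a textarea or text input; 'transform' maps a string to a string.
function transformSelection(field, transform) {
  const start = field.selectionStart;
  const end = field.selectionEnd;
  if (start === end) return false; // nothing selected
  const value = field.value;
  const transformed = transform(value.substring(start, end));
  // Join the untouched head, the transformed text and the untouched tail.
  field.value = value.substring(0, start) + transformed + value.substring(end);
  // Assumption: keep the transformed text selected afterwards.
  field.selectionStart = start;
  field.selectionEnd = start + transformed.length;
  return true;
}
```

The selection end has to be recomputed from the transformed length, because a combination such as 'sch' shrinks to a single Cyrillic letter.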

The next step for the extension is to traverse the DOM structure of the selected range and to update only the text values from every node in the range with transformed text values.
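
One possible shape for that traversal is sketched below. It is only an illustration of the idea, not planned code from the project: the function name is hypothetical, and it transforms every text node under a given node without handling ranges that start or end in the middle of a text node.

```javascript
const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the DOM

// Walks the subtree below 'node' and rewrites only the text nodes,
// leaving element nodes (and therefore the HTML formatting) in place.
function transformTextNodes(node, transform) {
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = transform(node.nodeValue);
    return;
  }
  for (const child of Array.from(node.childNodes || [])) {
    transformTextNodes(child, transform);
  }
}
```

Applied to the common ancestor of the selected range, this would keep markup such as bold or links intact, at the cost of also transforming text just outside the selection boundaries.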

KeyboardMap Format

In the browser, type about:config instead of a URL and press Enter. You will see various FF and extension preferences. Find the preference russkey.KeyboardMap; its value is a long string.

This string contains the information used to map keys from the keyboard to Unicode characters. It is now possible to replace the default keyboard mapping with a user-specific mapping. To do this, a new mapping string must be created. A correct mapping string follows this format:
  1. The string must not contain any spaces, tabs, new lines or line feeds.
  2. The string consists of 6 substrings that start with the following delimiters (case sensitive; the colon must be included in the name).
    1. UPPERALPHA: - all upper case characters that map to alpha keys on the keyboard. These are the keys that are pressed with CAPS or SHIFT.
    2. UPPERSYMBOLS: - all upper case characters that map to non-alpha keys on the keyboard such as 1,2...0,-,=,[,],\,/,.,,,',;,` These are the characters that are pressed with CAPS but without SHIFT.
    3. UPPERSYMBOLSSHIFTED: - all upper case characters that map to shifted non-alpha keys on the keyboard such as !,@...),_,+,{,},|,?,>,<,",:,~ These are the characters that are pressed with SHIFT on but CAPS off.
    4. LOWERALPHA: - all lower case characters that map to alpha keys on the keyboard. These are the keys that are pressed without CAPS or SHIFT.
    5. LOWERSYMBOLS: - all lower case characters that map to non-alpha keys on the keyboard such as 1,2...0,-,=,[,],\,/,.,,,',;,` These are the characters that are pressed without CAPS and without SHIFT.
    6. LOWERSYMBOLSSHIFTED: - all lower case characters that map to shifted non-alpha keys on the keyboard such as !,@...),_,+,{,},|,?,>,<,",:,~ These are the characters that are pressed with both SHIFT and CAPS on.
  3. Mapping Pairs: within each substring there are ASCII to UNICODE pairs that map a key to a Unicode representation. Both numbers are base 10. The ASCII portion of the pair (the key) must be between 0 and 126 inclusive. The UNICODE portion (the value) must be over 126 and for UTF8 encoding must not exceed 65534. Cyrillic letters occupy the Unicode range 1024 to 1279 (U+0400 to U+04FF).
  4. Mapping pairs are delimited with ','. There is no trailing ',' after any of the 6 substrings.
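
The sketch below shows how a string in this format could be parsed and validated. The format description does not say which character separates the key from the value inside a pair, so the sketch assumes '=' (as in the transliteration maps); the function name, the assumption that the 6 substrings simply follow one another, and the sample string in the usage note are hypothetical as well.

```javascript
const SECTIONS = [
  "UPPERALPHA", "UPPERSYMBOLS", "UPPERSYMBOLSSHIFTED",
  "LOWERALPHA", "LOWERSYMBOLS", "LOWERSYMBOLSSHIFTED"
];

// Parses a KeyboardMap string into { SECTION: { asciiCode: unicodeCode } }.
// Assumption: pairs are written as "key=value".
function parseKeyboardMap(mapString) {
  if (/\s/.test(mapString)) throw new Error("no spaces, tabs or line breaks allowed");
  const result = {};
  // A section is its name, a colon, then digits, '=' and ',' up to the next name.
  for (const [, name, body] of mapString.matchAll(/([A-Z]+):([0-9=,]*)/g)) {
    if (!SECTIONS.includes(name)) throw new Error("unknown section: " + name);
    const pairs = {};
    for (const pair of body === "" ? [] : body.split(",")) {
      const match = /^(\d+)=(\d+)$/.exec(pair); // also rejects a trailing ','
      if (!match) throw new Error("malformed pair in " + name + ": '" + pair + "'");
      const key = Number(match[1]);
      const value = Number(match[2]);
      if (key > 126) throw new Error("key out of range: " + key);
      if (value <= 126 || value > 65534) throw new Error("value out of range: " + value);
      pairs[key] = value;
    }
    result[name] = pairs;
  }
  for (const name of SECTIONS) {
    if (!(name in result)) throw new Error("missing section: " + name);
  }
  return result;
}
```

Under these assumptions a (made up) string such as "UPPERALPHA:81=1049UPPERSYMBOLS:91=1061UPPERSYMBOLSSHIFTED:123=1061LOWERALPHA:113=1081LOWERSYMBOLS:91=1093LOWERSYMBOLSSHIFTED:123=1093" parses into six lookup tables, one per substring.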

Transliteration Character Code Mapping String Format

In the browser, type about:config instead of a URL and press Enter. You will see various FF and extension preferences. Find the preference russkey.TranslitMapCharCodes.1; its value is a long string.

This string contains the information used to map character codes from translit (Latin) to Unicode characters (Cyrillic). It is now possible to replace the default character code mapping with a user-specific mapping. To do this, a new mapping string must be created. A correct mapping string follows this format:
  1. The string must not contain any spaces, tabs, new lines or line feeds.
  2. The string consists of mapping pairs, each consisting of a key and a value. The key is on the left side and the value on the right side of a '=' sign. Mapping pairs are separated from each other by '-'.
  3. Each mapping pair maps one ASCII character code to a Unicode representation. Both numbers are base 10. The ASCII portion of the pair (the key) must be between 0 and 126 inclusive. The UNICODE portion (the value) must be over 126 and for UTF8 encoding must not exceed 65534. Cyrillic letters occupy the Unicode range 1024 to 1279 (U+0400 to U+04FF).
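
As an illustration, a string in this format could be turned into a translation array as sketched below. The function name and the sample string in the usage note are hypothetical; the array layout (index is the ASCII code, element is the Cyrillic string) follows the translation loop described earlier.

```javascript
// Parses "key=value-key=value-..." into an array indexed by ASCII code.
function parseCharCodeMap(mapString) {
  if (/\s/.test(mapString)) throw new Error("no spaces, tabs or line breaks allowed");
  const table = [];
  for (const pair of mapString.split("-")) {
    const match = /^(\d+)=(\d+)$/.exec(pair);
    if (!match) throw new Error("malformed pair: '" + pair + "'");
    const key = Number(match[1]);   // base 10 ASCII code
    const value = Number(match[2]); // base 10 Unicode code point
    if (key > 126) throw new Error("key out of range: " + key);
    if (value <= 126 || value > 65534) throw new Error("value out of range: " + value);
    table[key] = String.fromCharCode(value);
  }
  return table;
}
```

For example, the made up string "97=1072-98=1073-118=1074" would map the ASCII codes of 'a', 'b' and 'v' to 'а', 'б' and 'в'.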

Transliteration Regular Expression Mapping String Format

In the browser, type about:config instead of a URL and press Enter. You will see various FF and extension preferences. Find the preference russkey.TranslitMapRegExpCodes.1; its value is a long string.

This string contains the information used to map regular expressions for translit transformation from Latin (or anything else) to Unicode characters (Cyrillic). It is now possible to replace the default regular expression mapping with a user-specific mapping. To do this, a new mapping string must be created. A correct mapping string follows this format:
  1. The string must not contain any spaces, tabs, new lines or line feeds.
  2. The string consists of mapping triads, each consisting of a key, a regexp option and a value. Mapping triads are separated from each other by '-'.
    1. The key is an ASCII string that is used as a regular expression, so it can be written in regular expression syntax to be processed by a JavaScript function.
    2. After the key string comes a ',' and then a regular expression option, which can be 'g', 'i' or 'gi', meaning global search, case-insensitive search, and global case-insensitive search respectively.
    3. After the option string comes a '=' and then a substitution string, which consists of integers over 126 and smaller than 65534 (for UTF8), representing Unicode entities. These integers are delimited with ',' characters.
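
A rough sketch of parsing and applying such a string follows. The function names and the sample string used in the usage note are hypothetical. Because the sketch splits triads on '-', a key that itself contains '-' (for example a character class such as [a-z]) would not survive; how the extension handles that case is not stated here.

```javascript
// Parses "key,option=code,code-key,option=code-..." into regexp rules.
function parseRegExpMap(mapString) {
  if (/\s/.test(mapString)) throw new Error("no spaces, tabs or line breaks allowed");
  return mapString.split("-").map((triad) => {
    const eq = triad.lastIndexOf("=");
    const comma = triad.lastIndexOf(",", eq); // last ',' before the '='
    if (eq === -1 || comma === -1) throw new Error("malformed triad: " + triad);
    const key = triad.slice(0, comma);
    const option = triad.slice(comma + 1, eq);
    if (!["g", "i", "gi"].includes(option)) throw new Error("bad option: " + option);
    const replacement = triad.slice(eq + 1).split(",").map((n) => {
      const code = /^\d+$/.test(n) ? Number(n) : NaN;
      if (!(code > 126 && code < 65534)) throw new Error("bad code: " + n);
      return String.fromCharCode(code);
    }).join("");
    return { pattern: new RegExp(key, option), replacement };
  });
}

// Applies every rule in order to the text.
function applyRules(text, rules) {
  return rules.reduce((acc, rule) => acc.replace(rule.pattern, rule.replacement), text);
}
```

With the made up string "shch,gi=1097-zh,g=1078-x,gi=1082,1089", the first rule replaces 'shch' in any letter case with 'щ', the second replaces lower case 'zh' with 'ж', and the third shows a substitution string with two code points ('кс').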

The russkey project can be contacted through the mailing list or the member list.
Copyright © 2000-2017. All rights reserved.