Monday, August 25, 2008

Unix and Linux Online Language Translation Script

Hey There,

This week's start-off Linux/Unix bash shell script is a follow-up to our previous online dictionary and thesaurus shell scripts. I was thinking of doing the online encyclopedia today but, for some reason, I got sidetracked by the BabelFish language translation site.

This script offers the full functionality of the original website's text translation, except I left out the HTML page translation part. I figure if you're going to translate an entire webpage, it would look a lot nicer using a regular web browser ;)

Like our previous scripts, this script uses wget and sed. This script was also written using "brute force" scripting (that's my term for writing while I think it through). I don't know (at this point) where it could be made better, but I'm sure that it can :) I'll give it some thought later, and any recommendations are always welcome!
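
To give a feel for the sed half of that approach, here is a minimal sketch of tag stripping on a single line of HTML. The function name strip_tags and the sample input are made up for illustration; this is not the exact pipeline the script uses, and it does not handle tags that span several lines.

```shell
#!/bin/bash
# strip_tags (hypothetical helper): read HTML on stdin, replace each tag and
# each named entity (such as &nbsp;) with a space, squeeze runs of spaces,
# and trim a leading or trailing space.
strip_tags() {
  sed -e 's/<[^>]*>/ /g' \
      -e 's/&[A-Za-z]*;/ /g' \
      -e 's/  */ /g' \
      -e 's/^ //' -e 's/ $//'
}

# Made-up sample line, not real BabelFish output.
echo '<div id="result"><b>Guten&nbsp;Tag</b></div>' | strip_tags
```

Replacing tags with a space instead of deleting them keeps neighbouring words from running together; the later substitutions clean up the extra spaces that leaves behind.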

BTW, a huge "shout out" to the folks at AutoPOST for help getting all the POST elements I needed to pass in the POST request to the online translator. (I fuddled around for hours before I went looking for help...) Good stuff. Check it out :)
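
As a rough illustration of what passing those elements involves, the sketch below assembles the form fields into one encoded string. The field names (doit, fr, intl, tt, trtext, lp, btnTrTxt) are the ones that appear in the script; the helper names urlencode and build_query are hypothetical, the encoder only aims to handle plain ASCII input, and whether the service wants this string as a POST body or a query string is not something this sketch settles.

```shell
#!/bin/bash
# urlencode (hypothetical helper): percent-encode an ASCII string for use as
# a form value. Letters, digits and . ~ _ - pass through, a space becomes "+",
# and everything else becomes %XX.
urlencode() {
  local LC_ALL=C            # make the character ranges below predictable
  local s=$1 out= c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      ' ')             out+='+' ;;
      *)               printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# build_query (hypothetical helper): join the form fields named in the script.
# $1 is the language pair (for example en_de), $2 is the phrase to translate.
build_query() {
  local langpair=$1 phrase=$2
  printf 'doit=done&fr=bf-res&intl=1&tt=urltext&trtext=%s&lp=%s&btnTrTxt=Translate\n' \
    "$(urlencode "$phrase")" "$langpair"
}

build_query en_de "hi there. How are you?"
```

Encoding the phrase before it goes into the string means characters such as "?" and "&" cannot be mistaken for field separators.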

Below is a screen shot of the output spewed when you throw the script the "-h" flag:

[Screenshot: language preference abbreviations]

Below is a screen shot of the script's output translating from one language to another and then back again (to see how literal the translation was; this can be very funny sometimes ;) followed by a long, convoluted statement that shows one limitation of the script: its handling of "special" international characters:

[Screenshot: language translations]

To get started, run either:

host # ./ <-- for a straight-up usage screen

or

host # ./ -h <-- to list the short names required for each translation BabelFish can do, as well as the regular help.

Hope you enjoy this bash script, and can find some use for it. Any suggestions for improvement would be greatly appreciated, of course :)


Creative Commons License

This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License


#!/bin/bash

# - Ich verstehe nicht ;)
# Translate between languages using Yahoo's BabelFish
# 2008 - Mike Golvach -
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

# Quote $1 so the test still works when no argument is given.
if [ "$1" = "-h" ]
then
echo "Usage: $0 TranslationPref YourQuotedPhrase"
echo "TranslationPref Values:"
echo "zh_en => Chinese-simple to English"
echo "zh_zt => Chinese-simple to Chinese-traditional"
echo "zt_en => Chinese-traditional to English"
echo "zt_zh => Chinese-traditional to Chinese-simple"
echo "en_zh => English to Chinese-simple"
echo "en_zt => English to Chinese-trad"
echo "en_nl => English to Dutch"
echo "en_fr => English to French"
echo "en_de => English to German"
echo "en_el => English to Greek"
echo "en_it => English to Italian"
echo "en_ja => English to Japanese"
echo "en_ko => English to Korean"
echo "en_pt => English to Portuguese"
echo "en_ru => English to Russian"
echo "en_es => English to Spanish"
echo "nl_en => Dutch to English"
echo "nl_fr => Dutch to French"
echo "fr_nl => French to Dutch"
echo "fr_en => French to English"
echo "fr_de => French to German"
echo "fr_el => French to Greek"
echo "fr_it => French to Italian"
echo "fr_pt => French to Portuguese"
echo "fr_es => French to Spanish"
echo "de_en => German to English"
echo "de_fr => German to French"
echo "el_en => Greek to English"
echo "el_fr => Greek to French"
echo "it_en => Italian to English"
echo "it_fr => Italian to French"
echo "ja_en => Japanese to English"
echo "ko_en => Korean to English"
echo "pt_en => Portuguese to English"
echo "pt_fr => Portuguese to French"
echo "ru_en => Russian to English"
echo "es_en => Spanish to English"
echo "es_fr => Spanish to French"
exit 0
elif [ $# -ne 2 ]
then
echo "Usage: $0 TranslationPref YourQuotedPhrase"
echo "For Translation Options: $0 -h"
echo "Ex: $0 en_de \"hi there. How are you?\""
echo "Ex: $0 en_it \"hi there. How are you?\""
exit 1
fi

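# The language pair and the phrase come from the two command-line arguments.
langtran=$1
oargs=$2
# Assumption: wget is on the PATH; set a full path here if it is not.
wget=wget

# Percent-encode "?", spell out "&" and turn spaces into "+" so the phrase is safe to send.
args=$(echo "$oargs" | sed -e 's/?/%3F/g' -e 's/&/and/g' -e 's/ /+/g')
fromlang=$(echo "$langtran" | awk -F"_" '{print $1}')
tolang=$(echo "$langtran" | awk -F"_" '{print $2}')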

case $fromlang in
zh) longlang=Chinese-simple;fromlang=$longlang;;
zt) longlang=Chinese-traditional;fromlang=$longlang;;
en) longlang=English;fromlang=$longlang;;
nl) longlang=Dutch;fromlang=$longlang;;
fr) longlang=French;fromlang=$longlang;;
de) longlang=German;fromlang=$longlang;;
el) longlang=Greek;fromlang=$longlang;;
it) longlang=Italian;fromlang=$longlang;;
ja) longlang=Japanese;fromlang=$longlang;;
ko) longlang=Korean;fromlang=$longlang;;
pt) longlang=Portuguese;fromlang=$longlang;;
ru) longlang=Russian;fromlang=$longlang;;
es) longlang=Spanish;fromlang=$longlang;;
*) longlang="Unrecognized Language!";fromlang=$longlang;;
esac

case $tolang in
zh) longlang=Chinese-simple;tolang=$longlang;;
zt) longlang=Chinese-traditional;tolang=$longlang;;
en) longlang=English;tolang=$longlang;;
nl) longlang=Dutch;tolang=$longlang;;
fr) longlang=French;tolang=$longlang;;
de) longlang=German;tolang=$longlang;;
el) longlang=Greek;tolang=$longlang;;
it) longlang=Italian;tolang=$longlang;;
ja) longlang=Japanese;tolang=$longlang;;
ko) longlang=Korean;tolang=$longlang;;
pt) longlang=Portuguese;tolang=$longlang;;
ru) longlang=Russian;tolang=$longlang;;
es) longlang=Spanish;tolang=$longlang;;
*) longlang="Unrecognized Language!";tolang=$longlang;;
esac


echo "Original $fromlang: $2"
echo -n "Translated $tolang: "
$wget -nv -O -\&doit=done\&fr=bf-res\&intl=1\&tt=urltext\&trtext=${args}\&lp=${langtran}\&btnTrTxt=Translate 2>&1|grep -i 'result'|egrep -iv 'txt-form|tips 1'|sed -e 's/$/\n/' -e :a -e 's/<[^>]*>/ /g;/</N;//ba' -e 's/\&[A-Za-z]*\;/ /g' -e 's/ / /g' -e 's/\(.\)$/\1\n/' -e '/^ *\t*$/d'
exit 0
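
One possible tidy-up, offered as a sketch and not as the script's own code: the two case statements differ only in the variable they set, so the lookup could live in a single function. The name lang_name is hypothetical, and the sample value of langtran stands in for the first command-line argument. Parameter expansion is used here in place of awk to split the pair at the underscore.

```shell
#!/bin/bash
# lang_name (hypothetical helper): map a two-letter code to the display
# name used in the script's output.
lang_name() {
  case $1 in
    zh) echo "Chinese-simple" ;;
    zt) echo "Chinese-traditional" ;;
    en) echo "English" ;;
    nl) echo "Dutch" ;;
    fr) echo "French" ;;
    de) echo "German" ;;
    el) echo "Greek" ;;
    it) echo "Italian" ;;
    ja) echo "Japanese" ;;
    ko) echo "Korean" ;;
    pt) echo "Portuguese" ;;
    ru) echo "Russian" ;;
    es) echo "Spanish" ;;
    *)  echo "Unrecognized Language!" ;;
  esac
}

langtran=en_de                            # sample value for illustration
fromlang=$(lang_name "${langtran%%_*}")   # text before the first underscore
tolang=$(lang_name "${langtran#*_}")      # text after the first underscore
echo "$fromlang -> $tolang"
```

With one lookup table, adding a language means touching one line instead of two.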

, Mike

Pete had this interesting comment to share. It brings up a useful way of approaching the script's argument test, and you might find it handy in other situations as well! Our thanks for the great observation!

Hi! When I copied in the script and ran it, I got "unary operator expected at line 11" (or something similar). I think it was just a slip-up (I do it often), using the "C" syntax instead of the shell syntax. In any case, changing to a single "=" corrected that. I also usually use the following test construct to avoid problems in shell scripts when the tested variable may happen to have null contents.

if [ "x$1" = "x-h" ] ... that way, test never has to deal with a null argument.
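
To make the point concrete, here is a small sketch of both forms. The function names are made up for illustration. In bash, an unquoted empty $1 simply vanishes from the command line, so the test is left with an operator and only one operand, which is what produces the "unary operator expected" message. Quoting keeps the empty string in place as a real argument, and the "x" prefix from the comment guarantees both sides are never empty. The single "=" is the portable comparison operator for test.

```shell
#!/bin/bash
# is_help_flag (hypothetical name): succeed only when the first argument is
# exactly "-h". The quotes keep an empty or missing argument from disappearing.
is_help_flag() {
  [ "$1" = "-h" ]
}

# is_help_flag_x (hypothetical name): the same check using the "x" prefix
# construct from the comment, so neither side of the comparison is ever empty.
is_help_flag_x() {
  [ "x$1" = "x-h" ]
}

for arg in "-h" "" "en_de"; do
  if is_help_flag "$arg" && is_help_flag_x "$arg"; then
    echo "'$arg': show help"
  else
    echo "'$arg': not the help flag"
  fi
done
```

Either form behaves the same here; quoting alone is enough to avoid the error when no argument is given.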

