Thursday, July 24, 2008

Shell Script To Emulate A Thesaurus For Linux or Unix

Hey There,

Today's Linux/Unix bash shell script is for those of us who sometimes find ourselves at a loss for words. This happens to me at least a few times a day, as I seem to talk, and type, way too much. Every now and again, I'll find myself facing a sentence that is not only redundant, but also seems to repeat its central message more than once ;) Sometimes redundancy is a good thing, though. If you've ever listened to an instructional or motivational speaker, you've probably noticed that a lot of them like to hit on the "rule of 3's" (sometimes 4's and 5's, with the annoyance factor increasing commensurate with the amount of repetition :) I, personally, try never to repeat myself (although, in writing about a thesaurus, I'm almost certainly doomed to some sort of meta-paradox).

One of the times a good thesaurus can come in handy is when you're faced with having to use similar words within a restricted amount of space and the resulting text seems stilted because of it. For instance, the sentence:

As good an idea as it may seem, it's generally not good to repeat the same word within a sentence.

can, with a little thought (or a handy reference), be made much more palatable, with the redundancy seemingly gone:

As good an idea as it may seem, it's generally not desirable to repeat the same word within a sentence.

Our script today is based on an original script (called "thes") that can be found at Gentoo.org and, like that script, makes use of the Online Thesaurus at reference.com.

The major differences between our script today and the equally helpful one posted on Gentoo.org are mainly rooted in the method. For instance, their script makes use of lynx and html2text, a program you may not have installed by default. Ours, while still relying on the online component, uses wget and sed. We went with wget over lynx since its partial-source dump option is a little more predictable than lynx's. That's not to say that there's anything wrong with lynx, just that it didn't suit our needs for this particular endeavour.
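To give a feel for the sed half of that approach, here's a minimal sketch of the kind of tag-stripping involved. The strip_html function name and the sample line of markup are made up for illustration; they aren't part of the script or real output from the site.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): delete anything that
# looks like an HTML tag and turn &nbsp; entities into plain spaces.
strip_html() {
    sed -e 's/<[^>]*>//g' -e 's/&nbsp;/ /g'
}

# A made-up line of markup, NOT real output from the online thesaurus.
printf '%s\n' '<td><b>Synonyms:</b>&nbsp;fine, <a href="/x">pleasant</a></td>' | strip_html
```

This is nowhere near a real HTML parser, but for pulling a few known lines out of a predictable page, it tends to be enough.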

Another major difference between the two is that we decided to go ahead and throw in the "%20" space declaration so that you could submit multi-word queries to the script and get a response that you'd expect. Check out the picture below for a quick example of submitting a bad multi-word query, a good multi-word query, a bad single-word query and a good single-word query. If you can't see the picture, for whatever reason, the output is fairly simple. When you submit a bad query of any type (single or multi-word), you'll get back a "No results found" message and some suggestions. When you submit a query that matches something, you'll receive a varying number of definitions followed by a varying number of synonyms (and, yes, I'm not using the script while I write this ;)
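The "%20" trick boils down to something like the following sketch. The encode_query function name is hypothetical (the script itself does this inline), and it only handles spaces, not other characters that would need encoding in a URL.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): join every argument
# with single spaces, then swap each space for %20 so the whole query
# can sit safely in a URL.
encode_query() {
    printf '%s' "$*" | sed 's/ /%20/g'
}

# Two separate arguments and one quoted argument encode the same way.
encode_query goodness gracious; echo
encode_query "goodness gracious"; echo
```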

[Picture: Sample Thesaurus script output]

As of the writing of this post, I have yet to figure out the "&whatever" suffix to the URL that will make the online Thesaurus return more than 10 results per page, so there's still some work to be done there. If you're so inclined, you can write a quick check-and-recheck into the script. The addition to the URL that will start you at definition number 11 (instead of the default, 1) would be "&start=11". So far, except with very general words like "good," I've found that this hasn't been necessary, but it would be a cool improvement.
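If you want a starting point for that improvement, here's a rough sketch that only builds the URLs for the first two pages, without fetching anything. The page_url function name is hypothetical, and the sketch assumes "&start=11" behaves as described above; how you'd detect that a second page actually exists is left open.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): build the search URL
# for an already-encoded term, optionally starting at a given result number.
page_url() {
    local term=$1
    local start=${2:-1}
    local url="http://thesaurus.reference.com/search?q=$term"
    # Only add the suffix when we aren't starting at the default (1).
    if [ "$start" -gt 1 ]
    then
        url="$url&start=$start"
    fi
    printf '%s\n' "$url"
}

# Print the URLs for results 1-10 and 11-20 of the term "good".
for start in 1 11
do
    page_url good "$start"
done
```

Each of those URLs could then be handed to the same wget, egrep and sed pipeline the script already uses.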

Here's to finding new ways to express yourself (politely ;)

Cheers,




This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# Thesaurus - Find something original to say :)
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

# Require at least one search term; otherwise print usage and quit.
if [ $# -lt 1 ]
then
    echo "Usage: $0 YourThesaurusTerm"
    echo "May be two or more words separated"
    echo "by spaces, but only one definition"
    echo "per execution."
    echo "Ex: $0 goodness"
    echo "Ex: $0 goodness gracious"
    exit 1
fi

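wget=/usr/bin/wget

# Join all arguments into one query and encode spaces as %20, so that
# multi-word terms work whether or not they were quoted on the command line.
args=$(printf '%s' "$*" | sed 's/ /%20/g')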

echo
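# Fetch the results page, keep only the lines of interest, then strip the HTML.
# The URL is quoted so the shell never treats "?" as a glob character.
# Note: "\n" in the sed replacement text relies on GNU sed.
"$wget" -nv -O - "http://thesaurus.reference.com/search?q=$args" 2>&1 |
    grep -Ei 'Synonyms:|Definition:|No results found|Thesaurus suggestions:' |
    sed -e 's/<br \/>/\n/g' -e 's/<[^>]*>//g' -e 's/&nbsp;/ /g' -e 's/\(.\)$/\1\n/' -e 's/Would you.*$//'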

exit 0


Mike