Wednesday, July 30, 2008

The Thesaurus Shell Script - New And Improved!

Hey There,

Today's post is a follow-up to one we did just last week on creating an online Thesaurus using shell scripting. In it, we took an already existing Thesaurus script called from and spiffed it up a little so that you could use multiple-word queries and extract the suggestions the online Thesaurus gives you when it can't find a match for your word or phrase. Of course, sometimes it can't find a match and has no suggestions either... You can't win 'em all ;)
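Multiple-word queries work because the script joins its arguments into one term and swaps each space for `%20` before tacking the term onto the lookup URL. Here's a minimal sketch of just that step, pulled out into a standalone function. The name `encode_term` is hypothetical (it isn't in the script), and, as in the script, only spaces get encoded:

```shell
#!/bin/sh
# encode_term (hypothetical name): join all arguments into one search term
# and percent-encode its spaces as %20, mirroring the script's sed step.
# Only spaces are encoded; characters like & or ? pass through untouched.
encode_term() {
    # "$*" joins the arguments with the first character of IFS (a space
    # by default), so the arguments big and name become "big name"
    printf '%s\n' "$*" | sed 's/ /%20/g'
}

encode_term big name    # prints big%20name
encode_term leader      # prints leader
```

Passing the words as separate arguments or as one quoted string gives the same result, which is why `Ex: $0 big name` works without quotes.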

Thanks for this post's updated, more platform-independent script go out to Fred Stephens (who's starting his own Linux revolution over at Linux Latitude). His help in modifying the script so that it also runs correctly on a few more Linux and Unix distros I didn't have access to is greatly appreciated, as is the help of so many other folks who, sometimes, prefer to remain nameless!

In an effort to make the script more portable as well, we used wget instead of lynx and removed the dependency on the html2text program by replacing that versatile (but not always available) software with a series of simple sed substitutions. Predictably, that didn't work exactly the same (or "the same enough") on every distro. I knew, when I wrote it, that I couldn't possibly test it on every available platform, so the possibility of it behaving differently on someone else's OS was always there (I think it always is and always will be). But, especially as time lumbers on, if we let hurdles like that slow us down, we'll all accomplish less and less, in inverse proportion to the variety the computing industry offers us (which grows more abundant as I type). Eventually, no one would ever write any sort of program, spark an original thought or attempt to improve and/or modify solutions unless they had a blank check from the government (or a giant conglomerate) to fund their every notion ;)

But that's one of the great things about the internet. Although I'll agree that (to a certain extent) it promotes seclusion, sparing people the "uncomfortable" prospect of having to physically interact with one another, the flip side of that coin is that the internet provides the world's largest forum for the free exchange of information and ideas, and makes it possible for like-minded individuals to pool their efforts and ingenuity to produce more, and better, solutions to problems at a much snappier pace.

This rewrite addresses an issue with the one part of the script I just "knew" in my gut wouldn't port to some distro somewhere: the sed execution line. As I mentioned above, html2text (free software, licensed under the GPL) is a project that's had a lot of development hours put into it (actual releases and versions, etc ;), so it's far better at parsing out HTML and converting it to plain text than any series of sed commands I could string together on a given afternoon. Again, we chose sed to make this script accessible to users who couldn't get their hands on html2text (which also requires Python), since sed comes standard on every Linux and Unix distro I know of (and has been around for a long time).
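One portability wrinkle in the script's sed line is that it writes `\n` in a replacement to mean "newline." GNU sed accepts that, but POSIX doesn't define it, and some non-GNU seds (the BSD sed on Mac OS X, for instance) may insert a literal `n` instead. Below is a minimal sketch, assuming only POSIX sed, of the same tag-stripping idea wrapped in a function. The name `strip_html` is hypothetical, not part of the script; it spells the newline as a backslash followed by a real line break, which POSIX sed does define:

```shell
#!/bin/sh
# strip_html (hypothetical name): a rough, POSIX-sed-only stand-in for
# html2text, modeled on the expressions in the Thesaurus script.
strip_html() {
    # 1. <br>, <br/> or <br /> becomes a real line break (a backslash
    #    followed by a newline in the replacement is POSIX sed's newline).
    # 2. Any other tag is deleted.
    # 3. &nbsp; entities become ordinary spaces.
    # 4. Runs of spaces are squeezed down to one.
    sed -e 's#<br */\{0,1\}>#\
#g' \
        -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/  */ /g'
}

# Prints "Synonyms: chief, head" and "boss" on separate lines
printf '<td><b>Synonyms:</b>&nbsp;chief, head<br />boss</td>\n' | strip_html
```

Like any regex-based approach, it can't cope with tags split across lines or with entities other than `&nbsp;`, which is exactly the ground where html2text still earns its keep.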

In any event, the script's been spruced up a bit and should run cleaner. The change isn't drastic, but it's definitely significant: if you noticed any garbage characters getting returned to you when you ran the original, this rewrite should, hopefully, fix that for you :)

Thanks, again, for your contribution, Fred :)

BTW, if you want to see sample output, there's a picture on the parent post about this Thesaurus shell script. I'm not jamming it in here again to try and keep this blog light (although, I see I've typed another novel already ;)


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

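#!/bin/sh
# Thesaurus - Find something original to say :)
# 2008 - Mike Golvach - (modified slightly by Fred J. Stephens -
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

if [ $# -lt 1 ]
then
    echo "Usage: $0 YourThesaurusTerm"
    echo "May be two or more words separated by spaces"
    echo "but only one definition per execution."
    echo "Ex: $0 leader"
    echo "Ex: $0 big name"
    exit 1
fi

# THESAURUS_URL (placeholder name): base URL of the online thesaurus's
# lookup page; the search term is appended to it. Exits if unset.
thesaurus_url=${THESAURUS_URL:?"set THESAURUS_URL to the thesaurus lookup URL"}

# Join all arguments into one term; encode spaces for multi-word queries
args="$*"
if [ $# -gt 1 ]
then
    args=`echo $args|sed 's/ /%20/g'`
fi

# Fetch the page to stdout, keep only the result lines, then strip the HTML
# (the \n replacements in the sed line rely on GNU sed)
wget -nv -O - "$thesaurus_url$args" 2>&1|\
egrep -i 'Synonyms:|Definition:|No results found|Thesaurus suggestions:'|\
sed -e 's/<br \/>/\n/g' -e 's/<[^>]*>//g' -e 's/&nbsp;/ /g' -e 's/\(.\)$/\1\n/' -e 's/Would you.*$//'

exit 0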

, Mike