Tuesday, August 26, 2008

Perl Script To Reverse HTML With BDO on Linux and Unix

Ahoy there,

Today, we're putting out a little Perl script to deal with the obscure. Now, I realize that my view is heavily biased by living on American soil, but I can't remember, for the life of me, the last time I came across a Hebrew web page. And by Hebrew, I mean the whole reading-from-right-to-left thing. I probably wouldn't even understand it if I recognized it when I saw it ;) It did remind me, however, that HTML 4.0 introduced a tag to deal with just that sort of direction switching. Come to think of it, applied more liberally, it would be a great translation tool for Japanese comic books that read from right to left and from back to front. Anyway, all this talk of forward and backward, left and right... just add up and down and I'll be running to the drug store for some Dramamine ;)

To get directly to the point (finally ;), the tag we're using in this script is called, simply, "bdo" (short for bidirectional override). It accepts more attributes than the one we're going to use, but "dir" is the only required one, and it's what lets this script do what it does so simply. "dir" is short for direction and can have a value of either "ltr" (left to right) or "rtl" (right to left). So, if you're reading this blog in Hebrew or Japanese, all you need to do is flip-flop that value in the script (swap "rtl" for "ltr") and it will reverse your HTML pages as well :)
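
To make the "dir" attribute a bit more concrete, here is a minimal sketch in Perl. The helper name bdo_wrap is made up for this illustration and is not part of bdo.pl; it just wraps a string in a bdo element and refuses any direction other than the two values mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bdo.pl): wrap a string in a bdo element.
sub bdo_wrap {
    my ( $text, $dir ) = @_;

    # "ltr" and "rtl" are the only values the dir attribute accepts.
    die "dir must be ltr or rtl\n" unless $dir =~ /\A(?:ltr|rtl)\z/;
    return qq{<bdo dir="$dir">$text</bdo>};
}

print bdo_wrap( 'Ahoy there', 'rtl' ), "\n";
# prints: <bdo dir="rtl">Ahoy there</bdo>
```

Dropped into a page, that markup should show the characters in reverse visual order in a browser that honors bdo.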

For completeness' sake, the bdo tag can have the following additional attributes (none of which are "required"): class, id, lang, style and title.

The bdo tag is an element, not an attribute, and it is only valid when nested inside another bdo or inside one of the following elements: a, abbr, acronym, address, applet, b, big, blockquote, body, button, caption, center, cite, code, dd, del, dfn, div, dt, em, fieldset, font, form, h1, h2, h3, h4, h5, h6, i, iframe, ins, kbd, label, legend, li, noframes, noscript, object, p, pre, q, s, samp, small, span, strike, strong, sub, sup, td, th, tt, u or var.

To give you an idea of what today's Perl script accomplishes, I pulled down yesterday's front page from our website and put it through the grinder.

Here's an original snapshot of yesterday's HTML page:

Original Web Page

and the "somewhat" reversed version after feeding it to bdo.pl:

Reversed Web Page

Fun? Yes. Amusing? Possibly. Subject of a lifelong obsession? Doubtful, I hope ;)

You can run the script (which is only written for the command line, but could be easily tweaked to use in a CGI framework) very simply, like this:

host # ./bdo.pl tlaum.htm
Reversed File Saved As: tlaum.htm.5964.html

It can also be improved upon so that it catches every last little thing! You may notice problems with text styles, and certain portions of text don't get reversed. This is a limitation of the script as it stands (it matches a short list of tags with regular expressions, one line at a time), and it can most definitely be improved upon. If the look of the code bothers you a lot, you can always use perltidy to clean it up some ;)
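
As one possible direction for that kind of improvement, here is a rough sketch that works on the whole document as a single string instead of line by line, so a tag that wraps across lines still gets matched. The helper name add_bdo and the extra tags in its list (h6, li, td, th) are assumptions made for this example, not something bdo.pl does. It is still regex matching rather than real HTML parsing, so treat it as a starting point only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of bdo.pl): add bdo wrappers to a whole
# document held in one string, so tags that span lines are still matched.
sub add_bdo {
    my ( $html, $dir ) = @_;

    # Assumed tag list: the script's containers plus h6, li, td and th.
    my $tags = qr/h[1-6]|span|a|p|div|textarea|li|td|th/i;

    # \b keeps "p" from matching <pre> and "a" from matching <abbr>.
    $html =~ s/(<$tags\b[^>]*>)/$1<bdo dir="$dir">/g;

    # Close the override just before each matching closing tag.
    $html =~ s/(<\/$tags\s*>)/<\/bdo>$1/g;
    return $html;
}

print add_bdo( "<p class=\"x\">Hello</p><pre>code</pre>\n", 'rtl' );
# prints: <p class="x"><bdo dir="rtl">Hello</bdo></p><pre>code</pre>
```

Nested containers end up with nested bdo wrappers under this approach, which browsers tend to tolerate but which is not strictly tidy markup.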

Easy peasy. Enjoy :)

Creative Commons License

This work is licensed under a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License


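#!/usr/bin/perl
#
# bdo.pl - Use the bidirectional tag for once in your life ;)
# 2008 - Mike Golvach - eggi@comcast.net
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

use strict;
use warnings;

# Expect exactly one argument: the HTML file to reverse.
if ( @ARGV != 1 ) {
    print "Usage: $0 yourwebpage.html\n";
    exit(1);
}

my $FILE = $ARGV[0];

# Output name is the input name plus the process ID. This is inferred from
# the sample run above (tlaum.htm.5964.html).
my $OUTFILE = "$FILE.$$.html";

if ( ! -f $FILE ) {
    print "File $FILE does not exist! Exiting...\n";
    exit(2);
}

open( my $in, '<', $FILE ) or die "Cannot read $FILE: $!\n";
chomp( my @file = <$in> );
close($in);

# Container tags whose text gets wrapped in <bdo>. The void tag "br" is left
# out because it has no closing tag to pair the </bdo> with.
my $tags = qr/h[1-5]|span|a|p|div|textarea/i;

open( my $out, '>', $OUTFILE ) or die "Cannot write $OUTFILE: $!\n";
foreach my $line (@file) {
    # Open a right-to-left override just after each matching opening tag...
    $line =~ s/(<$tags\b[^>]*>)/$1<bdo dir="rtl">/g;
    # ...and close it just before each matching closing tag.
    $line =~ s/(<\/$tags\s*>)/<\/bdo>$1/g;
    print {$out} "$line\n";
}
close($out);

print "\nReversed File Saved As: $OUTFILE\n";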

Mike
