Monday, January 7, 2008

Sed Manpage Markup Language Translation Script

As I note in the header of this script, it's somewhat imperfect. In fact, it may be almost completely obviated by the last few years of advancements in how operating systems render the manpage itself. It is, however, another example of just how crazy you can get with sed.

Back in the day (and I still see this from time to time on some current Unix and Linux systems), if you redirected the output of "man whatever" to a file to get a copy of the manpage, without running it through sed or some other information-massager, like so:

man whatever > OUTPUTFILE

you'd end up with a file full of garbage. Not that it was completely unreadable, but you'd have to filter out all the markup language graffiti on your own (in your head, if possible ;)

That's what prompted me to begin work on stripping a garbage-output manpage down to a simple, easily readable file using sed. If you need to do this kind of translation and use this method, you may notice that a few markup language remnants remain. Feel free to embellish this work to remove them as well; the final section of the script is the easiest place to do this. That also answers the question: why so many individual replacements and not just one compact regular expression? As noted above, this is a work in progress that I'm trying to make work for a lot of different flavors (and versions) of Linux and Unix. Once I feel that I've run down every possible marker, I'll make it a lot tidier and wrap it up with a bow ;)
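As a rough idea of what that tidying might look like, here is a minimal sketch that folds the script's four separate font-escape removals (\fB, \fR, \fP, \fI) into one bracket expression. The function name strip_fonts is hypothetical and not part of the script; the same pattern works for adding your own replacements to the final section.

```shell
# Sketch only: one expression instead of four separate -e 's/\\fX//g' steps.
# strip_fonts is a hypothetical helper name, not part of the original script.
strip_fonts() {
    # \\f matches a literal backslash followed by "f"; [BRPI] matches the font letter
    sed -e 's/\\f[BRPI]//g'
}

# Feed it one line of man source containing font escapes
printf '%s\n' 'list \fIdirectory\fR contents in \fBbold\fP' | strip_fonts
# prints: list directory contents in bold
```

Whether a compact form like this is safe across every flavor of Unix and Linux is exactly the open question, which is why the script keeps the replacements separate for now.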


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

# Semi-Imperfect Man-Source To Regular-File Translator
# 2008 - Mike Golvach -
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

echo >> "$1.txt"

sed -e '1,/NAME/d' -e '/^\.\\"/d' -e 's/^\.if//' -e '/^\.TH/d' -e '/^\.de/d' -e '/^\.ds/d' -e '/^\.ll/d' -e '/^\.in/d' -e '/^\.ti/d' -e '/^\.ie/d' -e '/^\.br/d' -e '/^\.el/d' -e 's/^\.\.//' -e 's/^\.SH\(.*\)/\1\
/' -e '/^\.nr/d' -e '/^\\f/d' -e '/^\.}f/d' -e '/SYNOPSIS/i\
' -e '/COPYRIGHT/i\
' -e '/SEE ALSO/i\
' -e 's/\.PP//' -e '/^\.PD/d' -e 's/^\.BR//' -e '/^\.BI \\/{
s/\.BI \\/ /g
s/\n/ /g
}' -e '/^\.B \\/{
s/\.B \\/ /g
s/\n/ /g
}' -e '/^\.B/{
s/\.B/ /g
s/\n/ /g
}' -e '/^\.IR/{
s/\n/ /g
}' -e 's/\.SB//' -e '/\.TS/d' -e '/\.TE/d' -e 's/^\.IX//' -e '/\.LP/d' -e 's/\.TP *[0123456789]*//g' -e 's/\\fB//g' -e 's/\\fR//g' -e 's/\\fP//g' -e 's/\\fI//g' -e 's/\\| *//g' -e 's/ n //g' -e 's/ t //g' -e 's/\\^ *//g' -e 's/\\//g' -e 's/\.FN//g' -e 's/\.I//g' -e '/^\.SM/d' -e 's/^\.RS//' -e 's/^\.RE//' -e 's/\.SS//g' "$1" >> "$1.txt"
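If the backslash-newline constructs in the script look odd, here is a small sketch of just the section-heading step in isolation. It strips the .SH macro and leaves a blank line after the heading. The function name demo_heading and the sample input are made up for illustration.

```shell
# Sketch only: the .SH step from the script, pulled out on its own.
# demo_heading is a hypothetical name used for illustration.
demo_heading() {
    # \1 keeps whatever followed ".SH"; the backslash-newline in the
    # replacement inserts a literal newline, leaving a blank line after it
    sed -e 's/^\.SH\(.*\)/\1\
/'
}

# One heading line and one ordinary line of text
printf '%s\n' '.SH NAME' 'body text' | demo_heading
```

The heading comes out as " NAME" (the space after .SH survives), followed by an empty line, and the ordinary line passes through untouched.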

, Mike