Tuesday, October 28, 2008

Linux/Unix Shell Script To Find Your Google Page Rank

Hey There,

Today's entry may remind you of our older post on finding your Google index rank, but (aside from the word "rank" ;) it's a whole separate topic. Fortunately, or unfortunately, we still haven't put up enough posts to forget any of them yet :)

While our index rank script took a URL and a search term (or terms) and returned your relative index rank at that particular moment in time (e.g. it would let you know that your website was the 435th listing in a search for "fig trees," or something of that nature), this script focuses, simply, on the Google Page Rank (PR) of any specific URL. "Simply," as you'll see, is a matter of perspective. Our script for today is fairly easy to follow, but the actual deconstruction of Google's checksum algorithm (a "very" important piece of information if you want to get your Google PR) is somewhat complicated.

The URL checksum algorithm itself can be found on the web in PHP, ASM, C, C++, Perl, Python and many other variations, but no "shell" version of the algorithm exists (that we know of). And that (even though it will be a major PITA ;) is going to give us something to do for a while. For today's script, we are using a binary we compiled from the (unmodified) source code of pagerank.c. This code is freely available via the preceding link, and its author is listed only as the base of the URL in that link. We would like to thank http://zhiwei.li/ for writing this code and making it publicly available. Without it, we might have gone the easy route and used Perl's WWW::Google::PageRank module, saving a few hairs in the process ;)

NOTE: We've attached the pagerank.c code at the end of this post, after our bash script. We've also compiled it and put it up for download on a separate server, for Cygwin kernel 1.5.25 and Ubuntu kernel 2.6.24-21. (Right click on either link and choose "Save As," since neither file has an extension, but both are binaries :) If you would like the binary compiled for your particular distro, send us an email (top right link) and we'll help you if it's possible :) If you prefer to build the binary from the attached source yourself, you can do that very simply with gcc (or your compiler of choice), as it requires no "special" arguments (just the C source file and an output file name), like so:

host # gcc -o pagerank pagerank.c

and you'll have the "pagerank" binary that we call from our script.

The script itself is even easier to use. If you want to modify it, note that the "pr_checksum_prog" variable is set to "./pagerank". If you've compiled the source to a different name, or have placed the binary somewhere other than the directory from which you run our script, that line will need to be modified accordingly. (Or, if you've downloaded from our alternate site, just copy your distro's binary from "cygwin1525_pagerank" to "pagerank", or from "ubuntu262421_pagerank" to "pagerank".) Assuming you've built (or downloaded) the binary, have it named "pagerank" and have placed it in the same directory as our script, you just need to do this to get any URL's Google PR:

host # ./prank.sh http://www.google.com
Google PR For http://www.google.com = 10
<-- Is it just us, or does this result seem biased? ;)

And that's all there is to it :) We promise we'll get back to converting the C checksum code into shell script as soon as possible. Google "does" change their URL checksumming algorithm from time to time, so, if you're reading this in the distant future, the code on this page may no longer be of any value. If (or, more realistically, "when" ;) it changes, we'll try to keep on top of it and publish the updated source.

Hope you enjoy this, or find some good use for it :)

Cheers,


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

#!/bin/bash

#
# prank.sh - Find any URL's Google Page Rank (PR)
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

if [ $# -ne 1 ]
then
    echo "Usage: $0 HttpOrHttpsURL"
    exit 1
fi

pr_checksum_prog="./pagerank"

# The checksum binary must exist and be executable
if [ ! -x "$pr_checksum_prog" ]
then
    echo "Cannot Find Checksum Program: $pr_checksum_prog !"
    exit 2
fi

wget=/usr/bin/wget
prank_url=$1
# Percent-encode ':' and '/' so the URL can be passed in the query string
mod_prank_url=$(echo "$prank_url" | sed -e 's/:/%3A/g' -e 's/\//%2F/g')
# pagerank prints "Checksum=<value>"; keep only the value
prank_checksum=$("$pr_checksum_prog" "$prank_url" | sed 's/Checksum=//')

prank_qurl="http://toolbarqueries.google.com/search?client=navclient-auto&ch=${prank_checksum}&ie=UTF-8&oe=UTF-8&features=Rank&q=info:${mod_prank_url}"

echo -n "Google PR For $prank_url = "

# Keep only the "Rank_" line of the response and strip its "Rank_N:N:" prefix
"$wget" -nv -O - "$prank_qurl" 2>&1 | grep "Rank_" | sed 's/Rank_[0-9]:[0-9]://'
exit 0
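To make the string handling in prank.sh a little easier to follow, here is a minimal sketch of its three text transformations as standalone functions, with no network access involved. The function names are ours (hypothetical), and the sample inputs "Checksum=7845057" and "Rank_1:2:10" are assumptions modeled on the sed patterns in the script, not captured output:

```shell
#!/bin/bash

# Drop the "Checksum=" label printed by the pagerank binary.
strip_checksum() {
    printf '%s\n' "$1" | sed 's/Checksum=//'
}

# Percent-encode only ':' and '/', as the two sed expressions in prank.sh do.
encode_url() {
    printf '%s\n' "$1" | sed -e 's/:/%3A/g' -e 's/\//%2F/g'
}

# Keep the "Rank_" line and strip its "Rank_N:N:" prefix, leaving the PR value.
parse_rank() {
    printf '%s\n' "$1" | grep 'Rank_' | sed 's/Rank_[0-9]:[0-9]://'
}

strip_checksum "Checksum=7845057"     # prints 7845057
encode_url "http://www.google.com"    # prints http%3A%2F%2Fwww.google.com
parse_rank "Rank_1:2:10"              # prints 10
```

Note that only ':' and '/' get encoded, so a URL containing other reserved characters (like '?' or '&') would probably need more sed expressions than the script has today.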



pagerank.c Unmodified Source Code


/******************************************************************************
Filename : pagerank.c
Description : Google PageRank Checksum Algorithm
Author : http://zhiwei.li/
Log : Ver 0.1 2005-9-13 first release
Ver 1.0 2005-10-19 fixed :final character bug
Ver 1.1 2006-10-05 refine code
Ver 1.2 2008-8-20 use boolean type
******************************************************************************/

#include <stdio.h>
#include <stdbool.h>

int ConvertStrToInt(char *pStr, int Init, int Factor)
{
    while (*pStr) {
        Init *= Factor;
        Init += *pStr++;
    }
    return Init;
}

int HashURL(char *pStr)
{
    unsigned int C1, C2, T1, T2;

    C1 = ConvertStrToInt(pStr, 0x1505, 0x21);
    C2 = ConvertStrToInt(pStr, 0, 0x1003F);
    C1 >>= 2;
    C1 = ((C1 >> 4) & 0x3FFFFC0) | (C1 & 0x3F);
    C1 = ((C1 >> 4) & 0x3FFC00) | (C1 & 0x3FF);
    C1 = ((C1 >> 4) & 0x3C000) | (C1 & 0x3FFF);

    T1 = (C1 & 0x3C0) << 4;
    T1 |= C1 & 0x3C;
    T1 = (T1 << 2) | (C2 & 0xF0F);

    T2 = (C1 & 0xFFFFC000) << 4;
    T2 |= C1 & 0x3C00;
    T2 = (T2 << 0xA) | (C2 & 0xF0F0000);

    return (T1 | T2);
}

char CheckHash(unsigned int HashInt)
{
    int Check = 0;
    bool Flag = false;
    int Remainder;

    do {
        Remainder = HashInt % 10;
        HashInt /= 10;
        if (Flag){
            Remainder += Remainder;
            Remainder = (Remainder / 10) + (Remainder % 10);
        }
        Check += Remainder;
        Flag = !Flag;
    } while( 0 != HashInt);

    Check %= 10;
    if (0 != Check) {
        Check = 10 - Check;
        if (Flag) {
            if (1 == (Check % 2)) {
                Check += 9;
            }
            Check >>= 1;
        }
    }
    Check += 0x30;
    return Check;
}

int main(int argc, char* argv[])
{
    unsigned int HashInt;

    if (argc != 2) {
        printf("Usage: %s [URL]\n",argv[0]);
        return 1;
    }

    HashInt = HashURL(argv[1]);
    printf("Checksum=7%c%u\n", CheckHash(HashInt), HashInt);
    return 0;
}
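The check digit computed by CheckHash() is the one piece of pagerank.c that maps onto shell arithmetic almost line for line, so here is a rough bash sketch of just that function. It is our own translation, not part of the original source, and it does not cover HashURL(). The function name check_digit and the sample hash values are only illustrative:

```shell
#!/bin/bash

# check_digit HASH
# Bash rendering of CheckHash(): walk the decimal digits of HASH from right
# to left, doubling every second digit and summing the digits of the result.
check_digit() {
    local hash=$1 check=0 flag=0 rem

    while :; do
        rem=$(( hash % 10 ))
        hash=$(( hash / 10 ))
        if (( flag )); then
            rem=$(( rem + rem ))
            rem=$(( rem / 10 + rem % 10 ))
        fi
        check=$(( check + rem ))
        flag=$(( 1 - flag ))          # same as Flag = !Flag
        if (( hash == 0 )); then
            break
        fi
    done

    check=$(( check % 10 ))
    if (( check != 0 )); then
        check=$(( 10 - check ))
        if (( flag )); then
            if (( check % 2 == 1 )); then
                check=$(( check + 9 ))
            fi
            check=$(( check >> 1 ))
        fi
    fi

    # The C code adds 0x30 to return an ASCII digit; printing the number
    # gives the same character for values 0 through 9.
    printf '%d\n' "$check"
}

check_digit 4096     # prints 7
check_digit 45057    # prints 8
```

In the C program this digit is printed between the leading "7" and the hash itself, so a hash of 45057 would come out as "Checksum=7845057".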




Mike



