Monday, December 22, 2008

Finding Your MSN/Live Index Rank From The Unix Or Linux CLI

Hey There,

Some of you may recall a post we did a while back on how to find your Google search index rank from the CLI. In that post we looked at another aspect of site SEO monitoring: rather than relying on Google's proprietary "PR" ranking system, we simply pretended to be a human typing keywords into the search page and then worked out which result our desired URL actually showed up at. We stopped after the first positive result, since Google (like most places) will cut you off for at least a little while if they suspect you're not human ;)

Today's script is a variant on that, although we're going to be trudging through the landscape of MSN/Live's search engine.
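Stripped of the paging, parsing and rate-limiting the real script does, the basic move looks something like this one-liner (the domain and keywords here are just placeholders, of course):

host # wget -q --user-agent=Firefox -O - "http://search.msn.com/results.aspx?q=all+these+key+words" | grep -c "www.yourdomain.com"

If that prints anything other than 0, your URL is somewhere on that first page of results; the script down at the bottom does the same thing page by page and works out the exact position.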

IMPORTANT NOTE: Although this warning is on the original Google search rank index page, it bears repeating here and now. If you use wget (as we are in this script), or any CLI web-browsing/webpage-grabbing software, and want to fake the User-Agent, please be careful. Please check this online article regarding the likelihood that you may be sued if you masquerade as Mozilla.
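For what it's worth, wget's --user-agent switch will take any string you feed it, so (purely as an example, and the contact address is made up) you can just as easily identify yourself honestly rather than pretending to be somebody's browser:

host # wget -q --user-agent="mrank/1.0 (you@yourdomain.com)" -O - "http://search.msn.com/results.aspx?q=test"

Whether MSN treats a self-identified script any more kindly than a fake Firefox is another question entirely ;)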

This MSN script is, of course, only slightly different from the original Google script, although the differences are significant enough that I rewrote entire portions rather than back-translate the old code, convert it and hope it worked ;) The script itself operates the same way our Google search index page rank script does, as far as running it from the command line goes. There are at least three different ways you can call it, the most basic being:

host # ./mrank www.yourdomain.com all these key words

It doesn't matter whether or not the keywords are enclosed in double quotes. If you really want the double-quote (exact phrase) experience, you just need to backslash your double quotes:

host # ./mrank www.yourdomain.com \"all these key words\"

Other ways include creating files with the URL and keyword information (same format as the command line) and feeding them to the script's STDIN:

host # cat FILE|./mrank
host # ./mrank <FILE
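
In case it's not obvious, the FILE for either of those is just one line per search, with the URL first and the keywords after it, exactly like the arguments on the command line (these entries are only placeholders):

www.yourdomain.com all these key words
www.yourotherdomain.com some other key words

The script reads the first field on each line as the URL and everything after it as the search terms, then re-runs itself once per line.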



There is a little bugger that I still need to remove from this script; it incorrectly returns position #1 as #0 sometimes, but that should be easy to fix. I'd do it if I had the time. In fact, when I do, I'll repost the script and just include a notice in whatever that day's post is.
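
If you want to take a crack at it before I do, the culprit is almost certainly the post-increment in the first awk stage of the big pipeline (print base,num++,$NF), which hands out 0 for the first result on a page. Swapping it for a pre-increment, something like the line below (only lightly tested, so treat it as a suggestion rather than gospel), should make the first hit come out as 1 and still line up with the special-casing of position 10 in the next awk stage:

awk -v num=$num -v base=$base '{ if ( $1 ~ /^http/ ) print base,++num,$NF }'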

Now for the pictures :) Following are a few shots of MSN index rank checking as compared to Google index rank checking. Although most people will tell you it should be the opposite, it seems that we have a much better presence on Google than on MSN, after being in business (read: shamelessly self-promoting by answering questions in forums, writing articles and submitting a billion sitemaps) for a year or so.

Following those is an EVEN BETTER set of pictures that made me scratch my head and mutter "WTF?" You'll see why. Let's just say that this site ranks lower on MSN for a certain keyword phrase than other sites that point to the post, AND the first time our site comes up in that "double quoted" exact-string search, it's for an entirely unrelated post (???) Who has the time to wonder? ;) After that, way down at the bottom, we've somehow managed to remember to tack on the script.

By clicking any of the pictures below you are consenting to the use of our custom shrink ray, which will make each picture appear larger than normal, until you come back to the site ;)

[Screenshot: mapquest CLI search]

[Screenshot: adobe CLI search]

[Screenshot: zombie search]

And here's the MSN "managing swatch output" double-quote search that I still don't get.

[Screenshot: swatch CLI search]

[Screenshot: swatch search results - not our URLs]

And here we are, a bit farther down than any of the references to our post, with a listing for a page that doesn't contain the post but probably has it listed in the blog archive on the sidebar ;)

[Screenshot: swatch search - our URL]

And, at long last, here's the script. Enjoy and have fun re-tooling it. Just be sure to double-check MSN to make sure you're not trying to fix a problem with your index ranking that exists by design ;)

Cheers,


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

#!/bin/bash

#
# mrank - Get your MSN Live Search Ranking Index
#
# 2008 - Mike Golvach - eggi@comcast.net
#
# Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
#

# Sanity check: if we got exactly one argument, print usage and bail.
if [ $# -lt 2 -a $# -ne 0 ]
then
    echo "Usage: $0 URL Search_Term(s)"
    echo "URL with or without http(s)://, ftp://, etc"
    echo "Double Quote Search If More Than 1 Term"
    exit 1
fi

# No arguments at all means we're being fed "URL keywords" lines on STDIN,
# so read each line and re-run ourselves once per line. Otherwise the URL
# is the first argument and everything after it is the search terms.
if [ $# -eq 0 ]
then
    while read x y
    do
        url=$x
        search_terms=$y
        $0 $x "$y"
    done
    exit 0
else
    url=$1
    shift
    search_terms=$@
fi

base=0
num=0
start=1
multiple_search=0
not_found=0

# Join the search terms with +'s so they can be dropped into the query string.
for x in $search_terms
do
    if [ $multiple_search -eq 0 ]
    then
        search_string=$x
        multiple_search=1
    else
        search_string="${search_string}+$x"
    fi
done

echo "Searching For MSN Index For $url With Search Terms: $search_terms..."
echo

# Grab the first results page once, just to pull out the approximate total
# number of results MSN reports.
num_results=`wget -q --user-agent=Firefox -O - http://search.msn.com/results.aspx?q=${search_string}\&first=${start}|awk '{ if ( $0 ~ /of [0-9,]* results/ ) print $0 }'|sed 's/^.*of \([0-9,]*\) results.*$/\1/'`

# Walk the results 10 at a time (the "first" parameter asks for results
# starting at that position) until we find the URL or give up at 100.
while :;
do
    if [ $not_found -eq 1 ]
    then
        break
    fi
    # Pull a page of results, strip it down to the result URLs, number them
    # and see if our URL is in there.
    wget -q --user-agent=Firefox -O - http://search.msn.com/results.aspx?q=${search_string}\&first=${start} 2>&1|sed 's/<a href=\"\([^\"]*\)\"[^>]*>/\n\1\n/g'|sed -e :a -e 's/<[^>]*>/ /g;/</N;//ba'|sed '1,/See all.../d'|grep http|egrep -v 'cc.msnscache.com' 2>&1|sed '/search.live.com/d' |sed '/search.msn.com/d'|sed '/ocid=/,$d'|sed '/Developers | Help | Feedback/,$d'|awk -v num=$num -v base=$base '{ if ( $1 ~ /^http/ ) print base,num++,$NF }'|awk '{ if ( $2 < 10 ) print "MSN Index Number " $1 $2 " For Page: " $3; else if ( $2 == 10 ) print "MSN Index Number " $1+1 "0 For Page: " $3;else print "MSN Index Number " $1 $2 " For Page: " $3 }'|grep -i $url
    if [ $? -ne 0 ]
    then
        # Not on this page - move on to the next 10 results.
        let start=$start+10
        let nexthopper=$start-1
        if [ $nexthopper -ge 100 ]
        then
            not_found=1
            break
        fi
        let base=$base+1
        num=0
    else
        break
    fi

    # Sleep a random-ish amount between pages so we look a little less like a script.
    let sleep_time=${RANDOM}/600
    echo "Not In Top $nexthopper Results: Sleeping $sleep_time seconds..."
    sleep $sleep_time
done

if [ $not_found -eq 1 ]
then
    echo "Not Found In First 100 Index Results!"
    echo
fi

echo "Out Of Approximately $num_results Results"
echo
exit 0


Mike



