Monday, April 13, 2009

A Simple Approach To Grabbing Streaming Audio Playlists With Unix Or Linux

Hey there,

Today's post is born of the fruits of my spare-time labour ;) I've spent most of my life in the Chicagoland area (after doing a whirlwind tour of the world as an Army brat) and have been into rock and roll since about the time I grew my fetal ears :) As everyone who lives anywhere near Chicago knows, it's got a great variety of free radio on both the AM and FM dials. Formats change, call numbers change, signal frequencies fluctuate, etc, but there's almost always something to love on at any given moment.

If you're a fan and follower of both local and world-wide heavy metal (By which I don't mean Cinderella, although you're welcome to enjoy as much glam-rock as the constitution will allow ;) you probably get as excited as I do when another decent station pops up on the dial. Although radio stations like WXRT have long since eschewed the corporate playlist, and made a good buck doing so, it's not quite so easy for heavy metal stations to do this for too long. I mean, if you think about some of the metal that Chicago stations have broken over the years (Pantera, Yngwie Malmsteen, even Loudness from Japan), you'll never hear those sorts of tunes on commercial radio. Even though I can find this direct quote on Wikipedia:

Despite the generally cold reception of the band's first four albums from the 1980s, critics have lauded Pantera's style thereafter; Jason Birchmeier of Allmusic.com states that "there was no greater metal band during the early to mid-1990s than Pantera."[2] The band has received accolades such as ranking 45th on VH1's list of the "100 Greatest Artists of Hard Rock"[3] and fifth on MTV's "Top 10 Greatest Heavy Metal Bands of All-Time."[4]


You'll still never hear any of Pantera's tunes (even re-worked to be more family-friendly ;) on regular rotation. So, when a new metal station does pop up, I like to grab as much as I can grab (legally, of course) while they're still around. Back in High School, I used to have a regular tape player hooked up to a regular receiver to tape RPM (Real Precious Metal) on WVVX for the 3 hours it was on a night so I could listen to it later. What I'm about to show you here (or begin to show you) is in the same spirit. You want to listen to the tunes (and maybe buy some albums... I mean CD's ;) and you just don't have the time to listen when it's convenient. Light rock goes over well in most office environments. Crank up Death's Crystal Mountain and half of a Lamb Of God tune and you'll probably be ejected from the premises ;)

Our assumptions for today's experiment are these:

1. You have very little idea about how to download streaming playlists. That's okay. This is basic enough that you don't need to.
2. You already have a packet capture program set up.
3. You've isolated it so that it doesn't pick up too much noise from other computers on your network (although we'll filter out everything else anyway)
4. You're ready to ROCK and it's been far too long since you've been able to do so.
5. You can bring up Rebel Radio's MySpace Page in your browser and hear the music when it starts playing. Their playlist changes fairly often, so it's a great page to hit for uncut, sometimes offensive, metal (The best kind :)

The first step is to find a source to grab from. Since pure streaming audio is the next step in this series, we're going to start with streaming "playlists." The major difference between the two (at least as far as this blog post is concerned) is that the streaming "playlist" consists of a number of individual MP3 format songs that are being streamed via HTTP. These are easier to find (since each song has a beginning and ending delimiter) and easier to download (for almost the exact same reason).

Before you set up your packet capture, try to make sure you're not doing too much other networking. Just starting a packet capture and watching it will give you a good idea of what kind of garbage is being picked up on your network interface(s) - preferably you should only be sniffing one of them.

Now, on to the music finding and harvesting:

1. The first step would, naturally, be to filter out all the extraneous network activity that's being picked up by your packet sniffer. I'm not going to do this today. Basically, I'm going to start by eyeballing what happens when I open up Rebel Radio's MySpace Page and stop the packet capture as soon as a song begins to play.

2. From seeing what I can see just by looking at what happens, I'll create this packet filter rule (my local IP is 192.168.0.3):

(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3)


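This rule says I only want to see packets where one of the TCP ports falls between 50000 and 60000 (that's the range my own machine happened to be using for its end of each connection, as the port pairs in the filters below show) and which don't have my own IP as the destination. In other words: outbound traffic only.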

3. From there, I begin to filter out all the TCP streams that don't have anything to do with what I want to accomplish (I don't care about advertising streams, for instance, and I want to leave Rebel Radio's Main Site alone for now - 24/7 - 1500 AM on the dial - soon to be streaming!). This results in my rule getting augmented a few times before I actually get anywhere. For completeness' sake, here is a list of the augmented filters created along the way to my first hit:

(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3) and !((ip.addr eq 192.168.0.3 and ip.addr eq 216.178.33.51) and (tcp.port eq 50074 and tcp.port eq 80))
-
(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3) and !((ip.addr eq 192.168.0.3 and ip.addr eq 216.178.33.51) and (tcp.port eq 50074 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.166) and (tcp.port eq 50072 and tcp.port eq 80))
-
(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3) and !((ip.addr eq 192.168.0.3 and ip.addr eq 216.178.33.51) and (tcp.port eq 50074 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.166) and (tcp.port eq 50072 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 63.135.80.46) and (tcp.port eq 50071 and tcp.port eq 80))
-
(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3) and !((ip.addr eq 192.168.0.3 and ip.addr eq 216.178.33.51) and (tcp.port eq 50074 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.166) and (tcp.port eq 50072 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 63.135.80.46) and (tcp.port eq 50071 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.155) and (tcp.port eq 50073 and tcp.port eq 80))


4. We'll dump some more stuff, with more filters, too, since a lot of misdirection happens here and the end is obscured by hand-offs and referrals. I only want the "end product":

(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3) and !((ip.addr eq 192.168.0.3 and ip.addr eq 216.178.33.51) and (tcp.port eq 50074 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.166) and (tcp.port eq 50072 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 63.135.80.46) and (tcp.port eq 50071 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.155) and (tcp.port eq 50073 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 91.199.100.2) and (tcp.port eq 50075 and tcp.port eq 80))
-
(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3) and !((ip.addr eq 192.168.0.3 and ip.addr eq 216.178.33.51) and (tcp.port eq 50074 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.166) and (tcp.port eq 50072 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 63.135.80.46) and (tcp.port eq 50071 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 74.125.95.155) and (tcp.port eq 50073 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 91.199.100.2) and (tcp.port eq 50075 and tcp.port eq 80)) and !((ip.addr eq 192.168.0.3 and ip.addr eq 64.94.107.16) and (tcp.port eq 50076 and tcp.port eq 80))
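Since every filter in steps 3 and 4 is just the previous one with another "ignore this stream" clause bolted on, here's a rough sketch of how you could build them up instead of retyping them. The function name exclude_stream is my own invention (it isn't part of any capture tool); it only assembles the filter text, which you'd still paste into your sniffer yourself.

```shell
# Append one "ignore this stream" clause to an existing display filter.
# Arguments: current filter, local IP, remote IP, local port, remote port.
# (exclude_stream is a hypothetical helper name, not a real tool.)
exclude_stream() {
    local filter="$1" local_ip="$2" remote_ip="$3" local_port="$4" remote_port="$5"
    printf '%s and !((ip.addr eq %s and ip.addr eq %s) and (tcp.port eq %s and tcp.port eq %s))\n' \
        "$filter" "$local_ip" "$remote_ip" "$local_port" "$remote_port"
}

# Start from the rule in step 2 and knock out the first two streams from step 3.
base='(tcp.port > 50000 and tcp.port < 60000) and (ip.dst != 192.168.0.3)'
filter=$(exclude_stream "$base" 192.168.0.3 216.178.33.51 50074 80)
filter=$(exclude_stream "$filter" 192.168.0.3 74.125.95.166 50072 80)
echo "$filter"
```

Run as-is, this should print the second filter listed in step 3. Keep calling it with each new IP and port pair you decide you don't care about.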


5. The next hop (which may not be your next step) reveals an interesting header (Remember that today we're not automating anything. We're actually looking at each stream and determining if there's anything worthwhile to pursue. I promise we'll automate this in a future post so you can do it for any streaming mp3 playlist you like without busting a capillary :)

FULL STREAM CONTENT:

Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://bsalsa.com) ; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; Media Center PC 5.0)
Host: www.greatprofilemusic.com
Connection: Keep-Alive

HTTP/1.1 302 Found
Server: nginx/0.6.35
Date: Mon, 13 Apr 2009 01:37:08 GMT
Content-Type: text/html; charset=iso-8859-1
Connection: close
Location: http://www.he.playlist.com/mc/mp3player.swf?tomy=http://www.greatprofilemusic.com/mc/config/config_black_shuffle.xml&mywidth=435&myheight=270&file=http://www.greatprofilemusic.com/loadplaylist.php?playlist=6240117
Content-Length: 499

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www.he.playlist.com/mc/mp3player.swf?tomy=http://www.greatprofilemusic.com/mc/config/config_black_shuffle.xml&mywidth=435&myheight=270&file=http://www.greatprofilemusic.com/loadplaylist.php?playlist=6240117">here</a>.</p>
<hr>
<address>Apache/2.2.3 (CentOS) Server at www.greatprofilemusic.com Port 80</address>
</body></html>


That may look a little confusing, but that's just because it's a web browser and a web server talking to each other and, in much the same way people who don't speak your native tongue don't bother translating for you when conversing with one another, it doesn't matter to them if you understand what they're talking about or not.

So why is this header interesting? To break it down simply:

a. It has the 302 status displayed. This means that the page (or source) has moved, at least temporarily. It literally says this just a few lines down.

b. The new location is listed in the "Location" HTTP header, and repeated in the "a href" declaration in the HTML body just below the headers:

http://www.he.playlist.com/mc/mp3player.swf?tomy=http://www.greatprofilemusic.com/mc/config/config_black_shuffle.xml&mywidth=435&myheight=270&file=http://www.greatprofilemusic.com/loadplaylist.php?playlist=6240117


Of course, just plopping that address into your web browser's field won't get you much (you'll actually need to download the end product anyway, unless your web browser has an MP3 Player plugin), and, as noted above, we're looking for the most elementary way to get the playlist; not necessarily the most efficient or technically proficient ;)

c. If you tried to plop it in there anyway, you either got an error returned or you're using a web browser I would love you to turn me on to ;) Here is the simple derivation of the URL above that I went through to hit the actual XML version of the playlist:

http://www.he.playlist.com/players/642a/mp3player.swf?tomy=http://www.greatprofilemusic.com/mc/config/config_black_shuffle.xml&mywidth=435&myheight=270&file=http://www.greatprofilemusic.com/loadplaylist.php?playlist=6240117

http://pl.playlist.com/pl.php?playlist=6240117&time=20090412194024


and there it is in two steps!
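If you'd rather not squint at the stream dump, the hop from the 302 to the playlist URL can be roughed out in a few lines of shell. This is only a sketch under a couple of assumptions: the three function names are made up for illustration, the "time" value is treated as a plain YYYYMMDDHHMMSS stamp (which is what the one above looks like), and I'm guessing that pl.playlist.com will take a stamp built from your own clock.

```shell
# Print the first Location header value found in HTTP traffic on stdin.
location_of() {
    tr -d '\r' | awk '
        tolower($0) ~ /^location:/ {
            sub(/^[^:]*:[ \t]*/, "")   # strip the header name and padding
            print
            exit
        }'
}

# Pull the numeric playlist id out of any URL that carries playlist=NNN.
playlist_id_of() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]playlist=\([0-9][0-9]*\).*/\1/p'
}

# Build the XML playlist URL from an id. The optional second argument is the
# timestamp; it defaults to the current local time (an assumption on my part).
playlist_xml_url() {
    printf 'http://pl.playlist.com/pl.php?playlist=%s&time=%s\n' \
        "$1" "${2:-$(date +%Y%m%d%H%M%S)}"
}
```

Save the stream content to a file (stream.txt, say), and then something like playlist_xml_url "$(playlist_id_of "$(location_of < stream.txt)")" should hand you a URL of the same shape as the second one above.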

6. Now, we'll take the known-good XML playlist and use wget to print out a list of all the songs on the playlist. A simple command like this will print it to your terminal (don't forget to backslash the shell-special ampersand character):

host # wget -O - http://pl.playlist.com/pl.php?playlist=6240117\&time=20090412194024 2>&1|grep http|grep "\.mp3"|sed 's/<[^>]*>//g'
http://www.mentalsuplex.com/music/halo.mp3
http://www.fileden.com/files/2008/4/15/1868122/03-five_finger_death_punch-salvation.mp3
http://vans.edgeboss.net/download/vans/warpedtour/throwdown/throwdown_holyroller.mp3
http://www.mp3-host.com/uploads/a144a.0e2bb.mp3
http://bullyg.fatcow.com/mp3/straighthate.mp3
http://www.slowreaction.com/insane/music/sc-funeral.mp3
http://www.roadrunnerrecords.com/shared/downloads/Chimaira/Chimaira-Split.mp3
http://mp3.centurymedia.com/Skinlab_03_slavetheway_revoltingroom.mp3
...


The list is about 66 or 67 tunes long.
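The grep/grep/sed chain only works as long as every MP3 URL sits on its own line in the XML. If that ever stops being true, a slightly sturdier sketch is to pull out anything that looks like an MP3 URL, wherever it lands on the line. The function name mp3_urls is mine, and it leans on grep's -o option, which GNU and BSD grep both have but which isn't guaranteed everywhere.

```shell
# Read playlist XML on stdin and print each distinct MP3 URL once,
# in the order it first appears. (mp3_urls is a hypothetical helper name.)
mp3_urls() {
    # A URL here is "http(s)://" followed by anything that is not a tag
    # bracket, a double quote, or whitespace, ending in ".mp3".
    grep -oiE 'https?://[^<>"[:space:]]+\.mp3' |
        awk '!seen[$0]++'          # drop repeats without re-sorting
}
```

You'd feed it the same way as before: wget -q -O - with the playlist URL, piped into mp3_urls.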

7. Now, we want to grab all of these and save them so we can listen to them at our leisure. Of course, if you know your metal, you can just pick the ones you like and download those :) Note also that the XML file we're pulling these MP3 locations from contains much more information about each song, so (if you want) you can parse the file to name the downloaded MP3's with extra artist information, group them in appropriate folders, etc. That's all extra stuff that we're not going to play with today.

NOTE: The time is actually recorded in this URL. The only reason I note this is that the receiver may reject your request if the timestamp is "too" far off (probably not, but maybe). It should be noted, also, that the time is not "UNIX" time. It's simply the date and time the referrer requested the page (Approximately 7:40pm - and 24 seconds - on April 12th, 2009)

host # while read -r line;do wget "$line";done <<< "$(wget -O - http://pl.playlist.com/pl.php?playlist=6240117\&time=20090412194024 2>&1|grep http|grep "\.mp3"|sed 's/<[^>]*>//g')"
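Since not every download is going to succeed, it can be handy to have the loop tell you which ones didn't. Here's a minimal sketch of that idea. The function name fetch_all is made up, and the optional first argument (which command to run per URL) is there only so you can swap in something other than wget.

```shell
# Read URLs on stdin, try to download each one, and print the URLs that
# failed on stdout. (fetch_all is a hypothetical helper name.)
fetch_all() {
    local fetch_cmd="${1:-wget}" url
    while IFS= read -r url; do
        [ -n "$url" ] || continue                  # skip blank lines
        # wget logs to stderr, so stdout stays clean for the failure list.
        # stdin is redirected so the downloader can't eat the URL list.
        "$fetch_cmd" "$url" < /dev/null || printf '%s\n' "$url"
    done
}
```

Pipe the URL list from step 6 into fetch_all and redirect its output to a file, and you end up with a list of the tunes that still need attention.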


8. As a final test, open up any of the mp3 files you just downloaded and enjoy :) If some of them don't work, check to make sure there aren't any non-standard characters in the URL request, or that our simple expression hasn't rent some requests into chunks or not removed a few bogus lines that it should have. Remember, we're going for fast and not perfect today. Some of the direct URL's may deny you if you don't include a Referer header (yes, that's how HTTP spells it), as well. In any event, you'll get most of them!

From this straight-up experiment, there were only 7 tunes I wasn't able to grab on the first try (one of which was an invalid page).
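For the stragglers, one thing worth trying is a second pass that sends a Referer header, which wget can do with its --referer option. This is just a sketch: the function name retry_with_referer is invented, and which page a given host actually wants to see as the referer is a guess you'll have to make (the page the player was embedded in is a reasonable first try).

```shell
# Retry each URL on stdin, sending the given page as the Referer header.
# Arguments: referer URL, then (optionally) the command to run instead of wget.
# (retry_with_referer is a hypothetical helper name.)
retry_with_referer() {
    local referer="$1" fetch_cmd="${2:-wget}" url
    while IFS= read -r url; do
        [ -n "$url" ] || continue                  # skip blank lines
        "$fetch_cmd" --referer="$referer" "$url" < /dev/null ||
            echo "still failing: $url" >&2
    done
}
```

If you've saved the failed URLs in a file (failed.txt is just an example name), you'd run it as retry_with_referer followed by the referer page's URL, with the file redirected in on stdin.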

As I noted, we'll get into this more in-depth in a future post (as well as covering real streaming audio - by which I mean audio that isn't split up by song and loses its tail as the head creeps further along).

Enjoy the ROCK and, as always, cheers,

, Mike



