PodGrab – A Command Line RSS Podcast Downloader Written In Python

I’ve been using gPodder as a podcast downloader for quite a while, which is great for a desktop system like Ubuntu, but less awesome (i.e. can’t be done) for cron-jobbing on my server. What I wanted was something that could store my podcast subscriptions, run as a cron job on my server and download new episodes there whenever one came out. I had a look around on the net for an RSS-based podcast downloader for the command line and couldn’t find one. I mean, not one.
Since I’ve been meaning to learn Python for quite a while, I thought this would be the ideal opportunity to learn a bit of Python and get a useful tool out of it that I could actually use. So, without further ado, I present to you PodGrab, my Python podcast downloader.

It is very simple to use although if you’re using Windows, you may need to download some extra modules – but since this isn’t a Windows site we’ll forget about that :)

Once you’ve downloaded it, you can add a new feed URL with: –

PodGrab.py -s http://some.feed.url.xml

This will store the feed as a subscription and download the latest episode. Sometimes (as I have wanted to do in the past), you simply want to download all episodes of a particular podcast without subscribing to it. You can do this with: –

PodGrab.py -d http://some.feed.url.xml

If you want to list your current subscriptions, you can use: –

PodGrab.py -l

which will list all your current subscriptions. You can delete a subscription with: –

PodGrab.py -un http://some.feed.url.xml

The program also has the ability to mail you if you add an e-mail address, list mail addresses and such. Type

PodGrab.py -h

for a full list of command line switches. To update your podcast subscriptions when using in conjunction with cron, you simply use: –

PodGrab.py -u

and the program will check each feed for a new episode. By default, all podcast downloads are stored in a subdirectory called “Podcasts” off wherever you run PodGrab for the first time, but this is easily changed in the code if you desire. The subscriptions are all stored in a file called “PodGrab.db” which is a small SQLite database. In a future version, I may add MySQL support.
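
For anyone curious what keeping subscriptions in a small SQLite file involves, here is a minimal sketch. The table and column names (`subscriptions`, `feed`, `last_ep`) and the helper functions are assumptions made for illustration; they are not necessarily the schema PodGrab itself uses.

```python
import sqlite3


def open_db(path="PodGrab.db"):
    """Open (or create) the subscription database at the given path."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per feed URL, plus the date of the
    # last episode that was downloaded from it.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subscriptions ("
        " feed TEXT PRIMARY KEY,"
        " last_ep TEXT)"
    )
    return conn


def subscribe(conn, feed_url):
    """Store a feed URL; subscribing twice to the same feed is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO subscriptions (feed) VALUES (?)", (feed_url,)
    )
    conn.commit()


def unsubscribe(conn, feed_url):
    """Remove a feed URL from the subscriptions."""
    conn.execute("DELETE FROM subscriptions WHERE feed = ?", (feed_url,))
    conn.commit()


def list_subscriptions(conn):
    """Return all subscribed feed URLs in alphabetical order."""
    rows = conn.execute("SELECT feed FROM subscriptions ORDER BY feed")
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_db(":memory:")  # in-memory database, nothing touches disk
    subscribe(db, "http://some.feed.url.xml")
    print(list_subscriptions(db))
    db.close()
```

Because the whole database is a single file, backing up or moving the subscriptions is just a matter of copying that file.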

Anyway, you can download PodGrab here. I hope it’s useful to somebody. If you find any major bugs or have suggestions for improvements, my e-mail address is in the code. I hope you enjoy it :-)


Comments

  • Werner Avenant says:

    Thank you so much for this! I got sick and tired of the current solutions available for Linux (my biggest gripe was also the fact that none of the podcast downloaders had a schedule option).

    BashPodder looked OK, but OMG, the XML parsing is about as fast as 7 elephants… moving in 9 different directions.

    Your script is fast, gets the job done, and just works.

    I had 1 problem with it though. I could not easily listen to all the podcasts that downloaded the previous day. To fix this I added support for a “bydate” directory, and once the podcasts have downloaded, I softlink that day’s podcasts back to the original directory.

    I’ve never coded in python before, so apologies in advance if I did things the wrong way around.

    If anyone is interested in the file with “bydate” support, I uploaded it here

    http://www.mediafire.com/?tn899ts0pghai3h
    Total Comments by Werner Avenant: 6

    • jon says:

      Hey – glad that you liked it :-) I’d never coded in Python before either actually! When you say you had trouble with listening to podcasts you’d downloaded the previous day, can you explain further? Because I use it too (I wrote it because I needed it) and don’t seem to have encountered this issue – I’ve set the cronjob to update my subscriptions on a weekly basis as most of my podcasts issue new releases weekly. However, if it’s a problem based on file names or frequency of updates I’ll look into it and I’ll add your code (with your comments and credit intact of course!) to the “official” download release here in case I add any more features. Although I’m working on an Android version of PodGrab at the moment – again, first time coding Android so I’ll see how it goes and what functionality I can add but it looks totally doable.

      • Werner Avenant says:

        Oh… sorry I wasn’t clear on the “previous day” issue.

        I meant, it is just time consuming to go through each directory and find the latest podcasts. My idea is that I just want to double click a directory and it should play everything that it downloaded the previous day (or more like, at 2am).

        The changes I made to your original code keep everything exactly the same, except it now adds a directory structure looking like this:

        bydate
        -> 2011-11-06
           -> {Symlink to the actual file in the podcasts directory}
        -> 2011-11-05
           -> {More symlinks}
        podcasts
        -> This-Week-in-Google
        -> etc, etc
        PodGrab.py
        PodGrab.db

        If you have a github account you should actually post it there, as this is the most useful “Podget” command line tool out there. Otherwise I can put it there for you with a link back to your site.

        I actually want to make a change to the file I posted on mediafire. Right now the symlinks “lock you in” to a specific path; in my case it is

        @symlink = /mnt/data/Podcasts/podcasts/This-Week-in-Google/thepodcastexample.mp3

        It should actually use relative paths so that you can move everything between hard drives/servers and it will still work, i.e.

        @symlink = ../../podcasts/This-Week-in-Google/thepodcastexample.mp3
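
A minimal sketch of the relative-symlink idea, assuming the “bydate” and “podcasts” directories sit side by side as in the layout above. The function name `link_by_date` is hypothetical and is not taken from the posted code; it only illustrates how `os.path.relpath` can produce a link target such as `../../podcasts/...` instead of an absolute path.

```python
import os


def link_by_date(episode_path, bydate_root, day):
    """Create a relative symlink to episode_path inside bydate_root/day.

    Returns the path of the symlink. Calling it again for the same
    episode and day leaves the existing link untouched.
    """
    day_dir = os.path.join(bydate_root, day)
    os.makedirs(day_dir, exist_ok=True)

    link_path = os.path.join(day_dir, os.path.basename(episode_path))
    # Express the target relative to the directory holding the link, so
    # the whole tree can be moved to another drive without breaking it.
    target = os.path.relpath(episode_path, start=day_dir)

    if not os.path.lexists(link_path):
        os.symlink(target, link_path)
    return link_path
```

Called with an episode under `podcasts/This-Week-in-Google/` and the day `2011-11-06`, the link target works out to two levels up and back down into `podcasts`, which is the relative form shown above.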

        • jon says:

          Hmm… interesting usability issue I hadn’t really thought about. Based on your idea, I think I shall try to add an option to auto-symlink (based on a command-line switch), so that it’ll create a directory called “latest” or something and symlink everything new there, relative to the location of the main script. So rather than a separate directory for each date, you’ll have a single “summary” or “latest” directory holding the last “batch” of podcasts it downloaded. I shall attempt to use symlinks for this, else you’ll have two copies of everything that is new, the assumption being that you’d still want every episode of each subscription in one directory, especially if it’s a series of podcasts you want to keep. I’ve actually got another bash script that periodically wipes podcasts that are older than a certain date, as a lot of the podcasts I download are current events things and if I’ve not listened to them after, say, three months then the information they impart would be out of date anyway, aside from some drama podcasts that I’m keeping as a set.
          I’ll have a look at your code and see if I can adapt it to what I’m thinking of. Don’t worry, you’ll get credit in the source even if the code I end up using is radically different to yours – it was your idea, after all :-)

          I don’t have a github account, but I kinda want to keep everything in one place on my own server and I’d like to have one place for people to get my stuff and point people at The Node rather than GitHub – but that’s just me being all OCD and I never really thought of PodGrab as something that people might actually be interested in enough to go to a dedicated place for it – I only wrote it because nobody else seemed to have done so! You’re welcome to branch the code off in any way you see fit of course – that being the nature of open source. Let me think about it for a bit, but you’ll get credit for any improvements based on this issue in the code notes regardless and a mention in the infrequent update posts on The Node that I post whenever I throw up a new version. Acceptable to you?

  • jon says:

    I might even add a direct link to PodGrab on the right hand side, which currently only has link sections to external sites that I find interesting or useful. Perhaps I should add a category for my own stuff as well :-)

  • Werner Avenant says:

    Actually, I’m going about this the wrong way. Symlinks are great and all that, but they will cut out people on Windows who cannot create symlinks.

    I had a look at the M3U file format, which can be used by most MP3 players. Good news: it supports “a local pathname relative to the M3U file location”, as per http://en.wikipedia.org/wiki/M3U

    I went ahead and rewrote my changes to rather work on the M3U format. To pull this off I had to refactor your code a bit (basically I needed current_directory to be a global variable, and I needed the channel_title to be passed to write_podcast instead of the full channel_dir).

    You can view the code here: http://pastebin.com/48NN8DPh
    To download: http://pastebin.com/raw.php?i=48NN8DPh

    If I run the code now it leaves your directory structure exactly as you had it (I like the structure you chose); it only creates an M3U file in the same directory as PodGrab.py.

    For example, today’s run produces:

    2011-10-06.m3u

    With the contents:

    podcasts/Stuff-You-Should-Know/201110062011-10-06-sysk-peace-corps.mp3
    podcasts/The-Startup-Success-Podcast/20111006StartupSuccess122.mp3
    podcasts/The-Force-Field-Podcast/20111006TheForceField53.mp3
    podcasts/60-Second-Idea-to-Improve-the-World/20111006forum60sec20110917-0900a.mp3
    podcasts/World-Business-Report/20111006wbnews20111006-1906a.mp3

    It works perfectly in Audacious and VLC media player. Tomorrow morning when I have my coffee I can simply click “2011-10-07.m3u” and my Audacious will play all the podcasts the cronjob downloaded at 2am.
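
The playlist approach can be sketched roughly as follows. This is not the code from the pastebin link; `append_to_playlist` and its parameters are hypothetical names, and the sketch assumes the playlist lives in the same base directory that contains the “podcasts” folder, so every entry can be written relative to the playlist itself.

```python
import os


def append_to_playlist(base_dir, day, episode_paths):
    """Append episodes to base_dir/<day>.m3u, one relative path per line.

    Returns the path of the playlist file.
    """
    playlist_path = os.path.join(base_dir, day + ".m3u")
    # Append mode, so several feeds updated in the same run end up in
    # the same playlist for that day.
    with open(playlist_path, "a", encoding="utf-8") as playlist:
        for path in episode_paths:
            # A plain M3U file is just a list of paths; writing them
            # relative to the playlist keeps the whole tree portable.
            playlist.write(os.path.relpath(path, start=base_dir) + "\n")
    return playlist_path
```

Since the entries are relative, copying the base directory to another machine or drive keeps the playlist usable, which is the portability argument made above.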

    Hope I didn’t screw up the python too much! I must say it is far less complicated than what I originally thought it would be!

  • Werner Avenant says:

    Oops sorry I only saw this now “… but you’ll get credit for any improvements based on this issue in the code notes regardless and a mention in the infrequent update posts on The Node that I post whenever I throw up a new version. Acceptable to you?…”

    It’s really not a problem. I simply added the date and name comment on the line before each code change as it gives you an easy way to spot where I fiddled with the code. You’re more than welcome to remove all those comments once you are done, as they are really only meant for you.

    I’m just glad I finally have a command line Podcast downloader that actually works (and exactly the way I want it to work)! Thank you once again for publishing this!

  • jon says:

    Why, thank you for the praise. I’ve learnt most of the things I know from either books or blogs, so…y’know…share and share alike :-)
    As for not using symlinks because of cutting out Windows users…well. This isn’t really a Windows site, although I have mentioned tricks with Windows in connection with things like Samba in the past. The thing is, I really just didn’t see the need for PodGrab to run on Windows. 99% of users barely touch the command line under Windows because the OS really isn’t geared that way, even at the enterprise level and certainly not on the desktop, at least for most users. Also, I figured there have to be some very capable programs such as gPodder that work just fine for the environment they’re in, which is bound to the GUI and a desktop one at that. And although Python runs on Windows, you still need to download certain module installers like numpy and install them yourself even after installing Python. Which is a hell of a bigger hassle than using a distro’s software repository via yum or apt-get, because Windows simply doesn’t have that kind of functionality. I gather Windows 8 might, but it’s more likely to be something like iTunes than an online repository of necessary software libraries and drivers.
    Basically, I just didn’t see Windows users burning for that kind of command line tool based on their usual environment. If you want to adapt the code to be Windows-friendly, however, be my guest :-) As it is, the next version of PodGrab will run with either SQLite or MySQL as options to store subscriptions. Or maybe I’ll have two versions or a conf file. Either way, I’ve so much other data in MySQL, I may as well add the ability to store my podcast subscriptions there too.
    Plus something along the lines of the issue you’ve raised, natch.

  • Werner Avenant says:

    Hey Jon.

    All valid points. Keep in mind that I’m just as much of a M$ hater as you (if you look at your logs you’ll see I’m coming from an Ubuntu machine). I love Linux.

    I just figured, both symlinks and M3U files solve the problem perfectly. Both symlinks and M3U files stop you from having to go hunt down which Podcasts downloaded during the previous session.

    What I do like about the M3U solution is that the whole thing is portable across systems. Like you, I don’t really care if PodGrab works in Windows; I’m booted into Linux 98% of the time. But when I play games I have to boot into Windows. The M3U solution allows me to listen to that day’s podcasts while gaming without a problem. Multitasking! :)

    I can also, for example, back up the podcasts for the last week to a USB drive (FAT32 ugh!) using “find -mtime” and give it to a buddy of mine (who is on a Windows machine).

    But like you mentioned, this is exactly the spirit and drive behind open source. It’s all about many people coming together and tinkering with code based on their own ideas. Those collective ideas improve the overall product. I’m pretty sure x % of the people using this will want symlinks only, while y % of them will want M3U files, while z % won’t need this at all.

    If you decide to go the symlink route, would it be OK if I fork this code on github? I’ll link back to this blog, the bonus for you is that a PageRank 7 (github) will link here, boosting your SEO rankings a bit.

    Otherwise thank you once again for solving a problem that has been bugging me for more than 3 years now (no jokes!)

  • jon says:

    Agreed, an M3U file would be more elegant than symlinks, although symlinks are obviously a quicker solution :D Okay, you’ve convinced me – I’ll look into adding M3U playlist file support for the latest downloads. This may or may not be possible under Python – I don’t know because you seem to have done more research on the matter than me thus far!

    Basically, though, as it says in the code – throw it to the four winds if you like and put it on GitHub or wherever, just as long as the commented stuff at the top that I’ve added remains untouched and you link back here. If you make any changes yourself you will obviously be at liberty to add your name and any features you’ve added to the comments at the top.
    M3U support then, if it’s possible under Python and a filthy symlink relative path hack for Linux users if it’s not :-)

  • jon says:

    Actually, I just saw that you’ve already added M3U support in one of your earlier posts. I’ll download the code and have a look. If it’s fine (which I’m sure it is), I’ll roll your changes into the official download and credit you. The GitHub option remains your call.

  • Werner Avenant says:

    Hi Jon

    Please don’t be mad at me, 3 days ago I didn’t know anything about python and after fiddling with the code I had so much fun I just dove in and went crazy. This was very much a “learn Python exercise” and I fully admit that I was waayyy too over-eager. Like I mentioned before, I’ve been looking for something like this for 3 years now and my excitement got the better of me. Please feel free to ignore any and all changes I made.

    I decided to go the github route as it is a win-win situation for everyone and will give your blog some extra exposure. I also like the web based diff feature, it makes it easy to see what changed.

    I noticed something was up with the “last_ep” detection functions. It wasn’t immediately obvious because write_podcast would return if the file existed. I rewrote some of the “last_ep” logic. The biggest change is that last_ep is never “NULL”. If last_ep does not exist I save it as “Thu, 01 Jan 1970 00:00:00”. It makes date comparisons much easier.
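
To illustrate why an epoch default simplifies the comparison, here is a small sketch. It is not the code from the GitHub commit; the function names and the choice of `email.utils.parsedate_to_datetime` for parsing RSS-style dates are assumptions made for this example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Stand-in stored when a feed has never been downloaded before.
EPOCH = "Thu, 01 Jan 1970 00:00:00"


def parse_pub_date(text):
    """Parse an RSS-style date string; a missing timezone is read as UTC."""
    parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_new_episode(pub_date, last_ep=None):
    """Return True if pub_date is later than the stored last_ep date.

    With the epoch as the fallback there is no special case for feeds
    without a last_ep: every real episode is newer than 1970.
    """
    return parse_pub_date(pub_date) > parse_pub_date(last_ep or EPOCH)


if __name__ == "__main__":
    print(is_new_episode("Thu, 06 Oct 2011 02:00:00 +0000"))
    print(is_new_episode("Thu, 06 Oct 2011 02:00:00 +0000",
                         "Fri, 07 Oct 2011 02:00:00 GMT"))
```

The point of the design is that one comparison covers both cases, a brand new subscription and an existing one, without checking for NULL first.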

    I also refactored the iterate_channel code and moved “MODE_DOWNLOAD” to be part of the loop. The effect of this is that MODE_DOWNLOAD can “catch up” on podcasts that it missed. Some people might not want this, and the best route will be to add a command-line switch to turn “catch-up” on or off. This will give me the opportunity to play with that command line argument parser thing python offers (which looks very very nice BTW!)

    I also changed write_podcast to return “Successful Write” or “File Existed”. That allows the last_ep in the database to be updated if the file existed. This is useful if you move over from another podcast downloading system and you want Podgrab to continue where the other system left off. It also helps when you for some reason manually downloaded an episode.

    For a diff of the stuff I changed:

    https://github.com/nightowl77/PodGrab/commit/eecbfcbe577dd188b93b0b952eed883281a5e93d#PodGrab.py

    You can check the full commit history here:

    https://github.com/nightowl77/PodGrab/commits/master/PodGrab.py

    And the actual file

    https://github.com/nightowl77/PodGrab/blob/master/PodGrab.py

    Like I said, I was just having fun, you really don’t have to incorporate any changes I made, I’m just telling you about them in case you spot something that you like.

    • jon says:

      lol – I’m not annoyed in the least :) The code is public domain and isn’t especially revolutionary – it just fills a need. Cheers for setting up the GitHub thing. I’m looking into the code and the M3U file stuff anyway.

  • Mary says:

    Hi
    I found your awesome CLI podcatcher … I know this is a bit of a newb question, but how do I get the email function
    to work? I know I need to set up something, just not sure what.

    Many thanx in advance

    Mary

  • Victor says:

    Just wanted to drop a line thanking you for this awesome tool. PodGrab is the embodiment of the UNIX philosophy “Do one thing and do it well”.

    • jon says:

      Thank you very much. There is also a fork on GitHub somewhere. I don’t know what they’ve added since I released the code. This version has all the functionality I need so I let somebody else maintain it :D

  • Peter says:

    Hello,

    first of all, thanks for this script. But I figured out that it’s not downloading only the newest episode.
    For example, yesterday I downloaded “a_podcast.mp3”, so the filename is “20130711a_podcast.mp3”, and today it will be downloaded again as “20130712a_podcast.mp3”.

    can there be a check for this?

    Thanks :)
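
One possible check for the repeat download described above, as a rough sketch only. It assumes the date prefix is always eight digits (YYYYMMDD), as in the filenames quoted in these comments; the helper `already_downloaded` is hypothetical and is not part of PodGrab.

```python
import os
import re

# Eight leading digits, e.g. the "20130711" in "20130711a_podcast.mp3".
DATE_PREFIX = re.compile(r"^\d{8}")


def already_downloaded(channel_dir, original_name):
    """Return True if channel_dir already holds original_name under any date prefix."""
    if not os.path.isdir(channel_dir):
        return False
    for existing in os.listdir(channel_dir):
        # Strip only the first date prefix, then compare with the name
        # the feed uses for the episode file.
        if DATE_PREFIX.sub("", existing, count=1) == original_name:
            return True
    return False
```

A downloader could call this before fetching an episode and skip the download when it returns True, so the same file is not saved again under a new date.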

  • Stefan says:

    Hi,

    first of all I would like to say, your program is really awesome. I use it together with a Raspberry Pi to download some audio and video podcasts to watch on TV.

    But I have some difficulties. When the podcast feed URL does not end in .xml, PodGrab is not able to download the latest podcast.

    Can you please add this feature to your program?

    Thx in advance.

  • constantijnArm says:

    Thank you very much. Great script works perfectly on my raspi.

    cheers

  • orangeek says:

    love your script. Just found it and works like a charm :)
    I’m actually using it to download torrent files from showrss.info and then upload to a remote seedbox. :)
    only hiccup I have is that the torrent files cannot be renamed based on the rss entry (so, torrent file is: 21088029182190821.torrent, while the entry in the RSS is ‘big bang theory s08e10’). the python script always uses the name of the torrent file instead of the RSS entry.
    regardless, i love it :)

  • Pingback: Python 3: retrieving podcasts with feedburner | spacemishka

  • Karl says:

    Came across this while looking for a way to get a Raspberry Pi (on Raspbian) to auto-download authenticated podcasts (i.e. paid for, using username & password). However, it looks like I’ll have to continue my search to achieve this. This works fine for non-authenticated podcasts though!

    • jon says:

      Yeah, sorry. I didn’t know there was such a thing when I wrote it :) It’s no longer under active development by me but it’s on GitHub if you fancy adding to it.

  • John says:

    Thank you! This is almost exactly what I have been looking for.
    How can I have the script download the last 3 episodes?
    You see I listen to a radio show that is three hours long and I’d kind of like to download all three hours of a show.

    • jon says:

      PodGrab is not under active development at the moment. If you want to modify it yourself, the code is open.

Comments are closed.