Friday, April 6, 2007

Help needed on Twitter

This is a post about my continued obsession with Twitter. If you aren't enamored with the product, bail now. Even if you are, understand that this post is of a technical nature. Hence, I'm hiding it behind the "more" link.



Background

A few days ago I mentioned the "@[username]" convention and lamented the fact that this wasn't a direct message. But perhaps it shouldn't be, as many have commented, because they enjoy the "public" exchange that two folks can have using this convention. I'll concede that.

I've also complained that there isn't an easy way to search through Tweets based on keywords. Unrelated as those two may seem, I'll join them together with deft skill in a moment.

Side Panels

For the purposes of this discussion, there are three different "feeds" you can get from Twitter:

  • The feed of Tweets from your personal list of Friends and Followers
  • The public timeline feed
  • A feed of Tweets from any given user of Twitter (assuming they make it public)


I'm less interested in the last of these, though the discussions here should be applicable to that instance as well.

RSS feeds can be accessed and added to a feed reader for any of them, but that's problematic (at least for me) for one simple reason: any feed from Twitter shows a snapshot in time of the 20 most recent items. My feed reader is set to poll feeds every half hour. Tweets don't stay on my F&F feed or the public timeline feed for much more than 10 minutes. Ergo, I miss posts.
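The arithmetic behind that is simple enough to sketch. Here is a minimal Python version, assuming updates arrive at a steady rate (real traffic is burstier). The function names are hypothetical; the inputs are the numbers from the paragraph above: 20 items that last about 10 minutes, which implies roughly 2 updates per minute, polled every 30 minutes.

```python
def survival_minutes(feed_size: int, updates_per_minute: float) -> float:
    """How long an item stays in a fixed-size "most recent N" feed,
    assuming updates arrive at a steady rate."""
    return feed_size / updates_per_minute


def missed_fraction(feed_size: int, updates_per_minute: float,
                    poll_minutes: float) -> float:
    """Fraction of items that scroll off between polls (steady-rate assumption)."""
    window = survival_minutes(feed_size, updates_per_minute)
    if poll_minutes <= window:
        return 0.0  # polling faster than items fall off: nothing is lost
    return 1.0 - window / poll_minutes


# 20 items lasting ~10 minutes means ~2 updates per minute; poll every 30 minutes.
print(survival_minutes(20, 2.0))               # 10.0
print(round(missed_fraction(20, 2.0, 30), 3))  # 0.667
```

Under those assumptions, a half-hour polling interval misses about two out of every three Tweets, and anything under 10 minutes would catch them all.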

Gardening Implements

The whole "@[username]" convention isn't as universal as you might think. I've seen messages directed at me with the following prefixes:

  • @evo_terra (this one is correct)
  • @evo
  • @Evo
  • @Evo Terra
  • @evoterra
  • @ evo_terra
  • @ evo
  • @ Evo
  • @ Evo Terra
  • @ evoterra
  • ... and there are probably some I've missed. Ah, the struggles of consistent data.
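A single case-insensitive regular expression can catch every variant in that list. This is a hypothetical pattern written only for the names above, not anything Twitter provides: it allows an optional space after the "@", any capitalization, and "evo" optionally followed by "terra" (joined by nothing, an underscore, or a space). The lookbehind skips email-like text, and the trailing word boundary keeps longer names from matching.

```python
import re

# Hypothetical pattern covering the variants listed above.
# (?<!\w)  -> "@" must not follow a letter/digit, so "me@evo.com" is skipped
# \s*      -> tolerate "@ evo" as well as "@evo"
# [_ ]?    -> "evo_terra", "evo terra", or "evoterra"
# \b       -> reject longer names such as "@evolution"
MENTION = re.compile(r"(?<!\w)@\s*evo(?:[_ ]?terra)?\b", re.IGNORECASE)


def mentions_me(tweet: str) -> bool:
    """True if the tweet appears to be addressed to evo_terra."""
    return MENTION.search(tweet) is not None


variants = ["@evo_terra", "@evo", "@Evo", "@Evo Terra", "@evoterra",
            "@ evo_terra", "@ evo", "@ Evo", "@ Evo Terra", "@ evoterra"]
print(all(mentions_me(v + " nice show!") for v in variants))  # True
print(mentions_me("@evolution is slow"))                      # False
```

Any variant that isn't listed (the ones I've probably missed) would need its own branch in the pattern.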


Pegboard

I don't want to read every Tweet on Twitter. Can't. Don't care who you are. And I don't even try. But I sure would like to get a comprehensive list of all the times someone said something to me that day (see above). No, I don't feel obligated to respond -- but I want to make that choice.

I played around with Yahoo! Pipes to try and get me where I want to go, but I'm not happy with the results for the following reasons:

  1. The Pipe is still a snapshot in time and doesn't dump out to a file. Hence, I lose Tweets.
  2. It doesn't work. At all. :(

So I'm betting one of you out there can solve my problems above. The Pipe is publicly available and I would really appreciate someone helping me tweak it. Less helping and more making it work. But don't forget about my "slice of time" issue. I want a 24 hour rundown of all the Tweets matching the parameters I chose (they are all those names up top). Do I get it emailed to me? Do I subscribe via RSS and it just builds up over time? Usability engineers are welcome to chime in as well.
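One way around the "slice of time" problem is to poll often and merge each snapshot into a running store keyed by Tweet ID, so overlapping snapshots don't produce duplicates and anything older than 24 hours drops out. The sketch below is a hypothetical helper, not a feature of Pipes or Twitter; it assumes each Tweet arrives as an (id, timestamp, text) tuple with a unique id. Whether the rundown then gets emailed or published as RSS is a separate choice.

```python
from datetime import datetime, timedelta


class RollingDigest:
    """Accumulates tweets from repeated feed snapshots, keeping a 24-hour window.

    Assumes each tweet is an (id, timestamp, text) tuple and ids are unique,
    so overlapping snapshots can be merged by id.
    """

    def __init__(self, window: timedelta = timedelta(hours=24)):
        self.window = window
        self.tweets = {}  # id -> (timestamp, text)

    def add_snapshot(self, snapshot, now: datetime) -> None:
        for tweet_id, stamp, text in snapshot:
            self.tweets.setdefault(tweet_id, (stamp, text))  # ignore repeats
        cutoff = now - self.window
        # Drop anything that has aged out of the window.
        self.tweets = {i: t for i, t in self.tweets.items() if t[0] >= cutoff}

    def rundown(self):
        """Oldest-first list of (timestamp, text) for the current window."""
        return sorted(self.tweets.values())


start = datetime(2007, 4, 6, 9, 0)
digest = RollingDigest()
digest.add_snapshot([(1, start, "@evo hi"), (2, start, "@ Evo Terra yo")], start)
# The next poll overlaps the first: tweet 2 shows up again.
later = start + timedelta(minutes=5)
digest.add_snapshot([(2, start, "@ Evo Terra yo"), (3, later, "@evoterra hey")], later)
print(len(digest.rundown()))  # 3
```

The store only helps if polls come faster than Tweets fall off the 20-item feed; merging can't recover items that scrolled away between polls.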

6 comments:

  1. Heya - you don't know me from Adam, but I've been listening to a handful of your podcasts for a year or so now and have dropped comments here or there. Also, I've written a book on Hacking RSS and Atom. (That's the title.)

    Since Twitter doesn't really (yet?) have a way to send out notifications based on arbitrary keywords, what's probably the only solution is to script up a feed fetcher that polls the Twitter feed at a rate of once every few seconds to catch updates before they fall off the end of the feed. The rate is entirely dependent upon just how much usage Twitter gets as a whole and how fast updates travel through that 20 update window.

    The problem with that, though, is that Y! Pipes for sure won't do that for you, and the folks at Twitter itself might notice the huge number of hits (as many as 20-80k a day for a hit every few seconds) from a single address and decide to block access for suspicion of launching a denial of service attack.

    That all said, though - once you do have a constant stream of tweets flowing by, it's not so hard to match on keyword or whatnot and then deliver the results in pretty much any form you want.

  2. Actually, now that I look at the feed, it's worse than I thought. The Twitter public timeline feed is *not* a live firehose - it seems only to change every 60 seconds or so, from a rough check I just did. So, if you fetch it repeatedly over the course of a minute or so, it never changes.

    So, at worst, that means it's just a snapshot of the last 20 updates from the last 60 seconds - possibly missing a bunch. At best, maybe it means they save up 20 updates before changing the feed, and there just happen to be about 20 updates a minute from the world. (Though, somehow, I doubt that last one.)

    What this means is that, fetching the feed once a minute, you'd have a good chance of catching some tweets with your keywords but would be likely to miss a bunch if the site gets more than 20 updates a minute.

  3. I'm not sure you can do exactly what you want, but using the twitter API you might be able to at least pull your follower list, and then pull the feeds of your followers (I'm assuming that people who don't follow you won't address tweets to you).

    You can still only pull 20 hits per person, but it's probably a bit closer to what you want.

    Jay

  4. @Jason: Oh, hey, now that's a good idea! Don't drink from the firehose, but from the followers' feeds. You won't get the world that way, but you'll be very likely to get most of the part that cares about you.

  5. Yes, the Twitter API leaves much to be desired. The 20 updates per minute is very limiting. For instance, the twittermap misses many updates because there are rarely 20 or fewer updates per minute on the public timeline.

    Seems like the only way to do it within the current API is to use HTML instead. You can get up to 24 hours' worth of tweets if you grab http://twitter.com/user/with_friends (Basic HTTP Auth works, letting you see friends whose tweets aren't public). You'd have to parse it all out (not as easy or clean as XML/RSS/etc.) and watch for the "next" links and get/parse the subsequent pages, but at least the info is available...

    I tried page=2 on various calls in the official API but it doesn't work anywhere.

    If this seems like the way to go, I could write a fairly quick parser in perl that would take the html and output something a little more manageable. Perhaps with a date param so we don't have to go back more pages than we have to.

    Vern

  6. I just got pointed to this post by Froosh on Twitter. I am the author of TwitBox, a Windows client for Twitter, and I am currently working on new features that I can pull from the API (like search for user) to provide enhancements within TwitBox itself and make using Twitter a little more personable.

    One of the things I will be working on may not fully meet your needs for "attention," but it might help. Basically, it will allow you to define color codes for tweet priorities.

    In the meantime, I'll keep an eye on this comment thread in case any suggestions come up that I think I can add.

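The polling costs raised in the comments (hammering the public timeline versus polling each follower's feed) can be compared with a little arithmetic. This is a sketch; the 50-follower and 10-minute figures in the last line are hypothetical, chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(interval_seconds: float, feeds: int = 1) -> float:
    """Requests a poller makes in a day when hitting `feeds` feeds
    once every `interval_seconds` seconds."""
    return feeds * SECONDS_PER_DAY / interval_seconds


print(requests_per_day(1))              # 86400.0 (public timeline, every second)
print(requests_per_day(4))              # 21600.0 (every 4 seconds)
print(requests_per_day(60))             # 1440.0 (once a minute, matching the ~60 s refresh)
print(requests_per_day(600, feeds=50))  # 7200.0 (hypothetical: 50 followers, every 10 minutes)
```

A hit every one to four seconds lands in the 20-80k range from the first comment. Since the public feed only seems to change about once a minute, polling faster than that buys nothing.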