Educer: On pubsubhubbub (Part 2) – Get with it, PuSH, you’re supposed to be realtime.

  • Amos Jones · 2 months ago
    I have the opposite problem. The "official" rssCloud server goes down and misses updates.

    I guess there are rough edges around all new servers.
  • jeremy · 2 months ago
    The two circumstances are different. There are always rough edges, but....

    The rpc.rsscloud.org server you are probably referring to is maintained by Dave with no promise of uptime (correct me if I'm wrong, obviously) as a place to test your implementation. It is possible that the server can be rebooted for changes at any time. No big company is providing a constant connection here.

    The WP plugin for rssCloud creates a server on every blog that installs it. Problems have been few and far between with this.

    The pubsubhubbub server hosted by Google has been pushed into every blogger feed and implemented heavily in FeedBurner feeds by Google. It pushes updates from multiple IPs, indicating a network of hubs that they are using to guarantee uptime. By doing this, they have told me that they are ready for real time.
  • Amos Jones · 2 months ago
    The "official" PubSubHubBub server that you are probably referring to is the server written by Brett Slatkin and deployed to Google App Engine. You can find the source for the server at the PubSubHubBub website.

    A quick browse of the source will show you that it's a simple app. There's no "network of hubs".

    All App Engine applications use multiple IP addresses for connections. The number of IP addresses used by these apps should not be used to infer a guarantee of uptime.

    By the way, an application running on App Engine cannot subscribe to rssCloud because these applications do not use the same IP address for inbound and outbound connections.
  • jeremy · 2 months ago
    Yes, referring to the official PubSubHubBub server that Google employee Brett Slatkin wrote and deployed. The one that Google then Pushed to their Blogger feeds.

    Got it, no multiple hubs. But by using multiple IPs from the AppEngine network, a distributed network and uptime are inferred.

    My point is - if you're going to push yourself into millions of feeds as a solution, then you are ready to go.

    BTW, I'm not arguing against PubSubHubbub here. I want it to work. That's all that I'm getting at.
  • Amos Jones · 2 months ago
    All applications running on App Engine are forced to use multiple IP addresses.

    You should not infer that applications running on App Engine have a guarantee of distribution or uptime.

    Recent blog posts from the App Engine team indicate that applications run in a single data center at a time. The apps are single homed, not distributed across multiple data centers.

    Plugging the hub into many feeds seems like a great way to bootstrap and test the realtime cloud, even if there are some rough edges.
  • jeremy · 2 months ago
    Cool, I'm not a big AppEngine guy. Not trying to argue the architecture. My perception is that it's stable and stays up.

    Plugging the hub into many feeds is a great way to test. But, two things.

    1) When you're big like Google and you announce the implementation, the perception given to me, the developer, is that you're ready.

    2) When I, the developer, do start using it, I'll get a perception on how it's working and I'll share it.

    If others have details on how it's working for them, I'd love to share examples. I'm having fun working with both rssCloud and PubSubHubBub and coding > arguing.
  • Amos Jones · 2 months ago
    Google did not make any announcements about Brett Slatkin's hub. It's not a Google product.

    There are rough edges on all the work that's going on. We should cut everybody some slack including Brett Slatkin.
  • jeremy · 2 months ago
    Google did make announcements though.

    http://adsenseforfeeds.blogspot.com/2009/07/wha... - Google announces PubSubHubBub support in FeedBurner feeds for AdSense, notifying a "Google-run Hub".

    http://googlereader.blogspot.com/2009/08/pubsub... - Google announces PubSubHubBub support for shared items in Reader.

    http://buzz.blogger.com/2009/08/blogger-joins-h... - "All blog post feeds now contain a "hub" element, and will ping Google's hub on every post update."

    http://googlecode.blogspot.com/2009/08/towards-... - "we have gone a step further and added PubSubHubbub support to Google Alerts."

    I'm not trying to push any blame on Brett. Google owns this now and should help any issues along. I've written about some of the issues I'm seeing with Google's PubSubHubBub Hub.

    Constructive discussion about the issues I've been seeing is definitely welcome.
  • Brett Slatkin · 2 months ago
    Hey Jeremy, Thanks for reporting your experiences so far. How long was the sample period for your testing? I wonder what your results would be over the course of a week. It would be great to see some more data on end-to-end latency, retry attempts, duplicate deliveries, bandwidth, etc, especially if it were broken down by feed type.

    Otherwise, what is your subscriber's average latency for handling notifications? The reference Hub is defensive about delivering to subscribers that track many feeds and are slow to respond. So, if you're taking over 5 seconds, you may see slowdowns. It's best to process incoming notifications asynchronously if you can.
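
A minimal sketch of the asynchronous handling suggested above, assuming a single-process subscriber: the callback only enqueues the payload and acknowledges, and a background thread does the slow work. The names `receive_notification`, `process`, `pending`, and `processed` are hypothetical, and the in-memory queue stands in for whatever durable queue a real subscriber would use.

```python
import queue
import threading

# Hypothetical in-process queue; a real subscriber might prefer a durable one.
pending = queue.Queue()
processed = []  # stands in for a database or index


def receive_notification(body: bytes) -> int:
    """Callback-endpoint logic: enqueue the payload and acknowledge at once."""
    pending.put(body)
    return 204  # a 2xx status tells the hub the delivery succeeded


def process(body: bytes) -> None:
    """Placeholder for slow work such as parsing the feed and storing entries."""
    processed.append(body.decode("utf-8"))


def worker() -> None:
    """Drain the queue in the background so the callback never blocks."""
    while True:
        body = pending.get()
        try:
            process(body)
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The point of the design is that the response time the hub sees depends only on the enqueue, not on parsing or storage, which should keep it well under the 5 seconds mentioned above.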
  • jeremy · 2 months ago
    Hi Brett, thanks for stopping by.

    This comment stream aside, the original post was written more as my perception than science. Rereading, it's a pretty unorganized perception. Ahh, late nights. There's more to the rambling, but if you come away with one thing from the above, it's that I don't see the FeedBurner stuff being real time as I thought it would.

    I'll be grinding through the data more closely as the week goes on. The initial conclusions are based on a snapshot look at the initial 24 hours or so of use.

    I can't answer to the latency yet, but I also can't imagine it being too high. Not a perfect answer, I know, but the server is on Amazon's EC2 and overall latency (network and system) seems low. Almost the only traffic coming in is from rssCloud and PubSubHubBub notifications. From watching Dave's rssCloud log (light pings), the time posted is usually less than .300 seconds.

    The feeds that I've noticed the most issues with are from FeedBurner. My guess is that the delay and re-pushes are due to the ping scheduling between publisher->FeedBurner->PuSH. Once publishers start pinging directly to the hub instead of relying on a middle man, I would think that these issues clear up. See previous post directed at publishers.

    The feeds that I've noticed the best response with are from Google Reader shared items. Again, perception, but things seem to run pretty smoothly here.

    My generated twitter-link feeds seem to be sporadic when done in quick succession. @mmastrac pointed out after I posted last night that this could be a "race" between the feed writing and the hub reading if things are happening quickly enough. I still need to explore that.

    You've given me a bunch of stuff to look at. I'll do what I can to start logging and parsing all of it and then provide the results. Hopefully I find a few problems with my code to fix along the way. :)
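
One way the logging mentioned above could be summarized, as a rough sketch only: it assumes each received notification is logged as a tuple of `(entry_id, published_at, received_at)` with times in seconds. That record layout and the `summarize` function are assumptions for illustration, not anything either protocol defines.

```python
from collections import Counter
from statistics import median


def summarize(deliveries):
    """Summarize logged deliveries of (entry_id, published_at, received_at)."""
    first_latency = {}
    counts = Counter()
    for entry_id, published_at, received_at in deliveries:
        counts[entry_id] += 1
        latency = received_at - published_at
        # Keep the earliest delivery of each entry as its end-to-end latency.
        if entry_id not in first_latency or latency < first_latency[entry_id]:
            first_latency[entry_id] = latency
    return {
        "entries": len(counts),
        # Every delivery beyond the first for an entry counts as a duplicate.
        "duplicate_deliveries": sum(n - 1 for n in counts.values()),
        "median_latency_s": median(first_latency.values()) if first_latency else None,
    }
```

Running it separately over the log lines for each feed type (FeedBurner, Reader shared items, and so on) would give the per-feed breakdown asked about earlier in the thread.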
  • Julien · 2 months ago
    It's definitely certain that both technologies have rough edges, but the future is the end of polling :) BTW, you should check out http://superfeedr.com for mystatuscloud, as I am pretty sure it will be very useful for you!