Twitter API - pagination and IDs
Looking for some Twitter API help. Bit of a geeky post, this...
Pagination is the act of splitting data into logical pages. Suppose I had a list of items, numbered 0 - 99. If I want 20 items per page, it's trivial to see that pagination looks like:
p1 = 0-19 p2 = 20-39 p3 = 40-59 p4 = 60-79 p5 = 80-99
If I wanted to start at, say, item 55 - pagination would look like:
p1 = 55-74 p2 = 75-94 p3 = 95-99
Easy, right? So why am I telling you this?
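Here is a minimal Python sketch of that page arithmetic. The function name paginate and its arguments are made up for illustration; it simply splits an inclusive range of item numbers into pages.

```python
def paginate(first, last, per_page):
    """Split the inclusive range first..last into (start, end) pages."""
    pages = []
    start = first
    while start <= last:
        # The last page may be shorter than per_page.
        end = min(start + per_page - 1, last)
        pages.append((start, end))
        start = end + 1
    return pages


print(paginate(0, 99, 20))
# [(0, 19), (20, 39), (40, 59), (60, 79), (80, 99)]
print(paginate(55, 99, 20))
# [(55, 74), (75, 94), (95, 99)]
```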
Twitter Timeline
Imagine that those items are Twitter status IDs. Each one represents a tweet in your timeline.
Twitter will allow us to "page" back and forth through our timeline. If we say status ID 80 is the most recent post in our timeline, and we want to see 20 tweets at a time, pagination would look like this.
p1 = 80-60 p2 = 60-40 ... etc.
Normally, that would be fine.
The only issue is that friends are posting all the time. Imagine we start with tweets 80-60. We go to page 2, but in the meantime, 5 new tweets have been made.
p1 = 80-60 p2 = 65-45
The user sees 5 tweets she has already read. Not desirable.
If 20 tweets had been made before clicking on the "next" button, this is what happens.
p1 = 80-60 p2 = 80-60
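The drift is easy to reproduce with a toy timeline. The sketch below is a simulation, not a real API call: get_page is a hypothetical helper that slices a newest-first list by page number. It uses strict pages of 20, so page 1 is 80-61 rather than the looser "80-60" written above.

```python
def get_page(timeline, page, per_page=20):
    """Return one page of a newest-first timeline, addressed by page number."""
    start = (page - 1) * per_page
    return timeline[start:start + per_page]


# Toy timeline: status IDs 80 down to 1, newest first.
timeline = list(range(80, 0, -1))
page1 = get_page(timeline, 1)  # IDs 80..61

# Five new tweets (85..81) arrive before the user clicks "next".
timeline = list(range(85, 80, -1)) + timeline
page2 = get_page(timeline, 2)  # IDs 65..46

repeats = sorted(set(page1) & set(page2), reverse=True)
print(repeats)  # [65, 64, 63, 62, 61]
```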
Max_ID To The Rescue (AKA, the easy bit)
Luckily, Twitter allows us to use a max_id parameter in our API calls. This says "Get the tweets with this ID or older."
http://api.twitter.com/1/statuses/home_timeline.json?max_id=123465789
So, using max_id we can ensure that the user never has to read the same tweet twice. Instead of dumbly using pages, we call the specific tweets we want.
p1 max_id=80 = 80-60
p2 max_id=60 = 60-40
Easy! We just use the oldest tweet on the page as the max_id parameter when we call the next page.
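A rough simulation of max_id paging, under stated assumptions: fetch_timeline is a hypothetical stand-in for the home_timeline call, not a real client, and it treats max_id as inclusive (IDs less than or equal to it). Because of that, the sketch asks for the oldest ID minus one so the boundary tweet isn't shown twice.

```python
def fetch_timeline(all_ids, count=20, max_id=None):
    """Stand-in for a timeline call: newest first, only IDs <= max_id."""
    ids = sorted(all_ids, reverse=True)
    if max_id is not None:
        ids = [i for i in ids if i <= max_id]
    return ids[:count]


all_ids = list(range(1, 81))
page1 = fetch_timeline(all_ids)  # IDs 80..61

# Five new tweets arrive while the user is reading page 1.
all_ids += list(range(81, 86))

# Ask for everything older than the oldest tweet already shown.
page2 = fetch_timeline(all_ids, max_id=page1[-1] - 1)

print(page1[0], page1[-1])  # 80 61
print(page2[0], page2[-1])  # 60 41
```

The new tweets don't shift page 2, because the request is anchored to an ID rather than to a page number.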
Looking To The Future (AKA, where it all goes horribly wrong)
So far, we've looked at stepping back in time. Seeing older tweets. Suppose we want to see newer tweets?
Twitter provides us with a since_id parameter for API calls. This says "Get the tweets newer than this number."
Unfortunately, it doesn't work. Well, it works but not the way I expected it to!
Suppose our user is deep down in her tweets. This is how I would expect since_id to work:
max_id=60 = 60-40
(So, let's show any more recent tweets)
since_id=60 = 80-60
We see the 20 tweets that occurred since the since_id. Right? Wrong! This is what happens:
max_id=60 = 60-40
(So, let's show any more recent tweets)
since_id=60 = 100-80
What?
An Explanation
The since_id retrieves tweets starting with the most recent. It stops when it reaches the since_id.
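The same toy model shows why. In this sketch fetch_timeline is again a hypothetical stand-in for the API: since_id only filters out older tweets, and the result is still cut from the newest end.

```python
def fetch_timeline(all_ids, count=20, since_id=None):
    """Stand-in for a timeline call: newest first, only IDs > since_id."""
    ids = sorted(all_ids, reverse=True)
    if since_id is not None:
        ids = [i for i in ids if i > since_id]
    # The count limit is applied from the newest tweet downwards.
    return ids[:count]


all_ids = range(1, 101)  # the timeline has grown to ID 100
newer = fetch_timeline(all_ids, since_id=60)
print(newer[0], newer[-1])  # 100 81
```

The 20 tweets directly above ID 60 (80 down to 61) pass the filter, but they never make it into the first 20 results.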
I don't know the max_id that I'm looking for, so I can't call that.
I could call the most recent 200 tweets and look for the 20 I need. That's wasteful in terms of bandwidth and processing - there's also no guarantee that the since_id will be in there.
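For what it's worth, the wasteful approach looks something like this. It is only a sketch against a simulated timeline: fetch_timeline and next_newer_page are hypothetical names, and the 200 cap is the figure mentioned above. The complete flag records whether the batch can be trusted to reach back to the since_id.

```python
def fetch_timeline(all_ids, count=20, since_id=None):
    """Stand-in for a timeline call: newest first, only IDs > since_id."""
    ids = sorted(all_ids, reverse=True)
    if since_id is not None:
        ids = [i for i in ids if i > since_id]
    return ids[:count]


def next_newer_page(all_ids, since_id, per_page=20, max_fetch=200):
    """Fetch up to max_fetch newer tweets and keep the oldest per_page."""
    batch = fetch_timeline(all_ids, count=max_fetch, since_id=since_id)
    # A full batch may have been cut off before reaching since_id,
    # so it is conservatively treated as incomplete.
    complete = len(batch) < max_fetch
    return batch[-per_page:], complete


page, complete = next_newer_page(range(1, 101), since_id=60)
print(page[0], page[-1], complete)  # 80 61 True

# With 340 newer tweets, the batch never gets back down to ID 60.
gap_page, gap_complete = next_newer_page(range(1, 401), since_id=60)
print(gap_page[0], gap_page[-1], gap_complete)  # 220 201 False
```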
So, I have a problem. The "Older" link in my Twitter application will work. The "Newer" links won't.
Any suggestions?
Kai Hendry says:
I think I share the same confusion with since_id whilst writing a backup script:
http://twitter.natalian.org/hgweb.cgi/file/tip/fetch-tweets.sh
We should get together and discuss this. I am thinking of launching a twitter backup service with search. See http://twitter.natalian.org/ for a prototype and let me know what you think!
Mark Pack says:
Kludge of a workaround, but if I was using a Twitter app with older links that work but newer ones that didn't, I'd start off with the most recent tweets, happily use the older link to scroll back through tweets and on the rare occasions I want to go back to newer tweets, curse the broken newer button - and use the browser's back button instead to return to the pages I'd viewed earlier which had the newer tweets.
So (a) having only a working older link isn't that bad, and (b) rather than pulling data from Twitter, couldn't you use that cached information that the back button uses?
Terence Eden says:
An excellent suggestion. Dabr's users primarily browse on a mobile. So the back button is often hidden in a menu. Lack of JavaScript support in many phones means I can't use .go(-1).
I suppose I could make the "newer" button call the same URL as the previous page - and the home page if none existed - but that means that the user journey could be 50-60, 60-70, then 95-105, missing out all those tweets in between. That means you have to go to the newest and work back rather than moving sequentially through.
Rafael says:
Hi, I'm a Dabr user (@rluik). I'd like to say I somewhat like seeing tweets I've already seen on the "newer" page appear on the "older" page. It tells me someone posted a tweet while I was reading old tweets, so I know when to go "home" to see these new tweets...
Terence Eden says:
Try it for a few days and let me know what you think. While I agree that reading 5 old tweets means there are some new ones to view, I find it more annoying to have half my page load time taken up with tweets I've already read.
Paging back 20 at a time feels more relaxed to me - I'm not always being reminded that there are new ones to read.
Thanks
T
Rafael says:
Actually it isn't working. First, there's a "newer" link on the homepage and no new tweets. Second, when I clicked older two times I got a tweet that I had already seen before (someone posted a new tweet and pagination didn't work).
Jon Price says:
I prefer Twitter clients that don't use pagination per se, instead lazy loading older and newer tweets when I scroll to the bottom or top of the list. Does that model avoid the challenges of since_id? Is that possible without JavaScript?
Terence Eden says:
It's not possible without JavaScript - so won't work on many phones.
You still need the since_id to know where to load from.
Riccardo says:
Hi Terence, I just got this post and it is exactly what I'm facing. Any solution? Thanks
Terence Eden says:
Not that I've found. Sorry.
B Zion says:
An idea I haven't tried yet ---
What if we save each previous page's newest item's id_str in a stack? Then we call the API with BOTH the desired since_id AND the stored max_id from the stack.
This should likely bracket the desired range.
B Zion
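A rough sketch of the stack idea above, run against a simulated timeline rather than the real API. fetch_timeline is a hypothetical stand-in that assumes since_id is exclusive and max_id is inclusive; whether the live API accepts both parameters together in this way is untested here, as the comment itself says.

```python
def fetch_timeline(all_ids, count=20, since_id=None, max_id=None):
    """Stand-in for a timeline call: newest first, since_id < ID <= max_id."""
    ids = sorted(all_ids, reverse=True)
    if since_id is not None:
        ids = [i for i in ids if i > since_id]
    if max_id is not None:
        ids = [i for i in ids if i <= max_id]
    return ids[:count]


all_ids = range(1, 101)
newest_seen = []  # stack holding the newest ID of each page we leave

page = fetch_timeline(all_ids, max_id=80)  # IDs 80..61

# "Older": remember this page's newest ID, then step back.
newest_seen.append(page[0])
page = fetch_timeline(all_ids, max_id=page[-1] - 1)  # IDs 60..41

# "Newer": bracket the range between the current page and the stored ID.
newer = fetch_timeline(all_ids, since_id=page[0], max_id=newest_seen.pop())
print(newer[0], newer[-1])  # 80 61
```

This only restores pages the user has already visited. Once the stack is empty there is no stored max_id to bracket with, and the original since_id problem returns.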