random.bml
Jul. 14th, 2006 02:55 pm
In a (usually futile) effort to break out of the usual LJ-clique and find new and interesting content*, I have a toolbar button that takes me to a "random livejournal" (http://www.livejournal.com/random.bml).
It just took me to frinkle_twinkle, whose current (from September) entry is all about how it seems she gets a disproportionate number of visitors via the random journal selector: http://frinkle-twinkle.livejournal.com/
Is the random-livejournal selector broken, or is this just observational bias? (Anyone could notice some random visitors, put up a note about them, and then suddenly have lots of people saying that they, too, got there via the random journal button, and isn't that funny.) I suppose the way to test this, aside from inspecting the source code to random.bml, would be to make a new journal with a post that says "Isn't it funny, so many people seem to be getting here via the random button", and see how many "me too" comments accrue.
* actually, just to waste time.
no subject
Date: 2006-07-14 07:34 pm (UTC)
no subject
Date: 2006-07-14 08:35 pm (UTC)
no subject
Date: 2006-07-14 08:37 pm (UTC)
no subject
Date: 2006-07-14 10:32 pm (UTC)
query = "SELECT name from curPages order by random() limit 5"
So this pairs each row with a number generated by random() when the query is run. To ensure that the rows are actually selected randomly and not merely in some order maintained by the BTree, it seems that PostgreSQL does a full sequential scan -- looking at all of the records -- to do the random() sort.
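To make that concrete, here is a minimal sketch of the same query run through Python's built-in sqlite3 module. Using SQLite instead of PostgreSQL is an assumption made only so the example is self-contained, and the table contents are invented. SQLite also accepts ORDER BY random(), and its EXPLAIN QUERY PLAN output can be printed to see how it approaches the sort.

```python
import sqlite3

QUERY = "SELECT name FROM curPages ORDER BY random() LIMIT 5"


def build_db(n_rows=1000):
    """Create an in-memory stand-in for the curPages table (contents are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE curPages (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO curPages (name) VALUES (?)",
        [(f"page_{i}",) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def random_names(conn):
    """Run the query from the comment and return the five selected names."""
    return [row[0] for row in conn.execute(QUERY)]


def query_plan(conn):
    """Return SQLite's description of each plan step (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


if __name__ == "__main__":
    conn = build_db()
    print(random_names(conn))
    for step in query_plan(conn):
        print(step)
```

The plan text differs between database engines and versions, so treat the printed steps as illustrative rather than as PostgreSQL's own output.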
So with PostgreSQL we do a sequential scan. If we did a full sequential scan of the 10,669,375 livejournals every time an enterprising young reader clicked that button that would be seriously bad news. Even limited to LJ's user clusters you're still looking at around a million users to scan. So, a good solution seems to be to cache some of that randomly selected data someplace -- say around 5,000 random journals -- and keep it around for a little while (randomly select from it and so forth).
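A rough sketch of that caching idea, in Python. Everything named here (RandomJournalPool, load_all_ids, the one-hour lifetime) is hypothetical and only illustrates the shape of the approach: do the expensive scan rarely, keep a small random sample around, and serve each click from the sample.

```python
import random
import time


class RandomJournalPool:
    """Keep a small random sample of journal ids and serve picks from it."""

    def __init__(self, load_all_ids, pool_size=5000, ttl_seconds=3600,
                 clock=time.monotonic):
        self._load_all_ids = load_all_ids  # hypothetical callable doing the expensive scan
        self._pool_size = pool_size
        self._ttl = ttl_seconds            # assumed lifetime of the cached sample
        self._clock = clock
        self._pool = []
        self._built_at = None

    def _refresh_if_stale(self):
        """Rebuild the sample on first use or once it is older than the TTL."""
        now = self._clock()
        if self._built_at is None or now - self._built_at >= self._ttl:
            ids = list(self._load_all_ids())
            self._pool = random.sample(ids, min(self._pool_size, len(ids)))
            self._built_at = now

    def pick(self):
        """Return one id from the cached sample (cheap; no table scan)."""
        self._refresh_if_stale()
        return random.choice(self._pool)


if __name__ == "__main__":
    pool = RandomJournalPool(lambda: range(1, 1_000_001))
    print([pool.pick() for _ in range(3)])
```

The trade-off is visible in the code: between refreshes, only the journals that made it into the sample can ever be returned.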
Another solution is to pair every row with a random value when you insert it into the database. It's poor design, but it would allow you to use that value as an index to select the page. Given how little the random feature is actually used, and how little value true randomness has here, this is probably a bad idea.
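For illustration, the stored-random-value idea could look something like this, again with sqlite3 standing in for the real database. The journals table, the rand_key column, and the index name are all invented for the sketch.

```python
import random
import sqlite3


def build_db(n_rows=1000):
    """Create a table where every row gets a random key at insert time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE journals (id INTEGER PRIMARY KEY, name TEXT, rand_key REAL)"
    )
    # The index is what lets a lookup avoid scanning every row.
    conn.execute("CREATE INDEX journals_rand_key ON journals (rand_key)")
    conn.executemany(
        "INSERT INTO journals (name, rand_key) VALUES (?, ?)",
        [(f"journal_{i}", random.random()) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def pick_random(conn):
    """Pick the row whose stored key is the first one at or above a fresh probe."""
    probe = random.random()
    row = conn.execute(
        "SELECT name FROM journals WHERE rand_key >= ? ORDER BY rand_key LIMIT 1",
        (probe,),
    ).fetchone()
    if row is None:
        # The probe landed above the largest key: wrap around to the smallest.
        row = conn.execute(
            "SELECT name FROM journals ORDER BY rand_key LIMIT 1"
        ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_db()
    print(pick_random(conn))
```

This is not perfectly uniform, since a row that sits after a large gap between keys is picked more often than its neighbours, which is in keeping with the point that true randomness is not worth much here.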
MySQL, which LJ uses, doesn't give enough information in its "EXPLAIN" output to tell what's going on in this query, but it's probably inefficient too.
So the other real no-brainer solution that works in LJ's case is to select a random number between 1 and the maximum user ID and then select that record. This would work because user IDs are sequential. It seems it's what LJ does, kinda.
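To be clear, what follows is not LJ's code. It is only a sketch of the pick-a-random-id idea itself, with an invented users table in sqlite3, plus a retry loop as one possible way to cope with ids that no longer exist.

```python
import random
import sqlite3


def build_db(n_rows=1000, deleted=()):
    """Create sequential user ids 1..n_rows, then delete a few to leave gaps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (userid, name) VALUES (?, ?)",
        [(i, f"user_{i}") for i in range(1, n_rows + 1)],
    )
    conn.executemany("DELETE FROM users WHERE userid = ?", [(d,) for d in deleted])
    conn.commit()
    return conn


def pick_random_user(conn, max_tries=20):
    """Draw an id between 1 and the maximum id; retry if that id is missing."""
    (max_id,) = conn.execute("SELECT MAX(userid) FROM users").fetchone()
    if max_id is None:
        return None  # empty table
    for _ in range(max_tries):
        candidate = random.randint(1, max_id)
        row = conn.execute(
            "SELECT name FROM users WHERE userid = ?", (candidate,)
        ).fetchone()
        if row is not None:
            return row[0]
    return None  # gave up: too many gaps in the id range


if __name__ == "__main__":
    conn = build_db(deleted=[2, 3, 4])
    print(pick_random_user(conn))
```

Each attempt is a primary-key lookup, so the cost does not grow with the number of users the way the ORDER BY random() sort does.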
LJ builds a table called randomuserset daily which consists of 5000 journals which are marked as 'public yo' and also have been updated in the past day interval ( ). The small set (5000) combined with the day long interval as well as the condition that the user has to update is likely to favor certain users. LJ tells me there have been around 200,000 updates in the past day, so let's assume there's 100,000 updates a day by unique users. If you update every day that's a 5% chance of being in that random pool, which actually isn't that bad at all.
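The 5% figure can be checked with a couple of lines of Python. The pool size and the 100,000 unique updaters come from the paragraph above. That every eligible user is equally likely to land in the pool, and that each day's pool is drawn independently, are my own simplifying assumptions.

```python
POOL_SIZE = 5000                 # journals in randomuserset, per the comment
UNIQUE_DAILY_UPDATERS = 100_000  # the comment's guess at unique users updating per day


def chance_in_pool(pool_size, eligible):
    """Chance that one eligible user is in a pool drawn uniformly from all eligible users."""
    return min(1.0, pool_size / eligible)


def chance_at_least_once(pool_size, eligible, days):
    """Chance of making the pool on at least one of `days` independent daily draws."""
    p = chance_in_pool(pool_size, eligible)
    return 1 - (1 - p) ** days


if __name__ == "__main__":
    print(f"{chance_in_pool(POOL_SIZE, UNIQUE_DAILY_UPDATERS):.1%}")  # prints 5.0%
    print(f"{chance_at_least_once(POOL_SIZE, UNIQUE_DAILY_UPDATERS, 30):.1%}")
```

Under those assumptions, someone who posts every day would be in the pool on a fair number of days each month, which would go some way toward explaining why a frequent poster keeps getting visitors from the random button.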
no subject
Date: 2006-07-14 11:19 pm (UTC)
no subject
Date: 2006-07-14 11:27 pm (UTC)
no subject
Date: 2006-07-14 11:41 pm (UTC)
no subject
Date: 2006-07-19 08:30 pm (UTC)
most recently http://shebbybaby.livejournal.com/60884.html
no subject
Date: 2006-07-15 01:58 am (UTC)
no subject
Date: 2006-07-14 11:51 pm (UTC)