Sunday, June 4, 2017

Reddit Users Are A Subset Of Reddit Users

An interesting blog post on view counting at Reddit reminded me of an old post I wrote almost ten years ago. Reddit engineer Krishnan Chandra wrote:

Reddit has many visitors that consume content without voting or commenting. We wanted to build a system that could capture this activity by counting the number of views a post received.

He goes on to describe the scaling and accuracy problems in this apparently simple goal. But he never addresses a few really weird assumptions the Reddit team appears to have made: that every Reddit user is logged in, or even has an account in the first place; that every Reddit user has only one account; and that every account is only ever used by one person.

As I wrote back in the ancient days of 2008:

Assuming a one-to-one mapping between users and people utterly disregards practical experience and common sense. One human could represent themselves to your system with several logins. Several humans could represent themselves as one login. Both these things could happen. Both these things probably will happen.

In other words, "users" aren't real. Accounts and logins are real. The people who use those accounts and logins are also real. But the concept of "users" is a social fiction, and that includes the one-to-one mapping "users" implies between accounts and people. In some corners of the vast Reddit universe, creating more than one "user" per account is as common as it is in World of Warcraft.

As I say, this is all from a blog post back in 2008. More recently, in 2016, I decided that the best way to use Reddit (and Hacker News as well) is without logging in. In other words, the best use case is not to be a "user":

reading Reddit without logging in is much, much more pleasant than being a "user" of the site in the official sense. The same is true of Hacker News; I don't get a lot out of logging in and "participating" in the way that is officially expected and encouraged. Like most people who use Hacker News, I prefer to glance at the list of stories without logging in, briefly view one or two links, and then snark about it on Twitter...

neither of these sites seems to acknowledge that using the sites without being a "user" is a use case which exists. In the terms of both sites, a "user" seems to be somebody who has a username and has logged in. But in my opinion, both of these sites are more useful if you're not a "user." So the use case where you're not a "user" is the optimal use case.


On the Reddit blog, discussing all the technical aspects of view counting, Chandra said the team's goal was to make it easy for people to understand how much activity their posts were driving, and to include more passive activity like viewing without commenting. But there's this obvious logical disconnect here, where Reddit excludes from this analysis any user who isn't logged in. You could even make the case that they'd be better served just pulling the info from Google Analytics, although that'd be naive.

First, given the scale involved, there's likely to be enough accuracy issues that a custom solution would be a good idea in either case. Second, Reddit's view-counting effort made no attempt to identify when two or more accounts belonged to the same human being, or when two or more human beings shared the same login. An off-the-shelf analytics solution could probably provide insight into that, but not definitive answers.

My favorite explanation for all this is that it's just a simple institutional blind spot: people who work at Reddit are probably so used to thinking that people who don't log in "don't count" that it seemed perfectly reasonable to literally not count them. The concept of "users" is such a useful social fiction that the company's employees probably just genuinely forgot it was a social fiction at all (or never realized it in the first place). However, if your goal is to provide view counting with accuracy, skimming over these questions makes your goal impossible to achieve.