November 18, 2009

  • Data Flood

    There is much discussion about timestamping. Not surprisingly, it has even expanded into a discussion of pulses, recommends, the ISH sites, and all manner of other features that people think are problematic.

    I would like to argue for a shift in perspective about these matters. We sometimes speak as if timestamping were the problem. Timestamping is not the problem. It’s a symptom. It might be a particularly annoying, frustrating, and ugly symptom, but it’s still just a symptom.

    The same goes for seeing excessive pulses in your inbox, or excessive entries for ISH sites. These are just symptoms of a deeper problem.

    How do we know that? Well, it requires only a very simple thought experiment. Imagine a world where all those annoyances did not exist. There is no timestamping or date manipulation of any kind. Let’s say for the sake of argument that ISH sites, recommends, and pulses are abolished.

    Now imagine what happens as your friend and subscriber list grows without bound. Indeed, imagine for the sake of argument that you had an infinitely large subscriber list, every member of which wrote and posted a blog entry, say, every minute or so. And imagine that everyone else does too. Now you write a blog entry. What is the probability that anyone will read it? What is the probability that anyone will comment?

    As the number of friends and subscribers each user has approaches infinity, the probability that your blog will be noticed by anyone approaches zero.

    That should be pretty obvious. But you can see, then, that the problem of relevant, interesting blog entries being pushed off the page isn’t exclusively caused by timestamping or pulses or anything else. Those things might exacerbate the problem, but even if you got rid of them the problem would still exist.
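    The thought experiment can be made concrete with a rough sketch. It assumes a deliberately crude model that is not part of the original argument: each reader has a fixed daily reading budget and picks posts from the feed uniformly at random. The function name and all parameters are hypothetical.

```python
def read_probability(subscriptions: int, posts_per_day: float, reading_budget: float) -> float:
    """Chance that one given post in a reader's feed gets read.

    Assumed model (illustrative only): the reader follows `subscriptions`
    people, each posting `posts_per_day` entries, and reads at most
    `reading_budget` entries per day, chosen uniformly at random.
    """
    if subscriptions < 1 or posts_per_day <= 0 or reading_budget < 0:
        raise ValueError("need at least one subscription, a positive posting rate, "
                         "and a non-negative reading budget")
    incoming = subscriptions * posts_per_day  # entries arriving per day
    return min(1.0, reading_budget / incoming)


# The budget stays fixed while the subscription list grows.
for n in (10, 100, 1_000, 10_000):
    print(n, read_probability(n, posts_per_day=1.0, reading_budget=20.0))
```

    Under these assumptions the chance is 1.0 while the feed fits inside the budget, then falls in proportion to 1/n, with no timestamping or pulsing involved at all.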

    Indeed, people pulse and timestamp and recommend as a means of FIGHTING that inherent problem. They are trying to combat the natural tendency for ANYTHING you write on the internet to vanish beneath the waves of the inevitable scourge of DATA FLOOD.

    Data Flood is a problem all online services struggle with. Users want to feel relevant, so they want people to see and read their stuff. On the other hand, users are also interested in having access to as much stuff as possible. The more stuff they see, the less likely they are to notice anything or to find the one thing they are really looking for. Services like Facebook and Twitter are constantly evolving to find new ways to deal with data flood in order to keep their services useful to people even as they grow without bound. People still want to see a lot of data, but they don’t want to see irrelevant, boring, or uninteresting data. That is an inherent contradiction.

    If you think about it, social networking in general is a way to control data flood. It does so by making you connect to users who presumably want to read your work, so you share only with those who have an interest. This reduced pool of viewers reduces the risk of data flood drowning out your work.

    Microblogging was also introduced as a way to combat data flood. By limiting writers to 140 characters, you could streamline the process of skimming through data to find the things that interest you. You could then be exposed to a lot and still find the small piece you were looking for.

    Consider email. It’s an ancient technology that was very much plagued by problems of data flood. Ultimately, people found that the best way to get their emails read was to send 50 billion of them to as many people as possible. Emails became spam. That’s because there was no way to control the data flood.

    Typically you combat data flood, not by restricting what people can post or how they can post it, but by empowering the users in two ways.

    1. Give them the means to find the data that is RELEVANT
    2. Give them the means to ignore the data that is IRRELEVANT

    Ironically, to do this you often have to provide users with MORE data. It seems like a contradiction, but it’s not.

    Consider Twitter’s new Lists Beta feature. It allows Twitter users to categorize the users they are following into lists. Twitterers can then find more RELEVANT information by looking at lists that combine related users who are likely to post the information they want to see. It adds more information about every single user, though: which lists they belong to and who created each list. Nevertheless, it makes data flood easier to control.

    Consider, say, a spam filter on email. This allows an email receiver to filter out emails they don’t want. In effect it adds metadata to every email you receive about whether it meets or doesn’t meet some arbitrary threshold of what is considered spam. That’s more data. However, without it, for many of us email would be so polluted by Data Flood that it’d be nigh on unusable. With it, though, we can filter out data that is IRRELEVANT to us, saving us from drowning in the data flood.

    Let’s take a look at ideas related to Xanga.

    Take, for example, CelestialTeapot’s idea to attach to any timestamped blog entry a history of its timestamp dates along with its initial publication date. That’s more data, so it increases the flood, but at the same time it’s USEFUL data. A person can use that data to filter out and ignore posts that have too many timestamps, or to view posts only in the order of initial publication. Those who regard overtimestamped posts as irrelevant can then ignore them.

    Or for example take The_Brink_of_Omniscience’s idea to give Xangans the ability to mark entries as read or unread. That’s an additional piece of data about a post. However, it allows users to filter out posts that they deem no longer relevant to them because they’ve already read them.

    Or take ModernBunny’s idea to create Xanga categories in order to organize it more like a forum. That’s adding more information to each post. You now know what category it belongs to and what the users reading it and commenting on it are likely to be interested in. Yet it makes it easier to find blog entries that are of direct interest to you. If you’re interested in tech blogs, you can read just those blogs that have the tech category. Consequently, those bloggers who write in the tech category will find their blogs more likely to be read BY those people perusing the tech category.
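    The three suggestions above share a pattern: each attaches one more piece of data to a post, and that data is what makes filtering possible. Here is a minimal sketch of the pattern. Everything in it is hypothetical (the `Post` fields, the `relevant` function, the sample feed); it is not how Xanga stores or serves entries.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Post:
    # Hypothetical record; the field names are illustrative assumptions.
    author: str
    title: str
    first_published: date                                # initial publication date
    category: str = "uncategorized"                      # forum-style category
    restamps: list[date] = field(default_factory=list)   # history of later timestamps
    read: bool = False                                   # read/unread marker


def relevant(posts, category=None, max_restamps=None, unread_only=False):
    """Return the posts that pass every requested filter, ordered by
    original publication date (newest first), ignoring any restamps."""
    kept = []
    for post in posts:
        if category is not None and post.category != category:
            continue
        if max_restamps is not None and len(post.restamps) > max_restamps:
            continue
        if unread_only and post.read:
            continue
        kept.append(post)
    return sorted(kept, key=lambda p: p.first_published, reverse=True)


feed = [
    Post("alice", "New laptop review", date(2009, 11, 17), category="tech"),
    Post("bob", "Old rant", date(2009, 10, 1), category="tech",
         restamps=[date(2009, 10, 20), date(2009, 11, 5), date(2009, 11, 18)]),
    Post("carol", "Weekend photos", date(2009, 11, 16), category="photos"),
    Post("dave", "Compiler notes", date(2009, 11, 10), category="tech", read=True),
]

for post in relevant(feed, category="tech", max_restamps=2, unread_only=True):
    print(post.first_published, post.author, post.title)
```

    Nothing here stops anyone from posting or restamping. The extra fields only let each reader decide what to skip.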

    Generally, that’s the best way to deal with data flood. You don’t try to head it off at the source by building dams that prevent the flood from happening. Instead, you give people the means to swim and navigate the floods effectively. Categorization, search tools, spam management, rating systems, folders, tagging, lists, meta-moderation, linking, and forwarding are typical ways in which services have managed data flood in the past.

    There are a lot of good ideas out there. It will be interesting to see how Xanga chooses to answer this challenge in the future.

Comments (4)

  • The Xanga company is showing initiative these days; giving us the option to buy a custom URL was a great move, and the front page has been modified to emphasize both that new feature and the best reason to buy Premium (more photo storage space). This is a good sign. Perhaps Xanga is looking for ways to progress now, so ideas may be seriously considered. Let’s hope so.

    I like CelestialTeapot’s idea because it encourages accountability. It’s beyond frustrating when someone has timestamped for the tenth time a post they wrote last month.

  • @TheModernBunny - I am massively skeptical about this “initiative”. Where were they for like the past five months? I am very cynical. They made interesting changes, but they seem focused on just getting people to buy more premium. In other words, they still have much more of a revenue-focused mindset than a quality-enhancement mindset. But when I look at the most successful online communities, they all seem to have focused first and foremost on creating a high quality product that would enable them to build the size of their community. Then they were able to leverage that large community in order to earn a profit.

    Also I’m just worried that Xanga will seem to contribute for a few months and then disappear again without a whisper or a word. That’s no way to instill faith in your members. And it’s particularly bad after having made a big show of trying to listen more to their users and being more responsive. 

    I don’t want to sound too self-righteous. I’ve made similar mistakes myself many times. But I’m just saying that when you make such mistakes, you have to know that you need to earn trust in the community again. I just don’t see how begging users to sign up for more premium accounts is going to do that.

  • so true…data flooding is a cancer that will eat up our time and resources… i like all the suggestions you pointed out, including the timestamping thing, which i believe would be valuable especially if there are legitimate updates…

    i just hope xanga would act on the suggestions…
