
ryanpark.org: Top 10 Reasons to Avoid the SimpleDB Hype

  • jackson · 1 year ago
    "Guess what, kids - you aren’t Google, Amazon, or Facebook. The chance that your web toy will ever be a fraction as popular as those sites is so vanishingly small that it is creating an underflow condition."

    That's exactly right. If you are creating a toy app, you don't need to scale. If you are creating a toy app, you should use a toy db, i.e., an rdbms.

    Google is not the only one that needs a scalable db. /ANYONE/ who ever hopes to have > 100,000 users is going to start running into scalability problems and eventually face the reality of the SHARDING NIGHTMARE if they use an RDBMS.

    And if you are making something for < 100,000 users, you really probably ought to just stop now, shouldn't you?
  • jackson · 1 year ago
    ha, and one more thing. Once you shard (as all the big-boy rdbms shops eventually have to do, e.g. youtube), most of your arguments go away, too: you no longer get automatic integrity, consistency, aggregate ops, etc. Once you shard, YOU HAVE TO DO A LOT OF CODING to make up for the INHERENT NON-SCALABILITY OF RDBMSs.

    Sorry, you can't get around it. MUCH BETTER to go in with the assumption that you'll have these problems (because, hey, you are creating something that's going to be successful, right?) and plan from day 1 to deal with them.

    That's one of the beauties of couch/simple/bigtable -- you can't hide behind some empty promise from some big RDBMS vendor. You have to face the truth from the start.

    And you know what? The truth isn't so bad. It's quite elegant, actually.
  • Toby DiPasquale · 1 year ago
    You seem to have missed the point of these pieces of software. By structuring your application to use these new datastores, you don't need to worry about fork-lift upgrades and all the other scaling problems of a traditional RDBMS as you get bigger and take on more load. It scales up just like the rest of your app: by adding more machines.

    Yahoo!, eBay, Facebook, etc. scale their RDBMSs by doing the same thing that SimpleDB or BigTable do internally: by sharding the data down to finer and finer levels of keyspace as the number of machines grows. Except, with an RDBMS, this is a manual process. They also use read-only slaves to distribute read load, something also implicit in BigTable/SimpleDB/etc. with their use of simple block replication without the concurrent use of erasure coding (e.g. RAID). RAID is also external to the RDBMS, requiring you to manage the two separately.

    Also, Oracle can scale up to 64 nodes at max with a clustered filesystem (this is somewhat old, it might be 128 by now). Google has ~650,000 machines in their clusters. That is a difference of 4 orders of magnitude. No one has enough money to pay Oracle to scale to this level. Yahoo! and Facebook have gotten MySQL to run on more boxes than this, but not in a cluster, so they (like you and everyone else) are stuck with the manual process of sharding and shard management.

    If you don't expect to grow, by all means, continue to live in the RDBMS past. However, if your app is subject to possible rapid growth (e.g. a Facebook app, Salesforce app, GAE app, pretty much any Web-facing app), you should definitely be thinking about how to leverage SimpleDB/HBase/Hypertable/CouchDB/etc. in your design. I've seen a lot of posts of this nature lately, and they all seem to come from the initial shock of what you *can't* do with these systems. Give them a shot and see what you *CAN* do with some semi-clever design, and you might be surprised.
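    The idea of sharding "down to finer and finer levels of keyspace" can be sketched as a range-partitioned key map that splits a shard at its median key once it holds too many entries. This is a minimal in-memory illustration under stated assumptions, not how SimpleDB or BigTable actually work internally; the class name `RangeShardMap` and the `max_keys` threshold are hypothetical, and each dict merely stands in for a machine.

```python
import bisect


class RangeShardMap:
    """Hypothetical range-partitioned key map: each shard owns a contiguous key range,
    and a shard that grows past max_keys is split at its median key."""

    def __init__(self, max_keys=4):
        self.max_keys = max_keys   # illustrative split threshold, not a real product default
        self.boundaries = [""]     # boundaries[i] = lowest key owned by shards[i]
        self.shards = [{}]         # each dict stands in for one machine

    def _index(self, key):
        # The owning shard is the one with the rightmost boundary <= key.
        return bisect.bisect_right(self.boundaries, key) - 1

    def put(self, key, value):
        i = self._index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def get(self, key):
        return self.shards[self._index(key)].get(key)

    def _split(self, i):
        # Carve shard i into two finer key ranges at its median key.
        old = self.shards[i]
        mid = sorted(old)[len(old) // 2]
        self.shards[i] = {k: v for k, v in old.items() if k < mid}
        self.shards.insert(i + 1, {k: v for k, v in old.items() if k >= mid})
        self.boundaries.insert(i + 1, mid)


m = RangeShardMap(max_keys=4)
for n in range(20):
    m.put(f"user{n:02d}", n)
print(len(m.shards), m.get("user07"))
```

    With an RDBMS, the split step and the routing table are the parts you end up writing and operating by hand.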
  • jackson · 1 year ago
    I love how you rdbms fanboys are scared out of your pants by the new wave of stuff. I love it.

    Here's why you /should/ be scared. Google (and to a lesser degree Amazon) /already/ run on this new breed of DB. And their apps /work/. And their apps /scale/, massively. No stupid sharding or rdbms babysitting required.

    Sorry, but if you think traditional rdbms scale without problems, you either have 0 experience with large systems or you are being disingenuous.
  • anon · 1 year ago
    How can you support shards while not supporting SimpleDB?
    Surely you can't do a GROUP BY if you use shards?
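    A GROUP BY can still be done across shards, but it becomes application code: a scatter-gather step where each shard computes a partial aggregate and a coordinator merges them. The sketch below is a hypothetical illustration (function names and row format are assumptions). Note that the average has to be rebuilt from merged sums and counts, because averaging each shard's average gives the wrong answer when shards hold different numbers of rows.

```python
from collections import defaultdict


def local_group_by(rows, key, value):
    """Runs on one shard: partial SUM and COUNT per group, as {group: [sum, count]}."""
    partial = defaultdict(lambda: [0, 0])
    for row in rows:
        acc = partial[row[key]]
        acc[0] += row[value]
        acc[1] += 1
    return dict(partial)


def merge_partials(partials):
    """Runs on the coordinator: combine shard partials, then derive AVG from the totals."""
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            totals[group][0] += s
            totals[group][1] += c
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in totals.items()}


shard_a = [{"country": "US", "amount": 10}, {"country": "US", "amount": 20},
           {"country": "DE", "amount": 5}]
shard_b = [{"country": "US", "amount": 60}]
result = merge_partials([local_group_by(s, "country", "amount") for s in (shard_a, shard_b)])
print(result["US"])  # {'sum': 90, 'count': 3, 'avg': 30.0}
```

    Averaging the per-shard averages here would give (15 + 60) / 2 = 37.5 instead of the correct 30.0.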
  • Jon Gilkison · 1 year ago
    You nailed it.
  • orly andico · 1 year ago
    I would suspect that the scalability solutions for e.g. MySQL are for mostly-read scenarios. I don't think anybody can do linear horizontal scaling of heavy-write scenarios. Except Real Application Clusters.

    That said, Oracle does have an in-memory, key-value pair based system which is highly (5000-node clusters in production) and linearly scalable, can do aggregations over the entire grid, and works on objects.

    Oracle Coherence.

    Costs an arm and a leg, but solves all the issues raised in this article (yes you can do aggregations and SQL-like queries, and they are automatically run "in parallel" across the entire grid).
  • Habib · 1 year ago
    We heard similar things during the MySQL vs RDBMS saga, about how the sky would soon fall on everyone using MySQL instead of a real database. Still, MySQL is going stronger than ever. There is a lot of FUD on the net, as always. What's missing is a few true, real-life stories of someone losing business because they chose a non-RDBMS solution where an RDBMS would have been the right choice.
  • Null Pointer · 1 year ago
    >This morning I read an article by Todd Hoff which fawned over SimpleDB’s unconventional rules to such an extent that I thought it might be satire.

    You don't miss much, do you, slick?
  • Dan Shaw · 1 year ago
    So funny:
    "We all expect Oracle to scale if we pay them enough money..."
  • Ariejan · 1 year ago
    Excellent points, all 10 of them! Thanks for the write up.
  • Abbas Ali · 1 year ago
    > I’ll argue however that this is the kind of suckiness programmers like. Programmers like problems they can solve with more programming. (By Mr. Todd Hoff)

    I also strongly disagree with this. Yes, programmers do like solving problems but only new problems and challenges not those problems for which the easy solutions already exist.
  • Venkman · 1 year ago
    I thought that the scalability proposed by this model is not in the "number of records in the table" but in the "number of simultaneous reads". As in having a lot of simultaneous users searching for an item, not as in having a couple of users searching through a lot of items.


    That said, I would've just stated one reason not to use this (#10: You almost surely don't need it).
  • Anonymous Coward · 1 year ago
    MapReduce and parallel execution solve a fair number of the arguments above.
  • T. Roll · 1 year ago
    Using a Real Database (c)(tm) will solve ALL of the above problems.

    The polished turds from Amazon and Google are still turds, though shiny.
  • Dennis Forbes · 1 year ago
    Good entry, Ryan, and quite on mark. I had just come across a quotation of the article you quote in #3, and at the time I thought he was being sarcastic. I'm greatly saddened to think that he was actually being serious.

    I find it most interesting seeing all of the cheerleading for SimpleDB and its ilk by people who are quite evidently clueless about databases, so they embrace and flaunt their ignorance, using Google and Amazon as a "Big Daddy" of sorts, always ready to reference.

    Guess what, kids - you aren't Google, Amazon, or Facebook. The chance that your web toy will ever be a fraction as popular as those sites is so vanishingly small that it is creating an underflow condition.

    Google has a very specialized database, and their needs are absolutely nothing like almost anyone else. Amazon likewise. Until the day that you build your own specialized database, an RDBMS is often a suitable choice.

    And the scalability ruse... extraordinary. The numbers I've seen for these "scalable" database technologies suggest they need to be scalable because they're such incredibly poor performers.

    Alas, everything old is new again. Here we have cheerleaders heralding the arrival of basically exactly what people did before real databases were invented. Hurrah for the past!
  • Dennis Forbes · 1 year ago
    Okay, it seems to have mangled my last post, so let me format slightly...

    +And if you are making something for < 100,000 users, you really probably ought to just stop now, shouldn’t you?

    Ho ho ho. Awesome stuff.

    Yeah, I guess making systems managing billions in funds just doesn't cut it in the realm of the awesome systems that you make.

    You are simply delusional.

    +And if you are making something for 100,000 user sites do you have, jackson? Care to point a couple out?

    Now I presume you must mean 100,000 simultaneous users, because there are quite a few >100K user sites easily running on some shitty RDBMS (e.g. MySQL) on a low-end desktop PC. Slashdot, for instance, which was pretty much a worst case because they were caching nothing, and generating every request live from the database.

    Clearly you have needs far beyond /. in their heyday.
  • Dennis Forbes · 1 year ago
    Better still, you have needs beyond Slashdot in their heyday, and an apparently minuscule budget. My dev database server is a 16-core, 6-disk monster, serving up an unbelievable transaction load.

    Not good enough for jackson's imaginary success story, though.
  • troll_wrangler · 1 year ago
    @jackson - "And if you are making something for < 100,000 users, you really probably ought to just stop now, shouldn’t you?"

    the answer you troll for is "Nope". in fact, i'd say just the opposite. if you know before you start that your app will need upwards of a hundred thousand users to be useful to its audience, "you really probably [sic] ought to just stop now."
  • Rob W · 1 year ago
    You definitely have some good points -- but you're also being unfairly harsh and not scoring any higher marks on presenting a balanced viewpoint.

    One obvious one that caught my eye: #7: "SimpleDB isn’t that fast" -- Todd specifically pointed out (right in there with the performance numbers he was quoting...) that tools like SimpleDB are NOT fast. That's not the point; they exist to address scaling issues.

    Some other lines you apparently considered fawning or possibly satiric:

    "If you have a complex OLAP style database SimpleDB is not for you. But, if you have a simple structure, you want ease of use, and you want it to scale without your ever lifting a finger ever again, then SimpleDB makes sense. The cost is everything you currently know about using databases is useless and all the cool things we take for granted that a database does, SimpleDB does not do."

    That sounds an awful lot like what you're saying in #10. But you start that off with "Everyone’s assuming that SimpleDB was designed to be a general-purpose replacement for OLTP database servers."

    Sorry for the rant; I guess I'm just saying you clearly have some useful input to add to the discussion -- just leave the straw man nonsense at home, please.
  • At a loss · 1 year ago
    Can someone remind me why application programmers should care so much about the integrity constraint checking afforded by relational databases? There is only a small set of constraints that can be checked without programming. And, from what I've seen, the only way to get decent error reporting in my application is to check all of the constraints myself anyway.
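    Checking constraints in the application, as described here, might look like the following minimal sketch: every rule is evaluated and all failures are collected as readable messages, rather than stopping at the first database error. The rules, field names, and messages are hypothetical examples. Checks that span records, such as uniqueness, are not covered by this sketch.

```python
import re

# Hypothetical sign-up rules: (field, predicate, human-readable message).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+$")
SIGNUP_RULES = [
    ("username", lambda v: isinstance(v, str) and 3 <= len(v) <= 20,
     "must be 3 to 20 characters"),
    ("email", lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None,
     "must look like user@host"),
    ("age", lambda v: isinstance(v, int) and v >= 13,
     "must be a whole number of at least 13"),
]


def validate(record, rules):
    """Check every rule and collect all failures so they can be reported together."""
    errors = []
    for field, is_valid, message in rules:
        if not is_valid(record.get(field)):
            errors.append(f"{field} {message}")
    return errors


print(validate({"username": "al", "email": "root@localhost.localdomain", "age": 12}, SIGNUP_RULES))
```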
  • Anonymous · 1 year ago
    Last time I checked, Amazon *do not* use SimpleDB to power their online store. Rumours of Postgres abound ...

    PS - do you know that your comment filter rejects valid email addresses such as root@localhost.localdomain? (at least, that's where all my cron jobs send it ... :-)
  • Tammer Saleh · 1 year ago
    "When websites like Friendster have scalability issues, it’s not usually because of the RDBMS."

    @toby has it exactly right. RDBMS _can_ scale, but at *significant* costs in both money and developer/sysadmin/DBA time.
  • Diarmuid Wrenne · 1 year ago
    The hype has been pretty intense, and people's perceptions of what SimpleDB et al. are useful for have become pretty inflated. Most applications' database components are made up of 3 distinct areas: (1) user-related data, (2) data that structures the user experience within the app, and (3) rapidly growing, dynamic data. I feel that 1 and 2 are best served by an RDBMS, while 3 is a good fit for SimpleDB. Take YouTube: user data and user-created scalar data are tiny compared with video data and its associated data. If a video search is not fully optimised, no one dies. However, a user does want to be sure that their favourites, channels, etc. are stable.

    Regards

    D
  • Yurii Rashkovskii · 1 year ago
    First of all I’d like to note that the below comments are not about SimpleDB but rather to prevent FUD about document-based databases.

    1. Data integrity is not guaranteed.
    This could be the case with SimpleDB, but overall nothing prevents document databases from managing data integrity very well.

    Regarding constraints, nothing prevents defining validations in a document or its related “meta” document (this is pretty much how StrokeDB works: you can define your validations within a meta document, and they will keep your document valid).

    More interesting are the concerns about conflicts. I’d say this problem is hardly addressed in the common RDBMS approach. All you usually get is either user A’s or user B’s most recent update; there seems to be no easy way to resolve conflicts gracefully. On the contrary, since the document database approach is rather novel, there is certainly room to adopt better ways of dealing with conflicts, for example different and configurable algorithms such as a slot-by-slot three-way merge, or even special programmer-defined algorithms. I can hardly imagine how to do this sort of thing with a traditional RDBMS in a relatively easy manner.

    2. Inconsistency will provide a terrible user experience.
    First of all, it should be noted that the described inconsistencies are also quite possible with distributed RDBMS setups: they too are constrained by a certain lag before data propagates through the replicas.

    The actual problem is not with lag — it is more about leaving documents in a consistent state.

    This problem could be easily addressed in any kind of database, either relational or document-based.

    3. Aggregate operations will require more coding.
    Again, while this seems to be true for SimpleDB, other document-based databases address this problem pretty well with the Views approach (CouchDB, StrokeDB [Views is WIP]), so you can define any kind of aggregation, even ones that are simply not supported by an RDBMS.
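    The slot-by-slot three-way merge mentioned in point 1 can be sketched as follows: compare each slot of the two edited copies against their common ancestor, take whichever side changed it, and flag slots that both sides changed differently. This is a generic illustration that assumes documents are flat dicts; it is not StrokeDB's or CouchDB's actual conflict-handling code, and the function name is hypothetical.

```python
_MISSING = object()  # marks a slot that is absent (e.g. deleted) in one version


def three_way_merge(base, ours, theirs):
    """Merge two edited copies of a flat document against their common ancestor.
    Returns (merged, conflicts); conflicting slots keep 'ours' as a placeholder."""
    merged, conflicts = {}, []
    for slot in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(slot, _MISSING)
        o = ours.get(slot, _MISSING)
        t = theirs.get(slot, _MISSING)
        if o == t:
            result = o          # both sides agree (or neither touched the slot)
        elif o == b:
            result = t          # only 'theirs' changed this slot
        elif t == b:
            result = o          # only 'ours' changed this slot
        else:
            conflicts.append(slot)
            result = o          # both changed it differently: caller must resolve
        if result is not _MISSING:
            merged[slot] = result
    return merged, conflicts


base = {"title": "Draft", "body": "v1", "tags": "db"}
ours = {"title": "Final", "body": "v1", "tags": "db"}
theirs = {"title": "Draft", "body": "v2"}  # edited body, deleted tags
print(three_way_merge(base, ours, theirs))
```

    A programmer-defined resolver could then be applied only to the slots returned in `conflicts`.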

    More at http://rashkovskii.com/articles/2008/4/26/top-1...
  • AkitaOnRails · 1 year ago
    You are missing the point completely.

    Databases != RDBMS. RDBMS is but "one" kind of database. Then you have hierarchical, object-based, document-based, etc.

    SimpleDB is but one kind of non-RDBMS database. There are use cases that fit an RDBMS, and there are use cases that make an RDBMS cry. That's where SimpleDB or other alternatives come into the game.

    Just as simple as that. When all you know is a hammer, all your problems are nails.

    This article is just fishing for traffic by spreading FUD among newbies.
  • TimM · 1 year ago
    I couldn't agree more with AkitaOnRails: you are comparing the wrong things, which in a lot of situations shows a lack of experience.
  • Jim · 1 year ago
    The points in this article are true, but not reasonable or relevant, and more importantly highly imbalanced. Hence I am sorry to say -> FUD.
  • Jay · 1 year ago
    Google scales because it can afford the SKUs for storing its data and hence can throw machines at scaling problems. However, if you need to run Oracle or any other RDBMS, you will have to empty your wallet to scale up. Most online applications do not require the zillion RDBMS features, which are not optimized for the characteristics of typical online apps, which tend to be read-heavy. I should also point out that I have unfortunately seen people ridiculously normalize their schemas even for read-heavy apps, when they could easily have accepted the extra cost of writing the same data multiple times.
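    "Writing multiple times" in exchange for cheap reads is denormalization: each record is written into every read-optimized view that needs it, so each read becomes a single lookup with no join. The minimal sketch below assumes a hypothetical comments store with two views (by post and by author); the class and method names are illustrative only.

```python
from collections import defaultdict


class DenormalizedComments:
    """Hypothetical read-optimized store: each comment is written twice,
    once under its post and once under its author, so both reads are single lookups."""

    def __init__(self):
        self.by_post = defaultdict(list)
        self.by_author = defaultdict(list)

    def add_comment(self, post_id, author, text):
        comment = {"post_id": post_id, "author": author, "text": text}
        # The extra write cost: an independent copy goes into each view.
        self.by_post[post_id].append(dict(comment))
        self.by_author[author].append(dict(comment))

    def comments_for_post(self, post_id):
        return list(self.by_post.get(post_id, []))

    def comments_by_author(self, author):
        return list(self.by_author.get(author, []))


store = DenormalizedComments()
store.add_comment(1, "ann", "hi")
store.add_comment(1, "bob", "yo")
store.add_comment(2, "ann", "bye")
print([c["author"] for c in store.comments_for_post(1)])
```

    The trade-off is that an edit or delete must now touch every copy, which is exactly the write-side cost being accepted.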
  • Mitch Stephens · 1 year ago
    This article is pretty much on target.. I tried to use SimpleDB as a persistence layer for C# classes... the idea was to use an attribute to store the xml for the class... Couldn't do it because of the 1k limit per attribute.

    This is actually a big database limitation. My application has a lot of places where people can leave comments.. and 1k is too small. Think about a long email message.. it could easily go over 1k.
    That means you have to split a field into multiple chunks.... :-(
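    Splitting a long value across several attributes, as described above, might look like this minimal sketch. It assumes a 1024-byte limit per attribute value (the "1k" above) and a hypothetical naming scheme of zero-padded suffixes (`body_000`, `body_001`, ...) so the pieces sort back into order. It splits on UTF-8 byte length without cutting a multi-byte character in half.

```python
def split_into_chunks(text, limit=1024):
    """Split text into pieces whose UTF-8 encoding is at most `limit` bytes,
    never cutting a multi-byte character in half (assumes limit >= 4)."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > limit and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current or not chunks:
        chunks.append("".join(current))
    return chunks


def to_attributes(name, text, limit=1024):
    # Hypothetical naming scheme: zero-padded suffixes keep chunks in sortable order.
    return {f"{name}_{i:03d}": part for i, part in enumerate(split_into_chunks(text, limit))}


def from_attributes(name, attrs):
    # Reassemble by sorting the suffixed attribute names.
    prefix = name + "_"
    return "".join(attrs[k] for k in sorted(k for k in attrs if k.startswith(prefix)))


attrs = to_attributes("body", "x" * 2500)
print(sorted(attrs))
```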
  • A.J. Brown · 9 months ago
    > My application has a lot of places where people can leave comments.. and 1k is too small.

    You didn't spend too much time researching SimpleDB then. You can store pointers to larger data objects stored in S3 if you need more than 1k.
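    The pointer approach can be sketched as: store the value inline when it fits, otherwise put the bytes in a blob store and keep only a reference in the item. In this minimal sketch, an in-memory class stands in for S3 (no real S3 API is called), the item is a plain dict, and the `_ref` suffix and 1024-byte limit are assumptions for illustration.

```python
import hashlib

LIMIT = 1024  # assumed per-attribute byte limit


class InMemoryBlobStore:
    """Stand-in for an object store such as S3 (hypothetical; no network calls)."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content hash used as the object key
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def store_value(item, name, text, blobs, limit=LIMIT):
    data = text.encode("utf-8")
    if len(data) <= limit:
        item[name] = text
        item.pop(name + "_ref", None)  # drop any stale pointer
    else:
        # Too big for one attribute: keep the bytes elsewhere and store only a pointer.
        item[name + "_ref"] = blobs.put(data)
        item.pop(name, None)  # drop any stale inline value


def load_value(item, name, blobs):
    if name in item:
        return item[name]
    return blobs.get(item[name + "_ref"]).decode("utf-8")


blobs, item = InMemoryBlobStore(), {}
store_value(item, "comment", "a" * 5000, blobs)
print(sorted(item))
```

    The cost, as the next reply points out, is a second store and a second round trip for every large value.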
  • sartre · 6 months ago
    "You didn't spend too much time researching SimpleDB then"
    Ummm, yeah, let's also add the S3 goodness for something that's easily handled by an RDBMS. Do you work for Amazon?
  • manvscode · 4 months ago
    Very good stuff.
  • randv · 1 month ago
    Amazon agrees: they now offer hosted MySQL (one could do it before with your own image) with better support than a self-hosted version.
  • sam · 3 weeks ago
    10000000% correct, you are the man. I came to those conclusions the hard way; I wish I had found this post earlier. All my work went in the dustbin when the website was launched, and SimpleDB was the cause of its failure :(