ryanpark.org - Latest Comments in Top 10 Reasons to Avoid the SimpleDB Hype

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Beachlife — Sun, 18 Apr 2010 12:33:50 -0000

Valid points you make. These new data stores all remind me of Tandem's Enscribe filesystem from the 1980s. This system is still in use for very limited uses that need low latency (e.g., trading systems), but NonStop SQL took over for more complicated uses. Keeping the DB functionality on the server reduces code complexity and duplication of code.

Your most valid point though is that fewer bytes of data must be transferred if you push the functionality down to the server. This saves enormous time and is a basic functionality of a true Database server and not just a datastore.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

isterin — Mon, 08 Feb 2010 15:23:49 -0000

#1. RDBMS provides some rather rudimentary capabilities for enforcing integrity. In many applications, invariants are enforced through the application logic, not the database. There are definitely ways around it, like triggers that can enforce this, but nothing that a NOSQL db can't have. CouchDb allows for validation functions that can enforce such integrity. Either way, in most apps invariant enforcement happens on both ends and is superfluous. Invariants should be enforced in a single place and rdbms doesn't have as much power as a Turing complete language to do so.

#2. That's completely application dependent. Consistency usually comes with performance tradeoffs. Some apps would much rather accept the write and then reconcile it later (asynchronously) than force the user to wait and block until such a transaction can be completed, given all the lock contention that can happen in a high traffic environment. Again, it's very application dependent, but what you're describing is called "Eventual Consistency"; you can read more here http://queue.acm.org/detail.... RDBMS themselves are not necessarily the bottlenecks, rather it's ACID transactions which are. Every high load/concurrency app eventually trades transactional semantics for performance. It's up to your application to figure out how to reconcile temporal inconsistencies, if at all. If you still think it's that important, just ask eBay. They don't use transactions and utilize the eventually consistent model. What does that mean? It means that there might be a very minuscule chance of some inconsistency and they might have 1 out of 100K spooked customers, but who cares, they've just achieved 99.999% customer satisfaction, as opposed to many unsatisfied customers if their system is somehow unavailable or slow.

#3. Probably, but the same is true when you shard a sql database. Also, through map reduce operations, aggregates are actually more natural and some key/value DBs now provide such operations. The great thing is that map/reduce is completely abstracted from distributed database semantics, so you can easily distribute this operation over numerous remote nodes.
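To make the map/reduce point concrete, here is a minimal sketch in Python. The shard contents and function names are made up for illustration, and the "nodes" are just in-memory lists; a real store would run the map step remotely. The only thing it tries to show is that each node computes a partial aggregate and the partials merge in any order.

```python
from collections import Counter
from functools import reduce

# Hypothetical shards: each list stands in for the records held by one node.
shards = [
    [("alice", 10), ("bob", 5)],
    [("alice", 7), ("carol", 3)],
    [("bob", 1)],
]

def map_shard(records):
    """Map step: runs per node and returns partial totals per key."""
    partial = Counter()
    for user, amount in records:
        partial[user] += amount
    return partial

def reduce_partials(left, right):
    """Reduce step: merge two partial results. Addition is associative,
    so the partials can be combined in any order."""
    merged = Counter(left)
    merged.update(right)  # Counter.update adds counts instead of replacing
    return merged

totals = reduce(reduce_partials, map(map_shard, shards), Counter())
print(dict(totals))
```

Because the reduce step only needs the partial results, it does not care how many nodes produced them or where they live.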

#4. Not sure what you mean. Maybe they'll require non-SQL coding, but not necessarily "more" coding. And the code might actually be shorter, as in some languages you utilize a Turing complete language to apply predicates vs. the limitations of SQL and its underlying relational model. Again, it's true that RDBMS are more suited for reporting at this time, mostly because they've accumulated lots of experience over the years. On the other hand, if reporting is not a huge part of your system and you want it in a relational database, just replicate. Data warehouses do this all the time to optimize/denormalize data for reporting, as reporting on highly normalized data is also very inefficient.

#5. I think that's a side-effect of the network. Yes, it's true that RDBMS algorithms might have been more optimized over the last 20 years, but that doesn't stop NOSQL DBs from getting faster. Basically, there is no theoretical reason for a "local" NOSQL query to be any slower than RDBMS. It's all in the implementation details of that db, storage structure, search algorithms, etc... Because SimpleDb is distributed, the side effect of aggregate functions is "more latency". The same side effect would be true in a sharded RDBMS.

#6. RDBMS might have more tools now, but again, that's not a reason to necessarily disqualify the benefits of a different storage model. Also, have you looked at CouchDB's replication? Talk about fast and easy.

#7. See #5 response. Also, SimpleDb wasn't designed to be fast per se, although that wouldn't be a bad feature. It was designed to be linearly scalable, so you might incur some latency, but that latency should be constant with increasing load.

#8. Nonsense. There is a sweet spot for RDBMS systems and I use them in any application which has a requirement for such and/or where the data fits into the relational model. I hate having to fit square pegs into round holes. Like storing highly dynamic and hierarchical data in a relational database, either using a convoluted normalized model or using skinny tables, which defeat the purpose of the relational model completely. Also, you mention Facebook, MySpace, etc... Yes, they use an RDBMS engine behind the scenes, but if you look at their storage model, they are utilizing it just like a key/value store. Basically, they're not benefiting from any features that RDBMS systems are good at.

#9. Agree, you don't want to make that the top priority unless you have to. But I think many people read "don't optimize prematurely" as a ticket to forgo such activity. That's the worst thing you can do. I faced that personally: you only worry about features and not the long term requirement changes/scalability, and then your system fails in production due to an unexpected load which you never accounted for and/or thought your architecture could handle. What then? Well, besides making up excuses and trying to salvage any of the relationships that still exist with users, you're up for 2 weeks straight not sleeping, doing what you should have done upfront. That's like building a bridge that can only handle 5 cars at a time, because in this rural community we'll never have traffic. Disastrous results await.

#10. No shit, everything is useful only in certain contexts.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Dan Stocker — Fri, 22 Jan 2010 11:58:45 -0000

I think I can agree with your conclusion. Nosql is not a must. If a certain solution is easier to implement in relational and that satisfies your performance needs, use relational.

Problems arise when you're trying to square the circle. When your application needs aggregate functions or joins, nosql is not the way to go. When your app doesn't need a lot of personalization in terms of aggregation (say, statistics based on user preferences) you can just reverse the problem, and generate and update aggregated data as they come in.
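A minimal sketch of that "reverse the problem" idea, in Python: keep running aggregates that are updated on every write, so a read never has to scan or join. The class and key names are hypothetical, and a dict stands in for whatever key/value store you would actually use.

```python
from collections import defaultdict

class RunningStats:
    """Keeps a count and a sum per key, updated as each value comes in."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def record(self, key, value):
        # Called on the write path: O(1) work per incoming value.
        self.count[key] += 1
        self.total[key] += value

    def average(self, key):
        # Called on the read path: no scan over the raw data is needed.
        n = self.count.get(key, 0)
        if n == 0:
            return None
        return self.total[key] / n

stats = RunningStats()
stats.record("video-1", 4)
stats.record("video-1", 5)
print(stats.average("video-1"))
```

The tradeoff is that you have to know which aggregates you want before the data arrives, which is exactly why this works poorly for ad hoc, per-user statistics.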

Re: Top 10 Reasons to Avoid the SimpleDB Hype

sam — Wed, 02 Dec 2009 10:08:27 -0000

10000000% correct, you are the man, i came to those conclusions the hard way, i wish i could have found this post, all my work went in the dustbin when the website was launched, simpledb was the cause of its failure :(

Re: Top 10 Reasons to Avoid the SimpleDB Hype

randv — Sun, 22 Nov 2009 13:50:16 -0000

Amazon agrees: they now offer hosted MySQL (you could do it before with your own image) with better support than your own hosted version.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

manvscode — Thu, 30 Jul 2009 02:15:57 -0000

Very good stuff.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

sartre — Thu, 28 May 2009 11:08:53 -0000

"You didn't spend too much time researching SimpleDB then"
Ummm. yeah let's also add the s3 goodness for something that's easily handled by an rdbms. Do you work for amazon?

Re: Top 10 Reasons to Avoid the SimpleDB Hype

A.J. Brown — Fri, 27 Feb 2009 02:05:03 -0000

> My application has a lot of places where people can leave comments.. and 1k is too small.

You didn't spend too much time researching SimpleDB then. You can store pointers to larger data objects stored in S3 if you need more than 1k.
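As a rough illustration of the pointer approach, here is a Python sketch. It does not call any real AWS API: two in-memory dicts stand in for a SimpleDB domain and an S3 bucket, and the 1024-byte threshold, attribute names, and key format are assumptions made for the example.

```python
import uuid

LIMIT_BYTES = 1024  # assumed size of the "1k" per-attribute limit

blob_store = {}  # stand-in for an S3 bucket: object key -> bytes
items = {}       # stand-in for a SimpleDB domain: item name -> attributes

def save_comment(item_name, text):
    """Store short text inline; store long text externally and keep a pointer."""
    data = text.encode("utf-8")
    if len(data) <= LIMIT_BYTES:
        items[item_name] = {"body": text}
    else:
        key = f"comments/{uuid.uuid4()}"
        blob_store[key] = data
        items[item_name] = {"body_ref": key}

def load_comment(item_name):
    """Return the comment text, following the pointer if there is one."""
    attrs = items[item_name]
    if "body" in attrs:
        return attrs["body"]
    return blob_store[attrs["body_ref"]].decode("utf-8")

save_comment("c1", "short comment")
save_comment("c2", "x" * 5000)
print(sorted(items["c1"]), sorted(items["c2"]))
```

Note the cost: reading a long comment takes two round trips instead of one, and the two stores can drift apart if a write to one of them fails.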

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Mitch Stephens — Sat, 06 Dec 2008 07:53:32 -0000

This article is pretty much on target.. I tried to use SimpleDB as a persistence layer for C# classes... the idea was to use an attribute to store the xml for the class... Couldn't do it because of the 1k limit per attribute.

This is actually a big database limitation. My application has a lot of places where people can leave comments.. and 1k is too small. Think about a long email message.. it could easily go over 1k.
That means you have to split a field into multiple chunks.... :-(
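For what it's worth, the chunking itself is not much code. Below is a sketch in Python, assuming the limit is 1024 bytes of UTF-8 per attribute value; the attribute naming scheme (`body_0000`, `body_0001`, ...) is invented for the example.

```python
LIMIT_BYTES = 1024  # assumed size of the "1k" per-attribute limit

def split_field(name, text, limit=LIMIT_BYTES):
    """Split text into numbered attributes of at most `limit` UTF-8 bytes each.

    Splitting happens between characters, so a multi-byte character is
    never cut in half.
    """
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > limit and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    # Zero-padded suffixes keep the attribute names in lexicographic order.
    return {f"{name}_{i:04d}": chunk for i, chunk in enumerate(chunks)}

def join_field(name, attrs):
    """Reassemble the original text from the numbered attributes."""
    keys = sorted(k for k in attrs if k.startswith(name + "_"))
    return "".join(attrs[k] for k in keys)

attrs = split_field("body", "é" * 1500)  # 3000 bytes of UTF-8
print(sorted(attrs))
```

The annoying part is everything around it: queries can no longer match against the whole field, and every read and write has to go through the split/join helpers.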

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Jay — Sun, 04 May 2008 04:42:52 -0000

Google scales because it can afford the SKUs for storing their data and hence can throw machines at resolving scaling problems. However, if you need to run Oracle or any other RDBMS, you will have to empty your wallets to scale up. Most online applications do not require the zillion RDBMS features that are not optimized for the characteristics of typical online apps which are more read heavy. I should also point out that I have unfortunately seen people ridiculously normalize their schemas even for read heavy apps when they could have easily spent more on writing multiple times.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Jim — Mon, 28 Apr 2008 08:18:03 -0000

The points in this article are true, but not reasonable or relevant, and more importantly highly imbalanced. Hence I am sorry to say -> FUD.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

TimM — Sun, 27 Apr 2008 17:33:45 -0000

I couldn't agree more with AkitaOnRails. You are comparing totally wrong stuff, which in a lot of situations shows a lack of experience.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

AkitaOnRails — Sun, 27 Apr 2008 13:46:29 -0000

You are missing the point completely.

Databases != RDBMS. RDBMS is but "one" kind of database. Then you have hierarchical, object-based, document-based, etc.

SimpleDB is but one kind of non-RDBMS database. There are use cases that fit RDBMS, and there are use cases that make RDBMS cry. That's where SimpleDB or other alternatives get into the game.

Just as simple as that. When all you know is a hammer, all your problems are nails.

This article just wants more traffic by spreading FUD to newbies.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Yurii Rashkovskii — Sat, 26 Apr 2008 01:28:09 -0000

First of all I’d like to note that the below comments are not about SimpleDB but rather to prevent FUD about document-based databases.

1. Data integrity is not guaranteed.
This could be the case with SimpleDB, but overall nothing prevents document databases from managing data integrity very well.

Regarding the constraints, there is nothing that prevents defining validations in a document or its related “meta” document (this is pretty much how StrokeDB works — you can define your validations within meta document and they will let your document stay validated)

More interesting are the concerns about the conflicts. I’d say that this problem is hardly addressed in a common RDBMS approach. All you usually get is either user A’s or user B’s most recent update; there seems to be no easy way to do graceful conflict resolution. On the contrary, since the document database approach is rather novel, there is certainly enough room to adopt ways to deal with conflicts. For example, with different and configurable algorithms, like merging them slot-by-slot 3-ways, or even some special programmer-defined algorithms. I can hardly imagine how to do this sort of stuff with a traditional RDBMS in a relatively easy manner.
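To illustrate what a slot-by-slot 3-way merge could look like, here is a small Python sketch. It is not StrokeDB's or CouchDB's actual algorithm, just the general technique: compare each slot of the two edited copies against their common ancestor. The function name and the "keep ours on conflict" policy are assumptions for the example.

```python
def merge_three_way(base, ours, theirs):
    """Merge two edited copies of a document against their common ancestor.

    Returns (merged, conflicts), where conflicts lists the slots that
    both sides changed to different values.
    """
    missing = object()  # sentinel: distinguishes a deleted slot from None
    merged, conflicts = {}, []
    for slot in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(slot, missing)
        o = ours.get(slot, missing)
        t = theirs.get(slot, missing)
        if o == t:
            value = o  # both sides agree (or neither changed the slot)
        elif o == b:
            value = t  # only theirs changed the slot
        elif t == b:
            value = o  # only ours changed the slot
        else:
            conflicts.append(slot)
            value = o  # placeholder policy: keep ours, report the conflict
        if value is not missing:
            merged[slot] = value
    return merged, conflicts

base = {"title": "Hi", "body": "x", "tags": "a"}
ours = {"title": "Hello", "body": "x", "tags": "b"}
theirs = {"title": "Hi", "body": "y", "tags": "c"}
merged, conflicts = merge_three_way(base, ours, theirs)
print(merged, conflicts)
```

Only the `tags` slot is a real conflict here; the other two edits merge cleanly. A programmer-defined algorithm would replace the placeholder policy in the `else` branch.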

2. Inconsistency will provide a terrible user experience.
First of all, it should be noted that the described inconsistencies are also quite possible with distributed RDBMS setups; they too are constrained by a certain lag before the data is propagated through replicas.

The actual problem is not with lag — it is more about leaving documents in a consistent state.

This problem could be easily addressed in any kind of database, either relational or document-based.

3. Aggregate operations will require more coding.
Again, while this seems to be true for SimpleDB, other document-based databases address this problem pretty well with Views approach (CouchDB, StrokeDB [Views is WIP]) — so you can define any kind of aggregation, even such that are simply not supported by RDBMS.

More at http://rashkovskii.com/arti...

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Diarmuid Wrenne — Fri, 25 Apr 2008 17:45:23 -0000

The hype has been pretty intense and people's perceptions of what SimpleDB et al are useful for have grown pretty inflated. Most applications' database components are made up of 3 distinct areas: 1 user related data, 2 data that structures the user experience within the app, 3 rapidly growing and dynamic data. I feel that 1 and 2 are best served by an RDBMS while 3 is a good fit for SimpleDB. Take YouTube. User data and user created scalar data is tiny compared with Video data and its associated data. If a video search is not fully optimised, no-one dies. However, a user does want to be sure that their favourites, channels, etc are stable.

Regards

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Tammer Saleh — Wed, 23 Apr 2008 13:26:33 -0000

"When websites like Friendster have scalability issues, it’s not usually because of the RDBMS."

@toby has it exactly right. RDBMS _can_ scale, but at *significant* costs in both money and developer/sysadmin/DBA time.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Anonymous — Wed, 23 Apr 2008 08:00:28 -0000

Last time I checked, Amazon *do not* use SimpleDB to power their online store. Rumours of Postgres abound ...

PS - do you know that your comment filter rejects valid email addresses such as root@localhost.localdomain? (at least, that's where all my cron jobs send it ... :-)

Re: Top 10 Reasons to Avoid the SimpleDB Hype

At a loss — Tue, 22 Apr 2008 22:16:59 -0000

Can someone remind me why application programmers should care so much about the integrity constraint checking afforded by relational databases? There are only a small set of constraints that can be checked without programming. And, from what I've seen, the only way to get decent error reporting in my application is to check all of the constraints myself, anyway.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Rob W — Tue, 22 Apr 2008 18:57:38 -0000

You definitely have some good points -- but you're also being unfairly harsh and not scoring any higher marks on presenting a balanced viewpoint.

One obvious one that caught my eye: #7: "SimpleDB isn’t that fast" -- Todd specifically pointed out (right in there with the performance numbers he was quoting...) that tools like SimpleDb are NOT fast. That's not the point; they exist to address scaling issues.

Some other lines you apparently considered fawning or possibly satiric:

"If you have a complex OLAP style database SimpleDB is not for you. But, if you have a simple structure, you want ease of use, and you want it to scale without your ever lifting a finger ever again, then SimpleDB makes sense. The cost is everything you currently know about using databases is useless and all the cool things we take for granted that a database does, SimpleDB does not do."

That sounds an awful lot like what you're saying in #10. But you start that off with "Everyone’s assuming that SimpleDB was designed to be a general-purpose replacement for OLTP database servers."

Sorry for the rant; I guess I'm just saying you clearly have some useful input to add to the discussion -- just leave the straw man nonsense at home, please.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

troll_wrangler — Tue, 22 Apr 2008 17:35:07 -0000

@jackson - "And if you are making something for < 100,000 users, you really probably ought to just stop now, shouldn’t you?"

the answer you troll for is "Nope". in fact, i'd say just the opposite. if you know before you start that your app will need upwards of a hundred thousand users to be useful to its audience, "you really probably [sic] ought to just stop now."

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Dennis Forbes — Tue, 22 Apr 2008 14:31:04 -0000

Better still, you have needs beyond Slashdot in their heyday, and an apparently minuscule budget. My dev database server is a 16-core, 6-disk monster, serving up an unbelievable transaction load.

Not good enough for jackson's imaginary success story, though.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Dennis Forbes — Tue, 22 Apr 2008 14:29:23 -0000

Okay, it seems to have mangled my last post, so let me format slightly...

+And if you are making something for < 100,000 users, you really probably ought to just stop now, shouldn’t you?

Ho ho ho. Awesome stuff.

Yeah, I guess making systems managing billions in funds just doesn't cut it in the realm of the awesome systems that you make.

You are simply delusional.

+And if you are making something for 100,000 user sites do you have, jackson? Care to point a couple out?

Now I presume you must mean 100,000 simultaneous users, because there are quite a few >100K user sites easily running on some shitty RDBMS (e.g. MySQL) on a low-end desktop PC. Slashdot, for instance, which was pretty much a worst case because they were caching nothing, and generating every request live from the database.

Clearly you have needs far beyond /. in their heyday.

Re: Top 10 Reasons to Avoid the SimpleDB Hype

jackson — Tue, 22 Apr 2008 14:17:05 -0000

"Guess what, kids - you aren’t Google, Amazon, or Facebook. The chance that your web toy will ever be a fraction as popular as those sites is so vanishingly small that is creating an underflow condition."

That's exactly right. If you are creating a toy app, you don't need to scale. If you are creating a toy app, you should use a toy db, i.e., an rdbms.

Google is not the only one that needs a scalable db. /ANYONE/ who ever hopes to have > 100,000 users is going to start running into scalability problems and eventually face the reality of the SHARDING NIGHTMARE if they use an RDBMs.

And if you are making something for < 100,000 users, you really probably ought to just stop now, shouldn't you?

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Dennis Forbes — Tue, 22 Apr 2008 13:14:46 -0000

Good entry, Ryan, and quite on mark. I had just come across a quotation of the article you quote in #3, and at the time I thought he was being sarcastic. I'm greatly saddened to think that he was actually being serious.

I find it most interesting seeing all of the cheerleading for SimpleDB and the like by people who are quite evidently clueless about databases, so they embrace and flaunt their ignorance, using Google and Amazon as a "Big Daddy" of sorts, always ready to reference.

Guess what, kids - you aren't Google, Amazon, or Facebook. The chance that your web toy will ever be a fraction as popular as those sites is so vanishingly small that it is creating an underflow condition.

Google has a very specialized database, and their needs are absolutely nothing like almost anyone else's. Amazon likewise. Until the day that you build your own specialized database, an RDBMS is often a suitable choice.

And the scalability ruse....extraordinary. The numbers I've seen suggest that these "scalable" database technologies need to be scalable because they're such incredibly poor performers.

Alas, everything old is new again. Here we have cheerleaders heralding the arrival of basically exactly what people did before real databases were invented. Hurrah for the past!

Re: Top 10 Reasons to Avoid the SimpleDB Hype

Guest — Tue, 22 Apr 2008 12:26:47 -0000

You seem to have missed the point of these pieces of software. By structuring your application to use these new datastores, you don't need to worry about fork-lift upgrades and all the other scaling problems of a traditional RDBMS as you get bigger and take on more load. It scales up just like the rest of your app: by adding more machines.

Yahoo!, eBay, Facebook, etc scale their RDBMSs by doing the same thing that SimpleDB or BigTable do internally: by sharding the data down to finer and finer levels of keyspace as the number of machines grows. Except, with an RDBMS, this is a manual process. They also use read-only slaves to distribute read load, something also implicit in BigTable/SimpleDB/etc with their use of simple block replication without the concurrent use of erasure coding (e.g. RAID). RAID is also external to the RDBMS, requiring you to manage both disparately.
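For anyone who hasn't done it by hand, manual sharding boils down to something like the following Python sketch. The class and function names are hypothetical, dicts stand in for the individual database servers, and hash-modulo placement is just one simple scheme, not what any of the sites above necessarily use.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to get the same answer on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

class ShardedStore:
    """Routes each key to one of several stores (dicts stand in for servers)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
print([len(s) for s in store.shards])
```

With plain modulo placement, changing the shard count moves most keys to a different shard, which is a large part of why growing a manually sharded setup is painful.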

Also, Oracle can scale up to 64 nodes at max with a clustered filesystem (this is somewhat old, it might be 128 by now). Google has ~650,000 machines in their clusters. This is 4 orders of magnitude difference. No one has enough money to pay Oracle to scale to this level. Yahoo! and Facebook have gotten MySQL to run on more boxes than this, but not in a cluster, so they (like you and everyone else) are stuck with the manual process of sharding and shard management.

If you don't expect to grow, by all means, continue to live in the RDBMS past. However, if your app is subjected to possible rapid growth (e.g. a Facebook app, Salesforce app, GAE app, pretty much any Web-facing app) you should definitely be thinking about how to leverage SimpleDB/HBase/Hypertable/CouchDB/etc in your design. I see a lot of posts of this nature lately and they all seem to be coming from the initial shock of what you *can't* do with these systems. Give them a shot and see what you *CAN* do with some semi-clever design and you might be surprised.