On a New Road

RAM is my friend
Saturday January 1, 2011
I got a kick out of reading the Slashdot posting Replacing Traditional Storage, Databases With In-Memory Analytics. One of my personal quirks is that the relational/sql model has never made much sense to me. It's both cumbersome and slow. Give me a big bucket of RAM and a log file any day. It's always hugely faster and more flexible. If the database is too big for RAM, shard. There's an odd sort of political correctness about SQL. I've frequently run into people with high performance transaction systems. When asked how they achieved that performance, "big HashMap" comes up often, and often with a hint of embarrassment. Some people seem to think that it's just a hack that they're forced into to achieve performance. But there's a murky distinction in my mind between "hack" and "elegant technique". I tend to think of the log as the Truth, and RAM as a cache that just happens to be big enough to contain everything. There's a huge bag of tricks to trade off reliability, scale, distribution and startup time. Pick a point in that multidimensional space, and there's almost always a set of tricks to get you there.
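To make "the log is the Truth, RAM is a cache" concrete, here is a minimal sketch of one point in that space. It is not taken from any particular system: the class name LoggedMap, the tab-separated record format, and the choice to flush on every write are all assumptions made for illustration. It also assumes keys and values contain no tabs or newlines.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: an append-only log is the truth, a HashMap caches all of it. */
class LoggedMap implements AutoCloseable {
    private final Map<String, String> cache = new HashMap<>();
    private final BufferedWriter log;

    LoggedMap(Path logFile) throws IOException {
        // Startup: rebuild the in-memory state by replaying every logged write in order.
        if (Files.exists(logFile)) {
            for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab < 0) continue; // skip lines that are not key<TAB>value records
                cache.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        log = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    void put(String key, String value) throws IOException {
        // Append to the log first, then update RAM, so the cache never
        // holds a write that the log does not.
        log.write(key + "\t" + value);
        log.newLine();
        log.flush(); // hands the record to the OS; this sketch does not force it to disk
        cache.put(key, value);
    }

    String get(String key) {
        return cache.get(key); // reads never touch the disk
    }

    @Override
    public void close() throws IOException {
        log.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("loggedmap", ".log");
        try (LoggedMap m = new LoggedMap(file)) {
            m.put("user:1", "alice");
            m.put("user:1", "bob"); // the later record wins on replay
        }
        // A fresh instance stands in for a restart: state comes back from the log alone.
        try (LoggedMap m = new LoggedMap(file)) {
            System.out.println(m.get("user:1"));
        }
    }
}
```

The trade-offs show up immediately. Replaying the whole log makes startup time grow with history, which is where snapshots or log compaction would come in, and flushing per write trades throughput for a smaller window of lost updates.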
Comments:

Hi, Sorry, I would like to contact you but I don't know how... I hope that you won't be too angry... For 2 years I have been working on my personal project called "Ynot". It's a new scripting language. The interpreter is written in Java and has an easy syntax like PHP. You can directly use external libraries (.jar files) in Ynot scripts or add new words to the language. If you could have a quick look at the site http://www.ynotscript.com I would be very proud. Thank you very very... much, Eric.

Posted by Eric Quesada on January 01, 2011 at 04:44 PM PST #

OK, Let me see :)

Posted by ChunXiong on January 01, 2011 at 10:37 PM PST #

excellent write up, but almost 4 years later after you'd first implemented your java documentation project. nice stuff :-)

Posted by Mayuresh Kathe on January 02, 2011 at 01:04 AM PST #

Hi, thanks for the useful pointer. You may also find some references on this subject in one of my previous posts, RAM is the new Disk (http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html), specifically Tim Bray's notes as well as Stanford research on the subject. In my 2011 predictions (http://natishalom.typepad.com/nati_shaloms_blog/2010/12/2011-cloud-paas-nosql-predictions.html) I also suggested that the new Tera-Scale effect, where 1TB can be available on a single box, will drive a big push toward applications that run entirely in memory during 2011.

Posted by Nati Shalom on January 02, 2011 at 01:48 AM PST #

Hi James, I have to disagree with you regarding the pros/cons of SQL versus HashMap type databases. Both have their use cases. HashMaps are very good when data is discrete and there aren't many joins. For many applications this is a good data model. SQL really has no alternative when an analytical query has to be performed that requires complex joins, subqueries, etc. The issue of RAM versus storage is also a red herring as SQL databases also use RAM - the more the better. The performance problem with transactions is often due to referential integrity requirements that SQL databases are expected to follow. Regards

Posted by Dibyendu Majumdar on January 02, 2011 at 03:13 PM PST #

In all fairness to sql, etc., it was invented when RAM was much more expensive, and RAM still is wicked expensive. And when it comes down to it, all sql is, is a log file with a bunch of hashtable (or btrees) indexes in ram. Also I find it a little objectionable that we often have CPU power to spare, but we need a whole new machine just so we can throw more memory at the problem. Java, for example, does not do well with huge huge memory heaps. At least not in my experience. You know, 48G or 128G, etc.

Posted by JP on January 03, 2011 at 04:55 AM PST #

Sure, my own pet project is HashMap in memory too, with occasional need to drop or compress some of the less used data from the values when memory is tight. It's nothing to be embarrassed about: I'm with you on that grey area... Rgds Damon

Posted by Damon Hart-Davis on January 03, 2011 at 07:51 AM PST #

Maybe the real answer is a scalable (via built-in sharding), concurrently accessible, distributed hashmap with some optimistic concurrency primitives (see membase.org) with a set of high performance clients (see spymemcached on Google Code and Enyim) that implement the platform's standard hashmap. At least, that's what a few of us have been working on and with for the last year and a half... The reality for many is that app development has finally become network and memory oriented, rather than disk oriented.

Posted by Matt Ingenthron on January 03, 2011 at 02:08 PM PST #

ISAM (Indexed Sequential Access Method) is an old standard that is pretty much a "HashMap on disk". It outperformed SQL wayyy back when SQL was still young and immature. I think modern caching techniques could really make this a viable modern technology.

Posted by Collin on January 03, 2011 at 02:58 PM PST #

The HashMap idea sounds cool, but what if the power goes down? You need to replicate that data somehow.

Posted by raveman on January 04, 2011 at 05:53 AM PST #

No surprise on it not making sense for you, SQL isn't optimized for real-time analytics. It is designed for transactional, reliable storage and retrieval, where getting those things wrong costs money or lives. It is also designed to shield implementors and coders from the vagaries of which type of index is being used today. If RAM is your thing, and it is time for analytics, dump your SQL based relational database and start grepping, or whatever you wish to do, knowing that your real data is still secure in the RDBMS (even if it is owned by Oracle). Or is who makes the RDBMS the real problem here?

Posted by Ken Marsh on January 04, 2011 at 12:49 PM PST #

I don't follow how '... the relational/sql model is ... both cumbersome and slow'? - The relational model just is a conceptual model on top of particular *implementations*. That's the part that can be slow. Maybe it's how ACID has traditionally been implemented that slows things down? Sounds like hype but see this by Michael Stonebraker for the 4 things that make SQL databases slow: http://cacm.acm.org/blogs/blog-cacm/50678-the-nosql-discussion-has-nothing-to-do-with-sql/fulltext I have not tried the products he's involved with (VoltDB / H-Store and others) but they might be fulfilling his prediction that he expects 'very high speed, open-source SQL engines in the near future that provide automatic sharding... they will continue to provide ACID transactions...'

Posted by George Rypysc on January 04, 2011 at 06:28 PM PST #

@Dibyendu "SQL really has no alternative when an analytical query has to be performed that requires complex joins, subqueries" The two models are actually not mutually exclusive: you can have a HashMap that exposes extended SQL query semantics on top of the key/value API. This has been supported for a while by GigaSpaces and was added recently to Coherence. The other option is to add a standard JPA/JDBC facade on top of the in-memory data store (similar to the way Google added JPA on top of their Big Table implementation). As far as I know, GigaSpaces is currently the only in-memory data store that provides built-in support for those two interfaces; I assume that others will follow shortly.

Posted by Nati Shalom on January 04, 2011 at 06:50 PM PST #

@raveman You should assume that there is more than one copy of the data, with another copy on a different machine. In that case, if one machine goes down, the data doesn't go away. I would suggest that you look at how In-Memory Data Grids handle that scenario.

Posted by Nati Shalom on January 04, 2011 at 06:55 PM PST #
