Earlier today, EnterpriseDB posted a press release stating that it had released its shared-nothing-based GridSQL parallel query engine under the GPL v2, with an additional provision to include and distribute it with projects under several other OSI-approved licenses. While this isn’t directly related to Oracle internals, I thought I’d post my thoughts as well as pose a little question out there to those of you in the Oracle community.
As someone who has done significant research and performance work in the areas of clustering and parallelization, I agree with several of Kevin Closson’s points regarding Shared Disk vs. Shared Nothing architectures, especially in the realm of OLTP. In fact, they’re not really his own viewpoints as much as they are well-defined behaviors and limitations inherent to each cluster architecture. And, while I tend to personally prefer shared-disk clustering, I have to acknowledge that there are a variety of customer use-cases where shared-nothing makes more sense.
At this point, however, I should state that this blog entry certainly isn’t intended to discuss the pros/cons of each architecture. Instead, it is to discuss the open-sourcing of GridSQL, and the possible opportunities for Oracle users.
First, as I’ve seen several posts from people attempting to directly compare GridSQL to RAC, I thought I’d throw my two cents in. My basic opinion is that GridSQL is not RAC, nor does it intend to be.
On one hand, RAC is derived from the substantial Oracle V6 kernel rewrite and Oracle’s foray into MPP with Oracle Parallel Server. RAC’s shared-disk architecture is designed to meet several performance and parallel processing challenges commonly found in both OLTP and DSS environments. On the other hand, because of a few inherent limitations, shared-nothing database architectures have generally been reserved for DSS, where significant processing can be performed independently by the individual nodes.
It should be stated that both shared-disk and shared-nothing architectures incur communication and coordination overhead. And, to some of Kevin’s points, in a shared-nothing OLTP environment, the only reliable way to mitigate this overhead is by using well-defined data-routing mechanisms within the application, or at the driver level.
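To make the data-routing idea a little more concrete, here is a minimal Python sketch of application-level routing. It is only an illustration under my own assumptions: the node names, the `customer_id` partitioning key, and the `route` function are all hypothetical and are not taken from GridSQL or any driver.

```python
import hashlib

# Hypothetical node names; a real deployment would hold connection details here.
NODES = ["node0", "node1", "node2", "node3"]


def route(customer_id, nodes=NODES):
    """Map a partitioning key to the single node assumed to own its rows."""
    # Use a stable hash so every application server picks the same node
    # for the same key (Python's built-in hash() is not stable across runs
    # for all types).
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[bucket]


if __name__ == "__main__":
    # Every statement for a given customer goes to one node, so a
    # single-customer OLTP transaction needs no cross-node coordination.
    for cid in (1001, 1002, 1003):
        print(cid, "->", route(cid))
```

The point of the sketch is that the application (or driver) decides where a transaction runs before it starts, rather than letting the cluster discover after the fact that the data lives elsewhere. Note that simple modulo hashing like this forces most keys to move when a node is added, which is one reason real systems tend to use something more elaborate.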
Based solely on the overall architecture, I find that GridSQL is designed to efficiently parallelize DSS queries using a shared-nothing architecture, which, in my opinion, makes GridSQL more analogous to a distributed version of Parallel Query Option.
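For readers who haven’t worked with this style of query processing, the general scatter/gather pattern looks roughly like the Python sketch below. This is not GridSQL’s implementation; the partitions, the function names, and the use of threads to stand in for independent nodes are all assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node partitions of a sales table (amounts only).
PARTITIONS = [
    [10.0, 20.0, 30.0],
    [40.0],
    [50.0, 60.0],
]


def partial_aggregate(rows):
    """Work done locally on one node: return (sum, count) for its rows."""
    return sum(rows), len(rows)


def parallel_avg(partitions):
    """Coordinator: send the aggregate to every node, then combine results."""
    if not partitions:
        return None
    # Threads stand in for independent nodes working on their own data.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # AVG cannot be combined by averaging per-node averages, so the
    # coordinator merges sums and counts instead.
    return total / count if count else None


if __name__ == "__main__":
    print(parallel_avg(PARTITIONS))
```

Each node scans only its own data and returns a tiny partial result, which is why this pattern suits large DSS aggregates far better than short OLTP transactions.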
As an aside, if you’re looking for an open-source shared-disk clustered database, your only two choices (that I know of) are Ingres and PGCluster-II (whose development seems to have stalled).
As GridSQL is the first open-source, Java-based shared-nothing database system, do you see any Oracle community developers porting it to support Oracle? My initial response to this question was, “yes, I can see it”. However, now that I’ve thought about it, I’m not quite sure I do.
Because GridSQL is Java-based, its interface to the underlying database is fairly abstract. While porting is not the easiest task, this well-defined abstraction makes it simpler for a Java developer to port GridSQL to another database, such as Oracle, SQL Server, DB2, or even MySQL. But, when I start to look at the reasoning behind doing it, I’m not sure it makes too much sense.
I remember back in the OPS days (and I believe people still do it with RAC), you could combine both shared-nothing and shared-disk clustering to overcome some of the inherent limitations of each. But, what are your thoughts?
NOTE: For those who don’t already know, while I work for EnterpriseDB, this blog and research are my personal hobby, and have been for about 12 years now; the views expressed here are solely my own and do not reflect those of EnterpriseDB. I try hard to see all sides of a discussion and make sure my views are as unbiased as possible.