Meeting Minds: A Commercial Grid Geek talks to an Academic Grid Geek

I had a nice long AIM chat with my friend Chris last night; Chris has spent a few years working in academic grid and cluster computing research, and with my new grid-focused commercial role we’ve been swapping notes about our respective jobs, and the respective cultures thereof.

Last night’s chat turned out quite differently from what I’d expected – with Chris’ permission I include an edited log, which I’ve cleaned up for typos and flow, and italicised some parts.

It’ll be of interest to anyone who is bandying around the word grid as part of their job; the g-word is not always the correct one to be using…

  • chris: How’s things?

  • alecm: Exciting. I am learning lots of new stuff about grid computing.
  • alecm: Especially the differences and similarities between your sorts of grid, and the sorts my customers want. It’s interesting.

  • chris: What sorts of grid do your customers talk about ?

  • alecm: The thinking we are putting into (say) network fabric layout in order to match the problem set is also amusing
  • alecm: Striking a balance between general-purpose and business-focused is a challenge

  • chris: So what, in your thinking, is a grid ?

  • alecm: Hmm. There seem to be several common meanings.
  • alecm: Can I first say what I believe the rest of the business world seems to be thinking ?

  • chris: Yup

  • alecm: Ok
  • alecm: In industry, there are a lot of hot-air white papers circulating by analysts whipping-up “grid” as the “next big thing”
  • alecm: they are (IMHO inadvisably) teaching people that grids divide into a handful of apparently, but not truly, distinct types, viz:
  • alecm: “Data Grids”, “Compute Grids”, “Storage Grids”, er… another one I forget, and (of course, best of all) “Autonomous Grids”
  • alecm: this leads to PHBs reading these briefs, believing that they then know everything, and coming to people like us and saying “We want a Data Grid”
  • alecm: (“and it should be green, because that goes fastest.”)

  • chris: hehehe

  • alecm: so… an artificial top-down taxonomy is coming into usage; it then broadens to describe/consume extant architectures.
  • alecm: eg: I am starting to see people talking-up “Hub And Spoke Compute Grids for Web-services” – which (to me) are very nearly the same layer-7 load balancer architectures that we’ve been selling very successfully and usefully for 5+ years.
  • alecm: other people say they “want grid”, where they actually want dynamic provisioning and unified management over a heterogeneous, “mixed-bag” compute farm
  • alecm: yet others say they “want grid”, and desire 1000 identical single or dual-CPU machines… but *then* they either want to do a single system image, *or* do something like PVM, *or* they want to do something like Javaspaces on top of the compute fabric.

  • chris: That last one is more of what we’d call a cluster.

  • alecm: so if i remember your question, it was what *i* reckon “grid” is?

  • chris: yup!

  • alecm: i am forced to take the ultra-liberal definition of “more than one cpu, housed in more than one metal box, collaborating on one or more computational tasks”

  • chris: OK – so what’s your feeling about what the difference between a “cluster” and a “grid” is ?

  • alecm: Not a lot. I need a swig of tea before I can put words around an intuitive feeling.
  • alecm: aside: someone last year suggested to me that Crack was one of the first [what I would now call] “grid-enabled” programs to be released onto the Internet. I found that an amusing thought.

  • chris: I strongly doubt that Crack was the first; parallel programs have been around for years, and Crack (IIRC) relies on shared storage (usually NFS).
  • chris: “I need a swig of tea before i can put words around an intuitive feeling” – I like that.

  • alecm: I never said *the* first, just “one of”. 😎

  • chris: Nah, it’s too closely coupled for a grid app

  • alecm: Ha, hark at him – “nah, it’s too closely coupled for a grid app” – a lot of the grids we are selling use NFS for shared storage, and a rsh-alike for dispatch, and that is what crack did. 😎

  • chris: Let me explain…
  • chris: My gut feeling on the difference between clusters and grids comes down to how closely coupled they are. In the HPC world a cluster is a closely coupled set of nodes, usually (but not necessarily) homogeneous.
  • chris: Closely coupled means shared filesystems and some interconnect, often low-latency but sometimes just gigE (or even just 100E if you’re not dealing with a lot of data, running only single CPU jobs, don’t care about latency, etc)
  • chris: A grid is a collection of disparate systems where no assumptions about coupling can be made (jobs need to be mapped to local users, there is no shared storage, data needs to be staged in and results staged out, often from different systems) and there is no shared interconnect aside from commodity Internet links (I don’t think anyone is using Grid over UUCP in the real world, though it would be an ultra neat hack).
  • chris: Often a grid is made up from independent clusters, but they can also be cycle-scavenging setups using things like Condor on labs of desktop PCs; you can’t make assumptions about the processor type or OS vendor, let alone library versions, etc..
  • chris: All of which make building them a nightmare, but rewarding if you get them going.
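An editorial aside: the stage-in/run/stage-out life cycle Chris describes can be sketched in a few lines of toy Python. Everything here – the function, the job/site dictionaries, the path layout – is invented for illustration and is not any real grid middleware’s API.

```python
# Toy sketch of the grid job life cycle: no shared storage, so inputs are
# staged in, the job runs under a mapped local account, and results are
# staged back out afterwards. All names and paths are hypothetical.

def run_grid_job(job, site):
    # Map the grid identity onto a local account at the executing site.
    local_user = site["user_map"][job["owner"]]
    # No shared filesystem: the job gets a private scratch directory.
    workdir = f"/scratch/{local_user}/{job['id']}"
    # "Stage in" each input file to the scratch area.
    staged = {name: f"{workdir}/{name}" for name in job["inputs"]}
    # The job writes its results locally; they get staged out afterwards.
    output = f"{workdir}/result.dat"
    return {"user": local_user, "inputs": staged, "output": output}

job = {"id": "j42", "owner": "alice@uni-a", "inputs": ["genome.fa"]}
site = {"user_map": {"alice@uni-a": "gridusr07"}}
print(run_grid_job(job, site))
```

The point of the sketch is the bookkeeping: every assumption a cluster gets for free (shared identity, shared storage) becomes explicit work on a grid.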

  • alecm: So: What’s your term for (say) that thing made of G5 Apple kit that is near the top of the Top500 list ?

  • chris: VT’s system is a cluster; it’s got a shared interconnect (IB, I think) for low-latency parallel jobs and it’s very closely coupled.
  • chris: It ain’t a grid, that’s for sure.
  • chris: A grid often transcends administrative boundaries

  • alecm: Ok, so, my turn now ?

  • chris: Yup – I’m all typed out for the moment.

  • alecm: Ok, I’ll try from my end of the spectrum

  • chris: No worries.

  • alecm: The sort of customers I deal with on a day-to-day basis tend to use the word “cluster” to indicate a set of machines that provide a *service*, and that service typically needs to be imbued with qualities of load-balancing for efficiency, or perhaps extremely high availability;
  • alecm: Before you say it, I will acknowledge that this is an almost utterly different meaning to “cluster” in your sense. The most basic of *these* sorts of clusters might be a pair of machines which watch each other over a heartbeat network (eg: a SCSI bus), and when one deems the other to have died it assumes the other’s business role, eg: by ifconfig’ing up a virtual IP address and launching an HTTP server to replace the one on the dead machine
  • alecm: Another might be “Clustered H/A NFS Servers” which use a “Cluster File System” (again slight terminology/usage tweak) to maintain state between physical machines that team together to ensure the service, whatever it is, is always available.
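Another editorial aside: the two-node takeover pattern alecm describes can be sketched as toy Python. The virtual IP, interface alias and timings are made-up examples, and a real HA stack (Heartbeat, STONITH et al) does considerably more.

```python
# Toy sketch of two-node active/standby failover: the standby watches
# heartbeats from its peer, and once the peer has been quiet for longer
# than a deadline it claims the virtual IP and starts the service.
# The address, interface alias and deadline below are all invented.

def should_take_over(last_heartbeat, now, deadline=10.0):
    """Presume the peer dead once its last heartbeat is older than `deadline` seconds."""
    return (now - last_heartbeat) > deadline

def failover_actions(virtual_ip="192.0.2.10", iface="eth0:1"):
    # In real life these would be system calls; here we just emit the commands.
    return [
        f"ifconfig {iface} {virtual_ip} up",  # claim the service address
        "start httpd",                        # replace the dead peer's web server
    ]

if should_take_over(last_heartbeat=100.0, now=115.0):
    for cmd in failover_actions():
        print(cmd)
```

Real implementations also fence the dead node (STONITH) before takeover, precisely so that two machines never both believe they own the virtual IP.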

  • chris: This is the disconnect between the business world (where HA is king) and the HPC world (where a standby machine is seen as wasted compute resources).
  • chris: Yeah, the old STONITH takeover system

  • alecm: Exactly
  • alecm: So, with this usage of the term “cluster” it seems that the business world has expanded the meaning of the word “grid” to encompass both terms as you use them.
  • alecm: So, in my little world (at least) we talk about “classic” or “cookie-cutter” grids, which in your world would be “clusters” — the Delia Smith ideal of having identical machines, fast interconnects, etc…

  • chris: I guess the main problem for folks like you (and vendors in general) is tailoring your vocabulary to the person you’re talking to.
  • chris: Because when folks start talking about wanting to sell us a “Grid”, they look like they’re clueless about the HPC world.
  • chris: For me it’s very easy to forget that, just because everyone I talk to uses the same vocabulary, our niche is still pretty small compared to the rest of the market, and the rest of the world can use these terms in ways that are very strange to us.

  • alecm: Works both ways; here’s a joke:
  • alecm: main() { while(1) fork(); }
  • alecm: If you giggled, you are among the perhaps 0.001% of the world’s population who might even raise a smile

  • chris: 😎
  • chris: Of course; and likewise with this whole Grid thing.
  • chris: I guess what I’m driving at is that of all the folks I’ve talked to, people who call our Cluster a “Grid” sound like they don’t really understand our niche area. As with so many things it’s more a case of user education and perception than reality.

  • alecm: /me thinks
  • alecm: I was going to make a point earlier, but diverted myself
  • alecm: Having defined “Grid” and “Cluster” as I/we use the terms, I want to further acknowledge that there is a top and a bottom half to both. The bottom half is the technology; the top half is the management, and some people focus entirely on the top half. They tend to be business operations people.

  • chris: …and of course then there is the “middleware” (god I hate that term)

  • alecm: There is no getting away from it, though. 😎

  • chris: Yeah, *especially* when you’re building Grids..
  • chris: At least the whole web services/XML area is standardising things, though slowly..
  • chris: WSDL, WSRF, Tomcat, etc..

  • alecm: Are you being beaten up about SOA yet ?

  • chris: SOA? DNS?

  • alecm: SOA & SOI – Service Oriented {Architecture,Infrastructure}
  • alecm: Basically “let’s build our systems out of software components in a manner akin to LEGO”; from my perspective it is common sense, but then I am not someone who gets paid for evangelising the obvious as if it were trendy, I get paid to deliver results.

  • chris: Not come across those acronyms, most of the things we’re looking at are OASIS sort of things like WSDL
  • chris: It sounds like software engineering.

  • alecm: It is. At the webserver/appserver end of the scale.
  • alecm: In crayon.

  • chris: Hehehe

Comments

9 responses to “Meeting Minds: A Commercial Grid Geek talks to an Academic Grid Geek”

  1. Tess

    Clusters to me are always going to be several boxes providing failover services to each other. Those academic types should get with the program. We’re the ones making the money after all, sheesh.

  2. Alejandro M. Ramallo
    re: Meeting Minds: A Commercial Grid Geek talks to an Academic Grid Geek

    Interesting… my two cents:

    – I like to think of a cluster as an aggregation of nodes in which the main objective is to provide HA. Traditionally in such things, adding a node to the cluster means turning it off and reconfiguring a lot of things at the OS level. As such the cluster is explicit.
    – For me Grids are tacit (implicit): the OS/machine/node does not know it is part of one.
    – Therefore I think Virginia’s G5 farm is intended to be a Virtual Compute Server, something I think is more related to Grids than to Clusters.
    – I am one of your customers, and we are doing the JavaSpaces thing over a bunch of entry-level servers to create a Virtual Compute Server, which I think is also not related to HA but to computation aggregation (HPC).
    – SOA is not Web Services, is not WSDL; have a talk with your JINI friends at Sun 🙂 The confusion between Cluster and Grid is not dangerous, but the confusion between SOA and Web Services is!
    – SOA provides many advantages over traditional monolithic architectures, such as separation of concerns, physical separation of code, executables and software engineering teams, management, etc.

  3. Stephen Usher
    re: Meeting Minds: A Commercial Grid Geek talks to an Academic Grid Geek

    The term “Grid” was coined around 1997 as part of an eScience initiative and has only really quite recently been taken up as a marketing term in corporations.

    The original definition was that described by Chris, ie. a distributed, heterogeneous “cluster” of computing resources designed so that processing jobs could be located (and relocated) to where the resources were. The original concept included the idea of moving processing jobs between sites while they were running, so that if they needed CPU power they would move to a machine somewhere in the world which had a powerful CPU, and if they needed data then the process would be transferred to a machine near the data, etc.

    Current scientific “Grids” are still little more than batch queues on steroids where processes are farmed out to machines which sort of best suit them. The original concept needed multiple copies of the binaries, one for each architecture.

    At that point in time, a cluster was merely a collection of tightly coupled computers meant for any purpose whatsoever. It could be HPC, or it could be fail-over or shared resource (cf. VAXcluster and Beowulf clusters).

    Today, just as the term “blade” has been diluted by marketing types (eg. Sun Blade 150 etc.), the term “Grid” sounds sexy and is being used for anything which involves multiple, reasonably closely coupled machines co-operating with each other to offer some type of service, be it HPC or fail-over… ie. the old “cluster” in new clothes.

  4. alecm
    re: venting spleen

    yes, and we love you too, dear heart. 😎

  5. alecm
    re: Meeting Minds: A Commercial Grid Geek talks to an Academic Grid Geek

    Hi Alejandro,

    It’s not hard for me to get to chat to the JINI folk, in fact it occasionally proves challenging to get them to shut up again, certainly any time before “last-orders” gets called at the pub. 😎

    As for SOA, it does seem more like common sense to me than anything particularly magical, but then I am coming to it from the perspective of having been deploying similar architectures for some time, and only in this last year or so encountering someone else “inventing” and “branding” it as SOA. On the other hand “WebServices” (and not “Web Services”?) appears to be one of those terms which are a bit like an Ogre^H^H^H^HOnion, with many many different layers.

    “XML^H^H^HParfait! Everybody loves Parfait!”

  6. Stephen Usher
    This might be better

    http://tinyurl.com/5rlv6

  7. Tess
    re: venting spleen

    🙂

  8. Chris Samuel
    Grids != Clusters

    Bzzt, thank you for playing.

    Alec’s use of the word academic is misleading; it should really be High Performance Technical Computing (HPTC), and I think folks like General Motors would object most strongly to being described as academics, and I think they probably beat you in the making-money department too. 🙂

    HPTC != HA and the vocabulary differs significantly.

    A cluster (in the HPTC world) is often a system designed for running large jobs, whether those be parametric studies (single CPU job run many times with different parameters) or large parallel MPI jobs such as Computational Fluid Dynamics codes like Fluent to simulate oil flow in a gearbox or Finite Element Analysis engineering codes like Nastran & Abaqus for modelling parts.

    Three of the clusters I’m directly involved with are used by academics running everything from the above-mentioned codes through molecular modelling codes like NAMD and Schrodinger to Bioinformatics/Genomics software like NCBI Blast. A lot of our users are rolling their own code too, especially in areas like astrophysics. Our oldest cluster (a Compaq Alpha SC running Tru64, from 2000) is about to be reconfigured to allow some of our academics to run 100-CPU jobs.

    But we also have involvement with running clusters for General Motors and others for commercial work, and believe me any node on any of our clusters not working is considered resources wasted.
