I had a nice long AIM chat with my friend Chris last night; Chris has spent a few years working in academic computing grid and cluster research, and with my new grid-focused commercial role we’ve been swapping notes about our respective jobs, and the respective cultures thereof.
Last night’s chat turned out quite differently from what I’d expected – with Chris’ permission I include an edited log, which I’ve cleaned up for typos and flow, with some parts italicised.
It’ll be of interest to anyone who is bandying around the word grid as part of their job; the g-word is not always the correct one to be using…
- chris: How’s things?
- alecm: Exciting. I am learning lots of new stuff about grid computing.
- alecm: Especially the differences and similarities between your sorts of grid, and the sorts my customers want. It’s interesting.
- chris: What sorts of grid do your customers talk about ?
- alecm: The thinking we are putting into (say) network fabric layout in order to match the problem set, is also amusing
- alecm: Striking a balance between general-purpose and business-focused is a challenge
- chris: So what, in your thinking, is a grid ?
- alecm: Hmm. There seem to be several common meanings.
- alecm: Can i first say what I believe the rest of the business world seem to be thinking ?
- chris: Yup
- alecm: Ok
- alecm: In industry, there are a lot of hot-air white papers being circulated by analysts, whipping-up “grid” as the “next big thing”
- alecm: they are (IMHO inadvisably) teaching people that grids divide into a handful of apparently, but not truly, distinct types, viz:
- alecm: “Data Grids”, “Compute Grids”, “Storage Grids”, er… another one I forget, and (of course, best of all) “Autonomous Grids”
- alecm: this leads to PHBs reading these briefs, believing that they then know everything, and coming to people like us and saying “We want a Data Grid”
- alecm: (“and it should be green, because that goes fastest.”)
- chris: hehehe
- alecm: so… an artificial top-down taxonomy is coming into usage; it then broadens to describe/consume extant architectures.
- alecm: eg: I am starting to see people talking-up “Hub And Spoke Compute Grids for Web-services” – which (to me) are very nearly the same layer-7 load balancer architectures that we’ve been selling very successfully and usefully for 5+ years.
- alecm: other people say they “want grid”, where they actually want dynamic provisioning and unified management over a heterogeneous, “mixed-bag” compute farm
- alecm: yet others say they “want grid”, and desire 1000 identical single or dual-CPU machines… but *then* they either want to do a single system image, *or* do something like PVM, *or* they want to do something like Javaspaces on top of the compute fabric.
- chris: That last one is more of what we’d call a cluster.
- alecm: so if i remember your question, it was what *i* reckon “grid” is?
- chris: yup!
- alecm: i am forced to take the ultra-liberal definition of “more than one cpu, housed in more than one metal box, collaborating on one or more computational tasks”
- chris: OK – so what’s your feeling about what the difference between a “cluster” and a “grid” is ?
- alecm: Not a lot. I need a swig of tea before I can put words around an intuitive feeling.
- alecm: aside: someone last year suggested to me that Crack was one of the first [what I would now call] “grid-enabled” programs to be released onto the Internet. I found that an amusing thought.
- chris: I strongly doubt that Crack was the first, parallel programs have been around for years, and Crack (IIRC) relies on shared storage (usually NFS)
- chris: “I need a swig of tea before i can put words around an intuitive feeling” – I like that.
- alecm: I never said *the* first, just “one of”. 😎
- chris: Nah, it’s too closely coupled for a grid app
- alecm: Ha, hark at him – “nah, it’s too closely coupled for a grid app” – a lot of the grids we are selling use NFS for shared storage, and a rsh-alike for dispatch, and that is what crack did. 😎
- chris: Let me explain…
- chris: My gut feeling on the difference between clusters and grids comes down to how closely coupled they are. In the HPC world a cluster is a closely coupled set of nodes, usually (but not necessarily) homogeneous.
- chris: Closely coupled means shared filesystems and some interconnect, often low-latency but sometimes just gigE (or even just 100E if you’re not dealing with a lot of data, running only single CPU jobs, don’t care about latency, etc)
- chris: A grid is a collection of disparate systems where no assumptions about coupling can be made (jobs need to be mapped to local users, no shared storage, data needs to be staged in and results staged out, often from different systems) and there is no shared interconnect aside from commodity Internet links (I don’t think anyone is using Grid over UUCP in the real world, though it would be an ultra neat hack).
- chris: Often a grid is made up from independent clusters but they can also be cycle-scavenging setups using things like Condor on labs of desktop PCs; you can’t make assumptions on the processor type or OS vendor, let alone library versions, etc..
- chris: All of which make building them a nightmare, but rewarding if you get them going.
- alecm: So: What’s your term for (say) that thing made of G5 Apple kit that is near the top of the Top500 list ?
- chris: VT’s system is a cluster, it’s got a shared interconnect (IB, I think) for low latency parallel jobs and it’s very closely coupled.
- chris: It ain’t a grid, that’s for sure.
- chris: A grid often transcends administrative boundaries
- alecm: Ok, so, my turn now ?
- chris: Yup – I’m all typed out for the moment.
- alecm: Ok, I’ll try from my end of the spectrum
- chris: No worries.
- alecm: The sort of customers I deal with on a day-to-day basis tend to use the word “cluster” to indicate a set of machines that provide a *service*, and that service is typically something that requires to be imbued with qualities of load-balancing for efficiency, or perhaps extremely high availability;
- alecm: Before you say it, I will acknowledge that this is an almost utterly different meaning to “cluster” in your sense. The most basic of *these* sorts of clusters might be a pair of machines which watch each other over a heartbeat network (eg: a SCSI bus) and when one deems the other to have died it assumes the other’s business role, eg: by ifconfig’ing up a virtual IP address and launching an HTTP server to replace the one on the dead machine
- alecm: Another might be “Clustered H/A NFS Servers” which use a “Cluster File System” (again slight terminology/usage tweak) to maintain state between physical machines that team together to ensure the service, whatever it is, is always available.
- chris: This is the disconnect between the business world (where HA is king) and the HPC world (where a standby machine is seen as wasted compute resources).
- chris: Yeah, the old STONITH take over system
- alecm: Exactly
- alecm: So, with this usage of the term “cluster” it seems that the business world has expanded the meaning of the word “grid” to encompass both terms as you use them.
- alecm: So, in my little world (at least) we talk about “classic” or “cookie-cut” grids, which in your world would be “clusters” — the Delia Smith ideal of having identical machines, fast interconnects, etc…
- chris: I guess the main problem for folks like you (and vendors in general) is tailoring your vocabulary to the person you’re talking to.
- chris: Because when folks start talking about wanting to sell us a “Grid”, they look like they’re clueless about the HPC world.
- chris: For me it’s very easy to forget that, although everyone I talk to uses the same vocabulary, our niche is pretty small compared to the rest of the market, and the rest of the world can use these terms in ways that are very strange to us.
- alecm: Works both ways; here’s a joke:
- alecm: main() { while(1) fork(); }
- alecm: If you giggled, you are in a set of less than perhaps 0.001% of the world’s population who might even raise a smile
- chris: 😎
- chris: Of course; and likewise with this whole Grid thing.
- chris: I guess what I’m driving at is that of all the folks I’ve talked to, people who call our Cluster a “Grid” sound like they don’t really understand our niche area. As with so many things it’s more a case of user education and perception than reality.
- alecm: /me thinks
- alecm: I was going to make a point earlier, but diverted myself
- alecm: Having defined “Grid” and “Cluster” as I/We use the terms, I want to further acknowledge that there is a top and a bottom half to both. The bottom-half is the technology; the top half is the management, and some people focus entirely on the top half. they tend to be business operations people.
- chris: …and of course then there is the “middleware” (god I hate that term)
- alecm: There is no getting away from it, though. 😎
- chris: Yeah, *especially* when you’re building Grids..
- chris: At least the whole web services/XML area is standardising things, though slowly..
- chris: WSDL, WSRF, Tomcat, etc..
- alecm: Are you being beaten up about SOA yet ?
- chris: SOA? DNS?
- alecm: SOA & SOI – Service Oriented {Architecture,Infrastructure}
- alecm: Basically “let’s build our systems out of software components in a manner akin to LEGO”; from my perspective it is common sense, but then i am not someone who gets paid for evangelising the obvious as if it were trendy, I get paid to deliver results.
- chris: Not come across those acronyms, most of the things we’re looking at are OASIS sort of things like WSDL
- chris: It sounds like software engineering.
- alecm: It is. At the webserver/appserver end of the scale.
- alecm: In crayon.
- chris: Hehehe
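
A footnote on the two-machine failover pair described in the chat: the standby machine’s logic can be sketched roughly as below. This is only an illustrative Python sketch under stated assumptions, not anyone’s actual product. The `StandbyNode` class, its `timeout` parameter and the recorded action strings are all hypothetical, and the clock is passed in by hand so the example runs without a real heartbeat network.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StandbyNode:
    """Hypothetical standby half of a two-machine failover pair."""
    timeout: float                 # seconds of silence before the peer is deemed dead
    last_heartbeat: float = 0.0    # time the peer was last heard from
    active: bool = False           # True once this node has taken over the service
    actions: List[str] = field(default_factory=list)

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the peer at time `now`."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Check the peer; take over if it has been silent for too long."""
        if not self.active and now - self.last_heartbeat > self.timeout:
            # A real system would bring up the virtual IP address and start
            # the HTTP server here; this sketch only records the intent.
            self.actions.append("bring up virtual IP")
            self.actions.append("start HTTP server")
            self.active = True
        return self.active


standby = StandbyNode(timeout=3.0)
standby.heartbeat(now=1.0)
print(standby.tick(now=2.0))   # peer heard from 1s ago, stays on standby: False
print(standby.tick(now=5.5))   # 4.5s of silence, takes over: True
```

A real pair would also need some form of fencing (the STONITH approach Chris mentions) so that two machines never both believe they own the service; the sketch leaves that out.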