Experience with #Xen on #OpenSolaris: SUCKAGE ANALYSIS, + The Non-future of Xen & VMWare

After my Linux Suckage posting this can be dealt with relatively easily; so I will include analysis below to make up the shortfall in effort.

I have two non-Mac x86 machines at home – an AMD x64 server running OpenSolaris, and an old Thinkpad. The AMD is a file server and one does not like ripping those up and down all the time, for obvious reasons; so if I am to play with Xen then it has to happen on the Thinkpad.

The Thinkpad has a 32-bit 1.6GHz Pentium-M processor, which lacks Physical Address Extension (PAE).

Now, I *know* that PAE is not a requirement for Xen – however CentOS Linux will not boot Xen on the Thinkpad for lack of it.
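For anyone wanting to check their own hardware before wasting an evening, here is a minimal sketch of the test. It assumes a Linux-style /proc/cpuinfo, where the "flags" line lists "pae" for Physical Address Extension and "lm" (long mode) for 64-bit capability; the function names are mine, purely for illustration, and none of this applies as-is to Solaris.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; every core reports the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def xen_readiness(flags):
    """Report the two features that matter here: PAE and 64-bit (long mode)."""
    return {"pae": "pae" in flags, "64-bit": "lm" in flags}
```

On a Linux box, something like `xen_readiness(cpu_flags(open("/proc/cpuinfo").read()))` would do; on a non-PAE Pentium-M I would expect both entries to come back False.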

So I thought “Let’s give OpenSolaris a go, Sun will surely have done it right! They have *proper* integration engineers!”

Darren pointed me at this wonderful page which (if you ignore the tail-end of the URL that suggests it is something to do with a 2008 release of OSOL) says:

How to Set up OpenSolaris as a xVM dom0

If you are running build 126 or later of OpenSolaris, enabling xVM is as simple as:

$ pfexec pkg install xvm-gui
$ pfexec svcadm enable milestone/xvm
$ pfexec reboot

There are three problems here:

1) the most recent revision of OSOL is 2009.06 which is build 111

OK, so I installed 2009.06 and then used the “dev” repository to upgrade it to build 126.

2) the upgrade process takes hours

Oops. Oh well. That’s DSL for you. Sort of. And unless I make a local repository, I will have to do this for every rebuild. Hmmm.

3) Sun removed 32-bit support for Xen in build 125

WTF!?!?! Yep. There it is. After all that work, the following “bug”:

http://bugs.opensolaris.org/view_bug.do?bug_id=6851808

There is a desire to stop delivering the 32-bit xVM hypervisor in OpenSolaris.

Machines that typically run xVM have many guests, one of the central roles of virtualization (to consolidate many physical machines). The limitations of a 4gb address space make it unfeasible to support many modern OS guests on 32-bit systems.

To add, these small systems become a maintenance and testing burden for the development team: time that would be better spent adding features to 64-bit dom0 or improving performance.

Removing the 32-bit hypervisor and dom0 will not have any effect on guests – we can still run 32-bit guests on a 64-bit dom0.

It is proposed to do this work in two stages:

* stop delivering the hypervisor itself and the kernel backend drivers
* stop delivering the 32-bit binaries for the userland components

This bug addresses the former. The latter is a little more complex as some of the userland components are currently tied to 32-bit by /usr/bin/virt-manager, which itself depends on a library from a separate Consolidation that is only delivered compiled as 32-bit.
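To put rough numbers on the bug's 4gb argument, here is a back-of-envelope sketch. The dom0 reservation and the guest sizes are my own assumptions, not figures from the bug report, and hypervisor overhead is ignored entirely.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_guests(addressable_bytes, dom0_bytes, guest_bytes):
    """How many equal-sized guests fit once dom0 has taken its share."""
    free = addressable_bytes - dom0_bytes
    # Never report a negative count if dom0 alone exceeds the address space.
    return max(free // guest_bytes, 0)


# Assumed figures: 4 GiB addressable, 1 GiB kept for dom0.
small_guests = max_guests(4 * GIB, 1 * GIB, 512 * MIB)  # 512 MiB guests
large_guests = max_guests(4 * GIB, 1 * GIB, 1 * GIB)    # 1 GiB guests
```

Under those assumptions that is six small guests or three large ones: not much of a consolidation story for a server, but plenty for a laptop someone wants to learn on.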

“There is a desire…”? That’s not a bug, that’s a whinge. Smacks of cost-cutting, and seeing as I got laid-off recently I’ll just guess why.

But Xen? Fuck it. I might have a try playing with a custom Debian kernel build, but I can’t be arsed for the moment. I’ll stick to VBox.

Industry Analysis

Now here’s the thing: there are more than a hundred reasons why Sun Microsystems is in the shitter, but for me one of the really huge ones was that Sun “focused on its core competencies” in the late 90s and stopped hard-selling Solaris-able workstations to Universities, in favour of pushing Starfires at bankers and hoping they would stick. This led to a generation of college geeks who thought Linux was the best and only way to compute, and who then went on to use it everywhere when they got jobs at the banks.

Virtualisation vendors (VVs) like VMWare and Xen are in a similar and very tricky position – especially Xen which has an open-source technology base; virtualisation has become commoditised, and the deep-magic management software that was supposed to provide a revenue stream is following suit and/or being replaced by “cloud” GUIs.

I’ll bet you that the VVs are focusing on technological differentiation à la Linux vs Solaris – and look where that strategy put Sun; what the VVs *ought* to be doing is treating the platform vendors (Sun, Ubuntu, Debian) in the symbiotic way that Sun used to treat its ISV community, and they should invest $$$$$$ in making sure that their virtualisation solution runs best / most easily / most effectively on each and every platform, in every reasonable incarnation of it.

That strategy alone leads to pull-thru, which is what the VVs desperately need.

People – including people like me – want “free and easy”; lacking that they will take “free or easy”, but of those, “easy” is preferred.

It’s why I am using VBox on Mac. And it’s why, when I go back into the job market like any other student, I will be ignoring Xen and VMWare and will instead find whatever is free and good at that point.

ps: “EC2 runs Xen?” Yah, and Sun had Wall Street.

Comments

29 responses to “Experience with #Xen on #OpenSolaris: SUCKAGE ANALYSIS, + The Non-future of Xen & VMWare”

  1. Stephen Usher

    The other problem Sun had when it dropped the ball was that SPARC massively lagged in performance behind x86 and this wasn’t addressed. By 2002 desktop SPARC machines were about two to three generations behind x86 *AND* cost more than twice as much.

    Not only this, but the desktop SPARC machines were based upon a two-generations-behind, tier-three x86 (SIS) chipset, which meant that the already crippled SPARC IIi and IIIi processors (with their tiny cache) couldn’t resort to fast memory access either, so they crawled, even relative to 486 machines.

    Now, Sun have jettisoned most of their core software development as well, meaning that although Solaris itself is still OK much of their “Enterprise” software is in a sorry state.

    e.g. Java [sic] Enterprise Identity Server (i.e. LDAP server + AD integration).

    As of version 6.3 there was no native Solaris PKG version of the install!

    Version 7 has corrected this, to a point, but gone is the Webconsole support (you have to roll your own web server roll-out, which still doesn’t work properly).

    Unfortunately, this is merely the end-point of the crumbling of Sun’s software I’ve witnessed over about five years. It’s very sad as the core functionality is good and isn’t serviced elsewhere.

    (JSEE is the only system I know of which can seamlessly handle bi-directional NIS and AD integration with an LDAP server giving a complete single-sign-on and password synchronisation system.)

  2. Mike Smith

    I concur with your analysis that when Sun stopped getting Solaris into universities they lost the hearts and minds of a generation of geeks.

    In my view Digital did the same in the late 80s when they tried to sell big Vaxes into banks. I was at an academic site which was inviting tenders to replace a Decsystem 20 in 1989. The business was Digital’s to lose, but their software licensing terms were crippling compared to Sun’s.

    I think this is when the geek hearts and minds started shifting from Digital to Sun and helped Sun to become the power that it was by the mid 90s.

  3. Part of what was so soulcrushing about working at Sun over the last 8 (nearly 9) years was watching a company that was based in geek get taken over by people who just could not comprehend that hearts and minds *is* a compelling long term business strategy.

    I don’t think Sun focused on core competencies. I think it focused on the stuff the Marketing team could understand. And there’s the fail.

    1. what she said. 🙂

  4. John Levon

    It’s quite simple: Xen is a server virtualization technology, and running 32-bit servers makes little sense these days. What you want in this case is VirtualBox.

    1. Because of course I am trying to familiarise myself with Xen before starting deployment on EC2, therefore I should use VirtualBox.

      That makes perfect sense. Not.

      How about switching off the “we know what’s good for you” attitude and re-reading the post, John?

  5. John Levon

    You don’t mention anything about EC2 in this post (I’ve read your older entries now).

    Anyway, in that case, what you’re trying to do makes even less sense, since EC2 does not use Solaris dom0, as I’m sure you know.

    1. Sigh. A little more consideration and you might work out for yourself that what I want is a dom0 of any flavour on this laptop.

      I really don’t care what OS it is.

      I need a Xen system at home to play with, and Solaris tragically can’t help because it has removed 32bit dom0 support, and thus I cannot run Solaris xVM/Xen on the machine.

      That sucks.

      Solaris has lost a chance to prove itself more useful / flexible than Linux.

      Oh dear.

  6. John Levon

    Does the fact you seemingly can’t find *any* OS that works indicate that maybe you’re in need of some different hardware? Seriously?

    1. @John: so every machine you have is 64 bit? And come to that, why are you not removing 32bit domUs?

  7. Loreen, I think that was most telling when they named the replacements for the Ultra series “Sun Blade” when they were most definitely not blade-like. Oh and then they stuck the word “Java” in front of everything, even if there was nothing Java about it.

    Don’t get me started on the “Ultra 40” which was an AMD x86-64 based PC….

  8. Clive

    I think Sun missed the way the industry was headed in several respects, which combined to leave them a significant problem.

    Firstly, Solaris concentrated on providing enterprise-class features for big servers, and concentrated on SPARC. The x86 version of Solaris is now comparable with the SPARC version, but for much too long it was a toy, at best a way of running Solaris on a laptop without Tadpole’s help.

    Secondly, people have moved towards modestly-sized, especially colocated, servers rather than single large machines. Google has shown emphatically that it’s possible to dominate the world with no individual computer bigger than the average desktop PC.

    And thirdly, SPARC has become wildly uncompetitive compared with x86.

    While there’s no particular reason not to run Solaris on a 2U x86 colocated server, Solaris also has no real advantage over Linux in that context. Given that Linux has lower TCO and the sysadmins all know Linux better than Solaris because their home desktops run Ubuntu, Solaris has been edged out.

  9. If I can mention the ugly H word, PA-RISC technology was competitive longer than Sparc in terms of performance. It didn’t help the HP workstation market survive either.

    I think for many the fact that you could do many of the tasks that the Unix workstations could do under a different OS drove the death of the Unix workstations. Last new ones I saw were HP boxes that at the time were way ahead of the x86 offerings being used for some heavy duty modeling (although I pointed out they would have been better off price/model run with a small super computer). So their justification for the premium was performance alone, Solaris actually worked better for some of the software they used, but that wasn’t relevant.

    SUN were hardware price competitive in the low end server market at one point. But the services division were always trying to flog you way more hardware than you needed.

    Hearts and minds I’d buy as part of the explanation. Certainly you see that now with GNU/Linux adoption, where a lot of it is driven by what people know (free – gratis – is also easier when you are a poor student).

    Case in point: I keep eying ZFS features and have file system envy, but I’m not taking the time to try OpenSolaris out (not least because I’m told it is like GNU/Linux was N years ago in terms of package management, hardware support etc – where N varies depending who you talk to, and what distro they use, but is always positive).

    XEN LiveCD any good for the learning experience? I’m told the CD is based on Debian Lenny, but no surprise there.

    1. Get the OpenSolaris Live CD for ZFS – you’ll recognise it from Ubuntu.

  10. Pete

    It’s funny how people don’t mind dropping 1000 quid on boutique proprietary hardware like MacBook Airs, but complain at the same time about not being able to run leading-edge open-source virtualization software on a clunker Thinkpad, not worth the copper contained inside. Apple commonly stops supporting its hardware in its N+2 MacOSX releases but it can still do no wrong among its users. “Thank you Sir, may I have another”?

    1. @Pete – Yes it is funny, isn’t it. Your point is?

      Oh, I would run Xen on OSX but it doesn’t seem to be possible.

  11. John Levon

    Yes, every machine I have is 64-bit. But that’s not the point, anyway.

    We’re not removing 32-bit domUs because Linux distros come in either 32-bit or 64-bit flavours, and the former are still fairly prevalent.

    1. So what you are saying is that 32 bit hardware is no longer powerful enough to do virtualisation.

      Unless you use Virtualbox.

      The implications don’t speak well of Solaris’ Xen implementation and the vaunted performance benefits of the paravirtualisation. Sounds to me prima facie that Vbox is a better bet.

      I would have more respect for cost cutting than *that* excuse.

  12. Rick

    I have an idea for you – why not install the 2009.06 distro of opensolaris (which will work on your 32bit thinkpad), then enable xen like the page you’ve found shows you (it’s a few more commands to enable for 2009.06 pre-126 builds as you have discovered, but it still works just fine), and be content.

    You won’t be able to upgrade to a newer version of opensolaris, but then again, you’re not upgrading to a newer version of hardware. I’d file this under “it’s better to use an older rev than to use none at all”.

    As for moving forward, if you really want to be using a hypervisor, you’re going to want a better/newer machine regardless. And I have to agree with the poster above – it’s ironic that you don’t mind shelling out big bucks for your apple running a desktop, but you’re reluctant to use anything even somewhat recent for doing server virtualization exposure/experimentation/whatever.

    Light a candle, don’t curse the darkness – it’s a good rule to live by.

  13. Hi Rick,

    I considered doing that – it’s not a new idea, in fact it was one of the first things I considered – but the whole reason for this experiment was to find a developmental purpose for this old machine, and with no ongoing upgrade path for OpenSolaris that purpose is cut short, which makes it a wasted effort.

    How many bugfixes to ZFS, how many improvements to GNOME, how many “ipkg” fixes will I miss out on? Rather a lot, I understand…

    You and the Apple poster don’t seem to see the point but I remember the days when code that ran on a Solaris box – albeit SPARC – ran *everywhere*. The last arbitrary “let’s EOL a platform which people are still using” was when Sun4c got dropped after Solaris7 and Sun4m after Solaris9 – and those were done on a major-release schedule and well advertised.

    If you want to talk “End Of Feature” instead, consider NIS+ got an EOF with Solaris9 and only just got taken out of S10. I can only assume either Xen OR 32-bit Solaris is less important than NIS+ to be treated thusly.

    And you are preaching to me to “get a real computer” to run OpenSolaris; having worked for Sun I truly know the power of shite old machines tucked away in odd corners doing valuable work – most of Sun’s USENET and first Web infrastructure existed on such.

    To light a candle on this dark old hardware requires an operating system, and regrettably that flame can’t be OpenSolaris.

    Or, if I use OpenSolaris then the flame will be VBox, not Xen.

    Why don’t *you* mourn that? You don’t care that people are trying to use OpenSolaris instead of Linux? You don’t *want* Solaris to be adopted by tinkerers who can build value (and thus need, requirement, and eventually service contracts) atop it?

    I certainly hope not, because I have a lot of love for old Solaris and Sun, but that attitude would not bode well for its survival.

  14. Tweeted as: Blog Discussion: How much of #Solaris / #OpenSolaris can be EOLed on 32-bit x86 machines before people [are] allowed to care?

    I think that’s now my question. Maybe Sun should kill 32-bit ZFS too, after all ZFS is much less efficient on 32-bit hardware without the beneficial effects of the larger address space.

    That wouldn’t have any downsides, 32-bit users can still use UFS, right?

  15. Talking of home tinkering, another bitter disappointment is that there’s still no version of opensolaris I can install on any of my old home sparc machines. Sure they’re not cost effective machines compared to modern kit, but they’re here, and they were quite good “fiddle about with Solaris, learning stuff” machines until all of a sudden they were excluded (no WANboot OBP capability => no OpenSolaris for YOU!).

    For example I’ve got a Blade 1000 with a couple of 711 cases, additional SCSI cards and GbE. It’s supposed to be my test setup for ZFS – throw a pile of SCSI disk in the enclosures and it’s a good toy for learning how to configure and admin ZFS. However… still not able to get opensolaris on it. Yes I can use Solaris 10 on it but I’d prefer to be spending my time learning the new revamped Solaris.

    Since I specify and buy server hardware, including which operating system to use for certain infrastructure roles, this impacts *directly* on certain purchasing decisions at my employer.

    It seems to be hard for Sun to connect the dots between new hardware purchases and “hobbyists who want to polish their skills in their basements” but Alec is 100% right, there IS a link, but it’s getting harder to maintain.

  16. Rick

    I understand your position – believe me, I wrote the book on wringing out cycles from ancient hardware (most of it being sparc hardware, btw.)

    But – unless you’re planning on doing xen development, why not be content with the version of xen which runs (0906) on your laptop, then install whatever newer/dev version of opensolaris that you want in a domU on top. You’ll still be able to keep doing a ‘pkg image-update’ to the newest opensolaris build as you wish. I find that opensolaris pv domU guests on my opensolaris dom0 box perform plenty adequately.

    As a long-running Sun customer at work (who pays) and at home (who doesn’t), I don’t blame Sun for making business decisions which are more guided by the former than the latter. They are a business after all. I’m more than happy with what I get for free from Sun (via RTU speak).

  17. If this is really important to you, perhaps it’s worth finding the changesets that removed 32 bit support (there will be a few, as the changes span several repositories) and patching stuff back into current bits. Off the top of my head I can’t see why this wouldn’t work with a couple of days tinkering.

    Hey, that’s part of what this new-fangled open source thing is all about, right?

    (And when the thundering herds come running once it works, they can help you maintain the fork.)

  18. @Dave: that’s also known as the “if the user sees a problem with OpenOffice he can download it and fix it” argument – except that for large projects like OOo the whole many-eyes/shallow-bugs thing falls apart; the “many eyes is scalable” fallacy (also related to Zawinski’s ADD-programmer syndrome) is what dooms OOo to being supported by Oracle / a spinoff in the future.

    For the whole of ON, downloading it to fix this makes even less sense.

    What I want to do is tinker. I can do that more easily with Linux than OpenSolaris. re: Xen, I don’t care what I run dom0 on, and thus OpenSolaris loses because it has presented me with an impediment which is more easily circumventable in Linux, but I would rather it was not necessary to circumvent in either.

    What angers me, as a former Sun employee and a lover of Solaris, is that this has been taken away. I am angry because it bodes ill for the future of Solaris, for the reasons I have outlined.

    My “why not disable 32-bit ZFS, too?” question still stands.

    Yes I understand business economics, but I also understand open-source and I understand adoption, and for those reasons I consider this an extremely stupid move on Sun’s part.

  19. I’m struggling to appreciate your anger. As you point out, we are economically constrained. Keeping 32 bit dom0 support would have meant not doing something else, such as improvements to the disk infrastructure, upgrading to Xen 3.4, etc. The team made a call about priorities based on their understanding of Sun’s priorities, which in general comes down to customer needs.

    We might have chosen to leave the support in place and stopped testing it. Very recent experience shows that it breaks very quickly if we take that approach, in part because other ON developers don’t test with Xen often enough (if at all). When faced with a choice of “you can’t do that” or “yes, we hear that it’s broken but don’t plan to fix it and don’t test it”, my preference would be for the former, because that’s firm ground on which to stand.

    In this particular case your preference lost out. It’s never going to be possible to please everyone.

    Your OpenOffice comment alludes to it being too hard to contribute to OpenSolaris in this area. I’d agree that’s the case, both from technical (the Xen stuff is difficult to work with and poorly documented) and organisational perspectives (contributing can be a slow and difficult effort for non-Sun employees). That said, some significant contributions have been made (uppc support came from someone external to Sun who found himself in much the same position as yours).

  20. Oh, I didn’t answer your ZFS question.

    I’m not involved in ZFS development or decision making. I’d expect that 32 bit ZFS is important for a significant number of Sun’s customers. As such, it stays, even though there may be challenges associated.

    32 bit dom0 failed the “important for a significant number of Sun’s customers” test.

  21. @Dave: This probably sounds aggressive in a way that I am not actually trying to be, but: I’ve never really wanted or needed you or anyone to appreciate my anger in any sense of the word – it is what it is, and in more detail my feeling is that deprecation of 32-bit in any way whatsoever is

    a) unfortunately the thin end of a bad wedge – what to cut next? drivers? – and

    b) strategically unwise for Sun, and

    c) a damned nuisance from my perspective which is driving me back towards Linux and/or away from Xen entirely, which

    d) (repeating point b) ought to be something that you yourself might consider detrimental to your cause. Your [Sun’s] call on that one – woo your users, or not?

    Also, I’ve never been one to buy into the “Well what’s your alternative to doing the wrong thing?” approach to team management, because I believe the usual corollary (“If you don’t have an alternative, shut up”) is cover for a great deal of incompetence and blame-spreading, worse (or at least, more common) than other tactics of mediocrity like SunShots were.

    So, I speak up.

    Maybe next time people will be more alert to customer feelings; and maybe next time the pansy-ass “There is a desire to stop delivering the 32-bit xVM hypervisor in OpenSolaris” will be put more honestly and frankly. The decisions that get made are still Sun’s decisions, but at least they know who they’re pissing-off.

  22. alcem: Note that this does not affect 64-bit capable hardware, and almost all computers other than netbooks sold today are 64-bit capable.

    “(uppc support came from someone external to Sun who found himself in much the same position as yours)”

    Yeah, that is a different story. MS mandated in PC2001, back in 2000, that all new desktop computers were to have the IO-APIC enabled, but obviously it didn’t apply to laptops, and if you read Intel’s spec updates on the ICH/ICH2/ICH3 southbridges, you will read that these southbridges had an erratum that made the C2/C3 sleep states unusable with the IO-APIC enabled. While unimportant for desktops, laptops need these states to save battery power. Intel finally fixed it in the ICH4, and some ICH4M laptops have the IO-APIC enabled, but it wasn’t until the ICH6M generation that all of them had the IO-APIC enabled.
