Sunday, April 06, 2008

Getting apcupsd Working on CentOS, and Probably Elsewhere

To provide proper resilience, any server needs to react well to power outages. This means using a UPS (Uninterruptible Power Supply), shutting down gracefully before the battery power is exhausted, and coming back up automatically when power is restored.

In order to do this we need an application that is able to communicate with the UPS, detect that there is a power outage and when the batteries have run down to a certain level, gracefully shut the system down. There is a piece of open source software called apcupsd that does this. Download the RPM for Enterprise Linux 5 from here.

Install via the normal rpm -ivh method. It’s also possible to get apcupsd through the DAG extended repository for CentOS and Red Hat, via RPMforge. As this is a one-off package, we’ll just install it manually.

However, if you’re installing on a system that uses LVM, or you have a /usr partition that is separate from / then some further work will be needed. Make sure that your / partition is not on LVM. When you test shut down the machine you will notice that apcupsd does not power off the UPS correctly because it cannot find certain components – namely libnetsnmp.so.10, libz.so.1 and the wall executable. We can fix this by making copies to the /lib64 directory (if you run a 64-bit system):

cp /usr/lib64/libnetsnmp.so.10.0.1 /lib64
cp /usr/lib64/libz.so.1.2.3 /lib64
ln -s /lib64/libnetsnmp.so.10.0.1 /lib64/libnetsnmp.so.10
ln -s /lib64/libz.so.1.2.3 /lib64/libz.so.1

In addition, the wall executable cannot be found, as it resides in /usr/bin. If you do a check on the dependencies for wall you will see the following:

ldd /usr/bin/wall
libc.so.6 => /lib64/libc.so.6 (0x0000003092200000)
/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)

This is fine, and shows the dependencies are there in the / directory. All we need to do is make a copy of wall to the root bin directory:

cp /usr/bin/wall /bin/

apcupsd will now have all the dependencies it needs when shutting down the machine, and when shutting down the UPS when all but the / partition has been unmounted.
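If you want to check any other binary that has to run late in the shutdown, you can look at where ldd resolves its libraries. The following is only a rough sketch: libs_under_usr is a made-up helper name, and /sbin/apcupsd is an assumed install path, so check where your RPM actually put the binary.

```shell
# Hypothetical helper: reads ldd-style output on stdin and prints every
# library whose resolved path lives under /usr (and so may be unmounted
# by the time the UPS is powered off).
libs_under_usr() {
    awk '$2 == "=>" && $3 ~ /^\/usr\// { print $3 }'
}

# Usage (the apcupsd path is an assumption, adjust to your system):
#   ldd /sbin/apcupsd | libs_under_usr
```

Anything it prints is a candidate for the copy-and-symlink treatment above.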

Tuesday, February 19, 2008

PostgreSQL Versus...........Everyone Else

Recently, I have been pushing for us to move our databases for our web-based systems from MySQL to Postgres. This has finally been happening with very little trouble or drama, although many of our systems are now based on Rails and ActiveRecord so the change is transparent.

Things have finally come to a head: the features MySQL is adding are already there in Postgres, MySQL isn't quite as stable as we thought it would be for a database system we will leave unattended for months at a time, and the performance advantage that MySQL supposedly once had has all but evaporated. From where I sit, it's becoming much easier for PostgreSQL to work away on making their many features more efficient than it is for MySQL to reliably add some of the features it lacks. This isn't really a solid reason, but now that MySQL has been taken over by Sun I'm also not sure of their future direction. I've seen first hand how Sun have managed takeovers before, particularly in the case of Cobalt, and it doesn't inspire confidence. Sun always seem to pay over the odds for companies that don't really fit well with them or their current customers and market.

Additionally, I'm also pushing (begging on occasion) for one of our major clients (no names!) to update a system from using Access..............to just about anything else. They're not swallowing this at the moment because the current system works just about OK with the hacks we've put into it (we only keep a few days of the most recent data in the main database and then back up to another) and we get the odd corruption. This is a hangover from how the system was initially developed by a third party. MTS was going to be used, and this was then dispensed with as they discovered they were trying to store stateful objects in it and the thing died, and then they had all the clients connecting directly to the Access database over a network share as a last resort. Surprisingly, with a few clients this seems to work, although Access 97 seems to be far better at concurrent connections than any later versions. The performance, as you can imagine, is terrible and getting meaningful data out of these databases, which are kept on different sites, is a nightmare.

I wouldn't care, but this is a financial system! It just shows how temporary hacks that just about work turn into permanent solutions. Postgres would be an absolutely ideal replacement for this, would increase performance markedly, we'd have no corruption trouble, we could introduce proper transactions which would help enormously, we wouldn't need to back up data into another Access database (some of which are now creaking under the stress of their size and historical data), it's far easier to get access to the databases over a leased line with limited bandwidth, it would make support and new systems development far easier for us - and then there's the cost. The rest of the organisation pretty much uses Oracle, and installing and configuring Oracle at all these remote sites would be a really significant cost in terms of licensing and support for the Oracle DBAs.

I've read a lot of arguments about how Postgres is not suitable as a replacement for Oracle, but for all but the really, really, really high end features that pretty much no one uses, it's actually an improvement. I laughed at the comments in that article about editing the pg_hba.conf file being unintuitive (most things in Oracle are unintuitive) and crashing under extreme load. The first time I saw Oracle 9i and 10g start on a Windows box I was shocked. It took a full two or three minutes for it to come up, and that's when there was no data running on it yet. You needed a good gigabyte of memory on the server before it did anything. In contrast, the Windows ports of Postgres and MySQL start up in no time and take up far less resources than any Oracle installation.

I've been really impressed with how much Postgres has improved through the 8.x cycle, and with the performance ideas and improvements that have come through, such as asynchronous commit, autovacuum being on by default in 8.3, full text search, lots of procedural language support and a native Windows port that is now truly excellent. These kinds of steady improvements increase my confidence in Postgres still further.

Monday, January 21, 2008

Installing and Configuring Fedora Directory Server 1.1

Although there is some good documentation on Fedora Directory Server, the route I took to getting it installed was a sharp departure. If you simply install fedora-ds as they say on the installation page then you’ll get a whole lot of unneeded dependencies, such as the console GUI, which aren’t necessary since I’m running a non-GUI server. Firstly, you need to upgrade to CentOS 5.1 by doing a yum upgrade:


yum upgrade

If this pulls in a new kernel then remember to configure the divider=10 option we have in this kernel within the /boot/grub/grub.conf file to avoid virtual machines taking up excessive CPU time: http://bugs.centos.org/view.php?id=2189.
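As a rough sketch of what that edit amounts to, the helper below appends divider=10 to every kernel line of a grub.conf that doesn't already have it. The function name add_divider is made up for illustration, and it only filters stdin to stdout, so nothing is changed until you review the result and copy it into place yourself.

```shell
# Hypothetical helper: reads a grub.conf on stdin and writes it to stdout
# with " divider=10" appended to each "kernel" line that lacks it.
add_divider() {
    awk '/^[ \t]*kernel[ \t]/ && !/divider=10/ { sub(/[ \t]*$/, " divider=10") }
         { print }'
}

# Usage (review the diff before replacing the real file):
#   add_divider < /boot/grub/grub.conf > /tmp/grub.conf.new
#   diff /boot/grub/grub.conf /tmp/grub.conf.new
```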


Then look at the Enterprise 5 instructions on the FDS project page at http://directory.fedoraproject.org/wiki/Download#Enterprise_Linux_5. We’re going to take a detour from this set of instructions though.


Type the following:


yum install svrcore mozldap perl-Mozilla-LDAP

Set up the system to install Fedora 6 packages:

rpm --import http://download.fedora.redhat.com/pub/fedora/linux/core/6/i386/os/RPM-GPG-KEY-fedora
rpm --import http://download.fedora.redhat.com/pub/fedora/linux/extras/RPM-GPG-KEY-Fedora-Extras

We now need to configure our system to use Sun’s Java, rather than the GCJ used within Red Hat. Read and review the steps on the http://jpackage.org installation page. Download the latest JDK (development kit) Java from http://java.sun.com. Download and install this like so:


wget -O jdk-1_5_0_14-linux-i586-rpm.bin http://192.18.108.147/ECom/EComTicketServlet/BEGIN00F9CA7189084FD8F5239E020E63EC2A/-2147483648/2545881171/1/869090/868934/2545881171/2ts+/westCoastFSEND/jdk-1.5.0_14-oth-JPR/jdk-1.5.0_14-oth-JPR:4/jdk-1_5_0_14-linux-i586-rpm.bin

Install the RPM:

rpm -ivh jdk-1_5_0_14-linux-i586-rpm.bin
Note that the wget -O trick becomes necessary here because of the long filename, which wget doesn't get.

Install the JPackage utilities:

yum install jpackage-utils

Now we need to get the jpackage-compat installation as per the instructions on the JPackage installation page. Grab the matching RPM file from the java-sun-compat download page and get the matching RPM for the JDK you’ve just installed. If you have JDK 1.5 Update 14 then get java-1.5.0-sun-compat-1.5.0.14-1jpp.i586.rpm.

wget ftp://jpackage.hmdc.harvard.edu/JPackage/1.7/generic/RPMS.non-free/java-1.5.0-sun-compat-1.5.0.14-1jpp.i586.rpm

rpm -ivh java-1.5.0-sun-compat-1.5.0.14-1jpp.i586.rpm
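If you have a different JDK update, the compat RPM name appears to follow the JDK version fairly mechanically. The sketch below just builds the filename from the version string; compat_rpm_name is a made-up helper, and the 1jpp release and i586 architecture are assumptions carried over from the one filename above, so check the download page for the real name.

```shell
# Hypothetical helper: turns a Sun JDK version such as 1.5.0_14 into the
# java-sun-compat RPM filename pattern seen above.
compat_rpm_name() {
    base=${1%%_*}      # part before the underscore, e.g. 1.5.0
    update=${1##*_}    # part after the underscore, e.g. 14
    # "1jpp" and "i586" are assumed, copied from the example filename
    echo "java-${base}-sun-compat-${base}.${update}-1jpp.i586.rpm"
}

# Usage:
#   compat_rpm_name 1.5.0_14
```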

We now have JPackage installed and ready for us. We can now go back to the FDS install and install what we need. The above allows the package jss to be installed, or we can use JPackage’s jss should we need to (I didn’t) via adding a yum repository from their download page.

Install adminutil and jss from Fedora Core 6:

rpm -ivh http://download.fedora.redhat.com/pub/fedora/linux/extras/6/i386/adminutil-1.1.5-1.fc6.i386.rpm

rpm -ivh http://download.fedora.redhat.com/pub/fedora/linux/extras/6/i386/jss-4.2.5-1.fc6.i386.rpm

We then need to add the FDS repositories to our system:

cd /etc/yum.repos.d

wget http://directory.fedoraproject.org/sources/idmcommon.repo

wget http://directory.fedoraproject.org/sources/dirsrv.repo


Now, the FDS project page isn’t completely forthcoming about the individual packages needed to get a base directory and administration server going, but they can be found on the FHS packaging page. We need fedora-ds-base, which gives us the base directory server, fedora-ds-admin, which gives us the administration server, fedora-ds-console and fedora-admin-console. The last two provide jars necessary for remote administration.
yum install fedora-ds-base
yum install fedora-ds-admin
yum install fedora-ds-console
yum install fedora-admin-console

Friday, November 09, 2007

Installing VMware Server

Firstly, we need to install the gcc compiler so VMware can compile its kernel modules:

yum install gcc

To install VMware, save the RPM from the VMware site in an appropriate location, and run the following to install it:

rpm -ivh VMware-server-1.0.4-56528.i386.rpm

Then run:

/usr/bin/vmware-config.pl

which runs the VMware configuration utility. You’ll see this:

Making sure services for VMware Server are stopped.

Stopping VMware services:
Virtual machine monitor [ OK ]

You must read and accept the End User License Agreement to continue. Press enter to display it.

Do you accept? (yes/no)

Press enter to display the license, then press ‘q’ to quit and then type in yes to accept the license.

In which directory do you want to install the mime type icons?
[/usr/share/icons]

Accept the default and press enter.

The path "/usr/share/icons" does not exist currently. This program is going to create it, including needed parent directories. Is this what you want? [yes]

Press enter.

What directory contains your desktop menu entry files? These files have a .desktop file extension. [/usr/share/applications]

Press enter.

In which directory do you want to install the application’s icon? [/usr/share/pixmaps]

Press enter.

Trying to find a suitable vmmon module for your running kernel.

None of the pre-built vmmon modules for VMware Server is suitable for your running kernel. Do you want this program to try to build the vmmon module for your system (you need to have a C compiler installed on your system)? [yes]

Press enter. This is where we need GCC to compile the vmware kernel module for our running kernel.

What is the location of the directory of C header files that match your running kernel? [/lib/modules/2.6.18-8.1.14.el5/build/include]

Press enter. This is where the kernel headers and source are to enable compilation.
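If the default offered at that prompt doesn't exist on your machine, the headers for the running kernel probably aren't installed. As a small sketch (kernel_header_dir is a made-up name, and kernel-devel is the package I'd expect to provide the headers on a stock CentOS kernel), you can work out and check the directory before running the configuration script:

```shell
# Hypothetical helper: prints the header directory vmware-config.pl offers
# as its default for a given kernel release.
kernel_header_dir() {
    # $1 = kernel release, as printed by uname -r
    echo "/lib/modules/$1/build/include"
}

# Usage:
#   dir=$(kernel_header_dir "$(uname -r)")
#   [ -d "$dir" ] || echo "headers missing, try: yum install kernel-devel"
```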

Extracting the sources of the vmmon module.

Building the vmmon module.

Using 2.6.x kernel build system.
make: Entering directory `/tmp/vmware-config0/vmmon-only'
make -C /lib/modules/2.6.18-8.1.14.el5/build/include/.. SUBDIRS=$PWD SRCROOT=$PWD/. modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-8.1.14.el5-x86_64'
  CC [M] /tmp/vmware-config0/vmmon-only/linux/driver.o
  CC [M] /tmp/vmware-config0/vmmon-only/linux/hostif.o
  CC [M] /tmp/vmware-config0/vmmon-only/common/cpuid.o
  CC [M] /tmp/vmware-config0/vmmon-only/common/hash.o
  CC [M] /tmp/vmware-config0/vmmon-only/common/memtrack.o
  CC [M] /tmp/vmware-config0/vmmon-only/common/phystrack.o
  CC [M] /tmp/vmware-config0/vmmon-only/common/task.o
  CC [M] /tmp/vmware-config0/vmmon-only/common/vmx86.o
  CC [M] /tmp/vmware-config0/vmmon-only/vmcore/moduleloop.o
  LD [M] /tmp/vmware-config0/vmmon-only/vmmon.o
  Building modules, stage 2.
  MODPOST
  CC /tmp/vmware-config0/vmmon-only/vmmon.mod.o
  LD [M] /tmp/vmware-config0/vmmon-only/vmmon.ko
make[1]: Leaving directory `/usr/src/kernels/2.6.18-8.1.14.el5-x86_64'
cp -f vmmon.ko ./../vmmon.o
make: Leaving directory `/tmp/vmware-config0/vmmon-only'
The module loads perfectly in the running kernel.

This should now have compiled OK.

Do you want networking for your virtual machines? (yes/no/help) [yes]

Press enter for yes. We do want networking in our VMs.

Configuring a bridged network for vmnet0.

Your computer has multiple ethernet network interfaces available: eth0, eth1. Which one do you want to bridge to vmnet0? [eth0]

Because we have two ethernet ports on the server, it will assign virtual interfaces to both of them. Press enter here, since we want vmnet0 to map to eth0.

The following bridged networks have been defined:

. vmnet0 is bridged to eth0

Do you wish to configure another bridged network? (yes/no) [no] yes

Type yes, because we want to assign eth1.

Configuring a bridged network for vmnet2.

The following bridged networks have been defined:

. vmnet0 is bridged to eth0
. vmnet2 is bridged to eth1

All your ethernet interfaces are already bridged.

Do you want to be able to use NAT networking in your virtual machines? (yes/no) [yes]

Type no here. This just means that it will create a virtual network sitting behind NAT, which we don’t need.

Do you want to be able to use host-only networking in your virtual machines? [no]

Type yes here, because this creates a virtual internal network that we can put our VMs on, separate from the physical network.

Configuring a host-only network for vmnet1.

Do you want this program to probe for an unused private subnet? (yes/no/help) [yes]

Press enter here. This will just assign an unused network address to the virtual network.

Probing for an unused private subnet (this can take some time)...

The subnet 172.16.120.0/255.255.255.0 appears to be unused.

The following host-only networks have been defined:

. vmnet1 is a host-only network on private subnet 172.16.120.0.

Do you wish to configure another host-only network? (yes/no) [no]

Press enter for no here.

Extracting the sources of the vmnet module.

Building the vmnet module.

Using 2.6.x kernel build system.
make: Entering directory `/tmp/vmware-config0/vmnet-only'
make -C /lib/modules/2.6.18-8.1.14.el5/build/include/.. SUBDIRS=$PWD SRCROOT=$PWD/. modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-8.1.14.el5-x86_64'
  CC [M] /tmp/vmware-config0/vmnet-only/driver.o
  CC [M] /tmp/vmware-config0/vmnet-only/hub.o
  CC [M] /tmp/vmware-config0/vmnet-only/userif.o
  CC [M] /tmp/vmware-config0/vmnet-only/netif.o
  CC [M] /tmp/vmware-config0/vmnet-only/bridge.o
  CC [M] /tmp/vmware-config0/vmnet-only/procfs.o
  CC [M] /tmp/vmware-config0/vmnet-only/smac_compat.o
  SHIPPED /tmp/vmware-config0/vmnet-only/smac_linux.x86_64.o
  LD [M] /tmp/vmware-config0/vmnet-only/vmnet.o
  Building modules, stage 2.
  MODPOST
WARNING: could not find /tmp/vmware-config0/vmnet-only/.smac_linux.x86_64.o.cmd for /tmp/vmware-config0/vmnet-only/smac_linux.x86_64.o
  CC /tmp/vmware-config0/vmnet-only/vmnet.mod.o
  LD [M] /tmp/vmware-config0/vmnet-only/vmnet.ko
make[1]: Leaving directory `/usr/src/kernels/2.6.18-8.1.14.el5-x86_64'
cp -f vmnet.ko ./../vmnet.o
make: Leaving directory `/tmp/vmware-config0/vmnet-only'
The module loads perfectly in the running kernel.

Please specify a port for remote console connections to use

There may be a warning here about port 902 already being in use. Press enter, and if there is a warning that port 902 is in use just enter 902 anyway.
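If you'd rather know what the state of port 902 is before answering, you can look at the listening sockets. This is just a sketch: port_listening is a made-up helper that reads netstat -tln style output on stdin and succeeds if something is listening on the given TCP port.

```shell
# Hypothetical helper: exit status 0 if the netstat -tln style output on
# stdin shows a TCP listener on port $1, non-zero otherwise.
port_listening() {
    awk -v port="$1" '$1 ~ /^tcp/ && $4 ~ (":" port "$") { found = 1 }
                      END { exit !found }'
}

# Usage:
#   netstat -tln | port_listening 902 && echo "port 902 already has a listener"
```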

In which directory do you want to keep your virtual machine files?
[/var/lib/vmware/Virtual Machines]

Type in the location /vms. This is where our VM partition is.

Please enter your 20-character serial number.

Type XXXXX-XXXXX-XXXXX-XXXXX or ‘Enter’ to cancel:

For this, just copy and paste one of the serial numbers in. If you registered, you should have had a bunch e-mailed to you.

Starting VMware services:

Virtual machine monitor [ OK ]

Virtual ethernet [ OK ]

Bridged networking on /dev/vmnet0 [ OK ]

Host-only networking on /dev/vmnet1 (background) [ OK ]

Bridged networking on /dev/vmnet2 [ OK ]

The configuration of VMware Server 1.0.4 build-56528 for Linux for this running kernel completed successfully.

All done.

Satisfying Dependencies With VMware Server

Installing VMware Server on a server with no graphical libraries installed is a peculiar experience, because even though no GUI is running VMware depends on a lot of X libraries. CentOS also installs a fair bit by default that it shouldn’t. Running /usr/bin/vmware-config.pl performs a check, and if dependencies aren’t fulfilled then we get this:

The correct version of one or more libraries needed to run VMware Server may be missing. This is the output of ldd /usr/bin/vmware:

linux-gate.so.1 => (0xffffe000)
libm.so.6 => /lib/libm.so.6 (0x0034a000)
libdl.so.2 => /lib/libdl.so.2 (0x00305000)
libpthread.so.0 => /lib/libpthread.so.0 (0x0038b000)
libX11.so.6 => not found
libXtst.so.6 => not found
libXext.so.6 => not found
libXt.so.6 => not found
libICE.so.6 => /usr/lib/libICE.so.6 (0xf7fdb000)
libSM.so.6 => /usr/lib/libSM.so.6 (0xf7fd2000)
libXrender.so.1 => not found
libz.so.1 => /usr/lib/libz.so.1 (0x0030b000)
libc.so.6 => /lib/libc.so.6 (0x001c6000)
/lib/ld-linux.so.2 (0x001a9000)

This program cannot tell for sure, but you may need to upgrade libc5 to glibc before you can run VMware Server.

Firstly, do a:

yum remove libXi libXtst libXt libXrender libXext libX11

from the command line. This removes all X libraries pretty much from the system, along with a ton of stuff that CentOS (Red Hat) sees fit to install along with it that we don’t need like Pango and Cairo, and gives us a reasonably clean slate. Next, install the following needed libraries from the command line:

yum install libX11.i386 libXrender.i386 libXt.i386 libXtst.i386 libXi.i386

This is a 64-bit system, but VMware is a compiled 32-bit application, hence the i386 extensions. This should then enable the /usr/bin/vmware-config.pl to successfully run. Additional packages needed are:

yum install xinetd

xinetd is a package that allows servers to register themselves with the system, and listen on ports without the server itself having to be running in the background. xinetd hands off to the server process and it then starts running.
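Going back to the library check above, you don't have to wait for vmware-config.pl to complain. As a small sketch (missing_libs is a made-up helper name), the following filters ldd output down to just the libraries that are not found, which is handy for checking whether the yum install did the job:

```shell
# Hypothetical helper: reads ldd-style output on stdin and prints the name
# of every library reported as "not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage:
#   ldd /usr/bin/vmware | missing_libs
```

No output means ldd resolved everything.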

Monday, September 17, 2007

AMD PowerNow and Keeping Time in a VMware Virtual Machine........

For the past few days I've been scratching my head like anything at a time problem I've been having with running CentOS 5 in a VMware Server virtual machine. Time issues within virtual machines are certainly not uncommon as a bit of Googling should tell you, but rather than gaining time as is usually the case, my test VM has been losing time, and it has been impossible to see a pattern. I've used clocksource=pit, pmtmr, noapic, nolapic and nosmp as kernel options in the VM and nothing has made it any better. VMware tools is installed and working. I did add the following to /etc/vmware/config:

host.cpukHz = 2000000
host.noTSC = TRUE
ptsc.noTSC = TRUE


host.cpukHz is the speed of my processor (a couple of Opterons), which corresponds to 2GHz. This was needed because of the CPU frequency scaling within the motherboard. However, even this didn't make any difference. This was eventually solved by turning off AMD PowerNow in the host machine's BIOS (the real hardware). I took out all the kernel options I had put into the VM grub config file above, and it still worked perfectly. Time is now kept absolutely perfectly within the VMs to the second, with VMware tools installed and functioning.

Keeping the above options in the VMware config file seems to have the effect of keeping the VM's clock a few seconds behind so that the VMware Tools can synchronise properly, and stopping the VM clock from gaining a few seconds. Apparently, VMware tools can do nothing to synchronise the time if the VM's clock is fast.
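For reference, host.cpukHz is just the processor's full clock speed in kHz, so 2GHz is 2000 MHz, which is 2000000 kHz. A rough sketch of deriving the line from /proc/cpuinfo follows; cpukhz_line is a made-up helper, and bear in mind that with frequency scaling active /proc/cpuinfo may report the scaled-down speed rather than the maximum, so sanity check the number.

```shell
# Hypothetical helper: reads /proc/cpuinfo-style text on stdin and prints a
# host.cpukHz line based on the first "cpu MHz" entry it finds.
cpukhz_line() {
    awk -F: '/^cpu MHz/ { printf "host.cpukHz = %d\n", $2 * 1000; exit }'
}

# Usage:
#   cpukhz_line < /proc/cpuinfo
```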

Tuesday, June 05, 2007

Software or Hardware RAID Part 3: Conclusion - Hardware RAID Where Possible

A while ago I agonised over what an awful lot of people seem to agonise over when building and choosing servers, namely the best option for storage. I came to the conclusion at the time that software RAID was going to be the best way, because of the hardware we had available and the need for hardware RAID cards to really be in a PCI-X slot. Yes, you can use PCI slots, but really, you're limited by the I/O bandwidth. Like I said at the time, if you can find the right hardware then hardware RAID is preferable - and that's certainly the conclusion I have come to now. If you don't have the right board, or only have fake RAID available to you, please, just use software RAID. With fake RAID you will have pain, I promise you. You may have to forget hotswapping your disks with software RAID though, because you really need your disk controller to react in the right way when a disk is disconnected and a new one added, otherwise bad things will happen.

Well, the motherboard we were going to use just didn't happen for us for various reasons, and so I specifically looked at boards that would accommodate PCI-X slots. The availability of dual processor, server-based motherboards in the UK seems to be surprisingly spartan, with many dual CPU boards given over to ridiculous dual SLI graphics card workstation requirements. Needless to say, I don't need any of that. I eventually plumped for a Supermicro H8DAE after doing some reading around and reading some horror stories regarding an equivalent Gigabyte motherboard and PCI domains. Having got it all up and running now with CentOS, it's a very nice board with nothing more than I need with dual LAN, 4 PCI-X slots, 2 PCI slots, built-in graphics and IPMI support. Other bits and bobs in the new machine include:
  • The hardware RAID card available to us, at a very reasonable price, was an Adaptec 2420SA four port SATA card. It's a very nice card having got it up and running on Linux. The one thing I was worried about were management tools, because the OS will simply not see or know if a RAID array has a problem - it just sees a disk, remember. The Adaptec Storage Manager is perfectly adequate, with e-mail notifications, and I must commend Adaptec on their improved Linux support. aacraid is directly in the kernel now, and I got the whole thing up and running with no yucky driver installs. Yay! We also have a lot more scope for rearranging our storage and adding larger hard drives later on, without any downtime if we play our cards right. Needless to say, this does not mean you shouldn't have a backup system!
  • Chenbro server case. Solid little thing that is lockable and has a good SATA cage for RAID purposes.
  • 2 x AMD 146 Opterons. There is no Pacifica virtualisation support with these, or with the motherboard I have, but quite frankly, so what? I'm rather sceptical about the support and quality for this for the foreseeable future, and VMware will see us right. We already have some existing VMs anyway.
  • Seasonic S12 600W PSU. A top quality PSU that is absolutely silent and totally cool (I'm talking about its temperature there). You couldn't buy any server with a better PSU than this. Spot on. Just be aware of a caveat with PSUs - make sure you get an EPS-SSI compatible PSU if you need it, otherwise some boards like many Tyans will just not power up. An additional CPU power connector isn't necessarily enough.
  • 2 x Xilence Blade coolers. The ones pictured there seem to have an aluminium block, but mine have a copper one. There's a story behind this :-). When we ordered our motherboard the seller didn't pack the right CPU fitments for a H8DAE and packed the wrong I/O backpanel. They packed those for a H8DAE-2 instead (I got some free SATA cables though ;-)). Our seller refused to send the right stuff out even after it was explained (tossers), and rather than go through the hassle of changing the board I got on to Supermicro support who kindly sent out the right I/O backpanel. I then took the opportunity to get some CPU coolers that were more silent, cooler and better for dust and cleaning than the stock ones I had. I also needed a cooler that had its own motherboard backplate. The flower coolers were too big for the board, so I went for the Xilence coolers. At first I thought they weren't going to fit, so I turned one cooler around one way and the other in the opposite direction. The RAID cage was a little close for comfort to one cooler, but doesn't seem to have made any difference to airflow or cooling having monitored it. Both CPUs maintain an idle temperature of about 40 degrees celsius, with one CPU one or two degrees out, as it always seems to be in dual CPU systems.
  • 2 x 1 gigabyte sticks of ECC registered DDR. Standard stuff. Just make sure you get ECC registered RAM for these boards.
  • 4 x Samsung 200 gigabyte drives. You do get warned by Adaptec that you should use, supposedly, enterprise class disk drives because of the way a normal desktop drive retries on bad sectors. That article does say this can result in data loss, but I'm not entirely sure how, and I'm not sure if this is really relevant or whether it's just a way of selling more expensive hard drives.
All in all, a tidy little machine, and but for the system fan which is necessary considering I have two CPUs and four disks in an enclosed RAID cage in there (albeit with its own fan), it would be almost completely silent. I was quite impressed.

Most of my decisions on what operating system to use, and what flavour of Linux to use, centred around what kind of aacraid support it had and whether I could easily get the Adaptec Storage Manager up and running or not. I mean, if the underlying disk storage is flaky and I can't manage the RAID array and get notifications when something goes wrong, what's the point? Everything else will fall to pieces around it. As it turned out, Adaptec has RPMs built for Red Hat and Suse specifically, but it isn't a whole lot of work to get it up and running on any 32 or 64-bit Linux system. The right dependencies are all there, which is the 32 or 64-bit libXP library as far as I can work out.

So I went for a RHEL (Red Hat Enterprise Linux) compatible system, which for us is CentOS 5. We're not running too much on the base system here, as much of our functionality will be on further virtual machines. CentOS has given me the option of using driver disks from Adaptec and other manufacturers if I so wish (hasn't been necessary as it turns out), packages such as Adaptec Storage Manager I can just install without tinkering, a bit more confidence in 64-bit and 32-bit backwards compatible support, better Xen support if I want it and much more predictable updates to the system. There are some things about CentOS, and hence RHEL, that I don't like but I'll get on to them at another time.

Conclusion: Go for hardware RAID every time, but you must find yourself the right controller. If you can't, you don't have the right hardware and you only have, God forbid, fake RAID then go the software RAID route.
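If you do end up on the software RAID route, the same point about notifications applies: you need something watching the array. As a minimal sketch, assuming Linux md software RAID and the usual /proc/mdstat layout (degraded_arrays is a made-up helper name), the following prints any array whose status shows a missing member:

```shell
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of each md array whose member status (e.g. [U_]) contains a "_",
# which marks a missing or failed disk.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }'
}

# Usage (for example from cron, mailing the output if there is any):
#   degraded_arrays < /proc/mdstat
```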

Saturday, May 26, 2007

OpenXML as an ISO Standard: What Exactly is the Problem?

The primary reason why I started to talk about OpenXML more recently (OOXML, Office Open XML - whatever), and make a few comments here and there about it, is simply because I do not understand the rationale behind it in any way shape or form. That's pretty much it in a nutshell. Every comment, article and blog post I have ever read about it (and I've read quite a bit of the actual specification now - heavy going) has basically been trying to justify its existence. It's an awful lot of talk about..........nothing in particular. Some of the reasons have been quite flimsy to say the least, and the main one I've gathered out of it all is this:
OpenXML is about Microsoft documenting their Office formats.
Well, that's absolutely fine. If Microsoft want to document their formats and then submit them to some standards body then that's absolutely fine. No problem. That really is entirely up to them. Quite why Microsoft created another format to do that when it's transparently obvious that OpenXML is simply a translation of the old binary doc format and Microsoft Office features into a pseudo-XML format, I don't know. This is logically the case because Microsoft claims they created OpenXML as being backwards compatible. It isn't backwards compatible in terms of the files actually created, but it supposedly is backwards compatible with the Microsoft Office features and quirks within it because they can't be bothered to do a proper conversion.

Why not just document the binary format instead? Why not just create an entirely new and sensible format, eminently suitable for ISO submission, that referenced no previous versions of Microsoft Office or Windows technology and where older Microsoft Office documents could be simply converted, which is the job of Microsoft Office anyway? The latter question renders the issue of backwards compatibility a pretty void reason to me, but many people avoid that question.

However, the issue of it being an ISO standard is a different matter entirely, simply because any ISO standard must meet the aims and goals of the ISO. The biggest of those aims is about free trade and using standards to effectively communicate internationally, between governments, companies and organisations of various sizes:

http://www.iso.org/iso/en/aboutiso/introduction/index.html#six
http://www.iso.org/iso/en/aboutiso/introduction/index.html#eight

As you can probably imagine, being able to actually implement a standard in full, unambiguously, is of vital importance in an international standard. To create some parity with other industries and ISO standards, imagine just how worthless ISO 9000 would be if it transpired that only one company could truly implement it - and they sold quality systems. I would imagine that would cause quite a big fricking fuss considering that ISO 9000 has become a requirement for many fields of work over the years. Having to buy a vendor's product to adhere to it properly is not exactly on the agenda. It's expensive enough as it is.

This is why I had a bit of a go at Rick Jelliffe's blog entry, because in reviewing a proposed ISO standard this is absolutely all that matters. It's very important, no? Ultimately, a standard is about communication, and if the standard miscommunicates in a big, big way then you have a big, big problem - because it's useless. So, being able to implement a standard, and OpenXML should it want to be an ISO standard, is of absolutely vital, primary and fundamental importance. If it can't be proven to be fully implemented beyond one platform and office suite, then given that it is a standard that documents the format of one company's office suite, the ISO has then effectively given them a monopoly. Not good, eh, and not what all the talk about open document formats and the whole issue of ODF was about?

This whole issue is quite apart from the argument many people come up with that OpenXML duplicates ODF. These debates usually end with accusations that many ODF supporters want there to be only one ISO office document standard, which would bizarrely create a monopoly for IBM... somehow. I seem to have missed out several steps there. The IBM monopoly situation is not an issue if ODF is truly and independently implementable, and their own applications stick to it. It is what ODF itself is there to avoid. Yes, IBM could make their own incompatible extensions, but whatever they are, they are not specified in the standard and no one else is guaranteed to be able to do anything with them. There's simply no good reason for any specific extensions to be referenced absolutely anywhere within ODF - so they're not. You can just do it.

I know Groklaw have had a list of objections up for some time, and I've never seen a Microsoft web page with a point-by-point quote and rebuttal. Many people have tried, quite badly, to say why these don't matter, including trying to claim that the Microsoft- and Office-specific extensions and references are not the main part of the specification. Well, the point is, they're in there and they will be used by Microsoft Office, so claiming that no one else needs to use them is rather pointless. If a document cannot be created in Microsoft Office and sent to another system that has another implementation of OpenXML, then as an ISO standard it is worthless. I know people have gone tit-for-tat over this, so I'll just quote one part of the OpenXML specification and ask a question:

2.15.3.6 autoSpaceLikeWord95 (Emulate Word 95 Full-Width Character Spacing)

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 95) when determining the spacing between full-width East Asian characters in a document's content.

Yes, I do know this disclaimer is below it:

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

I deliberately chose this section of the markup language reference because of the East Asian character reference, in view of its proposed international standard status, but there are many sections like this within the reference document, each pertaining to some specific behaviour within a particular version of Word or WordPerfect.

The main, and really the only, reason for OpenXML's existence that I've managed to pick up from Microsoft and various people is backwards compatibility with, as they put it, the billions of Office documents already out there. Presumably, this is why sections such as this exist. Presumably, when older Microsoft Office documents are saved in the shiny new OpenXML format, elements such as this are used to preserve formatting. The only problem is that this behaviour is not specified. Anywhere. I can't implement an OpenXML application that will convert older Microsoft Office binary documents, and worse, I cannot even translate a modern OpenXML file, simply because I don't know what this behaviour is. There are a great many sections such as this.
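To make the implementer's position concrete, here is a minimal Python sketch of about the only thing an independent implementation can do when it meets such an element: notice it and give up on faithful rendering. The namespace URI, the placement of the element inside a compat block of the settings part, and the sample fragment are my assumptions for illustration, not quotations from the specification, and the list of elements contains only the one quoted above.

```python
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML main namespace as commonly seen in OpenXML files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Elements whose behaviour the specification defers to a legacy application.
# Only the one quoted above is listed; this set is illustrative, not complete.
UNSPECIFIED = {"autoSpaceLikeWord95"}


def find_unspecified_elements(settings_xml):
    """Return the local names of elements in the XML that appear in UNSPECIFIED."""
    root = ET.fromstring(settings_xml)
    prefix = "{" + W_NS + "}"
    found = []
    for elem in root.iter():
        # ElementTree renders namespaced tags as "{namespace}localname".
        if elem.tag.startswith(prefix):
            local = elem.tag[len(prefix):]
            if local in UNSPECIFIED:
                found.append(local)
    return found


# Hypothetical settings fragment, written by hand for illustration.
SAMPLE = f"""<w:settings xmlns:w="{W_NS}">
  <w:compat>
    <w:autoSpaceLikeWord95/>
  </w:compat>
</w:settings>"""

if __name__ == "__main__":
    for name in find_unspecified_elements(SAMPLE):
        print(f"{name}: behaviour not specified; cannot render faithfully")
```

Detecting the element is trivial. What the sketch cannot contain is the part that matters, namely the code that actually reproduces the Word 95 spacing, because there is nothing to write it from.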

The real problem here is that incompatibilities, quirks, caveats and pitfalls in individual applications are simply being transferred from one format, the old binary doc format, to another with no appreciable benefit to anyone - least of all users. These pitfalls will continue to be inherited in successive versions of the document. The net effect is that anyone implementing OpenXML will have to deal with this inherited behaviour, despite Microsoft's disclaimer that applications should not use it. OpenXML, after all, was invented for backwards compatibility reasons, remember? It would be interesting to see if Office 2007 still uses many of these so-called deprecated elements.

That's some background as to why it's a problem, but it's all beside the point really. The fact is, until I get some form of detailed specification for the behaviour that I need to handle, I cannot make an independent implementation of OpenXML. So, the question I wanted to ask is: can anyone, anywhere, tell me where to find the behaviour I need to implement section 2.15.3.6? This could be in the specification documents, or as a reference. It doesn't really matter. If you read nothing else here, answer this question.

In a nutshell, if no one can answer that question then I certainly cannot implement anything within the OpenXML specification for backwards compatibility (OpenXML's reason for being, remember), and as a result I simply cannot make an independent implementation of OpenXML for myself and others. That goes absolutely contrary to the ISO's free trade and communication aims and goals for its standards. There you have it.