Wednesday, February 15, 2006

Hell is upgrading Oracle versions

We're migrating from Oracle 9i running on a single box (in MTS mode) to an Oracle 10g RAC cluster. For those that don't know, RAC is Real Application Server and it's (supposed to be) one of the ways that you create a highly scaleable Oracle system. Before we began I naively thought the migration would... a) cost a fortune, b) be a pain in the ass for the DBAs and IT folks but c) be relatively painless for us software engineers. To make a long story short, I was right about the first two points.

The problems for us stem from the sequence generation facilities. In Oracle you can configure the sequence generator to work in different ways but the way we've had it working for over two years is to generate numbers one right after the other. The numbers that come out of our sequences are in order and there are essentially no gaps between successive items.

One of the side-effects of this approach is that you can treat any such sequence-backed column as a kind of time-stamp; you can order records by their effective creation date just by sorting on the sequence-backed field.

When you move your database to a system that runs on multiple machines (i.e. RAC) you basically lose the ability to have your sequences work this way. More precisely, you can configure them to work that way but doing so means that the nodes in your cluster will need to constantly coordinate with one another and the ensuing chatter will take a big bite out of your performance.

So in switching to RAC we lost the comfort of being able to assume that successive sequence values would go in increasing order. Calling sequence_name.NEXTVAL three times in a row might get you: 47, 38, 52.

We had to hunt down every query in our application that ordered by a sequence- backed field. That in itself is no easy task when you have thousands of queries scattered across hundreds of files, some written in PHP, some in Java and some in PL/SQL. The most sophisticated tool available was grep and we had to manually inspect dozens of candidate queries to figure out which ones to go after.

Then began the hard work of actually fixing the queries. We had to change all the "ORDER BY" clauses to use a different field (or set of fields), specifically a DATE field. Unfortunately, a lot of the underlying tables didn't have any appropriate field so in those cases we needed to modify the schema. The problem then arises of what do with the millions of existing rows that were created before we needed to record the date. We ended up giving them an arbitrary date in the past but that just meant that the "ORDER BY" clause now needed to act on two columns instead of one.

All in all, it took three developers about two weeks.

* I started writing this post months ago but it got waylaid and forgotten. In finishing up I've tried to write in a manner consistent with my thoughts and experiences at the time of inception. I'd like the post to approximate my thoughts at the time

No comments: