pg_reorg 1.0.4

25 June 2009

At Hi5, we currently use pg_reorg1.0.3 in order to organize data in a clustered fashion. I posted previously about the strategy. Our version is slightly modified, the modifications I made to the C code essentially allow pg_reorg to spin/wait for locks on the objects to be released before proceeding.

The good news is the folks at NTT have incorporated a similar change in pg_reorg 1.0.4. This is a fantastic improvement, and frankly implemented in a cleaner way than my changes.

The crux of the issue is the situation where a database is being auto-vacuumed, you can’t be guaranteed that pg_reorg and the vacuum will not collide. In theory you should not need to vacuum a table which you are pg_reorg’ing because that is the point of a pg_reorg, it’s essentially a vacuum full w/ extra features because the table is being rebuilt from scratch. However in an environment where auto-vacuum is being utilized to keep tables vacuumed, both will need to co-exist.

The change is simple, use the NOWAIT option of lock table to fail if the lock can not be obtained. This is wrapped in a loop until the lock is granted. The effect is pg_reorg patiently sits and waits while your vacuums complete and then it can finish it’s work. The downside is if any of these operations run for too long, then the journal table may grow very large. So there should be some monitoring wrapped around the code if it’s intended to run in the background. For the future we need a backoff algorithm as well as perhaps a limit to the number of spin/sleep cycles, but hey this is excellent progress.

This tool is essential in my humble opinion for everyone running PostgreSQL in a high transaction/high availability environment. By the way, pg_reorg works seamlessly with Slony-I.

The code addition does the following:

for (;;)
        {
                command("BEGIN ISOLATION LEVEL READ COMMITTED", 0, NULL);
                res = execute_nothrow(table->lock_table, 0, NULL);
                if (PQresultStatus(res) == PGRES_COMMAND_OK)
                {
                        PQclear(res);
                        break;
                }
                else if (sqlstate_equals(res, SQLSTATE_LOCK_NOT_AVAILABLE))
                {
                        /* retry if lock conflicted */
                        PQclear(res);
                        command("ROLLBACK", 0, NULL);
                        sleep(1);
                        continue;
                }
                else
                {
                        /* exit otherwise */
                        printf("%s", PQerrorMessage(connection));
                        PQclear(res);
                        exit(1);
                }
        }

The text below is a snip of the strace on pg_reorg while it’s waiting for the lock:

rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
sendto(3, "P\0\0\0008\0SELECT reorg.reorg_apply($"..., 529, 0, NULL, 0) = 529
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recvfrom(3, "1\0\0\0\0042\0\0\0\4T\0\0\0$\0\1reorg_apply\0\0\0\0"..., 16384, 0, NULL, NULL) = 77
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
sendto(3, "P\0\0\0\177\0SELECT 1 FROM pg_locks WHE"..., 178, 0, NULL, 0) = 178
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
poll([{fd=3, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recvfrom(3, "1\0\0\0\0042\0\0\0\4T\0\0\0!\0\1?column?\0\0\0\0\0\0\0"..., 16384, 0, NULL, NULL) = 74
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0})

Your Postgresql log file will show the following:

Jun 25 17:09:33 <dbname> postgres[7825]: [37-2] 2009-06-25 17:09:33 PDTSTATEMENT:  LOCK TABLE <tablename> IN ACCESS EXCLUSIVE MODE NOWAIT
Jun 25 17:09:34 <dbname> postgres[7825]: [38-1] 2009-06-25 17:09:34 PDTERROR:  could not obtain lock on relation "<tablename>"