PostgreSQL Faster Way to Check Column Existence

We often need to check that a column exists before creating it, especially when dealing with dynamically generated data being inserted into PostgreSQL. This brief article shows how to perform this operation quickly and offers a PGPLSQL solution for creating new columns in databases older than PostgreSQL 9.6.

Querying The Catalog

There are several ways to search the PostgreSQL catalog tables. The easiest for the programmer is to use the information_schema. The fastest is to build a custom query.

While it is enticing to query the information schema, it is not a fast operation.  Multiple nested loops are created to pull from the catalog tables even when looking for a single attribute.

The following query on the information_schema results in running a rather large set of operations nested within several loops:

EXPLAIN SELECT count(*) > 0 FROM information_schema.columns WHERE table_schema LIKE 'us_ut_sor' AND table_name LIKE 'dirtyrecords' AND column_name LIKE 'bullage'

When run, the output is as follows:

"Aggregate  (cost=3777.32..3777.33 rows=1 width=0)"
"  ->  Nested Loop Left Join  (cost=2.39..3777.32 rows=1 width=0)"
"        ->  Nested Loop  (cost=1.83..3775.73 rows=1 width=4)"
"              ->  Nested Loop Left Join  (cost=1.54..3775.42 rows=1 width=8)"
"                    Join Filter: (t.typtype = 'd'::"char")"
"                    ->  Nested Loop  (cost=0.84..3774.50 rows=1 width=13)"
"                          ->  Nested Loop  (cost=0.42..3770.19 rows=1 width=8)"
"                                ->  Nested Loop  (cost=0.00..3740.84 rows=1 width=8)"
"                                      Join Filter: (c.relnamespace = nc.oid)"
"                                      ->  Seq Scan on pg_namespace nc  (cost=0.00..332.06 rows=1 width=4)"
"                                            Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text ~~ 'us_ut_sor'::text))"
"                                      ->  Seq Scan on pg_class c  (cost=0.00..3407.26 rows=121 width=12)"
"                                            Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND (((relname)::information_schema.sql_identifier)::text ~~ 'dirtyrecords'::text))"
"                                ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.42..29.35 rows=1 width=14)"
"                                      Index Cond: ((attrelid = c.oid) AND (attnum > 0))"
"                                      Filter: ((NOT attisdropped) AND (((attname)::information_schema.sql_identifier)::text ~~ 'bullage'::text) AND (pg_has_role(c.relowner, 'USAGE'::text) OR has_column_privilege(c.oid, attnum, 'SELECT, INSERT, UPDATE, REFE (...)"
"                          ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.42..4.30 rows=1 width=13)"
"                                Index Cond: (oid = a.atttypid)"
"                    ->  Nested Loop  (cost=0.70..0.90 rows=1 width=4)"
"                          ->  Index Scan using pg_type_oid_index on pg_type bt  (cost=0.42..0.58 rows=1 width=8)"
"                                Index Cond: (t.typbasetype = oid)"
"                          ->  Index Only Scan using pg_namespace_oid_index on pg_namespace nbt  (cost=0.29..0.31 rows=1 width=4)"
"                                Index Cond: (oid = bt.typnamespace)"
"              ->  Index Only Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.29..0.31 rows=1 width=4)"
"                    Index Cond: (oid = t.typnamespace)"
"        ->  Nested Loop  (cost=0.56..1.57 rows=1 width=4)"
"              ->  Index Scan using pg_collation_oid_index on pg_collation co  (cost=0.28..0.35 rows=1 width=72)"
"                    Index Cond: (a.attcollation = oid)"
"              ->  Index Scan using pg_namespace_oid_index on pg_namespace nco  (cost=0.29..1.21 rows=1 width=68)"
"                    Index Cond: (oid = co.collnamespace)"
"                    Filter: ((nspname  'pg_catalog'::name) OR (co.collname  'default'::name))"

This is truly nasty. In fact, any program running in O(n^2) or larger time will be less than ideal in this situation.

Limiting the O(n) time can be done by directly querying the catalog tables. The previous query was merely checking to see if a column existed under a given table and schema. The following custom query performs this operation much faster:

EXPLAIN SELECT count(*) > 0 FROM (SELECT q1.oid,q1.relname,q1.relowner,q1.relnamespace,q2.nspname FROM (SELECT oid,relname,relowner,relnamespace FROM pg_class) as q1 INNER JOIN (SELECT oid, * FROM pg_catalog.pg_namespace) as q2 ON q1.relnamespace = q2.oid WHERE q1.relname LIKE 'dirtyrecords' AND q2.nspname LIKE 'us_ut_sor') as oq1 INNER JOIN (SELECT attrelid,attname FROM pg_attribute) as oq2 ON oq1.oid = oq2.attrelid WHERE oq2.attname LIKE 'bullage'

While larger, many less operations are performed for a comparatively lower speed cost:

"Aggregate  (cost=292.44..292.45 rows=1 width=0)"
"  ->  Nested Loop  (cost=0.84..292.43 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.42..289.64 rows=1 width=4)"
"              ->  Seq Scan on pg_namespace  (cost=0.00..281.19 rows=1 width=4)"
"                    Filter: (nspname ~~ 'us_ut_sor'::text)"
"              ->  Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.42..8.44 rows=1 width=8)"
"                    Index Cond: ((relname = 'dirtyrecords'::name) AND (relnamespace = pg_namespace.oid))"
"                    Filter: (relname ~~ 'dirtyrecords'::text)"
"        ->  Index Only Scan using pg_attribute_relid_attnam_index on pg_attribute  (cost=0.42..2.79 rows=1 width=4)"
"              Index Cond: ((attrelid = pg_class.oid) AND (attname = 'bullage'::name))"
"              Filter: (attname ~~ 'bullage'::text)"

 

Notice how the cost of the first query was 3777.32 while the second was merely 292.44. That is a not so small order of magnitude better

PGPLSQL Function

For databases versions prior to PostgreSQL 9.6, which introduces the syntax ALTER TABLE x ADD COLUMN IF NOT EXISTS y TYPE, the following PGPLSQL function performs the desired table alteration:

 

CREATE OR REPLACE FUNCTION add_column_if_not_exists(schema_name varchar(63), table_name varchar(63), column_name varchar(63),column_type varchar(1024)) RETURNS void AS
$BODY$
     DECLARE
          column_exists BOOLEAN;
     BEGIN     
        IF schema_name IS NOT NULL THEN
		SELECT count(*) > 0 INTO column_exists FROM (SELECT q1.oid,q1.relname,q1.relowner,q1.relnamespace,q2.nspname FROM (SELECT oid,relname,relowner,relnamespace FROM pg_class) as q1 INNER JOIN (SELECT oid, * FROM pg_catalog.pg_namespace) as q2 ON q1.relnamespace = q2.oid WHERE q1.relname LIKE table_name AND q2.nspname LIKE schema_name) as oq1 INNER JOIN (SELECT attrelid,attname FROM pg_attribute) as oq2 ON oq1.oid = oq2.attrelid WHERE oq2.attname LIKE column_name;
		IF column_exists IS FALSE THEN
		    EXECUTE 'ALTER TABLE '||schema_name||'.'||table_name||' ADD COLUMN '||column_name||' '||column_type;
                END IF;
	ELSE
		SELECT count(*) > 0 INTO column_exists FROM (SELECT oid,relname,relowner,relnamespace FROM pg_class WHERE relname LIKE table_name) as oq1 INNER JOIN (SELECT attrelid,attname FROM pg_attribute) as oq2 ON oq1.oid = oq2.attrelid WHERE oq2.attname LIKE column_name;
		IF column_exists IS FALSE THEN
		    EXECUTE 'ALTER TABLE '||table_name||' ADD COLUMN '||column_name||' '||column_type;
                END IF;
        END IF;

      
END;
$BODY$
LANGUAGE plpgsql;

We did not create a trigger that fires on all alter statements to avoid creating additional cost when not desired. The provided function also avoids a costly join if no schema is present.

Conclusion

In this article, we discovered that the information schema is not as ideal as it seems. Armed with this information, we created a better function to add columns to a table only if they do not exist.

Advertisements