Adding OpenSSL User-Defined Types to PostgreSQL

Bear GilesDecember 30th, 2014Last Updated: December 29th, 2014

0 28 10 minutes read

PostgreSQL supports user-defined types (UDT). These types can be used to provide type-safety on user-defined functions when we would otherwise be forced to use simple BLOB objects.

This comes at a significant cost. Many databases support UDT but implementation details vary widely so there’s a significant amount of vendor lock-in. In addition C language UDT require deployment via PostgreSQL extensions containing shared libraries and that is rarely available when using SAAS, e.g., the Amazon cloud. This forces us to maintain our own database servers instead of relying on a SAAS provider.

On the other hand user-defined types and functions give us far more flexibility, e.g., a centralized location with dedicated cryptographic hardware.

I use the OpenSSL library for the simple reason that it’s already included by PostgreSQL to support encrypted channels. This eliminates the need to worry about library dependencies or legal restrictions on cryptographic software – I am making the reasonable assumption that this software is legal to use at any location that is already using the OpenSSL library.

Motivation

It is very common for sites to store x509 digital certificates as a blob with a number of additional columns used for indexing and searching.

CREATE TABLE certs (
   cert       BLOB NOT NULL,
   name       VARCHAR[100] NOT NULL,
   not_before TIMESTAMP NOT NULL,
   not_after  TIMESTAMP NOT NULL
);

The problem is that there’s no consistency enforced between the cert and the indexed fields. It’s unlikely for an attacker to slip in a different certificate but it can’t be ruled out and the results could be catastrophic.

A far better solution is to use triggers on insert and update

-- create trigger
CREATE CONSTRAINT TRIGGER cert_update() BEFORE INSERT OR UPDATE
    ON certs NOT DEFERRABLE FOR EACH ROW
    EXECUTE PROCEDURE cert_update_proc ();

-- create function that ensures indexed fields reflect cert
CREATE OR REPLACE FUNCTION cert_update_proc RETURNING trigger $$
    BEGIN
        INSERT INTO certs(cert, X509_name(cert), X509_not_before(cert), X509_not_after(cert));
        RETURN NEW;
    END;
$$ LANGUAGE plpgsql;

This ensures that an attacker can never replace the cert without all of the indexed values being updated. Updates to the indexed fields will be ignored. It’s not 100% – a sophisticated attacker could drop the trigger – but it is far more secure than relying on the user to maintain this information. Remember that the schema, including triggers, should be owned by a different database user than the application database user so a compromised application cannot drop the trigger.

(The table definition or stored procedure can also perform sanity checks, e.g., ensuring that the “not_after” timestamp is strictly later than the “not_before” timestamp.)

We are not forced to use a user-defined type but it allows us to be more precise when defining our helper functions. It is also much more user friendly since they can insert and retrieve standard OpenSSL PEM values instead of dealing with BLOBs.

In the long term we can create user-defined functions that can use the OpenSSL types to perform actual work. We can also apply more sophisticated checks that involve internal queries, e.g., ensuring that any certificate added or modified is signed by another certificate in the database. That is beyond the scope of this blog entry.

OpenSSL engines

It is not widely known that OpenSSL supports cryptographic hardware via the “engine” interface. The default behavior is to use a software implementation but with well-written code it should be straightforward to use dedicated cryptographic hardware instead.

This can result in significant performance gains. The software implementation is adequate for development and small sites but may become a limiting factor at large sites. The hardware solutions that reduce this problem.

A more subtle point is that encryption keys are sensitive information and many organizations will not want them exposed under any conditions. No files, no web services. Hardware implementations have the ability to generate their own encryption keys and there is no mechanism for exposing the keys. (At most there’s a way to clone the keys from one hardware device to a second one.)

Design

The design is based on a simple consideration – I only want to store valid objects. The best way to ensure this is to have the user-defined types convert the external PEM values into internal OpenSSL objects and vice versa. We can perform additional sanity checks but that isn’t required.

This has one unfortunate drawback – traditional keys cannot be encrypted. I don’t consider this a problem since traditional keys shouldn’t be used anyway (IMHO) – they should be stored in containers such as keystores (PKCS8 and PKCS12).

That said the traditional keys are so simple that they’re a great platform for developing our PGXS skills.

Implementation Constraints

The constraints imposed by PostgreSQL are specified in section 35.9.5 of the manual http://www.postgresql.org/docs/9.4/static/xfunc-c.html. Most of them are handled by the tools at PGXN – see my earlier blog entry for details.

One particularly nasty constraint is that PostgreSQL, like most long-running services, has its own memory management library. OpenSSL uses the standard memory management library by default. It is possible to override this with the CRYPTO_set_mem_functions function but it’s dangerous to call this in a server because you don’t know what objects have already been created, e.g., when establishing a secure connection to the database. It could cause a server crash if an object is malloc’ed but then pfree’d.

Instead we must be very careful to always convert and release any object created by the OpenSSL library.

Defining a RSA keypair UDT

We are now ready to create an RSA keypair UDT. As mentioned earlier keys should be stored in a PKCS8 or PKCS12 object instead of a traditional RSA keypair object since the latter must be stored unencrypted.

We start by defining the RSA UDT itself.

--
-- Create shell type.
--
CREATE TYPE RSA;

--
-- Create function that converts string to internal format.
--
CREATE OR REPLACE FUNCTION rsa_in(cstring)
RETURNS RSA
AS 'pgopenssltypes', 'rsa_in'
LANGUAGE C IMMUTABLE STRICT;

--
-- Create function that converts internal format to string.
--
CREATE OR REPLACE FUNCTION rsa_out(RSA)
RETURNS CSTRING
AS 'pgopenssltypes', 'rsa_out'
LANGUAGE C IMMUTABLE STRICT;

--
-- Redefine type with necessary functions.
--
CREATE TYPE RSA (
    INPUT   = rsa_in,
    OUTPUT  = rsa_out
);

A UDT can specify about a dozen C functions (see CREATE TYPE) but the only two mandatory functions are the INPUT and OUTPUT functions that convert the object between a C string and bytea representations.

A plain UDT is pretty boring so let’s also define two user-defined functions:

--
-- Generate RSA keypair. This is an expensive operation
-- so it should not be called casually.
--
CREATE OR REPLACE FUNCTION rsa_generate_keypair(int)
RETURNS RSA
AS 'pgopenssltypes', 'rsa_generate_keypair'
LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE RSA_INFO AS (
    BITS int,
    N    BN,
    E    BN,
    D    BN,
    P    BN,
    Q    BN
);

--
-- Get details about RSA keypair.
--
CREATE OR REPLACE FUNCTION rsa_get_details(RSA)
RETURNS RSA_INFO
AS 'pgopenssltypes', 'rsa_get_details'
LANGUAGE C IMMUTABLE STRICT;

RSA_INFO is a composite type. It is not possible to return multiple columns from a function but sometimes values are intrinsically related and you want to return them as a unit. PostgreSQL supports this with composite types. You can access the fields easily.

bgiles=# SELECT rsa_get_details(rsa_generate_keypair(256)) AS details INTO sample;

bgiles=# \d sample
     Table "public.sample"
 Column  |   Type   | Modifiers 
---------+----------+-----------
 details | rsa_info | 

bgiles=# SELECT (details).P, (details).Q FROM sample;
                    p                    |                    q                    
-----------------------------------------+-----------------------------------------
 331128053999826595053108455708184431513 | 294756634092692440982306957700237950609
(1 row)

In this case there’s no reason why you would want to persist a ROW_INFO object – I can’t imagine using it anywhere except in stored procedures. But it’s a good tool to have in your toolbox.

Implementing a RSA keypair UDT (INPUT/OUTPUT)

The implementation of the basic INPUT and OUTPUT methods is straightforward once you understand the OpenSSL library. The public functions are:

/*
 * Read PEM format.
 */
PG_FUNCTION_INFO_V1(rsa_in);

Datum rsa_in(PG_FUNCTION_ARGS) {
    char *txt;
    bytea *result;
    RSA *rsa;

    // write RSA keypair into buffer
    rsa = rsa_from_string(txt);
    result = rsa_to_bytea(rsa);
    RSA_free(rsa);

    // return bytea
    PG_RETURN_BYTEA_P(result);
}

/*
 * Write PEM format.
 */
PG_FUNCTION_INFO_V1(rsa_out);

Datum rsa_out(PG_FUNCTION_ARGS) {
    bytea *raw;
    char *result;
    RSA *rsa;

    // write RSA keypair into buffer
    rsa = rsa_from_bytea(raw);
    result = rsa_to_string(rsa);
    RSA_free(rsa);

    PG_RETURN_CSTRING(result);
}

I didn’t check for a NULL value since the user-defined function was declared STRICT and it clutters the code. In practice it won’t do much harm to always check for a null value. The main takeaway is that I create an OpenSSL object, use it, and then immediately discard it.

This public methods use four static convenience methods:

/*
 * Convert string to RSA.
 */
static RSA * rsa_from_string(const char *txt) {
    BIO *inp;
    RSA *rsa = RSA_new();

    inp = BIO_new_mem_buf((char *) txt, strlen(txt));
    PEM_read_bio_RSAPrivateKey(inp, &rsa, 0, NULL);
    BIO_free(inp);

    return rsa;
}

/*
 * Convert bytea to RSA.
 */
static RSA * rsa_from_bytea(const bytea *raw) {
    BIO *bio;
    RSA *rsa;

    // convert into RSA keypair
    bio = BIO_new_mem_buf(VARDATA(raw), VARSIZE(raw) - VARHDRSZ);
    BIO_set_close(bio, BIO_NOCLOSE);
    rsa = RSA_new();
    d2i_RSAPrivateKey_bio(bio, &rsa);
    BIO_free(bio);

    if (rsa == NULL) {
        ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED), errmsg(
                "unable to decode RSA keypair record")));
    }

    return rsa;
}

/*
 * Convert RSA to string.
 */
static char * rsa_to_string(const RSA *rsa) {
    BIO *bio;
    int len;
    char *ptr, *result;

    // write RSA keypair into buffer
    // arguments: ..., cipher, keyptr, keylen, passwd_cb, passwd_cb_data
    bio = BIO_new(BIO_s_mem());
    PEM_write_bio_RSAPrivateKey(bio, (RSA *) rsa, NULL, NULL, 0, NULL, NULL);

    // create results.
    len = BIO_number_written(bio);
    BIO_get_mem_data(bio, &ptr);
    result = palloc(len + 1);
    strncpy(result, ptr, len);
    result[len] = '';
    BIO_free(bio);

    return result;
}

/*
 * Convert RSA to bytea.
 */
static bytea * rsa_to_bytea(const RSA *rsa) {
    BIO *bio;
    int len;
    bytea *result;
    char *ptr;

    // write RSA keypair into buffer
    bio = BIO_new(BIO_s_mem());
    i2d_RSAPrivateKey_bio(bio, (RSA *) rsa);

    // create bytea results.
    len = BIO_number_written(bio);
    BIO_get_mem_data(bio, &ptr);
    result = (bytea *) palloc(len + VARHDRSZ);
    memcpy(VARDATA(result), ptr, len);
    SET_VARSIZE(result, len + VARHDRSZ);
    BIO_free(bio);

    return result;
}

The bytea is a PostgreSQL type that contains a variable amount of memory. The first four bytes (VARHDRSZ) are the length, and the data itself is accessed through a convenience macro (VARDATA). Since I’m only storing a single value I’m using the macros directly, more complex objects could define a type and cast it to a bytea object.

This code may be baffling if you’re not familiar with the OpenSSL library but there are two simple observations that will make it a lot clearer. First, all I/O is handled through a “Basic Input/Output (BIO)” abstraction. It’s a pain at first but it allows you to stack data manipulations such as compression and encryption.

Second, objects are read and written using d2i and i2d functions. These are “DER to internal” and “internal to DER”, respectively. PEM and DER formats are identical – PEM is base-64 encoded and has a brief comment describing the contents but there’s no additional information.

The rest you get from experience, man pages and googling for examples.

Generating a new keypair

Onward to the interesting user-defined functions. First up is generating a new keypair. This is not something you’ll do very often but it’s very convenient during development and testing since it’s extremely fast to generate 256-bit RSA keys.

The public method just makes sure we have valid and sane values. A warning is printed if a small key is requested but it’s not prohibited. This would be easy to do – use ERROR instead of INFO.

/**
 * Generate a random keypair
 */
PG_FUNCTION_INFO_V1(rsa_generate_keypair);

Datum rsa_generate_keypair(PG_FUNCTION_ARGS) {
    bytea *result;
    int bits;
    RSA *rsa;

    bits = PG_GETARG_INT32(0);
    if (bits <= 0) {
        bits = 2048;
    }

    if (bits < 2048) {
        // elog(INFO, "RSA keys should be at least 2048 bits.")
        ereport(INFO,
                  (errcode(ERRCODE_CHECK_VIOLATION,
                           errmsg("RSA keys should be at least 2048 bits.")));
    }

    rsa = rsa_generate_keypair_internal(bits);
    result = rsa_to_bytea(rsa);
    RSA_free(rsa);

    // return bytea
    PG_RETURN_BYTEA_P(result);
}

/*
 * actual key generation
 */
RSA * rsa_generate_keypair_internal(int bits) {
    BIGNUM *ep;
    RSA *rsa;

    rsa = RSA_new();
    ep = BN_new();
    BN_dec2bn(&ep, "65537");
    RSA_generate_key_ex(rsa, bits, ep, NULL);
    BN_free(ep);

    return rsa;
}

The key uses the standard exponent – 65537 (0x10001) – since there’s little value in making it configurable.

Retrieving keypair details

Retrieving keypair details is a bit more complicated and you will definitely want to read the PostgreSQL documentation while looking at this code.

/**
 * Get details about an RSA keypair
 */
PG_FUNCTION_INFO_V1( rsa_get_details);

Datum rsa_get_details( PG_FUNCTION_ARGS) {
    bytea *raw;
    RSA *rsa;
    TupleDesc desc;
    HeapTuple tuple;
    Datum *values;
    bool *retNulls;

    // check for null value.
    raw = PG_GETARG_BYTEA_P(0);
    if (raw == NULL || VARSIZE(raw) == VARHDRSZ) {
        PG_RETURN_NULL();
    }

    // read keypair, verify success.
    rsa = rsa_from_bytea(raw);
    if (rsa == NULL) {
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED), errmsg(
                         "unable to decode RSA keypair record")));
        PG_RETURN_NULL();
    }

    // read details about return value.
    if (get_call_result_type(fcinfo, NULL, &desc) != TYPEFUNC_COMPOSITE) {
        RSA_free(rsa);
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg(
                        "function returning record called in context "
                                "that cannot accept type record")));
    }
    desc = BlessTupleDesc(desc);

    // these values are freed by PostgreSQL
    values = (Datum *) palloc(6 * sizeof(Datum));
    retNulls = (bool *) palloc(6 * sizeof(bool));

    // set return values
    values[0] = Int32GetDatum(8 * RSA_size(rsa));
    retNulls[0] = false;

    if (rsa->n == NULL) {
        retNulls[1] = true;
    } else {
        retNulls[1] = false;
        values[1] = BnGetDatum(rsa->n);
    }

    if (rsa->e == NULL) {
        retNulls[2] = true;
    } else {
        retNulls[2] = false;
        values[2] = BnGetDatum(rsa->e);
    }

    if (rsa->d == NULL) {
        retNulls[3] = true;
    } else {
        retNulls[3] = false;
        values[3] = BnGetDatum(rsa->d);
    }

    if (rsa->p == NULL) {
        retNulls[4] = true;
    } else {
        retNulls[4] = false;
        values[4] = BnGetDatum(rsa->p);
    }

    if (rsa->q == NULL) {
        retNulls[5] = true;
    } else {
        retNulls[5] = false;
        values[5] = BnGetDatum(rsa->q);
    }

    RSA_free(rsa);

    // convert to tuple.
    tuple = heap_form_tuple(desc, values, retNulls);
    FreeTupleDesc(desc);

    // return datum.
    PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
}

It’s a long function with unfamiliar calls but it follows the usual pattern. Read data, verify it, do something with it, prepare results, return results.

This function refers to the “BN” UDT. I don’t include that here but it is a wrapper for the OpenSSL “BIGNUM” type. If you’re interested the details are at my git repository.

The results are what we expect.

bgiles=# select * from sample;
                                                                                                                       details                                                                                                                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 (256,97602190650652191344500377647616852664474030847991993174417850011862837141417,65535,49816062791999163016436913587169239410285445043144541244445777903161695571583,331128053999826595053108455708184431513,294756634092692440982306957700237950609)
(1 row)

bgiles=# select (details).bits from sample;
 bits 
------
  256
(1 row)

bgiles=# select (details).e from sample;
   e   
-------
 65535
(1 row)

bgiles=# select (details).d from sample;
                                       d                                       
-------------------------------------------------------------------------------
 49816062791999163016436913587169239410285445043144541244445777903161695571583
(1 row)

bgiles=#

Source code and other resources

The source code shown above is available at https//github.com/beargiles/pgopenssltypes. This is very much a work in progress but I’m posting it since this idea has been on my blog backlog for over a year and I’ll soon be pulled to other tasks.

Here’s a good article from 2007: http://linuxgazette.net/142/peterson.html.

Reference:

Adding OpenSSL User-Defined Types to PostgreSQL from our JCG partner Bear Giles at the Invariant Properties blog.