Database and Webapp Security

Bear GilesNovember 23rd, 2012Last Updated: November 23rd, 2012

0 114 13 minutes read

Threat Model

This is a discussions on database and webapp security loosely based on the quick reference page on my site. That page is becoming unwieldy and does not make it easy for readers to interact with me or others.

Threat Model

All security analysis must begin by examining the threat model. A threat model requires you to answer four questions:

what I am trying to protect?
from whom?
for how long?
and at what (net) cost?

What am I trying to protect?

This is the obvious place to start… and your first answer is probably wrong! What I mean by that is that you may answer “the database password” but that’s not quite right. What you’re actually want to protect is access to the database as that user – an attacker might be able to find a way into the databasewithout the password, e.g., SQL injection.

But wait, that’s not quite right either! Our real concern is preventing the attacker from using that access to cause damage, learn sensitive information, and so forth. At this point we should enumerate our actual concerns, e.g., our database may contain

user content
financial information
user authentication and authorization
logs
static content

The way we access and use this information is varied

user content – need ongoing read/write access
financial information – need an oracle (for approval) and can leave details to fulfillment process
user authentication and authorization – need an oracle (for approval and authorizations) when a user logs in but never afterwards (oracle)
logs – need ongoing append-only access (oracle)
static content – need read-only access on startup (oracle)

(All of the access is modulo the need for maintenance.)

An oracle is a standalone method that takes (optional) values and returns either true or false. A bit more generally it can return any self-contained, immutable object. A good implementation choice for an oracle is a stored procedure in the database, a better choice would be a REST call to another webapp using an independent database.

Two examples of oracles:

User authentication: Use an oracle that takes a username and password and returns a boolean value indicating whether it was valid or not. (Alternative: return full authn/authz structure upon success.) The non-oracle approach is for the application to do a query on the user and password tables and compare the passwords itself.

Credit card authentication: Use an oracle that takes the credit card information and amount of purchase and returns either a confirmation number or an error indication. The application can rely on the oracle keeping a copy of previously provided values (but not the CVV!) so the user doesn’t have to fill out the same information every time. . The non-oracle approach is for the application to bundle the information itself.

The point I’m making here is that deciding what needs to be protected is an architectural question and a bit of foresight can have a dramatic impact on threat model. You want as little exposure to untrusted users (e.g., the webapp) as possible and small changes can make big differences.

Last but not least there is one other thing that the should be protected: your reputation. Not the company’s – the developer’s. What do you say when you get a phone call from the president of the company demanding to know why the company will be the lead story on the nightly news? You can’t protect against all attacks but you don’t want to be left speechless when someone demands to know why you didn’t take basic steps to protect the system.

From whom?

Everyone.

Okay, I kid. But there’s a far broader list than you first think.

fumble-finged employees. We’ve all done this. They already have legitimate access.
disgruntled employees, especially the soon-to-be-former employees. They already have legitimate access and motivation.
script kiddies. We tend to think of them as unsophisticated but they may be running cracking tools written by experts. They’ll probably move on to easier targets if your site is reasonably secure.
advanced persistent threats (ATP). These are the people who have a strong motivation and strong technical skills. Assume they will get in.

This list is far from exhaustive and listing additional ‘potentialattackers’ is left as an exercise for the reader.

For how long?

At the risk of being obvious there’s three broad categories

Information or access that must be protected until a relatively soon specific date and is then open knowledge, e.g., corporate financial reports.
Information or access that has declining value over time.
Information or access that must be protected forever, e.g., confidential legal and medical documents.

The first category is straightforward since the best known algorithms and attacks are known and the attacker has limited time to work.

The last category is difficult since we know don’t can’t predict future attacks. Some things that were impossible 10 years ago are now run-of-the-mill. One good bit of advice: things we don’t keep are things we don’t need to protect. Keep as little as possible but no less.

A full analysis is behind the scope of this blog entry but this is an important concern that should not be trivialized.

At what (net) cost?

“Cost” is a flexible concept since there are so many indirect and inferred costs. E.g., what’s the cost in making it harder for people to do their work… or is it cheaper since the system won’t be down for days after a breach? What’s the cost of people leaving the site in frustration vs. the benefits of people not leaving the site en masse after a breach at your site made the national news?

The bottom line is that this is ultimately a non-technical question. All you can do is identify the direct and indirect concerns and let the powers-that-be make the final determination.

Putting it together

The bottom line is that the threat model is ultimately a business decision. We can provide analysis and recommendations but the ultimate decision has to come from above.

That said there are many things we can do on our own initiative. This discussion will address some of them.

SQL Injection

What is SQL Injection?

SQL Injection is the ability of attackers to insert arbitrary SQL commands into your system.

Sample attack

Look at the following code:

ResultSet rs = stmt.execute(
   'select * from users where username='' + username +
   '' and password='' + password + ''');

What could go wrong? Let’s say we use the following values:

String username = 'bob' or 1=1; --';
String password = 'dont care';

When we call the earlier code the generated code is

select * from users where username='bob' or 1=1; --'
   and password='dont care'

This will list all users. Some web frameworks will list all users in the system. More carefully written applications will raise an alarm if more than one record is returned. This is easy to fix

String username = 'bob' or 1=1 order by userid limit 1; --';
String password = 'dont care';

to produce

select * from users where username='bob' or 1=1
  order by userid limit 1; --' and password='dont care'

The ‘order by’ stanza ensures we see the first user in the system. That’s normally the administrator – something attackers do not forget.

Wrong Approach

Many inexperienced programmers attempt to get around this problem by explicitly sanitizing the user-provided input.

ResultSet rs = stmt.execute(
  'select * from users where username='' +
  username.replaceAll(''',   '''') +
  '' and password='' + password.replaceAll(''', '''') + ''');

This might have worked in the 1980s but the world uses more than ASCII today. Properly identifying quote characters is a non-trivial problem and should be left to others. The JDBC writers often have database-specific methods for this but they can get out of sync with the database and are, of course, database-specific.

Prepared Statements and Placeholders

The standard solution to this problem is to use prepared statements and placeholders. This replaces the code

ResultSet rs = stmt.execute(
  'select * from users where username='' +
  username + '' and password='' + password + ''');

with

PreparedStatment stmt = conn.prepareStatement(
  'select * from users where username=? and password=? limit 1');
stmt.setString(1, username);
stmt.setString(2, password);
ResultSet rs = stmt.execute()

Limitations

There are times when prepared statements are inappropriate. One common example is multi-insert statements. These can be significantly faster than multiple prepared statement calls.

An example of a multi-insert statement is

insert into squares(x, y)
   values (1, 1),
          (2, 4),
          (3, 9),
          (4, 16),
          (5, 25);

As a general rule this should not be used with user-provided data. If it is absolutely required use the database-specific method provided by your JDBC provider, not a roll your own solution.

Cost/Benefit Analysis

The cost/benefit analysis to using prepared statement placeholders is irrelevant – it’s one of those things that you simply have to do.

SQL Injection in Stored Procedures

What are stored procedures and CallableStatements?

Stored procedures are bits of code kept in the database. The most common form is a SQL-like scripting language but additional languages are supported – PERL, tcl, ruby, java, etc.

It is important to remember that stored procedures are used in database triggers – you should be aware of them even if you all of your work with hibernate.

Wrong Approach

The wrong approach is to create a dynamic SQL query without sanitization.

DELIMITER $$
DROP PROCEDURE IF EXISTS SP_AUTHENTICATE$$
CREATE PROCEDURE SP_AUTHENTICATE(IN username VARCHAR(20),
                                 IN password VARCHAR(20),
                                 OUT success INT)
BEGIN
  SET @query = CONCAT('SELECT COUNT(credentials.username) INTO @succ
     FROM credentials
     WHERE credentials.username = \'', username,
        '\' AND credentials.password = \'', password, '\'');
   PREPARE stmt FROM @query;
   EXECUTE stmt;
   SELECT @succ;
   SET success = @succ;
END;
$$
DELIMITER ;

(Note: this code fragment is an example comes from the reference below.)

This stored procedure has no benefits over the “wrong answer” we saw previously with the exception of very modest encapsulation..

Sidenote: This is an example of an oracle. It returns the minimum amount of information about user authentication – a “thumbs up” or “thumbs down”. There’s no information leak in this implementation since the caller already knows the username and password but a more robust implementation could also verify that the user account has not been disabled, etc.

Stored Procedures and Parameterization

The first safe approach is executing the SQL directly instead of creating dynamic SQL.

The second safe approach is parameterization within the stored procedure. This is directly equivalent to Java prepared statements and placeholders.

DELIMITER $$
DROP PROCEDURE IF EXISTS SP_AUTHENTICATE$$
CREATE PROCEDURE SP_AUTHENTICATE(IN username VARCHAR(20),
                                 IN password VARCHAR(20),
                                 OUT success INT)
BEGIN
  SET @query = 'SELECT COUNT(credentials.username) INTO @succ
     FROM credentials
     WHERE credentials.username = ? AND credentials.password = ?';
   PREPARE stmt FROM @query;
   SET @usernm = username;
   SET @pass = password;
   EXECUTE stmt USING @usernm, @pass;
   SELECT @succ;
   SET success = @succ;
END;
$$
DELIMITER ;

PLPSQL Sanitization

There is another alternative if you’re willing to be tied to a specific database vendor. In practice this usually isn’t an issue – hibernate gives you some database transparency but stored procedures will always be tied closely to the database.

In plpsql (PostgreSQL) there are two commands that can be used for sanitizing input: quote_ident and quote_literal. There are undoubtably similar commands in other stored procedure languages.

Updating the wrong answer above we have:

DELIMITER $$
DROP PROCEDURE IF EXISTS SP_AUTHENTICATE$$
CREATE PROCEDURE SP_AUTHENTICATE(IN username VARCHAR(20),
                                 IN password VARCHAR(20),
                                 OUT success INT)
BEGIN
  SET @query = CONCAT('SELECT COUNT(credentials.username) INTO @succ
     FROM credentials
     WHERE credentials.username = ', quote_literal(username),
        'AND credentials.password = ', quote_literal(password));
   PREPARE stmt FROM @query;
   EXECUTE stmt;
   SELECT @succ;
   SET success = @succ;
END;
$$
DELIMITER ;

Direct SQL

The final safe approach is to use direct SQL calls with minimum parameter size. This is mentioned on the CERT website but I would hesitate to use it since it would be so easy to introduce unsafe code by accident.

DELIMITER $$
DROP PROCEDURE IF EXISTS SP_AUTHENTICATE$$
CREATE PROCEDURE SP_AUTHENTICATE(IN username VARCHAR(8),
                                 IN password VARCHAR(20),
                                 OUT success INT)
BEGIN
  SELECT COUNT(credentials.username) INTO success
     FROM credentials
     WHERE credentials.username = username AND credentials.password = password;
END;
$$
DELIMITER ;

Cost/Benefit Analysis

Stored procedures are harder to exploit than naked SQL queries but this often gives people a false sense of security. This should be considered mandatory for sensitive information (user authentication, audit logging) and highly recommended in all other cases.

Schema Ownership

What are DDL, DML, DCL and TCL?

SQL contains four distinct types of statements.

Data Definition Language

Data Definition Language (DDL) statements define the database structure. Think of this as the landlord that builds the warehouse but turns over the keys to the renter.

Statements:

create – create tables, views, indexes, etc.
alter – alter tables, views, indexes, columns, etc.
drop – delete tables, views, indexes, etc.
truncate – remove all of the records from a table
comment – add comments to tables, columns, views, etc.
rename – rename a table, view, etc.

Data Manipulation Language

Data Manipulation Language (DML) statements manage the data within the structure created by the DDL. Think of this as the tenant of the warehouse – it can use the warehouse but can’t knock down walls.

Statements:

select – retrieve data
insert – insert new data into a table
update – update existing data within a table
delete – delete data from a table
call – call a PL/SQL or other stored procedure
explain plan – explain how a querywill be executed
lock table – lock a table to limit concurrency

Data Control Language

Data Control Language (CDL) statements control access rights to the data and schema. Think of these as locks on the doors, permission to move walls within the warehouse, etc.

Statements:

grant – give the user additional privileges
revoke – remove user privileges

Transaction Control Language

Transaction Control Language (TCL) statement are used to control transactions.

Statements:

commit – save completed work
rollback – undo completed work
savepoint – mark a point that we can rollback to later without necessarily rolling back the entire transaction
set transaction – set transaction options

Use Different Database User for Schema And Data Ownership

The schema should be owned by one database user, e.g., app_owner and the data should be owned by a different database user, e.g., app_user.

The owner should:

have the ability to run DDL and DCL statements
arguably not have the ability to run DML statements
never be accessed via the webapp

The user should

have the ability to run DML and TCL statements
not have the ability to run DDL or DCL statements
be accessible via the webapp

Cost/Benefit Analysis

There is a very favorable cost/benefit ratio for separating the ownership of schema and database. There is a slightly higher cost when creating and maintaining the database but it essentially eliminates the ability of a web intruder to destroy the database schema itself. The data, on the other hand, can still be nuked.

User Authentication and Authorization Information

User authentication (authn) is how we know that a user is who he claims to be. At a minimum it’s a username and password but it could include much more if two-factor authentication is used.

User authorization (authz) is what we allow the user to do.

These are very different questions and should be treated as such. Some architectures for this, e.g., if a site uses siteminder or a similar tool then it doesn’t have access to authn information at all – it can only add authz.

What is user authn/authz information? It is

username and/or email
password
single sign-on (SSO) identifications
security tokens (for two-factor authentication)
security images/phrases (used to prove your site is legitimate to the user)
groups and roles

What is not user authn/authz information?

contact information
content subscriptions
or anything else that’s not required to authenticate or authorize the user.

Wrong Approach

Put everything – user authn/authz, static content and dynamic content – into a single database schema.

It’s quick.

It’s easy.

It’s the default behavior for auto-generation tools.

And it’s very, very wrong since anyone who cracks your webapp has also cracked your user authn/authz data. At best you’ll have a denial-of-service. At worst they can pretend to be other users, can add their own highly-priviledged account, etc.

Separate Schemas and Connection Pools

The quickest solution is to create a separate schema for the user authn/authz data and use a dedicated data source (or Hibernate session) when accessing this data. This schema should be unreadable from the standard data source (or Hibernate session). This gives you a good firewall from the world but isn’t perfect.

A seemingly more robust solution is to use a separate database, not just a separate schema, for the user authn/auhtz data. This would seem to protect you from misconfigured rights that would allow the dynamic content data source to access the user data source.

Sadly in some RDMBS there’s not a clear distinction between schemas and databases and a connection to one “database” can still access another “database” if the necessary rights are granted. You can’t be sure unless you have a separate database instance for user authn/authz and dynamic content. This may not be an undue burden if your architecture has a server dedicated for user authn/authz. This is not unreasonable with virtual servers or a cloud design.

Container-Based Authentication

A better solution is container-based authentication. Pull user authn/authz entirely out of the webapp – by the time your webapp gets the request the HttpServletRequest already has all necessary information populated. Your webapp has no access to the container’s authentication information. (Modulo the notes above – you don’t gain anything if the container looks into the same schema as your dynamic content.)

A variant of this is authentication filters put in front of the webapp, e.g., those from Spring Security. It’s a different mechanism but serves the same purpose of keeping a very sharp distinction between user data and dynamic content.

The Glitch – Adding and Updating Users

There’s one big glitch here – how do you add or update user information if your webapp can’t access the user authn/authz tables?

The first approach is to create a separate webapp that handles this. Your main webapp can transparently redirect to the second webapp as necessary. The upside is that you can have a consistent look and feel, the downside is that you’re exposing user authn/authz information to the weeb again.

The second approach is to create a separate REST service that handles this. Your webapp can provide the user interface but call the REST service instead of the standard business layer. The REST service can be within your firewall.

The third approach is to defer this entirely to the container. This ensures maximum separation but makes it difficult to have a consistent look and feel.

Reference: Database And Webapp Security, Part 1: Threat Model, Database And Webapp Security, Part 2: SQL Injection, Database And Webapp Security, Part 3: SQL Injection In Stored Procedures, Database And Webapp Security, Part 4: Schema Ownership, Database And Webapp Security, Part 5: User Authentication from our JCG partner Bear Giles at the Invariant Properties blog.