Database Usage Practices

George JaneveJuly 24th, 2012Last Updated: October 22nd, 2012

5 24 3 minutes read

After a long period of intense thinking, research and brainstorming sessions, I have finally made up my mind with few best practices that can be considered while using a database.

All database servers today has capabilities to optimize the queries sent to the server, ironically, they lack the capability to alter query in such a way that the throughput is maximized. It is up to the developer to analyze and optimize the queries to attain maximum throughput. In this section we will be looking at some useful tips to optimize the queries to produce the optimal output.

Prefer prepared statements over create statements. Preparing statements does a syntax checks only once during the query execution, while create statements compiles the queries every time they are executed. Considering today’s servers and application requirements this might not be a critical factor since there is only a benefit of few milliseconds or even lesser. Another more important benefit of using prepared statements is the use of bind variables. The quotes and other characters, used for SQL injection and corrupting database, are taken care by the driver itself, hence improving the security of the data and the database server.
Using tools like EXPLAIN, ANALYZE. These are the most important tool for DBAs and Developers to optimize their queries (in posgresql). These commands will give a real-time analysis of how their queries perform. The EXPLAIN commands shows the query execution plan and we can identify bottlenecks in the plan and take appropriate measures to eliminate these bottlenecks.
Take advantage of the indexes. Database uses the indexing capabilities wherever it can. Some databases even rearranges the ‘where clause’ so that it can return the output much faster. It is always best if you have the where clause on the indexes first followed by other clauses.
Maintaining cache for read-only queries and storing master tables. Master tables are tables that are very rarely modified. Every query on these tables will result in I/O which is the slowest process in any machine. It is a best practice to cache the master records in the application so that it is stored in the RAM so that the records can be fetched much faster. If the master table itself is huge, it is a concern since it might eat up your RAM and might result in creating swap files which are even worse. In such scenarios, better extract the master records out of the main application and deploy a separate caching service for retrieving those records.
Use VACUUM frequently on transaction tables. Frequently update/delete of records of a table might result in lots of dead tuples. When a record is deleted the memory allocated for that record is not actually freed but marked as dead. This dead space can be used to store another record that might fit into that space. Too many dead tuples could result in higher query execution time and indexing of the tables. The VACUUM command fragments these dead tuples. Another command, FULL_VACUUM, will reclaim the memory allocated for the dead tuples.
Prefer BETWEEN rather than <= and >=. Almost all major database drivers will optimize the queries with <= and >=. But by using BETWEEN clauses, we are explicitly giving an optimized query to the driver. Even though this does not result in a huge difference in the response time, it could be a good point to consider in complex queries.
Use LIMIT 1 when getting a unique row or if just one record is required for processing.
Use LIMIT, OFFSET clauses while querying huge databases. This will reduce the query time since now the database server doesn’t have to scan the entire table.
Always specify the required field names while executing SELECT/INSERT queries. This would prevent the application from breaking when the schema of the table changes.
In where clause place the predicate that will eliminate the greatest number of rows first.

Reference: Database Usage Practices from our JCG partner George Janeve at the Janeve.Me blog.

Tags

George JaneveJuly 24th, 2012Last Updated: October 22nd, 2012

5 24 3 minutes read

This site uses Akismet to reduce spam. Learn how your comment data is processed.

5 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Janeve George

11 years ago

This article contains some
techniques which I used to optimize DB queries. If you know of any other good
techniques that you might have used to optimize SQL queries, then I would love to know about it.

Ries van Twisk

Using FULL_VACUUM is not recommended. I am missing functional indexes here.

Reply to Ries van Twisk

@google-db6f7d83d079df3025f48d7fbfc0f767:disqus I would like to know why you say FULL_VACUUM is not recommended.

Reply to Janeve George

After few investigation I figured out that FULL_VACUUM is not recommended since it would take more time than the VACUUM command. I would still say a “If you run the FULL_VACUUM in a strategic manner, this would improve the performance of the database and all subsequent FULL_VACUUM would execute much faster.” I am not saying that you should run this every day. It might be nice to run the command once every month or even once every quarter would also be fine. Another option is to do it along with any other scheduled downtime. Once you do it systematically, I… Read more »

rvt

Janeve, read this : http://wiki.postgresql.org/wiki/VACUUM_FULL

Database Usage Practices

Thank you!

George Janeve

Thank you!

Thank you!

Related Articles

Thank you!