After a long period of intense thinking, research and brainstorming sessions, I have finally made up my mind with few best practices that can be considered while using a database.
All database servers today has capabilities to optimize the queries sent to the server, ironically, they lack the capability to alter query in such a way that the throughput is maximized. It is up to the developer to analyze and optimize the queries to attain maximum throughput. In this section we will be looking at some useful tips to optimize the queries to produce the optimal output.
- Prefer prepared statements over create statements. Preparing statements does a syntax checks only once during the query execution, while create statements compiles the queries every time they are executed. Considering today’s servers and application requirements this might not be a critical factor since there is only a benefit of few milliseconds or even lesser. Another more important benefit of using prepared statements is the use of bind variables. The quotes and other characters, used for SQL injection and corrupting database, are taken care by the driver itself, hence improving the security of the data and the database server.
- Using tools like EXPLAIN, ANALYZE. These are the most important tool for DBAs and Developers to optimize their queries (in posgresql). These commands will give a real-time analysis of how their queries perform. The EXPLAIN commands shows the query execution plan and we can identify bottlenecks in the plan and take appropriate measures to eliminate these bottlenecks.
- Take advantage of the indexes. Database uses the indexing capabilities wherever it can. Some databases even rearranges the ‘where clause’ so that it can return the output much faster. It is always best if you have the where clause on the indexes first followed by other clauses.
- Maintaining cache for read-only queries and storing master tables. Master tables are tables that are very rarely modified. Every query on these tables will result in I/O which is the slowest process in any machine. It is a best practice to cache the master records in the application so that it is stored in the RAM so that the records can be fetched much faster. If the master table itself is huge, it is a concern since it might eat up your RAM and might result in creating swap files which are even worse. In such scenarios, better extract the master records out of the main application and deploy a separate caching service for retrieving those records.
- Use VACUUM frequently on transaction tables. Frequently update/delete of records of a table might result in lots of dead tuples. When a record is deleted the memory allocated for that record is not actually freed but marked as dead. This dead space can be used to store another record that might fit into that space. Too many dead tuples could result in higher query execution time and indexing of the tables. The VACUUM command fragments these dead tuples. Another command, FULL_VACUUM, will reclaim the memory allocated for the dead tuples.
- Prefer BETWEEN rather than <= and >=. Almost all major database drivers will optimize the queries with <= and >=. But by using BETWEEN clauses, we are explicitly giving an optimized query to the driver. Even though this does not result in a huge difference in the response time, it could be a good point to consider in complex queries.
- Use LIMIT 1 when getting a unique row or if just one record is required for processing.
- Use LIMIT, OFFSET clauses while querying huge databases. This will reduce the query time since now the database server doesn’t have to scan the entire table.
- Always specify the required field names while executing SELECT/INSERT queries. This would prevent the application from breaking when the schema of the table changes.
- In where clause place the predicate that will eliminate the greatest number of rows first.