Quartz scheduler misfire instructions explained

Tomasz Nurkiewicz, April 12th, 2012 (last updated October 21st, 2012)

Sometimes Quartz is not able to run your job at the time you wanted. There are three reasons for that:

all worker threads were busy running other jobs (probably with higher priority)
the scheduler itself was down
the job was scheduled with a start time in the past (probably a coding error)

You can increase the number of worker threads by simply customizing org.quartz.threadPool.threadCount in quartz.properties (the default is 10). But you cannot really do anything when the whole application/server/scheduler is down. A situation in which Quartz is unable to fire a given trigger is called a misfire. Do you know what Quartz does when it happens? It turns out there are various strategies (called misfire instructions) Quartz can take, and there are also defaults for when you haven’t thought about it. But in order to make your application robust and predictable (especially under heavy load or during maintenance) you should make sure your triggers and jobs are configured consciously.

There are different configuration options (available misfire instructions) depending on the trigger chosen. Quartz also behaves differently depending on the trigger setup (the so-called smart policy). Although the misfire instructions are described in the documentation, I found it hard to understand what they really mean, so I created this short summary article.

Before I dive into the details, there is one more configuration option that should be described: org.quartz.jobStore.misfireThreshold (in milliseconds), defaulting to 60000 (a minute). It defines how late a trigger must be before it is considered misfired. With the default setup, if a trigger was supposed to fire 30 seconds ago, Quartz will happily just run it; such a delay is not considered a misfire. However, if the trigger is discovered 61 seconds after the scheduled time, the special misfire handler thread takes care of it, obeying the misfire instruction. For test purposes we will set this parameter to 1000 (1 second) so that we can test misfiring quickly.
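The threshold rule can be sketched in a few lines of plain Scala. This is a model for illustration, not Quartz code: `MisfireCheck`, `isMisfired` and `testProperties` are hypothetical names, and treating a delay of exactly `misfireThreshold` as a misfire is an assumption about the boundary. The `Properties` object uses the two real property keys discussed above and could be passed to Quartz's `StdSchedulerFactory(Properties)` constructor.

```scala
import java.util.Properties

// Hypothetical helpers illustrating org.quartz.jobStore.misfireThreshold; not Quartz code
object MisfireCheck {
  // Quartz default for org.quartz.jobStore.misfireThreshold: one minute
  val DefaultThresholdMs: Long = 60000L

  // A trigger counts as misfired when it is found at least `thresholdMs`
  // after its scheduled fire time (the >= boundary is an assumption)
  def isMisfired(scheduledMs: Long, discoveredMs: Long,
                 thresholdMs: Long = DefaultThresholdMs): Boolean =
    discoveredMs - scheduledMs >= thresholdMs

  // Settings used for the experiments in this article: a 1-second
  // threshold so misfires show up quickly, and an explicit thread count
  def testProperties(): Properties = {
    val props = new Properties()
    props.setProperty("org.quartz.threadPool.threadCount", "10")
    props.setProperty("org.quartz.jobStore.misfireThreshold", "1000")
    props
  }
}
```

With the default one-minute threshold, `isMisfired(0L, 30000L)` is false (the job simply runs late), while `isMisfired(0L, 61000L)` is true (the misfire instruction applies).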

Simple trigger without repeating

In our first example we will see how misfiring is handled by simple triggers scheduled to run only once:

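import java.util.Date
import org.apache.commons.lang3.time.DateUtils  // Apache Commons Lang 3 assumed
import org.quartz.TriggerBuilder.newTrigger
val trigger = newTrigger().
        startAt(DateUtils.addSeconds(new Date(), -10)).  // already 10 s late
        build()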

The same trigger, but with an explicitly set misfire instruction:

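import java.util.Date
import org.apache.commons.lang3.time.DateUtils  // Apache Commons Lang 3 assumed
import org.quartz.SimpleScheduleBuilder.simpleSchedule
import org.quartz.TriggerBuilder.newTrigger
val trigger = newTrigger().
        startAt(DateUtils.addSeconds(new Date(), -10)).  // already 10 s late
        withSchedule(
            simpleSchedule().
                withMisfireHandlingInstructionFireNow()  //MISFIRE_INSTRUCTION_FIRE_NOW
            ).
        build()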

For the purpose of testing I am simply scheduling the trigger to run 10 seconds ago (so it is already 10 seconds late by the time it is created!). In the real world you would normally never schedule triggers like that. Instead, imagine the trigger was set correctly, but by the time it was supposed to fire the scheduler was down or didn’t have any free worker threads. Nevertheless, how will Quartz handle this extraordinary situation? In the first code snippet above no misfire instruction is set (the so-called smart policy is used in that case). The second code snippet explicitly defines what behaviour we expect when a misfire occurs. See the table:
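When no instruction is set, the smart policy of a simple trigger resolves to a concrete instruction based on the repeat count, as described in the Quartz 2.x `SimpleTrigger` javadoc. The sketch below encodes that resolution in plain Scala; the `SimpleMisfire` object and its case objects are hypothetical stand-ins for the `MISFIRE_INSTRUCTION_*` constants, so verify the mapping against the Quartz version you use.

```scala
// Hypothetical model of SimpleTrigger smart-policy resolution; not Quartz code
object SimpleMisfire {
  // Stand-ins for SimpleTrigger's MISFIRE_INSTRUCTION_* constants
  sealed trait Instruction
  case object SmartPolicy extends Instruction
  case object IgnoreMisfirePolicy extends Instruction
  case object FireNow extends Instruction
  case object RescheduleNowWithExistingRepeatCount extends Instruction
  case object RescheduleNowWithRemainingRepeatCount extends Instruction
  case object RescheduleNextWithRemainingCount extends Instruction
  case object RescheduleNextWithExistingCount extends Instruction

  // Same value as SimpleTrigger.REPEAT_INDEFINITELY
  val RepeatIndefinitely: Int = -1

  // The instruction that is effectively applied for a given repeat count
  def effective(instruction: Instruction, repeatCount: Int): Instruction =
    instruction match {
      case SmartPolicy if repeatCount == 0                  => FireNow
      case SmartPolicy if repeatCount == RepeatIndefinitely => RescheduleNextWithRemainingCount
      case SmartPolicy                                      => RescheduleNowWithExistingRepeatCount
      // FIRE_NOW on a repeating trigger acts like "now with remaining count"
      case FireNow if repeatCount != 0                      => RescheduleNowWithRemainingRepeatCount
      case other                                            => other
    }
}
```

For the one-shot trigger in this section, `effective(SmartPolicy, 0)` gives `FireNow`, so both snippets should behave the same: the job runs as soon as the misfire is detected.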

Simple trigger repeating fixed number of times

This scenario is much more complicated. Imagine we have scheduled a job to repeat a fixed number of times:

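import org.quartz.DateBuilder.dateOf
import org.quartz.SimpleScheduleBuilder.simpleSchedule
import org.quartz.TriggerBuilder.newTrigger

val trigger = newTrigger().
    startAt(dateOf(9, 0, 0)).      // 9:00 AM today
    withSchedule(
        simpleSchedule().
            withRepeatCount(7).    // first run + 7 repetitions = 8 executions
            withIntervalInHours(1).
            withMisfireHandlingInstructionFireNow()  //or other
    ).
    build()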

In this example the trigger is supposed to fire 8 times (the first execution + 7 repetitions) every hour, beginning at 9 AM today (startAt(dateOf(9, 0, 0))). Thus the last execution should occur at 4 PM. However, assume that for some reason the scheduler was not able to run jobs at 9 and 10 AM and discovered that fact at 10:15 AM, i.e. 2 firings misfired. How will the scheduler behave in this situation?
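One way to reason about the answer is to replay the scenario. The sketch below is a simplified model of four simple-trigger instructions for exactly this case (9:00 start, hourly, 8 executions, misfire noticed at 10:15), with times as minutes since midnight. `FixedCountScenario` and its policy names are hypothetical; the behaviour encoded here is an approximation of Quartz 2.x `SimpleTrigger` semantics that ignores calendars, end times and thread availability, so treat it as a sketch rather than a specification.

```scala
// Hypothetical model of the fixed-count scenario; not Quartz code.
// Times are minutes since midnight, e.g. 9:00 AM = 540.
object FixedCountScenario {
  sealed trait Policy
  case object IgnoreMisfires extends Policy         // ~ withMisfireHandlingInstructionIgnoreMisfires()
  case object NowWithExistingCount extends Policy   // ~ withMisfireHandlingInstructionNowWithExistingCount()
  case object NowWithRemainingCount extends Policy  // ~ withMisfireHandlingInstructionNowWithRemainingCount()
  case object NextWithRemainingCount extends Policy // ~ withMisfireHandlingInstructionNextWithRemainingCount()

  val Start = 9 * 60             // first firing at 9:00 AM
  val Interval = 60              // every hour
  val RepeatCount = 7            // first run + 7 repetitions = 8 executions
  val Discovered = 10 * 60 + 15  // misfire detected at 10:15 AM

  // Original plan: 9:00, 10:00, ..., 16:00
  val planned: Seq[Int] = (0 to RepeatCount).map(i => Start + i * Interval)

  def fireTimes(policy: Policy): Seq[Int] = {
    val (missed, upcoming) = planned.partition(_ < Discovered)
    policy match {
      // every missed firing runs right away, then the original plan resumes
      case IgnoreMisfires => missed.map(_ => Discovered) ++ upcoming
      // restart the series now, keeping all planned executions
      case NowWithExistingCount =>
        planned.indices.map(i => Discovered + i * Interval)
      // the misfires collapse into one immediate run; the series restarts now
      case NowWithRemainingCount =>
        val runs = planned.size - missed.size + 1
        (0 until runs).map(i => Discovered + i * Interval)
      // skip the misfires and wait for the next planned slot
      case NextWithRemainingCount => upcoming
    }
  }
}
```

Under this model, "now with existing count" keeps all 8 executions and pushes the last one to 5:15 PM, "now with remaining count" merges the two misfires into a single run at 10:15 and ends at 4:15 PM, while "next with remaining count" simply resumes at 11 AM and ends at 4 PM.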

Simple trigger repeating infinitely

In this scenario the trigger repeats indefinitely at a given interval:

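import org.quartz.DateBuilder.dateOf
import org.quartz.SimpleScheduleBuilder.simpleSchedule
import org.quartz.SimpleTrigger
import org.quartz.TriggerBuilder.newTrigger

val trigger = newTrigger().
    startAt(dateOf(9, 0, 0)).      // 9:00 AM today
    withSchedule(
        simpleSchedule().
            withRepeatCount(SimpleTrigger.REPEAT_INDEFINITELY).
            withIntervalInHours(1).
            withMisfireHandlingInstructionFireNow()  //or other
    ).
    build()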

Once again the trigger should fire every hour, beginning at 9 AM today (startAt(dateOf(9, 0, 0))). However, the scheduler was not able to run jobs at 9 and 10 AM and discovered that fact at 10:15 AM, i.e. 2 firings misfired. This is a more general case than a simple trigger running a fixed number of times.
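For an infinitely repeating trigger there is no count to preserve, so the interesting question is which time grid the remaining firings land on. The sketch below contrasts the two families in plain Scala; `InfiniteScenario` and its helpers are hypothetical, and `nextSlotAfter` assumes the "next" instructions pick the first slot of the original grid strictly after the current time.

```scala
// Hypothetical model of the infinite-repeat scenario; not Quartz code.
// Times are minutes since midnight.
object InfiniteScenario {
  val Start = 9 * 60             // first firing at 9:00 AM
  val Interval = 60              // every hour, forever
  val Discovered = 10 * 60 + 15  // misfire detected at 10:15 AM

  // First slot of the original grid strictly after `now` (assumes now >= Start)
  def nextSlotAfter(now: Int): Int =
    Start + ((now - Start) / Interval + 1) * Interval

  // "next*" instructions: drop the misfires, stay on the 9:00/10:00/... grid
  def nextStyle(count: Int): Seq[Int] =
    (0 until count).map(i => nextSlotAfter(Discovered) + i * Interval)

  // "now*" instructions: fire at once and restart the grid from that moment
  def nowStyle(count: Int): Seq[Int] =
    (0 until count).map(i => Discovered + i * Interval)
}
```

`nextStyle(3)` yields 11:00, 12:00 and 13:00 (660, 720, 780), which matches what the smart policy picks for an infinite simple trigger, whereas `nowStyle(3)` yields 10:15, 11:15 and 12:15, shifting every later firing by 15 minutes.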

CRON triggers

CRON triggers are the most popular ones amongst Quartz users. However, there are also two other triggers available: `DailyTimeIntervalTrigger` (e.g. fire every 25 minutes) and `CalendarIntervalTrigger` (e.g. fire every 5 months). They support triggering policies that are possible with neither CRON nor simple triggers. Still, they understand the same misfire instructions as CRON triggers.

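import org.quartz.CronScheduleBuilder.cronSchedule
import org.quartz.TriggerBuilder.newTrigger

val trigger = newTrigger().
    withSchedule(
        cronSchedule("0 0 9-17 ? * MON-FRI").  // every full hour, 9 AM to 5 PM, weekdays
            withMisfireHandlingInstructionFireAndProceed()  //or other
    ).
    build()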

In this example the trigger should fire every hour between 9 AM and 5 PM, from Monday to Friday. But once again the first two invocations were missed (so the trigger misfired) and this situation was discovered at 10:15 AM. Note that the available misfire instructions differ from those of simple triggers:

Note: QTZ-283: MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY not working with JDBCJobStore. Apparently there is a bug when JDBCJobStore is used; keep an eye on that issue.
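The same kind of model works for the CRON example. CRON triggers offer three options: ignore misfires (`withMisfireHandlingInstructionIgnoreMisfires()`), fire once now (`withMisfireHandlingInstructionFireAndProceed()`, which per the Quartz `CronTrigger` javadoc is also what the smart policy resolves to) and do nothing (`withMisfireHandlingInstructionDoNothing()`). `CronScenario` is a hypothetical name, and the model covers a single weekday while ignoring calendars and time zones.

```scala
// Hypothetical model of the CRON scenario; not Quartz code.
// Times are minutes since midnight.
object CronScenario {
  sealed trait Instruction
  case object SmartPolicy extends Instruction
  case object IgnoreMisfirePolicy extends Instruction
  case object FireOnceNow extends Instruction  // ~ withMisfireHandlingInstructionFireAndProceed()
  case object DoNothing extends Instruction    // ~ withMisfireHandlingInstructionDoNothing()

  // "0 0 9-17 ? * MON-FRI" on one weekday: 9:00, 10:00, ..., 17:00
  val schedule: Seq[Int] = (9 to 17).map(_ * 60)
  val Discovered = 10 * 60 + 15  // misfire detected at 10:15 AM

  def fireTimes(instruction: Instruction): Seq[Int] = {
    val (missed, upcoming) = schedule.partition(_ < Discovered)
    instruction match {
      // every missed firing runs right away, then the normal schedule continues
      case IgnoreMisfirePolicy => missed.map(_ => Discovered) ++ upcoming
      // smart policy for CRON triggers = fire once now, then proceed
      case SmartPolicy | FireOnceNow => Discovered +: upcoming
      // misfires are dropped; wait for the next scheduled hour
      case DoNothing => upcoming
    }
  }
}
```

Here ignoring misfires yields all 9 executions (two of them back to back at 10:15), fire-and-proceed yields 8, and do-nothing yields 7, starting at 11 AM.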

As you can see, various triggers behave differently based on the actual setup. Moreover, even though the so-called smart policy is provided, the right choice often depends on business requirements. Essentially there are three major strategies: ignore; run immediately and continue; discard and wait for the next firing. They all have different use cases:

Use ignore policies when you want to make sure all scheduled executions are triggered, even if it means that multiple misfired triggers will fire. Think about a job that generates a report every hour based on orders placed during the last hour. If the server was down for 8 hours, you still want those reports generated as soon as possible. In this case the ignore policies will simply run all triggers scheduled during those 8 hours as fast as the scheduler can. They will be several hours late, but they will eventually be executed.

Use now* policies when jobs execute periodically and, upon a misfire, should run as soon as possible, but only once. Think of a job that cleans the /tmp directory every minute. If the scheduler was busy for 20 minutes and can finally run this job, you don’t want to run it 20 times! Once is enough, but make sure it runs as soon as it can. Then it goes back to its normal one-minute interval.

Finally, next* policies are good when you want to make sure your job runs at particular points in time. For example, you need to fetch stock prices at a quarter past every hour. They change rapidly, so if your job misfired and it is already 20 minutes past the full hour, don’t bother. You missed the correct time by 5 minutes and now you don’t really care; it is better to have a gap than an inaccurate value. In this case Quartz will skip all misfired executions and simply wait for the next one.
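The three strategies boil down to how many catch-up runs happen right after the scheduler recovers. This toy Scala model (hypothetical names, not a Quartz API) maps each strategy family to that number; the three use cases above serve as checks.

```scala
// Hypothetical model of the three strategy families; not a Quartz API
object MisfireStrategies {
  sealed trait Strategy
  case object Ignore extends Strategy          // ignore policies: run every missed firing
  case object NowAndContinue extends Strategy  // now* policies: run once, then resume
  case object DiscardAndWait extends Strategy  // next* policies: skip, wait for next slot

  // Number of executions triggered right away after `missed` firings were lost
  def catchUpRuns(strategy: Strategy, missed: Int): Int = strategy match {
    case Ignore         => missed
    case NowAndContinue => if (missed > 0) 1 else 0
    case DiscardAndWait => 0
  }
}
```

For the hourly report after 8 hours of downtime, `catchUpRuns(Ignore, 8)` is 8; for the /tmp cleanup after 20 busy minutes, `catchUpRuns(NowAndContinue, 20)` is 1; for the stock price fetch, `catchUpRuns(DiscardAndWait, 1)` is 0.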

Reference: Quartz scheduler misfire instructions explained from our JCG partner Tomasz Nurkiewicz at the Java and neighbourhood blog.