History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: QUARTZ-420
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: James House
Reporter: Jasper Rosenberg
Votes: 1
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Quartz Scheduler

SmartPolicy Missfire Instruction in a SimpleTrigger can cause RuntimeException if after end time

Created: 26/Apr/06 07:56 AM   Updated: 24/Dec/07 01:30 AM
Component/s: Triggers
Affects Version/s: 1.6, 1.5.2
Fix Version/s: 1.6


 Description  « Hide
This bug was originally reported in the forum here: http://forums.opensymphony.com/thread.jspa?forumID=11&threadID=27984

The post includes a patch.

in case :
+ getRepeatCount() > 0 AND != REPEAT_INDEFINITELY,
+ when SimpleTrigger.endTime != NULL AND set BEFORE "now".

what happens :
- if RepeatCount() == 0 : the job is executed only one time.
- else if RepeaCount() == SimpleTrigger.REPEAT_INDEFINITELY : the job isn't executed.
- else (RepeatCount() > 0) : RuntimeException End time cannot be before start time
java.lang.IllegalArgumentException: End time cannot be before start time
at org.quartz.SimpleTrigger.setStartTime(SimpleTrigger.java:323)
at org.quartz.SimpleTrigger.updateAfterMisfire(SimpleTrigger.java:512)
for example :
- if SimpleTrigger.endTime == "yesterday,12:00" and if RepeatCount() == 5 :
or if the server crash before the endTime, when restarting
> it ends to this RuntimeException

re-producting :
- we may schedule a job whose trigger.endTime is before "now".
see following files :
~ TestToShowException class (to instance the scheduler ..)
~ HelloJob class (a simple job)
> when starting the scheduler, it will cause the same effect.

possible causes :
- In file "SimpleTrigger.java", updateAfterMisfire(),
case "MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_EXISTING_REPEAT_COUNT" :
function setStartTime(newFireTime) is called without verifying if (newFireTime < endTime).

 All   Comments   Change History      Sort Order:
Jasper Rosenberg - [26/Apr/06 01:21 PM ]
I didn't want to set the end time to null as suggested in the patch, because the end time has truely expired, so changing it to unbounded seems like it will not lead to the desired behavior.

Instead, I copied the same behavior as for MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT, where end time is set to 50 ms in the future, which allows the startTime to be moved, but should not allow the trigger to fire again, no matter what the remaining repeat count is.

While I was in there I also noticed and removed some redundant code in getFinalFireTime().

Guillaume Dufloux - [27/Apr/06 09:19 AM ]
I've seen your changes, and there's no more RuntimeException with it :

<-- present code --
if(getEndTime() != null && getEndTime().before(newFireTime)) {
                setEndTime(new Date(newFireTime.getTime() + 50));
            }
            setStartTime(newFireTime);
            setNextFireTime(newFireTime);
-->
 
However, I didn't want either to set the END time to null, but i suggested to set NEXT time to null, as follow :

<-- suggested code --
if(getEndTime() != null && getEndTime().before(newFireTime)) {
             setNextFireTime(null); // cause endTime has passed
            }
else {
                setStartTime(newFireTime); // reschedule "now"
                setNextFireTime(newFireTime);
            }
-->

I mean that we can't either set end time to 50 ms, cause if end time has been set (for some reason), then the job musn't be executed later.
Then, in case end time is passed, no matter job has been misfired or not, next fire time must be set to null (to stop triggering).
Futhermore, setting start time to "newFireTime" must be done only if end time isn't passed -> in the else{} part.
(there's no reason why to do it if triggering is stopped)
To sum up, we may consider that firing again is only expected in case end time has not passed.

----------------------------------------------------------------------------------------------

While I was in there (too;), i noticed some odd code in getFireTimeAfter() :
there's a major problem and some simplification to point out :

<-- present code --
public Date getFireTimeAfter(Date afterTime) {

if (complete)
                return null;
        if ((timesTriggered > repeatCount) && (repeatCount != REPEAT_INDEFINITELY))
                return null;
        if (afterTime == null)
                afterTime = new Date();
        if (repeatCount == 0 && afterTime.compareTo(getStartTime()) >= 0)
                return null;

long startMillis = getStartTime().getTime();
        long afterMillis = afterTime.getTime();
        long endMillis = (getEndTime() == null) ? Long.MAX_VALUE : getEndTime().getTime();
        if (endMillis <= afterMillis)
                return null;
        if (startMillis < afterMillis && repeatCount == 0)
                return null;
        if (afterMillis < startMillis)
                return new Date(startMillis);

long numberoftimesexecutedplusone = ((afterMillis - startMillis) / repeatInterval) + 1;
        if ((numberoftimesexecutedplusone > repeatCount) && (repeatCount != REPEAT_INDEFINITELY))
                return null;
        Date time = new Date((numberoftimesexecutedplusone * repeatInterval) + startMillis);
        if (endMillis <= time.getTime())
                return null;

return time;
    }
-->

1/ First, look at : numberoftimesexecutedplusone = ((afterMillis - startMillis) / repeatInterval) + 1
And then : time = new Date((numberoftimesexecutedplusone * repeatInterval) + startMillis)
In fact : time = new Date (afterMillis + repeatInterval)

2/ Secondly, i think ((afterMillis - startMillis) / repeatInterval)+1 is more "the number of times executed" than "the number of times executed plus one" :
see this example :
     afterMillis = 3.4
     startMillis = 0
     repeatInterval = 1
     then ((3.4 - 0)/1)+1 = 4.4
     However, was executed at t= 0, 1, 2 and 3 : so, was executed 4 times (one first, and 3 repeats)
     supposing repeatCount = 4, we shouldn't have to stop (cause should be executed 1+4 repeat times)
     but ((3.4 - 0)/1)+1 = 4.4 > 4 :
So with present code, it return null whereas it should be triggered one more time after afterTime

Explanations : the integer part of ((afterMillis - startMillis) / repeatInterval)+1 is the number of times executed (exactly).
     (4, in the example)
Assuming (repeatCount + 1) is the max number of times to execute (repeatCount + first triggering) we want :
     number of times executed <= (repeatCount + 1)
So, we want :
     integer part of ( ((afterMillis - startMillis) / repeatInterval)+1 ) <= (repeatCount + 1)
So we must return null in case :
     ( number of times executed > (repeatCount + 1) )
So we must return null in case :
     integer part of ( ((afterMillis - startMillis) / repeatInterval)+1 ) > (repeatCount+1)
Then, we must return null since :
     ( ((afterMillis - startMillis) / repeatInterval)+1 ) > (repeatCount+1)
                                                                            ^ ^
to compare with existing code :
     ( ((afterMillis - startMillis) / repeatInterval)+1 ) > (repeatCount )
                                                                            ^ ^
To sum up : I think we have to replace this code : (that lead to an error, and may be confused)

<-- present code --
        long numberoftimesexecutedplusone = ((afterMillis - startMillis) / repeatInterval) + 1;
        if ((numberoftimesexecutedplusone > repeatCount) && (repeatCount != REPEAT_INDEFINITELY))
                return null;
        Date time = new Date((numberoftimesexecutedplusone * repeatInterval) + startMillis);
-->

with this one : (that seems me correct, and more understandable)

<-- suggested code --
        long numberoftimesexecuted = ((afterMillis - startMillis) / repeatInterval) + 1; // here just change the name
        if ((numberoftimesexecuted > repeatCount+1) && (repeatCount != REPEAT_INDEFINITELY)) // here correct the error (2/)
                return null;
        Date time = new Date(afterMillis + repeatInterval ); // here simplify calculus (1/)
-->

And to verify by testing : (with previous example)
     ((3.4 - 0)/1)+1 = 4.4 <4+1=5
So with suggested code, it will not return null (and it's ok cause it have to trigger again)

--
I think confusion has occured with "repeatCount" and "times to be executed", which differ by one time.

Regards,

Guillaume

Jasper Rosenberg - [27/Apr/06 01:36 PM ]
First, let me apologize for misreading your original patch. That will teach me to read it by hand rather than apply it and using a more friendly diff application :)

I do see your argument that for MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_EXISTING_REPEAT_COUNT and MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT the behavior when one is after the end time should be don't fire at all. However, I can see an equal argument that it is "RESCHEDULE_NOW" and so it might be expected to fire regardless of endTime. This interpretation is also somewhat consistent with the fact that these misfire instructions also ignore Calendars.

I didn't write SimpleTrigger originally, so when I went to fix the issue which was already handled in the MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT case, I decided to error in the direction of least regression impact and follow the same interpretation as the original author (fire now regardless of end time).

I hate to suggest adding more misfire instruction options as I think they are already a bit of source of confusion, but certainly one could say that both these options are legitimate and split the cases by adding two new instructions: MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_EXISTING_REPEAT_COUNT_OBEYING_END_TIME and MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT_OBEYING_END_TIME. (How is that for an ugly name! :)

I actually originally toyed with a third implementation option which was:


            if(getEndTime() == null || getEndTime().after(newFireTime)) {
setStartTime(newFireTime);
            }
            
setNextFireTime(newFireTime);

My thought was that we don't need to bother moving start time at all if we aren't going to fire ever again anyway (as you also mentioned). It also is conceptually more consistent I feel, as it makes it very explicit that the misfire is outside of the original window. Another reason I find this approach appealing is because it means that if this misfire rescheduling misfires itself, only the next fire time is moved again.

One of the reasons I decided against this was that if one doesn't move the end time, then a call to getFinalFireTime() will return the original final fire time, rather than the rescheduled fire time. Not sure if that is relevant though since the system never calls getFinalFireTime(), and there shouldn't be a window for it to be called by an application. (nor am I sure that wouldn't be the desired behavior in any case, but I am trying to be very careful to always maintain backwards compatibility to prior releases.)

No matter what approach we end up taking, it is clear that the behavior needs to be made more explicit in the javadoc for the misfire instructions.

So, having blathered on for a while here, a proposed set of changes might be:

1. Split the misfire instructions
2. Update the javadocs to make the distinction explicit
3. Modify the existing cases to use my alternative implementation above that just moves the next fire time and leaves the start/end times alone.

Thoughts?

Note: I haven't had a chance yet to review the second part of your comment yet regarding getFireTimeAfter(); I may end up splitting it into a subtask.

Jasper Rosenberg - [27/Apr/06 02:34 PM ]
Regarding getFireTimeAfter():

First off, the Millis are longs, so we can't have fractional values for them like "3.4".

Second, because it is long arithmatic, there is no decimal component to numberoftimesexecutedplusone (Java will trunc the value), so if I multiplied your values by 10 to mimic your example:

afterMillis = 34
     startMillis = 0
     repeatInterval = 10
     then ((34 - 0)/10) + 1 = 3 + 1 = 4

However, was executed at t= 0, 10, 20 and 30 : so, was executed 4 times (one first, and 3 repeats)
     supposing repeatCount = 4, we shouldn't have to stop (cause should be executed 1+4 repeat times)
     and ((34 - 0)/10) + 1 = 4 <= 4 so we will not stop.

So, luckily, it looks like the calculation itself is correct. However, I fully agree with you that the name "numberoftimesexecutedplusone" is innacurate. It should either be: numberOfTimesExecuted or numberOfTimesRepeatedPlusOne. I personally lean towards the former as you did.


Jasper Rosenberg - [28/Apr/06 05:44 AM ]
Posting on behalf of James:

My opinion is that "End time" should really be end time. If the repeat
count is what mattered, then the user should leave end time null. If they bother to set it, that means they want no firings after it.

james

-----------

That being the case, I will:

1. Make the change originally proposed by Guillaume
2. Update the javadocs for these two misfire instructions to make the behavior clear

Guillaume Dufloux - [28/Apr/06 07:04 AM ]
First, sorry for my long arithmatic and Java trunc behaviour knowledge (every day we learn about ;)
I'll try to work on smart policy in order to suggest policies behaviours based on these updated instructions.
I'll also commit back tests i will done on updated class (to check)

Regards,

guillaume

Jasper Rosenberg - [28/Apr/06 08:12 AM ]
That would be great, I'm always happy to add more unit tests to the code base. Thanks for all your help.

Jasper Rosenberg - [28/Apr/06 08:15 AM ]
BTW, I'm not clear on why you now want to change the smart policty behavior?