WPS execution time limits test randomly breaks builds
Description
Environment
Activity
I have @Ignored gs-wps-core InputLimitsTest.testAsyncExecutionLimit on
master 2.11.x, and 2.10.x. This should stop the intermittent build failures.
Kind regards,
Ben.
–
Ben Caradoc-Davies <ben@transient.nz>
Director
Transient Software Limited <http://transient.nz/>
New Zealand
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel
A review of the seven failed builds on ares Jenkins for which we still have logs (back to 13 April) indicates that InputLimits.testAsyncExecutionLimit
is the test we still see failing. All the other test*Limit
methods are @Ignored
, except testSyncTotalLimit
but failures of that method are not observed.
http://ares.boundlessgeo.com/jenkins/job/geoserver-master/4716/consoleText
http://ares.boundlessgeo.com/jenkins/job/geoserver-2.11.x/61/consoleText
http://ares.boundlessgeo.com/jenkins/job/geoserver-master/4700/consoleText
http://ares.boundlessgeo.com/jenkins/job/geoserver-2.11.x/48/consoleText
http://ares.boundlessgeo.com/jenkins/job/geoserver-master/4605/consoleText
http://ares.boundlessgeo.com/jenkins/job/geoserver-2.11.x/30/consoleText
http://ares.boundlessgeo.com/jenkins/job/geoserver-2.11.x/26/consoleText
I have been able to get testAsyncTotalLimit to fail intermittently locally just by continually re-running the test.
When the test fails I get the following output:
16 Nov 15:57:38 ERROR [wps.executor] - Process execution failed
org.geotools.process.ProcessException: org.geoserver.wps.ProcessDismissedException: The process executed got interrupted because it went beyond the configured limits of maxExecutionTime 2 seconds, maxTotalTime 3 seconds
at org.geotools.process.factory.AnnotationDrivenProcessFactory$InvokeMethodProcess.execute(AnnotationDrivenProcessFactory.java:565)
at org.geoserver.wps.executor.ProcessStartupFilter$ProcessStartupWrapper.execute(ProcessStartupFilter.java:51)
...
16 Nov 15:57:38 ERROR [geoserver.ows] -
org.geoserver.wps.WPSException: The execution is completed with status FAILED and yet the response cannot be located on disk, this is an internal failure
at org.geoserver.wps.GetStatus.run(GetStatus.java:50)
at org.geoserver.wps.DefaultWebProcessingService.getExecutionStatus(DefaultWebProcessingService.java:102)
at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:302)
...
junit.framework.AssertionFailedError: Expecting to find matches for Xpath //ows:ExceptionText[contains(., 'maxExecutionTime 2 seconds, maxTotalTime 3 seconds')]
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at org.custommonkey.xmlunit.XMLAssert.assertXpathExists(XMLAssert.java:901)
at org.geoserver.wps.InputLimitsTest.testAsyncTotalLimit(InputLimitsTest.java:622)
Adding some logging statements at the relevant parts of the code shows me the sequence of events when this occurs is as follows:
testAsyncTotalLimit -> waitForProcessEnd(statusLocationX3, 10);
waitForProcessEnd -> request status
GetStatus -> status = RUNNING
waitForProcessEnd -> status returned
...
org.geotools.process.ProcessException: org.geoserver.wps.ProcessDismissedException: The process executed got interrupted because it went beyond the configured limits of maxExecutionTime 2 seconds, maxTotalTime 3 seconds
WPSExecutionManager -> notifying failure
waitForProcessEnd -> request status
WPSExecutionManager -> failure notified
GetStatus -> status = FAILED
org.geoserver.wps.WPSException: The execution is completed with status FAILED and yet the response cannot be located on disk, this is an internal failure
WPSExecutionManager -> writing response to disk
WPSExecutionManager -> response written to disk
waitForProcessEnd -> status returned
junit.framework.AssertionFailedError: Expecting to find matches for Xpath //ows:ExceptionText[contains(., 'maxExecutionTime 2 seconds, maxTotalTime 3 seconds')]
Whereas when the test completes successfully the sequence of events is something like:
testAsyncTotalLimit -> waitForProcessEnd(statusLocationX3, 10);
waitForProcessEnd -> request status
GetStatus -> status = RUNNING
waitForProcessEnd -> status returned
...
org.geotools.process.ProcessException: org.geoserver.wps.ProcessDismissedException: The process executed got interrupted because it went beyond the configured limits of maxExecutionTime 2 seconds, maxTotalTime 3 seconds
WPSExecutionManager -> notifying failure
WPSExecutionManager -> failure notified
WPSExecutionManager -> writing response to disk
waitForProcessEnd -> request status
WPSExecutionManager -> response written to disk
GetStatus -> status = FAILED
GetStatus -> stored response returned
waitForProcessEnd -> status returned
So, it looks to me like the test is exposing a bug in the handling of process exceptions where it is possible for a status request to return an exception due to an attempt to read the response written to disk before the response has been written to disk.
The relevant section of code being:
and:
The status request which fails is running in between these two lines of code.
HTH.
Error:
testAsyncTotalLimit(org.geoserver.wps.InputLimitsTest) Time elapsed: 3488 sec <<< FAILURE! junit.framework.AssertionFailedError: Expecting to find matches for Xpath //ows:ExceptionText[contains(., 'maxExecutionTime 2 seconds, maxTotalTime 3 seconds')] at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.assertTrue(Assert.java:22) at org.custommonkey.xmlunit.XMLAssert.assertXpathExists(XMLAssert.java:901) at org.geoserver.wps.InputLimitsTest.testAsyncTotalLimit(InputLimitsTest.java:622) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Cannot reproduce locally. Will disable and leave this ticket open to analyze and hopefully fix for good during a bug fix sprint (already spent significant time trying to make it work when the test was contributed by a third party along the functionality the test covers, the rate of failure has risen since then though)