WPS execution time limits test randomly breaks builds

Description

Error:

testAsyncTotalLimit(org.geoserver.wps.InputLimitsTest) Time elapsed: 3488 sec <<< FAILURE! junit.framework.AssertionFailedError: Expecting to find matches for Xpath //ows:ExceptionText[contains(., 'maxExecutionTime 2 seconds, maxTotalTime 3 seconds')] at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.assertTrue(Assert.java:22) at org.custommonkey.xmlunit.XMLAssert.assertXpathExists(XMLAssert.java:901) at org.geoserver.wps.InputLimitsTest.testAsyncTotalLimit(InputLimitsTest.java:622) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Cannot reproduce locally. Will disable and leave this ticket open to analyze and hopefully fix for good during a bug fix sprint (already spent significant time trying to make it work when the test was contributed by a third party along the functionality the test covers, the rate of failure has risen since then though)

Environment

None

Activity

Ben Caradoc-Davies
May 12, 2017 at 12:02 AM

I have @Ignored gs-wps-core InputLimitsTest.testAsyncExecutionLimit on
master 2.11.x, and 2.10.x. This should stop the intermittent build failures.

Kind regards,
Ben.


Ben Caradoc-Davies <ben@transient.nz>
Director
Transient Software Limited <http://transient.nz/>
New Zealand

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Geoserver-devel mailing list
Geoserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-devel

Ben Caradoc-Davies
May 11, 2017 at 11:48 PM
(edited)
Craig Jones
November 16, 2016 at 5:23 AM

I have been able to get testAsyncTotalLimit to fail intermittently locally just by continually re-running the test.

When the test fails I get the following output:

16 Nov 15:57:38 ERROR [wps.executor] - Process execution failed org.geotools.process.ProcessException: org.geoserver.wps.ProcessDismissedException: The process executed got interrupted because it went beyond the configured limits of maxExecutionTime 2 seconds, maxTotalTime 3 seconds at org.geotools.process.factory.AnnotationDrivenProcessFactory$InvokeMethodProcess.execute(AnnotationDrivenProcessFactory.java:565) at org.geoserver.wps.executor.ProcessStartupFilter$ProcessStartupWrapper.execute(ProcessStartupFilter.java:51) ... 16 Nov 15:57:38 ERROR [geoserver.ows] - org.geoserver.wps.WPSException: The execution is completed with status FAILED and yet the response cannot be located on disk, this is an internal failure at org.geoserver.wps.GetStatus.run(GetStatus.java:50) at org.geoserver.wps.DefaultWebProcessingService.getExecutionStatus(DefaultWebProcessingService.java:102) at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:302) ... junit.framework.AssertionFailedError: Expecting to find matches for Xpath //ows:ExceptionText[contains(., 'maxExecutionTime 2 seconds, maxTotalTime 3 seconds')] at junit.framework.Assert.fail(Assert.java:57) at junit.framework.Assert.assertTrue(Assert.java:22) at org.custommonkey.xmlunit.XMLAssert.assertXpathExists(XMLAssert.java:901) at org.geoserver.wps.InputLimitsTest.testAsyncTotalLimit(InputLimitsTest.java:622)

Adding some logging statements at the relevant parts of the code shows me the sequence of events when this occurs is as follows:

testAsyncTotalLimit -> waitForProcessEnd(statusLocationX3, 10); waitForProcessEnd -> request status GetStatus -> status = RUNNING waitForProcessEnd -> status returned ... org.geotools.process.ProcessException: org.geoserver.wps.ProcessDismissedException: The process executed got interrupted because it went beyond the configured limits of maxExecutionTime 2 seconds, maxTotalTime 3 seconds WPSExecutionManager -> notifying failure waitForProcessEnd -> request status WPSExecutionManager -> failure notified GetStatus -> status = FAILED org.geoserver.wps.WPSException: The execution is completed with status FAILED and yet the response cannot be located on disk, this is an internal failure WPSExecutionManager -> writing response to disk WPSExecutionManager -> response written to disk waitForProcessEnd -> status returned junit.framework.AssertionFailedError: Expecting to find matches for Xpath //ows:ExceptionText[contains(., 'maxExecutionTime 2 seconds, maxTotalTime 3 seconds')]

Whereas when the test completes successfully the sequence of events is something like:

testAsyncTotalLimit -> waitForProcessEnd(statusLocationX3, 10); waitForProcessEnd -> request status GetStatus -> status = RUNNING waitForProcessEnd -> status returned ... org.geotools.process.ProcessException: org.geoserver.wps.ProcessDismissedException: The process executed got interrupted because it went beyond the configured limits of maxExecutionTime 2 seconds, maxTotalTime 3 seconds WPSExecutionManager -> notifying failure WPSExecutionManager -> failure notified WPSExecutionManager -> writing response to disk waitForProcessEnd -> request status WPSExecutionManager -> response written to disk GetStatus -> status = FAILED GetStatus -> stored response returned waitForProcessEnd -> status returned

So, it looks to me like the test is exposing a bug in the handling of process exceptions where it is possible for a status request to return an exception due to an attempt to read the response written to disk before the response has been written to disk.

The relevant section of code being:

https://github.com/geoserver/geoserver/blob/master/src/extension/wps/wps-core/src/main/java/org/geoserver/wps/executor/WPSExecutionManager.java#L426

and:

https://github.com/geoserver/geoserver/blob/master/src/extension/wps/wps-core/src/main/java/org/geoserver/wps/executor/WPSExecutionManager.java#L471

The status request which fails is running in between these two lines of code.

HTH.

Details

Assignee

Reporter

Triage

Priority

Created November 3, 2016 at 5:49 PM
Updated May 12, 2017 at 12:02 AM