Avoid unnecessary base64 encoding when fetching geometry from PostGIS

Description

When analyzing some GeoServer performance issues a couple of months ago, I noticed that GeoTools is requesting base64-encoded geometry from PostGIS:

https://github.com/geotools/geotools/blob/master/modules/plugin/jdbc/jdbc-postgis/src/main/java/org/geotools/data/postgis/PostGISDialect.java#L337

This creates a significant and unnecessary overhead over using `ST_AsBinary` directly, since base64 encoding takes a significant amount of time on Postgres and results in more data being transferred than simply calling `ST_AsBinary(geom)` and fetching the resulting `bytea` type using `ResultSet::getBytes`. From simple testing with psql, the base64 encoding may account for 30-50% of the query time for simple fetches.

I had hoped to prepare a pull request to show the issue, but I have spent several hours trying and failing to build GeoTools on multiple machines. I figure that raising the issue in JIRA is better than doing nothing.

Environment

None

Activity

Show:
Andrea Aime
January 30, 2019, 5:34 PM

I'm primarily worried about you not being able to build GeoTools here... it's built by different continuous integration server, AppVeyor for Windows, Travis and two separate Jenkins boxes for Linux, with both JDK 8 and JDK 11, and the build completes fine in all of them... How did it break for you?

About the base64 encoding, believe it or not, it's a performance optimization that has been developed and tested many years ago, because the PostgreSQL JDBC driver was doing only text transfer anyways, and was encoding binaries in a less efficient way than the base64 one. Nowadays it's possible to configure the JDBC driver to use binary transfer on a datatype by datatype basis, but I believe it's still explicity, and would have to be tested with various combinations of PostgreSQL/JDBC driver version, see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters

Dan Baston
January 30, 2019, 5:59 PM

Interesting - the docs allude to a list of types for which binary transfer is enabled by default, but I can't find the actual list anywhere. I would assume that `bytea` would be included in this set, but who knows?

With regard to the build failure, I've run into different issues on each system I've tried. Currently I'm getting the following (though I don't want to pollute the ticket):

Andrea Aime
January 30, 2019, 6:14 PM

About the build failure, you probably have a corrupted jar somewhere in the maven repository, I cannot explain the class loading failure otherwise (besides the build servers, we have several developers committing daily, so it seems something unique to your machines, don't know what though, don't recall seeing a similar issue).

About the binary transfer, it might as well enabled by default now, not sure... but it would need to be tested. It's important that we get you going with the build, because if you don't do the patch, it might take a long time before anyone else has a paid contract on it, or decides to crush a Sunday on the topic.

Andrea Aime
February 13, 2019, 5:17 PM
Edited

I did a bit of side research while working on something else, reading roads of all Italy from a OSM derived database.
Removing the encoding makes the transfer slower, from around 16s up to 19s.
Debugging a bit in the driver code I saw that it should be defaulting to binary transfer for execution, but in fact, it ends up doing the hex encoding which takes more space than the base64 one.
From this reference, it seems one would have to use prepared statements, and use them a bunch of times, before binary transfer actually kicks in: https://github.com/pgjdbc/pgjdbc/issues/642
However we don't use prepared statements by default as the filtering params (the bbox) change a lot and a fixed plan results in very poor performance...
Now, there is this prepareThreshold parameter that can be used to disable server side prepare, but from the above it seems a server side prepare is what's needed to enable the creation of the metadata needed for binary transfer so... duh!

A presentation mentioning this (still have to read it fully): https://pgday.ru/presentation/160/5964966b4ba05.pdf

Anyways, more research is needed on this one, does not look like a quick change.

Andrea Aime
August 11, 2019, 9:03 AM

Latest versions of postgresql have better handling of prepared statement plans, so the part about the “silly plan” should be handled (finger crossed). However, it still took significant work to make the binary transfer work as expected. The benefit is not massive, but measurable (10 to 20%), in some OSM rendering setups:

Fixed

Assignee

Unassigned

Reporter

Dan Baston

Triage

None

Components

Affects versions

Priority

Medium
Configure