Performance improvement for reading large/complex shapefile polygons

Description

By spec, encoding of polygin rings in shapefiles do not follow an order. That is, it's not guaranteed that for a given outer ring, its inner rings (holes) follow. Rings can be encoded in any order.
This forces the reader implementation to parse all outer and inner rings to then find the parent outer ring for all holes, which is computationally intensive.
For example, for a test polygon with 25,568 shells, 37,667 holes, and 1,544,686 total number of coordinates, it takes only ~40 ms to read all coordinates, ~90 ms to create all the shells and holes LinearRings, and ~150 seconds to assign all holes to their corresponding shell.

Given the nature of the format, it's almost impossible to improve that by an order of magnitude, but a 2x/3x improvement is certainly possible.

Environment

None

Assignee

Gabriel Roldan

Reporter

Gabriel Roldan

Triage

None

Components

Priority

Medium
Configure