Shapefile performance/memory usage regression: the DBF file is fully read in memory when reading features

Description

https://osgeo-org.atlassian.net/browse/GEOT-7127 introduced a change in ShapefileSetManager that makes it read the entire DBF file into memory as a byte array, only to then check the contents of the DBF header.

Since a DBF file can be as large as 4GB, and this operation is done every time we need to read features, it’s a serious memory usage regression, as well as a performance regression.

The code should probably read only the header. Or, maybe even better, it could skip the preliminary check altogether (since it’s done on every read) and instead be tolerant of broken DBF files during the normal read, while the DBF is already open.
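For illustration, a header-only check could look roughly like the sketch below. This is not the GeoTools code: `DbfHeaderCheck` and `looksValid` are hypothetical names, and the consistency rule (file size at least header length plus record count times record length) is only an assumption about what such a check might verify. It relies on the standard dBASE layout, where the fixed header is 32 bytes with the record count at offset 4, the header length at offset 8 and the record length at offset 10, all little-endian.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration only, not part of GeoTools. */
public final class DbfHeaderCheck {

    /** Size in bytes of the fixed part of a dBASE header. */
    static final int FIXED_HEADER_SIZE = 32;

    private DbfHeaderCheck() {}

    /**
     * Reads only the fixed 32-byte header and checks that the sizes it
     * declares are consistent with the actual file size.
     */
    public static boolean looksValid(Path dbf) throws IOException {
        try (FileChannel channel = FileChannel.open(dbf, StandardOpenOption.READ)) {
            ByteBuffer header =
                    ByteBuffer.allocate(FIXED_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
            // Read at most 32 bytes, no matter how large the file is.
            while (header.hasRemaining()) {
                if (channel.read(header) < 0) {
                    return false; // shorter than a DBF header
                }
            }
            long recordCount = header.getInt(4) & 0xFFFFFFFFL; // unsigned 32-bit
            int headerLength = header.getShort(8) & 0xFFFF;    // unsigned 16-bit
            int recordLength = header.getShort(10) & 0xFFFF;   // unsigned 16-bit

            if (headerLength < FIXED_HEADER_SIZE) {
                return false; // declared header is smaller than its fixed part
            }
            // Assumed rule: the file must be able to hold what the header declares.
            long expectedSize = headerLength + recordCount * recordLength;
            return channel.size() >= expectedSize;
        }
    }
}
```

A check along these lines reads 32 bytes regardless of the DBF size, so its cost no longer grows with the file.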

Thoughts?

Environment

None

Activity

Ian Turton October 25, 2024 at 3:59 PM

Ian Turton October 23, 2024 at 9:07 AM

Yes, that’s definitely an unintended outcome - let me look at it.

Fixed

Details

Created October 23, 2024 at 8:53 AM
Updated October 27, 2024 at 9:50 AM
Resolved October 25, 2024 at 4:00 PM
