Unsynchronized access to ContentEntry.state

Description

I was looking at a running GeoServer (2.0.1, which means GeoTools 2.6.1) that had lots of transactions running against it and it started throwing NPEs in org.geotools.data.store.ContentEntry:131. I looked at that line:

130 ContentState auto = state.get(Transaction.AUTO_COMMIT);
131 ContentState copy = (ContentState) auto.copy(); // NPE
132 copy.setTransaction(transaction != ? transaction : Transaction.AUTO_COMMIT);
133 state.put(transaction, copy);

So it implies that 'auto' is null. However, the constructor says:

87 ContentState autoState = dataStore.createContentState();
88 autoState.setTransaction(Transaction.AUTO_COMMIT);
89 .state.put(Transaction.AUTO_COMMIT, autoState);

this.state.put is only called in two places: line 133 and line 89. Line 89 happens first, associating Transaction.AUTO_COMMIT with a newly-created (and dereferenced, so non-null) ContentState. 133 associates a given transaction, transaction, with the ContentState associated with Transaction.AUTO_COMMIT, which we already must be non-null. So there is no chance of autoState ever being null, but it clearly was in the live system. I did a heap dump and ran {[jhat}} (I still have the heap dump, so I can answer any questions you may have about it, but I can't post it) and saw that there were instances of ContentEntry whose state map had no association for Transaction.AUTO_COMMIT! This should have been impossible, so it leaves only the possibility of map corruption by unsynchronized access (ContentEntry.state is a plain HashMap).

If this really is a problem, it would explain some of the random failures we've been having....

I'm going to try and reproduce the problem on my own laptop and then try and make a patch to fix the problem if there are no objections.

Environment

None

Assignee

Unassigned

Reporter

codehaus

Triage

None

Components

Affects versions

Priority

Medium
Configure