File upload via /rest/resource interface results in unusable files.

Description

Uploading any kind of file via the REST API call PUT /rest/resource/<path to file> leaves the file unusable.
The upload itself succeeds: the request returns HTTP status 200 and the file is saved where it is supposed to land, but its content is still wrapped in the HTTP multipart body envelope and therefore cannot be parsed.
Only one file is uploaded per request. It does not matter whether it is a plain-text or a binary file; the effect is the same. The file being uploaded is not a zip file, so no auto-extraction is involved.
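
To illustrate the symptom described above, here is a minimal Python sketch of why a JSON document stops parsing once it is stored together with a multipart/form-data envelope. The boundary string, the form field name "file", and the sample JSON are made up for illustration; the exact bytes GeoServer writes to disk may differ.

```python
import json

# Hypothetical payload and boundary, chosen only for illustration.
payload = b'{"name": "test"}'
boundary = b"------------------------example"

# General shape of a multipart/form-data body carrying one file part.
wrapped = b"\r\n".join([
    b"--" + boundary,
    b'Content-Disposition: form-data; name="file"; filename="test.json"',
    b"Content-Type: application/octet-stream",
    b"",
    payload,
    b"--" + boundary + b"--",
    b"",
])


def is_valid_json(data: bytes) -> bool:
    """Return True if data parses as JSON, False otherwise."""
    try:
        json.loads(data)
        return True
    except ValueError:
        return False


print(is_valid_json(payload))  # the bare file content
print(is_valid_json(wrapped))  # the same content inside the envelope
```

The first call reports valid JSON and the second does not, even though the original bytes are still present inside the wrapped body.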

Steps to reproduce:

1. Create an empty JSON file called test.json and paste the following content inside:

2. Upload that file to an already existing workspace "topp" using the /rest/resource REST API:

3. Now go to the data_dir/workspaces/topp directory and you will find test.json with its content wrapped, like this:

Environment

GeoServer 2.14.0 running on Ubuntu 16.04 LTS
Tested also on GeoServer 2.13.1 and 2.13.2, with same results.

Activity

Ian Turton
November 5, 2018, 4:33 PM

You need to set your Content-Type to application/json. GeoServer should
probably either throw an exception when it is passed form-encoded data or
unpack it, but I am not sure we would know what format is inside in that
case.

Miodrag Vidanovic
November 5, 2018, 4:52 PM
Edited

I have tried setting the content type, not only for this file type but for others as well (png, jpg, css...). At first I used very specific content types, then more generic ones, like text/plain for css files or application/octet-stream for binary files. The result is always the same. I believe the REST endpoint documentation even states that if you leave out the Content-Type header, it will be auto-detected. But whatever content type I set, it is apparently overwritten with application/octet-stream.

Bottom line, it is a very simple action: upload the file and save that stream to the file system byte for byte using the supplied filename. That's it. The server does not have to parse these files in order to extract them, import them into a database, or process them in any way. The only potential difference I can think of is between plain-text and binary files. It should be the same action for a pdf, json, xml or jpeg file. I have the feeling that somewhere in the code the save method is being called on the parent instead of a child object.

Ian Turton
November 5, 2018, 5:00 PM

You are using the explicit file option to curl rather than the correct
--data/-d option - see the curl man page:

-d, --data <data>

(HTTP) Sends the specified data in a POST request to the HTTP server, in
the same way that a browser does when a user has filled in an HTML form and
presses the submit button. This will cause curl to pass the data to the
server using the content-type application/x-www-form-urlencoded. Compare to
-F, --form.

--data-raw is almost the same but does not have a special interpretation of
the @ character. To post data purely binary, you should instead use the
--data-binary option. To URL-encode the value of a form field you may use
--data-urlencode.

If any of these options is used more than once on the same command line,
the data pieces specified will be merged together with a separating
&-symbol. Thus, using '-d name=daniel -d skill=lousy' would generate a post
chunk that looks like 'name=daniel&skill=lousy'.

If you start the data with the letter @, the rest should be a file name to
read the data from, or - if you want curl to read the data from stdin.
Multiple files can also be specified. Posting data from a file named
'foobar' would thus be done with -d, --data @foobar. When --data is told to
read from a file like that, carriage returns and newlines will be stripped
out. If you don't want the @ character to have a special interpretation use
--data-raw instead.

See also --data-binary and --data-urlencode and --data-raw. This option
overrides -F, --form and -I, --head and -T, --upload-file.

-F, --form <name=content>

(HTTP SMTP IMAP) For HTTP protocol family, this lets curl emulate a
filled-in form in which a user has pressed the submit button. This causes
curl to POST data using the Content-Type multipart/form-data according to
RFC 2388.

For SMTP and IMAP protocols, this is the mean to compose a multipart mail
message to transmit.

This enables uploading of binary files etc. To force the 'content' part to
be a file, prefix the file name with an @ sign. To just get the content
part from a file, prefix the file name with the symbol <. The difference
between @ and < is then that @ makes a file get attached in the post as a
file upload, while the < makes a text field and just get the contents for
that text field from a file.

Tell curl to read content from stdin instead of a file by using - as
filename. This goes for both @ and < constructs. When stdin is used, the
contents is buffered in memory first by curl to determine its size and
allow a possible resend. Defining a part's data from a named non-regular
file (such as a named pipe or similar) is unfortunately not subject to
buffering and will be effectively read at transmission time; since the full
size is unknown before the transfer starts, such data is sent as chunks by
HTTP and rejected by IMAP.

Miodrag Vidanovic
November 5, 2018, 10:29 PM

You are right, I was overlooking the whole time that the PUT method is being used, not POST. I had assumed that POST is used for upload, as RESTful resource creation, and that PUT is reserved for resource updates, such as moving to another directory. I was using PUT in my Python code the whole time, but in the manner of a POST, and I was misled by the apparently successful upload, where the server should have thrown an exception because the data was not supplied.
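
The distinction drawn in this comment can be sketched in Python with the standard library alone: a PUT request whose body is the raw file bytes, with no form envelope around them. The base URL, the helper name build_put_request, and the choice of application/json are assumptions for illustration; the request is only constructed here, not sent, and authentication is left out.

```python
import urllib.request

# Assumed GeoServer base URL; adjust to the actual deployment.
BASE_URL = "http://localhost:8080/geoserver"


def build_put_request(resource_path: str, content: bytes,
                      content_type: str) -> urllib.request.Request:
    """Build a PUT to /rest/resource/<path> whose body is the raw bytes."""
    url = f"{BASE_URL}/rest/resource/{resource_path}"
    return urllib.request.Request(
        url,
        data=content,  # sent as-is, no multipart wrapping
        method="PUT",
        headers={"Content-Type": content_type},
    )


req = build_put_request("workspaces/topp/test.json",
                        b'{"name": "test"}', "application/json")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)  (credentials required)
```

With the third-party requests library, the equivalent is to pass the bytes through the data= argument of requests.put; the files= argument builds a multipart/form-data body instead.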

As far as I'm concerned, this ticket can be closed if you do not plan to add this exception and/or put a few additional words about this in the swagger file.

Jukka Rahkonen
January 15, 2020, 7:54 AM

You made good suggestions for improvements but I am closing the ticket anyway. A pull request or a new ticket about how to improve the documentation would raise the visibility.

Assignee

Unassigned

Reporter

Miodrag Vidanovic

Triage

None

Fix versions

None

Affects versions

Components

Priority

High