[cgiapp] file uploads and encodings
Todd Ross
tar.lists at yahoo.com
Mon Oct 4 16:34:25 EDT 2010
Hello,
I think I have an impossible problem. Or at least, it looks dire from where I'm
sitting.
I support a website that accepts file uploads. I accept uploads of all types
from text/plain (csv) to image/jpeg to application/pdf; it's currently
unconstrained. The file upload happens over a very typical setup of:
<form enctype="multipart/form-data" method="post">
<input type="file" name="my_file">
</form>
using CGI.pm for the form processing on the server.
Most file uploads are routed elsewhere for processing. One of our targets is a
COBOL application on z/OS and we need to perform some platform conversion.
Namely, we need to convert text/plain files to EBCDIC.
In order to convert _to_ EBCDIC, I need to know what I'm converting _from_. And
therein lies my impossible problem: how does one determine the encoding of a
file upload? The browser does provide some information in the form of the file
name and the MIME type, but neither indicates whether the (text/plain) file
was encoded with ISO-8859-1 or UTF-8 or something else entirely.
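To make the problem concrete, here is a rough sketch of one common heuristic,
written in Python purely for illustration (the site itself uses CGI.pm). It
rests on assumptions that are not part of the setup described above: that the
only candidate source encodings are UTF-8 and ISO-8859-1, and that cp037 is the
EBCDIC code page the z/OS side expects. The function names are hypothetical.
The idea is that non-ASCII text which decodes cleanly as strict UTF-8 is very
likely UTF-8, while ISO-8859-1 can only serve as a fallback, because every byte
sequence is valid ISO-8859-1.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess between UTF-8 and ISO-8859-1 (assumed candidates only)."""
    # A UTF-8 byte order mark is a strong hint; "utf-8-sig" strips it on decode.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Strict decoding fails on most non-UTF-8 input that has high bytes.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback, not detection: any byte sequence decodes as ISO-8859-1,
        # so Windows-1252 and similar encodings would be misread here.
        return "iso-8859-1"
    return "utf-8"


def to_ebcdic(data: bytes, target: str = "cp037") -> bytes:
    """Decode with the guessed encoding, then re-encode as EBCDIC.

    The target code page is an assumption; characters it cannot represent
    are replaced with '?' rather than raising an error.
    """
    text = data.decode(guess_encoding(data))
    return text.encode(target, errors="replace")


if __name__ == "__main__":
    for label, raw in (("utf-8", "café".encode("utf-8")),
                       ("latin-1", "café".encode("iso-8859-1"))):
        print(label, guess_encoding(raw), to_ebcdic(raw).hex())
```

This only narrows the problem. Plain ASCII files are reported as UTF-8 (which
is harmless), but a file in any other single-byte encoding would be converted
silently and incorrectly, which is exactly the risk described above.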
These are uploads from a variety of clients running on a variety of platforms,
the details of which are largely unknown to me. Consequently, I'm reluctant to
assume any particular character encoding.
I can't imagine a character encoding field (or prompt) being effective. My
users are business users, not computer specialists. They might be responsible
for uploading the file, but they probably aren't responsible for creating it in
the first place.
Thoughts?
Thanks,
Todd