[cgiapp] file uploads and encodings

Todd Ross tar.lists at yahoo.com
Mon Oct 4 16:34:25 EDT 2010


Hello,

I think I have an impossible problem.  Or at least, it looks dire from where I'm 
sitting.

I support a website that accepts file uploads.  I accept uploads of all types, 
from text/plain (csv) to image/jpeg to application/pdf; the set is currently 
unconstrained.  The file upload happens over a very typical setup of:

<form enctype="multipart/form-data" method="post">
    <input type="file" name="my_file">
</form>

using CGI.pm for the form processing on the server.

Most file uploads are routed elsewhere for processing.  One of our targets is a 
COBOL application on z/OS, so we need to perform some platform conversion: 
namely, we need to convert text/plain files to EBCDIC.
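
To make the conversion step concrete, here is a minimal sketch, written in 
Python purely for illustration.  The function name to_ebcdic is hypothetical, 
and cp037 is only an assumed EBCDIC code page; the right one depends on how the 
z/OS side is configured.  Note that the source encoding has to be supplied by 
the caller.

```python
def to_ebcdic(data: bytes, source_encoding: str, ebcdic_codec: str = "cp037") -> bytes:
    """Re-encode uploaded bytes from a known source encoding to EBCDIC.

    Assumption: cp037 (US/Canada EBCDIC) is the target code page. Other
    EBCDIC code pages shipped with Python include cp500 and cp1140.
    """
    # Decode strictly so that a wrong source_encoding fails loudly
    # instead of silently producing garbage.
    text = data.decode(source_encoding)
    # Encode strictly as well: characters with no EBCDIC equivalent
    # raise UnicodeEncodeError rather than being dropped.
    return text.encode(ebcdic_codec)


if __name__ == "__main__":
    print(to_ebcdic(b"HELLO", "ascii").hex())
```

Strict error handling is a deliberate choice in this sketch: for data headed to 
a batch application, a rejected upload seems safer than a silently altered one.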

In order to convert _to_ EBCDIC, I need to know what I'm converting _from_.  And 
therein lies my impossible problem: how does one determine the encoding of a 
file upload?  The browser does provide some information in the form of the file 
name and the MIME type, but neither indicates whether the (text/plain) file 
was encoded as ISO-8859-1 or UTF-8 or something else entirely.
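
One common heuristic, sketched below in Python purely for illustration, is to 
look for a byte order mark, then try a strict UTF-8 decode, and only then fall 
back to a single-byte encoding.  The function name guess_text_encoding is 
hypothetical, and the ISO-8859-1 fallback is an assumption, not something the 
upload itself tells us; any single-byte encoding (Windows-1252, for example) 
would pass the same test, so this is a guess rather than a detection.

```python
import codecs


def guess_text_encoding(data: bytes) -> str:
    """Return a best-guess codec name for uploaded text bytes."""
    # A byte order mark is the only explicit hint the bytes can carry.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Not handled here: a UTF-32 LE BOM also begins with the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # Non-ASCII text in a legacy single-byte encoding rarely happens
        # to be valid UTF-8, so a clean strict decode is a strong signal.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: ISO-8859-1 accepts every byte sequence,
        # so it never fails, but it may still be the wrong answer.
        return "iso-8859-1"


if __name__ == "__main__":
    print(guess_text_encoding("café".encode("utf-8")))
    print(guess_text_encoding("café".encode("iso-8859-1")))
```

Pure ASCII input is reported as UTF-8 here, which is harmless because ASCII is 
a subset of both candidates.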

These are uploads from a variety of clients running on a variety of platforms, 
the details of which are largely unknown to me.  Consequently, I'm reluctant to 
assume any particular character encoding.

I can't imagine a character encoding field (or prompt) being effective.  My 
users are business users, not computer specialists.  They might be responsible 
for uploading the file, but they probably aren't responsible for creating it in 
the first place.

Thoughts?

Thanks,

Todd