[cgiapp] Problem displaying French, sometimes

Ron Savage ron at savage.net.au
Sun Sep 7 20:21:21 EDT 2008


Hi Peter

On Sat, 2008-09-06 at 20:49 -0500, Peter Karman wrote:
> Ron Savage wrote on 9/5/08 7:51 PM:
> > Hi Folks
> > 
> > Here is the set up (details below):
> > o An fcgid script calls...
> > o A module based on CGI::Application::Dispatch, which calls...
> > o My module, which reads country names from Postgres and displays them
> > 
> > This works, so Ivory Coast is displayed as 'CÔte D'ivoire' (ignoring the
> > upper-case O with caret for the moment).
> > 
> > But when the first module above is installed as a mod_perl handler,
> > and /that/ calls my module, the output is 'CÔte D'ivoire'.
> > 
> > I find this scary, and would love an explanation.
> > 
> 
> Sounds like a typical encoding issue. The 'bad' display above is likely because
> you are sending utf8 encoded strings to the browser but claim that the charset
> is latin1.
> 
> IMO, the best route is all utf8, all the time. Store strings encoded as utf8 in
> your db, send utf8 to the browser, and encode/decode at your program boundaries.
> It's a real b*tch to track down the problem spots in a multiple-encoding set up.
> That's why I wrote Search::Tools::UTF8 to help me. If I'm having trouble, I
> usually throw a to_utf8() function call at suspect strings and make sure I
> declare utf8 as my charset in all my http headers and output.
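Here is a minimal sketch of the mismatch Peter describes. It is written in Python rather than Perl purely for illustration, and it assumes the country name is held as a decoded Unicode string. Encoding the name as UTF-8 and then decoding those bytes as Latin-1 produces the garbled form. So does decoding them as Windows-1252, which browsers commonly substitute when a page declares latin1. Decoding with the same charset that was used for encoding round-trips cleanly.

```python
# Sketch (Python, for illustration only): UTF-8 bytes read back as Latin-1.
name = "C\u00d4te D'ivoire"  # 'CÔte D'ivoire' as a decoded Unicode string

# Output boundary: encode exactly once, as UTF-8.
wire_bytes = name.encode("utf-8")

# A client told charset=ISO-8859-1 maps each byte to one character.
as_latin1 = wire_bytes.decode("latin-1")

# Browsers usually treat a latin1 label as Windows-1252, where 0x94 is a curly quote.
as_cp1252 = wire_bytes.decode("cp1252")

# A client told charset=UTF-8 recovers the original text.
as_utf8 = wire_bytes.decode("utf-8")

assert wire_bytes[:3] == b"C\xc3\x94"            # Ô (U+00D4) becomes the two bytes C3 94
assert as_latin1[:3] == "C\u00c3\u0094"          # 'C', 'Ã', then the control char U+0094
assert as_cp1252.startswith("C\u00c3\u201dte")   # renders as 'CÃ”te D'ivoire'
assert as_utf8 == name                           # same charset both ways: no damage

# The Latin-1 mis-decoding is reversible, because Latin-1 maps bytes 1:1.
assert as_latin1.encode("latin-1").decode("utf-8") == name
```

This illustrates Peter's advice. Decode once where data enters the program and encode once where it leaves. Then make the charset declared in the HTTP headers match the bytes actually sent.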

Nice to know about Search::Tools::UTF8. Thanx.

Using it, even the valid output carps (as expected, since the -1 result is documented):

[Mon Sep 08 10:01:29 2008] [warn] mod_fcgid: stderr: byte -1 (R) is not Latin1 (it's 82 dec / 52 hex) at /home/ron/perl.modules/Local-Sites/lib/Local/Sites/Test/Sites.pm line 73

whereas the invalid output carps:

byte 3 (�) is not Latin1 (it's 148 dec / 94 hex) at /home/ron/perl.modules/Local-Sites/lib/Local/Sites/Test/Sites.pm line 73

And in the log (valid output first, then invalid):

CGIApp: ..............................
CGIApp: http://127.0.0.1/search/sites.fcgi
CGIApp: CÔTE D'IVOIRE. Encoding: UTF8 off, ASCII, 3 characters 3 bytes
CGIApp: is_flagged_utf8:
CGIApp: is_perl_utf8_string: 0
CGIApp: is_sane_utf8: 1
CGIApp: find_bad_latin1_report: -1
CGIApp: ..............................
CGIApp: http://127.0.0.1/test/sites
CGIApp: CÔTE D'IVOIRE. Encoding: UTF8 off, ASCII, 3 characters 3 bytes
CGIApp: is_flagged_utf8:
CGIApp: is_perl_utf8_string: 1
CGIApp: is_sane_utf8: 0
CGIApp: find_bad_latin1_report: 3

so I'll abandon DBI -> data_string_desc($name).

But I knew there was a problem! Your module nicely demonstrates that.
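The value flagged in the invalid case, 148 dec / 94 hex, is the second byte of the UTF-8 encoding of Ô (U+00D4 encodes as C3 94). That fits the diagnosis of UTF-8 bytes arriving where Latin-1 is expected. The Python sketch below reproduces this with a hypothetical helper, first_c1_byte(), which scans for bytes in the range 0x80 to 0x9F. In ISO-8859-1 those values are C1 control codes, so they are unlikely to appear in real Latin-1 text. This is an illustrative heuristic, not the algorithm Search::Tools::UTF8 actually uses.

```python
from typing import Optional


def first_c1_byte(data: bytes) -> Optional[int]:
    """Return the 0-based index of the first byte in 0x80-0x9F, or None.

    Hypothetical heuristic, not Search::Tools::UTF8's logic: in ISO-8859-1
    these values are C1 control codes, so finding one in "Latin-1" text
    suggests UTF-8 (or some other encoding) leaked in.
    """
    for i, b in enumerate(data):
        if 0x80 <= b <= 0x9F:
            return i
    return None


name = "C\u00d4TE D'IVOIRE"
latin1_bytes = name.encode("latin-1")  # Ô is the single byte D4
utf8_bytes = name.encode("utf-8")      # Ô is the two bytes C3 94

print(first_c1_byte(latin1_bytes))     # None
idx = first_c1_byte(utf8_bytes)
print(idx, utf8_bytes[idx], hex(utf8_bytes[idx]))  # 2 148 0x94
```

The heuristic is weak. It only fires when a UTF-8 continuation byte happens to fall in 0x80 to 0x9F, so a string such as 'é' (C3 A9) passes unnoticed.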

Since the underlying module is the same in both cases, the question is
why does one calling mechanism work and the other mangle the data?

I'll dig into it :-((.
-- 
Ron Savage
ron at savage.net.au
http://savage.net.au/index.html