[cgiapp] UTF-8 output

Mark Rajcok mrajcok at gmail.com
Mon Mar 8 17:50:58 EST 2010


On Sun, Feb 28, 2010 at 12:12 AM, Mark Rajcok <mrajcok at gmail.com> wrote:

> On Mon, Nov 2, 2009 at 9:27 AM, Michael Peters <mpeters at plusthree.com> wrote:
>
>> > And is there no better way for the template output than to post_process
>> > the whole template? Is there no way to get the output of tt_process as
>> > UTF-8 so that there is no post_processing necessary?
>>
>> I have a patched HTML::Template that reads in the templates as UTF8, my
>> db connections are all UTF8 and I decode the CGI params as UTF8. As long
>> as all your inputs are UTF8 decoded then you don't need to explicitly
>> encode the output.
>>
>
> 1. Michael, care to share your patch for HTML::Template?
>
> 2. I didn't realize I could get away with not encoding the output if
> everything is decoded as UTF-8 coming in... I'll have to try that.
>

1. I found an HTML::Template patch:
https://rt.cpan.org/Public/Bug/Display.html?id=30586
but more importantly, I found that I can use TMPL_VARs to insert UTF-8
content into an ASCII template. As long as I UTF-8 decode my form
parameters/content before inserting them into the template, the template
output should get implicitly "upgraded" to a Unicode character string when
necessary.  For my application this is sufficient, so my HTML templates
don't have to be in UTF-8 format.

http://sourceforge.net/mailarchive/forum.php?thread_name=4607245C.8030702%40netratings.com.au&forum_name=html-template-users
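
A minimal sketch of the idea in point 1, using only the core Encode module.
The raw bytes, the variable names, and the plain s/// substitution standing
in for HTML::Template's TMPL_VAR handling are assumptions for illustration,
not the actual application code:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw form parameter: the UTF-8 bytes of U+201D.
my $raw_param = "\xE2\x80\x9D";

# Decode the bytes into a Perl character string before using them.
my $param = decode('UTF-8', $raw_param);

# A pure-ASCII template; s/// is a stand-in for TMPL_VAR substitution.
my $template = '<p>Quote: <TMPL_VAR NAME=quote></p>';
(my $page = $template) =~ s/<TMPL_VAR NAME=quote>/$param/;

# $page is now a character string holding U+201D as one character.
# Encode it once, at the output boundary.
my $page_bytes = encode('UTF-8', $page);
```

The point of the sketch is only that the ASCII template text and the decoded
parameter combine into one character string without the template itself
being stored as UTF-8.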

2. If I don't explicitly encode the output, Perl generates a 'Wide character'
warning whenever the output contains a character with a codepoint above 255:

$ perl -MEncode -e 'my $t = "\x94"; my $utf8 = decode("cp1252", $t); printf "%s\n", $utf8'

Wide character in print at -e line 1.
”

Note that \x94 is a Microsoft "smart quote" character (the right double
quotation mark in the Windows-1252 character set).  When that byte is decoded
from cp1252, it becomes Unicode codepoint U+201D, which is above 255.  Trying
to print that without explicitly encoding it causes the warning.
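
For comparison, a small sketch of the same example with the output encoded
explicitly, which should avoid the warning.  The variable names are
illustrative; setting an encoding layer on the filehandle is a commonly used
alternative, shown here only as a comment:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode the cp1252 byte 0x94 into a character string (U+201D).
my $char = decode('cp1252', "\x94");

# Explicitly encode to UTF-8 bytes before printing.
my $bytes = encode('UTF-8', $char);
print $bytes, "\n";

# Alternative: let the filehandle encode for you, then print $char directly.
# binmode(STDOUT, ':encoding(UTF-8)');
```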

I've added the above to
http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8
(I mentioned this URL before, but that was in a thread not related to UTF-8,
hence the repeat mention for the archives).

-- Mark R.

