Issue Details

Key: CACHE-38
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Andres March
Reporter: Bartek Lewandowski
Votes: 2
Watchers: 5
OSCache

OSCache filter doesn't correctly support i18n

Created: 22/Apr/03 10:57 AM   Updated: 19/Apr/05 02:35 PM
Component/s: Filters
Affects Version/s: 1.7.5, 2.1
Fix Version/s: 2.1.1

File Attachments: 1. Java Source File TypedCacheHttpServletResponseWrapper.java (2 kb)

Issue Links:
Duplicate
 
This issue is duplicated by:
CACHE-159 CacheFilter does not set encoding pro... Major Closed
Related
 
This issue is related to:
CACHE-83 CacheHttpServletResponseWrapper & Res... Major Closed


Description
Cached non-Latin web pages (ISO-8859-2) served through the OSCache filter have corrupted non-Latin characters (those characters are converted to question marks).


Comments
Hani Suleiman - [19/Jun/03 01:21 PM ]
Does the content appear correctly without the cache tag? If yes, then can you submit a sample jsp which illustrates this error? Thanks!

Hani Suleiman - [19/Jun/03 01:22 PM ]
Sorry, I meant: does the content appear correctly without the cache filter in place?

olivier liechti - [04/Nov/03 09:42 AM ]
I have experienced the same problem. If I access the content via the filter, then I get question marks '?' instead of special characters. If I access the content directly, then it works ok.

olivier liechti - [05/Nov/03 06:39 AM ]
I think I have a fix for this problem.

The class that needs to be modified is com.opensymphony.oscache.web.filter.CacheHttpServletResponseWrapper.

The problem is that in this class, a Writer is created without a character encoding being specified in the constructor. Hence, the default encoding for the platform is used. This can be a problem if the default encoding does not support characters with accents (and this is the case on Solaris if the env. variable LC_CTYPE="C").

So the fix is to specify a valid encoding in the constructor. One possibility is to use the encoding of the HttpResponse...

public PrintWriter getWriter() throws IOException {
    if (cachedWriter == null) {
        /*
         * The following line has been commented out. It does not work, because it uses the default encoding for the platform, which
         * can cause problems with special characters. On Solaris, for instance, if the env. variable LC_CTYPE is set to "C", then
         * the encoding for the platform is US-ASCII-C (ISO646-US). In that case, special characters (e.g. characters with accents in
         * French) are replaced by question marks ('?').
         */
        //cachedWriter = new PrintWriter(getOutputStream());

        String platformEncoding = System.getProperty("file.encoding");
        String responseEncoding = getResponse().getCharacterEncoding();

        /*
         * These lines have been added. Instead of using the platform encoding when creating the Writer, we explicitly specify the same
         * encoding as the one used for the HTTP response (in the constructor).
         */
        log.info("The character encoding for the platform is " + platformEncoding);
        log.info("The character encoding for the HTTP response is " + responseEncoding);
        log.info("Creating a writer with HTTP response encoding: " + responseEncoding);
        cachedWriter = new PrintWriter(new OutputStreamWriter(getOutputStream(), responseEncoding));
    }

    // Return the cached writer so repeated calls share the same instance.
    return cachedWriter;
}
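
To make the effect described above concrete, here is a minimal standalone sketch (not OSCache code). The class name WriterEncodingDemo and its encode helper are made up for illustration; only standard java.io and java.nio classes are used. It shows that a writer bound to US-ASCII replaces an accented character with '?', while a writer bound to ISO-8859-1 keeps it.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of OSCache.
final class WriterEncodingDemo {

    // Writes the text through a PrintWriter bound to the given charset and
    // returns the bytes that end up in the underlying stream.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(buffer, charset));
        writer.print(text);
        writer.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"

        // US-ASCII cannot represent U+00E9, so the encoder substitutes '?'.
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints "caf?"

        // ISO-8859-1 can represent it, so the text survives a round trip.
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).equals(text)); // prints "true"
    }
}
```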

Raymond Lai - [11/Nov/03 08:41 PM ]
This fix solved only half of the problem.

I'm using UTF-8 for my JSP content. The first time the JSP is retrieved, with the modifications above, everything is fine; but the second time, when the cached content is used, the characters are not displayed correctly.

However, they are not ?s but garbage characters, suggesting the cached content was not being sent with the character set it should use. The HTTP Content-Type header captured by the browser was

Content-Type: text/html;charset=ISO-8859-1

whenever cached JSP content was being used, even though UTF-8 was declared in the JSP pageEncoding and the HTML <meta> tags.
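
The symptom described above (garbage characters rather than question marks) is what one would expect when UTF-8 bytes are labelled as ISO-8859-1. A minimal standalone sketch, using a made-up class name CharsetMismatchDemo and only the standard library:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of OSCache.
final class CharsetMismatchDemo {

    // Encodes the text as UTF-8 (as the cache would store it) and then decodes
    // the bytes as ISO-8859-1 (as a browser would, given the wrong header).
    static String decodeAsLatin1(String original) {
        byte[] cachedBytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(cachedBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String shown = decodeAsLatin1("\u00e9"); // "é" is the two bytes C3 A9 in UTF-8
        // Each byte becomes its own character: U+00C3 followed by U+00A9.
        System.out.println(shown.length());                // prints "2"
        System.out.println(shown.equals("\u00c3\u00a9")); // prints "true"
    }
}
```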

puy - [06/Apr/04 11:09 PM ]
OSCache made two major mistakes. First, it didn't bother to preserve any of the headers from the original response in the cached response. The other is that it cheerily ignored the encoding my servlet requested, instead using an OutputStream into a byte array.

Simone Avogadro - [12/Jan/05 05:55 AM ]
We too had the same problem (with 2.0.2) and already resolved it by subclassing the wrapper.
Essentially, the wrapper intercepts the "setContentType" call, but servlets can also set the response content type in other ways:
- addHeader
- setHeader

By intercepting these as well we solved the problem, because the content-type is recorded and then re-sent to the client.
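
The idea can be sketched without the servlet API as follows. This is not the attached patch: ContentTypeRecorder is a hypothetical stand-in, and in a real filter the three methods would presumably be overrides in a subclass of the response wrapper that also delegate to the wrapped response.

```java
// Hypothetical stand-in for a response wrapper; it only records the content type.
final class ContentTypeRecorder {
    private String contentType;

    // The path the original wrapper already intercepted.
    void setContentType(String value) {
        contentType = value;
    }

    // Servlets may also set the content type as a plain header.
    void setHeader(String name, String value) {
        recordIfContentType(name, value);
    }

    void addHeader(String name, String value) {
        recordIfContentType(name, value);
    }

    // Header names are case-insensitive, so compare without regard to case.
    private void recordIfContentType(String name, String value) {
        if ("Content-Type".equalsIgnoreCase(name)) {
            contentType = value;
        }
    }

    // The recorded value can later be re-sent along with the cached bytes.
    String getContentType() {
        return contentType;
    }

    public static void main(String[] args) {
        ContentTypeRecorder recorder = new ContentTypeRecorder();
        recorder.setHeader("content-type", "text/html;charset=UTF-8");
        System.out.println(recorder.getContentType()); // prints "text/html;charset=UTF-8"
    }
}
```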

Simone Avogadro - [12/Jan/05 06:59 AM ]
The client misinterprets the page content due to the missing content-type header.

Simone Avogadro - [17/Jan/05 03:20 AM ]
If you wish, I can provide my (working) patch to fix this (against 2.0.2). The patch has been successfully and extensively tested against ISO-8859-1 and ISO-8859-11. Minor testing has also been done for UTF-8.

Andres March - [29/Jan/05 11:05 PM ]
Please attach a patch if you have one.

Simone Avogadro - [31/Jan/05 02:47 AM ]
I did it as a sub-class of the original wrapper in order to keep it separated from the main OS-Cache jar. My own filter obviously used this wrapper instead of the original one.

Simone Avogadro - [31/Jan/05 03:07 AM ]
Further comments on the mini-patch I submitted.
As you can read in the comments, the OSCache filter holds the cached response in a byte[], which might seem incorrect to some of us.
The patch I submitted essentially guarantees that the cached response will be built using the same content type as the one recorded for the original request, so a byte[] will work.
However...
What if the first client accepted only ISO-8859-1 and the new client only accepts CP-1251?
The current implementation will respond with ISO-8859-1.
Most of the time this will _not_ be a problem, since modern browsers support a wide variety of encodings; however, exotic devices/clients (e.g. Chinese browsers, PDAs...) might be affected.
Alas, the solution is not simple, as ResponseContent.writeTo() does not know the request's acceptable content-types, so it's not able to do on-the-fly response encoding translation, which could otherwise be done simply by doing (byte[],orig-ContentType)=>String=>(byte[],dest-ContentType)
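
The translation chain mentioned above, (byte[],orig)=>String=>(byte[],dest), can be sketched in plain Java as follows. The class name TranscodeDemo and the transcode helper are made up for illustration; the sketch assumes the source and destination charsets are already known, which is exactly the part that writeTo() is missing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of OSCache.
final class TranscodeDemo {

    // Decodes the cached bytes with the charset they were recorded in,
    // then re-encodes the resulting String with the target charset.
    static byte[] transcode(byte[] cached, Charset from, Charset to) {
        return new String(cached, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9}; // "é" in ISO-8859-1
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // In UTF-8 the same character takes two bytes: C3 A9.
        System.out.println(utf8.length); // prints "2"
    }
}
```

Note that this only makes sense for textual content, and characters that the target charset cannot represent are replaced during encoding, so the conversion can be lossy.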

Lars Torunski - [01/Feb/05 12:29 AM ]
1.)
I added Simone's changes to my current CACHE-128, CACHE-135 and CACHE-83 modifications of the CacheHttpServletResponseWrapper.

I put "result.setContentType(value);" in addHeader and setHeader. But should "super.setContentType(value)" be invoked as well?

2.)
We need to factor in the encoding of both the first client and the new client. But we also have to avoid altering content whose content-type is "image/gif" etc.

Does anybody know when we have to correct the response (e.g. "text/html", "text/plain")?
Every response with a content-type of "text/*"?

Lars Torunski - [02/Feb/05 03:56 PM ]
Simone's changes are a first attempt at resolving this problem. Once we introduce gzip compression and handle a further, different content-type, we may have the possibility to react to different encodings in ResponseContent.

Simone Avogadro - [09/Feb/05 04:50 AM ]
1) I called super.setContentType(...) because I needed to notify the original wrapper of the content type (it recorded it only there).
I don't know how the underlying J2EE container handles content-types, but I don't expect it to be needed once you override the setHeader and addHeader methods directly within OSCache's wrapper.

2) I don't fully understand the question. However, the problem is that what we have in the cache is only valid for the recorded content-type. If the browser requests a different content-type, then the application might have to be invoked again (e.g. different HTML for different content-types). I don't know of a public API to query the container for content-type compatibility :-(

Andres March - [11/Apr/05 04:26 PM ]
The charset needs to be set from the content-type.
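
As an illustration only (this is not the code committed for 2.1.1), extracting the charset parameter from a Content-Type value could look like the sketch below. The class ContentTypeCharset, the method charsetOf and its fallback parameter are hypothetical names.

```java
// Hypothetical helper; not part of OSCache.
final class ContentTypeCharset {

    // Returns the value of the charset parameter of a Content-Type header value,
    // or the fallback when the header is null or has no charset parameter.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            // Parameter names are case-insensitive ("charset=", "Charset=", ...).
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = param.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html;charset=UTF-8", "ISO-8859-1")); // prints "UTF-8"
        System.out.println(charsetOf("text/html", "ISO-8859-1"));               // prints "ISO-8859-1"
    }
}
```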

Andres March - [16/Apr/05 04:47 PM ]
I don't use the cache filter and we don't as of yet have unit tests for this area, so it would be great if someone could try this fix out.

Simone Avogadro - [19/Apr/05 03:19 AM ]
We're currently in production with 2.1 + the patch I submitted, using ISO-8859-15 (Italian accents + € symbol), and everything is fine. If someone can provide a stable snapshot, I will try our portal solution with that snapshot.