
Key: CACHE-71
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Bjorn Wang
Votes: 1
Watchers: 2

OSCache

Flush and refresh of cached pages fail under heavy load

Created: 10/Dec/03 07:35 AM   Updated: 12/Mar/05 01:53 AM
Component/s: Tags
Affects Version/s: None
Fix Version/s: 2.1

Environment: Combination of OSCache and Struts Tiles running on BEA Weblogic. Tested on Windows XP and Sun Solaris.


Description
Flush and refresh of cached pages fails under heavy load.

I'm testing this issue by continually requesting the front page of our website, which is cached using the OSCache tags. Flushing is done from within the same JSP if the underlying data has changed in the database.

We're using the following JSP to control caching. JSTL tags check whether the element should be flushed, based on the "flush" flag. Between the cache tags, a Struts Tiles tag, <tiles:insert>, includes the JSP that is responsible for rendering the page.
After rendering is finished, we use our own tag, <odin:markclean>, to mark this page as "clean" in our business tier.

JSP code:


  <c:if test="${flush}">
    <oscache:flush key='<%=(String) request.getAttribute("pagekey")%>' scope="application"/>
  </c:if>

  <oscache:cache key='<%=(String) request.getAttribute("pagekey")%>' scope="application" time="1800">
    <tiles:insert attribute="cachedcontent"/>
    <odin:markclean key="pagekey" />
  </oscache:cache>
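For readers more at home in Java than in JSP, the control flow of the page above can be modelled roughly as follows. This is a hypothetical sketch: `PageFlow`, `render`, and `markClean` are invented stand-ins for `<oscache:flush>`, `<tiles:insert>`, and `<odin:markclean>`. The `time` expiry is left out, and nothing here uses OSCache's actual API. The sketch does preserve one detail of the JSP's ordering: the page is marked clean before the new content is stored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical model of the JSP's control flow; not OSCache's API.
class PageFlow {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // <c:if test="${flush}"><oscache:flush key=... scope="application"/></c:if>
    void flushIfDirty(String key, boolean flush) {
        if (flush) {
            cache.remove(key);
        }
    }

    // <oscache:cache key=...>: reuse cached content if present; otherwise run the
    // body (<tiles:insert>, then <odin:markclean>) and store what it produced.
    // Expiry (time="1800") is deliberately omitted from this sketch.
    String serve(String key, Supplier<String> render, Runnable markClean) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String content = render.get();
        markClean.run();          // the page is flagged clean here...
        cache.put(key, content);  // ...but the new content is only stored here
        return content;
    }

    // One request: the flush check followed by the cache block.
    String handleRequest(String key, boolean flush, Supplier<String> render, Runnable markClean) {
        flushIfDirty(key, flush);
        return serve(key, render, markClean);
    }
}
```

Each step is thread-safe on its own, but a request as a whole is not atomic. Another request can flush, render, and store between this request's `render.get()` and its `cache.put(...)`.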


This strategy works fine until a certain amount of load is applied. Then we suddenly start losing updates, and the cache contents are never refreshed, even though OSCache clearly logs the flush and the refresh:


  2003-12-10 13:53:35,049 INFO com.opensymphony.oscache.web.tag.CacheTag - <cache>: Cached content not used: New cache entry, cache stale or scope flushed : /PAGE#W1028-bn-dok1--rel
  2003-12-10 13:53:35,159 INFO com.opensymphony.oscache.web.tag.CacheTag - <cache>: Updating cache entry with new content : /PAGE#W1028-bn-dok1--rel
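The log shows OSCache discarding the old entry and storing new content, which suggests that the content may already have been stale when it was stored. One interleaving that would produce exactly this is a slow request that read the old data finishing after a faster request that read the new data. This is a hypothesis, not a confirmed diagnosis of OSCache. The sketch below replays that interleaving step by step against a plain last-writer-wins map. `LostFlushReplay` and the `v1`/`v2` values are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded replay of a lost update; not OSCache code.
class LostFlushReplay {
    static String replay() {
        Map<String, String> cache = new HashMap<>();
        String database = "v1";
        boolean dirty = true;            // business tier says the page must be flushed

        // Request A: sees dirty, flushes, starts a slow render from the current data.
        cache.remove("page");
        String renderedByA = database;   // A has read "v1"

        // The underlying data changes while A is still rendering.
        database = "v2";
        dirty = true;

        // Request B: sees dirty, flushes, renders quickly, marks clean, stores.
        cache.remove("page");
        String renderedByB = database;   // B has read "v2"
        dirty = false;                   // <odin:markclean>
        cache.put("page", renderedByB);

        // Request A finally finishes: marks clean and stores its stale render.
        dirty = false;
        cache.put("page", renderedByA);

        // Nothing is dirty any more, so no further flush happens until expiry.
        return cache.get("page") + (dirty ? " (dirty)" : " (clean)");
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "v1 (clean)"
    }
}
```

The final state is old content flagged as clean. That matches the symptom: the log records a flush and an update, yet the page stays stale until the `time="1800"` expiry. A longer render widens the window between A's read and A's store, which would be consistent with the later observation in this thread that faster page generation made the problem rarer.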


Bjorn Wang

Chris Miller - [08/Jan/04 09:10 AM ]
Can you please try the current CVS version of OSCache to see if this resolves your problem?

Ben Rometsch - [18/Feb/04 03:09 PM ]
Is there any news on this? I have noticed similar effects when running in production on a site serving ~300,000 pages per day.

I have built the latest version of OSCache from CVS today, and it appears to be running fine in my staging environment, but that is not under the same load as my production machine. I'd like to put this beta build into production, but I am worried about any unwanted side effects. Does anyone have any thoughts?

Bjorn Wang - [23/Feb/04 04:43 PM ]
Sorry for the late reply. I've been very busy finishing our project for release and haven't been able to make time for testing before now.

The problem has not disappeared, but we've optimized the code so that rendering the page takes less time. This has mitigated the problem, as it now occurs less frequently.

My guess is that it's caused by the implementation of the AbstractConcurrentReadCache class. The API documentation says:

"A version of Hashtable that supports mostly-concurrent reading, but exclusive writing. Because reads are not limited to periods without writes, a concurrent reader policy is weaker than a classic reader/writer policy, but is generally faster and allows more concurrency. This class is a good choice especially for tables that are mainly created by one thread during the start-up phase of a program, and from then on, are mainly read (with perhaps occasional additions or removals) in many threads. If you also need concurrency among writes, consider instead using ConcurrentHashMap."

We haven't seriously considered switching to ConcurrentHashMap, as I suspect it would hurt performance too much. Our site takes up to 2,000,000 hits a day, so speed is of the utmost importance.
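On the `ConcurrentHashMap` question: its `compute` method updates a single key atomically without locking the whole table, so writers to different pages do not serialize on one lock. That is enough to make "flush" and "store only if nothing was flushed since I started rendering" each a single atomic step per key. The sketch below shows this idea with a per-key generation counter. `VersionedPageCache` and its methods are hypothetical. This is not how OSCache's `AbstractConcurrentReadCache` works, and no performance claim is made.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical versioned page cache; not part of OSCache.
class VersionedPageCache {
    // Generation counter plus the content stored for that generation (null = flushed).
    private static final class Slot {
        final long generation;
        final String content;
        Slot(long generation, String content) {
            this.generation = generation;
            this.content = content;
        }
    }

    private final ConcurrentHashMap<String, Slot> slots = new ConcurrentHashMap<>();

    // Call before reading the underlying data, and keep the returned generation.
    long beginRender(String key) {
        Slot s = slots.get(key);
        return s == null ? 0L : s.generation;
    }

    // Drop the content and bump the generation in one atomic step for this key.
    void flush(String key) {
        slots.compute(key, (k, old) -> new Slot(old == null ? 1L : old.generation + 1, null));
    }

    // Store only if no flush happened since beginRender; returns whether it was stored.
    boolean store(String key, String content, long renderedAt) {
        boolean[] stored = {false};
        slots.compute(key, (k, old) -> {
            long current = old == null ? 0L : old.generation;
            if (current != renderedAt) {
                return old;               // a newer flush won: keep what is there
            }
            stored[0] = true;
            return new Slot(current, content);
        });
        return stored[0];
    }

    String get(String key) {
        Slot s = slots.get(key);
        return s == null ? null : s.content;
    }
}
```

Consider request A, which calls `beginRender` (generation 0) and starts a slow render. The data then changes, a flush bumps the key to generation 1, and a faster request B renders and stores at generation 1. When A finally calls `store` with generation 0, the call is refused, so B's fresh page survives. Whether this would keep up with 2,000,000 hits a day is untested here.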

Instead, we've set the timeout of every element to 30 minutes, to make sure that stale pages do not live too long:

<oscache:cache ... time="1800">

We've also beefed up the application cluster with an extra member, which reduces the load on each member. OSCache runs standalone on each cluster member. Reducing the load on each cache instance reduces the likelihood of stale pages, as the flush error only occurs under high load. Also, if the error does occur, it will probably occur on only one of the cluster members, not all of them, so most users will still get up-to-date information.

This issue is a real problem for us, but speed is even more important. Since we've been able to reduce the risk of the problem occurring, we've chosen to live with it for now.

Bjorn Wang - [23/Feb/04 04:59 PM ]
I should add that the problem occurs very infrequently now. If page generation is fairly quick, we have a hard time reproducing it.

We have a requirement specification that demands a maximum delay of 10 seconds on updates, and this is why this situation is critical to us. In any other situation I would not hesitate to trust OSCache.
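The 10-second requirement and the `time="1800"` setting mentioned earlier in this thread can be compared directly. If a flush is lost, expiry is the only thing that eventually replaces the stale page, so the delay before an update becomes visible is bounded by `time`, not by the flush. Below is a minimal sketch, assuming an entry is reused while its age is strictly below `time` seconds. This is an assumption about the rule, not a description of OSCache's exact refresh check, and `UpdateDelay` and its methods are hypothetical.

```java
// Hypothetical expiry rule for time="1800"; OSCache's own refresh check may differ in detail.
class UpdateDelay {
    // Assumed rule: an entry is reused while its age is strictly below timeSeconds.
    static boolean isFresh(long storedAtSeconds, long nowSeconds, long timeSeconds) {
        return nowSeconds - storedAtSeconds < timeSeconds;
    }

    // If the flush for a change is lost and a stale render is stored afterwards,
    // the change only becomes visible once that stale entry stops being fresh.
    static long updateDelaySeconds(long changedAtSeconds, long staleStoredAtSeconds, long timeSeconds) {
        long refreshAt = staleStoredAtSeconds + timeSeconds;
        return Math.max(0L, refreshAt - changedAtSeconds);
    }

    public static void main(String[] args) {
        long time = 1800;              // time="1800" on <oscache:cache>
        long requirementSeconds = 10;  // maximum update delay from the specification
        // Data changes at t=100; a slow request stores its stale render at t=101.
        long delay = updateDelaySeconds(100, 101, time);
        System.out.println(isFresh(101, 1900, time)); // true: stale page still served
        System.out.println(isFresh(101, 1901, time)); // false: next request re-renders
        System.out.println(delay + " s vs. " + requirementSeconds + " s required"); // 1801 s vs. 10 s required
    }
}
```

Under that assumption, a single lost flush can delay an update by about 1800 seconds, roughly 180 times the 10-second limit. Lowering `time` far enough to meet the requirement on its own would likely defeat most of the benefit of caching.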

I have yet to do testing with the newest build from CVS. We've had to focus on reducing the problem with the release version of OSCache.

Ben Rometsch - [23/Feb/04 05:30 PM ]
We built a version from CVS on 20040218 and, after a short amount of testing in our staging environment, deployed it to our live server. It is running without a hitch (we served 500,000 pages the other day and it did not hiccup).

The issue seems to have been fixed on our live server, but the load has reduced a lot in recent days so it's hard to say for certain...