
Key: CACHE-217
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Lars Torunski
Reporter: Christoph Kutzinski
Votes: 0
Watchers: 2
OSCache

Avoid DiskPersistenceListener deadlocks if process has no rights to delete cache file

Created: 23/Nov/05 04:29 AM   Updated: 21/Jan/07 01:55 PM
Component/s: Listeners
Affects Version/s: 2.1.1
Fix Version/s: 2.3

File Attachments: 1. Text File stack.txt (63 kb)

Environment: Solaris 9, Sun JDK 1.5.0_05 (but should apply to any platform)


Description
We mistakenly ran our application (a Tomcat webapp) as root. When we restarted Tomcat as the normal restricted user, it was totally unresponsive and fully occupied one of the two CPUs.
Another restart didn't help either; the behaviour was the same.

After examining the stack trace I concluded that OSCache was hanging in AbstractConcurrentReadCache.persistStoreGroup() because of missing write rights:
the cache files had been created in the previous run with root as the owner.
After changing the owner of the cache files and restarting Tomcat, it worked well.


I think OSCache should fail fast in such a scenario instead of blocking forever. I.e. the while (groupFile.exists() && !groupFile.delete()) {} loop in AbstractDiskPersistenceListener.store() should be changed.
As a suggestion, I've changed it to the following, but haven't tested it yet:

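int count = 0;
// Retry the delete a bounded number of times instead of spinning forever.
while (file.exists() && !file.delete() && count < 3) {
    count++;
    try {
        Thread.sleep(100); // short pause before the next attempt
    } catch (InterruptedException ignore) {
        // interruption is deliberately ignored in this suggestion
    }
}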

Christoph Kutzinski - [23/Nov/05 04:34 AM ]
This is the stack trace I took.
IMO this part is the problem:

"Thread t@60: (state = IN_NATIVE)
 - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
 - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
 - com.opensymphony.oscache.base.algorithm.AbstractConcurrentReadCache.persistStoreGroup(java.lang.String, java.util.Set) @bci=80, line=1136 (Interpreted frame)
 - com.opensymphony.oscache.base.algorithm.AbstractConcurrentReadCache.addGroupMappings(java.lang.String, java.util.Set, boolean, boolean) @bci=142, line=1532 (Interpreted frame)"

All other calls to AbstractConcurrentReadCache.put are blocked because thread 60 is still inside the synchronized(this) block in that method.

Note: I think the stack trace reported by jstack for thread 60 isn't complete, as the thread must actually be in AbstractDiskPersistenceListener.storeGroup. There is no file operation in AbstractConcurrentReadCache.persistStoreGroup.
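
To illustrate why one spinning thread stalls every other caller, here is a small standalone sketch. It is not OSCache code: the class MonitorBlockDemo and its methods are hypothetical stand-ins, with holdLock() playing the thread stuck in the delete loop and put() playing any other caller that needs the same monitor.

```java
import java.util.concurrent.CountDownLatch;

/** Standalone illustration; all names here are hypothetical, not OSCache code. */
final class MonitorBlockDemo {

    private final Object lock = new Object();
    private final CountDownLatch holderInside = new CountDownLatch(1);
    private volatile boolean release = false;

    /** Stands in for the thread that spins while holding the monitor. */
    void holdLock() {
        synchronized (lock) {
            holderInside.countDown();
            while (!release) {
                Thread.yield(); // busy loop, like a delete that never succeeds
            }
        }
    }

    /** Stands in for any other caller that needs the same monitor. */
    void put() {
        synchronized (lock) {
            // a real cache would update its state here
        }
    }

    /** Returns the state of a second thread while the first still owns the monitor. */
    Thread.State observeSecondThread() throws InterruptedException {
        Thread holder = new Thread(this::holdLock, "holder");
        Thread waiter = new Thread(this::put, "waiter");
        holder.start();
        holderInside.await(); // the holder now owns the monitor
        waiter.start();

        // Poll until the waiter is parked on the monitor (give up after 5 seconds).
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (waiter.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        Thread.State observed = waiter.getState();

        release = true; // let the holder leave the synchronized block
        holder.join();
        waiter.join();
        return observed;
    }
}
```

The second thread stays in state BLOCKED for as long as the first one spins, and because a synchronized block has no timeout, nothing short of the holder leaving the block frees it.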


Lars Torunski - [24/Nov/05 12:44 AM ]
How should OSCache react if the file can't be updated? The old and stale content is still available.

Christoph Kutzinski - [24/Nov/05 02:41 AM ]
OSCache should throw an exception stating that the file cannot be written. That's what happens if you change the loop as I suggested.
This is clearly a fatal error, as OSCache can do nothing to resume normal operation. Therefore OSCache should throw an exception, and the application should catch it, recognize that the cache content cannot be retrieved, and handle this appropriately (serve newly generated content, abort with a fatal error, ...).
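
A minimal sketch of that fail-fast behaviour, under stated assumptions: this is not the fix that went into OSCache, and the class FailFastDelete, the method deleteOrFail, and its retry parameters are hypothetical names chosen for illustration. It bounds the delete attempts and throws an IOException once they are used up, so the caller sees the problem instead of hanging.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper for illustration; not part of OSCache. */
final class FailFastDelete {

    private FailFastDelete() {}

    /**
     * Tries to delete the file up to maxAttempts times, pausing between attempts.
     * Returns normally if the file is gone; throws IOException otherwise.
     */
    static void deleteOrFail(File file, int maxAttempts, long pauseMillis) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!file.exists() || file.delete()) {
                return; // nothing left to delete
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt status
                    break; // stop retrying and report the failure
                }
            }
        }
        throw new IOException("Could not delete " + file.getAbsolutePath()
                + " (check the owner and write permissions of the cache directory)");
    }
}
```

A caller could wrap the call in a try/catch for IOException, log the message, and fall back to serving newly generated content.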

IMO nearly any other behaviour is better than deadlocking the complete cache, because with a deadlock you can only find out what is wrong by taking stack traces (or something similar). If you fail fast, you at least have the exception - which will hopefully be logged :-) - and can probably deduce from it that the write permissions are not sufficient.