Tracing slow Java I/O in Solaris 10
Solution 1
If you're using Solaris, you're lucky enough to have dtrace available. It lets you profile down to the kernel level and gather more clues about how the JVM is interacting with the kernel.
More information here
http://www.devx.com/Java/Article/33943
If you want to find out what your JVM is doing, run dtrace with the JVM probes:
http://java.sun.com/javase/6/docs/technotes/guides/vm/dtrace.html
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Java#DTrace_Topics:_Java
These will give you much more meaningful output in relation to your program; have a look at the 'Method Times' section.
http://www.princeton.edu/~unix/Solaris/troubleshoot/diskio.html
is a great guide for finding I/O bottlenecks.
This may also help: http://prefetch.net/blog/index.php/2007/03/03/viewing-busy-code-paths-with-dtrace/
There are no hard-and-fast rules when tracking down problems like this, but information is key! If you follow these guides, you are well on the way to becoming a ninja system engineer.
You may also be able to use JProfiler: http://www.ej-technologies.com/products/jprofiler/overview.html
It's not open source, but I've had great success tracking down Java performance problems with it.
You should also run your Java VM and your application with full debug logging. If you have access to the kernel logs, check them for unusual events.
Good luck.
Does anyone else on Server Fault have ninja tips for fault-finding problems like this? I've got my own way of debugging, but it would be interesting to know what others think.
-----------------UPDATE--------------
I've looked at the trace again; it seems you are doing a lot of resolvepath calls.
This needs to be fixed. Are you using very long paths, or running from a directory that is symlinked? Try removing the symlinks, or using hard links, and see if you get better results.
If the app does lots of directory traversal, try using absolute paths rather than relative paths.
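To illustrate the symlink point with a hypothetical, self-contained sketch (the directory names are invented for the example): File.getCanonicalPath() must resolve every symlink component in a path, and on Solaris each such resolution shows up as a resolvepath syscall, so a symlinked working directory or classpath entry multiplies that cost.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SymlinkCost {
    public static void main(String[] args) throws IOException {
        // Create a real directory and a symlink pointing at it
        // (hypothetical names, purely for illustration).
        Path realDir = Files.createTempDirectory("realdir");
        Path link = Paths.get(realDir.getParent().toString(),
                "linkdir-" + System.nanoTime());
        Files.createSymbolicLink(link, realDir);

        File viaLink = new File(link.toString());
        // getAbsolutePath() is cheap: it does not touch the symlink.
        System.out.println("abs:   " + viaLink.getAbsolutePath());
        // getCanonicalPath() resolves every link component; on Solaris
        // this is where the resolvepath syscalls come from.
        System.out.println("canon: " + viaLink.getCanonicalPath());

        Files.delete(link);
        Files.delete(realDir);
    }
}
```

If the app (or the class loader on its behalf) canonicalizes paths repeatedly, pointing it at the real, link-free directory removes that work entirely.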
Again, just an idea I had while eating a cheesecake. Run dtrace with the Java probes; that should drill down enough to show which Java classes/methods are taking the most time.
Good luck (again). Don't give up now; I think we are very close to a solution.
Solution 2
On your Solaris system, just run: sys_diag -G -I1 -l -v
It will aggregate all the performance information (CPU/memory/network/disk I/O/dtrace/kernel...) and analyze the output into a single color-coded .html report of findings/bottlenecks, characterizing the workload by subsystem. It will show any and all bottlenecks, as well as any locking that might be occurring (lockstat, ...). The latest version is v8.1. HTH.
Solution 3
Just to post the final result.
The server (a Sun T-series) is simply that slow for a single-threaded Java application; the T-series CMT processors are designed for throughput across many threads at the expense of single-thread performance. Go figure.
Thank you all for the answers and support.
Solution 4
Your dtrace output shows that your application spends most of its time on writes. You should check (using dtrace, of course :-)) where these writes go. If they go to a Solaris filesystem, then you should check whether the filesystem is the bottleneck.
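One plausible source of the thousands of write calls in a program like the traced test case (this is an assumption, not something the trace alone proves): unbuffered System.out.println calls, roughly one write syscall per printed line. Wrapping stdout in a buffered stream collapses many println calls into few syscalls; a minimal sketch:

```java
import java.io.BufferedOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;

public class BufferedStdout {
    public static void main(String[] args) {
        // Replace default stdout with a 64 KB buffered stream and
        // autoFlush disabled, so many println() calls accumulate in
        // the buffer instead of each issuing its own write() syscall.
        PrintStream buffered = new PrintStream(
                new BufferedOutputStream(
                        new FileOutputStream(FileDescriptor.out), 64 * 1024),
                false);
        System.setOut(buffered);

        for (int i = 0; i < 1000; i++) {
            System.out.println("OK class" + i);
        }
        buffered.flush(); // remember to flush before exiting
    }
}
```

Re-running procsystime after a change like this would show whether the write count actually drops, or whether the writes come from somewhere else.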
fglez
Updated on September 17, 2022

Comments
-
fglez over 1 year
We have a Java application which is significantly slower on a Solaris 10 server than on a Windows PC.
We have profiled (-Xprof) the application and observed that the UnixFileSystem.getBooleanAttributes0 method consumes about 40% of CPU time in native calls.
How can we continue our investigation to identify the cause of the slow behaviour?
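For context on what that native method is (an illustration, not part of the original question's code): getBooleanAttributes0 is the native call backing the java.io.File attribute checks such as exists(), isFile() and isDirectory(), and the class loader issues such a check for each classpath entry it searches. A minimal sketch:

```java
import java.io.File;
import java.io.IOException;

public class AttributeCheck {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("probe", ".tmp");

        // On Solaris/Linux, each of these calls goes through one native
        // UnixFileSystem.getBooleanAttributes0 call, i.e. roughly one
        // stat()-style syscall per check.
        System.out.println("exists:      " + f.exists());
        System.out.println("isFile:      " + f.isFile());
        System.out.println("isDirectory: " + f.isDirectory());

        // A class loader searching N classpath entries can perform up
        // to N such checks for every class it has to look up.
        f.delete();
    }
}
```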
Update:
We have executed dtrace and we observe that the CPU time is 0.489 s, while the time results are:

real    0m40.759s
user    0m47.669s
sys     0m2.287s

It seems the time is spent not on CPU but somewhere else. How can we proceed with further investigation given these dtrace results? Can we use other dtrace options to show more information? Thank you very much.
# /dtrace-toolkit/DTraceToolkit-0.99/procsystime -aTp 5695
Tracing... Hit Ctrl-C to end...
dtrace: 1032 dynamic variable drops with non-empty dirty list
^C

Elapsed Times for PID 5695,
         SYSCALL          TIME (ns)
           gtime               7799
       sysconfig               8018
        lwp_kill              10105
     getsockname              10215
      systeminfo              10984
       getrlimit              11093
        lwp_self              20210
        schedctl              26361
    lwp_continue              27349
     priocntlsys              33390
          unlink              53050
     lwp_sigmask             136303
      setcontext             143447
   so_socketpair             202427
      lwp_create             223626
             brk             227141
       so_socket             263276
         memcntl             461091
           ioctl            1094955
lwp_cond_broadcast           1946728
lwp_mutex_wakeup            2579492
 lwp_cond_signal            2903277
            mmap            3082973
          access            4446250
        mprotect            4534229
      getdents64            5713309
          mmap64            6272480
            fsat            6925239
            stat            6926115
            open            8324983
           fcntl            8840868
           close           11264404
          llseek           11678791
          munmap           11867401
         fstat64           17852129
        lwp_park           22098570
          open64           23549922
          stat64           39673214
            read           41876097
     resolvepath           81905859
           yield          175946072
           write          225507878
lwp_mutex_timedlock        887174392
         pollsys        20393318412
   lwp_cond_wait       345099369278
          TOTAL:       367108549202

CPU Times for PID 5695,
         SYSCALL          TIME (ns)
           gtime               1208
       sysconfig               1647
        lwp_self               2198
      systeminfo               2856
        lwp_kill               3186
     getsockname               3625
       getrlimit               4613
    lwp_continue              13839
        schedctl              14059
     priocntlsys              14498
     lwp_sigmask              20541
          unlink              45801
      setcontext              96874
             brk             121804
        lwp_park             158601
   so_socketpair             195178
      lwp_create             209017
       so_socket             241199
         memcntl             365972
           ioctl             606397
lwp_cond_broadcast           1367556
lwp_mutex_wakeup            1561103
 lwp_cond_signal            1803831
          access            1885436
            mmap            2655710
          llseek            3266410
            open            3614795
            fsat            4043369
      getdents64            4253373
        mprotect            4345864
           close            4547395
           fcntl            4871095
            stat            5183012
          mmap64            5308789
         fstat64            8141116
          munmap           10774529
         pollsys           11928811
lwp_mutex_timedlock         19212980
          open64           19443958
   lwp_cond_wait           23144761
            read           28043647
          stat64           31710269
     resolvepath           70641662
           yield           77731698
           write          137678654
          TOTAL:          489282936

Syscall Counts for PID 5695,
         SYSCALL              COUNT
       getrlimit                  1
     getsockname                  1
           gtime                  1
        lwp_kill                  1
           rexit                  1
   so_socketpair                  1
       sysconfig                  1
      systeminfo                  1
          unlink                  1
    lwp_continue                  2
      lwp_create                  2
        schedctl                  2
        lwp_self                  3
     priocntlsys                  3
       so_socket                  3
        lwp_exit                  4
      setcontext                  7
        lwp_park                 11
         memcntl                 14
             brk                 16
     lwp_sigmask                 19
        mprotect                 25
            mmap                 56
          access                 67
lwp_cond_broadcast                86
           ioctl                 87
            open                100
            fsat                129
          mmap64                133
lwp_mutex_wakeup                 148
          munmap                153
 lwp_cond_signal                162
      getdents64                224
            stat                283
         pollsys                348
   lwp_cond_wait                537
lwp_mutex_timedlock              558
          open64                590
           fcntl                625
           close                777
          stat64               1146
          llseek               1368
            read               1404
         fstat64               1559
     resolvepath               1644
           yield              10049
           write              13416
          TOTAL:              35769
Update 2:
Here is the source of the dtraced Java application. It just loads several thousand classes from one single .jar file, reproducing the same behaviour observed in a bigger app.
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Properties;

public class TestCL {

    public static void main(String[] args) {
        Properties properties = new Properties();
        InputStream in = TestCL.class.getClassLoader()
                .getResourceAsStream("TestCL.properties");
        if (in == null) {
            System.err.println("TestCL.properties not found");
            System.exit(-1);
        }
        try {
            properties.load(in);
        } catch (IOException e) {
            System.err.println("TestCL.properties loading error");
            e.printStackTrace();
            System.exit(-2);
        } finally {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        Enumeration e = properties.keys();
        while (e.hasMoreElements()) {
            String className = (String) e.nextElement();
            try {
                Class.forName(className);
                System.out.println("OK " + className);
            } catch (Throwable t) {
                System.err.println("KO " + className + " "
                        + t.getClass().getName() + " " + t.getMessage());
            }
        }
    }
}
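For completeness: the program reads TestCL.properties only for its keys, which the loop treats as fully qualified class names, so a hypothetical input file (these class names are examples, not the original data) would look like:

```
java.lang.String
java.util.HashMap
javax.swing.JFrame
```

Each line is a properties key with an empty value, which is enough for properties.keys() to enumerate the class names.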
Update 3:
All classes are packed in a single .jar file. We have also run the test with all classes unpacked, and there is no significant improvement in performance.
java -cp library.jar:. TestCL
-
user3789902 about 14 years: Could you configure dtrace to be more verbose about the resolvepath calls? If you have truss, you can get output like: resolvepath("/export/home/sean/getlock", "/export/home/sean/getlock", 1023) = 25. Please post this.
-
fglez about 14 years: Just for updating, this problem is now in Sun's hands. I'll try to post the results.
-
fglez about 14 years: Thanks for your response. We'll try to get permissions for running dtrace.
-
fglez about 14 years: Updated question with dtrace results.
-
user3789902 about 14 years: I've updated with probes specific to the Java VM. This should give you more clues. Don't forget JProfiler either. Personally I'm a ninja at tracking down bugs and performance problems like this. I wish I was there to help you :-(
-
fglez about 14 years: The dtraced application is a little Java program which loads (Class.forName) several thousand classes from one .jar. It reproduces the same problem we have in a bigger application.
-
user3789902 about 14 years: Right, so it's a performance bottleneck in Java's class loader. Things to try: Could you provide a dump of your $CLASSPATH? Is it large? (In the java tool, the -classpath option is a shorthand way to set the java.class.path property.) Can you place the classes to be loaded on a RAM disk and rerun the tests? For some reason the class loader is doing lots of path resolution. You may be able to force Java to use a more direct path; make sure you're using full paths like /a/b/c/d rather than ../../../a/b/c/d or a/b/c/. Are your classes mounted on NFS?
-
user3789902 about 14 years