README.txt

This Poller class demonstrates access to poll(2) functionality in Java.

Requires Solaris production (native threads) JDK 1.2 or later; currently
the C code compiles only on Solaris (SPARC and Intel).

Poller.java is the class, Poller.c is the supporting JNI code.

PollingServer.java is a sample application which uses the Poller class
to multiplex sockets.

SimpleServer.java is the functional equivalent that does not multiplex
but uses a single thread to handle each client connection.

Client.java is a sample application to drive against either server.

To build the Poller class and client/server demo :

    javac PollingServer.java Client.java
    javah Poller
    cc -G -o libpoller.so -I ${JAVA_HOME}/include -I ${JAVA_HOME}/include/solaris \
       Poller.c

You will need to set the environment variable LD_LIBRARY_PATH to search
the directory containing libpoller.so.

To use client/server, bump up your fd limit to handle the connections you
want (need root access to go beyond 1024).  For info on changing your file
descriptor limit, type "man limit".  If you are using Solaris 2.6
or later, a regression in loopback read() performance may hit you at low
numbers of connections, so run the client on another machine.

BASICs of Poller class usage :

    run "javadoc Poller" or see Poller.java for more details.

    {
      Poller Mux = new Poller(65535); // allow it to contain 64K IO objects

      int fd1 = Mux.add(socket1, Poller.POLLIN);
      ...
      int fdN = Mux.add(socketN, Poller.POLLIN);

      int[] fds = new int[100];
      short[] revents = new short[100];

      int numEvents = Mux.waitMultiple(100, fds, revents, timeout);

      for (int i = 0; i < numEvents; i++) {
        /*
         * Probably need a more sophisticated mapping scheme than this!
         */
        if (fds[i] == fd1) {
          System.out.println("Got data on socket1");
          socket1.getInputStream().read(byteArray);
          // Do something based upon state of fd1 connection
        }
        ...
      }
    }

Poller class implementation notes :

Currently all add(), remove(), isMember(), and waitMultiple() methods
are synchronized for each Poller object.  If one thread is blocked in
pObj.waitMultiple(), another thread calling pObj.add(fd) will block
until waitMultiple() returns.  There is no provided mechanism to
interrupt waitMultiple(), as one might expect a ServerSocket to be in
the list waited on (see PollingServer.java).

One might also need to interrupt waitMultiple() to remove()
fds/sockets, in which case one could create a Pipe or loopback localhost
connection (at the level of PollingServer) and use a write() to that
connection to interrupt.  Or, better, one could queue up deletions
until the next return of waitMultiple().  Or one could implement an
interrupt mechanism in the JNI C code using a pipe(), and expose that
at the Java level.
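As an illustration of the queued-deletion idea, a wrapper along the
following lines could be used.  This sketch is not part of the demo:
the QueuedPoller class and its requestRemove()/waitAndApply() methods
are made-up names, and it assumes remove() accepts the same object that
was passed to add() and that waitMultiple()'s timeout is a long.

    import java.util.Vector;

    /*
     * Hypothetical wrapper: removals requested while another thread is
     * blocked in waitMultiple() are only recorded here; the polling
     * thread applies them as soon as waitMultiple() returns, so no
     * caller ever blocks on the Poller's own lock.
     */
    class QueuedPoller {
        private Poller poller;
        private Vector pendingRemovals = new Vector();

        QueuedPoller(int maxIO) throws Exception {
            poller = new Poller(maxIO);
        }

        // May be called from any thread; never touches the Poller itself.
        synchronized void requestRemove(Object sock) {
            pendingRemovals.addElement(sock);
        }

        // Called only by the thread that runs the poll loop.  Events
        // reported in fds[]/revents[] for a socket already queued for
        // removal should simply be ignored by the caller.
        int waitAndApply(int max, int[] fds, short[] revents, long timeout)
                throws Exception {
            int n = poller.waitMultiple(max, fds, revents, timeout);
            synchronized (this) {
                for (int i = 0; i < pendingRemovals.size(); i++) {
                    poller.remove(pendingRemovals.elementAt(i));
                }
                pendingRemovals.removeAllElements();
            }
            return n;
        }
    }

Note that nothing here interrupts a waitMultiple() that is already
blocked; for that, one of the pipe()/loopback schemes described above
is still needed.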
If frequent deletions/re-additions of socks/fds are to be done with
very large sets of monitored fds, the Solaris 7 kernel cache will
likely perform poorly without some tuning.  One could differentiate
between deleted (no longer cared for) fds/socks and those that are
merely being disabled while data is processed on their behalf.  In
that case, re-enabling a disabled fd/sock could put it back in its
original position in the poll array, thereby improving kernel cache
performance.  This would best be done in Poller.c.  Of course this
is not necessary for optimal /dev/poll performance.

Caution...the next paragraph gets a little technical for the
benefit of those who already understand poll()ing fairly well.  Others
may choose to skip over it to read notes on the demo server.

An optimal solution for frequent enabling/disabling of socks/fds
could involve a separately synchronized structure of "async"
operations.  Using a simple array (0..64k) containing the action
(ADD, ENABLE, DISABLE, NONE), the events, and the index into the poll
array, and having nativeWait() wake up in the poll() call periodically
to process these async operations, I was able to speed up performance
of the PollingServer by a factor of 2x at 8000 connections.  Of course
much of that gain was from the fact that I could (with the advent of
an asyncAdd() method) move the accept() loop into a separate thread
from the main poll() loop, and avoid the overhead of calling poll()
with up to 7999 fds just for an accept.  In implementing the async
Disable/Enable, a further large optimization was to auto-disable fds
with events available (before returning from nativeWait()), so I could
just call asyncEnable(fd) after processing (read()ing) the available
data.  This removed the need for the inefficient gang-scheduling the
attached PollingServer uses.  In order to separately synchronize the
async structure, yet still be able to operate on it from within
nativeWait(), synchronization had to be done at the C level here.  Due
to the new complexities this introduced, as well as the fact that it
was tuned specifically for Solaris 7 poll() improvements (not
/dev/poll), this extra logic was left out of this demo.
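That logic is not included here, but the bookkeeping it describes might
look roughly like the following Java-level sketch.  In the experiment
above the table lived in C inside Poller.c and was drained from within
nativeWait(); the AsyncOps class, its post()/drain() methods, and the
PollArray interface below are hypothetical stand-ins for that C code
(only asyncAdd()/asyncEnable()/asyncDisable() are names from the text).

    /*
     * One slot per fd (0..64k) holding the pending action, the events,
     * and the saved index into the poll array.  Separately synchronized,
     * so posting an operation never waits for poll() to return.
     */
    class AsyncOps {
        static final byte NONE = 0, ADD = 1, ENABLE = 2, DISABLE = 3;

        interface PollArray {                    // stand-in for Poller.c
            void add(int fd, short events);
            void enable(int fd, int pollIndex);  // restore original slot
            void disable(int fd, int pollIndex); // stop polling, keep slot
        }

        private byte[]  action;                  // pending action per fd
        private short[] events;                  // events for ADD/ENABLE
        private int[]   pollIndex;               // saved poll array position

        AsyncOps(int maxFd) {
            action    = new byte[maxFd + 1];
            events    = new short[maxFd + 1];
            pollIndex = new int[maxFd + 1];
        }

        // Called from other threads, e.g. an accept() thread doing an
        // asyncAdd(), or a worker doing asyncEnable()/asyncDisable().
        synchronized void post(int fd, byte act, short ev, int idx) {
            action[fd]    = act;
            events[fd]    = ev;
            pollIndex[fd] = idx;
        }

        // Called periodically by the poll loop to fold pending
        // operations into the poll array, then clear them.
        synchronized void drain(PollArray pollArray) {
            for (int fd = 0; fd < action.length; fd++) {
                switch (action[fd]) {
                    case ADD:     pollArray.add(fd, events[fd]);        break;
                    case ENABLE:  pollArray.enable(fd, pollIndex[fd]);  break;
                    case DISABLE: pollArray.disable(fd, pollIndex[fd]); break;
                    default:      break;  // NONE: nothing pending
                }
                action[fd] = NONE;
            }
        }
    }

The important property is that post() never touches the poll array
itself, so (for example) an accept() thread is never serialized behind
a poll() covering thousands of fds.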
Client/Server Demo Notes :

Do not run the sample client/server with high numbers of connections
unless you have a lot of free memory on your machine, as it can saturate
the CPU and lock you out of CDE just by its very resource-intensive nature
(much more so the SimpleServer than the PollingServer).

Different OS versions will behave very differently as far as poll()
performance (or /dev/poll existence) goes but, generally, real world
applications "hit the wall" much earlier when a separate thread is used
to handle each client connection.  Issues of thread synchronization and
locking granularity become performance killers.  There is some overhead
associated with multiplexing, such as keeping track of the state of each
connection; as the number of connections gets very large, however, this
overhead is more than made up for by the reduced synchronization overhead.

As an example, running the servers on a Solaris 7 PC (Pentium II-350 x 2
CPUs) with 1 GB RAM, and the client on an Ultra-2, I got the following
times (shorter is better) :

  1000 connections :

    PollingServer took 11 seconds
    SimpleServer took 12 seconds

  4000 connections :

    PollingServer took 20 seconds
    SimpleServer took 37 seconds

  8000 connections :

    PollingServer took 39 seconds
    SimpleServer took 1 minute 48 seconds

This demo is not, however, meant to be considered some form of proof
that multiplexing with the Poller class will gain you performance; this
code is actually very heavily biased towards the non-polling server, as
very little synchronization is done and most of the overhead is in the
kernel IO for both servers.  Use of multiplexing may be helpful in
many, but certainly not all, circumstances.

Benchmarking a major Java server application which can run
in a single-thread-per-client mode or using the new Poller class showed
Poller provided a 253% improvement in throughput at a moderate load, as
well as a 300% improvement in peak capacity.  It also yielded a 21%
smaller memory footprint at the lower load level.

Finally, there is code in Poller.c to take advantage of /dev/poll
on OS versions that have that device; however, DEVPOLL must be defined
when compiling Poller.c (and it must be compiled on a machine with
/usr/include/sys/devpoll.h) to use it.  Code compiled with DEVPOLL
turned on will still work on machines that don't have kernel support for
the device, as it will fall back to using poll() in those cases.
Currently /dev/poll does not correctly return an error if you attempt
to remove() an object that was never added, but this should be fixed
in an upcoming /dev/poll patch.  The binary as shipped is not built with
/dev/poll support, as our build machine does not have devpoll.h.
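On a machine that does have /usr/include/sys/devpoll.h, the compile line
from the build instructions above just needs DEVPOLL defined, e.g. :

    cc -DDEVPOLL -G -o libpoller.so -I ${JAVA_HOME}/include -I ${JAVA_HOME}/include/solaris \
       Poller.c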