Multiple Threads in Notes Agents
OK, I'm biased. I think Java is the best language on the market today. But why should you use it when LotusScript achieves the same purpose? The answer lies in the many advanced features of the language. One of these features, which I will discuss in this article, is multi-threading.
A multi-threaded application is one that is divided into many smaller parts (or threads) by the programmer when writing the code. Each thread typically performs its own logical task within the program. The Lotus Domino server is an example of a multi-threaded application. Each server task (e.g. HTTP, Agent Manager) runs in its own thread distinct from the main server's code. The main benefit of this multi-threaded approach is that if one thread is waiting for a resource to become available, the other threads can carry on working.
A project I was working on recently involved uploading a vast quantity of records from a Notes database to a relational database. The first draft of the scheduled agent was written in LotusScript using ODBC. Normally this approach would work fine, but in this scenario the entire upload had to occur overnight. We were expecting approximately 100,000 records, and with an upload speed of one per second that equated to a total elapsed time of around 28 hours (100,000 seconds). Much too long. The task was a simple push to the relational database, so it lent itself very nicely to a multi-threaded Java solution.
A multi-threaded program works best when each logical unit of work does not rely on the completion of another. There are three important things to consider in a multi-threaded program:
- A mechanism to limit the total number of threads
- Inter-thread communications
- Serialising access to resources
The total number of threads should be limited, otherwise the server will very quickly fill up with thousands of threads. Each thread requires some system resources and execution time on the processor. Multiply this by thousands and the performance gained by multi-threading begins to diminish.
The second factor to consider is inter-thread communications. A typical multi-threaded agent will consist of a main program loop with many child threads performing the work. At the most basic level, each child thread requires a mechanism to tell the parent thread that it has completed. Often child threads will also want to access properties in the parent.
The third, and the most difficult to code, is serialising access to objects and resources. To maintain data integrity you must ensure that only one thread at a time touches shared data. Again Java makes this easy through the 'synchronized' keyword, which tells the JVM to allow only one thread at a time into the protected code. The difficulty is that synchronisation carries a small performance overhead, so you should synchronise only the critical code sections, and in a large program these may be difficult to identify.
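To make "only synchronise the critical sections" concrete, here is a minimal sketch. ResultLog and its members are hypothetical names invented for this illustration and are not part of the agent described in this article. The per-record work stays outside the lock, and only the update of the shared list is protected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class ResultLog {
    private final List<String> m_entries = new ArrayList<>();

    void process(String sRecord) {
        // Per-record work touches no shared state, so it stays outside the lock.
        String sFormatted = sRecord.trim().toUpperCase();

        // Only the update of the shared list is the critical section.
        synchronized (m_entries) {
            m_entries.add(sFormatted);
        }
    }

    int size() {
        synchronized (m_entries) {
            return m_entries.size();
        }
    }

    // Runs iThreads workers that each log iPerThread records; returns the total logged.
    static int demo(int iThreads, int iPerThread) {
        ResultLog log = new ResultLog();
        Thread[] workers = new Thread[iThreads];
        for (int i = 0; i < iThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iPerThread; j++) {
                    log.process(" record ");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return log.size();
    }
}
```

Without the synchronized block, concurrent add() calls on a plain ArrayList could lose entries or corrupt the list.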
Java makes creating new threads a snap. Simply create a new class that extends NotesThread, then write the code your thread should execute in the runNotes() method. Executing the thread is as simple as creating an instance of your class and calling its start() method from the parent thread. The thread is considered dead and may be garbage collected once the run method exits.
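The pattern looks roughly like the sketch below. So that it runs outside Domino, it extends plain java.lang.Thread and overrides run(); in a real Notes agent the class would extend lotus.domino.NotesThread and the body would go in runNotes() instead. HelloWorker and runOnce are hypothetical names used only for this example.

```java
// Stand-in sketch: in a real Notes agent this class would extend
// lotus.domino.NotesThread and the work would live in runNotes().
class HelloWorker extends Thread {
    private final int m_iId;
    private volatile String m_sResult;

    HelloWorker(int iId) {
        m_iId = iId;
    }

    @Override
    public void run() {
        // The code the thread should execute goes here.
        m_sResult = "worker " + m_iId + " done";
    }

    String getResult() {
        return m_sResult;
    }

    // Starts one worker from the calling (parent) thread, waits for it
    // to finish and returns what it produced.
    static String runOnce(int iId) {
        HelloWorker worker = new HelloWorker(iId);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.getResult();
    }
}
```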
And now, on to the code:
The approach I have taken is one of simplicity. This is partly because I'm not very bright and get confused easily, and partly because the code will be maintained by someone else who may not have intimate application knowledge. There are only two classes (excluding the JDBC and specific library classes). The first class is the main controlling class which spawns the worker threads. The second class is the worker thread, of which there are many at run-time. Keep in mind that the code examples have had some big chunks removed in order to improve readability. In addition there are two classes, m_wcSystem and m_err, that come from a library. Just ignore references to these as they have no real bearing on the exercise. Note also that the DB2 JDBC driver jar file is attached to the agent by clicking the "Edit Project" button. As we can't show the full source, the following describes the three techniques we mentioned earlier. For your reference there is a full copy of the source code attached to this document.
1. A mechanism to limit the total number of threads
To limit the total number of threads running I have implemented a simple while loop. While there are more threads than the threshold (specified by the constant MAX_THREADS), the code stays "blocked" in a loop. When the worker thread completes, it decrements m_iThreadCount, effectively releasing the loop.
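A minimal sketch of such a throttling loop follows. MAX_THREADS and m_iThreadCount are the names used in the text; the class name, the value of 4, the sleep intervals and the peak counter are assumptions made for this example, and the "work" is only a short sleep.

```java
// Hypothetical controller, for illustration only.
class UploadController {
    static final int MAX_THREADS = 4;   // assumed limit
    private int m_iThreadCount = 0;
    private int m_iPeak = 0;            // highest count seen, kept only for the demo

    synchronized void incrementThreadCount() {
        m_iThreadCount++;
        m_iPeak = Math.max(m_iPeak, m_iThreadCount);
    }

    synchronized void decrementThreadCount() {
        m_iThreadCount--;
    }

    synchronized int getThreadCount() {
        return m_iThreadCount;
    }

    synchronized int getPeak() {
        return m_iPeak;
    }

    private static void pause(long lMillis) {
        try {
            Thread.sleep(lMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Spawns one worker per record and returns the peak number of
    // workers that were counted as running at the same time.
    int processAll(int iRecords) {
        for (int i = 0; i < iRecords; i++) {
            // Stay "blocked" here while the limit has been reached.
            while (getThreadCount() >= MAX_THREADS) {
                pause(5);
            }
            incrementThreadCount();
            new Thread(() -> {
                try {
                    pause(2); // stand-in for the real upload work
                } finally {
                    decrementThreadCount(); // releases the loop above
                }
            }).start();
        }
        // Wait for the last workers before returning.
        while (getThreadCount() > 0) {
            pause(5);
        }
        return getPeak();
    }
}
```

Because only the controlling thread increments the counter, the count can never exceed MAX_THREADS.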
2. Inter-thread communications
The constructor for the worker thread takes a reference to the parent object as a parameter. This means we can access any of the public methods or properties in the parent object from the worker thread.
When the thread has completed (i.e. as the last line in the runNotes() method completes) we decrement the thread counter in the parent. This allows the main loop in the parent thread to create a new worker thread and start it.
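The two ideas above can be sketched together as follows. InsertThread is the worker class name from the attached source; ParentAgent, getTargetTable(), runOne() and the string "work" are hypothetical stand-ins, and plain Thread with run() is used in place of NotesThread with runNotes() so the sketch runs outside Domino.

```java
// Hypothetical parent, for illustration only.
class ParentAgent {
    private int m_iThreadCount = 0;

    synchronized void incrementThreadCount() {
        m_iThreadCount++;
    }

    synchronized void decrementThreadCount() {
        m_iThreadCount--;
    }

    synchronized int getThreadCount() {
        return m_iThreadCount;
    }

    // Hypothetical property that workers read from the parent.
    String getTargetTable() {
        return "ORDERS";
    }

    // Starts one worker for the record, waits for it and returns its outcome.
    String runOne(String sRecord) {
        incrementThreadCount();
        InsertThread worker = new InsertThread(this, sRecord);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.getOutcome();
    }
}

class InsertThread extends Thread {
    private final ParentAgent m_parent;
    private final String m_sRecord;
    private volatile String m_sOutcome;

    // The parent reference gives the worker access to the parent's public methods.
    InsertThread(ParentAgent parent, String sRecord) {
        m_parent = parent;
        m_sRecord = sRecord;
    }

    @Override
    public void run() {
        try {
            m_sOutcome = "insert " + m_sRecord + " into " + m_parent.getTargetTable();
        } finally {
            // Last step: tell the parent this worker has finished.
            m_parent.decrementThreadCount();
        }
    }

    String getOutcome() {
        return m_sOutcome;
    }
}
```

Putting the decrement in a finally block is a choice made for this sketch: it keeps the counter accurate even if the work throws an exception.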
3. Serialising access to resources
Instructing the JVM to serialise access to a method is as easy as adding the "synchronized" keyword to its declaration. Remember that there is a slight performance penalty incurred by adding this, so ensure you only synchronise the methods and objects that require it.
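As a minimal sketch, assuming a hypothetical RecordCounter class that several worker threads update, a synchronized method looks like this:

```java
// Hypothetical shared counter, for illustration only.
class RecordCounter {
    private int m_iInserted = 0;

    // Only one thread at a time may run a synchronized method on a given instance.
    synchronized void recordInserted() {
        m_iInserted++;
    }

    synchronized int getInserted() {
        return m_iInserted;
    }

    // Runs iThreads workers that each count iPerThread inserts; returns the total.
    static int countWithThreads(int iThreads, int iPerThread) {
        RecordCounter counter = new RecordCounter();
        Thread[] workers = new Thread[iThreads];
        for (int i = 0; i < iThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iPerThread; j++) {
                    counter.recordInserted();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.getInserted();
    }
}
```

Without the keyword, m_iInserted++ is a read, an add and a write, so two threads can overwrite each other's update and the total comes up short.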
A few known Issues:
The known issues with this code include:
1. The while loop does not trap thread timeout problems and may result in an infinite loop, and hence a hang. You could check against a timestamp and, if a certain period has elapsed, exit the loop.
2. Creating a new JDBC connection is an expensive operation. There are techniques for keeping JDBC connections in a connection pool, but these are well beyond the scope of this article.
3. Creating new threads is an expensive operation. Again there are techniques for thread pooling, which are also beyond the scope of this article.
4. The use of getNthDocument() while looping through very large collections (>2000 entries) is horrendously inefficient, and will cause the agent to go slower the longer it runs. It is better to use getFirstDocument() and then repeated calls to getNextDocument().
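For the first issue, the timestamp check could look something like the sketch below. TimedThrottle, waitForFreeSlot() and the 10 ms polling interval are hypothetical choices for this example, not part of the attached source.

```java
// Hypothetical helper, for illustration only.
class TimedThrottle {
    private int m_iThreadCount;

    TimedThrottle(int iInitialCount) {
        m_iThreadCount = iInitialCount;
    }

    synchronized int getThreadCount() {
        return m_iThreadCount;
    }

    synchronized void decrementThreadCount() {
        m_iThreadCount--;
    }

    // Returns true once fewer than iMaxThreads workers are running, or
    // false if that has not happened within lTimeoutMillis.
    boolean waitForFreeSlot(int iMaxThreads, long lTimeoutMillis) {
        long lDeadline = System.currentTimeMillis() + lTimeoutMillis;
        while (getThreadCount() >= iMaxThreads) {
            if (System.currentTimeMillis() >= lDeadline) {
                return false; // give up instead of looping forever
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The caller decides what a timeout means, for example logging the problem and ending the agent rather than hanging.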
Summary:
Now with this new-found knowledge have a look at the full source and see how it all fits together. Remember, don't write your code in Java just because it is a "new" language; always select the right tool for the job. There are two distinct advantages to using Java in Lotus Notes. First is the ability to use JDBC, which has much better functionality than the basic LotusScript ODBC LSX (and it's multi-platform!). Second is the performance benefit of using multithreaded agents.
Jake's Final Thoughts
I'm sure you'll all join me in thanking Brendon for this article. At last the idea of multi-threading an agent has been demystified. I for one am very grateful.
Some of you may be disappointed that this article is not like most on this site, where you have something that you can go away and "Plug 'n' Play" into your database. However, I hope you see the benefits to be gained by learning and, more importantly, using this technique.
About the Author:
Brendon Upson is a freelance consultant based in Sydney, Australia, specializing in C, Lotus Notes and Java. Likes: Hoegaarden, Guinness and sailboarding. Dislikes: Fax machines, acronyms (especially anything beginning with X) and Biere d'Alsace.
java vs lotusscript
It was quite inspiring to see how threading works in Java. However, what I wanted to know is whether there are other benefits to using Java over LS when writing agents. Is there a performance boost or other advantages? Please shed some further light on this.
Reply
Re: java vs lotusscript
This is a fun one to answer!
From what I understand both the LotusScript interpreter and JVM sit on top of the Domino native code. This means that when you call something like getFirstDocument() in either language, the actual binary code that gets executed is the same in both languages. Assuming the interpreter/JVM overhead is the same, there should be no performance difference between the two.
[---- Your object code (LS/Java) ----]
[**** OBJECT CODE INTERPRETER / JVM ****]
[#### Domino executables ####]
The single reason I use predominantly LS is that the knowledge pool is deeper, i.e. currently more people know LS (aka VB) than do Java. This may change in the future. From a support perspective this makes maintenance easier for others.
My pet hate is that LS is *NOT* a true object oriented language. It is basically structures with functions. Inheritance and overloading are the big two I feel are missing. This can make your LS code much less elegant than it could be in, say, Java.
My rule is: If you're doing simple stuff (getting and saving documents, etc), use LS. If you want to do something a bit more difficult that you cannot do without mangling your LS code, use Java.
Brendon.
Reply
It's a great issue, but ...
Could you post the code of the m_wcSystem and m_err objects? Please!
I think it would be good for learning more about Java and how to debug a Java agent.
thanks a lot ...
Reply
m_wcSystem & m_err
Discussion of these two classes was intentionally left out because it would complicate the matter.
Here's some background: Almost every application you write has the same core functions: error handling, application profile documents, ... These two classes are used to handle this reusable functionality. They exist in a library and are plugged into (almost) everything.
m_wcSystem: This handles commonly used functions.
m_err: This is the error handler. Essentially this writes events to the agent log. In LS there are different types of error handlers: UI, Agent and web. The idea is you pass a UI error object into the system object and suddenly all your error messages pop up via messageboxes in the UI. Need a scheduled agent? Just change the error handler to an agent error and everything goes to the agent log.
Brendon.
Reply
Very useful article
Will you write articles about using JSP and Servlets with Domino? It would be interesting and useful for those waiting for Rnext.
Reply
Re: Very useful article
Glad you like it.
Not sure if Brendon plans to, but I definitely will be writing more about JSP and Servlets as and when the release of RNext is upon us...
Jake -CodeStore
Reply
What about multithreaded LotusScript agents?
In R5, there is multithreading capability in LotusScript as well. How does this compare to multithreading support in Java?
Reply
Java in Domino primer?
Hi,
Does anyone here have a link to a good primer for learning how to use Java in Domino? Of course, there is the general Java tutorial at Sun's web site, but I was wondering if there was anything Lotus-specific out there. Thanks!
Reply
Re: Java in Domino primer?
www.tlcc.com has great courses - with documented code examples that work.
they have 2 for java, one for xml, and a couple for websphere as well.
Reply
Mutex?
Would a mutex be preferable to Thread.sleep in this scenario?
Reply
Re: Mutex?
There are a range of solutions to this problem, with Thread.sleep() being the simplest. Obviously for the *best* performance this is not optimal because no matter what, we always wait the specified sleep period.
Two additional solutions spring to mind:
1. Use wait() and notifyAll() for inter-thread communication
2. Roll your own thread manager
Attached is an article that describes a number of approaches: http://www.javaworld.com/javaworld/jw-11-1998/jw-11-toolbox_p.html
There is probably no single greatest solution. You need to look at the needs of the program and the environment in which it runs and implement the solution that best fits.
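As a rough sketch of the wait() and notifyAll() option, the parent can block until a worker signals that it has finished, instead of polling with Thread.sleep(). SlotGate and all of its members are hypothetical names for this example, and the worker's "work" is only a short sleep.

```java
// Hypothetical gate, for illustration only.
class SlotGate {
    private final int m_iMaxThreads;
    private int m_iThreadCount = 0;
    private int m_iPeak = 0; // highest count seen, kept only for the demo

    SlotGate(int iMaxThreads) {
        m_iMaxThreads = iMaxThreads;
    }

    // Called by the parent: blocks until a slot is free, then claims it.
    synchronized void acquire() throws InterruptedException {
        while (m_iThreadCount >= m_iMaxThreads) {
            wait(); // releases the lock while waiting
        }
        m_iThreadCount++;
        m_iPeak = Math.max(m_iPeak, m_iThreadCount);
    }

    // Called by a worker when it finishes: frees the slot and wakes the parent.
    synchronized void release() {
        m_iThreadCount--;
        notifyAll();
    }

    synchronized int getPeak() {
        return m_iPeak;
    }

    // Runs iJobs short jobs with at most iMaxThreads at once; returns the peak.
    static int runJobs(int iMaxThreads, int iJobs) {
        SlotGate gate = new SlotGate(iMaxThreads);
        Thread[] workers = new Thread[iJobs];
        try {
            for (int i = 0; i < iJobs; i++) {
                gate.acquire();
                workers[i] = new Thread(() -> {
                    try {
                        Thread.sleep(2); // stand-in for the real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        gate.release();
                    }
                });
                workers[i].start();
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gate.getPeak();
    }
}
```

The wait() call sits inside a while loop so that the condition is checked again after every wake-up.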
Brendon.
Reply
most efficient way to use a connection object?
from the quick look that i gave the code, it looks like the connection object is recreated for each thread. am i correct?
if so, then couldn't you have just created one connection object and passed it into the InsertThread constructor? or is that a no-no with db2?
the reason that i ask is that i wrote a similar agent that shares a thin connection object amongst 25 or so threads. it replicates over 200,000 records from oracle8i to domino 5.x. while it does run fast -- around 2.5 hours -- occasionally it seems to hang the amgr task. i was wondering if you had encountered something similar and that's why you did the connection object the way that you did.
see the following at bob's site:
http://www.looseleaf.net/Looseleaf/Forum.nsf/8178b1c14b1e9b6b8525624f0062fe9f/a6da545cd9e50e8785256b4c004dc061?OpenDocument
Reply
Re: most efficient way to use a connection object?
Official information on exactly what is 'correct' seems scant. I don't have the desire to dig into the java SDK source for information on the Connection object :-)
From my work with JDBC, it has always been safest (but not fastest) to ensure a connection is only used by one thread at a time. If you are sharing the Connection object amongst threads, then you need to make sure that access to the Connection object is serial. The other (easier) way is to create a new Connection for each thread.
The project I am working on at the moment maintains a Connection pool. The way this works is that the thread gets a free Connection object, does some work with it, then releases it back to the pool. For performance this is definitely the way to do it, otherwise you have the latency of connecting with the RDBMS every time you create a new Connection.
Here's an article on connection pooling: http://www.webdevelopersjournal.com/columns/connection_pool.html
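The get, use and release cycle described above can be sketched with a blocking queue. This is a generic stand-in and not the pool from that project: SimplePool, borrow() and giveBack() are hypothetical names, and in a real agent the pooled type would be java.sql.Connection with a factory that opens connections to the RDBMS.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size pool, for illustration only.
class SimplePool<T> {
    private final BlockingQueue<T> m_free;

    // Creates all pooled resources up front using the supplied factory.
    SimplePool(int iSize, Supplier<T> factory) {
        m_free = new ArrayBlockingQueue<>(iSize);
        for (int i = 0; i < iSize; i++) {
            m_free.add(factory.get());
        }
    }

    // Blocks until a resource is free, then hands it to the calling thread.
    T borrow() {
        try {
            return m_free.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    // Returns a borrowed resource so another thread can use it.
    void giveBack(T resource) {
        m_free.add(resource);
    }

    int available() {
        return m_free.size();
    }
}
```

Each worker would call borrow() before its inserts and giveBack() in a finally block afterwards, so a resource is only ever used by one thread at a time.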
Hope this helps.
Brendon.
Reply
Good
Good!
Reply
java.lang error : out of memory
I have a Java agent that is updating the SQL database. Just one connection has been established and the records are updated. After about 200 documents, the agent terminates with the subject error. In fact, this agent's task is to update about 1000 documents. I have increased the server's ini parameter JavaMaxHeapSize to 512 MB but to no effect.
Can you please recommend a solution or 2
Reply
Re: java.lang error : out of memory
How long is it taking? There's a limit on the time agents can run for.
Reply
examples on threads
need more examples on threads
Reply
Good article
First of all, thanks for posting an article which is very useful. You don't find articles like these everywhere, and in so much detail too.
This one is great.
Reply
Asynchronous mono threaded
Hi,
I have learnt recently that Lotus agents do not run fully asynchronously. This is described as asynchronous mono-threaded, meaning that the agents start simultaneously but their threads are not processed asynchronously. I think that this is ridiculous, since it will have a great impact on response time when many users request information concurrently.
Reply
What happens when the last document is handed to the child thread to process it, and the main thread exits? Wouldn't this invalidate the references of the parent in the children? Or wouldn't this immediately terminate the children?
Reply