Pushing Boundaries?
Anybody who looks after a Domino server will know that the screengrab below from this site's log.nsf doesn't look good.
Any entry with a start time and no end time tends to indicate a crash. At least it does in each of the cases above!
As I feared yesterday, it's starting to look a lot like my latest round of changes to this site is killing the server. I just can't work out why. Nothing I've done is really pushing any boundaries. Surely?
The obvious culprit would be the major change, which is that I now build the entire comment thread using a Rich Text item and a Web Query Open agent each time a blog or article document is opened. There's nothing fancy going on there though and I'm thinking that's a red herring.
Right now I'm suspecting it's more to do with creating new replies. The crashes seem to be happening at random when you press the "Post It" button. When it does crash, the reply you posted doesn't save, but something must happen, as the browser receives the redirect back from the server, which is issued in the WQS agent (Print "Location: /bla"). Very odd.
While I look into it (and beg Prominic.NET for an 8.5.1 upgrade) it might be worth copying the text of the reply before you press the post button!
I should know not to try and push boundaries with Domino. If the only solution to this is a design rollback I'm going to be pretty peeved about it after I've put in so much effort to try and show that Domino can do anything other servers can. It feels a bit like one step forward and two steps back.
If it comes to it I might even ask Prominic to revert the server to 7.0.2 and/or move it to a Windows machine (it's on Linux now and the Domino version is 8.5HF224).
Any ideas folks?
Q: Do you have an agent that runs when someone leaves a comment? Do users receive a 500 error? Do you have any logging in the agent? I have experienced a similar situation in the past and we tracked it down to the agent log growing too large.
Reply
Yes there's a LotusScript WQS agent. Nothing special. Not sure what other people are seeing, but I just saw a "can't connect" page in the browser. The server seems to crash before it gets chance to throw a 500.
Agent doesn't use an Agentlog.
Reply
Your server isn't on 8.0.2FP2 is it? If so, it might be helpful to know that I encountered a number of unexpected crashes which magically resolved when I reverted to FP1.
Reply
Nope. It's on 8.5 (HF224 whatever that is). Think I skipped 8.0. Was on 7.0.2 before this.
Hopefully I'll be on 8.5.1 soon.
Reply
Sorry, scrap the above, I should have read your last sentence...
Reply
Did you try the RTItem.Compact() call I suggested? That might be all you need.
Reply
That's been in place since last night, so it didn't prevent the crashes this morning, although I'm not convinced the RT field is even to blame.
Reply
Also, are you getting an NSD file? If so, search for "fatal" and post the call stack of the fatal thread so we can see it.
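If grepping by hand on the Linux box is a pain, the search can be scripted. The following is only a rough sketch in Python, not a Domino tool: it assumes the NSD is a plain text file, that the fatal thread is marked by a line containing the word "fatal" (matched case-insensitively), and that the call stack follows within the next 40 lines. The function names and the 40-line window are made up for illustration.

```python
import sys
from pathlib import Path
from typing import Iterator, List


def fatal_sections(lines: List[str], context: int = 40) -> Iterator[List[str]]:
    """Yield each line mentioning 'fatal' plus the `context` lines after it."""
    i = 0
    while i < len(lines):
        if "fatal" in lines[i].lower():
            # Assumption: the call stack sits directly below the marker line.
            yield lines[i : i + 1 + context]
            i += 1 + context  # skip past the chunk just emitted
        else:
            i += 1


def scan_nsd(path: str, context: int = 40) -> List[List[str]]:
    """Read an NSD file (path supplied by the caller) and collect the matches."""
    text = Path(path).read_text(errors="replace")
    return list(fatal_sections(text.splitlines(), context))


if __name__ == "__main__" and len(sys.argv) > 1:
    for section in scan_nsd(sys.argv[1]):
        print("\n".join(section))
        print("-" * 60)
```

Run it with the path to the NSD file as the only argument and paste whatever it prints. If the stack gets cut off, raise the context value.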
Reply
I'm about to go looking for the NSD files. Not easy on a Linux box via the shell.
Reply
You can store NSDs in a Notes database (Lotus Notes/Domino Fault Reports, lndfr.nsf). Then it is as simple on Linux as on any other OS.
Reply
If you still have problems with the crashes, you could add a lot of Prints to your code. Then, when you find a crash, dig through the log.nsf (if Prominic provides you with one), and see what the last part that ran was.
From all the trouble you're having, I presume the agent doesn't crash the "normal" way, which results in errors going to the log (or that you don't have a log available).
If you don't have a log.nsf available, OpenLog + LogEvent is also a tool.
Reply
Most likely your crash has nothing to do with the WQO agents directly. We have a CMS based completely around that, and we don't have any trouble pushing Domino in that direction (we have our own template engine that writes HTML directly to an rtitem on WQO).
However, we have been having serious stability issues with the 8.5 HTTP stack, which was crashing the main servers several times a day without explanation. Upgrading to 8.5.1 seems to have resolved the situation for us.
Reply
It's not happened since, so I'm starting to think it was "one of those things" and just a coincidence it was happening around the time I made the first changes to this site's design in years.
Prominic have promised an 8.5.1 upgrade this week anyways.
Reply