Making Sense of Domino Logs
Hopefully, you guys can help me out here. What do you know about Domino log files and reporting on NSF page activity?
This server doesn't have the domlog.nsf file enabled. Instead, it creates a text log file of all the activity for each day. They are typically quite large files - sometimes as big as 15MB for one day! Domino compresses these files and I download them via FTP for storage locally.
I've been doing this for ages, but I've not bothered looking at them for probably over a year now. That is until a client started asking me about log analysis. What I need to be able to do is report on the title of the documents being accessed the most. What I'm not sure of is whether this is possible from the standard Domino Extended Common log files. The standard log records the URL of the page and not the title. What's the easiest way to log page hits by title?
While looking into this I downloaded the free "lite" version of WebLog Expert and ran it on some of codestore's logs. Here are the stats for June this year. Interesting stuff (if you're me) - more than a million hits and 100,000 visitors.
So, what do you lot do to keep the boss happy when it comes to reporting page hits? Is there an off-the-shelf Holy Grail of reporting that gives managers everything they want? Or do I need to do some heavy-duty coding?
In the past I've managed to write a WebQueryOpen agent that runs any time a page, document, form, etc. is opened. It simply captures information like the current URL, page title, referrer, browser, IP address, etc. and stores it all in new documents. The load on the page is unaffected by the document creation.
I've used this solution for times when the domlog or the text files weren't available to the client.
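The skeleton of such an agent might look something like this (just a rough sketch: the log database path, form name and field names are all made up, and it assumes the form being opened carries fields for the CGI variables it reads):

' Rough sketch of a WebQueryOpen logging agent (all names are placeholders).
' Assumes the form being opened carries fields for the CGI variables it reads
' (Path_Info, HTTP_Referer, HTTP_User_Agent, Remote_Addr) and a Subject field for the title.
Sub Initialize
    Dim session As New NotesSession
    Dim ctx As NotesDocument
    Dim logDb As New NotesDatabase("", "weblog.nsf")    ' hypothetical log database
    Dim logDoc As NotesDocument

    If Not logDb.IsOpen Then Exit Sub
    Set ctx = session.DocumentContext                    ' the document/page being opened

    Set logDoc = logDb.CreateDocument
    logDoc.Form = "PageHit"                              ' hypothetical form name
    logDoc.HitURL = ctx.Path_Info(0)
    logDoc.HitTitle = ctx.Subject(0)                     ' or whichever field holds the title
    logDoc.Referrer = ctx.HTTP_Referer(0)
    logDoc.Browser = ctx.HTTP_User_Agent(0)
    logDoc.IPAddress = ctx.Remote_Addr(0)
    logDoc.HitTime = Now
    Call logDoc.Save(True, False)
End Sub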
That's an option I am considering Jorge. But am I right in thinking that WQO won't trigger when opening things like $$NavigatorTemplates?
Jake,
in case you haven't read the Lotus low-down, here it is:
{Link}
It appears you will have to:
- do some inline processing (like Jorge, above);
- do some post-processing (a simple lookup of web page to title, or whatever); or
- hope a third-party product has already solved it.
Domino only stores what is in the Extended Common Log format, so proprietary metadata isn't explicitly catered for.
Jake - the WQO does work with $$NavigatorTemplates.
I figure you're thinking of the db launch thing using a navigator/template - which I use a lot.
The {Link} template stores all sorts of stat info - so you might be able to steal some views/forms from that. I'm uploading a new version tonight which has some code that looks up country/region info etc. for IP addresses as well - something else you could use.
cheers
Steve
oh - it ranks page titles as well, by doing a dblookup in the stat document based on the web page's document ID to get the subject/category etc.
Jake, just off the top of my head, can't you try something like those insidious '<img>' tags that are used (e.g.) to track if people have read HTML email? The principle is outlined in #5 in this thread:
{Link}
I think it ought to be possible to invoke a ?CreateDocument and pass the various elements that the browser knows about the current document as parameters.
Sorry if I'm being vague - I'm just scribbling this down before running off to the gym....
I'll check back later to see if I have made any sense :-)
My ISP (f2s.net) uses Webalizer, a free (GNU Licence) product that produces all sorts of pretty graphs. I use it for my personal (hobby) website and it seems pretty useful.
It can be found at {Link}
We use WebTrends Suite for Domino. It is great and works with Domino. Very costly, but we host several Domino sites, so it makes sense.
{Link}
I'm trying to evaluate AWStats this week - {Link} - I'll report back when I've tried it.
Like Darin, I have also used WebTrends for Domino, and been very impressed by it. Pricey for individuals yes, but value for money I think is good for any company interested in their website.
They do have some 'lighter' (cheaper) products, but there is something they have done in the 'Domino' version that helps it understand better the style of Domino URLs and log files.
My ISP also gives me Webalizer, but the reports are generated for me, so I've not had a look at the product itself. Stats aren't bad though :-)
My 2pence!
Simon.
Re: Bernard Devlin's comments above. I have implemented a similar solution on various sites with help from Mike @ notestips. It uses the '<img>' tag and forces a ?CreateDocument.
My only issue is more to do with the size of the resulting database as it can become quite unmanageable without archiving.
tq
Hey,
I've been using a specific tool to build reports on HTTP page hits: Workflo! Log Analyzer ({Link}).
This product is based on a domlog improvement, and it's really worth it in my opinion. Above all, it's highly customizable.
Hope it helps.
Thanks guys. I'll digest this all later and try and report back with what I decide on...
Can't you crib what Mike does on NotesTips? That seemed to do the trick. I think I sent you a similar db once as well which you could use ... remember the one where you can draw IP addresses by location onto an SVG map of the world?
I have not had much luck with analyzer tools. They always seem to have trouble distinguishing what's a page, image, .js, etc.
I have been using hosted solutions for my clients. They answer all the questions, and since they are just a little piece of code you place on each page they are very good at tracking real page views vs. hits - which Domino logs mess up.
I use ClickTracks ($50/month) and SiteMeter (~$10/month) - Does anyone else have any good hosted solutions? I'm always looking.
IMHO, building activity logging into your application is a bad practice and should really be a last resort. Every modern web server, including Domino, has web logging on board. That's where logging belongs, not in the application itself.
It is true that in the case of Domino there's no easy way to see which page was visited because of the cryptic URLs. There are free tools that get close to it though, or you could code something yourself to analyze the log files.
Matt, I find that incredibly expensive just for stats.
I agree Ferdy. The hidden image that creates a document is looking like my best option but I want it to be my last one.
I was looking through your report (I have been dealing with "Web Analytics" for years and was interested to see how it all looks for a site like this) and noticed you have a lot of referrers from this site:
{Link}
From the looks of it that article links directly to some images on your site of the MySQL Control Centre.
With regards to the titles thing, most log analysers use the low tech method of actually retrieving the page to get the title :)
The other common way is basically your hidden image way, except the URL is generated using Javascript which can grab info like screen resolution, javascript version, the document title, etc, and add them as query parameters.
An idea that I have to look into myself, actually, based on the domlog stats (with the cryptic Notes ID URLs): it shouldn't be hard to create a view (showing only the log items containing "?OpenDocument") sorted by originating database/directory and the ID of the opened document. An agent could easily run through this view, check which are the 10 most popular documents by database (or directory, for that matter), and look up a field (DocumentTitle?) in each of those resulting documents.
A small export to Excel to top it off, and mail that report to the domlog database owners once a month.
Could this be a possible way of doing what you need?
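Roughly, the agent side of that idea might look something like this (a sketch only - the Request field name on the domlog documents, the DocumentTitle field and the database paths are all guesses, so check them against your own designs):

' Sketch only: counts ?OpenDocument hits per document UNID from domlog.nsf and
' prints a title for each. Field names (Request, DocumentTitle) and database
' paths are assumptions - check them against your own databases.
Sub Initialize
    Dim logDb As New NotesDatabase("", "domlog.nsf")
    Dim appDb As New NotesDatabase("", "apps/site.nsf")    ' hypothetical target db
    Dim col As NotesDocumentCollection
    Dim logDoc As NotesDocument, target As NotesDocument
    Dim hits List As Long
    Dim req As String, url As String, unid As String
    Dim parts As Variant

    ' Only the document opens (a selective view would be cheaper than db.Search)
    Set col = logDb.Search({@Contains(@LowerCase(Request); "?opendocument")}, Nothing, 0)

    Set logDoc = col.GetFirstDocument
    Do Until logDoc Is Nothing
        req = logDoc.Request(0)         ' e.g. GET /apps/site.nsf/view/<unid>?OpenDocument HTTP/1.1
        parts = Split(req, " ")
        If UBound(parts) >= 1 Then url = parts(1) Else url = req
        If InStr(url, "?") > 0 Then url = Left$(url, InStr(url, "?") - 1)
        parts = Split(url, "/")
        unid = parts(UBound(parts))
        If IsElement(hits(unid)) Then hits(unid) = hits(unid) + 1 Else hits(unid) = 1
        Set logDoc = col.GetNextDocument(logDoc)
    Loop

    ' Title lookup per UNID (picking the top 10 and the Excel export are left out here)
    ForAll h In hits
        Set target = Nothing
        On Error Resume Next
        Set target = appDb.GetDocumentByUNID(ListTag(h))
        On Error Goto 0
        If Not target Is Nothing Then
            Print ListTag(h) & " - " & target.DocumentTitle(0) & ": " & h & " hits"
        End If
    End ForAll
End Sub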
Your logs can tell you which document was accessed (using the docID in the URL)?
As mentioned, it's probably a bad idea to build logging into the app itself.
Focus on the reporting side of it ...
You should be able to parse the normal log file and do a lookup when creating the reports.
C.
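The parsing side of that could start out something like this (a rough sketch; the file name is invented and it assumes the request is the first double-quoted field on each line, as in the usual Extended Common Log format):

' Rough sketch: pull the requested URL out of each line of a Domino text log in
' (Extended) Common Log Format. The file name is made up; the request is assumed
' to be the first double-quoted field, e.g. "GET /db.nsf/view/<unid>?OpenDocument HTTP/1.1".
Sub Initialize
    Dim fileNum As Integer
    Dim logLine As String, req As String, url As String
    Dim q1 As Long, q2 As Long
    Dim parts As Variant

    fileNum = Freefile
    Open "access-20040601.log" For Input As fileNum
    Do While Not EOF(fileNum)
        Line Input #fileNum, logLine
        q1 = InStr(logLine, Chr$(34))
        If q1 > 0 Then q2 = InStr(q1 + 1, logLine, Chr$(34)) Else q2 = 0
        If q2 > q1 Then
            req = Mid$(logLine, q1 + 1, q2 - q1 - 1)
            parts = Split(req, " ")
            If UBound(parts) >= 1 Then
                url = parts(1)
                ' From here: strip the query string, take the last path segment as the
                ' doc UNID and do the title lookup when building the report.
                Print url
            End If
        End If
    Loop
    Close fileNum
End Sub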
Hi,
Just use AWStats. We switched from WebSuxxess to this GPL tool.
{Link}
It's just great software and once it's installed, you don't need to do anything. In our case we now get a daily report from our sites.
Greetings Maiko
Hi Jake,
Check this
{Link}
www.weboscope.com/?LANGUAGE=UK
Thanks for the reminder Marcin. I noticed this too and, a couple of weeks ago, I tried to get in touch with him to politely ask him to stop. He has no means of contact on the site though, so I guessed some email addresses at his domain. I heard nothing, so now I've had to resort to guerrilla tactics. He'll have a shock next time he opens that page ;o)
Thanks to all you guys too. It's looking like I am going to have to code something to capture page titles then?
I created something based on domlog for this - it attempts to find a title for the page, be it a document or design element. It will make guesses at appropriate title fields on documents, but it is configurable so that fields can be specified by form name. It then converts the domlog output to text files that the free Webalizer is quite happy with and mails them somewhere. Finally, it optionally deletes the domlog entries to keep that DB's size under control.
It's best to restrict the logging just to those things of interest - miss out all those navigation images for example.
Are you willing to share David?
Sorry, I should have said. It's available as BeaverPlus, for free, from www.mailrat.net. Feedback welcome, as I don't currently have access to much data for testing.
Has your server always been on US time Jake? Or is it yet another problem with this awful laptop I'm forced to use?
Thanks David, will take a look.
Dates have always been US format. Although there was some confusion a while back.
It was the time of these comments I was referring to as they appear to be written 6 hours earlier than they were.
Yeah, sorry, they are US time. Central I think. Codestore server lives there so I've left it like that. I've thought about changing it but don't see the point. As long as they are relative that's all that matters. Change to UK time and US folk are just as confused. No way to please all...
Someone needs to come up with IST - Internet Standard Time. Set all servers to use that, then the user only needs to know the time difference to their local time zone. Simple. No confusion.
Chad, isn't that the principle behind Greenwich Mean Time (GMT)?
Well said Sam. Time itself is owned by us, the British. Rulers of the world ;-)
I suppose what I should do is append the time zone to the end and let people work it out from there. But does it really matter? I don't think so, personally.
The same "rulers" that claim to have invented soccer, but really didn't?
hehe ;)
Ferdy, we invented football, not soccer. I suppose you're going to tell me the Dutch did?
I generally have two different approaches. I have taken the domlog.nsf (and/or the text log files) and exported them to MySQL for reporting. Once they are in MySQL I can use the same weblog analysis tools as you would for an Apache server -- like Webalizer, etc.
The second approach was similar to Jorge's, but instead of storing in a Lotus Notes database I would store it in MySQL via a WQO LotusScript agent, using LS:DO to insert the row into the table (on Linux you can use unixODBC). One of my clients, wedj.com, has from 10,000 to 20,000 unique visitors a day. The Linux server does not slow down, nor does it have any delay in loading. Using a relational back end removes the burden from the NSF file. Plus the server does not have to worry about indexing a huge domlog.nsf for a site such as this.
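For what it's worth, the LS:DO part of that WQO agent boils down to something like this (a sketch only - the DSN, credentials, table and column names are invented, and you'd want proper escaping and error handling in real life):

' Sketch of logging a hit straight to MySQL from a WQO agent via LS:DO.
' DSN, credentials, table and column names are invented; the CGI fields
' (Path_Info, HTTP_Referer, HTTP_User_Agent, Remote_Addr) are assumed to be on the form.
Uselsx "*LSXODBC"    ' goes in the agent's (Options) section

Sub Initialize
    Dim session As New NotesSession
    Dim ctx As NotesDocument
    Dim con As New ODBCConnection
    Dim qry As New ODBCQuery
    Dim res As New ODBCResultSet
    Dim sql As String

    Set ctx = session.DocumentContext
    If Not con.ConnectTo("weblogdsn", "loguser", "secret") Then Exit Sub

    Set qry.Connection = con
    Set res.Query = qry

    ' Quick-and-dirty INSERT; parameter binding / escaping is left out of the sketch
    sql = "INSERT INTO page_hits (hit_time, url, referrer, browser, ip_address) VALUES (NOW(), '" & _
        ctx.Path_Info(0) & "', '" & ctx.HTTP_Referer(0) & "', '" & _
        ctx.HTTP_User_Agent(0) & "', '" & ctx.Remote_Addr(0) & "')"
    qry.SQL = sql
    Call res.Execute()

    Call con.Disconnect()
End Sub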
HTH
Hi
I find that if you want to do a little more advanced stuff it's okay to implement some logging code in your application.
It need not be that hard. A simple applet, or XMLHTTP doing a POST to a servlet, which saves the data in a database like Firebird (love it).
You can then do much better user tracking.
regards
Jesper Kiaer
{Link}
Here's the solution I came up with for use with the domlog.nsf file.
Since stat requests are made far less often than hits are recorded, adding extra recording/parsing to the capture event didn't seem like a worthwhile server load / cost / complexity trade-off. Instead, we're implementing an on-demand query of the domlog that will parse data from a specified date range to produce a basic set of results and return them to a web browser in simple graph form, similar to awstats.
A list of unique URLs will be used to do a lookup on the top n hits so we can get titles, with more information available as a separate request to our parsing agent, which will be written in Java. I'll be adding a single view with UNID and title columns to support this.
Sorry I don't have a sample to show as I haven't developed it yet... it's just planned out for now.
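In LotusScript terms (the real agent will be Java), the UNID-to-title lookup against that view might look something like this - the view name is a placeholder, and it assumes the first sorted column is the UNID and the second is the title:

' Sketch of the UNID -> title lookup against a view whose first sorted column is
' the document UNID and second column is the title. View name is a placeholder.
Function TitleForUNID(appDb As NotesDatabase, unid As String) As String
    Dim lookupView As NotesView
    Dim entry As NotesViewEntry

    TitleForUNID = ""
    Set lookupView = appDb.GetView("(LookupTitlesByUNID)")
    If lookupView Is Nothing Then Exit Function

    Set entry = lookupView.GetEntryByKey(unid, True)
    If Not entry Is Nothing Then TitleForUNID = entry.ColumnValues(1)
End Function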
Some words about Webalizer.
I DO NOT like it.
Main reasons:
1) Not very accurate stats
2) Referrer spam coming through into the Webalizer reports
Referrer spam is popular in my country, and if it gets any more frequent, a site running Webalizer could effectively end up DDoSed.
That's what I think.