Bayesian Tail
A user writes "We all know anti-spam-software using Bayesian filtering. The results with these are amazingly good. So that made me thinking: why not create a tool which monitors logfiles and determines using a Bayesian filter what events to display and what not? That's why I created btail. Btail is just that: it monitors a logfile and filters it with a Bayesian filter. The results are above my own expectations!"
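For readers who have not seen Bayesian filtering applied to anything but mail, here is a minimal sketch of the general idea: a two-class naive Bayes classifier over log lines. This is not btail's code; the class name `LineFilter`, the labels "show" and "hide", the tokenizer and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z_]+")


def tokenize(line):
    """Lowercased alphabetic tokens; digits (timestamps, PIDs) are dropped."""
    return [t.lower() for t in TOKEN_RE.findall(line)]


class LineFilter:
    """Hypothetical two-class naive Bayes over log lines: 'show' vs 'hide'."""

    def __init__(self):
        self.counts = {"show": Counter(), "hide": Counter()}
        self.lines = {"show": 0, "hide": 0}

    def train(self, line, label):
        """Record one hand-labelled line under 'show' or 'hide'."""
        self.counts[label].update(tokenize(line))
        self.lines[label] += 1

    def _log_score(self, tokens, label):
        vocab = len(set(self.counts["show"]) | set(self.counts["hide"]))
        total = sum(self.counts[label].values())
        # Class prior and token likelihoods, both with add-one smoothing.
        score = math.log((self.lines[label] + 1) / (sum(self.lines.values()) + 2))
        for tok in tokens:
            score += math.log((self.counts[label][tok] + 1) / (total + vocab + 1))
        return score

    def should_show(self, line):
        """True when the 'show' class scores at least as high as 'hide'."""
        tokens = tokenize(line)
        return self._log_score(tokens, "show") >= self._log_score(tokens, "hide")
```

A tail-style tool would call `train()` on lines the admin has labelled and then print only the new lines for which `should_show()` returns true.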
Re:Sure... (Score:4, Funny)
Cool idea but may be dangerous (Score:2, Insightful)
Anyhow credits on a decent idea
Re:Cool idea but may be dangerous (Score:5, Insightful)
Re:Cool idea but may be dangerous (Score:5, Insightful)
Re: (Score:3, Informative)
Re:Cool idea but may be dangerous (Score:3, Informative)
I published a paper [chalmers.se], with GPL source code [chalmers.se] (you need Python etc) a few months back using visualisation (colorisation) to lend the user insight into the operation of a Bayesian classifier.
It actually works pretty well, and the idea could be applied to other uses of the Naive Bayesian classifier.
Re:Cool idea but may be dangerous (Score:3, Insightful)
Server logs are not the same at all. The administrator has some control over the logs that get generated, and the programmer has full control. There isn't supposed to be the equivalent of email spam at all, because useless messages should just be filtered or redirected at the source. Leaving everything at "verbose" and relying on filtering just
Re:Cool idea but may be dangerous (Score:1, Insightful)
Re:Cool idea but may be dangerous (Score:2)
examples (Score:4, Interesting)
Hey! (Score:3, Funny)
Site getting sluggish already (Score:5, Informative)
Still very preliminary at this point, but shows promise. Now, to build and try it out!
Re:Site getting sluggish already (Score:2)
Well, where's the story ? (Score:2, Interesting)
Heck, I wanna know what the results are goddamnit. What made the thing so great.
Re:Well, where's the story ? (Score:1, Funny)
When it merely turned a couple of dogs inside out, he knew the time had come to offer it as a Slashdot story.
my comment: (Score:1)
I concur (Score:1)
What I would like to see (Score:2, Interesting)
Re:What I would like to see (Score:4, Informative)
If you pop over to the CRM114 site [sourceforge.net] and search the general list archives [sourceforge.net] for monkeyplexer, you will find the discussions about it.
Here is the last version announcement that I could find in my mailbox:
monkeyplexer is a tool for automatically sorting incoming email messages into appropriate folders. A new version of monkeyplexer, 0.7, is now available. http://bad.dynu.ca/~evan/monkeyplexer/monkeyplexe
This version includes the following changes:
You can specify which mailboxes to use, instead of which mailboxes to exclude. This can save some typing and some time at runtime, at the expense of dynamically updating the list.
You can tell the monkeytrainer to only train messages that were received in the last few weeks, days, hours, minutes -- whatever.
The monkeyplexer remembers which messages have been trained for which folders. If you train a message for a different folder, the monkeyplexer will automatically forget the first folder before training for the new one.
Thanks to everyone who has installed monkeyplexer already. I hope this new version helps some people out. I find it easier and more accurate.
~ESP
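The "forget the first folder before training for the new one" behaviour in the announcement could look roughly like the sketch below. This is not monkeyplexer's code; `FolderTrainer`, the message ids and the whitespace tokenizer are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter, defaultdict


class FolderTrainer:
    """Hypothetical trainer that remembers which folder each message was trained for."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word counts
        self.trained_as = {}                     # message id -> (folder, words)

    def train(self, message_id, text, folder):
        if message_id in self.trained_as:
            old_folder, old_words = self.trained_as[message_id]
            if old_folder == folder:
                return  # already trained for this folder, nothing to do
            # Forget the earlier training before learning the new folder.
            self.word_counts[old_folder].subtract(old_words)
        words = text.lower().split()
        self.word_counts[folder].update(words)
        self.trained_as[message_id] = (folder, words)
```

Retraining a message for a second folder leaves the first folder's counts as if the message had never been filed there.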
Re:What I would like to see (Score:3, Informative)
If this were Trek... (Score:5, Insightful)
01:56 Plasma injector #1 offline, switching to #2 backup.
02:23 Overheat in plasma injector #2.
02:44 Failure to shutdown plasma injector #2.
02:58 Overheat in reactor core.
03:20 Containment weakening.
03:25 Containment weakening.
03:30 Containment weakening.
03:35 Five minutes to containment failure.
03:40 FIVE SECONDS TO WARP CORE BREACH!!!
Better be careful to train the filter about those warnings that don't happen very often, but when they do, you really want to know about them.
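One way to hedge against that, sketched below, is a novelty check that runs next to the Bayesian filter: any line made up mostly of tokens that never appeared in training is shown regardless of the classifier's verdict. Nothing suggests btail does this; `NoveltyGuard` and its 0.5 threshold are assumptions for illustration.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z_]+")


def tokenize(line):
    """Lowercased alphabetic tokens; timestamps and other digits are dropped."""
    return [t.lower() for t in TOKEN_RE.findall(line)]


class NoveltyGuard:
    """Hypothetical override: force display when too many tokens are unseen."""

    def __init__(self, max_unseen_fraction=0.5):
        self.seen = set()
        self.max_unseen_fraction = max_unseen_fraction

    def observe(self, line):
        """Record the tokens of a line that was part of the training data."""
        self.seen.update(tokenize(line))

    def must_show(self, line):
        """True when more than max_unseen_fraction of the tokens are new."""
        tokens = tokenize(line)
        if not tokens:
            return False
        unseen = sum(1 for t in tokens if t not in self.seen)
        return unseen / len(tokens) > self.max_unseen_fraction
```

With only the cargo bay warnings observed, the warp core breach line is all new tokens and would be forced onto the screen.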
Re:If this were Trek... (Score:3, Interesting)
01:37 [error] Overheat in plasma injector #1.
01:37 [warning] Cargo bay door 2 is open.
01:38 [warning] Cargo bay door 2 is open.
01:38 [warning] Oxygen sensor on deck 2 not responding.
01:39 [warning] Cargo bay door 2 is open.
01:40 [warning] Cargo bay door 2 is open.
01:41 [warning] Oxygen sensor on deck 2 not responding.
01:56 [error] Plasma injector #1 offline, switching to #2 backup.
In other words real interesting errors i
Re:If this were Trek... (Score:2)
Re:If this were Trek... (Score:1)
... will show only the php errors. :-)
Re:If this were Trek... (Score:2)
Re:If this were Trek... (Score:1)
For my firewall sound effects program, I basically tail the ZoneAlarm logs and play a selectable sound effect depending on the port/type. It's cute and even useful for detecting patterns (if you don't mind the noise), but I'm wondering whether Bayesian filtering could be applied to a real security report. It might be
Re:If this were Trek... (Score:2)
tail -F
I use similar rules for alerts about SSH break-in attempts, mail relay probes and machine check exceptions. I know there are all sorts of sophisticated log analyzer and
Re:If this were Trek... (Score:2)
Re:If this were Trek... (Score:2)
I do this kind of thing all the time, allow me to share...
Re:If this were Trek... (Score:2)
That said, I don't buy logfile filtering until I see it works. Sometimes you are interested in messages of one kind, sometimes in messages of another kind. I still think that fixed pattern matching can do the job better. Of course, that's what many people feel about spam filtering.
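For comparison, the fixed-pattern approach amounts to something like the sketch below. The patterns are hypothetical examples of typical alert rules, not anyone's actual configuration.

```python
import re

# Hypothetical rules; the patterns are illustrative, not taken from a real setup.
ALERT_PATTERNS = [
    re.compile(r"Failed password for .* from \S+"),
    re.compile(r"Machine check", re.IGNORECASE),
    re.compile(r"Relay access denied", re.IGNORECASE),
]


def is_alert(line):
    """True when any fixed pattern matches somewhere in the line."""
    return any(p.search(line) for p in ALERT_PATTERNS)


def filter_lines(lines):
    """Keep only the lines that match at least one alert pattern."""
    return [line for line in lines if is_alert(line)]
```

The trade-off is the usual one: these rules never misfire on something they were not written for, but they also never surface a message nobody thought to write a pattern for.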
Re:If this were Trek... (Score:2)
I think you're misinterpreting what the tool is meant for. Often, when you are looking at the logs, you are looking for something in particular. In those cases, as you suggest, grep is probably the best tool for the job. But, as
Bayesian is good for almost everything (Score:4, Interesting)
Bayesian is good for almost everything-Dessert. (Score:1, Interesting)
Recovering the Slashdot lost since 2000 by eliminating most (-1) material, e.g. GNAA, FP, etc. Eliminating the human bias in the moderation system (since client-side moderation is out). Tagging interesting material (a Bayesian agent).
Re:Bayesian is good for almost everything (Score:3, Interesting)
Bayesian filtering could be used for lots of things outside of spam. One example could possibly be Wikis, distinguishing spam from ham modifications (well, yes, it is spam here). I've had some other ideas that involve Bayesian filtering, but they've escaped me for the moment.
Re:Bayesian is good for almost everything (Score:2)
In fact, I may just go and make a ff extension that does just that, hmmm, mebbe call it "NNSFW"?
Re:This code belongs on (Score:3, Insightful)
Re: (Score:1)
Re:This code belongs on (Score:2)
Well, no it doesn't ... (Score:5, Insightful)
The [brackets] used in the usage message are standard in the Unix world for specifying an optional or default argument. Just look at any man page. So that, actually, is pretty straightforward. The name of the default config file would likely also be spelled out in the man page, which I would expect, so that's not confusing.
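The same bracket convention shows up in generated usage strings, as this small Python argparse sketch shows. The program name `logfilter` and the `-c` option are hypothetical and are not btail's actual flags.

```python
import argparse

# Hypothetical options for illustration; these are not btail's actual flags.
parser = argparse.ArgumentParser(prog="logfilter")
parser.add_argument("-c", "--config", default="logfilter.conf",
                    help="config file (default: %(default)s)")
parser.add_argument("logfile", help="log file to monitor")

# Optional arguments are wrapped in [brackets]; the required one is not.
usage = parser.format_usage()
print(usage)
```

The optional `-c CONFIG` is printed inside brackets while the required `logfile` is bare, which is the reading the usage message in question relies on.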
As for changing the if construct into a switch, well, I'm trusting the accuracy of your excerpt, but I didn't find his code to be very difficult to read, to be honest, and certainly not a candidate for DailyWTF, which typically contains laughably horrible code.
As far as other code may go, the guy states that this is in a nascent stage, so jumping on his source files seems like a bit of an easy shot
Re:This code belongs on (Score:5, Insightful)
Reinvent the Wheel Much? (Score:5, Informative)
Why learning with supervision? (Score:3, Interesting)
new pr0n! (Score:2)
Here's how to make this a lot more useful (Score:4, Interesting)
Step 2: Include btail with major distros
Step 3: Any package for an app that generates logs can come with a ready-made canned training package, which gets dropped into the
That way, you could apt-get a package, start btail-ing its logfiles immediately without the need to tediously train the filter first. Training would still be possible, to personalise the filter.
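As a rough sketch of how a canned training package and personal training could coexist: packaged token counts are loaded first and the user's own counts are added on top. The JSON layout with "show" and "hide" keys is an assumption made here; btail's real on-disk format is not described in this thread.

```python
import json
from collections import Counter

LABELS = ("show", "hide")


def load_counts(text):
    """Parse hypothetical JSON of the form {"show": {token: n}, "hide": {token: n}}."""
    raw = json.loads(text)
    return {label: Counter(raw.get(label, {})) for label in LABELS}


def merge_counts(canned, personal):
    """Add personal training counts on top of the packaged ones."""
    return {label: canned[label] + personal[label] for label in LABELS}
```

Because the counts are simply summed, a user who trains long enough will eventually outweigh the defaults shipped by the package.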
Nor for Me (Score:1, Insightful)
throw out the baby with the bathwater, will you? (Score:2)
That's akin to only filling a dictionary with
Spellcheck in Firefox (Score:1)
Bayesian (Score:2, Interesting)
We all know that if the filter makes a mistake and hides a message in the Spam box, chances are you'll miss many of them, another the chance t
Bayesian AIM bot (Score:3, Interesting)
I love Bayes stuff - and there's a very nice Python module written by divmod [divmod.org].
I was playing around with AIML to cobble together a basic chat bot when I realised that I could use a Bayesian parser to radically cut down the amount of AIML that I needed to write. AIML is an XML style of chat bot responses; it's clever in that it's highly recursive, but the downside is that you need to create a rule for every eventuality.
By adding in a bit of Bayesian guessing before the AIML parser got its hands on the conversation, I'm able to keep the AIML files very focused and give the chat bot a bit more sparkle - you don't have to train him about everything. After a while he realised that 'yo', 'hi' and 'hello' are all the same thing, so he just guesses that you're saying hello and pulls out the correct response from the AIML file (rather than creating an AIML rule to deal with all the variations on 'hello').
If you're interested I'd strongly recommend installing GrokitBot. You can get the source and a bit more explanation at my site, Suttree.com [suttree.com]
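The guess-first, rules-second idea described above might be sketched like this. It is not GrokitBot's code and does not use the divmod module's API; `IntentGuesser`, the intent names and the `RESPONSES` table (a stand-in for a focused AIML file) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class IntentGuesser:
    """Hypothetical naive Bayes guesser mapping an utterance to an intent."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word counts
        self.examples = Counter()                # intent -> number of examples

    def train(self, text, intent):
        self.word_counts[intent].update(text.lower().split())
        self.examples[intent] += 1

    def guess(self, text):
        """Return the most probable intent, or None if nothing was trained."""
        words = text.lower().split()
        vocab = {w for counts in self.word_counts.values() for w in counts}
        best, best_score = None, float("-inf")
        for intent, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.examples[intent] / sum(self.examples.values()))
            for w in words:
                # Add-one smoothing so unseen words do not zero out an intent.
                score += math.log((counts[w] + 1) / (total + len(vocab) + 1))
            if score > best_score:
                best, best_score = intent, score
        return best


# Stand-in for a focused rule file: one canned response per intent.
RESPONSES = {"greeting": "Hello there!", "farewell": "See you later."}


def reply(guesser, text):
    """Guess the intent first, then look up the single rule for it."""
    return RESPONSES.get(guesser.guess(text), "I don't follow.")
```

After training on a handful of greetings, one "greeting" rule covers 'yo', 'hi' and 'hello' instead of needing a rule per variation.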