When running a Usenet server, you face the challenge of combating a constant influx of spam and floods from various sources. You could run a site without any filtering, but then some newsgroups would become unusable. Hence the need to detect and reject spam one way or another while minimizing the impact on legitimate posts: you want as much spam as possible to be filtered, and no posts from legitimate users to be lost in the process.
There are quite a few ways to filter spam on Usenet. One of them is an INN filter script. INN can run filter scripts written in Perl or Python (or both at the same time) and ask them to classify incoming articles. The server accepts an article only when every active filter accepts it; if at least one of them rejects it, the server rejects the article.
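As a toy model of that accept/reject rule, here is a Bash function. It is purely illustrative: innd applies the rule internally, not through a shell script, and the name combined_verdict is hypothetical. It assumes the INN convention that a filter_art hook returns an empty string to accept an article and a rejection reason otherwise.

```shell
# combined_verdict PERL_VERDICT PYTHON_VERDICT
# Hypothetical illustration of how two filter verdicts combine.
# Each argument is what a filter hook returned: empty means "accept",
# anything else is the rejection reason.
combined_verdict() {
    local verdict
    for verdict in "$@"; do
        if [ -n "$verdict" ]; then
            # A single rejection is enough to refuse the article.
            echo "reject: $verdict"
            return 1
        fi
    done
    echo "accept"
}
```

For example, `combined_verdict "" "Thai flood"` prints `reject: Thai flood` even though the first filter accepted the article.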
Most Usenet servers run Cleanfeed. It's a general-purpose filter, written in Perl, that combats many common attacks, spams, and floods on Usenet, and it's a good first line of defense. Other filters work in a similar way; a notable example is PyClean, written in Python.
Some sites also run SpamAssassin or other filters.
While individual sites run their own filters, it would sometimes be useful if they could communicate with each other. Let's say that a filter on site A catches spam, site B learns about it, and (if it trusts the judgment of site A) deletes the offending article from its spool. This mechanism exists and is called NoCeM. You can find an introduction to it in the NoCeM FAQ.
NoCeM notices are published in a dedicated newsgroup (news.lists.filters). A single issuer can publish several kinds of notices, distinguished by their Type field, and you can choose to trust all types from a given issuer or only some of them.
Anyone can post to news.lists.filters, but since NoCeM notices are signed with PGP, it's impossible (as long as the cryptography itself isn't broken) to impersonate an issuer just by using their email address and abusing the established trust to remove innocent articles. The signature simply won't match, and the notice won't be processed.
Note: The problem resolved itself when Google Groups cut itself off from Usenet, so this section is left here for historical reasons only. The Thai flood is, fortunately, a thing of the past, but it was what prompted me to implement NoCeM in the first place.
In recent months there has been a huge flood originating from Google Groups. It's automated, written mostly in Thai, and makes certain newsgroups practically unusable. Cleanfeed doesn't seem to be able to handle it, and SpamAssassin seems to perform quite poorly, with a considerable number of false positives.
The easiest solution would be, as many suggest, to depeer Google Groups, but for many sites this simply isn't an option, as valid users still post from it.
Hence the need for a filter that removes this type of flood.
news.chmurka.net started running such a filter and issuing notices about the articles it killed to news.lists.filters, with the Type field set to thai. The filter covered the pl.*, alt.*, uk.*, and de.* hierarchies, as well as the Big 8.
When Google Groups depeered itself from Usenet, the thai filter lost its purpose, and another filter, with the Type set to spam, took its place. It filters articles based on common characteristics, such as From addresses of known spammers or URLs (or other body fragments) found in articles. It's designed to minimize false positives.
Internally it's just a Python filtering script that rejects an article when it has certain characteristics, but saves it first. This is done for two reasons: first, so that the article can be reposted to a local spam newsgroup (chmurka.spam), where false positives can easily be caught (and the article reposted locally if needed); second, so that it can be processed and published as a NoCeM notice by another process that runs on the server every N minutes (currently N = 5).
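The notice generator used on news.chmurka.net isn't published, so the following is only a hypothetical sketch of what that periodic step has to produce: an unsigned NoCeM notice body built from the saved articles. The function name make_ncm_body, the input format (one Message-ID and its newsgroups per line), and the exact header set are assumptions based on my reading of the NoCeM 0.93 format; check them against the NoCeM protocol description before relying on them.

```shell
# make_ncm_body ISSUER TYPE NOTICE_ID < list
# Hypothetical helper: reads "<message-id> newsgroups" lines on stdin and
# prints an UNSIGNED NoCeM notice body (header set assumed, see lead-in).
make_ncm_body() {
    awk -v issuer="$1" -v type="$2" -v nid="$3" '
        # Keep only lines whose first field looks like a Message-ID.
        $1 ~ /^<[^<>]+>$/ { ids[++n] = $1 "\t" $2 }
        END {
            print "@@BEGIN NCM HEADERS"
            print "Version: 0.93"
            print "Issuer: " issuer
            print "Type: " type
            print "Action: hide"
            print "Count: " (n + 0)          # n is unset when no IDs were read
            print "Notice-ID: " nid
            print "@@BEGIN NCM BODY"
            for (i = 1; i <= n; i++) print ids[i]
            print "@@END NCM BODY"
        }'
}
```

The output would still have to be clearsigned (for example with `gpg --clearsign`) and posted to news.lists.filters; a cron schedule of `*/5 * * * *` would match the N = 5 interval.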
Here's the policy statement, as required by the NoCeM registry.
news.chmurka.net sends NoCeMs for articles that match specific, fixed criteria, like certain headers, body parts, or other static characteristics of articles that are considered spam by the server owner. Every article that's marked this way is reposted to a local newsgroup (chmurka.spam), both for transparency and for inspection. My NoCeMs are never used to censor individuals – only spam is targeted.
You should never trust a random person on the Internet, and especially not their automated scripts! That's why it's a good idea to review my notices before deciding to use them, and then to modify the NoCeM processing script (perl-nocem) to save the killed articles so you can review them periodically.
Running my filter script on your own server instead is also a good idea, but then you're on your own: if the script needs adjusting, you'll have to adjust it yourself. I monitor my filter and adjust it when it misses something (while always staying on the safe side, to keep false positives as unlikely as possible), so by using the automated notices you don't have to maintain the script yourself. You're still responsible for what's cancelled on your server, though, even if it's cancelled because of my notices, so keep an eye on it.
I don't want to publish the script here (spammers might read it too), but feel free to contact me and I'll send you the most recent copy (or Git access, once I finally move it to a repository).
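Enabling NoCeM processing is really easy (as long as we're talking about INN). You have to import the issuer's public key into the NoCeM keyring, add the issuer and the notice types you accept to nocem.ctl, and add a perl-nocem feed to newsfeeds.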
It's also a good idea to modify the perl-nocem script to save articles and do something about them (repost to a local newsgroup, keep in mbox and review from time to time, or really anything else that would allow you to monitor what's being cancelled on your site).
The public key used to sign NoCeM notices posted by news.chmurka.net can be downloaded from:
http://news.chmurka.net/nocem-chmurka.asc
The easiest way to import it into gpg should be something like this (adjust your paths!):
curl http://news.chmurka.net/nocem-chmurka.asc | \
    gpg --no-default-keyring --allow-non-selfsigned-uid \
    --primary-keyring /usr/local/news/etc/pgp/ncmring.gpg \
    --no-options --no-permission-warning --import -a
You should see some output (from curl and gpg) and, among the lines, something like:
gpg: key 41237614098DD255: public key "news.chmurka.net <nocem@chmurka.net>" imported
There's a nocem.ctl file in your etc directory (probably /usr/local/news/etc) that lists the issuers (email addresses) and notice types that perl-nocem will accept. Edit it and add this line:
nocem@chmurka.net:spam
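To illustrate how such a file is interpreted, here's a hypothetical Bash helper that answers whether a given issuer and notice type are trusted. It is not perl-nocem's code; it assumes the issuer:type format shown above, optionally with several comma-separated types on one line (check the comments at the top of your own nocem.ctl), and treats lines starting with # as comments.

```shell
# is_trusted CTL_FILE ISSUER TYPE
# Hypothetical helper: exits 0 if CTL_FILE has an "issuer:type[,type...]"
# line listing TYPE for ISSUER, and 1 otherwise.
is_trusted() {
    awk -F: -v issuer="$2" -v type="$3" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            iss = $1
            gsub(/[ \t]/, "", iss)          # tolerate stray whitespace
            if (iss != issuer) next
            n = split($2, types, ",")       # one or more accepted types
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", types[i])
                if (types[i] == type) { found = 1; exit }
            }
        }
        END { exit !found }
    ' "$1"
}
```

With the line above in place, `is_trusted /usr/local/news/etc/nocem.ctl nocem@chmurka.net spam` succeeds, while the same call with type thai fails.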
Go to news.lists.filters and pick a notice from news.chmurka.net. Save its Message-ID (let's call it Notice-Message-ID). You can also save a Message-ID of a spam article from this notice (let's call it Spam-Message-ID).
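To find a Spam-Message-ID quickly, you can list the Message-IDs in a notice's body. This is a minimal sketch, assuming the usual NoCeM layout where the listed articles sit between the @@BEGIN NCM BODY and @@END NCM BODY lines, one per line with the Message-ID first. The function name ncm_msgids is hypothetical, and it does not check the PGP signature, so use it only for picking a test article.

```shell
# ncm_msgids [FILE...]
# Hypothetical helper: prints the Message-IDs listed in the body of a
# NoCeM notice read from FILE(s) or stdin. No signature checking is done.
ncm_msgids() {
    awk '
        /^@@BEGIN NCM BODY/ { inbody = 1; next }
        /^@@END NCM BODY/   { inbody = 0; next }
        inbody && $1 ~ /^<[^<>]+>$/ { print $1 }
    ' "$@"
}
```

For example: `grephistory 'Notice-Message-ID' | sm | ncm_msgids | head -n 1`.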
First, verify that the article with Spam-Message-ID is present on your system:
grephistory Spam-Message-ID | sm
You should see the spam article on the screen. Then, issue the following command:
grephistory Notice-Message-ID | perl-nocem
Then run the first command again. The spam article should no longer be there.
Open your newsfeeds file and add the following (again, adjust paths!):
nocem!\
	:!*,news.lists.filters/!local\
	:Tc,Wf,Ap:/usr/local/news/bin/perl-nocem
Reload your newsfeeds file:
ctlinnd reload newsfeeds "Enable NoCeM"
And you should be all set. You can check the syslog to confirm that the notices are actually being processed.
If you want to re-feed old notices to perl-nocem (and you're using the tradindexed overview method), you can use the following command:
tdx-util -g -n news.lists.filters | cut -f 6 -d' ' | perl-nocem
If you're using ovdb, try this:
ovdb_stat -r 1- news.lists.filters | cut -f 5 | grephistory -s | perl-nocem
There's a line in perl-nocem that says:
eval { require Sys::Syslog; import Sys::Syslog; $use_syslog = 1; };
You can comment it out, and then log entries from the script will be saved in a log/perl-nocem.log file.
You can also make this script save articles for further processing. Here's how I do it (as a patch). Note: I don't know Perl, so there's probably (certainly) a better way, one that doesn't need the intermediate Bash script.
--- perl-nocem
+++ perl-nocem
@@ -115,6 +115,7 @@
     close $artfh;
     return unless $nocems;
 
+    copy_articles($msgid, $nocems);
     &$cancel($nocems);
     logmsg("Articles cancelled: " . join(' ', @$nocems));
     my $diff = (time - $start) || 0.01;
@@ -391,6 +392,35 @@
     return undef;
 }
 
+# Copy single article to nocem spool to be reposted to spam group
+# by an external program.
+#
+# First argument: notice msgid
+# Second argument: cancelled post msgid
+sub copy_article {
+    my $pid = fork;
+    return unless defined $pid;    # fork failed: never exec in the parent
+    if ($pid == 0) {
+        exec "$INN::Config::pathbin/local/nocem-copy-article.sh", $_[0], $_[1];
+        exit 1;
+    }
+    waitpid $pid, 0;
+}
+
+# Copy articles to nocem spool to be reposted to spam group
+# by an external program.
+#
+# First argument: notice msgid
+# Second argument: array of msgids to copy
+sub copy_articles {
+    my $notice_msgid = $_[0];
+    my $ids = $_[1];
+
+    foreach (@$ids) {
+        copy_article($notice_msgid, $_);
+    }
+}
+
 # Cancel a number of Message-IDs. We use ctlinnd to do this,
 # and we run up to 15 of them at the same time (10 usually).
 sub cancel_ctlinnd {
The patch makes perl-nocem call the Bash script bin/local/nocem-copy-article.sh with a notice ID (the Message-ID of the notice) and a spam ID (the Message-ID of the article to be cancelled) as its arguments, giving it a chance to save the article before it's cancelled. The script:
#!/bin/bash
set -f

dir=/usr/local/news/spool/nocem-reposts
grephistory=/usr/local/news/bin/grephistory
sm=/usr/local/news/bin/sm

[ $# -eq 2 ] || { echo "Invalid number of arguments"; exit 1; }

notice_id=$1
art_id=$2

art_token=$("$grephistory" "$art_id")
[ $? -eq 0 ] || exit 1
[ "$art_token" = "/dev/null" ] && exit 1

filename=$(tr -cd 'a-f0-9' < /dev/urandom | head -c 32)
temppath=$dir/$filename.tmp
finalpath=$dir/$filename.to-reformat

echo "$notice_id" > "$temppath"
"$sm" "$art_token" >> "$temppath" 2>/dev/null
if [ $? -eq 0 ]; then
    mv "$temppath" "$finalpath"
else
    rm -f "$temppath"
fi
Remember to create the /usr/local/news/spool/nocem-reposts directory and to review it from time to time. If you don't, you can easily run out of inodes, as there will be tons of files there. I have my own scripts to process these articles, but they're specific to news.chmurka.net and not mature enough to be published.
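For the "keep in mbox and review from time to time" option, a minimal sketch that turns the saved files into a single mbox file could look like this. It assumes the file layout produced by nocem-copy-article.sh (the notice's Message-ID on the first line, followed by the article as printed by sm); the function name spool_to_mbox, the X-NoCeM-Notice header, and the mbox path are hypothetical. Each file is deleted only after it has been appended, which also keeps the inode count in check.

```shell
# spool_to_mbox SPOOL_DIR MBOX_FILE
# Hypothetical helper: appends every *.to-reformat file from SPOOL_DIR
# to MBOX_FILE as one mboxrd message, then deletes the file.
spool_to_mbox() {
    local dir=$1 mbox=$2 f
    for f in "$dir"/*.to-reformat; do
        [ -e "$f" ] || continue   # the glob matched nothing
        {
            # mbox separator line with an asctime-style date
            printf 'From nocem-review %s\n' "$(LC_ALL=C date -u '+%a %b %e %H:%M:%S %Y')"
            awk '
                NR == 1 { print "X-NoCeM-Notice: " $0; next }  # notice Message-ID
                /^>*From / { $0 = ">" $0 }                     # mboxrd escaping
                { print }
            ' "$f" && echo        # blank line between messages
        } >> "$mbox" && rm -f -- "$f"
    done
}
```

For example, `spool_to_mbox /usr/local/news/spool/nocem-reposts /var/tmp/nocem-review.mbox`, run from cron, followed by `mutt -f /var/tmp/nocem-review.mbox` whenever you want to review what was cancelled.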
I can't guarantee that these notices contain no false positives, but if you spot one, please contact me at news-contact (at) chmurka.net. You can also contact me if you have any questions about these filters (for general questions about NoCeM, the news.admin.net-abuse.usenet newsgroup is a better place).
Please do not use the address from the notice (nocem@chmurka.net). It's created only for the purpose of signing notices and isn't valid.