news.chmurka.net — NoCeM notices

Preface

When running a Usenet server, you face the challenge of combating a constant influx of spam and floods from various sources. You could run a site without any filtering, but then some newsgroups would become unusable. Hence the need to detect and reject spam one way or another, while minimizing the impact on legitimate posts. You want as much spam as possible filtered out, without losing any posts from legitimate users in the process.

Spam filtering

There are quite a few ways to filter spam on Usenet. One of them is to use an INN filter script. INN can run filter scripts written in Perl or Python (or both at the same time) and ask them to classify incoming articles. When both scripts mark an article as good, the server accepts it. If at least one marks it as bad, the server rejects it.

General filtering

Most Usenet servers run a Cleanfeed filter. It's a general-purpose filter, written in Perl, that combats many common attacks, spam, and floods on Usenet. It's a good first line of defense. There are other filters that work in a similar way, one notable example being PyClean (a similar filter written in Python).

Some sites also run SpamAssassin or other filters.

What is NoCeM?

While individual sites run their own filters, it's sometimes useful for them to communicate with each other. Let's say that a filter on site A catches spam, site B learns about it, and (if it trusts site A's judgment) deletes the offending article from its spool. This mechanism exists and is called NoCeM. You can find an introduction to it in a document called the NoCeM FAQ.

NoCeM notices are published in a special newsgroup (news.lists.filters). A single site can issue several kinds of notices, distinguished by the type field. You can opt in to trust all types from a given issuer, or only certain ones.

Anyone can post to news.lists.filters, but NoCeM notices are signed with PGP. As long as the cryptography itself isn't broken, nobody can impersonate a trusted issuer just by using their email address, and so abuse the established trust to remove innocent articles. The signature simply won't match and the notice won't be processed.

Thai flood

Note: The problem resolved itself when Google Groups cut itself off from Usenet, so this section is kept for historical reasons only. The Thai flood is, fortunately, a thing of the past, but it was what prompted me to implement NoCeMs in the first place.

In recent months there has been a huge flood of articles originating from Google Groups. It's automated, written mostly in Thai, and makes certain newsgroups practically unusable. Cleanfeed doesn't seem to be able to handle it, and SpamAssassin seems to perform quite poorly, with a considerable number of false positives.

The easiest solution would be, as many suggest, to depeer Google Groups, but for many sites this simply isn't an option, as valid users still post from it.

That's why the need for a filter removing this type of flood arose.

Filtering on news.chmurka.net

news.chmurka.net started running such a filter and issued notices about the articles it killed to news.lists.filters, with the type field set to thai. It ran on the following hierarchies: pl.*, alt.*, uk.*, de.*, and the Big 8.

When Google Groups depeered itself from Usenet, the thai filter lost its value, but another filter, with type set to spam, took its place. It filters articles based on some common characteristics, like From addresses of known spammers, or URLs (or other body parts) found in articles. It's designed to minimize false positives.

Internally it's just a Python filtering script that rejects an article when it has certain characteristics, but saves it first. The article is saved for two reasons: first, so it can be reposted to a local spam newsgroup (chmurka.spam), where false positives can easily be caught (and the article reposted locally, if needed); and second, so it can be processed and posted as a NoCeM notice by another process that runs on the server every N minutes (currently N = 5).

Policy statement

Here's the policy statement, as required by the NoCeM registry.

news.chmurka.net sends NoCeMs for articles that match specific, fixed criteria, like certain headers, body parts, or other static characteristics of articles that are considered spam by the server owner. Every article that's marked this way is reposted to a local newsgroup (chmurka.spam), both for transparency and for inspection. My NoCeMs are never used to censor individuals – only spam is targeted.

Why should I trust you?

You should never trust a random person on the Internet! And especially their automated scripts. That's why it's a good idea to review my notices before deciding that you want to use them, and then modify the NoCeM filter to save these killed articles and review them periodically.

So should I run the script myself instead?

It's a good idea, but then you're on your own: if the script needs to be adjusted, you'll have to adjust it yourself. I monitor my filter and adjust it when it misses something, always erring on the safe side to keep false positives as unlikely as possible. By using the automated notices you don't have to worry about the script yourself. You're still responsible for what's cancelled on your server, though, even if it's because of my notices, so you should keep an eye on it.

I don't want to publish the script here (spammers might read it too), but feel free to contact me and I'll send you the most recent copy (or give you Git access, when I finally move this stuff to a repository).

How do I subscribe to your NoCeMs?

Enabling NoCeM processing is really easy (as long as we're talking about INN). You have to:

Install GnuPG 2.x
Import my public key
Whitelist the email and type of notices in a nocem.ctl file
Test it by feeding an example NoCeM notice to the perl-nocem program (supplied with INN)
Create a newsfeeds entry to send all notices to perl-nocem
Optionally: feed old notices already found in news.lists.filters to perl-nocem

It's also a good idea to modify the perl-nocem script to save articles and do something about them (repost to a local newsgroup, keep in mbox and review from time to time, or really anything else that would allow you to monitor what's being cancelled on your site).

Public key

The public key used to sign NoCeM notices posted by news.chmurka.net can be downloaded from:

http://news.chmurka.net/nocem-chmurka.asc

The easiest way to import it into gpg should be something like this (adjust your paths!):

curl http://news.chmurka.net/nocem-chmurka.asc | \
gpg --no-default-keyring --allow-non-selfsigned-uid \
--primary-keyring /usr/local/news/etc/pgp/ncmring.gpg \
--no-options --no-permission-warning --import -a

You should see some output (from curl and gpg) and, among the lines, something like:

gpg: key 41237614098DD255: public key "news.chmurka.net <nocem@chmurka.net>" imported
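
If you'd rather confirm the import from a script than read the output by eye, something like the following sketch may help. The function name keyring_has_key is made up for this example, and the keyring path in the usage comment is the same assumed path as above. It only checks that a key with the given long key ID is present in the keyring. It doesn't prove that the key really belongs to news.chmurka.net.

```shell
#!/bin/bash
# Hypothetical helper (not part of INN or GnuPG): succeed if the given
# keyring contains a public key with the given long key ID.
keyring_has_key() {
	local keyring="$1" key_id="$2"
	# --with-colons gives machine-readable output; on "pub" records
	# the fifth field is the long key ID.
	gpg --no-default-keyring --no-options --no-permission-warning \
		--keyring "$keyring" --list-keys --with-colons 2>/dev/null |
		awk -F: -v id="$key_id" '
			$1 == "pub" && $5 == id { found = 1 }
			END { exit !found }'
}

# Example (the keyring path is an assumption, adjust it):
# keyring_has_key /usr/local/news/etc/pgp/ncmring.gpg 41237614098DD255 &&
#	echo "key present"
```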

Whitelist email and type

There's a nocem.ctl file in your etc directory (probably /usr/local/news/etc) that stores pairs of emails and notice types that the perl-nocem program will accept. You should edit it and add this line:

nocem@chmurka.net:spam
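
If your server setup is scripted, the entry can be added in a way that's safe to run more than once. This is only a sketch: add_nocem_entry is a name invented for this example, and the path in the usage comment is the assumed default from above.

```shell
#!/bin/bash
# Hypothetical helper (not part of INN): append an issuer:type entry
# to nocem.ctl unless that exact line is already present.
add_nocem_entry() {
	local ctl_file="$1" entry="$2"
	# -x: match the whole line, -F: treat the entry as a fixed string
	if ! grep -qxF -- "$entry" "$ctl_file" 2>/dev/null; then
		printf '%s\n' "$entry" >> "$ctl_file"
	fi
}

# Example (the path is an assumption, adjust it to your installation):
# add_nocem_entry /usr/local/news/etc/nocem.ctl 'nocem@chmurka.net:spam'
```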

Test it

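Go to news.lists.filters and pick a notice from news.chmurka.net. Save its Message-ID (let's call it Notice-Message-ID). You can also save the Message-ID of a spam article listed in this notice (let's call it Spam-Message-ID). In the commands below, replace these placeholders with the real Message-IDs, including the angle brackets, and quote them so the shell doesn't treat < and > as redirections.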

First, check that you have the article with Spam-Message-ID on your system.

grephistory Spam-Message-ID | sm

You should see the spam article on the screen. Then, issue the following command:

grephistory Notice-Message-ID | perl-nocem

Then run the first command again. The spam article should no longer be there.
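
The before and after check can also be scripted. The sketch below relies on the same grephistory behaviour that the nocem-copy-article.sh script further down depends on: a non-zero exit status for an unknown Message-ID, and /dev/null printed when the ID is known but the article is gone. The function name article_present and the GREPHISTORY override variable are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the article with the given Message-ID
# is still retrievable from the local spool.
# GREPHISTORY may be set to another command; by default grephistory
# is looked up in PATH.
article_present() {
	local msgid="$1" token
	# grephistory exits non-zero when the Message-ID is not in history
	token=$("${GREPHISTORY:-grephistory}" "$msgid" 2>/dev/null) || return 1
	# "/dev/null" means the ID is known but the article itself is gone
	[ -n "$token" ] && [ "$token" != "/dev/null" ]
}

# Example (the Message-ID is a placeholder):
# article_present '<spam-message-id@example.invalid>' && echo "still there"
```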

Automate it

Open your newsfeeds file and add the following (again, adjust paths!):

nocem!\
	:!*,news.lists.filters/!local\
	:Tc,Wf,Ap:/usr/local/news/bin/perl-nocem

Reload your newsfeeds file:

ctlinnd reload newsfeeds "Enable NoCeM"

And you should be all set. You can check syslog to confirm that the notices are actually being processed.

Refeed old notices

If you want to re-feed old notices to perl-nocem (and you're using the tradindexed overview method), you can use the following command:

tdx-util -g -n news.lists.filters | cut -f 6 -d' ' | perl-nocem

If you're using ovdb, try this:

ovdb_stat -r 1- news.lists.filters | cut -f 5 | grephistory -s | perl-nocem

Useful perl-nocem modifications

There's a line that says:

eval { require Sys::Syslog; import Sys::Syslog; $use_syslog = 1; };

You can comment it out, and then log entries from the script will be saved in a log/perl-nocem.log file.

You can also make this script save articles for you, for further processing. This is my way of doing it (as a patch). Note: I don't know Perl, so there's probably (certainly) a better way (without the intermediate Bash script).

--- perl-nocem
+++ perl-nocem
@@ -115,6 +115,7 @@
     close $artfh;
     return unless $nocems;

+    copy_articles($msgid, $nocems);
     &$cancel($nocems);
     logmsg("Articles cancelled: " . join(' ', @$nocems));
     my $diff = (time - $start) || 0.01;
@@ -391,6 +392,34 @@
     return undef;
 }

+# Copy single article to nocem spool to be reposted to spam group
+# by an external program.
+#
+# First argument: notice msgid
+# Second argument: cancelled post msgid
+sub copy_article {
+    my $pid = fork;
+    if(defined $pid && $pid == 0) {
+        exec "$INN::Config::pathbin/local/nocem-copy-article.sh", $_[0], $_[1];
+        exit 1;
+    }
+    waitpid $pid, 0 if $pid;
+}
+
+# Copy articles to nocem spool to be reposted to spam group
+# by an external program.
+#
+# First argument: notice msgid
+# Second argument: array of msgids to copy
+sub copy_articles {
+    my $notice_msgid = $_[0];
+    my $ids = $_[1];
+
+    foreach(@$ids) {
+        copy_article($notice_msgid, $_)
+    }
+}
+
 # Cancel a number of Message-IDs.  We use ctlinnd to do this,
 # and we run up to 15 of them at the same time (10 usually).
 sub cancel_ctlinnd {

With this patch, perl-nocem calls the Bash script bin/local/nocem-copy-article.sh with two arguments, the notice ID (Message-ID of the notice) and the spam ID (Message-ID of the article to be cancelled), giving it a chance to save the article before it's cancelled. The script:

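#!/bin/bash
# Save a copy of an article that is about to be cancelled by a NoCeM notice.
# Usage: nocem-copy-article.sh <notice Message-ID> <article Message-ID>

# Disable globbing
set -f

dir=/usr/local/news/spool/nocem-reposts
grephistory=/usr/local/news/bin/grephistory
sm=/usr/local/news/bin/sm

[ $# -eq 2 ] || { echo "Invalid number of arguments" >&2; exit 1; }

notice_id=$1
art_id=$2

# Look up the storage token; give up if the article is unknown or already gone
art_token=$("$grephistory" "$art_id") || exit 1
[ "$art_token" = "/dev/null" ] && exit 1

# Random file name
filename=$(tr -cd 'a-f0-9' < /dev/urandom | head -c 32)
temppath=$dir/$filename.tmp
finalpath=$dir/$filename.to-reformat

# First line: notice ID; the rest: the article itself. The file gets its
# final name only after the article has been fetched successfully.
echo "$notice_id" > "$temppath"
if "$sm" "$art_token" >> "$temppath" 2>/dev/null; then
	mv "$temppath" "$finalpath"
else
	rm -f "$temppath"
fi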

Remember to create the /usr/local/news/spool/nocem-reposts directory and review it from time to time. If you don't, you can easily run out of inodes, as there will be tons of files there. I have my own scripts to process these articles, but they're specific to news.chmurka.net and not mature enough to be published.
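
As a starting point for such a review, here is a minimal sketch (these are not the scripts used on news.chmurka.net). It assumes the file layout produced by the script above: files ending in .to-reformat, with the notice Message-ID on the first line. The function names list_reposts and prune_reposts are made up for this example, and so is the 30-day retention in the usage comment.

```shell
#!/bin/bash
# Hypothetical helpers for reviewing the directory filled by
# nocem-copy-article.sh.

# Print "<file><TAB><notice Message-ID>" for every saved article.
# The first line of each file is the notice ID written by the script.
list_reposts() {
	local dir="$1" f
	for f in "$dir"/*.to-reformat; do
		# Skip the unexpanded pattern when the directory is empty
		[ -f "$f" ] || continue
		printf '%s\t%s\n' "$f" "$(head -n 1 "$f")"
	done
}

# Delete saved articles last modified more than the given number of days ago.
prune_reposts() {
	local dir="$1" days="$2"
	find "$dir" -type f -name '*.to-reformat' -mtime +"$days" -exec rm -f {} +
}

# Example (the retention period is an arbitrary choice):
# list_reposts /usr/local/news/spool/nocem-reposts
# prune_reposts /usr/local/news/spool/nocem-reposts 30
```

Running the pruning step from cron keeps the number of files, and so the inode usage, bounded.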

Contact

I can't guarantee there are no false positives in these notices, but if you spot one, please contact me at news-contact (at) chmurka.net. You can also contact me if you have any questions about these filters (but for general questions about NoCeM, the news.admin.net-abuse.usenet newsgroup is a better place).

Please do not use the address from the notice (nocem@chmurka.net). It's created only for the purpose of signing notices and isn't valid.
