Getting rid of old spam is hard

Updated November 17, 2021 – My spam remover extension, which uses the Akismet service (you may need to pay Akismet to use it), can reliably remove a lot of legacy spam.


phpBB is now pretty good at keeping spam users from registering and posting. In the phpBB 3.0 and 3.1 days, its defenses turned out to be pretty weak. The GD Image spambot countermeasure (still the default) was easily hacked. phpBB has at least added settings to let you tune it better, making it harder to hack. It also started supporting Google’s reCAPTCHA, but the version in phpBB 3.1 was quickly hacked, and phpBB was not agile enough to quickly integrate reCAPTCHA versions 2 and 3.

This led to an inundation of spam on certain forums, mostly bogus registrations but also lots of spam posts in some forums. Some administrators countered by requiring all new users to be approved by an administrator. But when you are inundated with hundreds of these in a short period of time, it’s a hassle to delete them all, or to discern the real new users from the spam ones. For a few years, I made quite a bit of money removing spam for clients.

With phpBB 3.2 things slowly got better, at least if administrators used best practices. Best practices were to use reCAPTCHA version 2 (“I am not a robot”) or the Question & Answer countermeasure, provided the questions were sufficiently difficult. A malicious human could still take the time to solve the questions, but this was unusual. There were also a few extensions that could help. The Sortables CAPTCHA was one of the more useful ones.

My go-to for years has been the Cleantalk extension, which requires subscribing to their service. But now there is also an Akismet spam extension, which also requires a subscription, though it can be free for personal sites.

All this is good at preventing spam, but how do you get rid of months or years of spam posts? That was my dilemma this week working with a client.

The latest version of the Cleantalk extension has a feature that removes spam users and their posts. But I discovered it has a few serious limitations:

  • It bases its judgment on the IP of the poster. The user’s last IP is stored automatically. It doesn’t examine the post text. Over time, IPs that used to be marked as spam sources get cleaned up, and once that happens they are no longer flagged, so old spam registrations from those IPs aren’t caught.
  • Its interface for finding these users is slow and can easily time out, which means sometimes it can’t succeed. It also lacks pagination.

Why this particular client ignored this problem for a few years, I don’t know. But cleaning up the database was a big challenge. The only real way to do it is to manually look at every post and flag those that are spam, then use moderator tools to get rid of them.

This was time prohibitive, but if these could be removed presumably people would start posting again and Google would rank the site as legitimate again, bringing in new people.

The next best solution I found was to try to identify when the spam started. This took quite a bit of analysis, but looking in the most active forum on the board, it looked like it started on February 10, 2018. So I used phpBB’s Prune User feature to remove users who registered after the spam started, along with their posts.

This seems to have gotten rid of the spam. But it also removed accounts of some legitimate users, and their posts as well. Those who had accounts before then were unaffected and their posts remained.
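
The trade-off of pruning by registration date can be sketched in a few lines. This is only an illustration in Python, not phpBB’s own prune code. It assumes each user row carries a user_regdate Unix timestamp (as the phpbb_users table does), treats the cutoff as midnight UTC, and uses made-up sample users.

```python
from datetime import datetime, timezone

# Cutoff taken from the post: spam appeared to start on February 10, 2018.
# Treating it as midnight UTC is an assumption for this sketch.
CUTOFF = int(datetime(2018, 2, 10, tzinfo=timezone.utc).timestamp())


def split_by_regdate(users, cutoff=CUTOFF):
    """Return (kept, pruned): users registered before the cutoff are kept."""
    kept = [u for u in users if u["user_regdate"] < cutoff]
    pruned = [u for u in users if u["user_regdate"] >= cutoff]
    return kept, pruned


# Hypothetical sample rows; real data would come from the phpbb_users table.
sample = [
    {"username": "longtime_member", "user_regdate": CUTOFF - 86400 * 365},
    {"username": "real_newcomer", "user_regdate": CUTOFF + 86400 * 30},
    {"username": "spam_account", "user_regdate": CUTOFF + 86400 * 31},
]

kept, pruned = split_by_regdate(sample)
print([u["username"] for u in kept])
print([u["username"] for u in pruned])
```

Note that real_newcomer lands in the pruned list alongside spam_account: a date cutoff cannot tell them apart, which is exactly the collateral damage described above.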

phpBB needs a real solution which so far doesn’t exist.

But I think I have found a solution … if I write the extension. If you are an extension developer, please go ahead and develop it, just tell me so I don’t waste my time.

It turns out that Akismet, the biggest solution out there and widely used in WordPress to moderate comments, has an API for checking content (its comment-check call, which sits alongside its Submit Spam call). So in theory, if you pass the needed information to it, including the poster’s IP and the post text, it can render a judgment on whether the post is spam or not. If these posts can be flagged, they can then be removed.

One possible issue is that the service requires sending it a User Agent string. phpBB does not store this. Perhaps a fake user agent string could be supplied, but would this render a correct judgment? If not, this solution wouldn’t work. Also, it requires an Akismet key to use, which might require some boards to purchase the key. This may be a limiting factor for some.
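
To make the idea concrete, here is a rough Python sketch of the data such a check would send. The field names (blog, user_ip, user_agent, comment_type, comment_content) follow Akismet’s public comment-check documentation as I understand it. The function names, the fallback user agent and the example board URL are all hypothetical. The network call itself is left out: the encoded body would be POSTed to Akismet’s comment-check endpoint together with your API key.

```python
from urllib.parse import urlencode

# Placeholder used because phpBB does not store the poster's user agent.
# Whether Akismet judges accurately with a stand-in value is untested here.
FALLBACK_USER_AGENT = "Mozilla/5.0 (compatible; unknown)"


def build_comment_check(board_url, poster_ip, post_text, user_agent=None):
    """Build the form-encoded body for an Akismet comment-check request."""
    fields = {
        "blog": board_url,
        "user_ip": poster_ip,
        "user_agent": user_agent or FALLBACK_USER_AGENT,
        "comment_type": "forum-post",
        "comment_content": post_text,
    }
    return urlencode(fields)


def is_spam(response_body):
    """Akismet answers comment-check with the text 'true' (spam) or 'false'."""
    return response_body.strip().lower() == "true"


# Hypothetical post pulled from the board's database.
body = build_comment_check(
    "https://forum.example.com", "203.0.113.7", "Buy cheap pills"
)
print(body)
```

Keeping the payload builder separate from the HTTP call makes it easy to try different stand-in user agents and compare the verdicts, which is the open question above.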

As I have time, I hope to see if this is a viable approach to finding and removing spam posts in phpBB.

Using unapproved extensions is dangerous

As more phpBB extensions are developed, they are becoming more popular. Extensions add functionality to phpBB beyond what is available by default. Based on my work with clients, most have extensions installed, so I factor them in when updating or upgrading their forums. They often need to be upgraded as well when a forum is upgraded or updated.

The phpBB Group maintains a database of approved extensions. Both the phpBB Group and I recommend that you only install extensions downloaded from this database. This is because approved extensions are quality checked by the phpBB Extensions Review Team. The team thoroughly inspects each extension and ensures it adheres to all coding standards and uses best practices to minimize security issues. An extension typically goes through a number of reviews before it is accepted, if it is accepted. So you can have confidence that if you download an official extension it is of high quality and secure.

If you are not familiar with how to install extensions, the instructions are on the Manage extensions page: ACP > Customise > Manage extensions.

Unapproved extensions fall into two categories:

  • Extensions in development
  • Third-party extensions

Extensions in development

Extensions don’t appear out of nowhere. Like all software, they go through a development process. You can see a list of extensions in development in the Extensions in development forum on phpbb.com. Each topic title is prefixed by the state of the extension in brackets. Links for downloading the extension are in the first post. If you have feedback on the extension, you leave it as a post on the topic.

The phpBB Group has extension authors self-certify the quality of the extension they are creating. This is similar to other software. The levels from most risky to least risky are:

  • [DEV] – Development – the extension is very recent and is being issued for feedback and to refine features. It should only be used on a test board.
  • [ALPHA] – Alpha – The extension is no longer in initial development. The feature set is largely settled and the code quality has been refined. Traditionally an alpha release has meant that it is to be used “within an organization”. Alpha release testers are expected to provide feedback, and significant bugs and security issues may be experienced. “Within the organization” has no meaning with phpBB, so it simply indicates the extension is out of the principal early development phase. Using it on a live, production board is quite risky and definitely not recommended. A download link is usually provided.
  • [BETA] – Beta – The extension is designed to be used and tested by a larger group of people. There may be significant bugs and security issues. It should not be used on a production forum, but the code quality should be pretty high at this point and most bugs should have been addressed. A download link is required.
  • [RC] – Release candidate – Most of the bugs have been found and fixed. The release candidate could be submitted for formal review for inclusion as an extension if no more issues are discovered as a result of testing. The extension should be stable with no more features anticipated. Using it on a production forum is not recommended, but if you choose to do so anyhow it is likely to work as intended and not show any problems. Release candidates are submitted to the phpBB Group extension review team at the author’s discretion.
  • [CDB] – Customisation Database. You will see this in the Extensions in development forum. It means that the extension is approved. There should be a link to take you to its official page on phpbb.com. The topic is locked.
  • [ABD] – Abandoned. The extension author abandoned work on the extension. It is not approved, should not be used but some other extension author could take up working on the extension. These are placed in their own abandoned extensions forum.
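
As a rough aid, the prefixes above can be read mechanically from a topic title. The Python sketch below is only an illustration: the function names and the sample titles are made up, and the rule that only [CDB] is fit for production simply restates the list above.

```python
import re

# Prefixes as listed above, ordered from most risky to least risky.
STAGES = ["DEV", "ALPHA", "BETA", "RC", "CDB"]


def stage_of(topic_title):
    """Return the bracketed stage prefix of a topic title, or None."""
    match = re.match(r"\s*\[([A-Za-z]+)\]", topic_title)
    return match.group(1).upper() if match else None


def risk_rank(topic_title):
    """Position in STAGES (0 is riskiest); None for ABD or unknown prefixes."""
    stage = stage_of(topic_title)
    return STAGES.index(stage) if stage in STAGES else None


def ok_for_production(topic_title):
    """Only approved (CDB) extensions pass; everything else does not."""
    return stage_of(topic_title) == "CDB"


# Hypothetical topic titles.
for title in ["[BETA] Example Extension 1.0.0", "[CDB] Example Extension"]:
    print(title, stage_of(title), ok_for_production(title))
```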

There may be multiple versions of the extension in each phase. Generally extensions in development start with 0.1, and as an extension reaches Alpha or Beta stage it becomes 1.0. But there is no fixed standard for version numbers other than the PHP Composer guidelines. The version number is usually suffixed by the build quality, e.g. 0.1.0-dev. The extension is usually downloaded from GitHub.
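
A version string in this style can be split into its numbers and its build quality. The following Python sketch assumes a three-part number with an optional -dev, -alpha, -beta or -RC suffix (optionally followed by digits); real extensions may use other forms that this pattern would reject, and the function name is hypothetical.

```python
import re

# Accepted suffixes loosely follow Composer stability names (an assumption).
VERSION = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-(dev|alpha|beta|rc)\d*)?$", re.IGNORECASE
)


def parse_version(version):
    """Split '0.1.0-dev' into ((0, 1, 0), 'dev'); no suffix means 'stable'."""
    m = VERSION.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    numbers = tuple(int(m.group(i)) for i in (1, 2, 3))
    stability = (m.group(4) or "stable").lower()
    return numbers, stability


print(parse_version("0.1.0-dev"))
print(parse_version("1.0.0-RC2"))
```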

Third-party extensions

Third-party extensions are usually developed by commercial companies and typically tie into existing products outside of phpBB. Companies can submit their extensions for review by the phpBB group but usually don’t. This is because a review is time consuming. It can take months to get a review, then multiple issues must be fixed, and the extension resubmitted. This is not agile enough for many companies. In addition, the phpBB Group frowns on software that does not use an open source license. Many third-party extensions are issued with open source licenses but tie into products or services that are not.

When you use one of these extensions, you are assuming significant risk. Obviously, these companies don’t want their reputation besmirched, so generally they will take the time to write a quality extension and possibly adhere to the coding standards for extensions. But since in most cases they aren’t approved extensions, they are risky because they were not reviewed by the phpBB group to ensure their quality. They are typically downloaded from the company’s website or from their GitHub page.

Tapatalk

Tapatalk is a smartphone app that allows you to use the same user interface to access multiple forums, phpBB or otherwise. Prior to phpBB 3.1 the Tapatalk modification was widely used because styles for phpBB were not responsive, i.e. did not resize intelligently for mobile devices. Since phpBB 3.1, approved styles must be responsive, so users can use a browser on their smartphone to access the forum without the hassle of the past. Still, many people like the convenience of using one app to access multiple forums, so Tapatalk developed an extension. It creates an interface between phpBB and the Tapatalk app.

This extension is not approved and likely would never be approved by the phpBB Group. Why? When you install the extension, although an interface is seen in the /ext folder as usual, there is also a /mobiquo folder installed in your forum’s root directory. The software in the /mobiquo folder does most of the work of communicating between phpBB and the Tapatalk app. Tapatalk is available for other forum solutions too, and they use a similar architecture. The /mobiquo folder does all the data munging, so it is unique and proprietary. phpBB’s extension architecture requires that all extensions work within the /ext folder. Since Tapatalk doesn’t do this and its data munging is proprietary, it’s unlikely to ever be approved. It’s clear that Tapatalk developers don’t want to try.

More importantly, Tapatalk injects a major vulnerability in that it can bypass phpBB’s functions that do important work like posting to the database. This makes it dangerous. You should encourage your users to use a mobile browser instead of Tapatalk to access your forum. Ideally, you should disable and uninstall the extension.
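
If you are unsure whether a board still has Tapatalk files on it, a quick check of the forum’s directory is enough. This Python sketch looks for the /mobiquo folder described above and for any vendor folder under /ext whose name contains “tapatalk”. The vendor folder name is a guess on my part, and the function name is hypothetical.

```python
from pathlib import Path


def tapatalk_traces(forum_root):
    """Return the Tapatalk-related directories found under a phpBB root."""
    root = Path(forum_root)
    found = []
    if (root / "mobiquo").is_dir():
        found.append("mobiquo")
    # Extensions live under ext/<vendor>/<name>; the vendor name is a guess,
    # so match any vendor directory whose name contains "tapatalk".
    ext_dir = root / "ext"
    if ext_dir.is_dir():
        for vendor in sorted(ext_dir.iterdir()):
            if vendor.is_dir() and "tapatalk" in vendor.name.lower():
                found.append(f"ext/{vendor.name}")
    return found
```

Calling tapatalk_traces with the path to your forum’s root returns an empty list when neither folder is present. Finding the folders only tells you the files exist; disabling and removing the extension is still done through the ACP.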

Cleantalk

Cleantalk is an antispam service. Cleantalk’s extension for phpBB is approved, but it’s very old (2016 as of this writing). It may be that a newer version of the extension has been submitted for review, but the version on phpBB.com probably won’t work on phpBB 3.2. If it does, it’s missing many features. So as a practical matter, if you use Cleantalk you will want to get its most recent published version off of GitHub. Just bear in mind it’s a version that has not been approved by the phpBB Group, so using it may be risky.

Proprietary style user interfaces

Many proprietary (paid) phpBB styles come with a user interface that makes it easier to customize the style, doing things like changing background colors easily, swapping in different logos, changing fonts, etc. Because these styles are proprietary, they are not free and thus not allowed on the list of approved styles for phpBB. Consequently, extensions bundled with these styles are not approved either. Using a proprietary style incurs some risk by itself. Using an extension to manage the style adds further risk.

Cleantalk extension for phpBB can remove spam posts, plus its spam firewall feature is very useful

This is an update on an earlier post on removing spam posts.

Removing spam posts is hard because it requires actually reading each post, deciding whether it is spam, and then using moderator tools to remove it. If your forum is overwhelmed with spam posts, this is a Herculean endeavor. Ideally, though, posts could be “read” by software that would judge whether each one is spam or not.

The Cleantalk extension for phpBB 3.1.x and 3.2.x can do just this as well as lots of other really cool tricks. My customers love Cleantalk, but the service is not free. However, it is so inexpensive that it easily justifies spending $8/year for the service. You can subscribe on the Cleantalk website. As of this writing, you can try it for free for 7 days. After 7 days, it won’t bring down your forum but it will stop working.

What is Cleantalk?

Cleantalk is essentially a huge database of addresses of known spammer sites. While it’s not perfect, based on the experience of my clients it is about 99% accurate. I originally recommended it as a spam registration solution for my clients. It still does that but is less necessary since phpBB 3.2. This is because since phpBB 3.2, version 2 of Google’s reCAPTCHA is supported. Unless it gets hacked, as long as you have it properly configured as a spambot countermeasure it should prevent virtually all spam registrations.

However, it has two powerful features that still keep it relevant for phpBB forums.

Cleantalk ACP Interface

Installing and enabling Cleantalk

Cleantalk is installed like any other extension. While it can be downloaded from phpbb.com, you should download it from its GitHub page. This is because as of this writing the version on phpbb.com does not include the spam firewall feature, and you will probably want to enable this feature. You can access it through the Administration Control Panel: ACP > Extensions > Antispam by Cleantalk. Before you can do much with it you have to enter your Cleantalk key which you can get from their website or by pressing the button in the extension that should retrieve it for you.

Removing spam users and spam posts

As you can see from the image, once the extension is enabled and the key is properly configured there is a prominent Check users for spam button on its page within the Administration Control Panel. If you have lots of users, the check may hang. Based on my experience, though, the next time you go into its interface you will see a list of potential spammers.

As I said, it is not perfect. So for flagged users who have posts, I recommend checking their topics to make sure their posts really are spam before deleting them. For those you want to delete, check the boxes next to their usernames and then press Delete marked. You can also press Delete all to remove all users and their posts. You may have to go through many pages to delete all spam users and their posts, but this is obviously much faster than doing a visual inspection of all your posts.
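
The review step above amounts to a simple triage. The Python sketch below is not part of the Cleantalk extension. It assumes you have the flagged accounts as rows with a username and a user_posts count (the column phpBB keeps in its users table), and it treats accounts with no posts as lower risk to delete, which is my own assumption.

```python
def triage(flagged_users):
    """Split flagged accounts into (delete_now, review_first) by post count."""
    delete_now = [u for u in flagged_users if u["user_posts"] == 0]
    # Accounts with the most posts have the most to lose, so list them first.
    review_first = sorted(
        (u for u in flagged_users if u["user_posts"] > 0),
        key=lambda u: u["user_posts"],
        reverse=True,
    )
    return delete_now, review_first


# Hypothetical flagged accounts.
flagged = [
    {"username": "pills4u", "user_posts": 0},
    {"username": "maybe_real", "user_posts": 3},
    {"username": "chatty_regular", "user_posts": 120},
]

delete_now, review_first = triage(flagged)
print([u["username"] for u in delete_now])
print([u["username"] for u in review_first])
```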

Spam firewall

This is a new feature which as of this writing is not available if you download the extension from phpbb.com. It keeps almost all spammers from hitting your site at all. Instead, Cleantalk’s servers intercept the request first. In the event the user is legitimate, there is a link that will take them to your website.

Why is this useful? Because it reduces the stress on your server by limiting it to legitimate traffic only. It speeds up the performance of your forum and makes it less likely that you will have to pay for the cost of a higher class of hosting to handle your traffic. Isn’t that worth $8 a year?

Stopping contact form spam

Cleantalk has one other useful feature: the ability to stop contact form spam. Of course you can disable the contact form (ACP > General > Contact page settings) and that will solve that issue. Or you can have Cleantalk essentially moderate it for you, passing on only valid contact forms to you. Simply check that option on the extension’s page and submit the form. Somewhat oddly, the phpBB group did not tie the contact form to the spambot countermeasure feature of phpBB. Perhaps that will come in a future release.

In any event, for forums that get lots of spam and/or lots of traffic, using the Cleantalk service with the Cleantalk extension for phpBB is a no-brainer, provided you know about it. Now you do!