Recently I have been running into a bit of a spam problem, with backlinks being submitted to the blog as comments. It looks like the way BlogEngine.NET handles comment posts makes it really easy for a spammer to post lots and lots of comments with bots (I guess the spammers are getting good at this now). This sounds quite bad, but everything is being caught by the spam filters, so it's really not so bad. So this is a bit of a guide to protecting a website from this kind of abuse by computers around the internet.
Firstly, here is a little background on the situation. Almost overnight, the amount of spam on the sites went from around 2-5 messages a week to 500+ in a single day. This happened over a period of 2 days. So I needed a quick solution to put the bots off a bit until I come up with a more complete one. This post covers only the basic "emergency" solution as a fast way to help protect a site. The first thing that comes to mind is to block by IP address. It is quick and simple to do, and it can be plugged into multiple websites very quickly.
The first thing to do is to get a feel for how successful this is actually going to be. The BlogEngine.NET setup I am using is backed by MS SQL, so it should be straightforward to access the data. So let's run some SQL.
    SELECT COUNT(*) FROM be_PostComment WHERE IsSpam = 1;

    SELECT Ip, COUNT(*) AS Total
    FROM be_PostComment
    WHERE IsSpam = 1
    GROUP BY Ip
    ORDER BY COUNT(*) DESC;
The first query gives me the total number of spam comments, the second the number posted by each IP address. The first result is ~400, and the second set of results starts with:
- 171 - 109.230.222.51
- 113 - 78.8.33.105
- 10 - 58.150.182.76
- etc..
It drops off quite sharply after that to 2s and 1s. Something I did notice further down, though, is a run of IP addresses in the same range: 67.212.185.80-67.212.185.92. So I think this is somebody attempting to do some sort of SEO by posting backlinks. The top three addresses alone account for 294 of the ~400 spam comments, so from what I can tell, blocking by IP address is going to reject around 70-80% of the spam before it even reaches the site, since the same bots on the same machines are just attempting to repost the same old crap.
The next task is to actually do the blocking, since we know it's going to work (well, kind of). I know that this isn't a complete solution, but it took me less than an hour to implement and roll out. The easy way to block things in an ASP.NET application is to create an HTTP module that implements the IHttpModule interface. I came up with the following basics of what I wanted it to do.
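To look for runs like that one without eyeballing the list, one option is to group the spam by everything before the last dot of the address. This is only a sketch: it assumes the Ip column holds dotted IPv4 strings, and the column aliases are made up for illustration.

```sql
-- Group spam comments by the first three octets of the address.
-- Assumes Ip holds dotted IPv4 text such as 67.212.185.80.
SELECT
    LEFT(Ip, LEN(Ip) - CHARINDEX('.', REVERSE(Ip))) AS Prefix,
    COUNT(DISTINCT Ip) AS Addresses,
    COUNT(*) AS Total
FROM be_PostComment
WHERE IsSpam = 1
GROUP BY LEFT(Ip, LEN(Ip) - CHARINDEX('.', REVERSE(Ip)))
HAVING COUNT(DISTINCT Ip) > 1      -- only prefixes shared by several addresses
ORDER BY COUNT(DISTINCT Ip) DESC;
```

A prefix that shows up with many distinct addresses is a hint that a whole range is being used, which plain per-address counts tend to hide.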
- Block Sites by IP (Kind of obvious this)
- Be able to feed data to the block list
- Limited caching so as not to use lots of resources per web request.
    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Data.SqlClient;
    using System.Web;

    public class AddressCheck : IHttpModule
    {
        // Static, so shared by every request thread; all access goes through CacheLock.
        private static readonly Dictionary<string, CachedIP> IPCache = new Dictionary<string, CachedIP>();
        private static readonly object CacheLock = new object();

        public void Init(HttpApplication App)
        {
            App.BeginRequest += new EventHandler(App_BeginRequest);
        }

        public void App_BeginRequest(object sender, EventArgs e)
        {
            HttpResponse Response = HttpContext.Current.Response;
            bool Blocked = false;
            try
            {
                string CurrentIP = HttpContext.Current.Request.UserHostAddress;
                lock (CacheLock)
                {
                    CachedIP Entry;
                    // Query the database only for unknown addresses or expired cache entries.
                    if (!IPCache.TryGetValue(CurrentIP, out Entry) || Entry.Expires < DateTime.Now)
                        CheckAddress(CurrentIP);
                    Blocked = IPCache[CurrentIP].Block;
                }
                if (Blocked)
                {
                    SqlCommand SqlUpdateCount = new SqlCommand("UPDATE IPBlockList SET BlockCount = BlockCount + 1, LastBlocked = GETDATE() WHERE IPAddress = @IP");
                    SqlUpdateCount.Parameters.AddWithValue("@IP", CurrentIP);
                    Conn.Execute(SqlUpdateCount);
                }
            }
            catch (Exception)
            {
                Conn = null; /* Sometimes we can lose a database connection */
            }
            if (Blocked)
            {
                // Kept outside the try block: Response.End() throws a ThreadAbortException,
                // which the catch above would otherwise treat as a lost connection.
                Response.Clear();
                Response.Write("Sorry, You Are Banned From This Site!");
                Response.End();
            }
        }

        public void Dispose()
        {
            // Use the backing field so disposing never opens a new connection.
            if (_Conn != null)
            {
                _Conn.Dispose();
                _Conn = null;
            }
        }

        // Looks the address up in IPBlockList and caches the answer. Call with CacheLock held.
        private static bool CheckAddress(string CurrentIP)
        {
            SqlCommand SqlCheck = new SqlCommand("SELECT IPBlockListID FROM IPBlockList WHERE IPAddress = @IP");
            SqlCheck.Parameters.AddWithValue("@IP", CurrentIP);
            object tmp = Conn.ExecuteScalar(SqlCheck);
            bool Block = tmp != null;
            IPCache[CurrentIP] = new CachedIP(CurrentIP, Block);
            return Block;
        }

        // DBConn is a custom database helper (not shown here), not part of the .NET Framework.
        private static DBConn _Conn = null;
        private static DBConn Conn
        {
            get
            {
                if (_Conn == null)
                    _Conn = new DBConn(ConfigurationManager.ConnectionStrings["SqlServer"].ConnectionString);
                return _Conn;
            }
            set { _Conn = value; }
        }

        internal class CachedIP
        {
            public string IP = null;
            public DateTime Expires = DateTime.Now.AddMinutes(15); // each lookup is cached for 15 minutes
            public bool Block = false;

            public CachedIP(string IP, bool Block)
            {
                this.IP = IP;
                this.Block = Block;
            }
        }
    }
It is really easy to set up. You just compile it into a class library and then drop the DLL into the application's bin directory on the server. You will also need to add the module and the SQL connection string to web.config. The connection string name ("SqlServer") is hard coded in the module, which isn't ideal. It is not that hard to move it out into a config section for the class library if required (another tutorial sometime). The module above accesses a table with the following structure inside SQL Server.
    CREATE TABLE [dbo].[IPBlockList](
        [IPBlockListID] [uniqueidentifier] NOT NULL,
        [IPAddress] [nvarchar](64) NOT NULL,
        [BlockCount] [int] NOT NULL,
        [CreatedOn] [datetime] NOT NULL,
        [LastBlocked] [datetime] NOT NULL,
        CONSTRAINT [PK_IPBlockList] PRIMARY KEY CLUSTERED
        (
            [IPBlockListID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[IPBlockList] ADD CONSTRAINT [DF_IPBlockList_IPBlockListID] DEFAULT (newid()) FOR [IPBlockListID]
    GO

    ALTER TABLE [dbo].[IPBlockList] ADD CONSTRAINT [DF_IPBlockList_BlockCount] DEFAULT ((0)) FOR [BlockCount]
    GO

    ALTER TABLE [dbo].[IPBlockList] ADD CONSTRAINT [DF_IPBlockList_CreatedOn] DEFAULT (getdate()) FOR [CreatedOn]
    GO

    ALTER TABLE [dbo].[IPBlockList] ADD CONSTRAINT [DF_IPBlockList_LastBlocked] DEFAULT (getdate()) FOR [LastBlocked]
    GO
So the only thing left that is actually needed is some data in the list. Since we already know who is attacking the site, this is a really simple way to bulk add all the spammers' IP addresses in one shot!
    INSERT INTO IPBlockList (IPAddress)
    SELECT DISTINCT Ip
    FROM Stev.dbo.be_PostComment
    WHERE IsSpam = 1
      AND NOT EXISTS (SELECT IPAddress FROM IPBlockList WHERE Ip = IPAddress)
At this point the system is working. It will actively be blocking connections and displaying our nice friendly error message (if a human should ever see it). I also chose to wrap the bulk insert in a stored procedure and have it run at a reasonable interval, e.g. hourly.
However, something to be aware of here is falsely detected spam. After all, we don't want to be banning our own users. Since this is a blog, most people won't post lots and lots, so one option is to build a select statement which makes sure the IP address has posted multiple times, or multiple times with multiple email addresses / website addresses. This makes it really easy to detect a spammer.
One additional note: since the spammer keeps banging on and posting links to the sites, we can use this to our advantage by building a few more complex queries that detect multiple posts in a short time range and then also delete the messages. Some examples:
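A stored procedure for this could look roughly like the following. The procedure name is made up for illustration, and the body is just the bulk insert from above with table aliases added; how it gets scheduled (a SQL Server Agent job is one option) depends on what your server has available.

```sql
-- Hypothetical procedure name; the body repeats the bulk insert shown earlier.
CREATE PROCEDURE dbo.RefreshIPBlockList
AS
BEGIN
    SET NOCOUNT ON;

    -- Add every spam address that is not already on the block list.
    INSERT INTO dbo.IPBlockList (IPAddress)
    SELECT DISTINCT c.Ip
    FROM Stev.dbo.be_PostComment AS c
    WHERE c.IsSpam = 1
      AND c.Ip IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM dbo.IPBlockList AS b WHERE b.IPAddress = c.Ip);
END
GO
```

Running `EXEC dbo.RefreshIPBlockList;` by hand once is a quick way to check it before scheduling it. Because the module caches each lookup for 15 minutes, a newly added address may still get through for up to that long.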
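A select along those lines might look like the sketch below. The Email and Website column names are assumptions about the comment table (only Ip and IsSpam appear in the queries above), and the threshold of 3 posts is an arbitrary example value.

```sql
-- Spam addresses that posted several times under more than one identity.
-- Email and Website are assumed column names; 3 is an example threshold.
SELECT
    Ip,
    COUNT(*) AS Posts,
    COUNT(DISTINCT Email) AS Emails,
    COUNT(DISTINCT Website) AS Websites
FROM be_PostComment
WHERE IsSpam = 1
GROUP BY Ip
HAVING COUNT(*) >= 3
   AND (COUNT(DISTINCT Email) > 1 OR COUNT(DISTINCT Website) > 1)
ORDER BY COUNT(*) DESC;
```

Dropping the second HAVING condition gives the simpler "posted multiple times" rule. Keeping it is the stricter version, and a single flagged comment from a real reader will pass neither.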
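As a rough sketch of the short time range idea, the query below counts spam comments per address per clock hour. It assumes the comment table has a CommentDate datetime column, and the threshold of 5 posts in an hour is an arbitrary example value.

```sql
-- Addresses that posted at least 5 spam comments within the same clock hour.
-- CommentDate is an assumed column name; 5 is an example threshold.
SELECT
    Ip,
    DATEADD(hour, DATEDIFF(hour, 0, CommentDate), 0) AS HourBucket,
    COUNT(*) AS Posts
FROM be_PostComment
WHERE IsSpam = 1
GROUP BY Ip, DATEADD(hour, DATEDIFF(hour, 0, CommentDate), 0)
HAVING COUNT(*) >= 5
ORDER BY COUNT(*) DESC;
```

The DATEADD/DATEDIFF pair simply truncates the timestamp to the hour. Fixed buckets will miss a burst that straddles two hours, so treat this as a first pass rather than a precise rate check.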
The extended method:
Take any website URL that has been posted more than 3 times by at least 3 IP addresses, and use it to select all the posts that contain either the URL or the IP addresses in question. Block all IP addresses where the URL was used. Then see what else those IP addresses post and start blocking on that, and so on. This gets to the point where we can automatically delete around 99% of the detected spam, leave the falsely detected spam in the bucket for the normal manual processing, and generate an automatic blacklist which prevents more spam coming in. I will probably do a follow-up post on this once I have come up with a complete rule set.
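The first step of that rule could be sketched as below. It assumes the submitted link is stored in a Website column on the comment table (links inside the comment text would need a different approach), and SpamUrls is just an illustrative name for the common table expression.

```sql
-- Step 1: URLs posted more than 3 times by at least 3 different addresses.
-- Step 2: block every address that used one of those URLs.
-- Website is an assumed column name.
WITH SpamUrls AS (
    SELECT Website
    FROM Stev.dbo.be_PostComment
    WHERE IsSpam = 1
      AND Website IS NOT NULL
      AND Website <> ''
    GROUP BY Website
    HAVING COUNT(*) > 3
       AND COUNT(DISTINCT Ip) >= 3
)
INSERT INTO IPBlockList (IPAddress)
SELECT DISTINCT c.Ip
FROM Stev.dbo.be_PostComment AS c
JOIN SpamUrls AS u ON u.Website = c.Website
WHERE c.Ip IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM IPBlockList AS b WHERE b.IPAddress = c.Ip);
```

It is worth running the SpamUrls select on its own first and reading the list, since a popular legitimate site that a few readers link to could meet the same thresholds if its comments were ever flagged.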
The other extension:
We can even automatically unblock IP addresses after a period of time and simply re-block them, based on the website / backlink being submitted, should they show up again.
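Since the block list already records LastBlocked, expiring old entries can be as small as the statement below. The 30 day window is an arbitrary example value.

```sql
-- Remove block entries that have not turned a request away for 30 days.
-- 30 is an example value; LastBlocked is updated by the module on every blocked request.
DELETE FROM IPBlockList
WHERE LastBlocked < DATEADD(day, -30, GETDATE());
```

Two things to keep in mind: the module caches each lookup for 15 minutes, so an unblock can take that long to show up, and a bulk insert driven purely by IsSpam = 1 will put the address straight back for as long as its old comments are still in be_PostComment.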
Goodbye backlink SPAM!