Joho the Blog
May 18, 2004
NOTE: I updated this script on June 2 so that it now pulls out all (?) of the URLs embedded in the selected spams. The code listed below is updated, but not my comment about it not doing what I just updated it to do. So to speak. (The version without the wordwrap issues has also been updated. But what you really want is this pure text version, suitable for copying and pasting.)
I just received 200 comment spams. They each listed a different URL and came from a different IP address, which means the invaluable MT-Blacklist (thank you, Jay Allen) has to be told to delete each one, one at a time. Instead, I cobbled together an Outlook script (yes, I use OL on my desktop machine, although I've been happy with Thunderbird on my laptop) that looks through the messages you have highlighted in your inbox and builds a list of the URLs that people listed in the URL field of the comment. You then paste these into the text box in MT-Blacklist's "Add" tab. (It also shows a list of the IP addresses, although I don't know why I bothered.)
Here are just some of the caveats you need to take very seriously: I am fumbling around in the dark when it comes to VBA for Outlook. And there's almost no (= NO) error checking in this little program, so you could end up banning your mother; you must carefully inspect each of the URLs to make sure you really want to delete the comment that contains it. Further, I don't really understand how MT-Blacklist works. And there are probably some bad line wraps in the code below which will totally break it. Finally, this does NOT find any of the URLs in the body of the message because that's too hard. Well, finding the beginning of the URLs isn't hard, but figuring out when they end is.
So with that warning (WARNING: read the warning!), here's the script:
Sub FindURLStoBAN()
    ' Walks through the selected messages to find bad URLs.
    Dim objApp As Application
    Dim objSel As Selection
    Dim objItem As Object
    Dim ipstr As String    ' all IP addresses found, one per line
    Dim urlstr As String   ' all URLs found, one per line
    Dim ips As String
    Dim us As String
    Dim msgtxt As String
    Dim u As String
    Dim p As Long, p1 As Long, p2 As Long, p3 As Long
    Dim prevp As Long
    Dim x As Long
    Dim udone As Boolean
    Set objApp = CreateObject("Outlook.Application")
    ' get the selected msgs
    Set objSel = objApp.ActiveExplorer.Selection
    x = 0
    For Each objItem In objSel
        If objItem.Class = 43 Then ' 43 = mailitem (olMail)
            msgtxt = objItem.Body ' get msg text
            ' Is this msg from mt-blacklist?
            p = InStr(msgtxt, "MT-Blacklist")
            If p > 0 Then ' yes it is
                ' get the ip to ban
                p1 = InStr(msgtxt, "IP Address:")
                p1 = p1 + 12
                p2 = InStr(p1, msgtxt, vbCr)
                ips = Mid(msgtxt, p1, p2 - p1)
                ipstr = ipstr & vbCr & vbLf & ips
                ' get the url listed for the name
                p1 = InStr(msgtxt, "URL: ") + 5
                p2 = InStr(p1, msgtxt, vbCr)
                us = Mid(msgtxt, p1, p2 - p1)
                urlstr = urlstr & vbCr & vbLf & us
                ' ---- Get urls in the text
                udone = False: prevp = 1
                ' uppercase it because I'm lazy
                msgtxt = UCase(msgtxt)
                While Not udone
                    u = ""
                    ' get next a href
                    p1 = InStr(prevp, msgtxt, "<A HREF=")
                    ' get end of href (the closing quote and bracket)
                    p3 = InStr(p1 + 1, msgtxt, """>")
                    ' get /a
                    p2 = InStr(p1 + 1, msgtxt, "</A>")
                    If p1 > 0 And p3 > 0 And p2 > 0 Then
                        ' extract the string between <A HREF=" and ">
                        u = Mid(msgtxt, p1 + 9, p3 - (p1 + 9))
                        ' note where it ended for next loop
                        prevp = p2
                        ' is it already in the string? (ignore case, since
                        ' the text was uppercased above)
                        If InStr(1, urlstr, u, vbTextCompare) = 0 Then
                            urlstr = urlstr & vbCr & vbLf & u
                        End If
                    Else
                        ' out of links, or a link with no closing "> or </A>:
                        ' stop here rather than loop forever
                        udone = True
                    End If
                Wend
            End If ' if p > 0 msg from mtblacklist
            x = x + 1
        End If
    Next
    ' Fill the two textboxes
    mtblacklistfrm.iptxt.Text = ipstr
    mtblacklistfrm.urltxt.Text = urlstr
    mtblacklistfrm.Show
    Set objItem = Nothing
End Sub
(Here's a version that shouldn't have word-wrap problems.)
To make this work, you have to create a form called mtblacklistfrm and stick into it a text box that you name iptxt and one that you name urltxt. Set the text boxes' scroll bars to on and make sure that they're set to multiline.
If you don't know how to stick a script like this into OL, then you shouldn't. If you do, then you could have done this better yourself.
Warning: Do not trust this script! It undoubtedly is embarrassingly wrong and dangerous. Have pity on me. I'm a humanities major. Thank you.
Posted by D. Weinberger at May 18, 2004 11:09 PM
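On the caveat about URLs in the body of a message: one rough way to decide where a URL ends is to stop at the first whitespace, quote, or angle bracket. The following is only a sketch under that assumption. The function name ExtractBareUrls and the list of stop characters are hypothetical and are not part of the script above.

```vba
' Hypothetical helper, not part of the original script.
' Returns every http:// or https:// URL found in txt, one per line.
' Assumption: a URL ends at the first whitespace, quote, or angle bracket.
Function ExtractBareUrls(ByVal txt As String) As String
    Const STOPCHARS As String = " <>""'"
    Dim result As String
    Dim startPos As Long, endPos As Long
    Dim ch As String

    startPos = InStr(1, txt, "http", vbTextCompare)
    Do While startPos > 0
        ' Accept only "http://" or "https://", not any word starting with "http"
        If LCase$(Mid$(txt, startPos, 7)) = "http://" Or _
           LCase$(Mid$(txt, startPos, 8)) = "https://" Then
            ' Walk forward until a stop character or the end of the text
            endPos = startPos
            Do While endPos <= Len(txt)
                ch = Mid$(txt, endPos, 1)
                If InStr(1, STOPCHARS, ch) > 0 Or Asc(ch) <= 32 Then Exit Do
                endPos = endPos + 1
            Loop
            If Len(result) > 0 Then result = result & vbCrLf
            result = result & Mid$(txt, startPos, endPos - startPos)
            ' Continue searching after the URL just collected
            startPos = InStr(endPos, txt, "http", vbTextCompare)
        Else
            startPos = InStr(startPos + 1, txt, "http", vbTextCompare)
        End If
    Loop
    ExtractBareUrls = result
End Function
```

Trailing punctuation such as a period or closing parenthesis stays attached to the URL under this rule, so the result still needs the same careful inspection as everything else the script produces.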
Comments
Could you be so kind as to add all of those IPs to Jay's Blacklist Clearinghouse?
http://www.jayallen.org/comment_spam/submit
Thanks!
Posted by: timsamoff | May 19, 2004 11:09 AM
We're testing a very simple/stupid anti-spam mechanism: add another checkbox to the "Post a comment" form that states "I am not a bot" (or a less parsable variant thereof) and don't post anything that doesn't have the box checked. Maybe I'm being stupid spreading this among MT users, but most bots are too naive to overcome that very simple step.
Posted by: Gene Koo | May 25, 2004 06:19 PM
Having just spent *hours* cleaning up the blacklist over at corante.com, let me add a caveat to all of this.
There are two problems with blindly grabbing the URLs from the messages and adding them.
The first is the proliferation of "throwaway" subdomains in order to make blacklisting harder. Thus, you might get 100+ versions of offensive-string-of-text.baddomain.com -- all the blacklist needs is "baddomain.com", and if you add the subdomains, not only will you clog up the blacklist, you'll still get spam from offensive-string-of-text2.baddomain.com.
The second is the "poison pill" issue--I've had to remove a bunch of "good" URLs and text strings from the blacklist because it was preventing people from posting legitimate comments. "yahoo.com" for example, which blocked everyone with a yahoo mail account.
Posted by: Liz Lawley | June 2, 2004 11:24 AM
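Both problems raised in the comment above (throwaway subdomains and poison pills) could be checked before anything is pasted into the blacklist. Here is a minimal sketch, assuming the base domain is simply the last two labels of the host name. That assumption is wrong for suffixes such as co.uk. The names BaseDomain and IsPoisonPill are hypothetical, and the safe list holds only the one example mentioned in the comment.

```vba
' Hypothetical helper: reduces a URL or host name to its last two labels,
' e.g. a throwaway subdomain of baddomain.com becomes "baddomain.com".
Function BaseDomain(ByVal url As String) As String
    Dim host As String
    Dim labels() As String
    Dim p As Long, n As Long

    host = LCase$(Trim$(url))
    ' Drop the scheme, if any
    p = InStr(1, host, "://")
    If p > 0 Then host = Mid$(host, p + 3)
    ' Drop the path, if any
    p = InStr(1, host, "/")
    If p > 0 Then host = Left$(host, p - 1)

    labels = Split(host, ".")
    n = UBound(labels)
    If n < 1 Then
        ' Zero or one label: nothing to trim
        BaseDomain = host
    Else
        BaseDomain = labels(n - 1) & "." & labels(n)
    End If
End Function

' Hypothetical helper: True if the domain is on a list of domains that
' legitimate commenters use. The list here is illustrative only.
Function IsPoisonPill(ByVal domain As String) As Boolean
    Dim safe As Variant, s As Variant
    safe = Array("yahoo.com")
    IsPoisonPill = False
    For Each s In safe
        If LCase$(Trim$(domain)) = s Then
            IsPoisonPill = True
            Exit Function
        End If
    Next
End Function
```

Each URL would be run through BaseDomain first, and anything for which IsPoisonPill returns True would be left out of the list. The result should still be inspected by hand.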
Liz, how serious is the clogging problem? I know theoretically it's good to keep the list shorter rather than longer, but after about a year, I have about 4,500 entries on my list. Do you have a sense of how many names it takes to truly slow the system down?
As for the poison pills: Yup yup yup! My script creates an editable display of the urls partially for that reason. Thanks for the warning!
Posted by: David Weinberger | June 2, 2004 12:30 PM
Hallo friends! Really nice place here. I found a lot of interesting stuff all around. Just what I was looking for. Great joy!
Posted by: Josi Denise | September 21, 2004 04:51 AM
In response to the earlier comment regarding throwaway subdomains: a better approach is to resolve the IP address of each website, since all subdomains would have the same IP address. Large organisations like Geocities / Yahoo, however, may use server farms and thus different IP addresses.
I've an article on resolving DNS names at my website
Posted by: Joe Mc Laughlin | November 6, 2004 11:25 AM
It's just more coding and more memory, and more stupid irritating problems.
Posted by: John Yajer | June 6, 2005 05:13 PM
Thank you very much for the link that helped a TON.
Posted by: tucex | January 23, 2006 10:37 PM
My main concern is that you can't guarantee every page of your website will be included in the SERPs. Considering I'm constantly adding new products to my company's website, I need to be sure that customers can find them as soon as possible. http://www.seoptimizerz.com
Posted by: SEO | July 23, 2007 09:42 AM