Combating Comment Spam in WordPress

| Posted in General Technologies

I share the same pain as Paul Chaney. Ever since I started my blog on 10/1, I have been getting a ton of comment spam.

In the beginning I just turned on comment approval in WordPress so that no spam would get posted. However, I was getting so much spam that going through the moderation queue became a huge hassle.

Since I run my own server, I figured I might be able to block the IPs the spammers were coming from. A little analysis of the Apache access log showed that the spammers were spoofing all kinds of IPs, so IP blocking was out.
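For anyone who wants to repeat that kind of log check, here is a minimal Python sketch that tallies comment POSTs per client address. It assumes the access log uses the Apache common or combined format and that the comment handler lives at /wp-comments-post.php in the site root. The sample lines are made up for illustration (they use documentation-only IP ranges) and are not taken from my logs.

```python
import re
from collections import Counter

# Start of an Apache common/combined log line:
# client address, identity, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')


def comment_posts_by_ip(lines, path="/wp-comments-post.php"):
    """Count POST requests to the comment handler per client address."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, method, url = match.groups()
        if method == "POST" and url.startswith(path):
            counts[ip] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [31/Oct/2004:10:00:01 -0800] "POST /wp-comments-post.php HTTP/1.1" 200 512',
    '198.51.100.7 - - [31/Oct/2004:10:00:05 -0800] "POST /wp-comments-post.php HTTP/1.1" 200 512',
    '192.0.2.10 - - [31/Oct/2004:10:00:09 -0800] "GET /index.php HTTP/1.1" 200 2048',
]
counts = comment_posts_by_ip(sample)
print(counts.most_common())
```

If the tally shows a long list of addresses with only a post or two each, blocking by IP is not going to get you very far.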

However, further investigation showed that the spammers were reusing the same handful of Name and E-mail values (both required fields on my blog) when posting spam. Knowing that, I decided to modify the WordPress code a bit to combat these spammers. (Another advantage of running your own web server is the ability to modify the code.)

Here’s the code snippet that I added to wp-comments-post.php to block known spammers:

// Reject comments from e-mail addresses that known spammers keep reusing.
if ($email == 'lilo@suddenenlightenment.us' || $email == 'spammers@yousuck.org') {
        die( __('Sorry, comments are closed for this item.') );
}

Obviously you can be a bit more creative when it comes to the message you send to spammers.

Since the modification, I have gotten very little spam.

Adam Kalsey has written a Comment Spam Manifesto that’s worth reading.

October 31st, 2004 | Jian Zhen | No Comments

Data Quality Discipline

| Posted in General Technologies

Intelligent Enterprise’s latest issue has an article on Data Quality Discipline.

It talked about

how data quality and extract, transform and load (ETL) are tied together by describing the transformation explosion and analytics thrashing that occurs with poorly understood source data.

The BI community runs into many issues during the ETL of data, and these are exactly the issues that the log analysis community runs into as well. They are:

  • Multiple meanings for the same data element
  • Multiple sources of data elements
  • Differing levels of history
  • Data cleanliness and accuracy
  • Deintegration for audits and validation

I was talking to a PS guy from a SIM vendor the other day, and he mentioned that one of the biggest problems all SIM vendors have is understanding the logs.

Logs come in all different sizes, formats and syntaxes. Truly understanding them takes extensive research. Many log formats are not published, which makes it even more difficult to find out what the logs contain.

Sometimes logs have multiple fields that are similar. For example, in Windows events, there could be up to 3 different user or account name fields. What does each of them mean? How do you distinguish them?

Understanding of the logs has a direct correlation to the quality of the reports. Without knowing what the logs provide, the reports will be meaningless. Without truly understanding the meaning of the different log fields, it’s impossible to correlate the different log sources to identify anomalies.
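To make the field-meaning problem a bit more concrete, here is a rough Python sketch of mapping source-specific fields onto one shared schema before correlating anything. Every source name and field name in it is hypothetical and invented for illustration; the real mappings can only come from researching each log format.

```python
# Hypothetical per-source field maps. None of these names come from a real
# product; they stand in for what a data architect would work out.
FIELD_MAPS = {
    "windows_event": {"user": "target_account", "caller_user": "acting_account"},
    "firewall": {"usr": "acting_account", "src": "source_ip"},
}


def normalize(source, record):
    """Rename source-specific fields to the shared schema."""
    mapping = FIELD_MAPS[source]
    normalized = {"source": source}
    for field, value in record.items():
        # Unmapped fields keep a "raw_" prefix so they stay visible
        # instead of being silently dropped or misread.
        normalized[mapping.get(field, "raw_" + field)] = value
    return normalized


event = normalize("firewall", {"usr": "alice", "src": "192.0.2.1", "bytes": 42})
print(event)
```

Once two sources agree on what "acting_account" means, correlating them becomes a plain comparison rather than guesswork.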

Log analysis vendors need to have at least one data architect whose sole responsibility is to understand the various log sources, dissect the log fields and generate meaningful reports. If a vendor can afford it, these tasks should be performed by an analytics team of 2 to 3 people.

When evaluating a log analysis product or vendor, this is definitely a question to ask. A vendor without a data architect or analytics team will not be able to provide the domain expertise for the implementation to succeed.

October 31st, 2004 | Jian Zhen | No Comments

Why Hasn’t the Buyout Begun?

| Posted in General Technologies

So I have a question for everyone. Why hasn’t the SIM or log analysis market consolidated?

The SIM market is about 5 years old now. There are many players in this field, both pure SIM players and players expanding into the SIM space.

Some of the pure players include

Other non-pure players that are either getting into or already in the SIM space include

I was expecting the wave of buyouts to begin when Symantec acquired the 3 companies, but nothing has happened.

I can think of a couple of reasons:

  1. SIM vendors haven’t proved their value. There are a lot of good technologies out there, but most of them are very expensive. I think the SIM vendors have a tough time justifying the ROI.
  2. Most SIM vendors have gotten several rounds of funding by now, probably anywhere from $15 million to $60 million. Most companies don’t want to spend a whole lot of money buying these vendors. Symantec bought Mountain Wave for $20 million, Riptech for $145 million and Recourse for $135 million. Both Riptech and Recourse brought more than just log analysis products.

What do you think? I would love to hear your thoughts on this issue.

October 30th, 2004 | Jian Zhen | 2 Comments

Incident Management Life Cycle

| Posted in General Technologies

Everyone loves to throw the term “life cycle” around like it actually means something, so I figure I will join the crowd and get one of my own.

Today we will discuss the life cycle of managing an incident. Here’s my take on this:

Definition

  • Define the incident in terms of rules or queries

Detection

  • Detect the occurrences of incidents based on the definition, through either real-time or historical analysis
  • Correlate multiple incidents to identify policy violations

Alert/Act

  • Alert the appropriate personnel based on priorities and pre-defined alerting mechanisms
  • Where needed, take preliminary action to mitigate the attack, then investigate further

Classification

  • Prioritize and categorize the incident properly for accurate escalation

Investigation

  • Investigate the incidents to perform assessment and root cause analysis

Resolution

  • Resolve or respond to incidents in order to minimize adverse impact
  • Contain, eradicate and recover

Report/Audit

  • Report on events and incidents for trending/planning
  • Audit reports to identify anomalies or incidents missed in normal process
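
The first few stages (definition, detection, classification) can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the Rule class, the event fields and the threshold are all hypothetical, not taken from any SIM product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Definition stage: an incident expressed as a rule."""
    name: str
    matches: Callable[[dict], bool]  # predicate applied to each event
    threshold: int                   # matching events needed to raise an incident
    priority: str                    # used later for classification/escalation


def detect(rules, events):
    """Detection stage: raise one incident per rule whose threshold is met."""
    incidents = []
    for rule in rules:
        hits = [event for event in events if rule.matches(event)]
        if len(hits) >= rule.threshold:
            incidents.append(
                {"rule": rule.name, "priority": rule.priority, "count": len(hits)}
            )
    return incidents


# Hypothetical rule and events, for illustration only.
failed_logins = Rule(
    name="repeated failed logins",
    matches=lambda event: event.get("action") == "login_failed",
    threshold=3,
    priority="high",
)
events = [{"action": "login_failed", "user": "alice"}] * 3
events.append({"action": "login_ok", "user": "bob"})

incidents = detect([failed_logins], events)
print(incidents)
```

The same detect function works for historical analysis (run it over stored events) or near real-time use (run it over a sliding window of recent events).
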
October 29th, 2004 | Jian Zhen | No Comments

MySQL 4.1 Production Ready

| Posted in General Technologies

From OSNews,

MySQL announced the general availability of MySQL 4.1. Certified by the company as production-ready for large-scale enterprise deployment, this significant upgrade to the MySQL database server features advanced querying capabilities through subqueries, faster and more secure client-server communication, new installation and configuration tools, and support for international character sets and geographic data.

MySQL is one of the most commonly used databases among administrators building their own log analysis infrastructure. Back at Cable & Wireless America, I built a reporting infrastructure from scratch using MySQL as the backend. The database ran on an E420 Sun server with 4 procs, 4 GB memory and two 300GB T3 arrays (Fibre Channel SAN).

This setup was sufficient for supporting several hundred customers and generating daily, weekly and monthly reports. It also provided real-time raw log view for the customers.

MySQL (3.23 when I first built it) was stable and extremely fast in both insertion rate and query speed. It was used heavily, with loads into the database every 5 minutes (real-time analysis was done outside of the DB). We were receiving ~2000 events per second after fine-tuning the firewall and IDS logging. It could run for months without a single issue.
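To give a feel for the 5-minute batch loads, here is a rough Python sketch. It uses the standard-library sqlite3 module purely as a stand-in so the example runs anywhere; it is not the MySQL setup described in this post, and the events table, its columns and the sample rows are made up for illustration.

```python
import sqlite3

EVENTS_PER_SECOND = 2000      # approximate rate quoted above
LOAD_INTERVAL_SECONDS = 300   # one load every 5 minutes


def rows_per_load(rate=EVENTS_PER_SECOND, interval=LOAD_INTERVAL_SECONDS):
    """Rows that accumulate between two consecutive loads."""
    return rate * interval


def load_batch(conn, rows):
    """Insert one batch in a single transaction; return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (ts, source, message) VALUES (?, ?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


# In-memory database and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, source TEXT, message TEXT)")
batch = [
    ("2004-10-28 10:00:00", "firewall", "connection denied"),
    ("2004-10-28 10:00:01", "ids", "port scan detected"),
]
total = load_batch(conn, batch)
print(rows_per_load(), total)
```

At roughly 2000 events per second, each 5-minute load carries about 600,000 rows, which is why loading in batches rather than row by row matters.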

If you are considering building your own logging infrastructure, don’t want to pay SIM vendors a large chunk of cash, and have the in-house expertise to create the reports and analysis engine, I say definitely give MySQL a try.

Another must-have tool for building your own logging infrastructure is Syslog-NG, which we will discuss in more detail later.

October 28th, 2004 | Jian Zhen | No Comments

Forrester’s 2004 Security Event Management Series

| Posted in General Technologies

Forrester Research recently came out with a series of Scorecard Summaries on SIM products. The products reviewed are:

  • ArcSight 2.5
  • Symantec Incident Manager 3.0
  • Consul’s InSight Security Manager 5.0
  • Network Intelligence’s Engine Running enVision v.2.003
  • GuardedNet’s neuSECURE 2.0
  • netForensics 3.1.1

Forrester evaluated the products based on six different attributes: [...]

October 27th, 2004 | Jian Zhen | No Comments

iPod Photo for Logging?

| Posted in General Technologies

You are probably asking, what the heck does the new iPod Photo have to do with log analysis? The truth is, it doesn’t. I just want one! Besides, I can carry all of the log analysis tools on it just in case I need them when I listen to my music and scan through [...]

October 26th, 2004 | Jian Zhen | No Comments

Open Source Security Information Management (OSSIM)

| Posted in General Technologies

Open Source Security Information Management announced the availability of 0.9.7 today: “We’re proud to announce the availability of ossim 0.9.7. This release fixes numerous bugs present in rc1 and rc2 and provides two major feature enhancements: optional database configuration replacing ossim.conf and pdf reporting using FPDF.”

October 25th, 2004 | Jian Zhen | No Comments

Five mistakes of log analysis

| Posted in General Technologies

Anton Chuvakin has written an interesting article on the mistakes of log analysis. It’s a great starter for some of the things to avoid when you are building or evaluating your log analysis infrastructure. However, I wish Anton had gone more in-depth on some of the topics. For example, what are the regulatory pressures organizations [...]

October 21st, 2004 | Jian Zhen | No Comments

Schneier on SIMS

| Posted in General Technologies

Bruce Schneier has written a blog post on his view of SIMS. I mostly agree with Schneier’s view of the current SIM space. I agree that log analysis can provide a gold mine of information to IT groups. I also agree that log analysis works, regardless of whether or not you use a SIM product. As Schneier [...]

October 20th, 2004 | Jian Zhen | No Comments