Archive for October, 2004

Combating Comment Spam in WordPress

Sunday, October 31st, 2004

I share the same pain as Paul Chaney. Ever since I started my blog on 10/1, I have been getting a ton of comment spam.

At first I simply turned on comment approval in WordPress so that no spam would be posted. However, I was getting so much spam that going through it became a huge hassle.

Since I run my own server, I figured I might be able to block the IPs the spammers were coming from. A little analysis of the Apache access log showed that the spammers were spoofing all kinds of IPs, so IP blocking was out.
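
A minimal sketch of that kind of access-log check, assuming Apache's Common/Combined Log Format and assuming comments arrive as POST requests to /wp-comments-post.php. It counts comment POSTs per client IP. If the spam is spread across many addresses with only a post or two each, a per-IP blocklist can't keep up.

```python
import re
from collections import Counter

# Common/Combined Log Format: host ident authuser [date] "METHOD URL PROTO" status ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})')

def comment_posts_by_ip(lines, path="/wp-comments-post.php"):
    """Count POST requests to the comment script, grouped by client IP."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, method, url, _status = m.groups()
        # Ignore any query string when comparing the request path.
        if method == "POST" and url.split("?", 1)[0] == path:
            counts[ip] += 1
    return counts
```

Sorting the result with `counts.most_common()` shows at a glance whether a few addresses dominate or the posts are scattered.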

However, further investigation showed that the spammers were reusing the same few Name and E-mail values (required fields on my blog) when posting spam. Knowing that, I decided to make a small code modification to WordPress to combat these spammers. (Another advantage of running your own web server is the ability to modify the code.)
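
A minimal sketch of that investigation, assuming the comments held for moderation have been exported as a list of dicts with hypothetical `author` and `email` keys (WordPress itself is not queried here). It counts how often each normalized e-mail address appears and returns the ones that recur, which are the candidates for a blocklist.

```python
from collections import Counter

def repeat_senders(comments, min_count=3):
    """Return e-mail addresses that appear on at least min_count comments.

    Each comment is assumed to be a dict with 'author' and 'email' keys
    (a hypothetical export of the moderation queue). Addresses are trimmed
    and lowercased so trivial variations are counted together.
    """
    counts = Counter(
        c["email"].strip().lower() for c in comments if c.get("email")
    )
    return sorted(email for email, n in counts.items() if n >= min_count)
```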

Here’s the code snippet that I added to wp-comments-post.php to block known spammers:

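// Known spammer addresses; $email holds the commenter's e-mail from the comment form
$blocked_emails = array('lilo@suddenenlightenment.us', 'spammers@yousuck.org');
if (in_array(strtolower(trim($email)), $blocked_emails))
        die( __('Sorry, comments are closed for this item.') );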

Obviously you can be a bit more creative when it comes to the message you send to spammers.

Since the modification, I have gotten very little spam.

Adam Kalsey has written a Comment Spam Manifesto that’s worth reading.

Data Quality Discipline

Sunday, October 31st, 2004

Intelligent Enterprise’s latest issue has an article on Data Quality Discipline.

It talked about

how data quality and extract, transform and load (ETL) are tied together by describing the transformation explosion and analytics thrashing that occurs with poorly understood source data.

The BI community encounters many issues during ETL, and they are exactly the issues that the log analysis community encounters as well:

  • Multiple meanings for the same data element
  • Multiple sources of data elements
  • Differing levels of history
  • Data cleanliness and accuracy
  • Deintegration for audits and validation

I was talking to a PS guy from a SIM vendor the other day, and he mentioned that one of the biggest problems all SIM vendors have is understanding the logs.

Logs come in all sizes, formats and syntaxes. Truly understanding them takes extensive research, and because many log formats are not publicly documented, it is even harder to find out what the logs contain.

Sometimes logs have multiple fields that are similar. For example, a Windows event can contain up to three different user or account name fields. What does each of them mean? How do you distinguish them?
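
One common way to handle multiple meanings and multiple sources is to map each source's raw field names onto a single normalized schema, naming each user field by the role it plays. The sketch below uses hypothetical source names, raw field names and role assignments. It does not describe the real layout of Windows events; working that out is exactly the research described above.

```python
# Hypothetical per-source mappings from raw field names to normalized names.
FIELD_MAP = {
    "windows_events": {
        "user_name_1": "actor_user",     # account that performed the action (assumed)
        "user_name_2": "target_user",    # account acted upon (assumed)
        "user_name_3": "logon_account",  # account used to authenticate (assumed)
    },
    "unix_syslog": {
        "user": "actor_user",
        "ruser": "remote_user",
    },
}

def normalize(source, record):
    """Rename a raw record's fields using the mapping for its source.

    Fields without a mapping are kept under a 'raw.' prefix, so nothing is
    lost and unmapped fields are easy to spot during analysis.
    """
    mapping = FIELD_MAP.get(source, {})
    out = {"source": source}
    for key, value in record.items():
        out[mapping.get(key, "raw." + key)] = value
    return out
```

Once every source is normalized this way, a question like "what did actor_user do?" can be asked across all sources at once.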

How well the logs are understood directly determines the quality of the reports. Without knowing what the logs provide, the reports will be meaningless, and without truly understanding the meaning of the different log fields, it’s impossible to correlate the different log sources to identify anomalies.

Log analysis vendors need at least one data architect whose sole responsibility is to understand the various log sources, dissect the log fields and generate meaningful reports. If a vendor can afford it, these tasks should be performed by an analytics team of two to three people.

When evaluating a log analysis product or vendor, this is definitely a question to ask. A vendor without a data architect or analytics team will not be able to provide the domain expertise for the implementation to succeed.

Why Hasn’t the Buyout Begun?

Saturday, October 30th, 2004

So I have a question for everyone: why hasn’t the SIM or log analysis market consolidated?

The SIM market is about 5 years old now. There are many players in this field, both pure SIM players and players expanding into the SIM space.

Some of the pure players include

Other non-pure players that are either getting into or already in the SIM space include

I was expecting a wave of buyouts to begin when Symantec acquired three companies (Mountain Wave, Riptech and Recourse), but nothing has happened.

I can think of a couple of reasons:

  1. SIM vendors haven’t proved their value. There are a lot of good technologies out there, but most of them are very expensive, and I think SIM vendors have a tough time justifying the ROI.
  2. Most SIM vendors have gotten several rounds of funding by now, probably anywhere from $15 to $60 million, and most companies don’t want to spend a lot of money buying them. Symantec bought Mountain Wave for $20 million, Riptech for $145 million and Recourse for $135 million, and both Riptech and Recourse brought more than just log analysis products.

What do you think? I would love to hear your thoughts on this issue.

Incident Management Life Cycle

Friday, October 29th, 2004

Everyone loves to throw the term “life cycle” around like it actually means something, so I figured I would join the crowd and come up with one of my own.

Today we will discuss the life cycle of managing an incident. Here’s my take on this:

Definition

  • Define the incident in terms of rules or queries

Detection

  • Detect occurrences of incidents based on the definition, using either real-time or historical analysis
  • Correlate multiple incidents to identify policy violations

Alert/Act

  • Alert the appropriate personnel based on priorities and pre-defined alerting mechanisms
  • Sometimes preliminary action is taken to mitigate the attack before further investigation is performed.

Classification

  • Prioritize and categorize the incident properly for accurate escalation

Investigation

  • Investigate the incidents to perform assessment and root cause analysis

Resolution

  • Resolve or respond to incidents in order to minimize adverse impact.
  • Contain, eradicate and recover

Report/Audit

  • Report on events and incidents for trending/planning
  • Audit reports to identify anomalies or incidents missed in normal process
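
The first few phases map naturally onto code. A minimal sketch with hypothetical `Rule` and `Incident` types: Definition becomes a named predicate with a priority, Detection applies the predicates to events (historical analysis here, not real-time), and Classification/Alert orders the matches by priority and hands them to an alerting callback. Correlation, Investigation, Resolution and Report/Audit are left to people and other tools in this sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """Definition phase: an incident expressed as a named predicate over an event."""
    name: str
    priority: int                    # 1 = most urgent (hypothetical scale)
    matches: Callable[[dict], bool]

@dataclass
class Incident:
    rule: str
    priority: int
    event: dict
    status: str = "detected"

def detect(events, rules):
    """Detection phase: apply every rule to every stored event."""
    return [Incident(r.name, r.priority, e)
            for e in events for r in rules if r.matches(e)]

def classify_and_alert(incidents, alert):
    """Classification and Alert phases: most urgent first, then notify."""
    for inc in sorted(incidents, key=lambda i: i.priority):
        inc.status = "alerted"
        alert(inc)
    return incidents
```

In practice `alert` would page or e-mail the appropriate personnel; a plain callback keeps the sketch independent of any alerting system.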

MySQL 4.1 Production Ready

Thursday, October 28th, 2004

From OSNews,

MySQL announced the general availability of MySQL 4.1. Certified by the company as production-ready for large-scale enterprise deployment, this significant upgrade to the MySQL database server features advanced querying capabilities through subqueries, faster and more secure client-server communication, new installation and configuration tools, and support for international character sets and geographic data.

MySQL is one of the most widely used databases when administrators build their own log analysis infrastructure. Back at Cable & Wireless America, I built a reporting infrastructure from scratch with MySQL as the backend. The database ran on a Sun E420 server with four processors, 4 GB of memory and two 300 GB T3 arrays (Fibre Channel SAN).

This setup was sufficient to support several hundred customers and generate daily, weekly and monthly reports. It also gave customers a real-time view of their raw logs.

MySQL (3.23 when I first built the system) was stable and extremely fast in both insertion rate and query speed. It was used heavily, with data loaded into the database every five minutes (real-time analysis was done outside the database). We were receiving about 2,000 events per second after fine-tuning the firewall and IDS logging, and the database ran for months without a single issue.
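
Loading every five minutes means buffering events and inserting them in bulk. At about 2,000 events per second, each five-minute batch is roughly 2,000 × 300 = 600,000 rows. The sketch below shows the batching idea using Python's standard-library `sqlite3` module as a stand-in for a MySQL connection so it runs anywhere. The table layout and the `BatchLoader` class are hypothetical, and a real MySQL loader would more likely use a bulk path such as `LOAD DATA INFILE`.

```python
import sqlite3
import time

class BatchLoader:
    """Buffer log events in memory and insert them in one transaction per interval.

    sqlite3 stands in for MySQL here so the sketch is self-contained.
    """

    def __init__(self, conn, interval_seconds=300, clock=time.monotonic):
        self.conn = conn
        self.interval = interval_seconds   # 300 s = the five-minute load cycle
        self.clock = clock                 # injectable so tests can fake time
        self.buffer = []
        self.last_flush = clock()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (ts REAL, source TEXT, message TEXT)"
        )

    def add(self, ts, source, message):
        """Queue one event; flush if the interval has elapsed since the last load."""
        self.buffer.append((ts, source, message))
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        """Bulk-insert everything buffered so far, then start a new interval."""
        if self.buffer:
            with self.conn:  # commits the whole batch as a single transaction
                self.conn.executemany(
                    "INSERT INTO events (ts, source, message) VALUES (?, ?, ?)",
                    self.buffer,
                )
            self.buffer.clear()
        self.last_flush = self.clock()
```

This version only checks the interval when a new event arrives; a production loader would also flush on a timer and at shutdown so quiet periods don't leave events sitting in memory.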

If you are considering building your own logging infrastructure, don’t want to pay SIM vendors a large chunk of cash, and have the in-house expertise to create the reports and analysis engine, I say definitely give MySQL a try.

Another must-have tool for building your own logging infrastructure is Syslog-NG, which we will discuss in more detail later.

Forrester’s 2004 Security Event Management Series

Wednesday, October 27th, 2004

Forrester Research recently came out with a series of Scorecard Summaries on SIM products.

The products reviewed are:

- ArcSight 2.5
- Symantec Incident Manager 3.0
- Consul’s InSight Security Manager 5.0
- Network Intelligence’s Engine Running enVision v.2.003
- GuardedNet’s neuSECURE 2.0
- netForensics 3.1.1

Forrester evaluated the products based on six different attributes:

- Architecture and Integration
- Reliability and Scalability
- Configuration and Flexibility
- Administrator and Reporting
- Market Presence
- Cost

These criteria are very similar to what we have talked about previously for the BI product evaluation.
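
If you want to turn criteria like these into a single comparable number, a weighted sum is one common approach. The sketch below is not Forrester’s method: the weights are hypothetical placeholders, and the per-criterion scores would come from your own evaluation.

```python
# Hypothetical weights (summing to 1.0) for the six attributes listed above;
# Forrester's actual weighting is not given here.
WEIGHTS = {
    "architecture_integration": 0.20,
    "reliability_scalability": 0.20,
    "configuration_flexibility": 0.15,
    "administrator_reporting": 0.20,
    "market_presence": 0.10,
    "cost": 0.15,
}

def weighted_score(scores, weights=WEIGHTS):
    """Combine per-criterion scores (e.g. on a 0-5 scale) into one weighted total."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)
```

Adjusting the weights is where your own priorities come in; a shop with little staff might weight administration and reporting far more heavily than market presence.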

Unfortunately, unless you have paid Forrester a ton of cash, you won’t be reading them. They are useful, though, if you are considering a log solution and your marketing department happens to have paid Forrester.

By the way, Forrester has an RSS feed for all of its research summaries if you are interested. I think it’s a great way to keep up with their latest research.

iPod Photo for Logging?

Tuesday, October 26th, 2004

You are probably asking: what the heck does the new iPod Photo have to do with log analysis?

The truth is, it doesn’t. I just want one! Besides, I can carry all of my log analysis tools on it in case I need them while I listen to my music and scan through my photos. Who knows, I might want to analyze which songs I listen to most or which photos I view most.

By the way, does the iPod have logs?

Open Source Security Information Management (OSSIM)

Monday, October 25th, 2004

The Open Source Security Information Management project announced the availability of version 0.9.7 today:

We’re proud to announce the availability of ossim 0.9.7. This release fixes numerous bugs present in rc1 and rc2 and provides two major feature enhancements: optional database configuration replacing ossim.conf and pdf reporting using FPDF.

Five mistakes of log analysis

Thursday, October 21st, 2004

Anton Chuvakin has written an interesting article on the mistakes of log analysis.

It’s a great starting point for things to avoid when you are building or evaluating your log analysis infrastructure. However, I wish Anton had gone more in depth on some of the topics. For example, what regulatory pressures are organizations facing?

Also, Anton wrote this from a security perspective. As I wrote previously, security intelligence is only a third of the log story; we can extract a lot more value from logs than just security.

I do realize that the SIM space was created in response to the security issues that kept popping up. However, I believe the SIM space is limited and will need to provide a lot more operational intelligence in order to justify its cost.

Schneier on SIMS

Wednesday, October 20th, 2004

Bruce Schneier has written a blog post on his view of SIMs.

I mostly agree with Schneier’s view of the current SIM space. I agree that log analysis can provide a gold mine of information to IT groups, and that log analysis works regardless of whether or not you use a SIM product. As Schneier says in the post, there’s also a lot of hype in the current SIM space.

However, the SIM space is only four to five years old, and many of its technologies and ideas are still being developed; faster and more intelligent solutions are still to come. Even though I agree that having human intelligence behind a SIM solution (product or service) is better than an all-technology solution, I have to disagree with Schneier that people are the only solution. I also disagree that SIM products don’t help enhance the security of the network.

The best solution is a combination of people and technology.

Much of what’s in SIM products these days is based on human intelligence. That intelligence remains inside human brains mainly because there’s no easy way to express the processes, steps and procedures as algorithms.

People are better at identifying patterns that have not been seen before. But once a pattern is identified and can be expressed in a computer’s language, the computer can identify the same pattern much faster. As computers are programmed to detect more and more known patterns, people are freed to identify new ones. This cycle, in which humans and machines work together, provides the best solution.

SIM products enable this process. They provide tools for people to express patterns to the computer, which then takes over the job of detecting those patterns. SIM products also help people identify new patterns more quickly. For example, a graph of the past day’s events can show spikes in usage or other anomalies, while a human going through the raw logs might miss some of those events.
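
The graph example can also be partly automated. A minimal sketch, assuming one day’s events are available as Python `datetime` timestamps: count events per hour and flag hours that sit more than `k` standard deviations above the daily mean. The threshold rule and the default `k=3` are illustrative choices, not what any particular SIM product does.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev
from typing import Iterable, List

def hourly_spikes(timestamps: Iterable[datetime], k: float = 3.0) -> List[int]:
    """Return the hours (0-23) whose event count is more than k population
    standard deviations above the mean hourly count for the day."""
    counts = Counter(ts.hour for ts in timestamps)
    per_hour = [counts.get(h, 0) for h in range(24)]
    mu, sigma = mean(per_hour), pstdev(per_hour)
    if sigma == 0:
        return []  # perfectly flat day: nothing stands out
    return [h for h, n in enumerate(per_hour) if n > mu + k * sigma]
```

A simple rule like this only catches gross spikes, and a large spike inflates the mean and deviation it is measured against. It is the kind of pattern a person first notices on a graph and then hands to the computer.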

SIM products have their place; we just need to use them correctly and not expect that installing a SIM product will solve all of our problems.

MarketingProfs.com: Top 10 Web Analytics Problems

Tuesday, October 19th, 2004

Jim Sterne from MarketingProfs.com has written a very interesting article on the problems organizations have encountered in the world of web analytics.

In web analytics, data comes from many different sources, including the content side, the application side and the e-commerce side. A lot of the data gathered from these sources is actually logs!

Among their various other features, web analytics applications have a built-in, specialized log analysis tool that parses the logs and extracts the wealth of information hidden in them.

Many of the problems described in this article also show up in log analysis. Some of them are:

  • Data integration problems - imagine the number of different log types you have to integrate in log analysis. Most SIM products claim to support hundreds of log sources.
  • Looking for a solution but finding only tools - how true is this? Most SIM vendors are still selling technology instead of solutions!
  • Not sure what to look for - again, this is one of the biggest reasons why organizations don’t have a log analysis strategy.

We can learn a lot from our predecessors.

Event vs. Incident

Thursday, October 14th, 2004

An event is an observable occurrence in an information system that actually happened at some point in time.

  • A TCP/IP connection
  • An email
  • A user login

An incident is an adverse event in an information system, including the significant threat of an adverse event.

  • Implies harm or an attempt to harm
  • An attempt to gain unauthorized access
  • Unwanted denial-of-service
  • Changes without the owner’s knowledge, instruction, or consent

Policy Integration

Wednesday, October 13th, 2004

One of the more interesting features that SIM vendors have been adding is the integration of policies into their products.

Most SIM vendors have been integrating technical policies into their products to provide rapid response to network attacks. For example, the SIM product detects an attack and sends a policy update request to a device or application to block the attack or mitigate the risk. This all happens in real time.

A recent example is the integration between GuardedNet and Solsoft.

I think there’s more we can do with this integration. A Quantitative Study of Firewall Configuration Errors, which I found via a blog post by Martin McKeay, showed that the more complex the rule set, the more errors there are.

One thing we can do is use the log data generated by the devices to verify that the technical policies don’t have security issues. For example, if a firewall policy accidentally (or someone deliberately) allows incoming telnet connections, then upon detecting such a connection in the logs, a SIM product can identify the device that has the problem and make recommendations to correct the error.
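
A minimal sketch of the telnet example, assuming firewall logs have already been parsed into dicts with hypothetical keys (`device`, `action`, `src`, `dst`, `proto`, `dport`) and assuming 10.0.0.0/8 is the internal network. It flags devices whose logs show a permitted inbound connection to TCP port 23 and produces one recommendation per device.

```python
import ipaddress

INTERNAL = ipaddress.ip_network("10.0.0.0/8")  # assumed internal address space

def inbound_telnet_violations(records, internal=INTERNAL):
    """Map each device that permitted inbound telnet to a recommendation string.

    'Inbound' means an external source reaching an internal destination.
    """
    hits = {}
    for r in records:
        src_inside = ipaddress.ip_address(r["src"]) in internal
        dst_inside = ipaddress.ip_address(r["dst"]) in internal
        if (not src_inside and dst_inside and r["action"] == "permit"
                and r["proto"] == "tcp" and r["dport"] == 23):
            hits[r["device"]] = hits.get(r["device"], 0) + 1
    return {
        device: f"{n} inbound telnet connection(s) permitted; "
                "review the rule allowing TCP/23 from outside"
        for device, n in hits.items()
    }
```

Because the check runs on what the device actually did rather than on its configuration, it catches the error whether the rule was added by accident or on purpose.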

Going a step further, a SIM product could even integrate with a business policy product to verify that the technical policies conform to the business policies.

In order to provide additional operational intelligence, log analysis products need to expand their capabilities. Policy integration is just one step toward that goal.

Nari Kannan: Why is Business Reporting So Frustrating?

Monday, October 11th, 2004

Nari Kannan from Ajira Technologies has written an interesting piece on the frustration of business users when it comes to reporting.

I couldn’t agree more, and I think operational users share the same frustrations.

First, extracting value from the ocean of logs requires operational users to find out whether the data they are looking for is even in the logs. For example, system administrators often want to know which files users have touched on their systems. Well, guess what: that information isn’t available on most standard operating systems! There are exceptions, obviously. Windows, when full auditing is turned on, gives administrators some of this information, and you can install additional tools on UNIX/Linux systems to gather it. But most of the time, the information just isn’t there!

Second, even when the data is available, most log analysis products don’t provide the right reports for administrators. Most of the time these products give you a framework in which to build your reports, but not the reports themselves. Some vendors include basic reports, but they are usually not enough.

Lastly, the report-building interface in most products sits at one of two extremes. Either it is too user-friendly and doesn’t give the administrator the flexibility to perform complex queries, or it is too complicated and requires users to know how to program. Neither is sufficient for organizations that are short-staffed.

So vendors, please listen to your customers and give them the solutions they need, not just the technologies to build their own.

SOX Summary

Sunday, October 10th, 2004

One of the most talked-about regulations these days is obviously Sarbanes-Oxley. Below is a quick summary of Sections 302 and 404. Remember, if you are a CEO or CFO, don’t screw up; otherwise you could be fined up to $1 million and/or face up to 10 years in jail.

Section 302

  • Proper documentation and disclosure of the controls and procedures
  • certification by the signing officers (the principal executive and financial officers)
  • authorized, complete and accurate

Section 404

  • requires the management of public companies to assess the effectiveness of the organization’s internal control over financial reporting
  • requires annual review and assessment of the effectiveness of the internal controls
  • requires a company’s independent auditor to attest to management’s assessment of its internal control over financial reporting
  • internal controls
    • records are kept in reasonable detail, are accurate and reflect the transactions
    • assurance that transactions are being recorded
    • assurance of prevention or timely detection of unauthorized acquisition, use or disposition of the assets that could have a material effect on the financial statements
  • … stated that an ineffective control environment should be regarded as at least a significant deficiency and as a strong indicator that a material weakness in internal control over financial reporting exists.
  • the IT control environment includes the IT governance process, monitoring and reporting. The IT governance process includes the information systems strategic plan, the IT risk management process, compliance and regulatory management, IT policies, procedures and standards. Monitoring and reporting are required to ensure IT is aligned with business requirements.
  • Building a strong internal control program within IT can help to:
    • enhance overall IT governance
    • enhance the understanding of IT among executives
    • make better business decisions with higher-quality, more timely information
    • align project initiatives with business requirements
    • prevent loss of intellectual assets and the possibility of system breach
    • contribute to the compliance of other regulatory requirements, such as privacy
    • gain competitive advantage through more efficient and effective operations
    • optimize operations with an integrated approach to security, availability and processing integrity
    • enhance risk management competencies and prioritization of initiatives