Archive for December, 2004

Open Source Log Analysis Tools

Friday, December 31st, 2004

Here’s a list of open source log analysis tools that I know of.

Please let me know if you know of others that are not on this list. Thanks.

sisyphus toolkit

Friday, December 31st, 2004
                Welcome to the sisyphus toolkit!
                 Version 0.9beta (Nov 5, 2004)

This is a snapshot of some tools created by a project with the
following charter:
  With the specific goal of increasing supercomputer RAS (reliability,
  availability, and serviceability), we intend to produce a
  machine-learning analysis system which enables content-novice
  analysts to efficiently understand evolving trends, identify
  anomalies, and investigate cause-effect hypotheses in large
  multiple-source event log sets.

Currently it provides two independent tools (teirify and slctify)
which address the first two items above by automatically generating
regular expressions of messages in your logfiles, categorized by
increasing anomaly: common, deviant, and anomalous.  Common are those
types which occur at least k times (k is an input argument), deviant
are messages which appear fewer than k times but are similar in
content to common messages, and anomalous are messages which are
completely anomalous in content and occurrence.  A simple GUI is
included for efficient review of results.  This provides an efficient
means to define "normal", and thus provides a basis to detect
"abnormal".  See pdfs in doc/ieee_cluster04 for more details.

Posted to the log analysis mailing list by Jon Stearley.
http://www.cs.sandia.gov/sisyphus/

Amazon/Red Cross Donation

Thursday, December 30th, 2004

Amazon has set up a great way for everyone to donate cash to the Indian Ocean tsunami victims.

As of this moment, Amazon users have donated over $6.6 million!!

Wikipedia provides up-to-date information on the event.

The death toll from the Indian Ocean Earthquake and subsequent tsunamis on December 26 has exceeded 130,000 people in 13 countries from Malaysia to Somalia.

LMon

Wednesday, December 29th, 2004

Anders Nordby released his new tool, LMon.

LMon is a package for near real-time monitoring of logs, sending e-mail
alerts upon known (rule hits) or unknown data (rule misses).

Features:

- Buffer multiple rule hits within a given interval, cap at a given maximum
number of lines, wait for a given interval before sending next alert.

- Auto-discovery of log rotation.

- Simplicity. LMon can run from the command line without configuration, or
be controlled from a central configuration file with multiple instances
monitoring different log files/sending alerts to different people. It is very
much intended to be simple (Keep It Simple, Stupid).

War on Intellectual Property Leakage

Tuesday, December 28th, 2004

Approximately sixty to eighty percent of your company’s assets are intellectual property, or IP.

IP includes everything from patents, trademarks, brands, trade secrets, designs, architectures, copyrights, algorithms, software code, hardware schematics, inventions, and business processes to many other intangible assets. These are properties that may or may not have a physical presence. They exist mostly in the digital world or in people’s minds.

A study by PricewaterhouseCoopers, the U.S. Chamber of Commerce, and the American Society for Industrial Security International estimated that American companies lost up to $59 billion in intellectual property and proprietary information between July 2000 and June 2001. The largest average dollar value of loss per incident occurred in research and development ($404,000), followed by financial data ($356,000).

Probably not surprising to information security professionals, most of the IP leakage incidents involve insiders. Insiders are generally considered “trusted” users who have access to the internal network, whether they are connected on the internal LAN or through VPNs. The insiders can be current and former employees, contractors or business partners.

Any one of these employees, contractors or business partners could be dissatisfied for whatever reason and decide to send a few design specs to the competitors. Once the secret is out, it is extremely difficult to contain it. IP litigation, if you choose to go that route, can cost from several hundred thousand dollars to several million dollars. This amount doesn’t even include the cost due to loss of reputation, brand, speed to market and other factors.

So how does a company go about securing its intellectual property and making sure access to it is tracked?

Enterprise Content Management

The first class of companies to attack this problem was the Enterprise Content Management (ECM) vendors such as FileNet, Documentum, Interwoven, Open Text, Stellent and Vignette. These vendors generally provide centralized document management capabilities that allow users to

  • Organize and classify electronic documents
  • Search documents using keywords
  • Share documents with other users
  • Check-in and check-out documents for edit
  • Version control for all documents
  • Audit all access to documents

These vendors’ main answer to the IP leakage problem is that all access to electronic documents is recorded and reported. These products help manage and track documents while they are stored centrally on the server. They can track who accessed which file at what time, how many times each file was accessed, and how often people access these files.

Some of the more sophisticated products can also tell you the access behavior of individual users. For example, if a user who doesn’t normally access a certain section of the repository all of a sudden starts to download all the files in that section, something suspicious may be going on and an alert should be raised.

But what happens when the file has been downloaded to the user’s desktop? Once that happens, these products can no longer protect or track the documents. What happens if the user emails the file via Yahoo Mail or Gmail? What happens if the user uploads the file to another server using FTP or HTTP? What happens if the user copies it to a USB drive or prints it out?

IP Leakage Detection

A whole new class of companies, including Vericept, Vidius, and Vontu, has been founded to detect IP leakage on the network. These companies’ products are designed to detect IP leakage by monitoring all the exit points through which information can leave the corporate network.

In general, when users intentionally or unintentionally leak intellectual properties, they will probably

  • E-mail the documents as attachments
  • Upload the documents to another server via FTP or HTTP
  • IM another user

All unencrypted traffic on the network can be captured by packet sniffers and its content examined. This is essentially what some of the products are doing. Most of the products in this category are basically re-purposing technologies from the IDS and content filtering world. These products capture content from either the network or the email stream, examine it with a keyword or regular expression search, and alert the administrators if any matches occur.

The detection mechanisms in these products are not unlike signature-based IDS. They also suffer from the same high false positive rates as the IDS products. You will also need to spend quite a bit of time tuning and maintaining the products in order for them to accurately detect IP leakage.

However, some vendors, such as Vericept, claim to have additional technology that performs statistical or linguistic analysis on the content and is able to detect leakage much more accurately and efficiently.
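To make the keyword and regular expression search concrete, here is a minimal sketch in Python. The rule names, the patterns and the `scan_content` helper are all made up for illustration; they are not taken from any vendor’s product, and a real deployment would carry a much larger, tuned rule set.

```python
import re

# Hypothetical rules for illustration only; real products ship far larger rule sets.
PATTERNS = {
    "confidential_marker": re.compile(
        r"\b(company confidential|internal use only)\b", re.IGNORECASE
    ),
    "project_codename": re.compile(r"\bproject\s+bluebird\b", re.IGNORECASE),
}

def scan_content(text):
    """Return the sorted names of all rules that match the captured text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))

# A captured e-mail body (invented sample).
captured = "Attached are the Project Bluebird specs. Internal use only."
hits = scan_content(captured)
if hits:
    print("ALERT: possible IP leakage, matched rules:", ", ".join(hits))
```

A rule set like this matches on surface text only, which is one reason this approach tends to need a lot of tuning.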

IP Leakage Control

One major problem that the network-based detection products cannot solve is sneakerware leakage. Sneakerware leakage includes scenarios where the user copies the file onto removable media such as CDs, USB drives and floppies, or prints the documents out. The user can then carry the removable media or printouts out the door and no one will notice.

Another class of companies, including Verdasys, Liquid Machine, Authentica, and AegisDRM, is attacking the IP leakage problem from a different perspective. They have designed agents that run on users’ desktops and track all user actions including opening and printing files, copying files to removable media, and sending files across the network. These products allow users to define Acceptable Use Policies, monitor all actions performed, and prevent the action or raise an alert when a violation occurs. This class of companies is generally categorized as Digital Rights Management vendors.

In general, however, these products cannot detect whether a document contains confidential information. Administrators or users must explicitly mark each document as either confidential (and therefore protected) or not confidential. Administrators can also set up policies to globally disallow copying to removable media, or file sharing via P2P networks.

The Future

What’s in the future in fighting against IP leakage?

As storage and security solutions are merging, as evidenced by the Symantec and Veritas marriage, we can expect comprehensive solutions that will integrate all of the above components. We can expect products that

  • Have centralized enterprise content management capabilities
  • Have components that can monitor network exit points and match the outbound content with the central repository
  • Have agents that can monitor user activities

These three components will talk to each other to more accurately detect and prevent intellectual property leakage.

We will also probably see many of the pure play vendors in these three areas (ECM, DRM, IP Leakage Detection) be bought up by some of the bigger vendors such as Symantec and EMC/Documentum.

Network Intelligence Knowledge Base

Monday, December 27th, 2004

I was searching the web for information on Cisco IDS and found this link. Obviously it’s not available as it has been password protected by Network Intelligence. However, if you use the Google Cache, you are able to see the content at the time Google indexed it.

After a bit more poking around, I found a bunch of Network Intelligence Knowledge Base articles through Google cache.

A bit scary, isn’t it?

A ton of competitive information is now available to everyone because of a simple mistake made early on.

Will Google remove the cache if you request them to do so? Let us know if you know the answer.

Banks and Hospitals

Saturday, December 25th, 2004

This is for you stats buffs who are doing market research. :)

23% of US banks are planning major updates to infrastructure

About 23% of all US banks are planning major initiatives related to core banking systems, according to Gartner. US banks lag behind financial institutions in other parts of the world such as Europe, where banks are pressed to change by new European Union rules.

Hospital IT spending to increase 6-10% in 2005

Hospitals are expected to increase their capital expenditures on health IT for 2005 by 6-10%. The operating budget is expected to swell to 3.5%, from 2.5% in 2004, according to Cap Gemini.

Security spending in UK and Ireland to more than double

IDC says British and Irish enterprises will more than double spending on IT security solutions in 2005.

Stats provided by IT Facts.

Advanced Visualization

Wednesday, December 22nd, 2004

Terry Kim had a short piece on Beyond the Pie Chart. I agree w/ him to some extent. Most vendors put visualization up for the wow factor and not necessarily for anything useful.

He then asked the question Who is the leader in advanced visualization?

I am not sure I can answer that question but here’s a list of vendors who are providing visualization components or software:

If you know of any more companies providing advanced visualization or have an opinion on these vendors, I would love to hear from you.

Happy Holidays!

Tuesday, December 21st, 2004

Cisco Buys Protego

Monday, December 20th, 2004

I asked a while back why the buyouts hadn’t begun in the log management market, and here’s Cisco’s answer.

Cisco Systems, Inc., today announced a definitive agreement to acquire privately-held Protego Networks, Inc. of Sunnyvale, CA, ….. Under the terms of the agreement, Cisco will pay approximately $65 million in cash for Protego.

CFO responsibility to fund log analysis for Sarbanes-Oxley compliance

Wednesday, December 15th, 2004

Ron Lepofsky from ERE Information Security had a great article, CFO responsibility to fund log analysis for Sarbanes-Oxley compliance, on SC Magazine.

Here’s a summary SC Magazine provided:

Corporations responsible for complying with Sarbanes-Oxley, face great hurdles with a basic compliance objective: analysis of their (server and security device) event logs. Some do not for lack of awareness, and others because of the difficulty (and cost) of performing the analysis. Further, issuers erroneously place the cost burden of SOX compliance on the IT security department, when the costs should be borne by the CFOs SOX compliance budget.

What’s In A Log: Part 1

Tuesday, December 14th, 2004

Much ink has been spilled all over the web and in print writing about log management and analysis. Google returned over 640,000 hits for the search ‘“log management” OR “log analysis”‘.

A whole technology segment has been created just for this purpose. IDC and Gartner both predicted that the log management space will be over $500M within the next couple of years.

Many Global 2000 corporations have started log management projects, mostly driven by regulatory or standards compliance. Public companies have to be SOX compliant. Healthcare companies have to be HIPAA compliant. Financial companies have to be FFIEC or Basel II compliant. Government agencies have a whole list of federal and state regulations and standards they have to comply with.

Yet many people are still wondering why they should look at logs.

What are logs? What’s in them?

What makes them so important to the world of IT performance, availability, troubleshooting, security and regulatory compliance?

To best understand how logs affect all areas of IT management, it is necessary for us to dissect the logs and see what information they provide.

What’s in a log

Firewalls probably generate the most logs amongst all devices. A busy PIX firewall, with debug logging turned on, can generate 2,000 to 3,000 messages per second (MPS).

%PIX-6-302013: Built inbound TCP connection 543127891 for
outside:192.168.11.250/41612 (192.168.11.250/41612) to
inside:10.1.241.2/80 (10.1.241.2/80)

This is a fairly typical log message from a PIX firewall. Most administrators will probably ignore these logs as there are a ton of them. However, let’s look closely to see what information we can find.

%PIX

These first four characters immediately tell us that the log message is a PIX message. With this information, if you are writing a parser for multiple log types, you can throw this over to the PIX parser and go on to the next message. You can also use this to classify or categorize your logs based on device type.
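As a rough sketch of that hand-off, a multi-format parser can keep a table of prefixes and the function that handles each one. The parser functions below are placeholders I made up, and the sketch assumes the message starts with the device prefix (no time stamp in front).

```python
def parse_pix(message):
    """Placeholder PIX parser; a real one would break the message into fields."""
    return {"device": "pix", "raw": message}

def parse_unknown(message):
    """Fallback for messages that no registered parser claims."""
    return {"device": "unknown", "raw": message}

# Map a message prefix to the parser that handles it.
PARSERS = {"%PIX": parse_pix}

def dispatch(message):
    """Hand the message to the first parser whose prefix matches."""
    for prefix, parser in PARSERS.items():
        if message.startswith(prefix):
            return parser(message)
    return parse_unknown(message)

print(dispatch("%PIX-6-302013: Built inbound TCP connection 543127891")["device"])
```

The returned `device` value can double as the category for classifying logs by device type.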

-6-

The dashes are just delimiters so we will ignore them for now.

The number “6” is interesting for us because it tells us the severity level of the message. 6 in this case means Informational. Other levels are 0 (Emergency), 1 (Alert), 2 (Critical), 3 (Error), 4 (Warning), 5 (Notification) and 7 (Debug).

These eight severity levels (0 through 7) are fairly standard in the syslog world. Almost all devices and applications logging via syslog will follow these severity levels.

In any case, Informational messages are usually harmless in that they don’t require our immediate attention. However, an excessive number of informational messages may indicate something suspicious and will need drilling down.
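Pulling the severity out of a PIX message is a one-line regular expression. Here is a small sketch; the level names are the ones listed above, while the `severity_of` helper name is my own.

```python
import re

# Severity names as listed in the text above.
SEVERITIES = {
    0: "Emergency", 1: "Alert", 2: "Critical", 3: "Error",
    4: "Warning", 5: "Notification", 6: "Informational", 7: "Debug",
}

# %PIX-<severity>-<message id>:
HEADER = re.compile(r"%PIX-([0-7])-(\d+):")

def severity_of(message):
    """Return the severity name of a PIX message, or None if there is no PIX header."""
    match = HEADER.search(message)
    if match is None:
        return None
    return SEVERITIES[int(match.group(1))]

print(severity_of("%PIX-6-302013: Built inbound TCP connection 543127891"))
```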

302013:

This number represents the message ID of the PIX message. It gives us an idea what information we will find in the remainder of the message, as well as the format of the message.

The message ID is extremely useful in a distribution report. A distribution report shows us the count of each message type over a specific period of time. Tracking that information over time, we can discover things that we cannot see by looking at individual messages.

For example, if we track the daily distribution count of the message types over the course of a month, we may see that our average Wednesday count for message type 302013 is around 5 million. If one Wednesday we saw a count of 6 million, we might suspect that something anomalous is going on. The extra 1 million messages average out to about 11 to 12 per second over a 24-hour day, which is probably not high enough to trigger any alerts on its own.

If we didn’t track the distribution report over a longer period, we would probably have missed the 1 million message increase as well.
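Here is a minimal sketch of such a distribution report, plus the arithmetic from the example. The 10% tolerance and the function names are assumptions of mine; pick whatever threshold fits your own baseline.

```python
import re
from collections import Counter

MESSAGE_ID = re.compile(r"%PIX-[0-7]-(\d+):")

def distribution(lines):
    """Count how many times each PIX message ID appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def is_anomalous(todays_count, baseline_average, tolerance=0.10):
    """Flag a count that deviates from the baseline by more than the tolerance."""
    return abs(todays_count - baseline_average) > tolerance * baseline_average

# The example from the text: 6 million against a 5 million Wednesday average.
extra = 6_000_000 - 5_000_000
print(is_anomalous(6_000_000, 5_000_000))  # True
print(round(extra / 86_400, 1))            # 11.6 extra messages per second
```

Checking for a deviation in either direction also catches a sudden drop, which can be just as interesting as a spike.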

We will ignore the colon as it doesn’t represent anything important.

Built

This is probably one of the most important words of the whole message as it tells us the action that the firewall took.

The word “Built” tells us that the connection has been accepted based on the security policy and the PIX firewall is going to create a tunnel (figuratively) from the outside world to the inside of the firewall.

Other firewalls may use the words “accept” or “allow” to represent the same information.

Most log management products will normalize this into “accept.” By doing so, we can run reports across many different firewall types and identify trends and anomalies.
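Normalization can be as simple as a lookup table. This sketch only covers the three words mentioned above; the `normalize_action` name and the "unknown" fallback are my own choices, not any product’s behavior.

```python
# Vendor-specific action words mapped to one normalized value.
ACTION_MAP = {
    "built": "accept",   # PIX
    "accept": "accept",
    "allow": "accept",
}

def normalize_action(vendor_word):
    """Map a firewall's action word to a normalized action, or "unknown"."""
    return ACTION_MAP.get(vendor_word.lower(), "unknown")

print(normalize_action("Built"))
```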

It is also important in the compliance world to track all access by users and machines. For example, the SOX regulation requires that all access to financial systems be logged and reviewed. In this case, successful connections should be reviewed periodically to see whether users accessing the financial systems are authorized.

Because the majority of our logs probably contain this type of information, it’s generally a bad idea to use it in a real-time correlation rule. It will overwhelm your correlation engine in no time. However, thresholds can be set using this, probably along with the source or destination information (see below).

inbound

In this PIX message, the words “inbound” and “outbound” may appear here. “Inbound” tells us that the original connection was initiated from outside of the firewall. Vice versa, “outbound” means that the original connection was initiated from inside of the firewall.

This word is significant for several reasons.

First of all, if we see the word “inbound” in a message but the connection was initiated from the inside, or “outbound” on a connection initiated from the outside, then immediately we know something weird is going on. It could mean that there’s a mis-configuration, or a bug in the PIX software :). It’s worth an investigation nonetheless.

Secondly, if we run a firewall that normally doesn’t allow “inbound” connections, and all of a sudden we are seeing them in our logs, we would want to drill down and investigate what’s happening. Some administrator may have accidentally opened the firewall to the inside for some reason. Whatever it may be, two questions should be answered. First, why did the administrator open up the firewall? And second, why did she leave it open? We may also want to set up alerts based on this scenario.

Last but not least, we can run a distribution report as we did with the message ID and track it over time. Any sudden increase (or even decrease) in the count may be worth checking into.

TCP

TCP stands for Transmission Control Protocol. It is used by many internet services such as HTTP, SMTP, FTP, SSH, etc. By itself, the word “TCP” isn’t all that interesting. However, the protocol and the destination port together determine the service that the connection is for. For example:

shell           514/tcp
syslog          514/udp

As we can see, the port number for these two services is the same, 514. However, the “shell” service uses TCP and “syslog” uses UDP. Without the protocol, we would not have been able to determine the service.
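In code, this means the lookup key has to be the (port, protocol) pair, never the port alone. A tiny sketch with a hand-built table (the `service_name` helper is hypothetical):

```python
# A tiny hand-built table; the key is the (port, protocol) pair.
SERVICES = {
    (514, "tcp"): "shell",
    (514, "udp"): "syslog",
    (80, "tcp"): "http",
}

def service_name(port, protocol):
    """Look up a service by port and protocol together; None if not in the table."""
    return SERVICES.get((port, protocol.lower()))

print(service_name(514, "TCP"), service_name(514, "UDP"))
```

If you would rather not maintain the table yourself, Python’s standard library has `socket.getservbyport(port, protocolname)`, which consults the services database of the machine it runs on.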

Most log management systems don’t have reports defined for the common protocols such as TCP or UDP. It might, however, be interesting to track the uncommon ones, if you run anything weird.

connection 543127891 for

We will skip the words “connection” and “for”.

The number, 543127891, is a unique number that represents this specific session inside the PIX’s connection table. It is used to track information such as duration and bytes transferred for this connection. This information will become available in another message (ID 302014), once the connection is closed.

Information such as duration and bytes transferred can be extremely valuable for performance and utilization tracking. We will go over these in more detail in Part 2 of Anatomy of Logs.

outside:

We will skip the colon.

The word “outside” represents the source interface of the PIX firewall. It is the interface where the connection is originally initiated. Other firewalls may use the word “zone” to describe this.

Having this information allows us to quickly map the network from the logs. For example, we can easily identify all the IPs that are “outside” of the firewall vs “inside” of the firewall. If we identify IPs that have appeared both “outside” and “inside”, it may be a cause for concern: there may be a backdoor out of your network. We have seen this happen many times on bridged networks. It is probably one of the weirdest scenarios in network troubleshooting as it leaves you scratching your head trying to figure out where/what the backdoor is.
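Spotting IPs that show up on both sides is a set intersection. This sketch assumes you have already extracted (interface, IP) pairs from your logs; the function name and the sample addresses are invented for illustration.

```python
def ips_on_both_sides(observations):
    """Return the IPs seen behind both the "outside" and the "inside" interface.

    observations is an iterable of (interface, ip) pairs pulled from parsed logs.
    """
    outside = {ip for interface, ip in observations if interface == "outside"}
    inside = {ip for interface, ip in observations if interface == "inside"}
    return outside & inside

seen = [
    ("outside", "192.168.11.250"),
    ("inside", "10.1.241.2"),
    ("inside", "192.168.11.250"),  # the same IP now appears inside as well
]
print(ips_on_both_sides(seen))
```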

192.168.11.250/41612 (192.168.11.250/41612)

This whole section shows the source IP address and port of the connection. In this case, the source IP is 192.168.11.250 and the source port is 41612. There are two parts to this section. The first part is the “real” IP and port, before any translation is done. The second part, inside the parentheses, is the mapped IP and port, after network address translation is applied.

For firewalls with no NAT, these two should be exactly the same. For “inbound” connections from the outside, these two are probably the same as well. However, if you, for whatever wacky reason, decide to NAT incoming traffic, then these will be different. (Ok, so maybe not wacky, I have seen people who have a second layer of firewall, something about defense-in-depth or some wacky idea like that ;), that will only accept connections from a single source IP. So in this case, NAT might be used.)

The source port is generally not significant; however, in specific scenarios, it may indicate an exploit attempt. For example, BEWARE ftp clients from SOURCE PORT 1!

In other cases, it may indicate that the source host is using network address translation. E.g. PIX firewalls’ Port Address Translation uses ports to track the connections, so it’s possible that port 1 is used.

There are all kinds of reports that can be generated from the source IP address. We can track

  • how often the IPs visit our site by doing a summary report over time
  • which company/domain/country visit our site by performing a reverse DNS lookup (or whois) and summarizing
  • which IPs are attempting to scan our network by summarizing the destination IP/ports (see below) attempted
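The scan report in the last bullet can be sketched by counting distinct destination IP/port pairs per source IP. The threshold of 100 is an arbitrary number I picked for the example, and the input is assumed to be (source IP, destination IP, destination port) tuples from already-parsed logs.

```python
from collections import defaultdict

def distinct_targets(connections):
    """Count the distinct (destination IP, port) pairs each source IP has touched."""
    targets = defaultdict(set)
    for src_ip, dst_ip, dst_port in connections:
        targets[src_ip].add((dst_ip, dst_port))
    return {src: len(seen) for src, seen in targets.items()}

def likely_scanners(connections, threshold=100):
    """Return the source IPs that hit at least `threshold` distinct targets."""
    counts = distinct_targets(connections)
    return sorted(src for src, n in counts.items() if n >= threshold)

# One source walking 150 ports, another hitting the same web server 500 times.
traffic = [("192.168.11.250", "10.1.241.2", port) for port in range(1, 151)]
traffic += [("192.168.11.251", "10.1.241.2", 80)] * 500
print(likely_scanners(traffic))
```

Note that the busy but well-behaved client is not flagged: it is the number of different targets that matters here, not the number of connections.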

inside:

This represents “inside” of the firewall, or the internal network. It is similar to the “outside” interface as described above.

10.1.241.2/80 (10.1.241.2/80)

This section describes the destination IP and port of the connection. The first part shows the original, or untranslated, IP and port. The second shows the translated, or mapped, IP and port. For networks that use NAT, these two parts will be different.

The IP address, 10.1.241.2, shows the destination of the connection.

The destination port, 80, unlike the source port, is very significant. Together with the protocol (see above), the port will tell us exactly what service is being requested. In this case, port 80 of protocol TCP is HTTP, which means this connection is requesting a web server connection.

Just like the source IP, many reports can be generated for the destination IP and ports. For example,

  • what are the most accessed servers
  • what are the most requested services
  • what are the most denied servers and/or services
  • what’s the distribution of servers and services over time

As before, seeing a distribution over time will tell us whether we are getting more traffic and whether we should consider upgrading our servers to handle the additional load. Trend analysis is very important for most IT shops.

One of the more interesting reports to see is “what are the LEAST requested services or servers?”

When hackers install a backdoor on your system, they don’t normally make a lot of noise by connecting to it a thousand times. They try to hide their tracks by connecting at odd hours and very infrequently. So seeing the least requested services may actually tell you if there’s any backdoor activity.
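A “least requested” report is just the usual top-N report sorted the other way. A minimal sketch, assuming (destination IP, destination port) pairs from parsed logs; the helper name and the sample port 31337 are made up.

```python
from collections import Counter

def least_requested(connections, n=5):
    """Return the n (destination IP, port) pairs with the fewest connections."""
    counts = Counter(connections)
    return sorted(counts.items(), key=lambda item: item[1])[:n]

# Invented sample: a busy web server, some mail, and two odd connections.
conns = [("10.1.241.2", 80)] * 1000 + [("10.1.241.2", 25)] * 40 + [("10.1.241.2", 31337)] * 2
for (ip, port), count in least_requested(conns, n=2):
    print(ip, port, count)
```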

What about the time

At this point you are probably wondering, “where does it tell me when this log was generated?!”

Unfortunately, most logs generated by devices do not include the time. The devices depend on the log management server to include the time when the log is received. So most of the time when you see the time stamp in front of a log, it’s the server’s time, not the device time.

In some cases, such as the PIX, the device will allow you to send the time as well. In that case, you will see something like this:

Apr 30 2004 08:23:36: %PIX-6-302013: Built inbound TCP connection 543127891 for
outside:192.168.11.250/41612 (192.168.11.250/41612) to
inside:10.1.241.2/80 (10.1.241.2/80)
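Putting all the pieces together, the whole message can be broken into named fields with a single regular expression. This is a sketch built from the one sample message in this post, not from Cisco’s documentation, so the pattern and the field names are my own assumptions and real-world messages may need a looser pattern. The time stamp is treated as optional.

```python
import re

# Assumed layout for message 302013, derived from the single sample in this post.
PIX_BUILT = re.compile(
    r"(?:(?P<timestamp>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}): )?"
    r"%PIX-(?P<severity>[0-7])-(?P<message_id>302013): "
    r"(?P<action>Built) (?P<direction>inbound|outbound) (?P<protocol>TCP) "
    r"connection (?P<connection_id>\d+) for "
    r"(?P<src_interface>\w+):(?P<src_ip>[\d.]+)/(?P<src_port>\d+) "
    r"\((?P<src_mapped_ip>[\d.]+)/(?P<src_mapped_port>\d+)\) to "
    r"(?P<dst_interface>\w+):(?P<dst_ip>[\d.]+)/(?P<dst_port>\d+) "
    r"\((?P<dst_mapped_ip>[\d.]+)/(?P<dst_mapped_port>\d+)\)"
)

def parse_built(message):
    """Return the fields of a 302013 message as a dict, or None if it does not match."""
    one_line = " ".join(message.split())  # the sample above wraps across lines
    match = PIX_BUILT.match(one_line)
    return match.groupdict() if match else None

SAMPLE = """Apr 30 2004 08:23:36: %PIX-6-302013: Built inbound TCP connection 543127891 for
outside:192.168.11.250/41612 (192.168.11.250/41612) to
inside:10.1.241.2/80 (10.1.241.2/80)"""

fields = parse_built(SAMPLE)
print(fields["timestamp"], fields["direction"], fields["src_ip"], fields["dst_ip"], fields["dst_port"])
```

All values come back as strings; convert the ports and the connection ID to integers if you want to do arithmetic on them.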

As you can see, there’s a wealth of information in just a single log message. Next time, we will take a look at the corresponding Teardown message in PIX.

How to calculate firewall log size requirement

Monday, December 13th, 2004

Someone googled for “how to calculate firewall log size requirement” and found this blog. Since Google only pointed to my main site and not the specific article, here it is:

Five Factors to Consider When Building Your Logging Infrastructure

What the heck is security event management, anyway?

Sunday, December 12th, 2004

Techworld has an article on this topic. Unfortunately, Larry Lunetta made it sound like the whole SEM space is about IDS alert reduction. It would be really sad if that’s all SEM products do.

I think SEM is probably the wrong name for this space anyway. Most of the vendors mentioned in the article are pure SEM players in the sense that they only do security events. But there’s a lot more to log management than just security.

Most people who go through logs use them for

  • network/system troubleshooting
  • fault isolation
  • utilization tracking
  • availability detection
  • performance tracking

Security is obviously an important use of logs, but logs tell you a lot more than just security.

What better way to tell whether your pair of PIX firewalls have failed over than to look for

  • %PIX-1-104001: (Primary) Switching to ACTIVE (cause: string).
  • %PIX-1-104002: (Primary) Switching to STNDBY (cause: string).
  • %PIX-1-104003: (Primary) Switching to FAILED.

This will tell you exactly when the switch happened. Then you can perform a search for logs that are around this time frame and determine the exact root cause.
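Here is a rough sketch of that workflow: find the failover messages, then pull everything logged within a few minutes of each one. It assumes the logs have already been parsed into (time, message) pairs; the five-minute window, the helper names and the sample events are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

FAILOVER_IDS = ("104001", "104002", "104003")

def is_failover(message):
    """True if the message is one of the three failover messages listed above."""
    return any(f"%PIX-1-{message_id}:" in message for message_id in FAILOVER_IDS)

def around(events, moment, window=timedelta(minutes=5)):
    """Return the messages logged within `window` of `moment`."""
    return [message for when, message in events if abs(when - moment) <= window]

# Invented sample events, already parsed into (time, message) pairs.
events = [
    (datetime(2004, 12, 12, 3, 0, 0), "%PIX-6-302013: Built inbound TCP connection 1"),
    (datetime(2004, 12, 12, 9, 14, 58), "%PIX-6-302013: Built inbound TCP connection 2"),
    (datetime(2004, 12, 12, 9, 15, 2), "%PIX-1-104001: (Primary) Switching to ACTIVE (cause: string)."),
]

for when, message in events:
    if is_failover(message):
        print("Failover at", when)
        for nearby in around(events, when):
            print("   ", nearby)
```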

This is definitely NOT “security event management”. So I think IDC/Gartner should come up w/ a new term for the log management space. :)

rsyslog

Friday, December 10th, 2004

Rainer Gerhards announced the initial beta release of the rsyslog package, an alternate syslogd implementation.

Rsyslog has been forked from the sysklogd package. It currently shares
its base design but includes many important enhancements. Most
importantly it supports

- fully configurable output formats, including
* high precision timestamps with year ;)
* access to each of the message parts as well as substrings thereof
(includes access to facility and priority)
* access to the raw message received
- direct logging to MySQL database servers
- compatibility to stock linux syslogd

Rsyslog is GPL’ed software.