In a world replete with regulations and threats, organizations today have to go well beyond just securing their data. Protecting this most valuable asset means that companies have to perpetually monitor their systems in order to know who did exactly what, when, and how — to their data.
Database logging only tells you what has happened on your database, not what is happening. Even then, the log details probably won’t be enough to satisfy compliance requirements. What is needed is a new way to audit databases without impacting their performance.
How do you know what’s going on inside your database? The traditional answer is to use transaction logs or run trace utilities. Logging database activity is fine as far as it goes, but it is only reactive. You will only ever know what has already happened on your database, which is like discovering that the bank has been robbed rather than knowing that the bank is being robbed and being able to do something about it. The other problem with logs is granularity: they may not capture enough detail, or they may miss certain critical activities entirely, such as a read operation against sensitive data.
The traditional alternative is to run trace utilities. The trouble with traces is that they consume CPU cycles. Running the DB2 audit trace has been estimated to add roughly 5% CPU overhead per transaction when all audit trace classes are started, and IBM estimates that DB2’s global trace can add as much as 100% CPU overhead when all of its trace classes are active.
It seems that what we have is one technique that is inadequate and another that is impractical. So, perhaps the important question to ask is, why should we bother? The answer is because of compliance regulations. There are two key regulations that apply — the Sarbanes-Oxley Act (SOX) and the Payment Card Industry Data Security Standard (PCI-DSS).
And while we’re thinking of compliance with auditing regulations, who would usually be the person responsible for reviewing the logs or running and examining trace utilities? That would be the DBA. To comply with auditing requirements, you also need some way to check the DBA’s activities to ensure that he isn’t the person “robbing the bank”, so to speak.
So far we have four criteria for a successful database-auditing tool. It must:
Comply with the latest regulations,
Audit DBA activity as well as all the other users of our database,
Not impact the performance of the database,
Have a way of identifying problems, i.e., violations of corporate policies, in real time.
Many sites have implemented Security Information and Event Management (SIEM) tools, a hybrid of Security Information Management (SIM) and Security Event Management (SEM) tools, thinking that will solve the problem. While they do import log data from a range of systems and network devices, they have a key flaw: they don’t natively monitor DBMS activity, so they depend on the DBMS’s own logging and trace utilities being turned on.
An ideal solution would run off the mainframe so that it does not affect mainframe performance, while monitoring and tracking all database activity in real time.
Fully compliant auditing solutions store, analyze, and report database information. They can identify anomalous behavior and policy violations immediately and respond with policy-based actions, such as security alerts. Database activity is captured at the DBMS level, whether it is initiated by mainframe-based applications or by networked applications, and activity can also be monitored by role or by application, which helps to meet auditing requirements.
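To make the idea of policy-based actions concrete, here is a minimal sketch, assuming an off-host collector already delivers captured DBMS-level events as structured records. The event fields, table names, roles, and the two policies shown are hypothetical illustrations, not the API or rule syntax of any particular product.

```python
# Minimal sketch of policy-based, real-time evaluation of captured database
# events. All table names, roles, and policy rules here are hypothetical.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class DbEvent:
    user: str          # authenticated database user
    role: str          # role under which the statement ran
    program: str       # client program or transaction name
    sql: str           # statement text as captured at the DBMS level
    objects: tuple     # tables or views touched by the statement

@dataclass
class Policy:
    name: str
    violated: Callable[[DbEvent], bool]   # predicate over a single event

SENSITIVE_TABLES = {"PAYROLL.SALARY", "CARDS.PAN"}   # example only
APPROVED_PAYROLL_ROLES = {"HR_APP"}                  # example only

policies = [
    Policy("sensitive-read-by-unapproved-role",
           lambda e: bool(set(e.objects) & SENSITIVE_TABLES)
                     and e.role not in APPROVED_PAYROLL_ROLES),
    Policy("dba-direct-update",
           lambda e: e.role == "SYSADM"
                     and e.sql.lstrip().upper().startswith(("UPDATE", "DELETE"))),
]

def evaluate(events: Iterable[DbEvent]) -> None:
    """Raise an alert for every event that violates a policy."""
    for event in events:
        for policy in policies:
            if policy.violated(event):
                # In a real deployment this would feed an alerting pipeline.
                print(f"ALERT [{policy.name}]: {event.user} via {event.program}: {event.sql}")

if __name__ == "__main__":
    evaluate([
        DbEvent("JSMITH", "REPORTING", "SPUFI",
                "SELECT * FROM PAYROLL.SALARY", ("PAYROLL.SALARY",)),
    ])
```

Because the evaluation runs on the captured event stream rather than inside the DBMS, checks like these add no load to the database itself.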
A robust database access auditing solution that addresses regulatory compliance should be able to provide answers to at least the following questions (a sketch of an audit record capturing these answers follows the list):
Who accessed the data?
At what date and time was the access?
What program or client software was used to access the data?
From what location was the request issued?
What SQL was issued to access the data?
Was the request successful, and if so, how many rows of data were retrieved?
If the request was a modification, what data was changed? (A before and after image of the change should be accessible.)
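One way to see these questions as data is a single audit record that stores each answer as a field. The sketch below is a hypothetical illustration in Python; the field names are not taken from any specific product.

```python
# Minimal sketch of an audit record holding the answers listed above.
# Field names are illustrative; real products define their own schemas.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AuditRecord:
    user: str                            # who accessed the data
    timestamp: datetime                  # date and time of the access
    program: str                         # program or client software used
    origin: str                          # location (LPAR, host name, or IP) of the request
    sql_text: str                        # the SQL that was issued
    succeeded: bool                      # whether the request completed successfully
    rows_returned: int = 0               # rows retrieved, if it was a read
    before_image: Optional[dict] = None  # prior column values, if it was a modification
    after_image: Optional[dict] = None   # new column values, if it was a modification

# Example: a successful update captured with its before and after images.
record = AuditRecord(
    user="APPUSER1",
    timestamp=datetime(2008, 6, 3, 14, 25, 7),
    program="CICSPROD",
    origin="10.1.2.3",
    sql_text="UPDATE ACCOUNTS SET LIMIT = 5000 WHERE ID = 42",
    succeeded=True,
    before_image={"LIMIT": 2000},
    after_image={"LIMIT": 5000},
)
```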
Knowing who is doing what to your data, and when, will protect your data and your company.

Auditing is only part of the compliance picture; data that has to be retained also has to be archived properly. A database archiving solution should work at the level of complete business objects, relating data from different tables in the databases. It should not try to archive at the file or row level, because of the way business data can be spread about. In addition, the archive process needs to be policy driven and automated.
Not only does the data have to be in the archive for possibly decades in order to comply with regulations, it also has to be accessible to authorized people and must be retrievable using standard SQL queries. In addition to access and retrieval characteristics, it’s important to be able to produce reports about the data using standard techniques.
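As a simplified illustration of SQL-level access to archived data, the sketch below uses Python’s built-in sqlite3 module as a stand-in for a real archive store; the ARCHIVED_ORDERS table and its columns are hypothetical.

```python
# Simplified illustration: querying archived rows with standard SQL and
# producing a small report. sqlite3 stands in for the real archive store.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ARCHIVED_ORDERS (
        ORDER_ID INTEGER,
        CUSTOMER TEXT,
        ORDER_DATE TEXT,
        AMOUNT REAL,
        ARCHIVED_ON TEXT
    )
""")
conn.executemany(
    "INSERT INTO ARCHIVED_ORDERS VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "ACME", "1999-03-14", 250.00, "2006-01-01"),
        (1002, "GLOBEX", "2001-11-02", 980.50, "2006-01-01"),
    ],
)

# A standard SQL query: all archived orders from a given period, for a report.
rows = conn.execute(
    "SELECT ORDER_ID, CUSTOMER, AMOUNT FROM ARCHIVED_ORDERS "
    "WHERE ORDER_DATE BETWEEN '1999-01-01' AND '2001-12-31' "
    "ORDER BY ORDER_DATE"
).fetchall()

for order_id, customer, amount in rows:
    print(f"{order_id}  {customer:<10} {amount:>10.2f}")
```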
The next important archive characteristic is compliance with Section 802 of the Sarbanes-Oxley Act and SEC Rule 17a-4 (17 CFR 240.17a-4) under the Securities Exchange Act of 1934. These regulations concern the authenticity of the archive: companies face severe penalties if they alter or delete their archived data. So, for compliance reasons, the archived records must be stored in a format that is both non-rewritable and non-erasable.
If data from a database archive is restored, it needs to go back into the same columns and tables in which it originally existed. Information about those tables and columns is called metadata, so for an archive to be successful, it must store the metadata along with the data. Over time the database may be modified as new versions are released, or, with company acquisitions and mergers, the database in use may change entirely. This is why archiving the metadata is so important: no matter what happens, the archived data remains accessible in its original format. In terms of compliance, recent amendments to the Federal Rules of Civil Procedure (FRCP) affect the discovery of electronically stored information. Rule 34(b) states that “A party who produces documents for inspection shall produce them… as they are kept in the usual course of business…” In effect, this means that the archived data has to be independent of the originating database.
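To show what storing the metadata with the data can look like, here is a minimal sketch of a self-describing archive unit, assuming JSON as the container format; the subsystem, table, column names, and rows are all hypothetical.

```python
# Minimal sketch of a self-describing archive unit: the schema (metadata)
# travels with the rows, so the archive can be interpreted even if the
# originating database changes or disappears. All names are hypothetical.

import json

archive_unit = {
    "source": {
        "subsystem": "DB2P",                 # originating DBMS instance
        "table": "FINANCE.INVOICES",
        "archived_on": "2008-06-30",
        "retention_years": 10,
    },
    "schema": [                              # the metadata: column definitions
        {"name": "INVOICE_ID", "type": "INTEGER", "nullable": False},
        {"name": "CUSTOMER",   "type": "VARCHAR(40)", "nullable": False},
        {"name": "AMOUNT",     "type": "DECIMAL(9,2)", "nullable": False},
    ],
    "rows": [                                # the data itself
        [50001, "ACME", "1250.00"],
        [50002, "GLOBEX", "88.95"],
    ],
}

# The unit is written out as one self-contained document.
with open("FINANCE.INVOICES.2008-06-30.json", "w") as f:
    json.dump(archive_unit, f, indent=2)
```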
Once data has reached the end of its “legal life” and is no longer required to be retained, the archive solution should have a policy for the automatic deletion of that data from the archive.
In the event that litigation occurs or is pending, data is placed in a litigation hold. That means it cannot be deleted or changed for any reason. Having decided what information might be available and needed in the court case, the next stage is to be able to locate that data in the archive. This is where e-discovery can be used. It is important that the archive stores data in a way that allows e-discovery tools to work fairly quickly. There have been cases where huge fines have been imposed because electronic documents have not been produced in a timely fashion (for example, Serra Chevrolet v. General Motors). Once litigation is over, the data may still have a long legal life ahead of it before it can be deleted.
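Here is a sketch of how policy-driven deletion and litigation holds might interact, assuming each archive unit records its archive date, retention period, and hold status; the field names and the ten-year retention figure are hypothetical examples.

```python
# Sketch of policy-driven expiry: an archive unit is deleted only when its
# retention period has elapsed AND it is not under a litigation hold.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ArchiveUnit:
    name: str
    archived_on: date
    retention_years: int
    litigation_hold: bool = False

def eligible_for_deletion(unit: ArchiveUnit, today: date) -> bool:
    expiry = unit.archived_on + timedelta(days=365 * unit.retention_years)
    return today >= expiry and not unit.litigation_hold

units = [
    ArchiveUnit("FINANCE.INVOICES.1997", date(1998, 1, 1), 10),
    ArchiveUnit("FINANCE.INVOICES.2001", date(2002, 1, 1), 10, litigation_hold=True),
]

for unit in units:
    if eligible_for_deletion(unit, date(2009, 6, 30)):
        print(f"purge {unit.name}")       # past retention, no hold
    else:
        print(f"retain {unit.name}")      # still within retention or on hold
```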
It almost goes without saying that if archives are to store data for up to 30 years, they will be very big. Figures in petabytes (10¹⁵ bytes) have been suggested. The analyst firm Enterprise Strategy Group concluded that between 2005 and 2010 the required digital archive capacity would increase by more than a factor of 10, from 2,500 petabytes in 2005 to 27,000 petabytes in 2010.
Sophisticated archiving systems will prevent data from being altered or deleted, while at the same time allowing it to be accessed and retrieved. The archive data is stored on a storage area network (SAN) using encapsulated archive data objects (EADO), which allow access and retrieval of data from the archive while maintaining the authenticity of the data and preventing it from being overwritten or deleted. This helps users stay compliant with the growing list of regulations, not just today but for decades into the future.