PowerScale: How to increase syslog export performance for PowerScale Isilon Audit

Summary: In certain high-volume environments, syslog audit export performance may not always keep up with the Audit ingest rate. Performance improvements in OneFS 9.4 may help.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

The syslog Audit export backlog never catches up. Because the audit databases and the OneFS workload are distributed across nodes, there is no simple way to observe this directly.

One way to check is:

  1. Review the most recent timestamps on the Audit syslog messages received by the syslog server.
  2. If the audit record timestamps are behind the current time, the export is behind.
  3. If the records are a consistent length of time behind each day, or are sometimes behind and sometimes caught up, the cluster and the syslog setup are at relative parity. Usually, this is acceptable. How far "behind" is acceptable depends on the environment.
  4. If the export falls further behind each day and never catches up, it may be necessary to reduce the local Isilon-side Audit workloads.
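
The check above can be sketched in a few lines of Python. This is only an illustration, not a Dell tool: the function names, the once-a-day sampling, and the five-minute tolerance are assumptions, and reading the newest timestamp from your syslog server is left to whatever that server provides.

```python
from datetime import datetime, timedelta


def export_lag(latest_record_time: datetime, now: datetime) -> timedelta:
    """How far the newest received audit record trails the current time."""
    return max(now - latest_record_time, timedelta(0))


def classify_trend(daily_lags, tolerance=timedelta(minutes=5)):
    """Classify lag samples taken once per day, oldest first.

    'falling behind' : every sample exceeds the previous one by more than
                       the tolerance (step 4 above).
    'at parity'      : anything else, including a steady lag or a lag that
                       sometimes returns to zero (step 3 above).
    The tolerance value is an assumption; choose one that suits the environment.
    """
    if len(daily_lags) < 2:
        return "not enough samples"
    pairs = zip(daily_lags, daily_lags[1:])
    growing = all(later - earlier > tolerance for earlier, later in pairs)
    return "falling behind" if growing else "at parity"
```

For example, lag samples of 1, 2, and 4 hours on consecutive days would be classified as "falling behind", while three samples of 2 hours each would be "at parity".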

Cause

Audit uses a FIFO (first in, first out) queue on each node.
If audit data is exported faster than it is ingested, the queue stays up to date. If it is exported more slowly than it is ingested, a backlog builds.

If node #1 ingests and records 1,000 audit events per second but can export only 500 events per second, and those rates hold around the clock, then for that single Isilon node:

  • After 1 second, there is a backlog of 500 items
  • After 1 minute, 30,000 items
  • After 1 hour, 1,800,000 items
  • After 1 day, 43,200,000 items


However, if for even half of the hours in the day the system ingests at half the rate or exports at double the rate, the daily backlog is cut in half.
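
The arithmetic above can be reproduced with a short Python sketch. The function name is hypothetical, and the model is deliberately simple: it counts only how much a backlog grows over a period and ignores the draining of an existing backlog when export outpaces ingest.

```python
def backlog_growth(ingest_per_s: int, export_per_s: int, seconds: int) -> int:
    """Events added to one node's queue over a period (simplified model).

    Growth is floored at zero: this sketch does not model an existing
    backlog being drained when export is faster than ingest.
    """
    return max(ingest_per_s - export_per_s, 0) * seconds


MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR

# The example from the text: ingest 1,000/s, export 500/s, on one node.
for label, period in [("1 second", 1), ("1 minute", MINUTE), ("1 hour", HOUR), ("1 day", DAY)]:
    print(f"After {label}: {backlog_growth(1000, 500, period):,} items")

# Half the day at the original rates, half the day with ingest halved to 500/s.
relieved_day = backlog_growth(1000, 500, DAY // 2) + backlog_growth(500, 500, DAY // 2)
print(f"With relief for half the day: {relieved_day:,} items")  # 21,600,000
```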

With the syslog deployment used pre-9.4, reported cases of clients with severe audit backlogs are relatively rare.

Even with the CEE (Common Event Enabler) alternative, which is more efficient than pre-9.4 syslog, there are sometimes reports of backlogs due to the local "audit design" and ingest rates.

It can be hard to know how much volume there is until the audit runs. With the more efficient CEE, you can also deploy additional target CEE machines, and then work with your third-party vendor who "collects" that data for Audit record scrutiny.

Additional CEEs allow the Isilon side to export more efficiently. Each individual Isilon node can connect to up to three (3) unique CEE machines simultaneously.

 

Syslog is different. Each defined syslog target in the Audit setup receives each logged Audit event that is exported.
Imagine a single logged audit event being sent from the queue. With a single target syslog server, the Isilon node holding that queue item sends the record over UDP port 514, on a 1:1 basis, to that one server.
If a second syslog target is added, that Isilon node must instead send the same queue item to both syslog targets. This effectively doubles some of the workload within the Isilon node. Three syslog targets would triple some of the work, and so on.
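
The fan-out described above can be illustrated with a minimal Python sketch. It is not how OneFS implements syslog export; the function name and the target list are hypothetical, and the point is only that each queued record costs one UDP datagram per configured target.

```python
import socket


def fan_out(sock: socket.socket, payload: bytes, targets: list) -> int:
    """Send one audit record to every syslog target.

    targets is a list of (host, port) tuples. Returns the number of
    datagrams sent, which grows linearly with the number of targets.
    """
    sent = 0
    for host, port in targets:
        sock.sendto(payload, (host, port))  # one datagram per target
        sent += 1
    return sent
```

With a UDP socket created as `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and three targets on port 514, a single call sends three datagrams for one record, which is the tripling of work mentioned above.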

In addition, pre-9.4 OneFS has no relevant tunable settings for syslog as it relates to Audit.

With CEE, you could reduce backlog by auditing less and/or by adding more target CEEs and more third-party vendor capacity beyond those CEEs.

With syslog, you could only reduce backlog by auditing less over time.

Resolution

Upgrade to OneFS 9.4 or later.

 

In optimal and controlled lab tests, syslog exporting for Audit with the OneFS 9.4 releases has shown performance improvements of approximately 300%.

Upgrading to OneFS 9.4 or later is likely to improve Audit syslog export performance.

Performance will vary based on the environment, deployment, design, networks, workflows, and the human behavior that generates Audit activity.
 

If, several weeks after upgrading, syslog export performance on OneFS 9.4 or later still cannot overcome a persistent or seemingly unworkable queue, contact Support for review.
The only way to reduce the number of audited events per second and per node is to adjust how and what is audited.

Where syslog exports are correctly "flowing" from the Isilon cluster, further optimization is outside the scope of Isilon Support. Options to consider:

 

  • Review the design of how you Audit with Dell system engineers. Consider how your end users (human or automated) get into the cluster in the first place to perform Audit logged activities.
  • Balance workflows and the volume of logged activities more equally across all Isilon nodes. Logged actions are recorded for Audit, and exported, only by the node that the user is connected to.
  • If the cluster has twenty nodes but only five accept protocol connections, then 100% of the Audit activity happens on those five nodes.
  • Consider reviewing the entire deployment to make sure that the total volume of protocol activity is distributed over as many physical nodes as possible. This spreads the total Audit workload over as many nodes as possible.
  • Consider a CEE-based solution as an alternative.
  • Consider reducing the volume and scope of what is Audited. Do not go far beyond organizational or regulatory requirements. 
  • Engage with Dell system engineering for further discussions.
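
The effect of spreading connections over more nodes, as suggested in the list above, can be estimated with a rough Python sketch. The function name and the cluster-wide figure of 10,000 events per second are hypothetical, and the sketch assumes that activity is spread evenly over the nodes that accept connections, which real workloads rarely achieve exactly.

```python
def per_node_rate(total_events_per_s: float, nodes_taking_connections: int) -> float:
    """Audit events per second each connected node must record and export,
    assuming an even spread across the nodes that accept connections."""
    if nodes_taking_connections <= 0:
        raise ValueError("at least one node must accept connections")
    return total_events_per_s / nodes_taking_connections


# Hypothetical cluster-wide audit rate, for illustration only.
total = 10_000
print(per_node_rate(total, 5))   # 2000.0 events/s on each of five nodes
print(per_node_rate(total, 20))  # 500.0 events/s on each of twenty nodes
```

Under these assumptions, moving from five to twenty connected nodes cuts the per-node ingest rate to a quarter, which directly lowers the export rate each node needs to sustain.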

Affected Products

PowerScale OneFS, PowerScale F200, PowerScale F600, PowerScale F900, PowerScale Hybrid H700, PowerScale Hybrid H7000
Article Properties
Article Number: 000212377
Article Type: Solution
Last Modified: 28 May 2023
Version:  2