scott_owens
60 Posts
0
March 3rd, 2015 20:00
For Audit, OneFS has an audit queue service (isi_audit_d) which handles the writing and rotating of logs in /ifs/.ifsvar/audit/logs.
For forwarding to CEE (isi_audit_cee), the workflow is:
1. Get the next item from the queue, and encode it into the required CEE format.
2. Check Heartbeat of CEE server
3. Forward the event to CEE via HTTP
4. Receive notification that the event was successfully received in HTTP response codes
5. The CEE Server forwards the event to the defined endpoint
6. Update the pointer to the next record in the audit logs
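The six steps above can be sketched as a loop that advances the audit-log pointer only after a successful delivery. This is an illustrative simplification, not the actual OneFS code: the event list, pointer, and injected `send()` callback are all hypothetical stand-ins for the audit queue and the heartbeat/HTTP machinery.

```python
# Hypothetical sketch of the isi_audit_cee forwarding loop described above.
# Names and the send() callback are illustrative, not the actual OneFS code.

def forward_events(events, pointer, send):
    """Forward events starting at `pointer`; advance the pointer only after
    a successful delivery acknowledgement (steps 1-6 above, simplified)."""
    while pointer < len(events):
        encoded = {"cee_event": events[pointer]}  # step 1: encode for CEE
        ok = send(encoded)                        # steps 2-4: heartbeat, PUT, ack
        if not ok:
            break       # CEE unreachable: stop and retry later from `pointer`
        pointer += 1    # step 6: advance only on success
    return pointer
```

The key design point, per the discussion below, is that the pointer persists across outages, so forwarding can resume from the last acknowledged event.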
dynamox
9 Legend
•
20.4K Posts
1
March 3rd, 2015 06:00
We are using Varonis, and these questions came up while we were discussing HA and patching of the CEPA servers, Varonis collectors, and Varonis DB.
4. Isilon sends those events to CEPA; we configured two so that it fails over to the other if one becomes unavailable. In our testing we proved multiple times that the built-in CEPA failover in Isilon is not reliable and will miss audit events. We had to put an F5 VIP in front to ensure faster failover if one CEPA server is down.
5. With Varonis, the next step is that CEPA sends events to the collectors. If a collector is not available, CEPA will buffer a certain number of events and then start discarding them. You can tweak that buffer in the registry, but it is not that large, and it also depends on how many events are coming in.
6. In the Varonis case, the collectors send data to the Varonis application, which simply accepts them; there is no ACK back to Isilon ("hey buddy, I got it"). And there is no way to reconcile missed events after a CEPA/collector outage.
gtjones1
1 Rookie
•
10 Posts
0
March 4th, 2015 05:00
Thanks, very helpful.
Regarding your 4th point: if the CEPA server is down and Isilon can't forward events, it will simply pick up where it left off once the CEPA server is back up. Unless I need to see events in Varonis in near real time, I don't care if my CEPA server is down for an hour or two. See scott_owens's reply to this thread, particularly point 4.
It seems to me my only concern is when the collector is down. If it's down, CEE continues to forward, but without an ACK from Varonis there's no guarantee and events are lost; however, there does seem to be a delivery guarantee in the Isilon-to-CEE communication.
Am I understanding this right?
dynamox
9 Legend
•
20.4K Posts
1
March 4th, 2015 07:00
Not in my experience: while the CEPA server is down, those events will not be resent once CEPA is back online. They are in the Isilon logs, but they will never make it to your Varonis infrastructure. So as far as Varonis is concerned, those events never happened.
scott_owens
60 Posts
1
March 4th, 2015 08:00
If CEE is there, but the partner app is not, then CEE returns an error code stating "cepp_not_found", indicating that the partner app is not there.
If CEE is down, OneFS gets no response other than an HTTP error.
In either case OneFS will try another CEE in the pool; if that one errors as well, OneFS will replay the events later when a CEE is back online.
In addition, we added commands in OneFS 7.2 that allow you to move the pointer in the audit logs to a different point in time:
By setting --cee-log-time or --syslog-log-time, you can change the point in time from which events are forwarded.
Example: The following will update the pointer to forward events newer than Nov 19, 2014 at 2pm
isi audit settings modify --cee-log-time "Protocol@2014-11-19 14:00:00"
isi audit settings modify --syslog-log-time "Protocol@2014-11-19 14:00:00"
The primary use case for --cee-log-time and/or --syslog-log-time is a scenario where auditing on the cluster was configured and enabled before CEE and/or syslog forwarding was set up; in that case the cluster will attempt to forward all events from the time auditing was configured. You would use the above commands to move the pointer forward to a more recent date.
dynamox
9 Legend
•
20.4K Posts
0
March 4th, 2015 08:00
Is that new to 7.2? I have tested this extensively with Varonis engineers online, and if both CEE servers were offline, the events were lost (as far as the application was concerned). As a matter of fact, we lost events while Isilon was trying to figure out that one CEE server was offline and was transitioning to the next CEE server.
dynamox
9 Legend
•
20.4K Posts
0
March 5th, 2015 11:00
I am on OneFS 7.1.0.6, so maybe there is some buffering in 7.2?
The way I tested was to create a loop (Perl/PowerShell/VBScript) that creates a text file every second with an incrementing name, i.e. 1.txt, 2.txt, 3.txt. With this loop running, try your tests again. After that, do the same thing from multiple Windows machines (create files like 1-1.txt from one machine and keep incrementing: 1-2.txt, 1-3.txt, etc.). That way it's easier to look in Varonis and see if you missed any events.
gtjones1
1 Rookie
•
10 Posts
0
March 5th, 2015 11:00
I did the following test with Isilon 7.2 and CAVA 6.5.
1. Shut down CEE
2. Create six files in an audited Isilon directory
3. Start CEE
The files showed up in Varonis after a few minutes. I have the details of the run if you want to see them.
I also ran a second test where I shut down the probe but left CEE up, and again Varonis was able to capture the six events. I'm sure the six events did not overfill any buffers on the CEE server, so it's probably not the best test.
Greg
prod_stockage
1 Message
0
January 15th, 2017 22:00
Hi all,
I need assistance with Varonis and CEE configuration. We have just one Varonis server (release 5.8), and the Isilon side is release 7.2.1.2.
How do I check the configuration? The documentation is very poor on this topic. Is it possible to detail this (line by line):
1. Get the next item from the queue, and encode it into the required CEE format.
2. Check Heartbeat of CEE server
3. Forward the event to CEE via HTTP
4. Receive notification that the event was successfully received in HTTP response codes
5. The CEE Server forwards the event to the defined endpoint
6. Update the pointer to the next record in the audit logs
Thanks in advance.
scott_owens
60 Posts
2
January 17th, 2017 14:00
1. Get the next item from the queue, and encode it into the required CEE format.
We take the next entry in the Protocol Audit Log and encode the entry into the format required to be processed by the CEE server.
2. Check Heartbeat of CEE server
The heartbeat task makes CEE servers available for audit event delivery. Only after a CEE server has received a successful heartbeat will audit events be delivered to it.
Every 10 seconds the heartbeat task wakes up and sends each CEE server in the configuration a heartbeat.
It starts at the LNN (logical node number) offset into the list of configured CEE servers, so that different nodes will likely not hit the same CEE servers first, unless their LNN indexes overlap.
If the heartbeat was successful the CEE server is marked available and any CELOG events are resolved. If the heartbeat was unsuccessful and the CEE server doesn't have an active CELOG alert, a new CELOG event is raised and added to the server's context.
Each CEE server with a successful heartbeat is appended to a list, and once all CEE servers have been iterated, that list becomes the set of available servers.
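The LNN-offset ordering described above can be illustrated with a short sketch. This is not the OneFS implementation; the function names and the injected `is_alive()` check are hypothetical.

```python
# Illustrative sketch of the heartbeat ordering: each node starts iterating
# the configured CEE servers at an offset equal to its LNN, so nodes tend to
# heartbeat different servers first.

def heartbeat_order(cee_servers, lnn):
    """Return the order in which this node heartbeats the CEE servers."""
    n = len(cee_servers)
    return [cee_servers[(lnn + i) % n] for i in range(n)]

def run_heartbeats(cee_servers, lnn, is_alive):
    """Mark servers available only after a successful heartbeat."""
    available = []
    for server in heartbeat_order(cee_servers, lnn):
        if is_alive(server):
            available.append(server)  # success: any CELOG alert would be resolved
        # else: a CELOG event would be raised for this server
    return available
```

With three servers, node LNN 1 heartbeats cee2 first while node LNN 2 starts at cee3, which is what spreads heartbeat (and delivery) load across the pool.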
3. Forward the event to CEE via HTTP
If the heartbeat is successful, we forward the event to CEE using an HTTP PUT request.
4. Receive notification that the event was successfully received in HTTP response codes
We receive a response back from CEE indicating that the event forwarded in step 3 was successfully received.
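Steps 3 and 4 amount to issuing the PUT and treating only a success status as delivery confirmation. A minimal sketch, assuming a placeholder URL and JSON payload shape (the real CEE wire format differs), with the opener injectable for testing:

```python
# Forward one encoded event over HTTP PUT and treat only a 2xx response as
# the step-4 acknowledgement. URL path and payload shape are placeholders.
import json
import urllib.request

def put_event(url, encoded_event, opener=urllib.request.urlopen):
    req = urllib.request.Request(
        url,
        data=json.dumps(encoded_event).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",  # the workflow above says forwarding uses HTTP PUT
    )
    try:
        with opener(req) as resp:
            return 200 <= resp.status < 300  # step 4: ack via status code
    except OSError:
        return False  # CEE down: no response at all, only a transport error
```

A `False` result here is what triggers the try-another-CEE / replay-later behavior described earlier in the thread.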
5. The CEE Server forwards the event to the defined endpoint
The message forwarded in step 3 is sent on to whichever endpoint is defined on the CEE server. Some of the possible endpoints are:
Varonis DatAdvantage
Symantec Data Insight
STEALTHbits StealthAUDIT
Dell Change Auditor for EMC
6. Update the pointer to the next record in the audit logs
We move to the next entry in the audit log so that message can be forwarded. In some of the newer versions of OneFS, you can see the current position in the audit logs using the 'isi_audit_progress' command. The command is not available in your release, 7.2.1.2.
Here is a sample of isi_audit_progress.
For the output of a single node:
The last consumed event time is the last event that was successfully forwarded to CEE, while the last logged event time is the most recent audit entry captured in the local logs. As an event is successfully forwarded, the last consumed event time advances to the next event that was consumed.
tme-sandbox-4# isi_audit_progress -t protocol CEE_FWD
Last consumed event time: '2016-10-21 19:59:54'
Last logged event time: '2017-01-17 17:05:32'
To look at the output for every node, use isi_for_array:
tme-sandbox-4# isi_for_array "isi_audit_progress -t protocol CEE_FWD"
tme-sandbox-4: Last consumed event time: '2016-10-21 19:59:54'
tme-sandbox-4: Last logged event time: '2017-01-17 17:05:32'
tme-sandbox-6: Last consumed event time: '2016-10-21 19:59:54'
tme-sandbox-6: Last logged event time: '2017-01-12 20:25:10'
tme-sandbox-5: Last consumed event time: '2016-07-22 18:26:25'
tme-sandbox-5: Last logged event time: '2017-01-12 21:50:17'
Brian_Coulombe_
1 Rookie
•
107 Posts
0
May 10th, 2017 09:00
Thanks Scott. I was poking around trying to find commands to figure out if CEE is actually forwarding data (and this helps). Do you have any commands that might show the outbound CEE packet size?