scott_owens
60 Posts
0
March 3rd, 2015 20:00
For Audit, OneFS has an audit queue service (isi_audit_d) which handles the writing and rotating of logs in /ifs/.ifsvar/audit/logs.
For forwarding to CEE (isi_audit_cee), the workflow is:
1. Get the next item from the queue, and encode it into the required CEE format.
2. Check Heartbeat of CEE server
3. Forward the event to CEE via HTTP
4. Receive notification that the event was successfully received in HTTP response codes
5. The CEE Server forwards the event to the defined endpoint
6. Update the pointer to the next record in the audit logs
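The six steps above can be sketched as a loop that advances the audit-log pointer only after a successful delivery. This is an illustrative simplification, not the actual OneFS code: the event list, pointer, and injected `send()` callback are all hypothetical stand-ins for the audit queue and the heartbeat/HTTP machinery.

```python
# Hypothetical sketch of the isi_audit_cee forwarding loop described above.
# Names and the send() callback are illustrative, not the actual OneFS code.

def forward_events(events, pointer, send):
    """Forward events starting at `pointer`; advance the pointer only after
    a successful delivery acknowledgement (steps 1-6 above, simplified)."""
    while pointer < len(events):
        encoded = {"cee_event": events[pointer]}  # step 1: encode for CEE
        ok = send(encoded)                        # steps 2-4: heartbeat, PUT, ack
        if not ok:
            break       # CEE unreachable: stop and retry later from `pointer`
        pointer += 1    # step 6: advance only on success
    return pointer
```

The key design point, per the discussion below, is that the pointer persists across outages, so forwarding can resume from the last acknowledged event.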
dynamox
9 Legend
•
20.4K Posts
1
March 3rd, 2015 06:00
We are using Varonis, and these questions came up while we were discussing HA and patching of the CEPA servers, Varonis collectors, and Varonis DB.
4. Isilon sends those events to CEPA; we configured two so that it fails over to the other if one becomes unavailable. In our testing we proved multiple times that the built-in CEPA failover in Isilon is not reliable and will miss audit events. We had to put an F5 VIP in front to ensure faster failover if one CEPA server is down.
5. With Varonis, the next step is that CEPA sends events to the collectors. If a collector is not available, CEPA will buffer a certain number of events and then start discarding them. You can tweak that buffer in the registry, but it is not that large, and it also depends on how many events are coming in.
6. In the Varonis case, the collectors send data to the Varonis application, which simply accepts them; there is no ACK back to Isilon ("hey buddy, I got it"). And there is no way to reconcile missed events after a CEPA/collector outage.
gtjones1
1 Rookie
•
10 Posts
0
March 4th, 2015 05:00
Thanks, very helpful.
Regarding your 4th point: if the CEPA server is down and Isilon can't forward events, it will simply pick up where it left off once the CEPA server is back up. Unless I need to see events in Varonis in near real time, I don't care if my CEPA server is down for an hour or two. See scott_owens's reply to this thread, particularly point 4.
It seems to me my only concern is when the collector is down. If it's down, CEE continues to forward, but without an ACK from Varonis there's no guarantee and events are lost; however, there does seem to be a delivery guarantee in the Isilon-to-CEE communication.
Am I understanding this right?
dynamox
9 Legend
•
20.4K Posts
1
March 4th, 2015 07:00
Not in my experience: while the CEPA server is down, those events will not be resent once CEPA is back online. They are in the Isilon logs, but they will never make it to your Varonis infrastructure. So as far as Varonis is concerned, those events never happened.
scott_owens
60 Posts
1
March 4th, 2015 08:00
If CEE is there, but the partner app is not, then CEE returns an error code stating "cepp_not_found", indicating that the partner app is not there.
If CEE is down, OneFS gets no response other than an HTTP error.
In either case OneFS will try another CEE in the pool; if that one errors as well, OneFS will replay the events later when a CEE is back online.
In addition, we added commands in OneFS 7.2 that allow you to move the pointer in the audit logs to a different point in time:
By setting --cee-log-time or --syslog-log-time, you can change the point in time from which events are forwarded.
Example: The following will update the pointer to forward events newer than Nov 19, 2014 at 2pm
isi audit settings modify --cee-log-time "Protocol@2014-11-19 14:00:00"
isi audit settings modify --syslog-log-time "Protocol@2014-11-19 14:00:00"
The primary use case for --cee-log-time and/or --syslog-log-time is a scenario where auditing on the cluster was configured and enabled before CEE and/or syslog forwarding was set up; in that case the cluster will attempt to forward all events from the time auditing was configured. You would use the above commands to move the pointer forward to a more recent date.
dynamox
9 Legend
•
20.4K Posts
0
March 4th, 2015 08:00
Is that new to 7.2? I have tested this extensively with Varonis engineers online, and if both CEE servers were offline, the events were lost (as far as the application was concerned). As a matter of fact, we lost events while Isilon was trying to figure out that one CEE server was offline and was transitioning to the next CEE server.
dynamox
9 Legend
•
20.4K Posts
0
March 5th, 2015 11:00
I am on OneFS 7.1.0.6, so maybe there is some buffering in 7.2?
The way I tested was to create a loop (Perl/PowerShell/VBScript) that creates a text file every second with an incrementing name, i.e. 1.txt, 2.txt, 3.txt. With this loop running, try your tests again. After that, do the same thing from multiple Windows machines (create files like 1-1.txt from one machine and keep incrementing: 1-2.txt, 1-3.txt, etc.). That way it's easier to look in Varonis and see if you missed any events.
gtjones1
1 Rookie
•
10 Posts
0
March 5th, 2015 11:00
I did the following test with Isilon 7.2 and CAVA 6.5.
1. Shut down CEE
2. Create six files in an audited Isilon directory
3. Start CEE
The files showed up in Varonis after a few minutes. I have the details of the run if you want to see them.
I also ran a second test where I shut down the probe but left CEE up, and again Varonis was able to capture the six events. I'm sure the six events did not overfill any buffers on the CEE server, so it's probably not the best test.
Greg
prod_stockage
1 Message
0
January 15th, 2017 22:00
Hi all,
I need assistance with Varonis and CEE configuration. We have just one Varonis server (release 5.8), and the Isilon side is release 7.2.1.2.
How do I check the configuration? The documentation is very poor on this topic. Is it possible to detail this (line by line):
1. Get the next item from the queue, and encode it into the required CEE format.
2. Check Heartbeat of CEE server
3. Forward the event to CEE via HTTP
4. Receive notification that the event was successfully received in HTTP response codes
5. The CEE Server forwards the event to the defined endpoint
6. Update the pointer to the next record in the audit logs
Thanks in advance.
scott_owens
60 Posts
2
January 17th, 2017 14:00
1. Get the next item from the queue, and encode it into the required CEE format.
We take the next entry in the Protocol Audit Log and encode the entry into the format required to be processed by the CEE server.
2. Check Heartbeat of CEE server
The heartbeat task makes CEE servers available for audit event delivery. Only after a CEE server has received a successful heartbeat will audit events be delivered to it.
Every 10 seconds the heartbeat task wakes up and sends each CEE server in the configuration a heartbeat.
It starts at the LNN (logical node number) offset into the list of configured CEE servers, so that different nodes will likely not hit the same CEE servers first, unless their LNN indexes overlap.
If the heartbeat was successful the CEE server is marked available and any CELOG events are resolved. If the heartbeat was unsuccessful and the CEE server doesn't have an active CELOG alert, a new CELOG event is raised and added to the server's context.
Each CEE server with a successful heartbeat is appended to a list, and once all CEE servers have been iterated, that list becomes the set of available servers.
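The LNN-offset ordering described above can be illustrated with a short sketch. This is not the OneFS implementation; the function names and the injected `is_alive()` check are hypothetical.

```python
# Illustrative sketch of the heartbeat ordering: each node starts iterating
# the configured CEE servers at an offset equal to its LNN, so nodes tend to
# heartbeat different servers first.

def heartbeat_order(cee_servers, lnn):
    """Return the order in which this node heartbeats the CEE servers."""
    n = len(cee_servers)
    return [cee_servers[(lnn + i) % n] for i in range(n)]

def run_heartbeats(cee_servers, lnn, is_alive):
    """Mark servers available only after a successful heartbeat."""
    available = []
    for server in heartbeat_order(cee_servers, lnn):
        if is_alive(server):
            available.append(server)  # success: any CELOG alert would be resolved
        # else: a CELOG event would be raised for this server
    return available
```

With three servers, node LNN 1 heartbeats cee2 first while node LNN 2 starts at cee3, which is what spreads heartbeat (and delivery) load across the pool.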
3. Forward the event to CEE via HTTP
If the heartbeat is successful, we forward the event to CEE using an HTTP PUT request.
4. Receive notification that the event was successfully received in HTTP response codes
We receive a response back from CEE indicating that the event forwarded in step 3 was successfully received.
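Steps 3 and 4 amount to issuing the PUT and treating only a success status as delivery confirmation. A minimal sketch, assuming a placeholder URL and JSON payload shape (the real CEE wire format differs), with the opener injectable for testing:

```python
# Forward one encoded event over HTTP PUT and treat only a 2xx response as
# the step-4 acknowledgement. URL path and payload shape are placeholders.
import json
import urllib.request

def put_event(url, encoded_event, opener=urllib.request.urlopen):
    req = urllib.request.Request(
        url,
        data=json.dumps(encoded_event).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",  # the workflow above says forwarding uses HTTP PUT
    )
    try:
        with opener(req) as resp:
            return 200 <= resp.status < 300  # step 4: ack via status code
    except OSError:
        return False  # CEE down: no response at all, only a transport error
```

A `False` result here is what triggers the try-another-CEE / replay-later behavior described earlier in the thread.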
5. The CEE Server forwards the event to the defined endpoint
The message forwarded in step 3 is sent on to whichever endpoint is defined on the CEE server. Some of the possible endpoints are:
Varonis DatAdvantage
Symantec Data Insight
STEALTHbits StealthAUDIT
Dell Change Auditor for EMC
6. Update the pointer to the next record in the audit logs
We move to the next entry in the audit log so that message can be forwarded. In some of the newer versions of OneFS, you can see the current position in the audit logs using the 'isi_audit_progress' command. The command is not available in your release, 7.2.1.2.
Here is a sample of isi_audit_progress.
For the output of a single node:
The last consumed event time is the last event that was successfully forwarded to CEE, while the last logged event time is the most recent audit entry captured in the local logs. As an event is successfully forwarded, the last consumed event time advances to the next event that was consumed.
tme-sandbox-4# isi_audit_progress -t protocol CEE_FWD
Last consumed event time: '2016-10-21 19:59:54'
Last logged event time: '2017-01-17 17:05:32'
To look at the output for every node, use isi_for_array:
tme-sandbox-4# isi_for_array "isi_audit_progress -t protocol CEE_FWD"
tme-sandbox-4: Last consumed event time: '2016-10-21 19:59:54'
tme-sandbox-4: Last logged event time: '2017-01-17 17:05:32'
tme-sandbox-6: Last consumed event time: '2016-10-21 19:59:54'
tme-sandbox-6: Last logged event time: '2017-01-12 20:25:10'
tme-sandbox-5: Last consumed event time: '2016-07-22 18:26:25'
tme-sandbox-5: Last logged event time: '2017-01-12 21:50:17'
Brian_Coulombe_
1 Rookie
•
107 Posts
0
May 10th, 2017 09:00
Thanks Scott. I was poking around trying to find commands to figure out if CEE is actually forwarding data (and this helps). Do you have any commands that might show the outbound CEE packet size?