2 Bronze

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Do we have to deal with redo logs differently than with archive logs?

When storage is assigned to a database server, right now the database administrators put the archive logs on a RAID volume... When I go looking for the redo logs, I find them on a RAID 5 volume, together with the database itself.

I never paid attention to it. I assumed our database admins knew what they were doing... 😉

6 Indium

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Exactly my thought, but with Oracle this might be different; I don't know. Either way, it changes nothing about the fact that measuring your actual needs is always best.

7 Thorium

Re: Ask the Expert: Performance Calculations on Clariion/VNX

bartdonders wrote:

Do we have to deal with redo logs differently than with archive logs?

When storage is assigned to a database server, right now the database administrators put the archive logs on a RAID volume... When I go looking for the redo logs, I find them on a RAID 5 volume, together with the database itself.

I never paid attention to it. I assumed our database admins knew what they were doing... 😉

We separate database, redo and archive volumes, not only from a performance perspective but also from a recovery perspective. If the LUN that contains both your data files and your redo logs becomes corrupt, you have just lost transactions. You can restore from tape and then roll forward using archive logs, but they won't get you as close to the point of corruption as the redo logs could if they had not been destroyed.

3 Argentum

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Bart...

I think this white paper might be of help to you in your search for Oracle answers:

http://www.emc.com/collateral/hardware/white-papers/h8242-deploying-oracle-vnx-wp.pdf

2 Bronze

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Thanks for the document...

I will share this with our database administrators as well.

2 Bronze

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Hi Jon. The answer to your question of why RAID 1/0 for logs and not RAID 5 is this: for the server to perform a full-stripe write on the RAID group disks, the write size from the OS/host should match the stripe size on the storage side (I do not know the exact term; disk alignment might play a part too), so that each write fills exactly one stripe. Otherwise it will create 4 writes for 1 write (you know, the write penalty), as it needs to update the parity separately... So it will again put more IOPS on the back end and performance will take a hit... Not sure I am clear with the explanation, but I did find something understandable in an ISM book I had...

7 Thorium

Re: Ask the Expert: Performance Calculations on Clariion/VNX

swadeey123 wrote:

Hi Jon. The answer to your question of why RAID 1/0 for logs and not RAID 5 is this: for the server to perform a full-stripe write on the RAID group disks, the write size from the OS/host should match the stripe size on the storage side (I do not know the exact term; disk alignment might play a part too), so that each write fills exactly one stripe. Otherwise it will create 4 writes for 1 write (you know, the write penalty), as it needs to update the parity separately... So it will again put more IOPS on the back end and performance will take a hit... Not sure I am clear with the explanation, but I did find something understandable in an ISM book I had...

Not exactly: in RAID 5, unless it's a full-stripe write, a single host write results in 4 back-end I/O operations (not all 4 of them writes). Considering there is only a 2-I/O penalty in RAID 1/0, it's probably a good choice for redo logs. Data files could be serviced by a RAID 5 pool if you have enough spindles, so your strategy could be to place the database data files in a pool and keep your redo logs on traditional RAID 1/0 RAID groups.
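To put numbers on this, here is a minimal Python sketch of the penalty arithmetic (the penalty table and function names are my own illustration, not anything out of Unisphere):

```python
# Back-end I/Os per host write, worst case (small random writes):
# RAID 1/0 costs 2 (one write per mirror leg); RAID 5 costs 4
# (read old data, read old parity, write new data, write new parity).
WRITE_PENALTY = {"raid10": 2, "raid5": 4}

def backend_write_iops(host_write_iops: int, raid_type: str) -> int:
    """Back-end I/Os generated by a given host write load."""
    return host_write_iops * WRITE_PENALTY[raid_type]

print(backend_write_iops(300, "raid5"))   # 1200 back-end I/Os
print(backend_write_iops(300, "raid10"))  # 600 back-end I/Os
```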

2 Bronze

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Yep, not all 4 will be writes: two reads (one data and one parity) and two writes (one data and one parity), so the total comes out to four. Thanks for your help in getting this clarified, as I really meant the same thing. So once again you came to the rescue... 🙂

3 Zinc

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Let’s build on Rob’s post with some performance troubleshooting. For example, a server administrator has asked us to create a LUN that can handle 1000 IOps at an R:W ratio of 70:30. We calculated that he would end up with 700 read IOps and 300 write IOps from the server perspective. Since we only have RAID 5 at our disposal, we calculated that the back-end write IOps would not be 300 but in fact 300 x 4 = 1200 IOps. Remember, this is an “all random I/O”, worst-case calculation. Real life will probably include some sequential write I/O, lowering the write penalty a bit. But since making assumptions can cost you dearly: if you don’t have facts, assume the worst.

So Rob designed a LUN that needs to handle 700 + 4 x 300 = 1900 IOps. Calculating with 15k FC/SAS drives, he ended up with 10.55 drives. Since EMC doesn’t sell parts of a drive, he built an 11-disk RAID 5 group, created the LUN and allocated it to the server. Job done!
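For anyone who wants to redo the math, a small sketch follows. The ~180 IOPS per 15k FC/SAS drive is an assumed rule of thumb (the thread doesn't state the figure Rob used), but it lands close to the 10.55 drives mentioned above:

```python
import math

read_iops, write_iops = 700, 300   # 1000 host IOps at a 70:30 R:W ratio
raid5_write_penalty = 4            # worst case, all random writes
iops_per_15k_drive = 180           # assumed rule of thumb

backend_iops = read_iops + write_iops * raid5_write_penalty
drives_needed = backend_iops / iops_per_15k_drive

print(backend_iops)              # 1900
print(round(drives_needed, 2))   # 10.56 -- close to the 10.55 above
print(math.ceil(drives_needed))  # 11, since EMC doesn't sell partial drives
```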

Or is it? A couple of weeks later, the server admin comes back:

“my customers are complaining about the performance of the application. It’s the storage, fix it please!”

So how do you continue?

Customers (or server admins, for that matter) usually don’t care about utilization, throughput, R:W ratios, block sizes and the lot. They want only one thing: low response times. Think of yourself: opening Google, you only want the page to be there quickly. You don’t care how many CPU cycles it takes, whether it’s coming from cache or whether it’s efficient on the back-end.

So I always start by looking at response times. If I can, I prefer to start on the server end, using perfmon or an equivalent. This gives me the best view from the customer perspective and allows me to look at the storage as a whole (SAN switches included). And, not unimportantly, it allows me to check the assumption the server admin made: if I see <10 ms response times to the storage at all times, my money is on a different server component being the bottleneck.

So let’s assume the server does show response times of 100 ms. OK, it’s a storage bottleneck. Make a note of the other perfmon data you have at your disposal: the server will also show you how many writes and/or reads it’s sending. This may come in handy during the following steps, or might give you some clues about where to look. For example, if you know your RAID group can deliver 1900 IOps, you know it’s dedicated to this server, you verified that the R:W ratio is in the neighborhood of 70:30 and you only see 1000 IOps going towards the storage, my money is not on the disks or the RAID group. The bottleneck is probably somewhere else: storage processor utilization, cache configuration, an overloaded SAN ISL, etc. On the other hand, if you see the LUN requesting 5000 IOps all the time, then with your knowledge of the disk setup you can be fairly certain the disks aren’t keeping up.
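As a hedged illustration of that sanity check (the function and figures are my own, built from the example above):

```python
# Translate an observed front-end load into worst-case back-end IOPS and
# compare it against what the RAID group was actually sized for.
def backend_load(read_iops: int, write_iops: int, write_penalty: int = 4) -> int:
    return read_iops + write_iops * write_penalty

DESIGNED_BACKEND_IOPS = 1900  # the 11-disk RAID 5 group from the example

for reads, writes in ((700, 300), (3500, 1500)):  # 1000 vs 5000 IOps at 70:30
    load = backend_load(reads, writes)
    if load <= DESIGNED_BACKEND_IOPS:
        print(f"{reads + writes} IOps -> {load} back-end: disks fine, look elsewhere")
    else:
        print(f"{reads + writes} IOps -> {load} back-end: disks likely saturated")
```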

So after perfmon on the server side, I usually jump straight into Unisphere Analyzer, skipping the SAN switches. I haven’t run into too many SAN bottlenecks yet, so I save myself the time. Again, I start with response times for the LUN. If the server is seeing 100 ms response times and the storage is reporting 20 ms response times, I know I have to double back to the SAN for the 80 ms gap. If the storage is also reporting 100 ms response times, I know I can focus there. From there, I start at the storage processors and drill my way down to the disks, checking metrics such as utilization, throughput and queue lengths. Don’t forget to look at the LUN/RAID group configuration as well: maybe at some point someone disabled the write cache!
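A toy version of that gap logic, with the threshold as my own assumption:

```python
# If the host sees much higher latency than the array reports for the same
# LUN, the difference points at the path in between (host HBA, SAN switches);
# if the two roughly agree, the array itself deserves the attention.
def where_to_look(host_ms: float, array_ms: float, gap_threshold_ms: float = 10) -> str:
    gap = host_ms - array_ms
    if gap > gap_threshold_ms:
        return f"{gap:.0f} ms unaccounted for: double back to the SAN/host path"
    return "host and array roughly agree: drill into the SPs, cache and disks"

print(where_to_look(100, 20))   # 80 ms gap -> look at the SAN
print(where_to_look(100, 98))   # the array itself is slow
```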

At this point I think there’s no single flow of troubleshooting: it depends entirely on what you find, and even more on how you “connect the dots”. For example, in the previous case where the server was experiencing high response times but wasn’t using that many IOps, we also saw slowdowns across approximately 50% of the environment, and the problem was apparent on two storage systems at the same time. Checking some servers, we quickly came to the conclusion that the problem occurred only on LUNs attached to storage processor A. We used the VMware Infrastructure Client for this, which, combined with a consistent VMFS datastore naming convention, made connecting the dots easy.

It turned out that a DBA was restoring a database to a virtual machine, doing so at the impressive speed of 300+ MB/s. The downside was that this insane amount of write throughput completely overloaded the storage processor, causing all other LUNs on that SP to slow down. The reason the other storage system also experienced a slowdown was that both systems were mirroring to each other, causing writes on that system to slow down as well.

So what is the most common troubleshooting you have to do? What do you find easy or hard, or would you recommend to other storage troubleshooters? Which tools do you use? Let us know!

4 Beryllium

Re: Ask the Expert: Performance Calculations on Clariion/VNX

Thank you for the detailed responses, Rob and Jon. Rob provided the theoretical part, whereas Jon provided the scenario, which helped in understanding the storage admin's requirements and gave a flowchart of how to narrow down the cause! (connecting the dots)

So what is the most common troubleshooting you have to do? What do you find easy or hard, or would you recommend to other storage troubleshooters? Which tools do you use? Let us know

- When it comes to troubleshooting, there are many things we take into account, right from the physical layer up to the application. (I cannot share the detailed procedure.)

- I am still new to all this, but I try my level best to provide the best recommendations per EMC best practices.

- Some proprietary tools & Unisphere Analyzer.
