Posted on behalf of Alison Krause, who works in Dell’s Storage Product Group, Social Media & Communications.
We have been so excited about all the great new things Dell is doing with the acquisition of Ocarina. We held a SANchat last month to talk all about compression and this month we continued the Ocarina conversation by talking about deduplication. It was a “part 2,” if you will. We had a great conversation!
As I did last month, I want to repeat my thanks to Mike Davis (@mike_davis) for joining us as our expert. Before diving into this chat’s transcript: if you missed the post about last month’s discussion, which also links to an explanation of SANchats, you can find that here.
Mike did a great job explaining the difference between compression and dedupe. He described it as, “Compression = using math to describe patterns. Dedupe = eliminating redundancy either across or within files. 2 diff implementations.” He also answered questions about examples of when to use each and why Dell used one over the other on some of our products. Read through the transcript below to see more!
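To make Mike’s distinction concrete, here is a minimal Python sketch. It is an illustration only, not how Ocarina or any Dell product implements either technique. The fixed 4 KB chunk size, the use of SHA-256, and the names `compress`, `dedupe`, `rehydrate`, and `CHUNK_SIZE` are all assumptions made for the example. Compression re-encodes the patterns inside one stream; dedupe stores each distinct chunk once and keeps a per-file list of references.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


def compress(data: bytes) -> bytes:
    """Compression: describe the patterns inside one byte stream more compactly."""
    return zlib.compress(data)


def dedupe(files, chunk_size=CHUNK_SIZE):
    """Dedupe: keep each distinct chunk once, across and within files."""
    store = {}    # chunk hash -> chunk bytes (each unique chunk stored once)
    recipes = []  # per file: ordered chunk hashes needed to rebuild it
    for data in files:
        recipe = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # a repeated chunk costs no new space
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def rehydrate(store, recipe) -> bytes:
    """Rebuild a file from its recipe by looking up each chunk in the store."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Three identical "backups" of the same data: dedupe stores the chunks once.
    backup = bytes(range(256)) * 64
    store, recipes = dedupe([backup, backup, backup])
    print("raw bytes:", 3 * len(backup))
    print("bytes in chunk store:", sum(len(c) for c in store.values()))
    print("one copy compressed:", len(compress(backup)))
```

Note how the two costs differ in the sketch: `compress` spends CPU on encoding, while `dedupe` depends on a lookup table of chunk hashes that has to be kept and consulted for every chunk.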
The full transcript follows. Be sure to follow us on Twitter so that you stay up to date on upcoming SANchats, and tweet us if you have any follow-up questions or comments! Join us in January as we talk about Dell Storage Forum 2012 London!
dell_storage | #SANchat starts in 1 hour! talking all about deduplication today – come chime in! |
NewFulcrumPoint | RT @dell_storage: #SANchat starts in 1 hour! talking all about deduplication today – come chime in! |
gminks | We’re running a bit behind schedule this morning for our #dedupe #SANchat |
gminks | I’m gonna blame it on the cold – it’s 32 degrees and everyone in Austin is frozen :O #SANchat |
dell_storage | It’s time for #SANchat! We recommend using tweetchat to join the conversation, this month we’re talkin dedupe! http://t.co/GBBzIa7v |
iSCSIKing | @gminks It is not supposed to be that cold in Texas, so yes we are all frozen #SANChat |
LiemNguyen | MT @dell_storage: #SANchat starts in 1 hour! talking all about deduplication today ! cc @rogerlund <Rog can I get a follow? 🙂 |
mike_davis | Ahhh maybe they dedupe’d the temperature reading in TX #SANchat |
LiemNguyen | RT @iSCSIKing: @gminks It is not supposed to be that cold in Texas, so yes we are all frozen #SANChat <<I’m ashamed to admit I agree. |
iSCSIKing | RT @mike_davis: Ahhh maybe they deduped the temperature reading in TX < I think so #SANChat |
gminks | as we wait for the austinites to thaw 😉 …here’s last month’s transcript on compression: http://t.co/7V8eGkLy #SANchat |
gminks | hi @iscsiking @mike_davis @liemnguyen! you guys were here last month… any take aways #SANchat |
NewFulcrumPoint | RT @gminks: as we wait for the austinites to thaw 😉 …here’s last month’s transcript on compression: http://t.co/7V8eGkLy #SANchat |
gminks | or we can just jump into talking about #dedupe! #SANchat |
johnobeto | Tuning in to #sanchat on dedupe. Shhh. I’m in learning mode |
gminks | We’re talking #dedupe this morning …. grab a coffee and join in #SANchat |
mike_davis | We’re here to talk dedupe. Last time was compression. Two different animals, implemented differently,using different sys resources. #SANchat |
gminks | @johnobeto nice to see you….anything in particular you want to learn this morning? #SANchat |
mike_davis | Hey @johnobeto, you’re a veteran at dedupe…from #techfieldday 2010 #SANchat |
iSCSIKing | @johnobeto Hey John, thanks for chatting with us this morning #SANChat |
johnobeto | @gminks Hello Gina. How dedupe can be driven down to SMBs cost-efficiently. #sanchat |
gminks | @mike_davis – so i know you are an expert on #dedupe and compression — what’s your background again? #SANchat |
DennisMSmith | Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/DtncDqbW |
gminks | actually – maybe everyone could introduce themselves. 🙂 #SANchat |
gminks | @DennisMSmith hey Dennis! welcome to the early morning edition of #SANchat |
DennisMSmith | @gminks Thanks Gina! Teach me all I need to know 🙂 #SANchat |
iSCSIKing | @gminks Hey Gina, this is Lance from the Dell TechCenter Storage team #SANChat |
mike_davis | Hi @gminks, I ran Marketing for Ocarina Networks (acq 17 mo ago). Implemented dedupe&compress concurrently in our products. #SANchat |
gminks | @mike_davis ok cool! so – lets start with 101 questions,…what’s the main diffs between compression and deduplication? #SANchat |
johnobeto | I learned my baby steps about dedupe from Ocarina…before they grabbed their pot of gold from Dell. They didn’t give me any 🙁 #sanchat |
gminks | I’m Gina from Dell Storage. #SANchat |
DennisMSmith | I’m Dennis with the @DellTechCenter team #SANchat |
gminks | @johnobeto well @mike_davis is sharing the knowledge now, that’s worth more than gold right? 😉 #SANchat |
mike_davis | @johnobeto DD for SMB is definitely useful, although implementation needs low cost overhead = embedded in arrays. #SANchat |
storageDiva | good morning all! Sheryl from Dell here @mike_davis – is it fair to call dedup a type of compression? #SANchat |
johnobeto | @gminks @mike_davis On the surface. However, I’m a superficial guy. Gimmie the Latinum! #sanchat |
mike_davis | Compression = using math to describe patterns. Dedupe = eliminating redundancy either across or within files. 2 diff implementations #SANchat |
johnobeto | @mike_davis Very true. It also needs to be totally abstracted from the average SMB supt drone, easy to implement & use. #sanchat |
gminks | @storageDiva good morning and welcome to #SANchat |
hansdeleenheer | everytime you guys say goodmorning, I’m trying to go home. Everytime you say good afternoon I’m trying to sleep. Damn’ techchats #sanchat |
mike_davis | @storagediva technically compression is different. But a less strict interpretation could include DD as a method of compression. #SANchat |
gminks | RT @mike_davis: Compression = using math to describe patterns. Dedupe = eliminating redundancy either across/within files<nice def! #SANchat |
johnobeto | @mike_davis The initial cost of an SMB dedupe soln can be accounted for. It just needs to be a total background device/process IMO #sanchat |
gminks | @hansdeleenheer sorry! What time is it, so I know what to say to you #SANchat |
JeffSullivan | RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/4zxE4rJm |
iSCSIKing | @hansdeleenheer well thanks for joining us today! #SANChat |
mike_davis | @johnobeto making sure it’s transparent is hard. All Dell implementations make it end-user xparent, but always a resource cost. #SANchat |
DennisMSmith | @hansdeleenheer so would it be good evening to you now 🙂 #SANchat |
gminks | @JeffSullivan hey Jeff. #SANchat |
mike_davis | We’re very cautious about use-case. In backup we can anticipate different things than primary storage. data patterns, types, etc #SANchat |
WarrenAtDell | RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/Qn1WHEXh |
hansdeleenheer | what has the most impact on performance? Compression or Dedupe? (lets assume at block level in the SAN) #sanchat |
cyndenabc | RT @WarrenAtDell: RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/Qn1WHEXh |
calmo | @mike_davis how about comparing dedupe vs. compression WRT recoverable space potential (real world)? #SANchat #SANchat |
mike_davis | We need to tailor the design; in-band,post-proc, different levels of aggressiveness. In different products you will see differences #SANchat |
hansdeleenheer | Would you prefer Compression/Dedupe at source or at target? before or after initial write? #sanchat |
mike_davis | @hansdeleenheer both DD and Cmp have overhead. DD is mem IO intense, Cmp is CPU intense. So it depends on what’s available. #SANchat |
gminks | @cyndenabc @WarrenAtDell @calmo welcome to #SANchat |
storageDiva | @mike_davis amen on use-case – it’s no good being the guy with the hammer – esp. when we can anticipate characteristics #SANchat |
mike_davis | Content aware compr will have much different resource overhead than generic fast comp (LZ etc). So we play those together also. #SANchat |
gminks | @hansdeleenheer wow we can tell its not early morning for you. Great questions! #SANchat |
gminks | RT @mike_davis: Were very cautious about use-case. In backup we can anticipate different things than primary storage. #SANchat |
mike_davis | @calmo in a typ backup workflow, just DD can deliver 90%+ savings. But apply the same alg to primary storage and maybe 40%. YMMV. #SANchat |
hansdeleenheer | is the impact on performance on rehydration same level as on initial write? #sanchat |
johnobeto | Does dedupe require you to tier your storage? If so, are there cost efficiencies in doing that? #sanchat |
storageDiva | RT @mike_davis: both DD and Cmp have overhead. DD is mem IO intense, Cmp is CPU intense. So it depends on what’s available. #SANchat |
mike_davis | In primary storage we’d like to have content awareness and policies. But that implies file-based (eg our DX). Block is tougher. #SANchat |
hansdeleenheer | @Gminks I can be a pain in the ass all day long. Ask the DellTechCenter people 🙂 #sanchat |
gminks | @hansdeleenheer oh so you are one of us. #SANchat |
mike_davis | @hansdeleenheer we drive for asymmetry in performance. Take more time to shrink than to rehydrate. So read impact is minimized. #SANchat |
JeffSullivan | @hansdeleenheer Not at all! #sanchat |
gminks | RT @storageDiva: @mike_davis amen its no good being the guy with the hammer – esp. when we can anticipate characteristics #SANchat |
DennisMSmith | @hansdeleenheer @Gminks haha, not at all. You have great questions, we’re just here to make finding the answer easier 🙂 #SANchat |
johnobeto | @MBLeib Thanks, Matt. My goal is trying to find a sweet [financial] spot where tiering becomes fiscally prudent to implement #sanchat |
iSCSIKing | @hansdeleenheer You ask great questions, keeps us on our toes … #SANChat |
mike_davis | @johnobeto DD/Compr definitely a ‘virtual’ tiering in some cases (post proc). In backup less so. #SANchat |
gminks | @mike_davis why is block so much tougher? #SANchat |
gminks | hi @MBLeib – you joining us for #SANchat |
gminks | RT @mike_davis: we drive for asymmetry in performance. Take more time to shrink than to rehydrate. So read impact is minimized. #SANchat |
mike_davis | @gminks Block is easy to implement; generic algorithm. But hard to be content aware; data is opaque in most cases. And less CPU/RAM! #SANchat |
hansdeleenheer | if Compression/Dedupe is such a great thing, why not implementing it in all storage solutions (at block level!) #sanchat |
mike_davis | @hansdeleenheer Anytime you alter the core data path of product, need to go slow and get it right. minimize overhead, maximize rel. #SANchat |
gminks | @mike_davis so there is content aware and non-content aware #dedupe (and compression??) #SANchat |
storageDiva | another Q for @mike_davis: why did we choose compres (vs. dedup) for the DX? #SANchat |
johnobeto | From a technical standpoint, is it possible to add a ‘dedupe processor/co-processor’ to storage to improve performance? #sanchat |
gminks | RT @Mike_Davis: @hansdeleenheer Anytime you alter the core data path of product, need to go slow and get it right. minimize overhead, maximize rel. #SANchat |
hansdeleenheer | great question Gina! why would I need content-aware block dedupe? that is what block is all about. #sanchat |
mike_davis | content-aware means recogniz data type and doing something differently with it (special alg). Applies more to compr, but DD as well #SANchat |
mike_davis | One feature we suppt is ‘object’ DD. If we see a JPG for ex in a stream, we will treat as 1 ‘chunk’. Improves performance. #SANchat |
gminks | RT @Mike_Davis: content-aware means recogniz data type and doing something differently with it (special alg). Applies more to compr, but DD as well #SANchat |
hansdeleenheer | @gminks It’s not! thats what I meant #SANchat |
iSCSIKing | @mike_davis which is better in-line or post-process #dedup ? or does it matter? #SANChat |
MBLeib | RT @MBLeib: @johnobeto The idea of adding a proc to the purpose of dedupe on the array is very cool, but I’ve yet to see it done #SANchat |
johnobeto | @MBLeib Seems like it would help in offloading the performance hit in dedupe #sanchat |
mike_davis | @johnobeto Co-proc is definitely something we consider. break out compute overh into sep box. Adds cost, but also perf, flexibility #SANchat |
gminks | @hansdeleenheer I’m confused. #SANchat |
mike_davis | @MBLeib One interesting co-proc idea is GPU. Great at FP operations. Not so much for dedupe. #SANchat |
hansdeleenheer | @gminks I want dedupe to happen on block level so I don’t need content aware solutions #SANchat |
MBLeib | @johnobeto The idea is that most enterprise San’s have processor to spare, so unnecessary. @mike_davis makes a good point #sanchat |
gminks | @hansdeleenheer aH.ok thx for spelling it out. #needanothercupofcoffee #SANchat |
iSCSIKing | RT @mike_davis: @johnobeto Co-proc is def something we consider. break compute overh in2 sep box. Adds $ but also perf, flexibility #SANchat |
mike_davis | @storageDiva chose aggress content aware compr for DX because it’s an archival workload. getting every GB out is P1. #SANchat |
Justin_Lauer | @MBLeib @johnobeto With multicore CPUs why would you need to dedicate a CPU to only dedup? Should be native to array by now. #SANchat |
MBLeib | @Justin_Lauer We came from EMC, in Ent, the vMax has processor expandability up to 8, I believe, so no need to dedicate to dedupe #SANchat |
mike_davis | @Justin_Lauer If we have CPU+RAM surplus, can definitely lever that. But some use cases want to throttle/schedule. #SANchat |
mike_davis | Sometimes a SAN controller CPU is a lot more expen$ive than generic co-proc CPU 😉 #SANchat |
gminks | @Justin_Lauer good morning & welcome to #SANchat |
johnobeto | @Justin_Lauer @MBLeib Not necessarily a CPU. Maybe a core, ASIC or some dedicated silicon. Anything that nullifies the dedupe hit #sanchat |
gminks | RT @mike_davis: Sometimes a SAN controller CPU is a lot more expen$ive than generic co-proc CPU 😉 #SANchat |
hansdeleenheer | @Justin_Lauer also: if you can handle dedupe/compression by the server it doesn’t hit the network / connectivity of the SAN #sanchat |
Justin_Lauer | @johnobeto @MBLeib I guess what i’m saying is hardware more than powerful enough. Make it native to the filesystem #sanchat |
MBLeib | RT @Justin_Lauer: @johnobeto @MBLeib Make it native to the filesystem>> Well said, Justin. Completely agreed. #sanchat |
mike_davis | @Justin_Lauer @johnobeto Agreed, sys resources are more than enough, esp in SMB environment. SQL cares about xaction latency though #SANchat |
hansdeleenheer | @Justin_Lauer native to filesystem = MS has this integrated in Win8! #sanchat |
johnobeto | @Justin_Lauer @MBLeib Good. That allows me to segue into this: forthcoming Windows Server 8 has rudimentary dedupe built into it…#sanchat |
hansdeleenheer | @Justin_Lauer MS has this integrated in Win8! … but I guess you weren’t waiting for this answer 🙂 #sanchat |
gminks | RT @hansdeleenheer: @Justin_Lauer MS has this integrated in Win8! … but I guess you werent waiting for this answer 🙂 <haha #SANchat |
rogerlund | RT @LiemNguyen: MT @dell_storage: #SANchat starts in 1 hour! talking all about deduplication today ! cc @rogerlund <Rog can I get a follow? 🙂 |
hansdeleenheer | @Mike > this brings u to: what will be the impact of MS dedupe over SAN dedupe? #sanchat |
Justin_Lauer | @johnobeto Win8 is great, but that is only one OS. Large Enterprise run all sorts of stuff on VMware. Dedup at array solves that. #sanchat |
gminks | @rogerlund hi Roger – you joining for the last five minutes of #SANchat |
mike_davis | The interesting part of Win8 is that it is data path flexibility. Will be good to bring data reduction back to exchange envir #SANchat |
mike_davis | @hansdeleenheer data red at host, in SAN, in file sys, in archive, and in backup will all work together…complementary. #SANchat |
hansdeleenheer | @mike_davis will it lead to dedupe on dedupe on dedupe or will the end result be the same? #sanchat |
mike_davis | @Justin_Lauer dedupe in array can solve, but helps to be content aware (VMDK). Other solutions can exist in Hypervisor or agent #SANchat |
gminks | hey @ISCSIking are these the sorts of questions @hansdeleenheer normally asks? LOL #SANchat |
calmo | Interesting question Justin. Googling “filesystem dedupe” fills out my reading list for the next week. #SANchat |
mike_davis | @hansdeleenheer keep in mind moving data in deduped form delivers big benefits too. #SANchat |
gminks | We’re getting close to the end of our hour, thanks @mike_davis for joining us again to have a #dedupe discussion on #SANchat |
Justin_Lauer | @mike_davis Ahhh! Dedupe and context aware! Now you are talking my language and it sounds like @TintriInc #SANchat |
johnobeto | RT @Mike_Davis: The interesting part of Win8 is that it is data path flexibility. Will be good to bring data reduction back to exchange envir #SANchat |
iSCSIKing | @gminks yes @hansdeleenheer always has great questions #SANChat |
johnobeto | @mike_davis Agree. Anything that improves Exchange is “A Good Thing” #SANchat |
gminks | @mike_davis so where will you be in the next couple of months? maybe we can continue this discussion on a future #SANchat |
hansdeleenheer | RT @mike_davis: …moving data in deduped form delivers big benefits too. > Yep – bandwidth reduction for example! #SANchat |
johnobeto | @gminks @mike_davis We’ve got to #SANchato this again. Love Ocarina & these SANchats. Thanks to all y’all & @LiemNguyen Cheers #sanchat |
mike_davis | @gminks I spend my time half in Sunnyvale, half in Austin, half on Southwest. #SANchat |
hansdeleenheer | Goodnight all! #sanchat |
mike_davis | @hansdeleenheer and backup window, and restore time, and replication synchronicity #SANchat |
gminks | Ok everyone, time to close it out. If you didn’t introduce yourself, now’s a good time. Also plz let us know what you’re working on #SANchat |
gminks | I’m working on the Dell Storage Forum…maybe we can do this live from London next month! #dellsf12 #SANchat |
iSCSIKing | @hansdeleenheer good night Hans! Thanks for joining #SANChat |
gminks | @hansdeleenheer good night hans thank you for joining and making it so lively today #SANchat |
iSCSIKing | Be sure to join us next week for the TechChat – Tuesday at 3 PM #SANChat |
gminks | RT @iSCSIKing: Be sure to join us next week for the TechChat – Tuesday at 3 PM <thanks for joining today #SANchat |
storageDiva | thanks @gminks @mike_davis et al – that was a great chat today #SANchat |
equallogic | We will have the #SANchat transcript posted soon, will tweet the link once it’s ready! |
gminks | RT @johnobeto: @gminks @mike_davis We’ve got to #SANchato this again. Love Ocarina & these SANchats. Thanks to all y’all & @LiemNguyen Cheers #sanchat |
gminks | RT @storagediva: thanks @gminks @mike_davis et al – that was a great chat today #SANchat <thx for joining ms. diva! |
johnobeto | @mletschin @MBLeib Hey Mike. Missed this tweet during the #sanchat. Yes, we should |