December #SANchat Transcript – All about deduplication

 Posted on behalf of Alison Krause, who works in Dell’s Storage Product Group, Social Media & Communications.

We have been so excited about all the great new things Dell is doing with the acquisition of Ocarina. We held a SANchat last month to talk all about compression, and this month we continued the Ocarina conversation by talking about deduplication. It was a “part 2,” if you will. We had a great conversation!

As I did last month, I want to repeat my thanks to Mike Davis (@mike_davis) for joining us as our expert. Before diving into this chat’s transcript, if you missed the post about last month’s discussion (as well as a link to further explain SANchats), you can find that here.

Mike did a great job explaining the difference between compression and dedupe. He described it as, “Compression = using math to describe patterns. Dedupe = eliminating redundancy either across or within files. 2 diff implementations.” He also answered questions about examples of when to use each and why Dell used one over the other on some of our products. Read through the transcript below to see more!
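
Mike’s two definitions can be made concrete with a toy example. The Python sketch below is not Dell’s or Ocarina’s implementation. It assumes simple fixed-size 4 KiB chunks and SHA-256 fingerprints for dedupe (real products often use variable-size chunking instead), and it uses zlib’s DEFLATE as a stand-in for the “generic fast comp (LZ etc)” Mike mentions later in the chat. The function names are hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def dedupe(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Eliminate redundancy: store each unique chunk once.

    Returns (store, recipe): store maps a SHA-256 fingerprint to chunk bytes,
    and recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # a repeated chunk is not stored again
        recipe.append(fp)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Rebuild the original bytes by following the recipe of fingerprints."""
    return b"".join(store[fp] for fp in recipe)


def compress(data: bytes) -> bytes:
    """Describe patterns with math: generic LZ-family compression (DEFLATE)."""
    return zlib.compress(data)


if __name__ == "__main__":
    block = bytes(range(256)) * 16  # 4096 bytes of patterned data
    data = block * 10  # the same 4 KiB block repeated ten times

    store, recipe = dedupe(data)
    stored = sum(len(c) for c in store.values())
    print("original:", len(data), "deduped:", stored, "compressed:", len(compress(data)))
    assert rehydrate(store, recipe) == data
```

Dedupe only wins here because whole chunks repeat; compression also finds the byte-level pattern inside each chunk. That is why the two are complementary rather than interchangeable.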

You can find the full transcript below. Be sure to follow us on Twitter so that you stay up to date on the upcoming SANchats, and tweet us if you have any follow-up questions or comments! Join us in January as we talk about Dell Storage Forum 2012 London!

dell_storage #SANchat starts in 1 hour! talking all about deduplication today – come chime in!
NewFulcrumPoint RT @dell_storage: #SANchat starts in 1 hour! talking all about deduplication today – come chime in!
gminks We’re running a bit behind schedule this morning for our #dedupe  #SANchat
gminks I’m gonna blame it on the cold – it’s 32 degrees and everyone in Austin is frozen :O #SANchat
dell_storage It’s time for #SANchat! We recommend using tweetchat to join the conversation, this month we’re talkin dedupe! http://t.co/GBBzIa7v
iSCSIKing @gminks It is not supposed to be that cold in Texas, so yes we are all frozen #SANChat
LiemNguyen MT @dell_storage: #SANchat starts in 1 hour! talking all about deduplication today! cc @rogerlund <Rog can I get a follow? 🙂
mike_davis Ahhh maybe they dedupe’d the temperature reading in TX #SANchat
LiemNguyen RT @iSCSIKing: @gminks It is not supposed to be that cold in Texas, so yes we are all frozen #SANChat <<I’m ashamed to admit I agree.
iSCSIKing RT @mike_davis: Ahhh maybe they deduped the temperature reading in TX < I think so  #SANChat
gminks as we wait for the austinites to thaw 😉 …here’s last month’s transcript on compression: http://t.co/7V8eGkLy #SANchat
gminks hi @iscsiking @mike_davis @liemnguyen! you guys were here last month… any take aways #SANchat
NewFulcrumPoint RT @gminks: as we wait for the austinites to thaw 😉 …here’s last month’s transcript on compression: http://t.co/7V8eGkLy #SANchat
gminks or we can just jump into talking about #dedupe!  #SANchat
johnobeto Tuning in to #sanchat on dedupe. Shhh. I’m in learning mode
gminks We’re talking #dedupe this morning …. grab a coffee and join in #SANchat
mike_davis We’re here to talk dedupe. Last time was compression. Two different animals, implemented differently, using different sys resources. #SANchat
gminks @johnobeto nice to see you….anything in particular you want to learn this morning? #SANchat
mike_davis Hey @johnobeto, you’re a veteran at dedupe…from #techfieldday 2010 #SANchat
iSCSIKing @johnobeto Hey John, thanks for chatting with us this morning #SANChat
johnobeto @gminks Hello Gina. How dedupe can be driven down to SMBs cost-efficiently. #sanchat
gminks @mike_davis – so i know you are an expert on #dedupe and compression — what’s your background again? #SANchat
DennisMSmith Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/DtncDqbW
gminks actually – maybe everyone could introduce themselves. 🙂 #SANchat
gminks @DennisMSmith hey Dennis! welcome to the early morning edition of  #SANchat
DennisMSmith @gminks Thanks Gina!  Teach me all I need to know 🙂 #SANchat
iSCSIKing @gminks Hey Gina, this is Lance from the Dell TechCenter Storage team #SANChat
mike_davis Hi @gminks, I ran Marketing for Ocarina Networks (acq 17 mo ago). Implemented dedupe&compress concurrently in our products. #SANchat
gminks @mike_davis ok cool! so – lets start with 101 questions,…what’s the main diffs between compression and deduplication? #SANchat
johnobeto I learned my baby steps about dedupe from Ocarina…before they grabbed their pot of gold from Dell. They didn’t give me any 🙁 #sanchat
gminks I’m Gina from Dell Storage.  #SANchat
DennisMSmith I’m Dennis with the @DellTechCenter team #SANchat
gminks @johnobeto well @mike_davis is sharing the knowledge now, that’s worth more than gold right? 😉 #SANchat
mike_davis @johnobeto DD for SMB is definitely useful, although implementation needs low cost overhead = embedded in arrays. #SANchat
storageDiva good morning all! Sheryl from Dell here @mike_davis – is it fair to call dedup a type of compression? #SANchat
johnobeto @gminks @mike_davis On the surface. However, I’m a superficial guy. Gimmie the Latinum! #sanchat
mike_davis Compression = using math to describe patterns. Dedupe = eliminating redundancy either across or within files. 2 diff implementations #SANchat
johnobeto @mike_davis Very true. It also needs to be totally abstracted from the average SMB supt drone, easy to implement & use. #sanchat
gminks @storageDiva good morning and welcome to  #SANchat
hansdeleenheer every time you guys say good morning, I’m trying to go home. Every time you say good afternoon I’m trying to sleep. Damn techchats #sanchat
mike_davis @storagediva technically compression is different. But a less strict interpretation could include DD as a method of compression. #SANchat
gminks RT @mike_davis: Compression = using math to describe patterns. Dedupe = eliminating redundancy either across/within files<nice def! #SANchat
johnobeto @mike_davis The initial cost of an SMB dedupe soln can be accounted for. It just needs to be a total background device/process IMO #sanchat
gminks @hansdeleenheer sorry! What time is it, so I know what to say to you #SANchat
JeffSullivan RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/4zxE4rJm
iSCSIKing @hansdeleenheer well thanks for joining us today! #SANChat
mike_davis @johnobeto making sure it’s transparent is hard. All Dell implementations make it end-user xparent, but always a resource cost. #SANchat
DennisMSmith @hansdeleenheer so would it be good evening to you now 🙂 #SANchat
gminks @JeffSullivan hey Jeff.  #SANchat
mike_davis We’re very cautious about use-case. In backup we can anticipate different things than primary storage. data patterns, types, etc #SANchat
WarrenAtDell RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/Qn1WHEXh
hansdeleenheer what has the most impact on performance? Compression or Dedupe? (lets assume at block level in the SAN) #sanchat
cyndenabc RT @WarrenAtDell: RT @DennisMSmith: Grab your favorite morning beverage and join us for #SANChat as we talk about #dedupe http://t.co/Qn1WHEXh
calmo @mike_davis how about comparing dedupe vs. compression WRT recoverable space potential (real world)? #SANchat
mike_davis We need to tailor the design; in-band, post-proc, different levels of aggressiveness. In different products you will see differences #SANchat
hansdeleenheer Would you prefer Compression/Dedupe at source or at target? before or after initial write? #sanchat
mike_davis @hansdeleenheer both DD and Cmp have overhead. DD is mem IO intense, Cmp is CPU intense. So it depends on what’s available. #SANchat
gminks @cyndenabc @WarrenAtDell @calmo welcome to  #SANchat
storageDiva @mike_davis amen on use-case – it’s no good being the guy with the hammer – esp. when we can anticipate characteristics #SANchat
mike_davis Content aware compr will have much different resource overhead than generic fast comp (LZ etc). So we play those together also. #SANchat
gminks @hansdeleenheer wow we can tell its not early morning for you. Great questions!  #SANchat
gminks RT @mike_davis: Were very cautious about use-case. In backup we can anticipate different things than primary storage.  #SANchat
mike_davis @calmo in a typ backup workflow, just DD can deliver 90%+ savings. But apply the same alg to primary storage and maybe 40%. YMMV. #SANchat
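
Savings percentages like the ones in Mike’s answer are often quoted as data reduction ratios instead. Assuming the savings s is the fraction of the original capacity that no longer needs to be stored, the conversion is a one-line worked example:

```latex
R = \frac{1}{1 - s}, \qquad
s = 0.90 \;\Rightarrow\; R = 10{:}1, \qquad
s = 0.40 \;\Rightarrow\; R \approx 1.67{:}1
```

So the same algorithm that reaches roughly 10:1 on repetitive backup streams may land under 2:1 on primary storage, where far less data repeats.
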
hansdeleenheer is the impact on performance on rehydration same level as on initial write? #sanchat
johnobeto Does dedupe require you to tier your storage? If so, are there cost efficiencies in doing that? #sanchat
storageDiva RT @mike_davis: both DD and Cmp have overhead. DD is mem IO intense, Cmp is CPU intense. So it depends on what’s available. #SANchat
mike_davis In primary storage we’d like to have content awareness and policies. But that implies file-based (eg our DX). Block is tougher. #SANchat
hansdeleenheer @Gminks I can be a pain in the ass all day long. Ask the DellTechCenter people 🙂 #sanchat
gminks @hansdeleenheer oh so you are one of us.  #SANchat
mike_davis @hansdeleenheer we drive for asymmetry in performance. Take more time to shrink than to rehydrate. So read impact is minimized. #SANchat
JeffSullivan @hansdeleenheer Not at all!  #sanchat
gminks RT @storageDiva: @mike_davis amen its no good being the guy with the hammer – esp. when we can anticipate characteristics #SANchat
DennisMSmith @hansdeleenheer @Gminks haha, not at all.  You have great questions, we’re just here to make finding the answer easier 🙂 #SANchat
johnobeto @MBLeib Thanks, Matt. My goal is trying to find a sweet [financial] spot where tiering becomes fiscally prudent to implement #sanchat
iSCSIKing @hansdeleenheer You ask great questions, keeps us on our toes … #SANChat
mike_davis @johnobeto DD/Compr definitely a ‘virtual’ tiering in some cases (post proc). In backup less so. #SANchat
gminks @mike_davis why is block so much tougher? #SANchat
gminks hi @MBLeib – you joining us for  #SANchat
gminks RT @mike_davis: we drive for asymmetry in performance. Take more time to shrink than to rehydrate. So read impact is minimized. #SANchat
mike_davis @gminks Block is easy to implement; generic algorithm. But hard to be content aware; data is opaque in most cases. And less CPU/RAM! #SANchat
hansdeleenheer if Compression/Dedupe is such a great thing, why not implementing it in all storage solutions (at block level!) #sanchat
mike_davis @hansdeleenheer Anytime you alter the core data path of product, need to go slow and get it right. minimize overhead, maximize rel. #SANchat
gminks @mike_davis so there is content aware and non-content aware #dedupe (and compression??) #SANchat
storageDiva another Q for @mike_davis: why did we choose compres (vs. dedup) for the DX? #SANchat
johnobeto From a technical standpoint, is it possible to add a ‘dedupe processor/co-processor’ to storage to improve performance? #sanchat
gminks RT @Mike_Davis: @hansdeleenheer Anytime you alter the core data path of product, need to go slow and get it right. minimize overhead, maximize rel. #SANchat
hansdeleenheer great question Gina! why would I need content-aware block dedupe? that is what block is all about. #sanchat
mike_davis content-aware means recogniz data type and doing something differently with it (special alg). Applies more to compr, but DD as well #SANchat
mike_davis One feature we suppt is ‘object’ DD. If we see a JPG for ex in a stream, we will treat as 1 ‘chunk’. Improves performance. #SANchat
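
Mike’s “object” DD idea can be sketched the same way. This hypothetical Python sketch assumes object boundaries in the stream are already known, detects a JPEG only by its leading FF D8 FF bytes, and splits everything else into assumed 4 KiB fixed-size chunks; none of these details come from the chat or from Dell’s products. Treating the image as one chunk means one fingerprint and one index entry instead of several, which is one plausible reading of why it “improves performance.”

```python
import hashlib
from typing import Dict, Iterable, List

CHUNK_SIZE = 4096  # assumed chunk size for generic (non-image) data
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with the SOI marker FF D8, then FF


def is_jpeg(obj: bytes) -> bool:
    """Content awareness, reduced to a magic-byte check (an assumption)."""
    return obj.startswith(JPEG_MAGIC)


def chunk_object(obj: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """A JPEG becomes a single chunk; anything else is split into fixed chunks."""
    if is_jpeg(obj):
        # One fingerprint and one index entry for the whole image.
        return [obj]
    return [obj[i:i + chunk_size] for i in range(0, len(obj), chunk_size)]


def dedupe_stream(objects: Iterable[bytes]):
    """Dedupe a stream of objects; also count how many hashes were computed."""
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    hashes = 0
    for obj in objects:
        for chunk in chunk_object(obj):
            fp = hashlib.sha256(chunk).hexdigest()
            hashes += 1
            store.setdefault(fp, chunk)  # keep only the first copy of a chunk
            recipe.append(fp)
    return store, recipe, hashes
```

For example, `dedupe_stream([photo, document, photo])` would hash each copy of `photo` once as a whole, so the second copy costs a single lookup rather than one per 4 KiB slice.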
gminks RT @Mike_Davis: content-aware means recogniz data type and doing something differently with it (special alg). Applies more to compr, but DD as well #SANchat
hansdeleenheer @gminks It’s not! thats what I meant #SANchat
iSCSIKing @mike_davis which is better in-line or post-process #dedup ? or does it matter? #SANChat
MBLeib RT @MBLeib: @johnobeto The idea of adding a proc for the purpose of dedupe on the array is very cool, but I’ve yet to see it done #SANchat
johnobeto @MBLeib Seems like it would help in offloading the performance hit in dedupe #sanchat
mike_davis @johnobeto Co-proc is definitely something we consider. break out compute overh into sep box. Adds cost, but also perf, flexibility #SANchat
gminks @hansdeleenheer I’m confused.  #SANchat
mike_davis @MBLeib One interesting co-proc idea is GPU. Great at FP operations. Not so much for dedupe. #SANchat
hansdeleenheer @gminks I want dedupe to happen on block level so I don’t need content aware solutions #SANchat
MBLeib @johnobeto The idea is that most enterprise SANs have processors to spare, so unnecessary. @mike_davis makes a good point #sanchat
gminks @hansdeleenheer Ah, ok thx for spelling it out. #needanothercupofcoffee #SANchat
iSCSIKing RT @mike_davis: @johnobeto Co-proc is def something we consider. break compute overh in2 sep box. Adds $ but also perf, flexibility #SANchat
mike_davis @storageDiva chose aggress content aware compr for DX because it’s an archival workload. getting every GB out is P1. #SANchat
Justin_Lauer @MBLeib @johnobeto With multicore CPUs why would you need to dedicate a CPU to only dedup?  Should be native to array by now. #SANchat
MBLeib @Justin_Lauer We came from EMC, in Ent, the vMax has processor expandability up to 8, I believe, so no need to dedicate to dedupe #SANchat
mike_davis @Justin_Lauer If we have CPU+RAM surplus, can definitely lever that. But some use cases want to throttle/schedule. #SANchat
mike_davis Sometimes a SAN controller CPU is a lot more expen$ive than generic co-proc CPU  😉 #SANchat
gminks @Justin_Lauer good morning & welcome to  #SANchat
johnobeto @Justin_Lauer @MBLeib  Not necessarily a CPU. Maybe a core, ASIC or some dedicated silicon. Anything that nullifies the dedupe hit #sanchat
gminks RT @mike_davis: Sometimes a SAN controller CPU is a lot more expen$ive than generic co-proc CPU  😉 #SANchat
hansdeleenheer @Justin_Lauer also: if you can handle dedupe/compression by the server it doesn’t hit the network / connectivity of the SAN #sanchat
Justin_Lauer @johnobeto @MBLeib I guess what I’m saying is hardware is more than powerful enough. Make it native to the filesystem #sanchat
MBLeib RT @Justin_Lauer: @johnobeto @MBLeib Make it native to the filesystem>> Well said, Justin. Completely agreed. #sanchat
mike_davis @Justin_Lauer @johnobeto Agreed, sys resources are more than enough, esp in SMB environment. SQL cares about xaction latency though #SANchat
hansdeleenheer @Justin_Lauer native to filesystem = MS has this integrated in Win8! #sanchat
johnobeto @Justin_Lauer @MBLeib Good. That allows me to segue into this: forthcoming Windows Server 8 has rudimentary dedupe built into it…#sanchat
hansdeleenheer @Justin_Lauer MS has this integrated in Win8! … but I guess you weren’t waiting for this answer 🙂 #sanchat
gminks RT @hansdeleenheer: @Justin_Lauer MS has this integrated in Win8! … but I guess you werent waiting for this answer 🙂 <haha #SANchat
rogerlund RT @LiemNguyen: MT @dell_storage: #SANchat starts in 1 hour! talking all about deduplication today! cc @rogerlund <Rog can I get a follow? 🙂
hansdeleenheer @Mike > this brings us to: what will be the impact of MS dedupe over SAN dedupe? #sanchat
Justin_Lauer @johnobeto Win8 is great, but that is only one OS.  Large Enterprise run all sorts of stuff on VMware.  Dedup at array solves that. #sanchat
gminks @rogerlund hi Roger – you joining for the last five minutes of  #SANchat
mike_davis The interesting part of Win8 is that it is data path flexibility. Will be good to bring data reduction back to exchange envir #SANchat
mike_davis @hansdeleenheer data red at host, in SAN, in file sys, in archive, and in backup will all work together…complementary. #SANchat
hansdeleenheer @mike_davis will it lead to dedupe on dedupe on dedupe or will the end result be the same? #sanchat
mike_davis @Justin_Lauer dedupe in array can solve, but helps to be content aware (VMDK). Other solutions can exist in Hypervisor or agent #SANchat
gminks hey @ISCSIking are these the sorts of questions @hansdeleenheer normally asks? LOL #SANchat
calmo Interesting question Justin. Googling “filesystem dedupe” fills out my reading list for the next week. #SANchat
mike_davis @hansdeleenheer keep in mind moving data in deduped form delivers big benefits too. #SANchat
gminks We’re getting close to the end of our hour, thanks @mike_davis for joining us again to have a #dedupe discussion on  #SANchat
Justin_Lauer @mike_davis Ahhh!  Dedupe and context aware!  Now you are talking my language and it sounds like @TintriInc  #SANchat
johnobeto RT @Mike_Davis: The interesting part of Win8 is that it is data path flexibility. Will be good to bring data reduction back to exchange envir #SANchat
iSCSIKing @gminks yes @hansdeleenheer always has great questions #SANChat
johnobeto @mike_davis Agree. Anything that improves Exchange is “A Good Thing” #SANchat
gminks @mike_davis so where will you be in the next couple of months? maybe we can continue this discussion on a future  #SANchat
hansdeleenheer RT @mike_davis: …moving data in deduped form delivers big benefits too. > Yep – bandwith reduction for example! #SANchat
johnobeto @gminks @mike_davis We’ve got to #SANchato this again. Love Ocarina & these SANchats. Thanks to all y’all & @LiemNguyen Cheers #sanchat
mike_davis @gminks I spend my time half in Sunnyvale, half in Austin, half on Southwest. #SANchat
hansdeleenheer Goodnight all! #sanchat
mike_davis @hansdeleenheer and backup window, and restore time, and replication synchronicity #SANchat
gminks Ok everyone, time to close it out. If you didn’t introduce yourself, now’s a good time. Also plz let us know what you’re working on #SANchat
gminks I’m working on the Dell Storage Forum…maybe we can do this live from London next month! #dellsf12 #SANchat
iSCSIKing @hansdeleenheer good night Hans! Thanks for joining #SANChat
gminks @hansdeleenheer good night hans thank you for joining and making it so lively today #SANchat
iSCSIKing Be sure to join us next week for the TechChat – Tuesday at 3 PM #SANChat
gminks RT @iSCSIKing: Be sure to join us next week for the TechChat – Tuesday at 3 PM <thanks for joining today #SANchat
storageDiva thanks @gminks @mike_davis et al – that was a great chat today #SANchat
equallogic We will have the #SANchat transcript posted soon, will tweet the link once it’s ready!
gminks RT @johnobeto: @gminks @mike_davis We’ve got to #SANchato this again. Love Ocarina & these SANchats. Thanks to all y’all & @LiemNguyen Cheers #sanchat
gminks RT @storagediva: thanks @gminks @mike_davis et al – that was a great chat today #SANchat <thx for joining ms. diva!
johnobeto @mletschin @MBLeib Hey Mike. Missed this tweet during the #sanchat. Yes, we should

About the Author: Gina Rosenthal