November 25th, 2013 11:00

Data Hoarding and Dark Data


I read a lot about Big Data these days, from how cops are using it to predict crime patterns to how the WHO is tracking the spread of epidemics to better provide resources and care to the sick. There is a flood of articles about Big Data, followed by another flood of reports. It was last year's #2 word of the year from Time Magazine. And I've always wondered--all of this data is created, but where is it stored?

Wired recently published an article called Is the Internet of Things creating Data Hoarders? and this particular piece caught my eye:

Take for example, “dark data,” which is typically files, presentations, reports, images, and emails that are stockpiled but not analyzed, used or otherwise monetized. Dark data not only consumes valuable storage, network and management resources, it increases liability unless proactively purged using pre-established criteria for defensible deletion.

I have to admit, I'm guilty of hoarding some forms of dark data. I keep every email notification from this community, for example, just to make sure I have a record of the things that community members have said, requested, or shared. Do I need all these emails? Probably not, since I can just search the content list for the community. Will I delete these emails? Probably not, because "just in case."

I'm curious to know if any of you are data hoarders, or are working on projects that can help prevent hoarding of useless data. The article suggests creating automated processes for deleting data that doesn't meet certain criteria. Has anyone set up such a process?
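To make the question concrete, here is a minimal sketch of what such an automated process might look like. It assumes the only criterion is file age (last-modified time), which is my simplification and not something the Wired article prescribes. The function names `find_stale_files` and `purge` are hypothetical, and a real policy would need legal and retention rules on top of this.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_stale_files(root, max_age_days, now=None):
    """Return files under root last modified more than max_age_days ago."""
    # Assumption: age is the only purge criterion in this sketch.
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge(root, max_age_days, dry_run=True, now=None):
    """Delete stale files; with dry_run=True, only report what would be deleted."""
    stale = find_stale_files(root, max_age_days, now=now)
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale
```

Defaulting to `dry_run=True` means nothing is deleted until someone has reviewed the list, which seems like a sensible safeguard for those of us in the "just in case" camp.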

633 Posts

November 25th, 2013 12:00

It's almost like confronting the problem of paper storage--remember the push for the paper-free office? The papers piled up and outgrew filing cabinets, so we all moved to email, PDFs, and document sharing. Now we have the same problem. The idea of a Zero-Email Network is really intriguing. I can imagine someone arguing that I should delete all my old email notifications from the community, but what about the important emails I get about projects or requests for information? Would the policy nudge me toward filing all that info into a Word doc or spreadsheet?

Is there anyone else out there whose employer is trying something like this?

November 25th, 2013 12:00

Hoarding is here to stay. I have yet to meet a user (and that ranges from your grandma to your favourite IT nerd) who doesn't keep a copy of something "just in case". No matter how many systems you set up to purge mails, docs, etc., the user will always outsmart you and find a way to keep a copy of their files. It's a lost battle.

My employer (Atos) is working on and publicizing a "Zero Email Network" to reduce the number of internal emails in organizations, and we are the guinea pigs. Guess what: everyone keeps copies of the messages posted in the network instead of keeping copies of the emails. I don't know if it will reduce the number of emails sent; what I am sure of is that we won't be saving hard disk space.

14.3K Posts

February 7th, 2014 00:00

Preventing hoarding of useless data is - useless. The notion of useless data is wrong, because to its owner the data is useful - otherwise the owner would have removed it in the first place. You mentioned email, so let me go back to that one - I keep a record of every email I have ever received, even those that were sent to me by mistake. Whatever I do, I send an acknowledgement of my action to the parties involved, so this is a kind of daily log in case I ever need it (and strangely enough, I do sometimes need to refresh some managers' memories). I keep it because of its potential future use and because I can. If I couldn't, I wouldn't. But I can, so I do - initially I stored this on my backup disks, but with the introduction of a company-wide vault service it is all done for me, along with index management. And on the back end it is all deduplicated, and perhaps managed in an even more clever way, as the solution integrates with the Exchange server we run. So we do not see this as dark data at all (neither I nor the company). I think dark data is only data that sits there like a stowaway, without anyone being aware of it, but I see less and less of it over the years, thanks to all kinds of indexing and analytics in file systems, filers, applications and the like, which integrate with the data while also trying to build a tier-ready approach for end users and organisations.
