I think the math is not working out in your favor.  Let's assume that you're going to be able to take 5Gbps of backup traffic per node, all day long.  And that's really, really stretching it.  That's 25GB per second across the cluster.  That's 90TB per hour or about 2PB per day.  I don't think there's any chance you'll get that traffic out of our cluster all day long and certainly not out of a cluster full of NL400 nodes.  You don't want that amount of traffic on your primary cluster, along with the replication traffic, or you'll never get any real work done.
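
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The node count is not stated in the post; 40 nodes is an assumption inferred from 5Gbps per node adding up to 25GB per second, and decimal units are assumed throughout.

```python
# Back-of-the-envelope check of the aggregate backup throughput.
# ASSUMPTION: 40 nodes (inferred from 5 Gbps/node -> 25 GB/s), decimal units.
GBPS_PER_NODE = 5   # gigabits per second per node, from the post
NODES = 40          # assumed, not stated in the post

gbytes_per_sec = GBPS_PER_NODE / 8 * NODES   # bits -> bytes, summed over nodes
tb_per_hour = gbytes_per_sec * 3600 / 1000   # GB/s -> TB/hour
pb_per_day = tb_per_hour * 24 / 1000         # TB/hour -> PB/day

print(f"{gbytes_per_sec:.0f} GB/s, {tb_per_hour:.0f} TB/hour, {pb_per_day:.2f} PB/day")
# prints: 25 GB/s, 90 TB/hour, 2.16 PB/day
```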

Getting 300TB of replication traffic out of your cluster onto your NLs is hard enough.  Pushing that much data to tape is going to be impossible (and expensive to build out).  The only shot I think you've got is to do backups to disk and continuous incrementals and even that first copy is going to be pretty ugly.

If you do get it working, I'd really like to see what you've built.

A couple more thoughts.

300TB/day is 5TB/day/node or 70MByte/s/node.
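
A quick sketch of that per-node calculation, in case it helps. The 60-node count is an assumption implied by 300TB/day working out to 5TB/day/node; decimal units and an even spread over 24 hours are assumed as well.

```python
# Per-node rate needed to move 300 TB/day off the cluster.
# ASSUMPTIONS: 60 nodes (implied by 300 TB/day -> 5 TB/day/node),
# decimal units, traffic spread evenly over a full 24 hours.
TB_PER_DAY = 300
NODES = 60
SECONDS_PER_DAY = 86_400

tb_per_node_per_day = TB_PER_DAY / NODES
mb_per_sec_per_node = tb_per_node_per_day * 1_000_000 / SECONDS_PER_DAY

print(f"{tb_per_node_per_day:.0f} TB/day/node, {mb_per_sec_per_node:.0f} MB/s/node")
# prints: 5 TB/day/node, 58 MB/s/node
```

Averaged over a full day this comes to about 58MByte/s/node, so the 70MByte/s figure leaves some margin (it corresponds to a backup window of roughly 20 hours per day).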

Yes, the X nodes need to spare a bit of headroom, but only a bit, from their HPC load to sustain this.

If those 70MByte/s/node arrive at the NL nodes, these should be able to pass it on to the tapes, throughput-wise. If you can keep enough tape drives busy in parallel (at 100-200+ MByte/s/drive), that is. It depends on the directory layout(!) and typical file size, of course.

10GE on NL nodes will be fine, and easier to manage than FC and backup accelerators.

Quite a few tape drives will be needed, but hey, you scale out the storage, so you scale out the backup infrastructure...
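
To put a rough number on "quite a few", here is a sketch that divides the aggregate rate by the per-drive streaming rates mentioned above. It assumes every drive streams continuously at its rated speed, 24 hours a day, with no allowance for tape changes, positioning, or small-file slowdowns, so treat the result as a lower bound.

```python
import math

# Lower bound on tape drives needed to absorb 300 TB/day.
# ASSUMPTIONS: decimal units, drives streaming nonstop at the given rate.
TB_PER_DAY = 300
aggregate_mb_per_sec = TB_PER_DAY * 1_000_000 / 86_400   # about 3472 MB/s

def drives_needed(mb_per_sec_per_drive: float) -> int:
    """Smallest whole number of drives that covers the aggregate rate."""
    return math.ceil(aggregate_mb_per_sec / mb_per_sec_per_drive)

for rate in (100, 200):
    print(f"{rate} MB/s/drive -> at least {drives_needed(rate)} drives")
# prints:
# 100 MB/s/drive -> at least 35 drives
# 200 MB/s/drive -> at least 18 drives
```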

300TB/day and max 90 days retention will result in up to 27PB -- yes, there are tape libraries that large available.

300TB/day change rate on a 3PB cluster -- that's 10%/day. I like that, because it indicates a good chance of having mostly short-lived data on the cluster.
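
The retention and change-rate figures can be checked the same way. This is only a sketch with decimal units assumed; the 27PB is an upper bound that holds only if every day's 300TB is kept for the full 90 days.

```python
# Upper bound on retained backup volume, and the daily change rate.
# ASSUMPTION: decimal units (1 PB = 1000 TB).
TB_PER_DAY = 300
RETENTION_DAYS = 90
CLUSTER_PB = 3

max_retained_pb = TB_PER_DAY * RETENTION_DAYS / 1000
daily_change_pct = TB_PER_DAY / (CLUSTER_PB * 1000) * 100

print(f"up to {max_retained_pb:.0f} PB retained, {daily_change_pct:.0f}% change per day")
# prints: up to 27 PB retained, 10% change per day
```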

If such data can be isolated for a dedicated backup strategy, then consider doing incremental-forever (NDMP level 10) runs with a fixed retention interval(-!-). This is an uncommon method with NDMP because one would lose "persistent" data from the backup, but for "scratch" data in controlled workflows (which you might have) this works.
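
Why persistent data drops out of such a scheme can be shown with a toy model. This is not how any particular NDMP product tracks things; it only assumes that each file is captured by the incremental that follows its last write, and that each incremental expires after the retention interval. File names and day numbers are hypothetical.

```python
# Toy model of incremental-forever backups with a fixed retention interval.
# A file stays restorable only while the incremental that captured its last
# write is still retained, because no full backup ever re-captures it.
RETENTION_DAYS = 90  # retention interval discussed above

def restorable(last_modified_day: int, today: int,
               retention: int = RETENTION_DAYS) -> bool:
    """True if the incremental holding the file's last version is still kept."""
    backup_age = today - last_modified_day
    return 0 <= backup_age < retention

# Hypothetical files: path -> day number of last modification
files = {
    "scratch/run_0412.dat": 95,
    "scratch/run_0388.dat": 40,
    "ref/genome.fa": 2,        # long-lived file, never rewritten
}
today = 100
for path, day in files.items():
    state = "restorable" if restorable(day, today) else "expired from backup"
    print(f"{path}: {state}")
# prints:
# scratch/run_0412.dat: restorable
# scratch/run_0388.dat: restorable
# ref/genome.fa: expired from backup
```

The long-lived file is still on the cluster but no longer in any retained backup, which is exactly the risk for "persistent" data.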

The remaining, more persistent data can be much less than 3PB and thus get backed up with normal full-cumulative-incremental cycles, if (again if) the directory layout permits sufficiently many parallel jobs. Also, partial full backups can run in different weeks.
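
One simple way to spread partial fulls over different weeks is a round-robin assignment of directory trees to weeks of the cycle. The sketch below assumes a four-week cycle and made-up directory names; a real plan would also balance by data volume rather than by count.

```python
from collections import defaultdict

def stagger_fulls(trees: list[str], cycle_weeks: int) -> dict[int, list[str]]:
    """Assign each directory tree a week (0-based) for its full backup."""
    schedule: dict[int, list[str]] = defaultdict(list)
    for i, tree in enumerate(sorted(trees)):
        schedule[i % cycle_weeks].append(tree)   # round-robin over the cycle
    return dict(schedule)

# Hypothetical top-level trees holding the persistent data
trees = ["/ifs/projA", "/ifs/projB", "/ifs/projC", "/ifs/projD", "/ifs/projE"]
plan = stagger_fulls(trees, cycle_weeks=4)
for week, assigned in sorted(plan.items()):
    print(f"week {week}: {', '.join(assigned)}")
# prints:
# week 0: /ifs/projA, /ifs/projE
# week 1: /ifs/projB
# week 2: /ifs/projC
# week 3: /ifs/projD
```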

It is a big challenge, but supposedly big fun, too. The more knowledge about existing workflows and data life cycles can be leveraged, the better.

Or: Try inquiring about the price tag for the new Isilon HD400 high density nodes. I have no idea about the pricing yet, but do hope that the price per TB will be "somewhat" lower than with NL400...

Hope it helps

-- Peter

Technically, no: we're only running full backups to tape as NDMP level 0, so no diffs or incrs at all.  Originally, we put a pre-script line in NetBackup to have it create a snapshot for the duration of the backup, but eventually, we chose to simply point it at a particular snapshot and just let it run.  For us, we could let each weekly full backup to tape take up to seven days, since we already rely on snapshots for our day-to-day restores.  The tape backups just give us offsite DR.  Again, a partner cluster with more snapshots would be even better for us.



