
December 7th, 2018 21:00

What is the default replication factor in Hadoop?

January 20th, 2019 22:00

The default replication factor in Hadoop HDFS is 3. However, the replication factor can be changed in the hdfs-site.xml configuration file.

Reason 3 works well as the default: multiple copies of each data block (input split) are stored on different nodes of the Hadoop cluster. Because of HDFS rack awareness, replicas are also placed on DataNodes in different racks, so even if an entire rack goes down, the blocks can still be read from nodes in another rack. And if one DataNode fails while a second replica is temporarily unavailable, there is still a third DataNode holding the required block.
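For reference, this is roughly what the cluster-wide setting looks like in hdfs-site.xml (the property name is `dfs.replication`; the value 3 shown here is the stock default):

```xml
<!-- hdfs-site.xml: cluster-wide default replication factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

Note this only sets the default for newly written files; files already in HDFS keep the replication factor they were written with.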


April 8th, 2020 05:00

I also need the same help with Big Data Hadoop.


July 30th, 2020 23:00

"if one Datanode goes down and one Datanode is not available"


August 15th, 2020 08:00

Good information.


May 13th, 2022 06:00

The default replication factor is 3, but it can be configured as required: decreased to 2 (or less), or increased above 3.

Increasing or decreasing the replication factor in HDFS has an impact on Hadoop cluster performance.



January 27th, 2023 10:00

Hi, while this has been answered, I wanted to add a bit more detailed explanation.

HDFS has a default replication factor of 3. 

Files are broken down into blocks, and each block is replicated three times and stored at various points on the cluster. This makes the data more resilient. You can change the replication factor to whatever you want: if you lower it, you increase the risk of losing data; if you raise the number of replicas, you decrease that risk, but you use more storage and it takes longer to write the files.
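As a sketch of how that change is made for files already in HDFS, the `hdfs dfs -setrep` command adjusts the replication factor per path (the paths below are made-up examples):

```
# Set replication to 2 for a single file (path is hypothetical)
hdfs dfs -setrep 2 /data/example/part-00000

# Set replication to 5 for a directory tree, with -w waiting
# until re-replication actually completes
hdfs dfs -setrep -w 5 /data/golden/
```

These commands require a running HDFS cluster, so treat them as an illustration rather than something to copy verbatim.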


While you can increase the replication factor for 'golden data', it's also possible to make copies of the files and copy them off-cluster to another cluster or to cold storage if the data is valuable.
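For the off-cluster copy mentioned above, Hadoop's DistCp tool is the usual route; a minimal sketch, where the NameNode hostnames and paths are placeholders:

```
# Copy a directory to a separate backup cluster (hostnames are hypothetical)
hadoop distcp hdfs://nn-prod:8020/data/golden hdfs://nn-backup:8020/backups/golden
```

DistCp runs as a MapReduce job, so it scales to large directory trees, but it also needs a live source and destination cluster to run.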
