September 18th, 2013 00:00

Problem while running HBase over Isilon

Hi,

I'm running HBase over the Isilon HDFS protocol, and I'm getting the following messages in the RegionServer log file:

2013-09-18 14:01:51,939 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.201:8021 for file /hbase2/usertable/3c7efa752ba8e76b3216a0611307bacf/f1/7c1bb8b75c9246f584a2f4f3f40e7306 for block blk_4428861898_134218728:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.14:54764 remote=/192.168.0.201:8021]

2013-09-18 14:01:51,939 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_4428861898_134218728 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...

2013-09-18 14:01:51,948 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.201:8021 for file /hbase2/usertable/2be492984ff43725109d87cbd13c1cb3/f1/d22eed5211354448ac80bec9ac750a48 for block blk_4474298833_335545320:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.14:54765 remote=/192.168.0.201:8021]

2013-09-18 14:01:51,949 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_4474298833_335545320 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...

2013-09-18 14:02:35,868 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.206:8021 for file /hbase2/usertable/2c23f122622e2ec734921f8a838ae535/f1/8be6e548ca38496f8c10a0eb2b200e87 for block blk_4474026614_3489661928:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.14:59719 remote=/192.168.0.206:8021]

2013-09-18 14:02:51,503 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.206:8021 for file /hbase2/usertable/77a3aebed2e9f1ff01f8569eb1c5740e/f1/258e0638e1b641e7a1f4995083c99744 for block blk_4474298824_469763048:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.14:59720 remote=/192.168.0.206:8021]

2013-09-18 14:02:51,503 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_4474298824_469763048 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...

2013-09-18 14:02:51,627 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.206:8021 for file /hbase2/usertable/5724b30b120ae3d68fc93f8409a82f62/f1/abee3ba3c4614219b2089b46e298e0ad for block blk_4308208589_335545320:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.14:59721 remote=/192.168.0.206:8021]

2013-09-18 14:02:51,627 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_4308208589_335545320 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...

2013-09-18 14:03:35,967 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.201:8021 for file /hbase2/usertable/2c23f122622e2ec734921f8a838ae535/f1/8af26cf83aea414086efdc92fd182f40 for block blk_4473511651_3623879656:java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.0.14:54793 remote=/192.168.0.201:8021]

2013-09-18 14:03:38,974 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.0.205:8021 for file /hbase2/usertable/2c23f122622e2ec734921f8a838ae535/f1/8af26cf83aea414086efdc92fd182f40 for block blk_4473511651_3623879656:java.net.NoRouteToHostException: No route to host

2013-09-18 14:03:38,974 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_4473511651_3623879656 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry.

I'm able to ping the IPs mentioned in the log file. Any idea what might be the problem here?
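To see at a glance which Isilon node IPs and exception types dominate a log like the one above, a short Python sketch (standard library only) can tally the DFSClient "Failed to connect" lines. The regular expression assumes the exact WARN line layout quoted above, and the helper name `summarize_failures` is hypothetical.

```python
import re
from collections import Counter

# Matches DFSClient WARN lines in the layout quoted above, e.g.
# "Failed to connect to /192.168.0.201:8021 for file /hbase2/... for block
#  blk_4428861898_134218728:java.net.SocketTimeoutException: ..."
FAILED_CONNECT = re.compile(
    r"Failed to connect to /(?P<ip>[\d.]+):(?P<port>\d+) for file \S+ "
    r"for block (?P<block>blk_\S+?):(?P<exc>[\w.$]+)"
)


def summarize_failures(log_lines):
    """Count failed connects per (datanode IP, short exception name).

    `log_lines` can be any iterable of strings, e.g. an open log file.
    Lines that do not match the pattern (INFO retries etc.) are ignored.
    """
    counts = Counter()
    for line in log_lines:
        m = FAILED_CONNECT.search(line)
        if m:
            # Keep only the class name, e.g. "SocketTimeoutException".
            exc = m.group("exc").rsplit(".", 1)[-1]
            counts[(m.group("ip"), exc)] += 1
    return counts
```

For example, `summarize_failures(open("regionserver.log"))` (file name hypothetical) returns a `Counter` keyed by `(ip, exception)`, which shows whether the failures cluster on particular node IPs.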

September 18th, 2013 02:00

Can you connect (reliably and quickly) to the Isilon HDFS service on all relevant IP addresses?

telnet 192.168.0.201 8021

telnet 192.168.0.206 8021

etc.
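To repeat the same check several times and see whether connections are both reliable and fast, here is a minimal Python sketch (standard library only). Like telnet, it only opens and closes a TCP connection and sends no HDFS traffic. The helper name `probe_repeatedly`, the five attempts and the 5-second timeout are illustrative assumptions, not anything Isilon prescribes; port 8021 is taken from the log.

```python
import socket
import time


def probe_repeatedly(host, port=8021, attempts=5, timeout=5.0):
    """Open and immediately close `attempts` TCP connections to host:port.

    Returns (successes, slowest_seconds); slowest_seconds only covers
    successful attempts and is 0.0 if none succeeded.
    """
    successes, slowest = 0, 0.0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connected; the socket is closed on leaving the block
        except OSError:
            continue  # refused, timed out, unreachable, ...
        successes += 1
        slowest = max(slowest, time.monotonic() - start)
    return successes, slowest
```

Running it for each address from the log, e.g. `probe_repeatedly("192.168.0.201")`, a result with fewer successes than attempts, or a slowest time of several seconds, would match the intermittent behavior seen in the RegionServer log.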

Is there any valid HDFS activity on the Isilon cluster?

isi statistics client --proto=hdfs

-- Peter

September 18th, 2013 03:00

I'm not able to telnet into the hosts, and isi statistics client --proto=hdfs shows some random activity.

September 18th, 2013 08:00

Let me explain how the telnet test is meant to be interpreted: no actual login is attempted, just a simple connection to the HDFS service on port 8021.


A successful connection looks like this:


$ telnet 192.168.xx.xx 8021

Trying 192.168.xx.xx ...

Connected to 192.168.xx.xx

Escape character is '^]'.

(type Ctrl-D to disconnect, Peter)


A failed connection shows as:


$ telnet 192.168.xx.xx 8021

Trying 192.168.xx.xx ...

telnet: connect to address 192.168.xx.xx: Connection refused

telnet: Unable to connect to remote host


Which kind of response do you see, and does it come immediately or after several seconds?
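The same distinction can be checked programmatically. This sketch (hypothetical helper `classify_connect`, Python standard library) maps a single connect attempt onto the cases above, plus the "no route to host" case that appears in the RegionServer log. Because error text varies by operating system, it relies on exception types and `errno` rather than message strings; the 5-second timeout is an arbitrary assumption.

```python
import errno
import socket


def classify_connect(host, port=8021, timeout=5.0):
    """Return (outcome, message) for one TCP connect attempt.

    "connected" -> telnet would print "Connected to ..."
    "refused"   -> host answered, but nothing is listening on the port
    "timeout"   -> no answer within `timeout` seconds (telnet hangs at "Trying ...")
    "no-route"  -> EHOSTUNREACH, like the NoRouteToHostException in the log
    "error"     -> anything else; the message is kept for inspection
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected", ""
    except ConnectionRefusedError as exc:
        return "refused", str(exc)
    except socket.timeout as exc:
        return "timeout", str(exc)
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no-route", str(exc)
        return "error", str(exc)
```

A "refused" answer comes back immediately, while a "timeout" only appears after the full timeout has elapsed, which is why the delay before telnet's response is worth noting.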


What are the CPU load and process activity on the nodes ('top')?


-- Peter

September 18th, 2013 19:00

Can you verify that you completed all the steps in the OneFS Administration Guide?

https://support.emc.com/search/?resource=DOC_LIB&AlloftheseWrds=onefs%20administration%20guide&SearchWithin=true&adv=y

For example, based on the results Peter had you verify, I'm assuming the service isn't even started. You can check it with:

isi services isi_hdfs_d

It's a short section, only a handful of pages, that walks you through the necessary configuration on the Isilon cluster. The table of contents lists the specific chapter; skip to that. In summary, as you'll read, the steps are:

1) Install the license (not mentioned, but required)

isi license status

isi license activate

2) Configure various parameters (isi hdfs):

a) --force-version

b) --log-level

c) --root-path

d) --block-size

e) --num-threads

f) --add-ip-pool

3) Create a local user

4) Enable the HDFS service

The service is enabled when you activate the license, but it's worth verifying, and when all else fails, even bouncing it (disable, then enable).

September 18th, 2013 22:00

Thanks for the response, Christopher. I've installed the HDFS service, and I was able to run HBase for around 12 hours, after which I started getting these errors.

September 18th, 2013 22:00

Hi Peter,

I'm getting the first kind of response from telnet; that is, I'm able to connect through telnet.

Also, CPU usage is around 0.25%, and the top three processes are nfsd, isi_mcp, and isi_rpc_d. isi_hdfs_d is also in the process list.

Thanks,

Anand

September 19th, 2013 07:00

Hi Anand,

Since a configuration or network problem has been ruled out, this might be an interoperability issue between Isilon HDFS and HBase.

A deeper analysis would include:

- checking the patterns of HDFS network connections (with netstat): how many connections are established, terminated, or timed out over time

- looking into captured packets (tcpdump, Wireshark, etc.) to see whether the HDFS protocol is executed correctly.
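For the netstat check, successive snapshots are easier to compare if each one is reduced to counts per TCP state. This Python sketch assumes the usual `netstat -an` column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state) and accepts both `ip:port` (Linux) and `ip.port` (BSD-style, which OneFS, being FreeBSD-based, is assumed to print) address notation. The helper name `hdfs_connection_states` is hypothetical, and only IPv4 lines are counted.

```python
import re
from collections import Counter

# An IPv4 address, then ':' (Linux) or '.' (BSD-style), then the port.
ADDR = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})[.:](\d+)$")


def hdfs_connection_states(netstat_text, port=8021):
    """Count TCP states of connections whose local or remote port is `port`.

    Header lines, non-TCP lines and wildcard listeners such as "*.8021"
    are skipped, so the result reflects actual connections only.
    """
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        for addr in (local, remote):
            m = ADDR.match(addr)
            if m and int(m.group(2)) == port:
                states[state] += 1
                break  # count each connection once
    return states
```

Taking a snapshot periodically, e.g. `hdfs_connection_states(subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout)`, on the HBase host and on an Isilon node shows whether ESTABLISHED connections pile up or many linger in states such as CLOSE_WAIT over time.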

You might start this analysis with a good network team, but I would also consider seeking advice from the Hadoop community and from EMC support.

Please let us know about your findings.

-- Peter
