By Michael Oliver
I'm sure those of you who have Isilon get the regular technical advisories, but this one just popped up and seemed important enough to forward along. Basically, if you are using NFS with AD/LDAP and running OneFS 6.5.5.17 or lower, you should read it. If a node does a user lookup through AD/LDAP on a client's behalf and the lookup times out, and that node has been up for about nine months (more than 248 days), the entire cluster could potentially lock up. Wonder why 248 days? The fix has been implemented in 6.5.5.18, which is what we are running... phew.
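On the 248-day question: one plausible explanation, which the advisory does not confirm, is a signed 32-bit tick counter that increments 100 times per second. Such a counter wraps after 2^31 ticks, which works out to a little over 248 days. A quick sketch of that arithmetic follows; the 100 Hz rate, the 32-bit width, and the `days_until_wrap` helper are all assumptions for illustration, not details from EMC.

```python
# Assumption (not stated in the advisory): uptime is tracked in a signed
# 32-bit counter that is incremented 100 times per second.
TICKS_PER_SECOND = 100
MAX_SIGNED_32 = 2**31          # ticks before a signed 32-bit counter wraps
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wrap(ticks_per_second=TICKS_PER_SECOND, max_ticks=MAX_SIGNED_32):
    """Return the days of continuous uptime before the counter wraps."""
    return max_ticks / ticks_per_second / SECONDS_PER_DAY


print(round(days_until_wrap(), 2))  # 248.55
```

If that guess is right, it would also explain why the advisory talks about consecutive days of uptime: a reboot resets the counter.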
--
Michael Oliver
mcoliver@gmail.com
858.336.1438
###########################
EMC Knowledgebase
"ETA emc317523: [UPDATE] OneFS 6.5: Isilon nodes with over nine months of uptime in clusters with NFS enabled might cause a cluster-wide deadlock"
ID:emc317523
Usage:4
Date Created:02/20/2013
Last Modified:04/24/2013
STATUS:Approved
Audience:Customer
Knowledgebase Solution
Question:ETA emc317523: [UPDATE] OneFS 6.5: Isilon nodes with over nine months of uptime in clusters with NFS enabled might cause a cluster-wide deadlock.
Environment:EMC Technical Advisory
Environment:
Date
February 21, 2013
[UPDATE]
April 24, 2013
Products affected
Isilon clusters running OneFS 6.5.x that support NFS clients
Problem:
Update
This ETA has been updated and corrected as follows:
A fix is now available.
This issue can affect clusters with NFS enabled, not only those with NFSv4 enabled.
This issue can affect nodes that exceed 248 consecutive days of uptime, not 288 days.
Issue
EMC has become aware of an issue where, in certain situations, clusters might become non-responsive. Specifically, OneFS 6.5 clusters that support NFS may experience a cluster-wide lockup when a node performs a user lookup through Active Directory or LDAP on a client's behalf and the lookup times out.
If the node performing the lookup has exceeded nine months of uptime (that is, over 248 days with no downtime), it could cause the entire cluster to eventually lock up and become unresponsive to user requests.
Although there is no risk of data loss due to this issue, data availability will be impaired.
Fix:
Solution
This issue is fixed in OneFS 6.5.5.18. However, EMC recommends that you upgrade to OneFS 6.5.5.20 to take advantage of additional improvements.
NOTE
There is no fix planned for OneFS 6.5.4. EMC recommends that clusters running OneFS 6.5.4 be upgraded to OneFS 6.5.5.20.
Fix:
Workaround
If upgrading is not an option, and if any node in your cluster has been running for more than 248 days with no downtime, you can avoid this issue by performing a rolling reboot of all nodes in the cluster.
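(Not part of the EMC advisory.) A minimal sketch of the check behind this workaround, assuming you have already collected each node's uptime in seconds by whatever means your cluster provides. The node names, the sample values, and the `nodes_over_limit` helper are hypothetical.

```python
# Threshold taken from the advisory: more than 248 days with no downtime.
DAYS_LIMIT = 248
SECONDS_PER_DAY = 24 * 60 * 60


def nodes_over_limit(uptimes_seconds, limit_days=DAYS_LIMIT):
    """Return the sorted names of nodes whose uptime exceeds limit_days.

    uptimes_seconds maps a node name to its uptime in seconds.
    """
    limit = limit_days * SECONDS_PER_DAY
    return sorted(name for name, secs in uptimes_seconds.items() if secs > limit)


# Hypothetical sample data, not taken from a real cluster.
sample = {
    "node-1": 250 * SECONDS_PER_DAY,
    "node-2": 30 * SECONDS_PER_DAY,
    "node-3": 248 * SECONDS_PER_DAY,
}
print(nodes_over_limit(sample))  # ['node-1']
```

In practice it is probably wiser to schedule the rolling reboot well before any node reaches the threshold, rather than waiting for this list to be non-empty.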
Fix:
Isilon internal only
Fixing a locked-up cluster
If a customer calls with a locked-up cluster, first validate that they have encountered this issue. Ensure that:
the cluster has NFS enabled, and
at least one node on the cluster has an uptime over 248 days.
If both conditions are true, a full cluster reboot is required. See How to reboot nodes and clusters in OneFS 5.5 and later, emc14002278.
Fix:
Isilon Technical Support
For assistance, or if you have any questions, please contact Isilon Technical Support.
For contact information, see Contacting Isilon Technical Support, emc14002478.
Notes:EMC Confidential