hanacd's Posts

ESA 3.3 works with vRealize Operations 6.1 (which the customer just installed) and is supported according to the release notes.
Hello Gowan, it sounds like you are trying to get metrics from one VNX system into vRealize Operations through the EMC Storage Analytics adapter. To my understanding, a vRealize Operations data node is only required if you are scaling beyond what the initial vRealize Operations setup supports (in terms of VMs, hosts, metrics, and so on). In my lab setup I have never installed anything additional on the vR Ops appliance beyond adding the EMC adapter for vCenter and the storage system. But if the documentation is not clear on this point, please verify with EMC support. All EMC software can typically be downloaded from http://support.emc.com

Regarding vR Ops data nodes, I found the following in the vRealize Operations Manager 6.0.1 Documentation Center: "Data nodes are the additional cluster nodes that allow you to scale out vRealize Operations Manager to monitor larger environments [...]"

Sizing KB article: http://kb.vmware.com/kb/2057607

So in a typical initial rollout you do not use additional data nodes unless you have a requirement due to scale or a multi-site / geographical scenario. KR, David
I also discovered a similar post from some time ago, suggesting similar points to mine, with some more specifics on where to look: https://community.emc.com/message/793297?et=watches.email.thread#793297

These are datastore timeouts, where the datastore heartbeat has not responded within the timeout period. Are you seeing any PowerPath or NMP messages relating to path failures in the vmkernel.log? Any SCSI errors in the vmkernel.log? Any device latency warnings? Other performance issues being reported from the VNX? Care to post any errors from that log (with the customer's permission)?
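If it helps, here is a minimal Python sketch of that kind of vmkernel.log triage. The regular expressions and the sample lines below are illustrative assumptions on my part, not an exhaustive catalogue of ESXi message formats, so adapt them to what your log actually contains:

```python
import re

# Rough patterns for the three message types mentioned above
# (path failures, SCSI errors, device latency warnings).
# These are illustrative assumptions, not official ESXi formats.
PATTERNS = {
    "path_failure": re.compile(r"(nmp|PowerPath).*path.*(down|dead|failed)", re.I),
    "scsi_error": re.compile(r"SCSI.*(error|failed)", re.I),
    "latency_warning": re.compile(r"latency.*(increased|deteriorated)", re.I),
}

def scan_vmkernel(lines):
    """Return matching log lines grouped by category."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits

if __name__ == "__main__":
    with open("/var/log/vmkernel.log") as f:
        for category, lines in scan_vmkernel(f).items():
            print(category, len(lines))
```

Nothing fancy, but it gives you counts per category quickly so you can decide whether to dig into paths, SCSI sense codes, or latency first.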
Hello tdubb, I just discovered your post and saw that nobody provided feedback here. I suppose you might have found the root cause already. From what you shared it's difficult to give solid advice. Things I would look at:

- A "flipping" HBA, as the SP "owner" in a VNX system (if it's a VNX) tries to fail back, also known as "trespassing"
- An HBA, fibre, or anything else with a physical connectivity issue
- A firmware / driver version compatibility issue
- A wrong path policy

And of course, if you haven't done so, contact EMC support at http://support.emc.com. Let us know if and how you resolved the issue.
3. Maybe you haven't yet installed the EMC Adapter for vCenter Server and/or the storage arrays? You need to add vCenter in the vcops-custom GUI through Environment -> Configuration -> Adapter Instances. Be aware that vC Ops is connected to your vCenter Server on initial deployment, but for the correlation between vCenter Server and the storage arrays you need to add it once more this way, selecting the "EMC Adapter" and adding the involved vCenter Server.
Hello hppyflight, EMC's FAST and XtremIO are definitely great technologies to deliver VDI. A server-based flash card / technology (VMware's vFRC, vSphere Flash Read Cache, available in 5.5, or EMC's XtremSF) is also an option to improve performance. As always, it depends. Today EMC recommends looking at XtremIO for 1000 users and above. If you plan to start small and grow to 2000 or more users, this might be your choice. If you are anywhere between 100 and 2000 users: it depends, but VNX plus a local server flash cache may be a good setup.

But don't forget: it's about the applications, the user experience, and the network as the next bottleneck. Feel free to reach out to EMC and EMC partners to discuss your scenario, as the final solution depends on the scope of the VDI solution.

If you would like to get familiar with design and sizing, have a look at VMware End User Computing / Horizon Plan & Design Resources, where I try to keep interesting and helpful resources together. One of the first links is the reference architectures. And if you would like to look at the VMware Horizon View with EMC XtremIO reference architecture, here you go: EMC Infrastructure for VMware Horizon View 5.2 w/ XtremIO

Good luck, and I will be curious to see what you come up with as your final design. KR, David
I think this discussion needs to be revived. Any new experience with 3D AutoCAD / CATIA? We are just in discussion with a customer from the automotive industry leveraging Citrix on vSphere, and we are planning to test it with EMC XtremIO. I will keep you posted. If you have any experience to share in the meantime, feel free to post! Thank you!
Symmetrix / VMAX has also been supported with ACU since the June 2012 release (5.3 and above). Storage models were added with every release, and there is support for VPLEX and XtremIO in addition to those that were already supported.
Josh Atwell's response: Fair enough. I speak solely from a portal/service standpoint. The task is then to identify what the use case is really asking for, and whether we anticipate that array snapshots and/or RecoverPoint can meet that objective. Then it becomes a separate infrastructure-related option that they can offer as a separate "service", or ideally on an isolated tier with that capability. The standard service offering would be a virtual-layer option, controlled in the portal, that puts a limit on virtual-layer snapshots at no additional cost, since it uses native architecture and toolsets. This will then limit the impact of needing/wanting to alter LUN sizes and isolate the impact.

I also dealt with some long-term snaps, but in the end we found that we reclaimed data from them so infrequently that it wasn't worth building a service around. Naturally I found this out after spending a week or two figuring out how to enable users to do this independently. 🙂 We still kept the snap policy, but requests for mounting and using the snapshots came with a best-effort SLA.
And from Rich Barlow: Use RecoverPoint for long-term snaps. At the hypervisor level I'm in total agreement that snaps beyond a couple of days are asking for disaster, because the technology is so different. I think we should make sure that we don't conflate these two technologies.
From Josh Atwell: I would support this concept as well. The key to this is having them dictate the culture of the service they provide. I have yet to see a reasonable use case for long-term snapshots of any kind; even in that scenario it was less than 5 weeks. I'd suggest simply having the team push back and requiring all requests for longer snapshots to defend a business case for allowing that feature (risk mitigation, long-term rollback, etc.). If a business case is justified and approved by upper management (based on legitimate analysis of impact vs. gain), then this could be implemented in an isolated environment as part of their service: dedicating specific datastores/LUNs where long-term snapshots are allowed. These would be thick provisioned to minimize performance impact during snapshot maintenance tasks, etc.

Just because you can doesn't mean you should. I ran into this a lot while at Cisco working behind their portal (CITEIS). Here's how I approached feature requests:

1. Identify the business objectives and benefits of the portal/feature (you need business justification for dangerous activities in the portal).
2. Identify a design that is operationally sustainable and meets those objectives. What impact do non-standard capabilities have on the ability to recover from failure, on the environment, on neighbors, etc.?
3. Implement automation to maintain the design and prevent people from "going rogue" or trying to go around the portal process.

In the end, if the behavior might impact the SLAs of other tenants, those customers were not given the full experience. They would get isolated to a non-standard offering with extended deployment cycles and costs, and limited portal capabilities. That would typically force the app owners to think more critically about whether they really need an offering such as long-term snapshots or not. 98 times out of 100 they decide they can live within the stricter constraints and you never hear from them about it again.
From Richard Anderson: Why not suggest Avamar for incremental-forever deduplicated backups of the VMs? It can even do an incremental restore using block change tracking for fast recovery to a previous backup. Just run a new incremental backup before a change and you are all set: no snapshots to worry about, no performance problems when it's time to clean up snaps, and no disk space consumed on the datastore. Plus, it's an actual backup and can be replicated offsite for DR recovery.
Josh Atwell providing some detail on why the customer has a GOOD reason NOT to use vSphere snapshots: It looks like they're looking for guidance. They want to initiate a mandatory 2-week limit on VM snapshots for systems. I can provide a few reasons and arguments we can share with them. They should implement a 1-2 week maximum on snapshots:

- Snapshots are not backup. Failure to commit or roll back snapshots is either laziness or an oversight.
- Snapshots kill performance over time. Period. Performance degradation leads to outages/support calls, which leads to egg on faces and face-palms.
- Snapshots are usually used to handle application upgrade rollbacks. This usually means there is a predefined change window. In 10+ years of operations I never saw a change window last more than a weekend. There is almost always a go/no-go point in the process within a much smaller window. If an issue arises post-change that was not foreseen (a bug, for instance), it will usually show its head within a few hours of regular usage of the system(s).
- Snapshots hinder the operations team's ability to perform some tasks that may be critical to their change windows.
- Storage outages/issues may potentially create data consistency issues for VMs with snapshots. I saw this once at Cisco, but we were never able to fully root-cause why it only happened to VMs with snapshots. This can be mentioned, but I personally do not have documentation to back up the situation. In the end we were not able to remove the snapshot and we lost delta data.

In order to enforce the maximum, there should be automation that:

- Reports snapshots approaching the limit
- Removes the snapshot when the 1-2 week limit is reached; this should never be a manual process (existing scripts are out there, as are other orchestration options)
- Has a check file for approved exceptions; these exceptions should require director-level approval, which includes a breakdown of potential impacts if they circumvent this
If you make people spend time defending their choice and outlining the risks, they are less likely to even ask, and they realize they don't really need it. This also allows IT to track these requests and respond accordingly if it is determined that a longer time is actually needed and deemed acceptable by key stakeholders. In the end, they should ask how long a system's patch window is. Snapshots should not be longer than that plus a day or two. I'd be happy to talk this through further if you have any follow-up questions. I am also open to discussion if there are any points outlined above that someone has a different perspective on.
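To make the report/remove/exception automation above concrete, here is a minimal Python sketch of just the policy logic. The snapshot records are plain dicts; a real implementation would pull them from vCenter (PowerCLI, pyVmomi, or an orchestrator), so the record layout and thresholds here are assumptions for illustration:

```python
from datetime import datetime, timedelta

# Policy sketch for the snapshot-age automation described above.
# The 14-day hard limit and 10-day warning threshold are assumptions;
# tune them to your approved policy.
MAX_AGE = timedelta(days=14)   # the mandatory 1-2 week limit
WARN_AGE = timedelta(days=10)  # start reporting before auto-removal

def classify_snapshots(snapshots, exceptions, now=None):
    """Split snapshot records into (ok, warn, remove) lists by age.

    `snapshots` is a list of dicts with "vm" and "created" keys;
    `exceptions` is the set of VM names with director-level approval.
    """
    now = now or datetime.utcnow()
    ok, warn, remove = [], [], []
    for snap in snapshots:
        if snap["vm"] in exceptions:
            ok.append(snap)        # approved exception: never auto-remove
        elif now - snap["created"] >= MAX_AGE:
            remove.append(snap)    # past the hard limit: auto-remove
        elif now - snap["created"] >= WARN_AGE:
            warn.append(snap)      # approaching the limit: report
        else:
            ok.append(snap)
    return ok, warn, remove
```

The "warn" list feeds the report, the "remove" list feeds the deletion job, and the exception set is the check file Josh mentions.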
We have a huge global customer from the logistics sector who is looking into offering more flexibility to app owners when doing patches and the like. Right now they leverage VMAX with 1.2TB LUNs in their vSphere environment, with dozens of VMs on top. There is a project driving this: offering warehouses to their customers (real ones, remember: logistics sector), with the portal as a service.

What is the challenge? The end user might keep snaps for several weeks (up to 5) until they commit changes. The IT team is skeptical about leveraging VMware snapshots for this, and on VMAX, snapping a whole LUN with a lot of VMs on top is not a good idea. Think thousands of VMs, from 50 to 500GB in size.

We received some really valuable input from EMC's vSpecialist and VMware Presales Minor group. We would like to share it here and give anybody else the chance to contribute to the discussion and/or get ideas for their own challenges. The following is what we discussed:

- NFS with Isilon. Pro: granularity. Con: workload and performance; so far not recommended for a 100% VMware environment because of random IO.
- XtremIO. Pro: performance. Con: unpredictable deduplication ratio, as the customer WILL put anything on there, and as you see there is a lot of DATA going to hit the storage.

In general, our assumption at this customer is: if you can imagine a worst-case scenario, it is going to happen here!
In the meantime I have collected some interesting resources for Virtualization 101 that I would like to share:

- VMware's structured landing page: http://www.vmware.com/virtualization/
- Virtualization Basics: http://www.vmware.com/virtualization/what-is-virtualization.html
- Virtualization Essentials: http://www.amazon.de/dp/1118176715/ref=rdr_ext_tmb

I will be happy to see your experiences and helpful resources for educating people in this area. Enjoy!
Hello, I got the following question twice within the last week. Thanks Alex for setting me straight on this:

- Virtual Storage Integrator supports the new VNX Snapshots with AppSync, to make application-consistent snapshots happen.
- The VSI context menu will not leverage the new VNX Snapshots directly; however, triggering is built in to be used by AppSync.

Any other comments? As I had trouble spotting the answer myself and this seems to be a common question, I thought it was worth sharing.
And for further reading on this community, with some real-world experience, have a look at this thread: "Performance with Oracle Monster VMs with 32 vCPUs?"
Good news! Virtual beats physical on a monster Oracle machine! Here are the details from this week's test run before preparing for production:

- physical system, time to complete the job: ~11h16
- virtual system, time to complete the job: ~10h21

The comparison isn't 1:1:

- the physical system has 2 AMD Opterons with a total of 24 cores
- the virtual system is VMware vSphere 5.1 with 4x Intel E5-4650 with a total of 32 cores

However, the vSphere host still had some smaller workloads on it, so the customer is happy with the results so far. For these tests we chose to P2V the original test system. The main key to the positive experience in this scenario is the VM memory reservation. Here is a nice graph on utilization: CPU ready goes down as CPU usage goes up during the run of the big data analytics job.
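For anyone wondering how much faster that actually is, the runtimes above work out to roughly an 8% improvement, while the host was also carrying other workloads:

```python
# Quick sanity check of the improvement implied by the runtimes above.
physical = 11 * 60 + 16   # ~11h16 on the physical system, in minutes
virtual = 10 * 60 + 21    # ~10h21 on the virtual system, in minutes
saved = physical - virtual
improvement = saved / physical * 100
print(f"virtual is {saved} min faster (~{improvement:.1f}%)")  # ~55 min, ~8.1%
```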
Please see the original post: I am in no way recommending going to this monster-VM setup for every Oracle VM workload. In fact, a best practice is to divide whenever possible. But if you can't go that way (as is true in our case), read on for more details I was able to capture.

Good news: the customer is running a benchmark test on a system with 32 cores (HP B660 with 4x 4650, 8 cores each) and 768GB RAM. A monster VM now scales fine up to 28 vCPUs and only drops to the performance of 24 cores with 32 vCPUs. We haven't run the application yet, but it looks like Hyper-Threading (HT) on the previous machine didn't work well in this case. We are looking forward to the vSphere 5.5 release to try advanced features as well as to apply best practices for Oracle tuning on a VMware VM. As posted earlier, the document on VMware's Oracle landing page seems to be outdated (2011); here are more recent hints and tips. In this scenario we are running Oracle 11.2.0.2 at this point.

- HT is recommended for Oracle.
- Memory: reserve memory directly at the VM for a 32-vCPU system, possibly up to 100%. Optimally we are using 128GB VM RAM, but this value may be different for you; follow the formula from the best practices to share some resources (in our case the maximum performance is needed for a nightly 8-hour job).
- Utilize a memory reservation of: size of the SGA + 2 times the aggregate PGA target + 500MB for the OS (assuming some flavor of Linux). This should be for production clusters only, as development and test databases do not usually require peak performance.
- Faster RAM: possibly more physical memory is counter-productive if it is cross-architecture (a CPU accessing the memory of another CPU, or memory with higher latency due to the memory architecture). Cisco B440s, physically with 4x 8 cores and 256GB memory, have proven to be a pretty good sweet spot in terms of price / performance / resources.
- NUMA locality.
- Linux huge pages (typical default 4K; large/huge pages 2MB, and this can be adjusted). Oracle memory lookup by linear search can improve by up to 15%; see also the EMC Oracle session at VMworld 2013 for more detail.
- Reservation / prioritization in general for CPU with Prod, Test, and Dev in the same ESX cluster: e.g. Prod priority 1, Test priority 2, Dev priority 3.

These are the key topics we identified in our situation. You might want to visit the following resources, which provide further detail and which we were looking at:

- EMC IT Oracle on VMware configuration best practices: http://itblog.emc.com/2013/02/26/best-practices-for-virtualizing-your-oracle-database-with-vmware/#more-1535
- Virtual Webinar - EMC IT: Virtualizing Oracle Databases - EMC IT's Virtual Oracle Deployment Framework
- Darryl Smith's blog (Oracle Architect at EMC): https://community.emc.com/people/DarrylBSmith/blog
- VAPP4679 - Software-Defined Datacenter Design Panel for Monster VMs: www.youtube.com/watch?v=wBrxFnVp7XE
- VAPP5180 - Extreme Virtualized Oracle Performance Follow-up: https://community.emc.com/community/connect/everything_oracle/blog/2013/08/27/vapp5180-extreme-virtualized-oracle-performance-follow-up
- VAPP5180 - Extreme Virtualized Oracle Performance Follow-up Part 2: https://community.emc.com/community/connect/everything_oracle/blog/2013/09/08/vapp5180-extreme-virtualized-oracle-performance-follow-up-part-2
- NUMA impact: http://cloudarchitectmusings.com/2012/12/17/the-impact-of-numa-on-virtualizing-business-critical-applications/

Thanks to Darryl Smith, Bart Sjerps, Sam Lucido, itzikr, and Jeff Browning, who provided input and insight.
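Two of the memory rules of thumb above, expressed as arithmetic. The 500MB OS overhead and the 2MB huge page size come from the tips in this post; the example numbers below are assumptions, and Oracle's own hugepages sizing script (from its support notes) should be the authoritative source for a real system:

```python
import math

def memory_reservation_mb(sga_mb, pga_aggregate_target_mb, os_overhead_mb=500):
    """VM memory reservation per the rule of thumb above:
    SGA + 2x aggregate PGA target + ~500 MB for a Linux guest OS."""
    return sga_mb + 2 * pga_aggregate_target_mb + os_overhead_mb

def nr_hugepages(sga_bytes, hugepage_bytes=2 * 1024**2):
    """Rough count of 2 MB huge pages (vm.nr_hugepages) to back the SGA."""
    return math.ceil(sga_bytes / hugepage_bytes)
```

For example, a hypothetical 64GB SGA with a 16GB aggregate PGA target would call for a reservation of 65536 + 32768 + 500 = 98804 MB, and 32768 huge pages to back the SGA.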
Hello Daniel, your point is something we are looking at, but rather in the context of how to make an ideal 3-node cluster that allows shared utilization of various workloads, as the monster VM is running a nightly job and during the day the resources would otherwise hardly be utilized.

vCPU co-scheduling: the claim that "a 16-core VM needs 16 cores *at the same time* to execute" hasn't actually been true since 4.0. The development of 'relaxed co-scheduling' in 4.x, further refined in 5.x, makes this far more rare. *Sometimes* it is needed to reduce core clock skew in the VM and prevent excessive SMP slip, but not usually. Everything you could want to know: http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf Thanks mattcowger

vSphere 5.5 improvements expected: we are eager to test the vSphere 5.5 low-latency feature to bypass the virtualization layer, which should be available once the bits are GA. Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5: http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf Thanks itzikr