Note: These articles will take a similar, if not identical, form to what Josh Odgers has done on his blog (http://www.joshodgers.com/), though they will be Horizon 6 centric.
I deal with a lot of high-end customers in my day job, so multisite Horizon 6 with View is almost always on the table. These customers need to keep their environments up and running in the event of a datacenter outage. In this article, I’ll lay out a fictitious example of one of these discussions.
Customer Problem Statement
We need to keep desktop service available to end users in the event of catastrophic failures within a datacenter, or even the loss of an entire datacenter. We have an existing View environment, but today it is only available in the Dallas datacenter.
- 99.99% desktop service availability
- No data loss in the event of a failure
- Desktop service should be restored within 15 minutes of an outage
- All 5000 users will need to be supported.
- All active resources should be utilized – It is seen as wasteful by the executive team to have available resources unused.
- Customer has F5 load balancers in their environment. GTM is not licensed, however.
- Customer uses Windows roaming profiles and folder redirection for user persona management.
- Customer currently has footprint in three datacenters – Dallas, St. Louis, and Raleigh – They have two racks available in each datacenter and no budget is available to expand into another datacenter.
- End users are located in two major offices – Dallas and Chicago, split evenly.
- All users require a VDI desktop – No RDS hosted/shared desktops or streaming apps.
- No Horizon Workspace Portal requirements.
- Apps for each desktop pool are thick installed into each image.
- All View desktops are deployed as linked clones with floating user assignment.
- Equal bandwidth availability between end users and each datacenter.
The customer wants four 9s of availability for desktop service, which works out to roughly 52.6 minutes of downtime per year. We also have a zero RPO and a 15 minute RTO to meet.
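The four 9s figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 365-day year; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare three, four, and five 9s side by side.
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

At 99.99% the budget is about 52.56 minutes per year, so three failovers that each consume the full 15 minute RTO would fit within it, but a fourth would not.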
Like most projects, this one has conflicting requirements and constraints. The customer wants to stay within their existing datacenter footprint, but their RPO is too tight to allow that. Zero RPO means synchronous replication across datacenters, and our datacenters are just too far apart. During design sessions with the CIO, we were able to talk him down to a more reasonable 15 minute RPO, since there were no additional funds for a DR datacenter closer to the primary Dallas datacenter.
Hardware identical to that in the primary Dallas datacenter was purchased.
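To see why distance rules out synchronous replication, remember that every synchronously replicated write has to wait for an acknowledgement from the remote array. The sketch below estimates the best-case round trip using the rule of thumb that light in fiber travels about 200 km per millisecond. The 1,000 km path is a hypothetical figure, not a measurement of the customer's links. Real paths are longer than a straight line, and network equipment adds further delay.

```python
FIBER_KM_PER_MS = 200.0  # rule of thumb: light in fiber covers ~200 km per millisecond


def min_round_trip_ms(path_km: float) -> float:
    """Best-case round-trip time, in milliseconds, over a fiber path of this length."""
    return 2 * path_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    hypothetical_path_km = 1000.0  # assumption for illustration only
    rtt = min_round_trip_ms(hypothetical_path_km)
    print(f"Each synchronous write waits at least {rtt:.1f} ms for the remote ack")
```

Whatever latency ceiling the storage vendor publishes for synchronous replication is the number to compare this against. The sketch only gives a floor that no amount of bandwidth can lower.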
Secondary Datacenter Location
The decision was made to stand up an identical View environment in the St. Louis datacenter due to its proximity to both the users and the Dallas datacenter. Compared with Raleigh, this drastically lowers the latency between the users and the datacenter, resulting in a better experience.
Active/Active vs. Active/Warm Standby vs. Active/Cold Standby
The decision was made to use an active/active configuration between datacenters. F5 GTM has to be purchased either way to support global site load balancing, since we wouldn’t want to route traffic through one datacenter to reach the other. This decision is tied both to the requirement that all active resources be used and to the fact that users in Chicago will have a better experience on desktops in the St. Louis datacenter due to their proximity, all other things being equal.
Cloud Pod Architecture
CPA doesn’t buy us much here, since all of our desktops are linked clones with floating user assignment. This means a desktop in site A is just as good as one in site B. The geolocation functionality of F5 GTM will direct users to the closest datacenter without CPA, so the decision was made not to use CPA, primarily to reduce the complexity of managing the solution.
Because all of our applications are baked into our images, we need to make sure the state of each image is identical at both sites. View has no internal mechanism to do this, so the decision was made to use the available array-based replication to push updates to the secondary site on an hourly basis. All image administration (updates, app installation, etc.) is to be done from the Dallas site. All pool recomposes will be done after business hours.
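The routing behavior we are relying on can be sketched as plain decision logic: send each user to the nearest healthy site, and fall back to the other site if the nearest one is down. This is not F5 configuration. The mapping, the function name, and the health flags are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical mapping of user office to its closest datacenter.
PREFERRED_SITE: Dict[str, str] = {"Dallas": "Dallas", "Chicago": "St. Louis"}
ALL_SITES = ("Dallas", "St. Louis")


def pick_site(office: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the datacenter a user should land on, or None if no site is up."""
    preferred = PREFERRED_SITE.get(office)
    if preferred is not None and healthy.get(preferred, False):
        return preferred
    # Preferred site is unknown or down: any healthy site is acceptable,
    # because floating linked clones are interchangeable between sites.
    for site in ALL_SITES:
        if healthy.get(site, False):
            return site
    return None


if __name__ == "__main__":
    print(pick_site("Chicago", {"Dallas": True, "St. Louis": True}))
    print(pick_site("Chicago", {"Dallas": True, "St. Louis": False}))
```

Because any desktop is as good as any other, the fallback needs no knowledge of where a user's desktop lives. That per-user knowledge is what CPA would otherwise provide.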
User Data Availability
The decision was made to use Windows DFS-R to replicate user profiles and data between sites. These user persona shares will be unique per site, with Group Policy delivering the appropriate file shares to desktops within that site. Using a unified namespace within DFS is not supported by VMware or Microsoft. More importantly, it could lead to very, very slow logins and poor user data performance if a desktop were directed to a file server in another datacenter. Active Directory Sites and Services will need to be properly configured as well.
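The per-site share idea boils down to a lookup keyed on the desktop's site rather than on the user. Here is a minimal sketch. The server and share names are hypothetical, and in practice Group Policy applied per site would do this job rather than a script.

```python
# Hypothetical file server and share names, one per datacenter.
PROFILE_SHARES = {
    "Dallas": r"\\dal-fs01\profiles$",
    "St. Louis": r"\\stl-fs01\profiles$",
}


def profile_path(desktop_site: str, username: str) -> str:
    """Return the profile path local to the site the desktop runs in.

    An unknown site raises KeyError on purpose: silently falling back to a
    remote file server is the slow-login case the design is meant to avoid.
    """
    share = PROFILE_SHARES[desktop_site]
    return f"{share}\\{username}"


if __name__ == "__main__":
    print(profile_path("St. Louis", "jdoe"))
```

The same user gets a different path depending on which site serves the desktop, and DFS-R keeps the contents behind both paths in step.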