How Agincourt maximise uptime

A website or web application isn’t going to become successful if it’s unavailable half the time. Every minute of downtime can mean frustrated users, inconvenience and lost business. It’s very important that sites and applications are built with best practices in mind from the start, as bolting on high-availability techniques later can be costly.

What causes downtime?

In most cases downtime occurs due to problems at the data centre – it could be the internet connection, power supply, cooling or internal network, or your server could simply crash. Web hosts obviously do as much as possible to prevent any of these issues halting your service; where possible, redundancy is used to offer a backup should something fail, but even then problems can still occur.

Another cause of downtime is malicious attacks, such as DDoS attacks, in which thousands of computers worldwide all send requests to your server until either it grinds to a halt or the network can’t handle the flow.

Application bugs and natural fluctuations in user activity can also take servers offline if the architecture of the server(s) and the code behind the site or application simply aren’t properly set up.

How to prevent it

Redundancy

Reducing the likelihood of problems occurring on a single piece of hardware gets increasingly expensive as you approach 100% reliability – it’s usually more cost-effective to double up on commodity hardware.

“When we started [FriendFeed], we were faced with deciding whether to purchase our own servers, or use one of the many cloud hosting providers out there like Amazon Web Services.

At the time we chose to purchase our own servers. I think that was a big mistake in retrospect. […] it meant we had to maintain them ourselves, and there were times where I’d have to wake up in the middle of the night and drive down to a data centre to fix a problem.”

– Bret Taylor, Facebook CTO

Unfortunately, most of the time building redundancy isn’t as simple as buying another server – you’ve got to handle replication between them – and when users are updating data all the time this gets tricky, because you may have to deal with conflicts and other synchronisation issues. So what’s the solution? Thankfully the cloud is here to save us: by shifting the data off the web servers and onto a cloud data service, the replication and synchronisation are handled for you.

If your web servers are bare and contain nothing but the actual application code, there is no need to back them up because they don’t hold any important information, so you’ll save some cash there. To cut costs as far as possible, you can stick to just one web server, since downtime can be minimised by simply redeploying onto another should it fail; alternatively, two small virtual servers can be deployed with load balancing between them so the application stays online should one fail.
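As a rough illustration of what a “bare” web server looks like in practice, here is a minimal sketch assuming a Flask application and a managed Redis instance – the hostname, route and key names are purely illustrative. All persistent data lives in the external store, so any instance behind a load balancer (or a freshly redeployed one) is interchangeable.

    # A minimal sketch of a "bare" web server, assuming Flask and redis-py.
    # The hostname and route are purely illustrative. All persistent data lives
    # in an external (cloud-hosted) store, never on the web server's own disk.
    from flask import Flask, request, jsonify
    import redis

    app = Flask(__name__)
    store = redis.Redis(host="managed-redis.example.com", port=6379)

    @app.route("/notes/<note_id>", methods=["GET", "PUT"])
    def notes(note_id):
        if request.method == "PUT":
            # Write straight through to the external store.
            store.set(f"note:{note_id}", request.get_data())
            return jsonify(status="saved")
        value = store.get(f"note:{note_id}")
        return (value or b"not found"), (200 if value else 404)

    if __name__ == "__main__":
        app.run()

Because nothing of value lives on the box itself, replacing a failed instance is a redeployment rather than a recovery job.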

Security

Aside from using a host with a secure, protected setup, dealing with any attacks is going to fall to you or a technician, so ensuring that person can locate the problem as quickly as possible is key.

Incoming requests should be logged and monitored for spikes and response time. By using services like Server Density and/or New Relic, you can mitigate problems and attacks while they are ramping up – rather than only discovering them once they’ve crippled your whole setup.
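If you want to see the raw numbers such services work from, the sketch below shows the idea, assuming a Flask application; the 500 ms threshold is purely illustrative. Every request is logged with its response time, and unusually slow responses are flagged so a ramping attack or spike shows up in the logs before it cripples the whole setup.

    # A minimal sketch of request logging with response times, assuming Flask.
    # A hosted monitoring service would normally collect and graph this for you;
    # the 500 ms threshold below is purely illustrative.
    import logging
    import time
    from flask import Flask, g, request

    app = Flask(__name__)
    logging.basicConfig(level=logging.INFO)

    @app.before_request
    def start_timer():
        g.start = time.perf_counter()

    @app.after_request
    def log_request(response):
        elapsed_ms = (time.perf_counter() - g.start) * 1000
        app.logger.info("%s %s %s -> %s in %.1fms", request.remote_addr,
                        request.method, request.path, response.status_code, elapsed_ms)
        # Flag unusually slow responses so a spike is visible while it's still ramping up.
        if elapsed_ms > 500:
            app.logger.warning("Slow response: %s took %.1fms", request.path, elapsed_ms)
        return response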

Using a service with SMS and email notifications will also help you sleep at night, as you won’t be left wondering whether your server is down.

How to mask it

Should part of your system fail, all is not lost – by employing a system that degrades gracefully, you can continue to serve users, at least in some form.

First of all, you should have a maintenance mode for your app, so that should a major bug occur, you can flip the switch and prevent any further damage by serving a static notice page to users. Obviously this makes it clear to users that you’re having problems, so their experience suffers, but that is far better than letting a broken site do further damage.
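One simple way to build that switch, sketched here assuming a Flask application (the flag-file path and notice text are illustrative): creating the file flips the whole app over to a static notice page with a 503 status, and deleting it brings the app back.

    # A minimal sketch of a maintenance-mode switch, assuming Flask.
    # The flag-file path is illustrative: create the file to flip the whole app
    # over to a static notice page, delete it to bring the app back.
    import os
    from flask import Flask

    app = Flask(__name__)
    MAINTENANCE_FLAG = "/etc/myapp/maintenance.flag"

    @app.before_request
    def check_maintenance():
        if os.path.exists(MAINTENANCE_FLAG):
            # Returning a response here short-circuits every request in the app.
            return ("<h1>Down for maintenance</h1>"
                    "<p>We're working on it and will be back shortly.</p>", 503)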

Funnily enough, there is caching sitting directly in everyone’s browsers as part of the HTTP protocol, so make use of it! Most people have forgotten its benefits as internet connections have become faster, but it’s not purely for speed – it’s for redundancy too. By sending the right headers along with your HTML, browsers will store pages temporarily – should the live page become inaccessible, the cached copy can be displayed.
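Sending those headers is a one-liner in most frameworks. A minimal sketch, assuming Flask, with an illustrative max-age of five minutes – within that window the browser may reuse its stored copy of the page without contacting the server at all:

    # A minimal sketch of browser caching via HTTP headers, assuming Flask.
    # The max-age value (five minutes) is illustrative: within that window the
    # browser may reuse its stored copy of the page without contacting the server.
    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/")
    def home():
        response = make_response("<h1>Home page</h1>")
        response.headers["Cache-Control"] = "public, max-age=300"
        return response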

You can take browser caching one step further by implementing a caching server, like Varnish. This sits as an intermediary between your web servers and the user’s browser, storing pages when the headers tell it to. Should your web server(s) fail, the cached page can be served to users instead. This has the added advantage that even dynamic pages can be cached, with the sections that change falling back to a static option.
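The headers aimed at a shared cache are slightly different. A minimal sketch, again assuming Flask: s-maxage tells an intermediary like Varnish how long it may serve its copy, and stale-if-error (RFC 5861) asks caches that support it to keep serving the stale copy if the backend starts failing. Varnish’s exact behaviour is ultimately governed by its VCL configuration (notably its “grace” period), so treat the values below as illustrative.

    # A minimal sketch of headers aimed at a shared cache such as Varnish, assuming Flask.
    # The s-maxage and stale-if-error values are illustrative; Varnish's exact
    # behaviour is governed by its VCL configuration (notably its "grace" period).
    from flask import Flask, make_response

    app = Flask(__name__)

    def render_product_listing():
        # Stand-in for a genuinely dynamic page build (database queries, templates, ...).
        return "<h1>Products</h1>"

    @app.route("/products")
    def products():
        response = make_response(render_product_listing())
        response.headers["Cache-Control"] = "public, s-maxage=600, stale-if-error=86400"
        return response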

Summary

Here are our basic tips for maximising uptime, where appropriate:

  • use pay-as-you-go cloud services to outsource the complexity and expense of redundant setups.
  • multiple commodity servers will usually be more stable than one top-of-the-range server.
  • keep your web servers bare – just store the application locally, nothing more.
  • build a maintenance-mode flip-switch to revert to a static page should a major problem occur.
  • take advantage of caching (HTTP & Varnish).
  • monitor your server with a service like Server Density and log requests to provide peace of mind and mitigate attacks early on.

Of course these solutions don’t suit all applications or budgets, so we always tailor our setups to individual client requirements.