How do you manage database server instances?

Date March 21, 2013

One of the many differences I've noticed with this infrastructure versus the others I've dealt with is that our database instances here aren't centralized at all. If a specific service (such as a website on a specific web server - we don't really centralize those either) needs a database, the database lives on that machine.

One argument is that the administrative overhead of separating services like this is excessive - each database, in our case, is listening on localhost so that external hosts can't connect, so a central host can't perform backups because it can't connect. This means that each machine needs to be responsible for backing up its own database. Basically, if you want to manage N database hosts, you need N instances of that management tool and information.

The opposing argument is that performance is decentralized - the entire infrastructure's database load is spread over a large number of spindles, so no one database can act as a noisy neighbor to the others. Also, failure is dispersed - if one database goes down / fills up / etc, it doesn't take anything else with it.

Personally, I argue for the former. I think that managing one database machine is easier and less error prone - there are fewer moving parts, and we can concentrate on making the few machines that do provide database services more robust. On the other hand, I certainly can't argue against the other side - it's very true that we're spreading the load over more spindles than I could afford to give any one (or two) machines, and to be certain, a failure of one centralized database would take down all services which require access to the machine.

When I was asked what other people did, I said that people tend to centralize, but upon reflection, I realize that I answered only based on my experiences. So who better to ask than you, dear reader. How do you arrange your database servers? Do you have one (or a few) centralized database servers, either standalone or clustered, or do you spread the load like we are currently? Please comment and let me know - I'm interested in hearing about what you do.

Thanks!

  • http://www.twitter.com/furicle furicle

    I wonder is it isn't also about ease-of-setup. Central management can be a choke point for people trying to 'get things done'

    Especially if they are quick one off projects that someone is trying to complete quickly without having to involve others, local db might just be perceived as 'easier' to get going....

  • http://www.heemels.com/ Martijn

    My company spreads the database load to the individual machines, mostly to separate concerns. Keeping a daemon running is simpler than maintaining a distributed system. I see the benefits of both sides though, like you do.

  • Alvin

    We spread the load, more because of easy setup than because of religion or science.

  • http://www.openxtra.co.uk/blog/ Jack Hughes

    Are there any private database cloud options whereby you can get the benefits of both options but without the respective downsides? So, make databases as manageable and management free as the ad-hoc method but without the overhead that a centrally managed database implies and with backups and high availability?

  • http://blogs.glou.org/arnaud/ Arnaud Gomes

    We have been moving from local databases to a single central server in the last few years. We still have a few legacy services running local bases, and of course test hosts also run their own bases.

    A centralized server allows us to run more VMs and containers on our hosting servers; most of our containers run LAMP stacks or something akin, and it doesn't really make sense to run 20 MySQL instances on a single machine if we have 20 containers.

    OTOH the central server may prove to be a performance bottleneck; I must confess I never took the time for real benchmarks, but I begin to suspect some of the performance issues we are seeing on a few web servers might be related. It also makes configuration management more difficult as applications are no longer contained in a single host, so we often have to create databases manually rather than automating the whole deployment process.

    I suspect the centralized server may prove a good idea if/when we switch to a HA architecture with database replication, as it will probably be much easier and more efficient to manage on a single cluster than on a separate cluster per application.

  • John McGrath

    My employer follows the non-centralized approach, but I believe it is more for Compliance issues.

    Personally, my databases are used at such a light load, that I don't worry about it. If one or more of the machines (VM or otherwise) fails, I can restore from a monthly back up with little or no loss of data that is critical.

  • JDL

    Centralized (mostly) - I'm surprised to hear so many non-centralized.

    We do have some applications running with local instances that end up doing their own backups and the usual complications, but these are generally what I would consider very light databases (configuration and minimal user data).

    For us, we're doing a lot of data processing with a variety of applications and the database is acting in many cases almost as Inter Process Communication. One application generates raw data in the database, another processes the data for results, and another takes some action based on results.

  • Jeff Hengesbach

    Classic answer: "It Depends"

    The main criteria I use when considering central placement versus individual installation:
    1) Database size and intended load (disk space, disk IO, CPU, RAM, disk queues, OLTP/OLAP, etc)
    2) Specific version / patch level requirements
    3) Licensing considerations
    4) Backup / Restore parameters
    5) SLA & failure domains
    6) Application Vendor supported configuration

    I tend to favor centralized instances when possible for ease of management and often license benefit, but at the same time any upkeep on the central db server has to take into account the wider impact to dependent services.

  • http://www.burn.co.nz/blog Ben

    Like a few others.. it depends. One place I worked we heavily used an AS/400 for internal business needs (worked very well). But for 3rd party vertical market software we often didn't have a choice about vendors, and ended up with a broad range of databases. Often vendors "support" several databases but the reality was they will only provide support for one.

    Which is how you end up supporting DB2, Sybase, Oracle, MS SQL, MySQL, SQLite, and several ugly things, each running on a separate server. Consolidating just the MS SQL databases would have reduced the complexity significantly, but it was a low priority.

  • http://www.mibus.org/ mibus

    Largely centralised, but still with a lot of sprawl (where we have a database-server VM to handle a database for a single application - particularly for "special" applications - extra load, needs PCI compliance, etc).

  • http://www.innova.uniovi.es/ Kilian

    Leaving aside administrative overhead issues, I think one of the most important reason for centralized infrastructure over other's solutions, is the specific profile (configuration, hardware, etc) required by database servers over other type of servers (web, NAS, etc).

  • http://blogs.glou.org/arnaud/ Arnaud Gomes

    @Kilian: This might be true for large database servers (for some value of "large"), but in practice many databases are used for small-ish web sites, at least where I work. For this kind of workload, any current OS on any current hardware will be good enough.

  • http://www.standalone-sysadmin.com Matt Simmons

    I can understand and tolerate a standalone isolated database server which exists for a small peripheral function, but I still believe that business-critical database servers should exist in a fault tolerant cluster.

    I really appreciate everyone's comments so far, and I welcome other ideas not added, as well.

  • http://www.vbrowncoat.com GS Khalsa

    When I had an environment (consultant now) I went the decentralized route. It made management a little more complex, though I liked the flexibility and lack of one massive failure/choke point. I can see the appeal of either approach though and like many things comes down to preference, ability and experience.

    Thanks for the awesome blog. Keep at it!

    GS

  • http://blog.serverdensity.com David Mytton

    We run 27 MongoDB nodes and push about 17TB of data through each month. At that level, we can't have a single server dealing with it because it's simply outside the capabilities of 1 machine, especially in a majority write environment. We can then optimise the hardware specifically for the database - things like SSDs per database and giving the database all the memory on the server.

  • Bob

    Like others said, It depends. All of the world is not all a huge data site. For the basic LAMP systems I like the compartmentalization of risk of separating the different clients. Therefore most of mine run standalone with a local database. I am paying for the full memory footprint after all. I might as well get to use it. And I like the flexibility of being able to do different things in different places for different reasons. And then downtime on one server-client does not have any affect on any other server-client. This is good for the distributed administration model.

    But I understand the reasoning behind centralization too. It reduces the cost of centralized administration. But it does increase costs elsewhere. It is not without cost. Performance bottlenecks can be created because of it. And then you must beef up the infrastructure accordingly. But certainly if the site is a large one and centrally administered then I would expect that it would be a centralized database and the entire infrastructure would need to be a full HA high performance one. And so we are back to the issue of "it depends".

    I think in the wild there are more standalone local databases. But those tend to be small. I think there are fewer large centralized databases. But those tend to be very large. I would not discount the validity of either for the other.

  • Nathan

    If it's something the ops group is doing, ops has a centralized server. The dev group seems to be unaware of this server even when we tell them they can have at it so each of their VMs seems to get MySQL installed.

  • http://betteradmin.com Brian

    I think the question itself is a red herring. If you think about how a traditional (non-database) application works, they provide a user interface, do a bunch of processing, and store data on disk. You decide how to store the files based on the application requirements, such as performance, redundancy, etc... You don't automatically require that all data files are centrally stored on some kind of central storage server (not talking about a SAN).

    The move to web apps has just spread this structure out a bit, but it's otherwise unchanged. UI is in the browser, processing in the web app server, and data storage in a database. How is this any different than a traditional app? It really isn't at all. Both types of apps do the same things.

    So how do you decide? You look at the app and see what it's requirements are. If it's mission-critical or the load needs to be spread out, that's what you do. If not, then you don't. Whether the data storage interface happens to be a database is really irrelevant.