Virtualization has been one of the predominant topics in the past couple of years. Not going to the cloud is considered to be uncool – at minimum it feels like being stuck in the stone age of computing.
What does going to the cloud actually mean? First of all it means "virtualization". This is cool from many points of view, but what does it mean for performance? It was clear from the start that virtualization usually does performance no good.
But the impact of virtualization can actually be quite shocking, as the following profiling output of a PostgreSQL instance shows (we used Xen for virtualization):
 samples   pcnt  function                        DSO
 _______   ____  ______________________________  _____________________
21026.00  51.1%  hypercall_page                  [kernel.kallsyms]
 1577.00   3.8%  heap_hot_search_buffer          /home/hs/bin/postgres
  886.00   2.2%  AllocSetAlloc                   /home/hs/bin/postgres
  741.00   1.8%  SearchCatCache                  /home/hs/bin/postgres
  608.00   1.5%  LWLockAcquire                   /home/hs/bin/postgres
  586.00   1.4%  copy_user_generic_string        [kernel.kallsyms]
  543.00   1.3%  xen_local_clock                 [kernel.kallsyms]
  506.00   1.2%  system_call                     [kernel.kallsyms]
  502.00   1.2%  base_yyparse                    /home/hs/bin/postgres
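A profile like the one above can be gathered with the Linux perf tool. The commands below are an illustrative sketch, not the exact invocation used for our measurement; the sampling duration is an assumption:

```shell
# Sample all CPUs system-wide for 30 seconds (run as root inside the VM)
perf record -a -- sleep 30

# Summarize the samples per function and shared object,
# which yields output in the style shown above
perf report --sort symbol,dso --stdio
```

Because the hypervisor's entry point (hypercall_page) is sampled like any other kernel symbol, its share of the samples shows up directly in the report.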
What you can see is that basically half of the time is burned by virtualization – not by PostgreSQL. Similar results can be observed when comparing virtualized results with non-virtualized ones. On an 8-core AMD box PostgreSQL managed to provide us with 4,500 TPS (pgbench scale factor 10 with 10 concurrent users). Doing the same thing inside a KVM virtual machine made PostgreSQL achieve something in the area of 1,600 TPS.
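For reference, a pgbench run matching that setup (scale factor 10, 10 concurrent clients) can be sketched as follows; the database name and run duration are assumptions, not the values from our benchmark:

```shell
# Initialize a pgbench database at scale factor 10
pgbench -i -s 10 pgbench_test

# Run the benchmark with 10 concurrent clients for 60 seconds;
# pgbench prints the resulting TPS at the end of the run
pgbench -c 10 -T 60 pgbench_test
```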
It is especially important to mention that PostgreSQL was running with "synchronous_commit = off" to reduce fsync-related contention.
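In postgresql.conf this corresponds to the following setting:

```
# postgresql.conf
synchronous_commit = off   # commits return before the WAL is flushed to disk;
                           # a crash may lose the most recent transactions,
                           # but cannot corrupt the database
```

With this setting, commit latency no longer depends on fsync speed, so the remaining performance gap is attributable to the virtualization layer rather than to disk flushing.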
For small-scale applications virtualization is definitely an advantage, because more services can be consolidated on the same hardware. For large-scale PostgreSQL databases, however, virtualization might turn out to be a footgun.