Putting people and the planet before profit, with nearly 100 stores across the UK, Ireland and the US. From humble beginnings as a kitchen table start-up in 1992, JoJo Maman Bébé offers gorgeous, quirky yet practical designs and excellent customer service.
It was of course imperative that all pages were being loaded from the cache that needed to - an uncached page adds load to the database, as well as the web server.
In order to test the performance of the site, so we could benchmark and show what effect any optimisations we made were having, we needed to simulate the high levels of traffic the website was receiving.
On the live site at non-busy times, up to a point, we could simulate where the site ground to a halt. This allowed us to rule out various issues, such as the hardware not functioning properly. In order to monitor and break down exactly how long pages took to load, we used a third-party software analytics system called New Relic.
The process of benchmarking the site included making small, incremental changes to see what effect they had on the page load speeds. Bottlenecks in the database and MySQL were discovered, which were taking too long to query.
This was optimised, improving performance of MySQL by 100% (a query speed of 2 sec reduced to 1 sec). This was a good first step, but the problem was still on the live site, and at a higher level with double the amount of traffic.
BlazeMeter (a load testing platform & benchmark system) allowed us to make customer journeys through the site (what users looked at, when they left the site, etc) and simulate those journeys; we could then program these sessions randomly and use them to test the site.
This turned out to be a critical element in the problem solving process, as the tests being run were looking at the cached version of the page without the hole punching element being taken into consideration, dramatically distorting the test data being generated. Tests were altered to include the hole punched requests, making them more realistic.
Using the now realistic test data, we utilised a system called NetData (a server monitoring system) that gives real time statistics on server performance, that helped us look at the way memory & hard drives are being used, ruling them out as bottlenecks. We created a simulated (staging) environment that was identical to the live site. This allowed us to test uninhibited, without impacting on the working site.
The new testing allowed us to identify that the MySQL deadlock and query cache locking errors were occurring within the core database.
Database locking prevents simultaneous access to data in a database by locking tables - when a table is being written to, it's locked until that action has been completed to stop data being overwritten and becoming redundant. A problem that can result from this is deadlocking – when orders of two or more transactions are waiting for one another to give up locks in order to complete that action, and they become gridlocked.
The same system applies to query caching in MySQL - when a query goes through the database, the system checks to see if there's a cached version for the exact same query. If there is, it reuses it, saving time and resources - a valuable tool. But these queries also have to wait their turn; if another query is currently accessing the query cache, it is then locked until that query has finished.
By editing the core database files, we debugged and identified the database locking errors as and when they occurred. Once errors were caught, the database was set-up to run a loop and retry processing the order up to five times. This fixed nearly all cases of database locking errors, making a huge difference in performance.
By reducing the amount of time each query took in the query cache, many more users could get through. Certain offending queries in the cache were also running 3/4million queries per minute. By disabling the query cache on specific queries, the performance of the site was 40x faster; by writing a Magento extension that altered these top offending queries - top 5 highest volume queries - we managed to make the Add to Cart Page load in 1/4 of the time it took before (the slowest loading page took 6 secs before, and 2 secs after).