About one and a half years ago we created MySQL StatsD to keep get insights of any MySQL server using a local daemon that frequently queries the MySQL server and pushes its data to StatsD locally. In the past year we have seen the usage of the MySQL StatsD project increase and more and more people use it happily.
Of course there is always something lacking with projects like this, so in our latest release (0.1.3) we included/fixed the following:
Multiple bufferpool support (get your stats on the individual bufferpools)
New stats type called Delta (this time documented and included in the example config)
Fixed issue 30: MySQL StatsD crashed when replication broke
Added documentation on configuration and its directives
If you feel the urge to fix something or contribute to the project, you are more than welcome to help us out!
At Spil Games we organise a monthly hackday (actually the last two Thursday/Friday of the month) where several teams try to hack something in limited amount of time. The only rules for the hackday are: make something cool, keep it simple, document it, learn something and have fun. In general it means time is too short so we really have to go for it. 😉
Last week the August hackday(s) happened on Thursday and Friday and me and my team decided (last minute) to join in with a research project on MaxScale.
What was there to hack?
In our current database connectivity strategy we (as in Spil Games) use the load balancers+routers to connect to our new MySQL database clusters. This has the advantage that we can control per host if they are allowed to access the database or not. At the same time this has the drawback that a lot of administration is necessary to make this happen on the router and thanks to the load balancers masking the original client host on the database we can’t use the normal MySQL grants to filter per host level. Also on the client host the administration of databases has to be done in the application configuration file for both database server and user. Config management in Spil is not the easiest, so if we could simplify this by making the database configurations easier it would help a lot with deployments and setting up similar environments for any project.
We want to research the feasibility of using the MaxScale MySQL proxy inside Spil Games instead of the current load balancer+router approach. MaxScale can be set up both as a central proxy or as a local(host) proxy and we like to have tested. MaxScale also features the loading of the grants on start up and this has to be tested for multiple databases at the same time. Also the security needs to be tested: it could be dangerous if MaxScale is too permissive (e.g. allowing resources to be accessed where it should not).
For the central proxy we need to find out how to set it up as a highly available solution. (floating ips, pacemaker, etc)
For the local proxy we need to find a way to automatically configure MaxScale (puppet) to access the right resources and we also need to test its maintainability through its api, telnet and http access.
Second of all, the questions I got: Q: Why first copy all the data to a new MySQL server using innobackupex and then perform the mysqldump?
This is a question regarding the consolidation of multiple existing asynchronous replicated clusters to a new Galera cluster.
In the slides I showed we use an active-inactive Master-Master setup where one of the MySQL masters is receiving all write-traffic while the inactive master is receiving read-traffic. If we would perform the mysqldump on the inactive master we either have to drain the inactive master from read-traffic and stop replication or it will lock the tables and we would not have a frozen snapshot.
A related question was asked why we do not use innobackupex to feed the backup to the Galera cluster and then create the cluster from this as a starting point. That could be done for the first node, however we wish to consolidate multiple clusters into one Galera cluster we have to ensure the data gets replicated into the new cluster online. Therefore mysqldump is the only viable solution here.
Q: Why are you using MMM?
This is a choice we made five years ago. It worked fine enough for us and we stuck to it till today. We do know it is flawed (some say be design) and we know it has a lot of drawbacks and it is actually one of the drivers to start using Galera. 😉
Q: Why don’t you expect clashes when writing the same data twice at the same time?
In our sharded environment (the Spil Storage Platform) will never write the same data twice as every piece of data that is sharded by user, function and location will have its own owner process in this platform. This means there will never be a second process writing the same piece of data. In other words: our environment allows us to isolate writes and we never expect clashes.
In our other (current) environments the number of writes is low, so the chance of a clash will be low.
If you have any other question, don’t hesitate to reach out to us or place a comment below.
It is no secret that Spil Games is a heavy user of Sphinx Search. We use it in many ways including game-search, profile-search and since a few months ago to even build our category and subcategory listings. In all cases we do not use it as an extension of MySQL but rather as a standalone daemon facilitating listings of (document) identifiers.
As 2013 progressed towards X-mas we saw the utilization of our category/subcategory Sphinx cluster sky-rocketing which caused the response times to increase heavily. During peak hours we performed about 500 queries per second with response times in milliseconds while sometimes all of a sudden the response times of the application would go up near sub-second response times. We quickly added response time capturing inside the application and compared it against the load spikes on the Sphinx hosts:
One of the major contributors to the load increase was the indexing process. Just like the Sphinx search daemon this indexing process is multi-threaded and this means it will suck up all idle cpu time of all cpu cores. Coincidentally this covered about 80% of the load spikes. This meant for the first ever we had to fight with a genuinely multi-threaded application.
That is one of Spil Games Engineering’s principles and it is really something I embrace. Almost one and a half year ago, Engineering started to use StatsD and Graphite for collecting and graphing performance metrics and one year ago I got fed up that there was, except for a couple of abandoned projects, no real drop in solution for getting a daemon sending performance metrics via StatsD into Graphite . So I created my own daemon on a Friday afternoon in Python and it was indeed done and not perfect.
Those who deal with big data probably know about Disco – a distributed computing framework aimed to provide a MapReduce platform for big data processing Python applications. We are proud to say that we are one of the largest users of Disco in the Netherlands.
As an owner of multiple high-traffic portals with lots of content served by CDN providers we want to ensure that the data on our portals loads fast. We have recently rolled out a Disco based solution that helps us to deal with this issue and continuously monitors availability and load performance of our content.
We put the basics of MapReduce paradigm, key points of writing MapReduce jobs with Disco and highlights of our solution together into one workshop. We showed participants how easy it is to develop their own big data applications that process a billion samples per day.
Speaking about MySQL UG NL Meetups – during the Q1 Meetup in February (hosted by Spil Games) one of our DBAs gave a presentation on database TCO (Total Cost of Ownership). We would like to share it here.
Apart from covering the topic of determining database maintenance costs the presentation included some ideas on possible improvements as well as findings of our own small case study. The slides can be found here: http://spil.com/databasetco
Next Friday (31st of May) the second MySQL Meetup User Group NL of this year will be hosted by Snow IT in Geldermalsen. It is great to see that various companies are hosting the meetup and that the diversity and number of people attending is increasing. In total three presentations will be given:
Choosing the Best Sharding Policy – Doran Levari (ScaleBase, using a video link)
Performance Monitoring with Statsd and Graphite – Art van Scheppingen (Spil Games)
Basic MySQL performance tuning for sysadmins – Daniël van Eeden (Snow)