Big data processing with Disco

Those who deal with big data probably know about Disco – a distributed computing framework aimed to provide a MapReduce platform for big data processing Python applications. We are proud to say that we are one of the largest users of Disco in the Netherlands.

As an owner of multiple high-traffic portals with lots of content served by CDN providers we want to ensure that the data on our portals loads fast. We have recently rolled out a Disco based solution that helps us to deal with this issue and continuously monitors availability and load performance of our content.

We put the basics of MapReduce paradigm, key points of writing MapReduce jobs with Disco and highlights of our solution together into one workshop. We showed participants how easy it is to develop their own big data applications that process a billion samples per day.

If the topic sounds attractive to you – you’re welcome to have a look at the slides we used during the workshop: