http://logic.pdmi.ras.ru/midas/?q=en/courses#largedata
We present algorithms and data structures for solving efficiently problems on very large data sets (of the order of terabytes or even petabytes). We start with a discussion of why standard algorithms and data structures are not suitable for solving problems on such massive data sets. This will motivate the need to develop algorithms that fully exploit the memory hierarchies of today's computers. We discuss paradigms, algorithms, and data structures for a two-level hierarchy; these algorithms are efficient for solving problems on massive data sets. We will also discuss the concept of cache-obliviousness and the most important paradigms for designing cache-oblivious algorithms. In summary, cache-oblivious algorithms are standard internal-memory algorithms that layout their data structures and access them in a fashion that helps standard paging algorithms to perform few page transfers (between virtual memory and main memory, or between main memory and L2 cache, etc ...). This leads to a clean model to design algorithms that automatically adapt to all levels of a multi-level hierarchy.