Abstract
As AI and big data applications scale in model complexity and dataset size, the bottleneck has shifted from computation to data movement across compute, memory, and storage layers. This talk presents a cross-layer, co-designed approach to accelerating big data workloads by rethinking memory hierarchies and compute accelerators. I will discuss how my group addresses this challenge through novel memory systems (e.g., SCM-aware graph layout and SSD-augmented memory via SyncIO-Swap) and compute accelerators (e.g., KV cache scheduling for LLMs and PIM-assisted ANN search). Through the unified lens of minimizing unnecessary data transfers, this work demonstrates significant speedups and improved resource efficiency across diverse platforms, positioning co-design as a central strategy for next-generation system architecture.