Yikun Wu

[EuroSys'24] Volley

In this paper, we design Volley, a network storage protocol with write-read ordering guarantees. Atop Volley, we build V-Cache and V-TriCache for storage and computing scenarios, respectively. Extensive evaluations show that, compared with the traditional NVMe-oF protocol, Volley increases throughput by 1.57×. V-Cache improves throughput by up to 6.84×, and V-TriCache reduces total workload running time by up to 16.7% compared with state-of-the-art systems.
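The summary does not say how Volley enforces its ordering. The sketch below only illustrates what a write-read ordering guarantee means for a client: a read must observe every write to the same block that was issued before it. It is a single-threaded simulation in which completions are triggered by hand. The class and method names are hypothetical and do not describe Volley's actual protocol.

```python
from collections import defaultdict, deque

class WriteReadOrderer:
    """Hypothetical per-block ordering: a read waits for all writes issued before it."""

    def __init__(self):
        self.storage = {}                   # block -> data, stands in for the remote device
        self.issued = defaultdict(int)      # writes issued per block
        self.completed = defaultdict(int)   # writes completed per block
        self.inflight = defaultdict(deque)  # data of in-flight writes, in issue order
        self.waiting = defaultdict(deque)   # deferred reads: (tag, writes they depend on)
        self.served = []                    # (tag, value) in the order reads are answered

    def write(self, block, data):
        self.issued[block] += 1
        self.inflight[block].append(data)

    def read(self, block, tag):
        need = self.issued[block]           # the read depends on every write issued so far
        if self.completed[block] >= need:
            self.served.append((tag, self.storage.get(block)))
        else:
            self.waiting[block].append((tag, need))

    def complete_write(self, block):
        # Writes to one block complete in issue order in this model.
        self.storage[block] = self.inflight[block].popleft()
        self.completed[block] += 1
        queue = self.waiting[block]
        while queue and queue[0][1] <= self.completed[block]:
            tag, _ = queue.popleft()
            self.served.append((tag, self.storage[block]))
```

For example, the sequence write `v1`, read `r1`, write `v2`, read `r2` on one block answers `r1` with `v1` and `r2` with `v2`, even though neither write has completed when the reads are issued.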

2025-07-20

#EuroSys

[FAST'24] TeRM

In this paper, we present TeRM, an efficient approach to extending RDMA-attached memory with SSDs. It onloads exception handling (i.e., RNIC page faults) from hardware to software. Experimental results on microbenchmarks and on unmodified RDMA-based storage systems demonstrate the effectiveness of TeRM.
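To make "onloading" concrete, the sketch below models memory extended with an SSD. A page that is not resident is detected and fetched by software on the access path, instead of the access raising a hardware page fault. The summary does not describe how TeRM intercepts accesses or interacts with the RNIC, so this is not TeRM's design. The LRU DRAM cache, the 4 KiB page size, and all names are assumptions.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes

class SoftwareFaultMemory:
    """Hypothetical model of SSD-extended memory with misses handled in software."""

    def __init__(self, ssd_pages, dram_capacity):
        self.ssd = ssd_pages          # page number -> bytearray, stands in for the SSD
        self.dram = OrderedDict()     # resident pages, least recently used first
        self.capacity = dram_capacity
        self.faults = 0               # misses handled in software

    def _page(self, pno):
        if pno in self.dram:
            self.dram.move_to_end(pno)            # hit: mark most recently used
            return self.dram[pno]
        self.faults += 1                          # miss: software fault handling
        if len(self.dram) >= self.capacity:
            victim, data = self.dram.popitem(last=False)
            self.ssd[victim] = data               # write the evicted page back to SSD
        if pno in self.ssd:
            page = self.ssd.pop(pno)              # move the page from SSD into DRAM
        else:
            page = bytearray(PAGE_SIZE)           # a never-touched page reads as zeros
        self.dram[pno] = page
        return page

    def read_byte(self, addr):
        return self._page(addr // PAGE_SIZE)[addr % PAGE_SIZE]

    def write_byte(self, addr, value):
        self._page(addr // PAGE_SIZE)[addr % PAGE_SIZE] = value
```

With a two-page DRAM cache, touching three pages evicts the first to the "SSD". A later read of that page is served correctly after a fourth software-handled fault.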

2025-07-20

#FAST

[FAST'25] GogetaFS

We propose GogetaFS, a novel DedupFS that leverages the file system's mature I/O path and crash-consistency mechanism to improve deduplication throughput. The key insight is to build the DedupFS around logical-fingerprint-physical (LFP) mapping, a novel technique that merges the deduplication FP2P (fingerprint-to-physical) entry with the file system's L2P (logical-to-physical) entry. We implement and evaluate GogetaFS on both persistent memory (PM) and (emulated) ultra-low-latency (ULL) SSD platforms. The results suggest that GogetaFS outperforms existing DedupFSes, sometimes by an order of magnitude (e.g., on the SSD platform), while keeping deduplication metadata maintenance overhead to a minimum.
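The sketch below shows one plausible reading of a merged L2P/FP2P mapping. Each logical block points directly at its fingerprint entry, and that entry holds the physical address and a reference count. Remapping a logical block is therefore a single metadata update rather than separate L2P and FP2P updates. GogetaFS's real on-media layout, fingerprint function, and crash-consistency handling are in the paper. SHA-256, the in-memory dicts, and the class names here are assumptions.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class FPEntry:
    fp: bytes       # content fingerprint (SHA-256 in this sketch)
    paddr: int      # physical block that stores the unique content
    refcount: int   # number of logical blocks mapped to this entry

class MergedMappingFS:
    """Hypothetical DedupFS where the L2P entry resolves through the fingerprint entry."""

    def __init__(self):
        self.fp_index = {}   # fingerprint -> FPEntry, used to detect duplicates
        self.l2p = {}        # (file, logical block) -> FPEntry, the merged mapping
        self.blocks = {}     # physical address -> data
        self.next_paddr = 0

    def write(self, file, lblock, data):
        fp = hashlib.sha256(data).digest()
        entry = self.fp_index.get(fp)
        if entry is None:                       # unseen content: allocate a block
            entry = FPEntry(fp, self.next_paddr, 0)
            self.blocks[entry.paddr] = data
            self.next_paddr += 1
            self.fp_index[fp] = entry
        entry.refcount += 1                     # take the new reference first
        old = self.l2p.get((file, lblock))
        self.l2p[(file, lblock)] = entry        # one update covers L2P and FP2P
        if old is not None:
            self._release(old)

    def _release(self, entry):
        entry.refcount -= 1
        if entry.refcount == 0:                 # last reference gone: free the block
            del self.blocks[entry.paddr]
            del self.fp_index[entry.fp]

    def read(self, file, lblock):
        return self.blocks[self.l2p[(file, lblock)].paddr]
```

The reference is taken before the old one is released. Because of that ordering, rewriting a block with identical content never frees the block it still points to.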

2025-07-20

#FAST

[OSDI'23] Chardonnay

This paper presents Chardonnay, a scale-out, general-purpose, multi-versioned, on-disk transactional key-value store optimized for single-datacenter deployments with fast 2PC. Chardonnay takes advantage of fast RPCs to support strictly serializable snapshot reads without relying on specialized clocks or assumptions about maximum clock skew. Chardonnay achieves high performance for high-contention workloads by automatically and transparently loading and pinning data from slow storage to main memory prior to acquiring any locks, and avoids deadlocks by ordering its lock requests. We believe that the design principles of Chardonnay can also be applied in other settings, such as multi-core single-node systems for high-contention workloads.
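The two concurrency ideas in this summary can be shown in a single process, in the spirit of the multi-core single-node setting mentioned above. First, load and pin every key a transaction touches before taking any lock, so no lock is held while waiting on slow storage. Second, acquire locks in one global (sorted) order, so no cycle of waiting transactions, and hence no deadlock, can form. This is a minimal sketch, not Chardonnay's distributed 2PC implementation. The `Store` class, `run_transaction`, and the `update` callback are hypothetical.

```python
import threading

class Store:
    """Hypothetical single-node store: prefetch and pin first, then ordered locking."""

    def __init__(self, disk):
        self.disk = disk                                   # key -> value on "slow storage"
        self.memory = {}                                   # pinned in-memory copies
        self.locks = {k: threading.Lock() for k in disk}

    def run_transaction(self, keys, update):
        # Phase 1: load and pin every key before taking any lock.
        for k in keys:
            self.memory.setdefault(k, self.disk[k])
        # Phase 2: lock in sorted key order so waits can never form a cycle.
        ordered = sorted(set(keys))
        for k in ordered:
            self.locks[k].acquire()
        try:
            new_values = update({k: self.memory[k] for k in ordered})
            self.memory.update(new_values)
        finally:
            for k in reversed(ordered):
                self.locks[k].release()
```

Two threads that repeatedly transfer between keys `a` and `b` in opposite directions both lock `a` first. Without the sorting step, the same workload could deadlock.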

2025-07-20

#OSDI

[OSDI'23] SMART

Based on a thorough theoretical and experimental analysis of tree indexes built on disaggregated memory (DM), this paper identifies the performance bottleneck of B+ trees on DM, namely severe read and write amplification, and then presents SMART, the first radix-tree-based index on DM. SMART addresses the challenges of applying ART (adaptive radix tree) on DM with a hybrid concurrency control scheme that reduces lock overhead and avoids cache thrashing, a read-delegation and write-combining technique that reduces redundant I/Os, and a tailored cache validation mechanism. Our evaluation shows that SMART outperforms the state-of-the-art B+ tree on DM by up to 6.1× under write-intensive workloads and 2.8× under read-only workloads.
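Read delegation and write combining both reduce redundant remote I/Os by merging concurrent requests for the same key on the compute side. The batch-level simulation below is a minimal sketch of that effect, not SMART's code. A dict stands in for remote memory. Readers of a key share one delegated remote read. Writes to a key collapse into one remote write carrying the last value. A read that follows a write in the same batch is answered locally. These batch semantics are assumptions made for illustration.

```python
def combine(requests, remote):
    """Hypothetical combiner for one batch of (op, key, value) requests.

    Returns the remote operations actually issued and one answer per request
    (None for writes)."""
    remote_ops = []
    answers = []
    fetched = {}      # key -> value from the single delegated remote read
    last_write = {}   # key -> latest value written in this batch
    for op, key, value in requests:
        if op == "write":
            last_write[key] = value
            answers.append(None)
        elif key in last_write:           # read after a write in the batch: served locally
            answers.append(last_write[key])
        else:
            if key not in fetched:        # first reader fetches, later readers reuse it
                fetched[key] = remote[key]
                remote_ops.append(("read", key))
            answers.append(fetched[key])
    for key, value in last_write.items(): # one remote write per key, last value wins
        remote[key] = value
        remote_ops.append(("write", key))
    return remote_ops, answers
```

In the test batch, six client requests turn into three remote operations: one read each for `k` and `j`, and one combined write for `k`.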

2025-07-20

#OSDI

[OSDI'24] A Tale of Two Paths

We present Atlas, a hybrid dataplane that provides efficient far-memory access for both bulk data and scattered objects at the same time. Atlas outperforms both state-of-the-art object-based and paging-based far memory systems.
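The sketch below shows why a hybrid dataplane helps. Dense pages benefit from one bulk page transfer, whereas paging in a whole page to reach one small object wastes bandwidth. Atlas's actual profiling and path-switching mechanism is described in the paper. The density metric, the 0.5 threshold, the 4 KiB page size, and the function names are assumptions, not Atlas's policy.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def choose_path(accessed_bytes, threshold=0.5):
    """Hypothetical policy: paging path if the page is densely used, object path otherwise."""
    return "paging" if accessed_bytes / PAGE_SIZE >= threshold else "object"

def plan(pages, threshold=0.5):
    """pages: page id -> sizes of the objects accessed on that page."""
    return {pid: choose_path(sum(sizes), threshold) for pid, sizes in pages.items()}

def amplification(sizes, path):
    """Bytes moved over the network per byte the application actually used."""
    used = sum(sizes)
    return PAGE_SIZE / used if path == "paging" else 1.0
```

A page on which only one 64-byte object is accessed would move 64× more data than needed on the paging path, so it goes to the object path. A page that is mostly in use is fetched whole in a single request.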

2025-07-20

#OSDI

[FAST'20] An Empirical Guide to Persistent Memory

This paper has described the performance of Intel’s new Optane DIMMs across micro- and macro-level benchmarks. In doing so, we have extracted actionable guidelines for programmers to fully utilize these devices’ strengths. The devices have performance characteristics that lie between those of traditional storage and memory devices, yet they also exhibit interesting performance pathologies. We believe that the devices will be useful in extending the quantity of memory available and in providing low-latency storage.
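One widely cited guideline from this paper is to avoid random accesses smaller than 256 bytes, because the Optane media internally reads and writes whole 256-byte lines (XPLines). The arithmetic below estimates the resulting amplification for an access of a given offset and size. It assumes every touched 256-byte line is accessed in full and ignores the on-DIMM buffer that can merge adjacent small writes, so treat it as a rough upper-bound sketch.

```python
XPLINE = 256  # Optane DIMM internal access granularity in bytes

def device_bytes(offset, size):
    """Bytes the media touches for an access, assuming whole 256-byte lines."""
    first = offset // XPLINE
    last = (offset + size - 1) // XPLINE
    return (last - first + 1) * XPLINE

def amplification(offset, size):
    """Media bytes touched per byte requested."""
    return device_bytes(offset, size) / size
```

A 64-byte access inside one line touches 256 bytes (4× amplification). A 128-byte access that straddles a line boundary touches two lines (512 bytes), which is also 4×. This is why the guideline favors aligned accesses of at least 256 bytes.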

2025-07-20

#FAST