Academic and industrial participants present research and experience papers that cover the full range of theory . USENIX NSDI, 2021 Acceptance Rate: 15.99% Fluid: Resource-Aware Hyperparameter Tuning Engine P. Yu*, J. Liu*, M. Chowdhury (*Equal contribution) MLSys, 2021 Acceptance Rate: 23.53% NetLock: Fast, Centralized Lock Management Using Programmable Switches Z. Yu, Y. Zhang, V. Braverman, M. Chowdhury, X. Jin ACM SIGCOMM, 2020 Acceptance Rate: 21.6% Compared to a state-of-the-art fuzzer, Fluffy improves the fuzzing throughput by 510 and the code coverage by 2.7 with various optimizations: in-process fuzzing, fuzzing harnesses for Ethereum clients, and semantic-aware mutation that reduces erroneous test cases. My paper has accepted to appear in the EuroSys2020; I will have a talk at the Hotstorage'19; The Paper about GCMA Accepted to TC; Erhu Feng, Xu Lu, Dong Du, Bicheng Yang, and Xueqiang Jiang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Yubin Xia, Binyu Zang, and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. Distributed systems are notoriously hard to implement correctly due to non-determinism. We built an FPGA prototype of the nanoPU fast path by modifying an open-source RISC-V CPU, and evaluated its performance using cycle-accurate simulations on AWS FPGAs. Researchers from the Software Systems Laboratory bagged a Best Paper Award at the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2021). Taking place in Carlsbad, CA from 11-13 July, OSDI is a highly selective flagship conference in computer science, especially on the topic of computer systems. We implement a variant of a log-structured merge tree in the storage device that not only indexes file objects, but also supports transactions and manages physical storage space. Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. Research Impact Score 9.24. . Existing decentralized systems like Steemit, OpenBazaar, and the growing number of blockchain apps provide alternatives to existing services. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. Horcrux-compliant web servers perform offline analysis of all the JavaScript code on any frame they serve to conservatively identify, for every JavaScript function, the union of the page state that the function could access across all loads of that page. Log search and log archiving, despite being critical problems, are mutually exclusive. Alas, existing profiling techniques incur high overhead when used to identify data locality problems and cannot be deployed in production, where programs may exhibit previously-unseen performance problems. We first introduce two new hardware primitives: 1) Guarded Page Table (GPT), which protects page table pages to support page-level secure memory isolation; 2) Mountable Merkle Tree (MMT), which supports scalable integrity protection for secure memory. You must not improperly identify a PC member as a conflict if none of these three circumstances applies, even if for some other reason you want to avoid them reviewing your paper. The key insight in blk-switch is that Linux's multi-queue storage design, along with multi-queue network and storage hardware, makes the storage stack conceptually similar to a network switch. KEVIN combines a fast, lightweight, and POSIX compliant file system with a key-value storage device that performs in-storage indexing. Reviews will be available for response on Wednesday, March 3, 2021. She developed the technology for making network routing self-stabilizing, largely self-managing, and scalable. . A PC member is a conflict if any of the following three circumstances applies: Institution: You are currently employed at the same institution, have been previously employed at the same institution within the past two years (not counting concluded internships), or are going to begin employment at the same institution during the review period. In experiments with real DL jobs and with trace-driven simulations, Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers, even when they are provided with ideal resource and training configurations for every job. Evaluations show that Vegito can perform 1.9 million TPC-C NewOrder transactions and 24 TPC-H-equivalent queries per second simultaneously, which retain the excellent performance of specialized OLTP and OLAP counterparts (e.g., DrTM+H and MonetDB). Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding, University of California, Santa Barbara. Forgot your password? However, the existing one-size-fits-all GNN implementations are insufficient to catch up with the evolving GNN architectures, the ever-increasing graph size, and the diverse node embedding dimensionality. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library (3.02 faster on average) and NeuGraph (up to 4.10 faster), on mainstream GNN architectures across various datasets. Performance experiments show that GoNFS provides similar performance (e.g., at least 90% throughput across several benchmarks on an NVMe disk) to Linuxs NFS server exporting an ext4 file system, suggesting that GoJournal is a competitive journaling system. Her specialties include network routing protocols and network security. Proceedings Front Matter Session Chairs: Sebastian Angel, University of Pennsylvania, and Malte Schwarzkopf, Brown University, Ishtiyaque Ahmad, Yuntian Yang, Divyakant Agrawal, Amr El Abbadi, and Trinabh Gupta, University of California Santa Barbara. A graph embedding is a fixed length vector representation for each node (and/or edge-type) in a graph and has emerged as the de-facto approach to apply modern machine learning on graphs. Even the little publishable OS work that is not based on Linux still assumes the same simplistic hardware model (essentially a multiprocessor VAX) that bears little resemblance to modern reality. Submissions may include as many additional pages as needed for references but not for appendices. Title Page, Copyright Page, and List of Organizers | We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. This paper presents Dorylus: a distributed system for training GNNs. He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services. First, GNNAdvisor explores and identifies several performance-relevant features from both the GNN model and the input graph, and use them as a new driving force for GNN acceleration. Camera-ready submission (all accepted papers): 15 Mars 2022. We propose PET, the first DNN framework that optimizes tensor programs with partially equivalent transformations and automated corrections. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, Oort: Efficient Federated Learning via Guided Participant Selection, PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections, Modernizing File System through In-Storage Indexing, Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes, Rearchitecting Linux Storage Stack for s Latency and High Throughput, Optimizing Storage Performance with Calibrated Interrupts, ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction, DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling, CLP: Efficient and Scalable Search on Compressed Text Logs, Polyjuice: High-Performance Transactions via Learned Concurrency Control, Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing, The nanoPU: A Nanosecond Network Stack for Datacenters, Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator, Scalable Memory Protection in the PENGLAI Enclave, NrOS: Effective Replication and Sharing in an Operating System, Addra: Metadata-private voice communication over fully untrusted infrastructure, Bringing Decentralized Search to Decentralized Services, Finding Consensus Bugs in Ethereum via Multi-transaction Differential Fuzzing, MAGE: Nearly Zero-Cost Virtual Memory for Secure Computation, Zeph: Cryptographic Enforcement of End-to-End Data Privacy, It's Time for Operating Systems to Rediscover Hardware, DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols, GoJournal: a verified, concurrent, crash-safe journaling system, STORM: Refinement Types for Secure Web Applications, Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation, SANRAZOR: Reducing Redundant Sanitizer Checks in C/C++ Programs, Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads, GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs, Marius: Learning Massive Graph Embeddings on a Single Machine, P3: Distributed Deep Graph Learning at Scale. Conference site 49 papers accepted out of 251 submitted. USENIX discourages program co-chairs from submitting papers to the conferences they organize, although they are allowed to do so. Dorylus is up to 3.8 faster and 10.7 cheaper compared to existing sampling-based systems. The co-chairs may then share that paper with the workshops organizers and discuss it with them. The chairs will review paper conflicts to ensure the integrity of the reviewing process, adding or removing conflicts if necessary. We implement and evaluate a suite of applications, including MICA, Raft and Set Algebra for document retrieval; and we demonstrate that the nanoPU can be used as a high performance, programmable alternative for one-sided RDMA operations. We present NrOS, a new OS kernel with a safer approach to synchronization that runs many POSIX programs. Consensus bugs are extremely rare but can be exploited for network split and theft, which cause reliability and security-critical issues in the Ethereum ecosystem. The key insight guiding our design is computation separation. One classical approach is to increase the efficiency of an allocator to minimize the cycles spent in the allocator code. will work with the steering committee to ensure that the symposium program will accommodate presentations for all accepted papers. Existing frameworks optimize tensor programs by applying fully equivalent transformations, which maintain equivalence on every element of output tensors. Session Chairs: Moshe Gabel, University of Toronto, and Joseph Gonzalez, University of California, Berkeley, John Thorpe, Yifan Qiao, Jonathan Eyolfson, and Shen Teng, UCLA; Guanzhou Hu, UCLA and University of Wisconsin, Madison; Zhihao Jia, CMU; Jinliang Wei, Google Brain; Keval Vora, Simon Fraser; Ravi Netravali, Princeton University; Miryung Kim and Guoqing Harry Xu, UCLA. USENIX new Date().getFullYear()>document.write(new Date().getFullYear()); Grants for Black Computer Science Students Application, Propose an interesting, compelling solution, Demonstrate the practicality and benefits of the solution, Clearly describe the paper's contributions, Clearly articulate the advances beyond previous work. In addition, increasing CPU core counts further complicate kernel development. We observe that scalability challenges in training GNNs are fundamentally different from that in training classical deep neural networks and distributed graph processing; and that commonly used techniques, such as intelligent partitioning of the graph do not yield desired results. See the Preview Session page for an overview of the topics covered in the program. We demonstrate the above using design, implementation and evaluation of blk-switch, a new Linux kernel storage stack architecture. The NVMe zoned namespace (ZNS) is emerging as a new storage interface, where the logical address space is divided into fixed-sized zones, and each zone must be written sequentially for flash-memory-friendly access. The full program will be available in May 2021. See www.cs.cmu.edu/~mmv/Veloso.html for her scientific publications. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. Widely used log-search tools like Elasticsearch and Splunk Enterprise index the logs to provide fast search performance, yet the size of the index is within the same order of magnitude as the raw log size. His work has included the Barrelfish multikernel research OS, as well as work on distributed stream processors, and using formal specifications to describe the hardware/software interfaces of modern computer systems. Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress, and reveals a new opportunity for reducing DL cost in cloud environments. Instead, we propose addressing the root cause of the heuristics problem by allowing software to explicitly specify to the device if submitted requests are latency-sensitive. Welcome to the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI '22) submissions site. Session Chairs: Deniz Altinbken, Google, and Rashmi Vinayak, Carnegie Mellon University, Tanvir Ahmed Khan and Ian Neal, University of Michigan; Gilles Pokam, Intel Corporation; Barzan Mozafari and Baris Kasikci, University of Michigan. Zeph executes privacy-adhering data transformations in real-time and scales to thousands of data sources, allowing it to support large-scale low-latency data stream analytics. A scientific paper consists of a constellation of artifacts that extend beyond the document itself: software, hardware, evaluation data and documentation, raw survey results, mechanized proofs, models, test suites, benchmarks, and so on. Calibrated interrupts increase throughput by up to 35%, reduce CPU consumption by as much as 30%, and achieve up to 37% lower latency when interrupts are coalesced. Third, GNNAdvisor capitalizes on the GPU memory hierarchy for acceleration by gracefully coordinating the execution of GNNs according to the characteristics of the GPU memory structure and GNN workloads.