Course Projects
ECE408: CUDA Optimization for LeNet Urbana, IL Fall 2024
Individual Project
  • Streams: Overlapped data transfer with kernel execution by dividing large vectors into segments, so that a kernel runs on one segment while another segment is copied between host and device memory.
  • Kernel Fusion: Implemented convolution as matrix multiplication using three kernels (an unrolling kernel, a shared-memory matrix multiplication kernel, and a permute kernel), then fused the three into a single kernel.
  • High-Level Libraries: Used Tensor Cores via the warp matrix functions (WMMA) and the CUDA Basic Linear Algebra Subprograms (cuBLAS) library.
  • Other Optimizations: Applied further optimizations, including constant memory for the weight matrix, the "__restrict__" keyword, and loop unrolling.
ECE391: Computer Systems Engineering Implementation Urbana, IL Fall 2023
Member of a Four-Person Team
  • Built a Linux-like operating system kernel in C with core features including paged virtual memory, a fully functional IDT and GDT, and an i8259-based interrupt controller.
  • Implemented a file system and device drivers for the Real-Time Clock, keyboard, Programmable Interval Timer, and ATA.
  • Used x86 assembly to establish the system call linkage between user-level programs and the kernel, passing all test cases provided by the course. Also implemented single-CPU task scheduling and switching between multiple terminals.
  • Earned full points on the overall five-checkpoint project.
Awards