SEAN I YOUNG, PhD | About Me | Curriculum Vitae | Publications | Google Scholar | E-mail
Foundations of LLM compression - Part 1: Weight quantization. Preprint, 2024. Compression of large
language models (LLMs) has emerged as an important problem to allow language model
deployment on resource-constrained devices, reduce computational costs, and mitigate the
environmental footprint of large-scale AI infrastructure. In this paper, we present the foundations
of LLM quantization from a convex optimization perspective and propose a quantization method
that builds on these foundations and outperforms previous methods. Our quantization framework,
CVXQ, scales to models containing hundreds of billions of weight parameters and provides users
with the flexibility to compress models to any specified model size, post-training. Read more here.
Fully convolutional SVR for single-stack MRI. Proc IEEE CVPR, 2024. In magnetic resonance imaging
(MRI), slice-to-volume reconstruction (SVR) refers to computational reconstruction of an unknown 3D
magnetic resonance volume from stacks of 2D slices corrupted by motion. While promising, current
approaches to SVR require multiple slice stacks for accurate 3D reconstruction, leading to long scans and
limiting their use in time-sensitive applications such as fetal fMRI. Here, we propose an SVR method that
overcomes the shortcomings of previous work and produces state-of-the-art reconstructions in the presence
of extreme inter-slice motion. Inspired by the recent success of single-view depth estimation methods, we
formulate SVR as a single-stack motion estimation task and train a fully convolutional network to predict a
motion stack for a given slice stack, producing a 3D reconstruction as a byproduct of the predicted motion.
Extensive experiments on the SVR of adult and fetal brains demonstrate that our fully convolutional
method is twice as accurate as previous SVR methods. [Paper]
Supervision by denoising. IEEE Trans Pattern Anal Mach Intell, 2023. Learning-based image reconstruction
models, such as those based on the U-Net, require a large set of labeled images if good generalization is to
be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy
are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical
imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the
labels. In this work, we propose supervision by denoising (SUD), a framework that enables us to supervise
reconstruction models using their own denoised output as soft labels. SUD unifies stochastic averaging and
spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and
model weight update steps in an optimization framework for semi-supervision. [Paper]
Transform quantization for CNN compression. IEEE Trans Pattern Anal Mach Intell, 2022. In this work, we
compress convolutional neural networks (CNNs) post-training via transform quantization. CNN quantization
techniques often ignore the joint statistics of weights and activations, producing sub-optimal CNN
performance at a given bit-rate, or consider their joint statistics during training only and do not facilitate
efficient compression of already trained CNN models. The proposed transform quantization framework
unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to
facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first
introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-
distortion optimization problem. We then show that this problem can be solved using optimal bit-depth
allocation following decorrelation by the optimal End-to-end Learned Transform (ELT). [Paper]
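As a rough illustration of the bit-depth allocation idea, the sketch below implements the textbook high-rate allocation rule for independent components with known variances. It is not the paper's method, which also accounts for the joint statistics of weights and activations; the function name `allocate_bits` and the variance model are assumptions made for this example.

```python
import numpy as np

def allocate_bits(variances, avg_bits):
    """Textbook high-rate bit allocation (illustrative, not the paper's algorithm).

    Minimizes sum_i var_i * 2**(-2 * b_i) subject to mean(b_i) == avg_bits,
    giving b_i = B + 0.5 * log2(var_i / geometric_mean(var)). Components whose
    allocation would be negative receive 0 bits and the budget is re-solved
    over the remaining components (reverse water-filling).
    """
    var = np.asarray(variances, dtype=float)
    bits = np.zeros_like(var)
    active = np.ones(var.shape, dtype=bool)
    total = avg_bits * var.size  # total bit budget across all components
    while active.any():
        log_var = np.log2(var[active])
        b = total / active.sum() + 0.5 * (log_var - log_var.mean())
        if (b >= 0).all():
            bits[active] = b
            break
        # Drop components that came out negative and re-solve for the rest.
        idx = np.flatnonzero(active)
        active[idx[b < 0]] = False
    return bits

if __name__ == "__main__":
    # Higher-variance components receive more bits; the average stays at 2.
    print(allocate_bits([16.0, 4.0, 1.0, 1.0], avg_bits=2.0))
```

The allocation is real-valued; a practical quantizer would still need to round the bit depths to integers, which this sketch leaves out.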
Fast optical flow extraction from compressed video. IEEE Trans Image Process, 2020. We propose the fast
optical flow extractor, a filtering method that recovers artifact-free optical flow fields from HEVC-
compressed video. To extract accurate optical flow fields, we form a regularized optimization problem that
considers the smoothness of the solution and the pixelwise confidence weights of an artifact-ridden HEVC
motion field. Solving such an optimization problem is slow, so we first convert the problem into a
confidence-weighted filtering task. By leveraging the already-available HEVC motion parameters, we
achieve a 100-fold speed-up in the running times compared to similar methods, while producing subpixel-
accurate flow estimates. The fast optical flow extractor is useful when video frames are already available in
coded formats. Our method is not specific to a coder, and works with motion fields from video coders such
as H.264/AVC and HEVC. [Paper]
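To give a sense of what confidence-weighted filtering can look like, here is a minimal sketch using normalized convolution with a Gaussian kernel. The kernel choice, the helper names, and the way confidences are supplied are all assumptions for illustration; the abstract does not specify the actual filter or confidence weights used in the paper.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at about 3 sigma and normalized to sum to 1."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur of a 2D array with edge-replicated borders."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def confidence_weighted_filter(flow, confidence, sigma=2.0, eps=1e-8):
    """Normalized convolution: blur(c * u) / blur(c).

    `flow` is one component of a motion field (2D array) and `confidence`
    holds non-negative per-pixel weights of the same shape. Low-confidence
    pixels contribute little and are filled in from confident neighbors.
    """
    return blur(confidence * flow, sigma) / (blur(confidence, sigma) + eps)

if __name__ == "__main__":
    flow = np.full((16, 16), 3.0)
    conf = np.ones_like(flow)
    flow[5, 5], conf[5, 5] = 100.0, 0.0  # a gross outlier marked as unreliable
    print(np.abs(confidence_weighted_filter(flow, conf) - 3.0).max())
```

Because the outlier carries zero confidence, it has no influence on the filtered field, which is the behavior one would want from artifact-ridden motion vectors.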
Non-line-of-sight surface reconstruction using the directional light-cone transform. Proc IEEE CVPR,
2020. We propose a joint albedo-normal approach to non-line-of-sight (NLOS) surface reconstruction
using the directional light-cone transform (D-LCT). While current NLOS imaging methods reconstruct
either the albedo or surface normals of the hidden scene, the two quantities provide complementary
information about the scene, so an efficient method to estimate both simultaneously is desirable. We formulate
the recovery of the two quantities as a vector deconvolution problem, and solve it using the
Cholesky-Wiener decomposition. We show that surfaces fitted non-parametrically using our recovered normals are
more accurate than those produced with recently proposed NLOS surface reconstruction methods, and are
1,000× faster to compute than using inverse rendering. [Paper]
Solving vision problems via filtering. Proc IEEE ICCV, 2019. We propose a new filtering approach for
solving a large number of regularized inverse problems commonly found in computer vision. Traditionally,
such problems are solved by finding the solution to the system of equations that expresses the first-order
optimality conditions of the problem. This can be slow if the system of equations is dense due to the use of
nonlocal regularization, necessitating iterative solvers such as successive over-relaxation or conjugate
gradients. In this paper, we show that similar solutions can be obtained more easily via filtering, obviating
the need to solve a potentially dense system of equations using slow iterative methods. [Paper]
Gaussian lifting for fast bilateral and nonlocal means filtering. IEEE Trans Image Process, 2020. This work
proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering,
appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately
implementing the filter is important not only for image processing applications, but also for a number of
recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions depends
ultimately on an accurate filter implementation. We show that our Gaussian lifting approach filters images
more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal
means filtering are also explored. [Paper]
Graph Laplacian regularization for robust optical flow estimation. IEEE Trans Image Process, 2020. This
paper proposes graph Laplacian regularization for robust estimation of optical flow. First, we analyze the
spectral properties of dense graph Laplacians and show that dense graphs achieve a better trade-off between
preserving flow discontinuities and filtering noise, compared with the usual Laplacian. Using this analysis,
we then propose a robust optical flow estimation method based on Gaussian graph Laplacians. We revisit
the framework of iteratively reweighted least-squares from the perspective of graph edge reweighting, and
employ the Welsch loss function to preserve flow discontinuities and handle occlusions. Our experiments
using the Middlebury and MPI-Sintel optical flow datasets demonstrate the robustness and the efficiency of
our proposed approach. [Paper]
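The combination of iteratively reweighted least-squares (IRLS) and the Welsch loss can be illustrated on a toy problem. The sketch below fits a line to data with gross outliers; it is only an assumed stand-in for the idea, since the paper applies the reweighting to graph edges in an optical flow model rather than to a line fit. The function name `irls_welsch` and the scale parameter `c` are hypothetical.

```python
import numpy as np

def irls_welsch(A, y, c=1.0, iters=20):
    """Robust least squares via IRLS with Welsch weights (toy illustration).

    The Welsch loss rho(r) = (c**2 / 2) * (1 - exp(-(r / c)**2)) gives the
    IRLS weight w(r) = exp(-(r / c)**2), so large residuals are smoothly
    down-weighted towards zero influence.
    """
    x = np.linalg.lstsq(A, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(iters):
        r = y - A @ x
        w = np.exp(-(r / c) ** 2)
        Aw = A * w[:, None]                   # rows of A scaled by their weights
        x = np.linalg.solve(A.T @ Aw, Aw.T @ y)  # weighted normal equations
    return x

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 50)
    y = 2.0 * t + 1.0
    y[::10] += 20.0                           # five gross outliers
    A = np.column_stack([t, np.ones_like(t)])
    print("least squares:", np.linalg.lstsq(A, y, rcond=None)[0])
    print("IRLS + Welsch:", irls_welsch(A, y))
```

In this toy setting the plain least-squares fit is pulled towards the outliers, while the reweighted fit recovers the underlying slope and intercept. The analogous effect in flow estimation is that edges across motion boundaries lose weight, which helps preserve discontinuities.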