Publications – Sean I Young, PhD

Page 1 Back to the top

Foundations of LLM compression—Part 1: Weight quantization. Preprint, 2024. Compression of large

language models (LLMs) has emerged as an important problem to allow language model

deployment on resource-constrained devices, reduce computational costs, and mitigate the

environmental footprint of large-scale AI infrastructure. In this paper, we present the foundations

of LLM quantization from a convex optimization perspective and propose a quantization method

that builds on these foundations and outperforms previous methods. Our quantization framework,

CVXQ, scales to models containing hundreds of billions of weight parameters and provides users

with the flexibility to compress models to any specified model size, post-training. Read more here.

Fully convolutional SVR for single-stack MRI. Proc IEEE CVPR, 2024. In magnetic resonance imaging

(MRI), slice-to-volume reconstruction (SVR) refers to computational reconstruction of an unknown 3D

magnetic resonance volume from stacks of 2D slices corrupted by motion. While promising, current

approaches to SVR require multiple slice stacks for accurate 3D reconstruction, leading to long scans and

limiting their use in time-sensitive applications such as fetal fMRI. Here, we propose an SVR method that

overcomes the shortcomings of previous work and produces state-of-the-art reconstructions in the presence

of extreme inter-slice motion. Inspired by the recent success of single-view depth estimation methods, we

formulate SVR as a single-stack motion estimation task and train a fully convolutional network to predict a

motion stack for a given slice stack, producing a 3D reconstruction as a byproduct of the predicted motion.

Extensive experiments on the SVR of adult and fetal brains demonstrate that our fully convolutional

method is twice as accurate as previous SVR methods. [Paper]

Supervision by denoising. IEEE Trans Pattern Anal Mach Intell, 2023. Learning-based image reconstruction

models, such as those based on the U-Net, require a large set of labeled images if good generalization is to

be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy

are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical

imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the

labels. In this work, we propose “supervision by denoising” (SUD), a framework that enables us to supervise

reconstruction models using their own denoised output as soft labels. SUD unifies stochastic averaging and

spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and

model weight update steps in an optimization framework for semi-supervision. [Paper]

Transform quantization for CNN compression. IEEE Trans Pattern Anal Mach Intell, 2022. In this work, we

compress convolutional neural network (CNN) post-training via transform quantization. CNN quantization

techniques often ignore the joint statistics of weights and activations, producing sub-optimal CNN

performance at a given bit-rate, or consider their joint statistics during training only and do not facilitate

efficient compression of already trained CNN models. The proposed transform quantization framework

unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to

facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first

introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-

distortion optimization problem. We then show that this problem can be solved using optimal bit-depth

allocation following decorrelation by the optimal End-to-end Learned Transform (ELT). [Paper]
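
As a rough illustration of what bit-depth allocation means in a rate-distortion setting, the sketch below applies the classical high-resolution allocation rule for independent components under mean squared error. It is a textbook rule and not the method of the paper. The function name `allocate_bits` and the example variances are assumptions made for illustration, and the rule ignores practical constraints such as non-negative or integer bit depths.

```python
import numpy as np

def allocate_bits(variances, mean_bits):
    """Classical high-resolution bit allocation (illustrative only).

    Each component receives the average bit depth plus half the base-2 log
    of the ratio between its variance and the geometric mean of all variances.
    """
    variances = np.asarray(variances, dtype=float)
    log_var = np.log2(variances)
    # Subtracting the mean of the logs divides by the geometric mean.
    return mean_bits + 0.5 * (log_var - log_var.mean())

# Hypothetical variances of four decorrelated components.
bits = allocate_bits([16.0, 4.0, 1.0, 0.25], mean_bits=4.0)
print(bits)
```

Components with larger variance receive more bits, while the average bit depth stays at the requested budget.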

Fast optical flow extraction from compressed video. IEEE Trans Image Process, 2020. We propose the fast

optical flow extractor, a filtering method that recovers artifact-free optical flow fields from HEVC-

compressed video. To extract accurate optical flow fields, we form a regularized optimization problem that

considers the smoothness of the solution and the pixelwise confidence weights of an artifact-ridden HEVC

motion field. Solving such an optimization problem is slow, so we first convert the problem into a

confidence-weighted filtering task. By leveraging the already-available HEVC motion parameters, we

achieve a 100-fold speed-up in the running times compared to similar methods, while producing subpixel-

accurate flow estimates. The fast optical flow extractor is useful when video frames are already available in

coded formats. Our method is not specific to a coder, and works with motion fields from video coders such

as H.264/AVC and HEVC. [Paper]

Non-line-of-sight surface reconstruction using the directional light-cone transform. Proc IEEE CVPR,

2020. We propose a joint albedo–normal approach to non-line-of-sight (NLOS) surface reconstruction

using the directional light-cone transform (D-LCT). While current NLOS imaging methods reconstruct

either the albedo or surface normals of the hidden scene, the two quantities provide complementary

information about the scene, so an efficient method to estimate both simultaneously is desirable. We formulate

the recovery of the two quantities as a vector deconvolution problem, and solve it using the Cholesky–

Wiener decomposition. We show that surfaces fitted non-parametrically using our recovered normals are

more accurate than those produced with recently proposed NLOS surface reconstruction methods, and are

1,000× faster to compute than using inverse rendering. [Paper]

Solving vision problems via filtering. Proc IEEE ICCV, 2019. We propose a new filtering approach for

solving a large number of regularized inverse problems commonly found in computer vision. Traditionally,

such problems are solved by finding the solution to the system of equations that expresses the first-order

optimality conditions of the problem. This can be slow if the system of equations is dense due to the use of

nonlocal regularization, necessitating iterative solvers such as successive over-relaxation or conjugate

gradients. In this paper, we show that similar solutions can be obtained more easily via filtering, obviating

the need to solve a potentially dense system of equations using slow iterative methods. [Paper]

Gaussian lifting for fast bilateral and nonlocal means filtering. IEEE Trans Image Process, 2020. This work

proposes the Gaussian lifting framework for efficient and accurate bilateral and nonlocal means filtering,

appealing to the similarities between separable wavelet transforms and Gaussian pyramids. Accurately

implementing the filter is important not only for image processing applications, but also for a number of

recently proposed bilateral-regularized inverse problems, where the accuracy of the solutions depends

ultimately on an accurate filter implementation. We show that our Gaussian lifting approach filters images

more accurately and efficiently across many filter scales. Adaptive lifting schemes for bilateral and nonlocal

means filtering are also explored. [Paper]

Graph Laplacian regularization for robust optical flow estimation. IEEE Trans Image Process, 2020. This

paper proposes graph Laplacian regularization for robust estimation of optical flow. First, we analyze the

spectral properties of dense graph Laplacians and show that dense graphs achieve a better trade-off between

preserving flow discontinuities and filtering noise, compared with the usual Laplacian. Using this analysis,

we then propose a robust optical flow estimation method based on Gaussian graph Laplacians. We revisit

the framework of iteratively reweighted least-squares from the perspective of graph edge reweighting, and

employ the Welsch loss function to preserve flow discontinuities and handle occlusions. Our experiments

using the Middlebury and MPI-Sintel optical flow datasets demonstrate the robustness and the efficiency of

our proposed approach. [Paper]
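
To make the idea of iteratively reweighted least squares with the Welsch loss concrete, here is a minimal sketch on a toy one-dimensional problem: estimating a robust mean in the presence of an outlier. It is not the optical flow method of the paper and involves no graph Laplacian. The function names, the scale parameter, and the sample data are assumptions, as is the loss convention ρ(r) = (c²/2)(1 − exp(−(r/c)²)), which gives the weight w(r) = exp(−(r/c)²).

```python
import numpy as np

def welsch_weight(residual, scale):
    """IRLS weight for the Welsch loss (assumed convention): exp(-(r / scale)^2)."""
    return np.exp(-(residual / scale) ** 2)

def irls_location(samples, scale=1.0, iterations=20):
    """Robust location estimate by iteratively reweighted least squares."""
    estimate = np.median(samples)  # robust starting point
    for _ in range(iterations):
        # Large residuals receive weights close to zero, so outliers are ignored.
        weights = welsch_weight(samples - estimate, scale)
        # Weighted least-squares update for a constant model.
        estimate = np.sum(weights * samples) / np.sum(weights)
    return estimate

# Hypothetical data: five inliers near 1.0 and one gross outlier.
samples = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])
robust = irls_location(samples)
plain = samples.mean()
print(robust, plain)
```

The plain mean is pulled toward the outlier, whereas the reweighted estimate stays close to the inliers, which is the same mechanism that lets a robust loss preserve discontinuities instead of smoothing across them.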