Samuli Laine, Ph.D.

Principal Research Scientist, NVIDIA
Docent, Aalto University

NVIDIA
Porkkalankatu 1, 5th floor
00180 Helsinki
Finland

E-mail: slaine nvidia.com

Publications

Samuli Laine, Tero Karras, Timo Aila, Antti Herva and Jaakko Lehtinen.
Facial Performance Capture with Deep Neural Networks.
NVIDIA Technical Report NVR-2016-04, 2016. arXiv:1609.06536 [cs.CV].
[arXiv]
Abstract

We present a deep learning technique for facial performance capture, i.e., the transfer of video footage into a motion sequence of a 3D mesh representing an actor's face. Specifically, we build on a conventional capture pipeline based on computer vision and multi-view video, and use its results to train a deep neural network to produce similar output from a monocular video sequence. Once trained, our network produces high-quality results for unseen inputs with greatly reduced effort compared to the conventional system.

In practice, we have found that approximately 10 minutes' worth of high-quality data is sufficient for training a network that can then automatically process as much footage from video to 3D as needed. This yields major savings in the development of modern narrative-driven video games involving digital doubles of actors and potentially hours of animated dialogue per character.

@techreport{Laine2015tr1,
  author      = {Samuli Laine and Tero Karras and Timo Aila and Antti Herva and Jaakko Lehtinen},
  title       = {Facial Performance Capture with Deep Neural Networks},
  month       = sep,
  year        = 2016,
  institution = {NVIDIA Corporation},
  type        = {NVIDIA Technical Report},
  number      = {NVR-2016-04},
  eprint      = {arXiv:1609.06536},
}
Samuli Laine and Tero Karras.
Apex Point Map for Constant-Time Bounding Plane Approximation.
Eurographics Symposium on Rendering 2015 (EI&I track).
[PDF] [Slides]
Abstract

We introduce apex point map, a simple data structure for constructing conservative bounds for rigid objects. The data structure is distilled from a dense k-DOP, and can be queried in constant time to determine a tight bounding plane with any given normal vector. Both precalculation and lookup can be implemented very efficiently on current GPUs. Applications include, e.g., finding tight world-space bounds for transformed meshes, determining per-object shadow map extents, more accurate view frustum culling, and collision detection.
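
As a point of reference, the quantity such a query approximates is the tightest bounding plane offset for a given normal, i.e., the maximum of dot(normal, v) over all vertices. The brute-force sketch below computes it in linear time per query; it is not the apex point map itself, and the names `bounding_plane_offset` and `CUBE` are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Plain 3D dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def bounding_plane_offset(vertices: Iterable[Vec3], normal: Vec3) -> float:
    """Return the smallest d such that dot(normal, v) <= d for every vertex.

    This is the exact O(n) reference answer that a constant-time
    structure would conservatively approximate.
    """
    return max(dot(normal, v) for v in vertices)


# Illustrative input: the eight corners of a unit cube centered at the origin.
CUBE = [(x, y, z)
        for x in (-0.5, 0.5)
        for y in (-0.5, 0.5)
        for z in (-0.5, 0.5)]
```

Calling `bounding_plane_offset(CUBE, n)` for the six axis directions recovers the cube's axis-aligned bounding box; arbitrary normals give the corresponding k-DOP slabs.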

@inproceedings{Laine2015egsr,
  author =    {Samuli Laine and Tero Karras},
  title =     {Apex Point Map for Constant-Time Bounding Plane Approximation},
  booktitle = {Eurographics Symposium on Rendering - Experimental Ideas \& Implementations},
  year =      {2015},
  editor =    {Jaakko Lehtinen and Derek Nowrouzezahrai},
  publisher = {The Eurographics Association},
  DOI =       {10.2312/sre.20151166},
}
Ari Silvennoinen, Hannu Saransaari, Samuli Laine and Jaakko Lehtinen.
Occluder Simplification using Planar Sections.
Computer Graphics Forum 33(1), 2014.
[PDF] [Video] [Project page]
Abstract

We present a method for extreme occluder simplification. We take a triangle soup as input, and produce a small set of polygons with closely matching occlusion properties. In contrast to methods that optimize the original geometry, our algorithm has very few requirements for the input—specifically, the input does not need to be a watertight, two-manifold mesh. This robustness is achieved by working on a well-behaved, discretized representation of the input instead of the original, potentially badly structured geometry. We first formulate the algorithm for individual occluders, and further introduce a hierarchy for handling large, complex scenes.

@article{Silvennoinen2014CGF,
  author  = {Silvennoinen, Ari and Saransaari, Hannu and Laine, Samuli and Lehtinen, Jaakko},
  title   = {Occluder Simplification Using Planar Sections},
  journal = {Computer Graphics Forum},
  volume  = {33},
  number  = {1},
  issn    = {1467-8659},
  url     = {http://dx.doi.org/10.1111/cgf.12271},
  doi     = {10.1111/cgf.12271},
  pages   = {235--245},
  year    = {2014},
}
Jaakko Lehtinen, Tero Karras, Samuli Laine, Miika Aittala, Frédo Durand and Timo Aila.
Gradient-Domain Metropolis Light Transport.
ACM Transactions on Graphics 32(4) (SIGGRAPH 2013).
[PDF] [Project page] [Door scene]
Abstract

We introduce a novel Metropolis rendering algorithm that directly computes image gradients, and reconstructs the final image from the gradients by solving a Poisson equation. The reconstruction is aided by a low-fidelity approximation of the image computed during gradient sampling. As an extension of path-space Metropolis light transport, our algorithm is well suited for difficult transport scenarios. We demonstrate that our method outperforms the state-of-the-art in several well-known test scenes. Additionally, we analyze the spectral properties of gradient-domain sampling, and compare it to the traditional image-domain sampling.
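
The reconstruction step can be illustrated with a one-dimensional analogue. The sketch below assumes an L2 (screened Poisson) formulation in which the coarse image is weighted by a hypothetical parameter `alpha`; it minimizes the squared mismatch between finite differences and the sampled gradients, plus `alpha**2` times the squared mismatch to the coarse image. The function name and the default weight are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def reconstruct_1d(gradients: Sequence[float],
                   coarse: Sequence[float],
                   alpha: float = 0.2) -> List[float]:
    """Solve the 1D screened Poisson problem

        min_I  sum_i (I[i+1] - I[i] - g[i])^2 + alpha^2 * sum_i (I[i] - c[i])^2

    via its tridiagonal normal equations (Thomas algorithm).
    Requires alpha > 0 so that the system is non-singular.
    """
    n = len(coarse)
    assert len(gradients) == n - 1 and alpha > 0.0
    a2 = alpha * alpha

    # Assemble the diagonal and right-hand side; both off-diagonals are -1.
    diag, rhs = [], []
    for i in range(n):
        d, r = a2, a2 * coarse[i]
        if i > 0:            # contribution of the gradient to the left
            d += 1.0
            r += gradients[i - 1]
        if i < n - 1:        # contribution of the gradient to the right
            d += 1.0
            r -= gradients[i]
        diag.append(d)
        rhs.append(r)

    # Forward elimination.
    for i in range(1, n):
        w = -1.0 / diag[i - 1]
        diag[i] += w
        rhs[i] -= w * rhs[i - 1]

    # Back substitution.
    image = [0.0] * n
    image[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        image[i] = (rhs[i] + image[i + 1]) / diag[i]
    return image
```

With exact gradients, the result reproduces the true signal up to whatever constant offset the coarse image carries, which shows why a low-fidelity approximation is enough to pin down the solution.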

@article{Lehtinen2013sg,
  author =    {Jaakko Lehtinen and Tero Karras and Samuli Laine and Miika Aittala and Fr\'{e}do Durand and Timo Aila},
  title =     {Gradient-Domain Metropolis Light Transport},
  journal =   {ACM Transactions on Graphics},
  year =      {2013},
  volume =    {32},
  number =    {4},
}
Samuli Laine, Tero Karras and Timo Aila.
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs.
High-Performance Graphics 2013.
[PDF] [Slides]
Abstract

When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.

@InProceedings{Laine2013hpg,
  author =    {Samuli Laine and Tero Karras and Timo Aila},
  title =     {Megakernels Considered Harmful: Wavefront Path Tracing on {GPU}s},
  booktitle = {Proceedings of High-Performance Graphics 2013},
  year =      {2013},
}
Timo Aila, Tero Karras and Samuli Laine.
On Quality Metrics of Bounding Volume Hierarchies.
High-Performance Graphics 2013.
[PDF]
Abstract

The surface area heuristic (SAH) is widely used as a predictor for ray tracing performance, and as a heuristic to guide the construction of spatial acceleration structures. We investigate how well SAH actually predicts ray tracing performance of a bounding volume hierarchy (BVH), observe that this relationship is far from perfect, and then propose two new metrics that together with SAH almost completely explain the measured performance. Our observations shed light on the increasingly common situation that a supposedly good tree construction algorithm produces trees that are slower to trace than expected. We also note that the trees constructed using greedy top-down algorithms are consistently faster to trace than SAH indicates and are also more SIMD-friendly than competing approaches.
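
For readers unfamiliar with the metric, the sketch below evaluates the commonly used form of the SAH cost for a BVH: each inner node contributes its surface area relative to the root, and each leaf contributes its relative area times its triangle count. The `Node` class and the cost constants `c_inner` and `c_tri` are hypothetical placeholders; the constants used in the paper may differ.

```python
class Node:
    """Axis-aligned BVH node; a node without children is treated as a leaf."""

    def __init__(self, lo, hi, children=(), num_triangles=0):
        self.lo, self.hi = lo, hi            # box corners as (x, y, z) tuples
        self.children = list(children)
        self.num_triangles = num_triangles   # only meaningful for leaves


def surface_area(node):
    """Surface area of the node's axis-aligned bounding box."""
    dx, dy, dz = (node.hi[i] - node.lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(root, c_inner=1.0, c_tri=1.0):
    """SAH cost: sum of area-weighted traversal and intersection costs.

    c_inner and c_tri are assumed unit costs for visiting an inner node
    and for one ray-triangle test, respectively.
    """
    root_area = surface_area(root)
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        # Relative area approximates the chance a random ray hits this box.
        hit_probability = surface_area(node) / root_area
        if node.children:
            total += c_inner * hit_probability
            stack.extend(node.children)
        else:
            total += c_tri * hit_probability * node.num_triangles
    return total
```

A lower value predicts faster tracing under the heuristic's assumptions, which is exactly the prediction the abstract reports to be imperfect in practice.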

@inproceedings{Aila2013hpg,
  author =    {Timo Aila and Tero Karras and Samuli Laine},
  title =     {On Quality Metrics of Bounding Volume Hierarchies},
  booktitle = {Proceedings of High-Performance Graphics 2013},
  year =      {2013},
}
Samuli Laine.
A Topological Approach to Voxelization.
Computer Graphics Forum 32(4) (EGSR 2013).
[PDF] [Slides]
Abstract

We present a novel approach to voxelization, based on intersecting the input primitives against intersection targets in the voxel grid. Instead of relying on geometric proximity measures, our approach is topological in nature, i.e., it builds on the connectivity and separability properties of the input and the intersection targets. We discuss voxelization of curves and surfaces in both 2D and 3D, and derive intersection targets that produce voxelizations with various connectivity, separability and thinness properties. The simplicity of our method allows for easy proofs of these properties. Our approach is directly applicable to curved primitives, and it is independent of input tessellation.

@article{Laine2013egsr,
  author =    {Samuli Laine},
  title =     {A Topological Approach to Voxelization},
  journal =   {Computer Graphics Forum (Proc. Eurographics Symposium on Rendering 2013)},
  volume =    {32},
  number =    {4},
  year =      {2013},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Jaakko Lehtinen, Timo Aila, Samuli Laine and Frédo Durand.
Reconstructing the Indirect Light Field for Global Illumination.
ACM Transactions on Graphics 31(4) (SIGGRAPH 2012).
[PDF] [Project page]
Abstract

Stochastic techniques for rendering indirect illumination suffer from noise due to the variance in the integrand. In this paper, we describe a general reconstruction technique that exploits anisotropy in the light field and permits efficient reuse of input samples between pixels or world-space locations, multiplying the effective sampling rate by a large factor. Our technique introduces visibility-aware anisotropic reconstruction to indirect illumination, ambient occlusion and glossy reflections. It operates on point samples without knowledge of the scene, and can thus be seen as an advanced image filter. Our results show dramatic improvement in image quality while using very sparse input samplings.

@article{Lehtinen2012sg,
  author =    {Jaakko Lehtinen and Timo Aila and Samuli Laine and Fr\'{e}do Durand},
  title =     {Reconstructing the Indirect Light Field for Global Illumination},
  journal =   {ACM Transactions on Graphics},
  year =      {2012},
  volume =    {31},
  number =    {4},
}
Timo Aila, Samuli Laine and Tero Karras.
Understanding the Efficiency of Ray Traversal on GPUs – Kepler and Fermi Addendum.
NVIDIA Technical Report NVR-2012-02, 2012. Poster at High-Performance Graphics 2012.
[PDF] [Poster] [Code] [Project page]
Abstract

This technical report is an addendum to the HPG2009 paper "Understanding the Efficiency of Ray Traversal on GPUs", and provides citable performance results for Kepler and Fermi architectures. We explain how to optimize the traversal and intersection kernels for these newer platforms, and what the important architectural limiters are. We plot the relative ray tracing performance between architecture generations against the available memory bandwidth and peak FLOPS, and demonstrate that ray tracing is still, even with incoherent rays and more complex scenes, almost entirely limited by the available FLOPS. We will also discuss two esoteric instructions, present in both Fermi and Kepler, and show that they can be safely used for faster acceleration structure traversal.

@techreport{Aila:Efficiency:NVIDIA:2012,
  author      = {Timo Aila and Samuli Laine and Tero Karras},
  title       = {Understanding the Efficiency of Ray Traversal on {GPU}s -- {K}epler and {F}ermi Addendum},
  month       = jun,
  year        = 2012,
  institution = {NVIDIA Corporation},
  type        = {NVIDIA Technical Report},
  number      = {NVR-2012-02},
}
Samuli Laine, Timo Aila, Tero Karras and Jaakko Lehtinen.
Clipless Dual-Space Bounds for Faster Stochastic Rasterization.
ACM Transactions on Graphics 30(4) (SIGGRAPH 2011).
[PDF] [Video] [Slides]
Abstract

We present a novel method for increasing the efficiency of stochastic rasterization of motion and defocus blur. Contrary to earlier approaches, our method is efficient even with the low sampling densities commonly encountered in realtime rendering, while allowing the use of arbitrary sampling patterns for maximal image quality. Our clipless dual-space formulation avoids problems with triangles that cross the camera plane during the shutter interval. The method is also simple to plug into existing rendering systems.

@article{Laine2011sg,
  author =    {Samuli Laine and Timo Aila and Tero Karras and Jaakko Lehtinen},
  title =     {Clipless Dual-Space Bounds for Faster Stochastic Rasterization},
  journal =   {ACM Transactions on Graphics},
  year =      {2011},
  volume =    {30},
  number =    {4},
}
Jaakko Lehtinen, Timo Aila, Jiawen Chen, Samuli Laine and Frédo Durand.
Temporal Light Field Reconstruction for Rendering Distribution Effects.
ACM Transactions on Graphics 30(4) (SIGGRAPH 2011).
[PDF] [Videos] [Images] [Project page]
Abstract

Traditionally, effects that require evaluating multidimensional integrals for each pixel, such as motion blur, depth of field, and soft shadows, suffer from noise due to the variance of the high-dimensional integrand. In this paper, we describe a general reconstruction technique that exploits the anisotropy in the temporal light field and permits efficient reuse of samples between pixels, multiplying the effective sampling rate by a large factor. We show that our technique can be applied in situations that are challenging or impossible for previous anisotropic reconstruction methods, and that it can yield good results with very sparse inputs. We demonstrate our method for simultaneous motion blur, depth of field, and soft shadows.

@article{Lehtinen2011sg,
  author =    {Jaakko Lehtinen and Timo Aila and Jiawen Chen and Samuli Laine and Fr\'{e}do Durand},
  title =     {Temporal Light Field Reconstruction for Rendering Distribution Effects},
  journal =   {ACM Transactions on Graphics},
  year =      {2011},
  volume =    {30},
  number =    {4},
}
Samuli Laine and Tero Karras.
Improved Dual-Space Bounds for Simultaneous Motion and Defocus Blur.
NVIDIA Technical Report NVR-2011-004, 2011.
[PDF]
Abstract

Our previous paper on stochastic rasterization [Laine et al. 2011] presented a method for constructing time and lens bounds to accelerate stochastic rasterization by skipping the costly 5D coverage test. Although the method works for the combined case of simultaneous motion and defocus blur, its efficiency drops when significant amounts of both effects are present. In this paper, we describe a bound computation method that treats time and lens domains in a unified fashion, and yields tight bounds also for the combined case.

@techreport{Laine:Bounds2:NVIDIA:2011,
    author      = {Samuli Laine and Tero Karras},
    title       = {Improved Dual-Space Bounds for Simultaneous Motion and Defocus Blur},
    month       = nov,
    year        = 2011,
    institution = {NVIDIA Corporation},
    type        = {NVIDIA Technical Report},
    number      = {NVR-2011-004},
}
Samuli Laine and Tero Karras.
Efficient Triangle Coverage Tests for Stochastic Rasterization.
NVIDIA Technical Report NVR-2011-003, 2011.
[PDF]
Abstract

In our previous paper on stochastic rasterization [Laine et al. 2011], we stated that a 5D triangle coverage test consumes approximately 25 FMA (fused multiply-add) operations. This technical report details the operation of our coverage test. We also provide variants specialized for defocus-only and motion-only cases.

@techreport{Laine:Coverage:NVIDIA:2011,
    author      = {Samuli Laine and Tero Karras},
    title       = {Efficient Triangle Coverage Tests for Stochastic Rasterization},
    month       = sep,
    year        = 2011,
    institution = {NVIDIA Corporation},
    type        = {NVIDIA Technical Report},
    number      = {NVR-2011-003},
}
Samuli Laine and Tero Karras.
High-Performance Software Rasterization on GPUs.
High-Performance Graphics 2011.
[PDF] [Slides] [Code]
Abstract

In this paper, we implement an efficient, completely software-based graphics pipeline on a GPU. Unlike previous approaches, we obey ordering constraints imposed by current graphics APIs, guarantee hole-free rasterization, and support multisample antialiasing. Our goal is to examine the performance implications of not exploiting the fixed-function graphics pipeline, and to discern which additional hardware support would benefit software-based graphics the most.

We present significant improvements over previous work in terms of scalability, performance, and capabilities. Our pipeline is malleable and easy to extend, and we demonstrate that in a wide variety of test cases its performance is within a factor of 2–8x of the hardware graphics pipeline on a top-of-the-line GPU.

Our implementation is open sourced and available at http://code.google.com/p/cudaraster/.

@InProceedings{Laine2011hpg,
  author =    {Samuli Laine and Tero Karras},
  title =     {High-Performance Software Rasterization on {GPU}s},
  booktitle = {Proceedings of High-Performance Graphics 2011},
  year =      {2011},
}
Samuli Laine and Tero Karras.
Stratified Sampling for Stochastic Transparency.
Computer Graphics Forum 30(4) (EGSR 2011).
[PDF] [Slides]
Abstract

The traditional method of rendering semi-transparent surfaces using alpha blending requires sorting the surfaces in depth order. There are several techniques for order-independent transparency, but most require either unbounded storage or can be fragile due to forced compaction of information during rendering. Stochastic transparency works in a fixed amount of storage and produces results with the correct expected value. However, carelessly chosen sampling strategies easily result in high variance of the final pixel colors, showing as noise in the image. In this paper, we describe a series of improvements to stochastic transparency that enable stratified sampling in both spatial and alpha domains. As a result, the amount of noise in the image is significantly reduced, while the result remains unbiased.
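
The effect of stratifying in the alpha domain can be seen in a toy experiment on a single surface. The sketch below compares covering each of `num_samples` sub-pixel samples independently with probability `alpha` against covering `floor(alpha * num_samples + xi)` samples, with `xi` uniform in [0, 1). Both estimators of opacity are unbiased, but the second has much lower variance. This is only a simplified illustration of the idea, not the paper's full method; all function names are hypothetical.

```python
import math
import random


def coverage_independent(alpha, num_samples, rng):
    """Cover each sample independently with probability alpha."""
    return sum(1 for _ in range(num_samples) if rng.random() < alpha)


def coverage_stratified(alpha, num_samples, rng):
    """Cover floor(alpha * S + xi) samples, xi uniform in [0, 1).

    The expected count is still alpha * S, so the estimate stays unbiased.
    """
    return min(num_samples, math.floor(alpha * num_samples + rng.random()))


def estimate(coverage_fn, alpha, num_samples, trials, seed=0):
    """Return (mean, variance) of the estimated opacity over many pixels."""
    rng = random.Random(seed)
    values = [coverage_fn(alpha, num_samples, rng) / num_samples
              for _ in range(trials)]
    mean = sum(values) / trials
    variance = sum((v - mean) ** 2 for v in values) / trials
    return mean, variance
```

For example, comparing `estimate(coverage_independent, 0.3, 8, 20000)` with `estimate(coverage_stratified, 0.3, 8, 20000)` should give means near 0.3 in both cases and a clearly smaller variance for the stratified variant.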

@article{Laine2011egsr,
  author =    {Samuli Laine and Tero Karras},
  title =     {Stratified Sampling for Stochastic Transparency},
  journal =   {Computer Graphics Forum (Proc. Eurographics Symposium on Rendering 2011)},
  volume =    {30},
  number =    {4},
  year =      {2011},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Samuli Laine and Tero Karras.
Efficient Sparse Voxel Octrees.
IEEE Transactions on Visualization and Computer Graphics 17(8), 2011.
[IEEE Digital Library] [Code]
Abstract

In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure.

We augment the voxel data with novel contour information that increases geometric resolution, allows more compact encoding of smooth surfaces, and accelerates ray casts. We also employ a novel normal compression format for storing high-precision object-space normals. Finally, we present a variable-radius post-process filtering technique for smoothing out blockiness caused by discrete sampling of shading attributes.

Based on benchmark results, we show that our voxel representation is competitive with triangle-based representations in terms of ray casting performance, while allowing tremendously greater geometric detail and unique shading information for every voxel.

Our voxel codebase is open sourced and available at http://code.google.com/p/efficient-sparse-voxel-octrees/.

@article{10.1109/TVCG.2010.240,
  author =    {Samuli Laine and Tero Karras},
  title =     {Efficient Sparse Voxel Octrees},
  journal =   {IEEE Transactions on Visualization and Computer Graphics},
  volume =    {17},
  number =    {8},
  issn =      {1077-2626},
  year =      {2011},
  pages =     {1048--1059},
  doi =       {10.1109/TVCG.2010.240},
  publisher = {IEEE Computer Society},
  address =   {Los Alamitos, CA, USA},
}
Peter Shirley, Timo Aila, Jonathan Cohen, Eric Enderton, Samuli Laine, David Luebke and Morgan McGuire.
A Local Image Reconstruction Algorithm for Stochastic Rendering.
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2011.
[PDF]
Abstract

Stochastic renderers produce unbiased but noisy images of scenes that include the advanced camera effects of motion and defocus blur and possibly other effects such as transparency. We present a simple algorithm that selectively adds bias in the form of image space blur to pixels that are unlikely to have high frequency content in the final image. For each pixel, we sweep once through a fixed neighborhood of samples in front to back order, using a simple accumulation scheme. We achieve good quality images with only 16 samples per pixel, making the algorithm potentially practical for interactive stochastic rendering in the near future.

@InProceedings{Shirley2011i3d,
  author =    {Peter Shirley and Timo Aila and Jonathan Cohen and Eric Enderton and Samuli Laine and David Luebke and Morgan Mc{G}uire},
  title =     {A Local Image Reconstruction Algorithm for Stochastic Rendering},
  booktitle = {Proceedings of ACM SIGGRAPH 2011 Symposium on Interactive 3D Graphics and Games},
  pages =     {9--13},
  year =      {2011},
  publisher = {ACM Press},
}
Samuli Laine.
Restart Trail for Stackless BVH Traversal.
High-Performance Graphics 2010.
[PDF] [Slides]
Abstract

A ray cast algorithm utilizing a hierarchical acceleration structure needs to perform a tree traversal in the hierarchy. In its basic form, executing the traversal requires a stack that holds the nodes that are still to be processed. In some cases, such a stack can be prohibitively expensive to maintain or access, due to storage or memory bandwidth limitations. The stack can, however, be eliminated or replaced with a fixed-size buffer using so-called stackless or short stack algorithms. These require that the traversal can be restarted from root so that the already processed part of the tree is not entered again. For kd-tree ray casts, this is accomplished easily by ray shortening, but the approach does not extend to other kinds of hierarchies such as BVHs.

In this paper, we introduce restart trail, a simple algorithmic method that makes restarts possible regardless of the type of hierarchy by storing one bit of data per level. This enables stackless and short stack traversal for BVH ray casts, where using a full stack or constraining the traversal order have so far been the only options.

@InProceedings{Laine2010hpg,
  author =    {Samuli Laine},
  title =     {Restart Trail for Stackless {BVH} Traversal},
  booktitle = {Proceedings of High-Performance Graphics 2010},
  year =      {2010},
}
Samuli Laine and Tero Karras.
Two Methods for Fast Ray-Cast Ambient Occlusion.
Computer Graphics Forum 29(4) (EGSR 2010).
[PDF] [Slides]
Abstract

Ambient occlusion has proven to be a useful tool for producing realistic images, both in offline rendering and interactive applications. In production rendering, ambient occlusion is typically computed by casting a large number of short shadow rays from each visible point, yielding unparalleled quality but long rendering times. Interactive applications typically use screen-space approximations which are fast but suffer from systematic errors due to missing information behind the nearest depth layer.

In this paper, we present two efficient methods for calculating ambient occlusion so that the results match those produced by a ray tracer. The first method is targeted for rasterization-based engines, and it leverages the GPU graphics pipeline for finding occlusion relations between scene triangles and the visible points. The second method is a drop-in replacement for ambient occlusion computation in offline renderers, allowing the querying of ambient occlusion for any point in the scene. Both methods are based on the principle of simultaneously computing the result of all shadow rays for a single receiver point.

@article{Laine2010egsr,
  author =    {Samuli Laine and Tero Karras},
  title =     {Two Methods for Fast Ray-Cast Ambient Occlusion},
  journal =   {Computer Graphics Forum (Proc. Eurographics Symposium on Rendering 2010)},
  volume =    {29},
  number =    {4},
  year =      {2010},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Samuli Laine and Tero Karras.
Efficient Sparse Voxel Octrees – Analysis, Extensions, and Implementation.
NVIDIA Technical Report NVR-2010-001, 2010.
[PDF] [Code]
Abstract

This technical report extends our previous paper on sparse voxel octrees. We first discuss the benefits and drawbacks of voxel representations and how the storage space requirements behave for different kinds of content. Then, we explain in detail our compact data structure for storing voxels and an efficient ray cast algorithm that utilizes this structure, including the contributions of the original paper: additional voxel contour information, normal compression format for storing high-precision object-space normals, post-process filtering technique for smoothing out blockiness of shading, and beam optimization for accelerating ray casts.

Management of voxel data in memory and on disk is covered in more detail, as well as the construction of voxel hierarchy. We extend the results section considerably, providing detailed statistics of our test cases. Finally, we discuss the technological barriers and problems that would need to be overcome before voxels could be widely adopted as a generic content format.

Our voxel codebase is open sourced and available at http://code.google.com/p/efficient-sparse-voxel-octrees.

@techreport{Laine:Octree:NVIDIA:2010,
    author      = {Samuli Laine and Tero Karras},
    title       = {Efficient Sparse Voxel Octrees -- Analysis, Extensions, and Implementation},
    month       = feb,
    year        = 2010,
    institution = {NVIDIA Corporation},
    type        = {NVIDIA Technical Report},
    number      = {NVR-2010-001},
}
Samuli Laine and Tero Karras.
Efficient Sparse Voxel Octrees.
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2010.
[PDF] [Video (Xvid)] [Slides] [Code]
Abstract

In this paper we examine the possibilities of using voxel representations as a generic way for expressing complex and feature-rich geometry on current and future GPUs. We present in detail a compact data structure for storing voxels and an efficient algorithm for performing ray casts using this structure.

We augment the voxel data with novel contour information that increases geometric resolution, allows more compact encoding of smooth surfaces, and accelerates ray casts. We also employ a novel normal compression format for storing high-precision object-space normals. Finally, we present a variable-radius post-process filtering technique for smoothing out blockiness caused by discrete sampling of shading attributes.

Our benchmarks show that our voxel representation is competitive with triangle-based representations in terms of ray casting performance, while allowing tremendously greater geometric detail and unique shading information for every voxel.

@InProceedings{Laine2010i3d,
  author =    {Samuli Laine and Tero Karras},
  title =     {Efficient Sparse Voxel Octrees},
  booktitle = {Proceedings of ACM SIGGRAPH 2010 Symposium on Interactive 3D Graphics and Games},
  pages =     {55--63},
  year =      {2010},
  publisher = {ACM Press},
}
Timo Aila and Samuli Laine.
Understanding the Efficiency of Ray Traversal on GPUs.
High-Performance Graphics 2009.
[PDF] [Slides] [Code] [Project page]
Abstract

We discuss the mapping of elementary ray tracing operations (acceleration structure traversal and primitive intersection) onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy. We study this question by comparing the measurements against a simulator that tells the upper bound of performance for a given kernel. We observe that previously known methods are a factor of 1.5–2.5x off from theoretical optimum, and most of the gap is not explained by memory bandwidth, but rather by previously unidentified inefficiencies in hardware work distribution. We then propose a simple solution that significantly narrows the gap between simulation and measurement. This results in the fastest GPU ray tracer to date. We provide results for primary, ambient occlusion and diffuse interreflection rays.

@InProceedings{Aila2009hpg,
  author =    {Timo Aila and Samuli Laine},
  title =     {Understanding the Efficiency of Ray Traversal on {GPU}s},
  booktitle = {Proceedings of High-Performance Graphics 2009},
  pages =     {145--149},
  year =      {2009},
}
Samuli Laine, Samuel Siltanen, Tapio Lokki and Lauri Savioja.
Accelerated Beam Tracing Algorithm.
Applied Acoustics 70(1), 2009.
[PDF] [Code]
Abstract

Determining early specular reflection paths is essential for room acoustics modeling. Beam tracing algorithms have been used to calculate these paths efficiently, thus allowing modeling of acoustics in real-time with a moving listener in simple, or complex but densely occluded, environments with a stationary sound source. In this paper it is shown that beam tracing algorithms can still be optimized by utilizing the spatial coherence in path validation with a moving listener. Since the precalculations required for the presented technique are relatively fast, the acoustic reflection paths can be calculated even for a moving source in simple cases. Simulations were performed to show how the accelerated algorithm compares with the basic algorithm with varying scene complexity and occlusion. Up to two orders of magnitude speed-up was achieved.

@article{Laine2009aa,
  author =    {Samuli Laine and Samuel Siltanen and Tapio Lokki and Lauri Savioja},
  title =     {Accelerated Beam Tracing Algorithm},
  journal =   {Applied Acoustics},
  volume =    {70},
  number =    {1},
  year =      {2009},
  pages =     {172--181},
}
Samuli Laine, Hannu Saransaari, Janne Kontkanen, Jaakko Lehtinen and Timo Aila.
Incremental Instant Radiosity for Real-Time Indirect Illumination.
Eurographics Symposium on Rendering 2007.
[PDF] [Animations] [Slides] [Code]
Abstract

We present a method for rendering single-bounce indirect illumination in real time on currently available graphics hardware. The method is based on the instant radiosity algorithm, where virtual point lights (VPLs) are generated by casting rays from the primary light source. Hardware shadow maps are then employed for determining the indirect illumination from the VPLs. Our main contribution is an algorithm for reusing the VPLs and incrementally maintaining their good distribution. As a result, only a few shadow maps need to be rendered per frame as long as the motion of the primary light source is reasonably smooth. This yields real-time frame rates even when hundreds of VPLs are used.

@InProceedings{Laine2007egsr,
  author =    {Samuli Laine and Hannu Saransaari and Janne Kontkanen and Jaakko Lehtinen and Timo Aila},
  title =     {Incremental Instant Radiosity for Real-Time Indirect Illumination},
  booktitle = {Proceedings of Eurographics Symposium on Rendering 2007},
  pages =     {277--286},
  year =      {2007},
  publisher = {Eurographics Association},
}
Hannu Saransaari, Samuli Laine, Janne Kontkanen, Jaakko Lehtinen and Timo Aila.
Incremental Instant Radiosity.
Article published in book ShaderX6, Charles River Media.
Bibtex [Code]  
@inbook{Saransaari2008shaderx6,
  author =    {Hannu Saransaari and Samuli Laine and Janne Kontkanen and Jaakko Lehtinen and Timo Aila},
  title =     {Incremental Instant Radiosity},
  editor =    {Wolfgang Engel},
  booktitle = {ShaderX$^6$},
  publisher = {Charles River Media},
  year =      {2008},
  pages =     {381--391},
  chapter =   {6.2},
}
Jaakko Lehtinen, Samuli Laine and Timo Aila.
An Improved Physically-Based Soft Shadow Volume Algorithm.
Computer Graphics Forum 25(3) (Eurographics 2006).
Abstract Bibtex [PDF]
Abstract

We identify and analyze several performance problems in a state-of-the-art physically-based soft shadow volume algorithm, and present an improved method that alleviates these problems by replacing an overly conservative spatial acceleration structure by a more efficient one. The new technique consistently outperforms both the previous method and a ray tracing-based reference solution in several realistic situations while retaining the correctness of the solution and other desirable characteristics of the previous method. These include the unintrusiveness of the original algorithm, meaning that our method can be used as a black-box shadow solver in any offline renderer without requiring multiple passes over the image or other special accommodation. We achieve speedup factors from 1.6 to 12.3 when compared to the previous method.

@article{Lehtinen2006eurographics,
  author =    {Jaakko Lehtinen and Samuli Laine and Timo Aila},
  title =     {An Improved Physically-Based Soft Shadow Volume Algorithm},
  journal =   {Computer Graphics Forum},
  volume =    {25},
  number =    {3},
  year =      {2006},
  pages =     {303--312},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Samuli Laine and Timo Aila.
A Weighted Error Metric and Optimization Method for Antialiasing Patterns.
Computer Graphics Forum 25(1), 2006.
Abstract Bibtex [PDF] [Pattern page]
Abstract

Displaying a synthetic image on a computer display requires determining the colors of individual pixels. To avoid aliasing, multiple samples of the image can be taken per pixel, after which the color of a pixel may be computed as a weighted sum of the samples. The positions and weights of the samples play a major role in the resulting image quality, especially in real-time applications where usually only a handful of samples can be afforded per pixel. This paper presents a new error metric and an optimization method for antialiasing patterns used in image reconstruction. The metric is based on comparing the pattern against a given reference reconstruction filter in the spatial domain, and it takes into account psychovisually measured angle-specific acuities for sharp features.
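
As a minimal illustration of the weighted-sum reconstruction mentioned in the abstract, the sketch below resolves one pixel from its samples. The function name `resolve_pixel`, the normalization by the weight sum, and the sample data are assumptions made for this example; they are not taken from the paper.

```python
def resolve_pixel(samples, weights):
    """Combine the color samples of one pixel into a single color."""
    if len(samples) != len(weights):
        raise ValueError("need exactly one weight per sample")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    channels = len(samples[0])
    # Assumption: divide by the weight sum so a constant-colored
    # region keeps its color after reconstruction.
    return tuple(
        sum(w * s[c] for s, w in zip(samples, weights)) / total
        for c in range(channels)
    )


# Hypothetical pixel crossed by a black/white edge: three of its four
# equally weighted samples land on the white side.
samples = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(resolve_pixel(samples, [1.0, 1.0, 1.0, 1.0]))  # (0.75, 0.75, 0.75)
```

Changing the weights changes how strongly each sample position influences the final color, which is why the abstract treats positions and weights together.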

@article{Laine2006cgf,
  author =    {Samuli Laine and Timo Aila},
  title =     {A Weighted Error Metric and Optimization Method for Antialiasing Patterns},
  journal =   {Computer Graphics Forum},
  volume =    {25},
  number =    {1},
  year =      {2006},
  pages =     {83--94},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Janne Kontkanen and Samuli Laine.
Sampling Precomputed Volumetric Lighting.
Journal of Graphics Tools 11(3), 2006.
Abstract Bibtex [PDF]
Abstract

Precomputing volumetric lighting allows realistic mutual shadowing and reflections between objects with little runtime cost: for example, using an irradiance volume the shadows and reflections due to a static scene can be precomputed into a three-dimensional grid and this grid can be used to shade moving objects at runtime. However, a rather low spatial resolution has to be used to keep the memory requirements acceptable. For this reason, these methods often suffer from aliasing artifacts.

In this article we introduce a new sampling algorithm for precomputing lighting into a regular three-dimensional grid. The advantage of the new method is that it dramatically reduces aliasing while adding only a small overhead for the precomputation time. Additionally, the runtime component does not have to be changed at all.
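
The runtime side described in the abstract, shading a moving object from a precomputed regular grid, can be sketched as a grid lookup. The example below assumes a scalar grid stored as nested lists and assumes trilinear interpolation between grid points; the name `sample_grid` and the toy data are hypothetical, and the paper's new sampling algorithm for the precomputation step is not shown.

```python
import math


def sample_grid(grid, p):
    """Trilinearly interpolate a scalar 3D grid at position p.

    grid[i][j][k] is the precomputed value at lattice point (i, j, k);
    p = (x, y, z) is in grid coordinates. Each axis needs >= 2 points.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for coord, n in zip(p, dims):
        c = min(max(coord, 0.0), n - 1.0)   # clamp to the grid extent
        i = min(int(math.floor(c)), n - 2)  # lower corner of the cell
        base.append(i)
        frac.append(c - i)
    value = 0.0
    # Blend the 8 corners of the enclosing cell.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                value += w * grid[base[0] + di][base[1] + dj][base[2] + dk]
    return value


# Toy 2x2x2 grid whose value equals the x index of the lattice point.
grid = [[[float(i) for _k in range(2)] for _j in range(2)] for i in range(2)]
print(sample_grid(grid, (0.25, 0.5, 0.5)))  # 0.25
```

Because the lookup only reads stored values, a better precomputation can replace the grid contents without touching this code, which matches the abstract's remark that the runtime component stays unchanged.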

@article{Kontkanen2006jgt,
  author =    {Janne Kontkanen and Samuli Laine},
  title =     {Sampling Precomputed Volumetric Lighting},
  journal =   {Journal of Graphics Tools},
  year =      {2006},
  pages =     {1--16},
  volume =    {11},
  number =    {3},
}
Jon Hasselgren, Tomas Akenine-Möller and Samuli Laine.
A Family of Inexpensive Sampling Schemes.
Computer Graphics Forum 24(4), 2005.
Abstract Bibtex [PDF]
Abstract

To improve image quality in computer graphics, antialiasing techniques such as supersampling and multisampling are used. We explore a family of inexpensive sampling schemes that cost as little as 1.25 samples per pixel and up to 2.0 samples per pixel. By placing sample points in the corners or on the edges of the pixels, sharing can occur between pixels, and this makes it possible to create inexpensive sampling schemes. Using an evaluation and optimization framework, we present optimized sampling patterns costing 1.25, 1.5, 1.75 and 2.0 samples per pixel.
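
The fractional per-pixel costs follow from sharing: on a regular pixel grid, a sample on a pixel edge is read by 2 pixels and a sample on a pixel corner by 4. The sketch below works through that arithmetic. The function `samples_per_pixel` and the sample combinations are illustrative assumptions and are not necessarily the optimized patterns from the paper.

```python
from fractions import Fraction


def samples_per_pixel(interior=0, on_edge=0, on_corner=0):
    """Amortized cost of the samples that one pixel reads.

    Assumes every edge sample is shared by 2 pixels and every corner
    sample by 4 pixels; interior samples are not shared.
    """
    return interior + Fraction(on_edge, 2) + Fraction(on_corner, 4)


# Illustrative combinations that reach the four costs in the abstract.
print(float(samples_per_pixel(on_edge=2, on_corner=1)))  # 1.25
print(float(samples_per_pixel(on_edge=2, on_corner=2)))  # 1.5
print(float(samples_per_pixel(on_edge=3, on_corner=1)))  # 1.75
print(float(samples_per_pixel(on_edge=4)))               # 2.0
```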

@article{Hasselgren2005cgf,
  author =    {Jon Hasselgren and Tomas Akenine-M\"oller and Samuli Laine},
  title =     {A Family of Inexpensive Sampling Schemes},
  journal =   {Computer Graphics Forum},
  volume =    {24},
  number =    {4},
  year =      {2005},
  pages =     {843--848},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Samuli Laine, Timo Aila, Ulf Assarsson, Jaakko Lehtinen and Tomas Akenine-Möller.
Soft Shadow Volumes for Ray Tracing.
ACM Transactions on Graphics 24(3) (SIGGRAPH 2005).
Abstract Bibtex [PDF] [Slides]
Abstract

We present a new, fast algorithm for rendering physically-based soft shadows in ray tracing-based renderers. Our method replaces the hundreds of shadow rays commonly used in stochastic ray tracers with a single shadow ray and a local reconstruction of the visibility function. Compared to tracing the shadow rays, our algorithm produces exactly the same image while executing one to two orders of magnitude faster in the test scenes used. Our first contribution is a two-stage method for quickly determining the silhouette edges that overlap an area light source, as seen from the point to be shaded. Secondly, we show that these partial silhouettes of occluders, along with a single shadow ray, are sufficient for reconstructing the visibility function between the point and the light source.

@article{Laine2005sg,
  author =    {Samuli Laine and Timo Aila and Ulf Assarsson and Jaakko Lehtinen and Tomas Akenine-M\"oller},
  title =     {Soft Shadow Volumes for Ray Tracing},
  journal =   {ACM Transactions on Graphics},
  year =      {2005},
  pages =     {1156--1165},
  volume =    {24},
  number =    {3},
  publisher = {ACM},
}
Samuli Laine and Timo Aila.
Hierarchical Penumbra Casting.
Computer Graphics Forum 24(3) (Eurographics 2005).
Abstract Bibtex [PDF] [Slides] Note: Math symbol is missing in the CGF print.
Abstract

We present a novel algorithm for rendering physically-based soft shadows in complex scenes. Instead of casting shadow rays, we place both the points to be shaded and the samples of an area light source into separate hierarchies, and compute hierarchically the shadows caused by each occluding triangle. This yields an efficient algorithm with memory requirements independent of the complexity of the scene.

@article{Laine2005eurographics,
  author =    {Samuli Laine and Timo Aila},
  title =     {Hierarchical Penumbra Casting},
  journal =   {Computer Graphics Forum},
  volume =    {24},
  number =    {3},
  year =      {2005},
  pages =     {313--322},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Samuli Laine.
Split-Plane Shadow Volumes.
Graphics Hardware 2005.
Abstract Bibtex [PDF] [Animations] [Slides]
Abstract

We present a novel method for rendering shadow volumes. The core idea of the method is to locally choose between Z-pass and Z-fail algorithms on a per-tile basis. The choice is made by comparing the contents of the low-resolution depth buffer against an automatically constructed split plane. We show that this reduces the number of stencil updates substantially without affecting the resulting shadows. We outline a simple and efficient hardware implementation that enables the early tile culling stages to reject considerably more pixels than with shadow volume optimizations currently available in the hardware.

@InProceedings{Laine2005gh,
  author =    {Samuli Laine},
  title =     {Split-Plane Shadow Volumes},
  booktitle = {Proceedings of Graphics Hardware 2005},
  pages =     {23--32},
  year =      {2005},
  publisher = {Eurographics Association},
}
Samuli Laine.
A General Algorithm for Output-Sensitive Visibility Preprocessing.
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2005.
Abstract Bibtex [PDF] [Slides]
Abstract

Occlusion culling based on precomputed visibility information is a standard method for accelerating the rendering in real-time graphics applications. In this paper we present a new general algorithm that performs the visibility precomputation for a group of viewcells in an output-sensitive fashion. This is achieved by exploiting the directional coherence of visibility between adjacent viewcells. The algorithm is independent of the underlying from-region visibility solver and is therefore applicable to exact, conservative and aggressive visibility solvers in both 2D and 3D.

@InProceedings{Laine2005i3d,
  author =    {Samuli Laine},
  title =     {A General Algorithm for Output-Sensitive Visibility Preprocessing},
  booktitle = {Proceedings of ACM SIGGRAPH 2005 Symposium on Interactive 3D Graphics and Games},
  pages =     {31--39},
  year =      {2005},
  publisher = {ACM Press},
}
Janne Kontkanen and Samuli Laine.
Ambient Occlusion Fields.
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2005.
Abstract Bibtex [PDF] [Project page]
Abstract

We present a novel real-time technique for computing inter-object ambient occlusion. For each occluding object, we precompute a field in the surrounding space that encodes an approximation of the occlusion caused by the object. This volumetric information is then used at run-time in a fragment program for quickly determining the shadow cast on the receiving objects. According to our results, both the computational and storage requirements are low enough for the technique to be directly applicable to computer games running on the current graphics hardware.

@InProceedings{Kontkanen2005i3d,
  author =    {Janne Kontkanen and Samuli Laine},
  title =     {Ambient Occlusion Fields},
  booktitle = {Proceedings of ACM SIGGRAPH 2005 Symposium on Interactive 3D Graphics and Games},
  pages =     {41--48},
  year =      {2005},
  publisher = {ACM Press},
}
Janne Kontkanen and Samuli Laine.
Ambient Occlusion Fields.
Article published in book ShaderX4, Charles River Media.
Bibtex 
@inbook{Kontkanen2006shaderx4,
  author =    {Janne Kontkanen and Samuli Laine},
  title =     {Ambient Occlusion Fields},
  editor =    {Wolfgang Engel},
  booktitle = {ShaderX$^4$},
  publisher = {Charles River Media},
  year =      {2005},
  pages =     {101--108},
  chapter =   {2.4},
}
Timo Aila and Samuli Laine.
Alias-Free Shadow Maps.
Eurographics Symposium on Rendering 2004.
Abstract Bibtex [PDF] [Slides] [Hairball in .OBJ format]
Abstract

In this paper we abandon the regular structure of shadow maps. Instead, we transform the visible pixels P(x,y,z) from screen space to the image plane of a light source P'(x',y',z'). The (x',y') are then used as sampling points when the geometry is rasterized into the shadow map. This eliminates the resolution issues that have plagued shadow maps for decades, e.g., jagged shadow boundaries. Incorrect self-shadowing is also greatly reduced, and semi-transparent shadow casters and receivers can be supported. A hierarchical software implementation is outlined.
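
The transformation step in the abstract can be sketched as a projection of each visible point onto the light's image plane. The example assumes the visible pixels have already been converted to world-space positions and that the light is described by a single row-major 4x4 view-projection matrix; `to_light_image_plane` and the toy matrix are hypothetical, and the irregular rasterization that consumes the resulting sample points is not shown.

```python
def to_light_image_plane(point, light_view_proj):
    """Map a world-space point P to P' = (x', y', z') as seen from the light.

    light_view_proj is an assumed row-major 4x4 matrix combining the
    light's view and projection transforms.
    """
    x, y, z = point
    v = (x, y, z, 1.0)
    xc, yc, zc, wc = (
        sum(m * c for m, c in zip(row, v)) for row in light_view_proj
    )
    if wc == 0.0:
        raise ValueError("point lies on the light's projection plane")
    # Perspective divide gives coordinates on the light's image plane.
    return (xc / wc, yc / wc, zc / wc)


# Toy projection for a light at the origin looking down the -z axis.
LIGHT_VIEW_PROJ = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

visible_points = [(1.0, 2.0, -4.0), (-2.0, 0.0, -8.0)]
# The (x', y') pairs are the sampling points for the shadow map.
sample_points = [to_light_image_plane(p, LIGHT_VIEW_PROJ)[:2] for p in visible_points]
print(sample_points)  # [(0.25, 0.5), (-0.25, 0.0)]
```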

@InProceedings{Aila2004egsr,
  author =    {Timo Aila and Samuli Laine},
  title =     {Alias-Free Shadow Maps},
  booktitle = {Proceedings of Eurographics Symposium on Rendering 2004},
  pages =     {161--166},
  year =      {2004},
  publisher = {Eurographics Association},
}

Proceedings

Michael Doggett, Samuli Laine and Warren Hunt (editors).
Proceedings of High-Performance Graphics 2010.
Saarbrücken, Germany.
Bibtex [Preface and Table of Contents] [EG digital library] [EG bookstore]
@proceedings{HPG10-proc,
  editor =    {Michael Doggett and Samuli Laine and Warren Hunt},
  title =     {High-Performance Graphics 2010},
  year =      {2010},
  isbn =      {978-3-905674-26-2},
  issn =      {2079-8679},
  address =   {Saarbr\"{u}cken, Germany},
  publisher = {Eurographics Association},
}

Theses

Samuli Laine.
Efficient Physically-Based Shadow Algorithms.
Doctoral thesis, Helsinki University of Technology, August 2006.
Abstract Bibtex [PDF]
Abstract

This research focuses on developing efficient algorithms for computing shadows in computer-generated images. A distinctive feature of the shadow algorithms presented in this thesis is that they produce correct, physically-based results, instead of giving approximations whose quality is often hard to ensure or evaluate.

Light sources that are modeled as points without any spatial extent produce hard shadows with sharp boundaries. Shadow mapping is a traditional method for rendering such shadows. A shadow map is a depth buffer computed from the scene, using a point light source as the viewpoint. The finite resolution of the shadow map requires that its contents are resampled when determining the shadows on visible surfaces. This causes various artifacts such as incorrect self-shadowing and jagged shadow boundaries. A novel method is presented that avoids the resampling step, and provides exact shadows for every point visible in the image.

The shadow volume algorithm is another commonly used algorithm for real-time rendering of hard shadows. This algorithm gives exact results and does not suffer from any resampling problems, but it tends to consume a lot of fillrate, which leads to performance problems. This thesis presents a new technique for locally choosing between two previous shadow volume algorithms with different performance characteristics. A simple criterion for making the local choices is shown to yield better performance than using either of the algorithms alone.

Light sources with nonzero spatial extent give rise to soft shadows with smooth boundaries. A novel method is presented that transposes the classical processing order for soft shadow computation in offline rendering. Instead of casting shadow rays, the algorithm first conceptually collects every ray that would need to be cast, and then processes the shadow-casting primitives one by one, hierarchically finding the rays that are blocked.

Another new soft shadow algorithm takes a different approach to computing the shadows. Only the silhouettes of the shadow casters are used for determining the shadows, and an unintrusive execution model makes the algorithm practical for production use in offline rendering.

The proposed techniques accelerate the computing of physically-based shadows in real-time and offline rendering. These improvements make it possible to use correct, physically-based shadows in a broad range of scenes that previous methods cannot handle efficiently enough.

@phdthesis{Laine2006phd,
  author = {Samuli Laine},
  title =  {Efficient Physically-Based Shadow Algorithms},
  school = {Helsinki University of Technology},
  month =  aug,
  year =   {2006},
}
Samuli Laine.
An Incremental Shaft Subdivision Algorithm for Computing Shadows and Visibility.
Master's thesis, Helsinki University of Technology, March 2006.
Abstract Bibtex [PDF]
Abstract

The rendering of soft shadows is an important task in computer graphics. Soft shadows appear when the light source is not modeled as a single point but as an object with nonzero surface area. Obtaining correct physically-based shadows requires determining the amount of light that flows from the light source to a receiving point on the surface being rendered. This is generally computationally expensive, and efficient solution methods are needed for keeping the rendering times on a tolerable level.

There is usually significant coherence in shadows among nearby receiving points, and nearby parts of a light source also tend to contribute to the image in a similar fashion. Exploiting these forms of coherence is the key element of modern soft shadow algorithms.

This thesis presents a novel physically-based soft shadow algorithm that attempts to exploit the coherence as much as possible, solving the shadow relations in large chunks instead of considering single points in the emitting or receiving end. The computation of shadow relations is performed hierarchically, and an efficient representation of shadow-casting geometry is maintained incrementally. The algorithm is a generic tool for solving sets of visibility relations in polygonal scenes, and may have uses in areas other than shadow computation as well.

In addition to presenting the novel algorithm in detail, several existing physically-based shadow algorithms are analyzed and ranked according to their computational complexities. Experimental results are also presented for illustrating the applicability of the novel algorithm in different kinds of rendering situations.

@mastersthesis{Laine2006msc,
  author = {Samuli Laine},
  title =  {An Incremental Shaft Subdivision Algorithm for Computing Shadows and Visibility},
  school = {Helsinki University of Technology},
  month =  mar,
  year =   {2006},
}