Timo Aila, PhD
Distinguished Research Scientist, NVIDIA
Docent, Aalto University

NVIDIA
Porkkalankatu 1, 5th floor
00180 Helsinki
Finland

taila(at)nvidia.com

Refereed publications

Reflectance Modeling by Neural Texture Synthesis.
Miika Aittala, Timo Aila, and Jaakko Lehtinen.
ACM Transactions on Graphics 35(4) (SIGGRAPH 2016).
Abstract

We extend parametric texture synthesis to capture rich, spatially varying parametric reflectance models from a single image. Our input is a single head-lit flash image of a mostly flat, mostly stationary (textured) surface, and the output is a tile of SVBRDF parameters that reproduce the appearance of the material. No user intervention is required. Our key insight is to make use of a recent, powerful texture descriptor based on deep convolutional neural network statistics for "softly" comparing the model prediction and the exemplars without requiring an explicit point-to-point correspondence between them. This is in contrast to traditional reflectance capture that requires pointwise constraints between inputs and outputs under varying viewing and lighting conditions. Seen through this lens, our method is an indirect algorithm for fitting photorealistic SVBRDFs. The problem is severely ill-posed and non-convex. To guide the optimizer towards desirable solutions, we introduce a soft Fourier-domain prior for encouraging spatial stationarity of the reflectance parameters and their correlations, and a complementary preconditioning technique that enables efficient exploration of such solutions by L-BFGS, a standard non-linear numerical optimizer.

@article{Aittala2016sg, 
  author =    {Miika Aittala and Timo Aila and Jaakko Lehtinen},
  title =     {Reflectance Modeling by Neural Texture Synthesis},
  journal =   {ACM Trans. Graph.},
  year =      {2016},
  volume =    {35},
  number =    {4},
}
Gradient-Domain Metropolis Light Transport.
Jaakko Lehtinen, Tero Karras, Samuli Laine, Miika Aittala, Frédo Durand, and Timo Aila.
ACM Transactions on Graphics 32(4) (SIGGRAPH 2013).
Abstract

We introduce a novel Metropolis rendering algorithm that directly computes image gradients, and reconstructs the final image from the gradients by solving a Poisson equation. The reconstruction is aided by a low-fidelity approximation of the image computed during gradient sampling. As an extension of path-space Metropolis light transport, our algorithm is well suited for difficult transport scenarios. We demonstrate that our method outperforms the state-of-the-art in several well-known test scenes. Additionally, we analyze the spectral properties of gradient-domain sampling, and compare it to the traditional image-domain sampling.
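The final step reconstructs an image from sampled gradients plus a low-fidelity primal estimate. It can be illustrated in one dimension. The sketch below is a minimal, hypothetical example. It assumes an L2 (screened Poisson) objective with a made-up weight `alpha` and solves it with plain Gauss-Seidel sweeps. That solver is an illustrative choice, not the one used in the paper.

```python
from typing import List


def reconstruct_1d(primal: List[float], grads: List[float],
                   alpha: float = 0.2, iters: int = 5000) -> List[float]:
    """Minimize alpha^2 * sum (I[i] - primal[i])^2 + sum (I[i+1] - I[i] - grads[i])^2.

    grads[i] estimates I[i+1] - I[i], so len(grads) == len(primal) - 1.
    Solved with Gauss-Seidel sweeps over the normal equations (illustrative only).
    """
    n = len(primal)
    if len(grads) != n - 1:
        raise ValueError("need exactly one gradient per neighboring pixel pair")
    a2 = alpha * alpha
    img = list(primal)  # start from the primal (low-fidelity) estimate
    for _ in range(iters):
        for k in range(n):
            num = a2 * primal[k]  # pull towards the primal value
            den = a2
            if k > 0:  # left link: I[k] - I[k-1] should match grads[k-1]
                num += img[k - 1] + grads[k - 1]
                den += 1.0
            if k < n - 1:  # right link: I[k+1] - I[k] should match grads[k]
                num += img[k + 1] - grads[k]
                den += 1.0
            img[k] = num / den
    return img
```

With consistent inputs, the primal image is already the minimizer. With a noisy primal and accurate gradients, the result keeps the primal's mean and takes its shape mostly from the gradients.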

@article{Lehtinen2013sg, 
  author =    {Jaakko Lehtinen and Tero Karras and Samuli Laine and Miika Aittala and Fr\'{e}do Durand and Timo Aila},
  title =     {Gradient-Domain {Metropolis} Light Transport},
  journal =   {ACM Trans. Graph.},
  year =      {2013},
  volume =    {32},
  number =    {4},
}
On Quality Metrics of Bounding Volume Hierarchies.
Timo Aila, Tero Karras, and Samuli Laine.
High-Performance Graphics 2013. Best paper award.
Abstract

The surface area heuristic (SAH) is widely used as a predictor for ray tracing performance, and as a heuristic to guide the construction of spatial acceleration structures. We investigate how well SAH actually predicts ray tracing performance of a bounding volume hierarchy (BVH), observe that this relationship is far from perfect, and then propose two new metrics that together with SAH almost completely explain the measured performance. Our observations shed light on the increasingly common situation that a supposedly good tree construction algorithm produces trees that are slower to trace than expected. We also note that the trees constructed using greedy top-down algorithms are consistently faster to trace than SAH indicates and are also more SIMD-friendly than competing approaches.
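For reference, the SAH cost of a finished BVH is commonly written as a weighted sum of node surface areas relative to the root. The sketch below assumes that common formulation. The cost constants `c_inner`, `c_leaf` and `c_tri` and the `Node` type are hypothetical. The sketch computes only SAH, not the two additional metrics the paper proposes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical BVH node: an axis-aligned box plus either children or a triangle count."""
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    num_tris: int = 0  # only meaningful for leaves


def surface_area(lo: Vec3, hi: Vec3) -> float:
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(root: Node, c_inner: float = 1.2, c_leaf: float = 0.0, c_tri: float = 1.0) -> float:
    """Sum over nodes of (node area / root area) times that node's cost."""
    root_area = surface_area(root.lo, root.hi)
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        rel_area = surface_area(node.lo, node.hi) / root_area
        if node.children:  # inner node: pays the traversal cost
            total += c_inner * rel_area
            stack.extend(node.children)
        else:  # leaf: fixed cost plus one intersection test per triangle
            total += (c_leaf + c_tri * node.num_tris) * rel_area
    return total


if __name__ == "__main__":
    left = Node((0, 0, 0), (0.5, 1, 1), num_tris=2)
    right = Node((0.5, 0, 0), (1, 1, 1), num_tris=2)
    root = Node((0, 0, 0), (1, 1, 1), children=[left, right])
    print(round(sah_cost(root), 4))  # 3.8667
```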

@inproceedings{Aila2013hpg, 
  author =    {Timo Aila and Tero Karras and Samuli Laine},
  title =     {On Quality Metrics of Bounding Volume Hierarchies},
  booktitle = {Proc. High-Performance Graphics},
  year =      {2013},
}
Fast Parallel Construction of High-Quality Bounding Volume Hierarchies.
Tero Karras and Timo Aila.
High-Performance Graphics 2013.
Abstract

We propose a new massively parallel algorithm for constructing high-quality bounding volume hierarchies (BVHs) for ray tracing. The algorithm is based on modifying an existing BVH to improve its quality, and executes in linear time at a rate of almost 40M triangles/sec on NVIDIA GTX Titan. We also propose an improved approach for parallel splitting of triangles prior to tree construction. Averaged over 20 test scenes, the resulting trees offer over 90% of the ray tracing performance of the best offline construction method (SBVH), while previous fast GPU algorithms offer only about 50%. Compared to the state of the art, our method offers a significant improvement in the majority of practical workloads that need to construct the BVH for each frame. On average, it gives the best overall performance when tracing between 7 million and 60 billion rays per frame. This covers most interactive applications, product and architectural design, and even movie rendering.

@inproceedings{Karras2013hpg, 
  author =    {Tero Karras and Timo Aila},
  title =     {Fast Parallel Construction of High-Quality Bounding Volume Hierarchies},
  booktitle = {Proc. High-Performance Graphics},
  year =      {2013},
}
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs.
Samuli Laine, Tero Karras, and Timo Aila.
High-Performance Graphics 2013.
Abstract

When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.

@inproceedings{Laine2013hpg, 
  author =    {Samuli Laine and Tero Karras and Timo Aila},
  title =     {Megakernels Considered Harmful: Wavefront Path Tracing on GPUs},
  booktitle = {Proc. High-Performance Graphics},
  year =      {2013},
}
Reconstructing the Indirect Light Field for Global Illumination.
Jaakko Lehtinen, Timo Aila, Samuli Laine, and Frédo Durand.
ACM Transactions on Graphics 31(4) (SIGGRAPH 2012).
Abstract

Stochastic techniques for rendering indirect illumination suffer from noise due to the variance in the integrand. In this paper, we describe a general reconstruction technique that exploits anisotropy in the light field and permits efficient reuse of input samples between pixels or world-space locations, multiplying the effective sampling rate by a large factor. Our technique introduces visibility-aware anisotropic reconstruction to indirect illumination, ambient occlusion and glossy reflections. It operates on point samples without knowledge of the scene, and can thus be seen as an advanced image filter. Our results show dramatic improvement in image quality while using very sparse input samplings.

@article{Lehtinen2012sg, 
  author =    {Jaakko Lehtinen and Timo Aila and Samuli Laine and Fr\'{e}do Durand},
  title =     {Reconstructing the Indirect Light Field for Global Illumination},
  journal =   {ACM Trans. Graph.},
  year =      {2012},
  volume =    {31},
  number =    {4},
}
Errata

Towards the end of Section 2.3.2 the text should read "If the intersection point lies within the positive or negative halfspaces of both samples, a conflict is declared, cf. Figure 5."

In Section 3.1 the time to render 8spp with PBRT should be 62.7s instead of 36.6s, and consequently the speedup of our method should be 15.5x instead of 18x.

Clipless Dual-Space Bounds for Faster Stochastic Rasterization.
Samuli Laine, Timo Aila, Tero Karras, and Jaakko Lehtinen.
ACM Transactions on Graphics 30(4) (SIGGRAPH 2011).
Abstract

We present a novel method for increasing the efficiency of stochastic rasterization of motion and defocus blur. Contrary to earlier approaches, our method is efficient even with the low sampling densities commonly encountered in realtime rendering, while allowing the use of arbitrary sampling patterns for maximal image quality. Our clipless dual-space formulation avoids problems with triangles that cross the camera plane during the shutter interval. The method is also simple to plug into existing rendering systems.

@article{Laine2011sg, 
  author =    {Samuli Laine and Timo Aila and Tero Karras and Jaakko Lehtinen},
  title =     {Clipless Dual-Space Bounds for Faster Stochastic Rasterization},
  journal =   {ACM Trans. Graph.},
  year =      {2011},
  volume =    {30},
  number =    {4},
}
Temporal Light Field Reconstruction for Rendering Distribution Effects.
Jaakko Lehtinen, Timo Aila, Jiawen Chen, Samuli Laine, and Frédo Durand.
ACM Transactions on Graphics 30(4) (SIGGRAPH 2011).
Abstract

Traditionally, effects that require evaluating multidimensional integrals for each pixel, such as motion blur, depth of field, and soft shadows, suffer from noise due to the variance of the high-dimensional integrand. In this paper, we describe a general reconstruction technique that exploits the anisotropy in the temporal light field and permits efficient reuse of samples between pixels, multiplying the effective sampling rate by a large factor. We show that our technique can be applied in situations that are challenging or impossible for previous anisotropic reconstruction methods, and that it can yield good results with very sparse inputs. We demonstrate our method for simultaneous motion blur, depth of field, and soft shadows.

@article{Lehtinen2011sg, 
  author =    {Jaakko Lehtinen and Timo Aila and Jiawen Chen and Samuli Laine and Fr\'{e}do Durand},
  title =     {Temporal Light Field Reconstruction for Rendering Distribution Effects},
  journal =   {ACM Trans. Graph.},
  year =      {2011},
  volume =    {30},
  number =    {4},
}
A Local Image Reconstruction Algorithm for Stochastic Rendering.
Peter Shirley, Timo Aila, Jonathan Cohen, Eric Enderton, Samuli Laine, David Luebke, and Morgan McGuire.
Symposium on Interactive 3D Graphics and Games 2011.
Abstract

Stochastic renderers produce unbiased but noisy images of scenes that include the advanced camera effects of motion and defocus blur and possibly other effects such as transparency. We present a simple algorithm that selectively adds bias in the form of image space blur to pixels that are unlikely to have high frequency content in the final image. For each pixel, we sweep once through a fixed neighborhood of samples in front-to-back order, using a simple accumulation scheme. We achieve good quality images with only 16 samples per pixel, making the algorithm potentially practical for interactive stochastic rendering in the near future.

@InProceedings{Shirley2011i3d,
  author =    {Peter Shirley and Timo Aila and Jonathan Cohen and Eric Enderton and Samuli Laine and David Luebke and Morgan Mc{G}uire},
  title =     {A Local Image Reconstruction Algorithm for Stochastic Rendering},
  booktitle = {Proc. Symposium on Interactive 3D Graphics and Games 2011},
  pages =     {9--13},
  year =      {2011},
  publisher = {ACM Press},
}
Architecture Considerations for Tracing Incoherent Rays.
Timo Aila and Tero Karras.
High-Performance Graphics 2010.
Abstract

This paper proposes a massively parallel hardware architecture for efficient tracing of incoherent rays, e.g. for global illumination. The general approach is centered around hierarchical treelet subdivision of the acceleration structure and repeated queueing/postponing of rays to reduce cache pressure. We describe a heuristic algorithm for determining the treelet subdivision, and show that our architecture can reduce the total memory bandwidth requirements by up to 90% in difficult scenes. Furthermore, the architecture allows submitting rays in an arbitrary order with practically no performance penalty. We also conclude that scheduling algorithms can have an important effect on results, and that using fixed-size queues is not an appealing design choice. Increased auxiliary traffic, including traversal stacks, is identified as the foremost remaining challenge of this architecture.

@InProceedings{Aila2010hpg,
  author =    {Timo Aila and Tero Karras},
  title =     {Architecture Considerations for Tracing Incoherent Rays},
  booktitle = {Proc. High-Performance Graphics 2010},
  pages =     {113--122},
  year =      {2010},
}
PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes.
Jacopo Pantaleoni, Luca Fascione, Martin Hill, and Timo Aila.
ACM Transactions on Graphics 29(4) (SIGGRAPH 2010).
Abstract

We describe the architecture of a novel system for precomputing sparse directional occlusion caches. These caches are used for accelerating a fast cinematic lighting pipeline that works in the spherical harmonics domain. The system was used as a primary lighting technology in the movie Avatar, and is able to efficiently handle massive scenes of unprecedented complexity through the use of a flexible, stream-based geometry processing architecture, a novel out-of-core algorithm for creating efficient ray tracing acceleration structures, and a novel out-of-core GPU ray tracing algorithm for the computation of directional occlusion and spherical integrals at arbitrary points.

@article{Pantaleoni2010Siggraph,
  author =    {Jacopo Pantaleoni and Luca Fascione and Martin Hill and Timo Aila},
  title =     {PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes},
  journal =   {ACM Trans. Graph.},
  volume =    29,
  number =    4,
  pages =     {37:1--37:10},
  year =      {2010},
}
Understanding the Efficiency of Ray Traversal on GPUs.
Timo Aila and Samuli Laine.
High-Performance Graphics 2009.
Abstract

We discuss the mapping of elementary ray tracing operations (acceleration structure traversal and primitive intersection) onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy. We study this question by comparing the measurements against a simulator that gives the upper bound of performance for a given kernel. We observe that previously known methods are a factor of 1.5 to 2.5 off from the theoretical optimum, and most of the gap is not explained by memory bandwidth, but rather by previously unidentified inefficiencies in hardware work distribution. We then propose a simple solution that significantly narrows the gap between simulation and measurement. This results in the fastest GPU ray tracer to date. We provide results for primary, ambient occlusion and diffuse interreflection rays.

@InProceedings{Aila2009hpg,
  author =    {Timo Aila and Samuli Laine},
  title =     {Understanding the Efficiency of Ray Traversal on GPUs},
  booktitle = {Proc. High-Performance Graphics 2009},
  pages =     {145--149},
  year =      {2009},
}
A Meshless Hierarchical Representation for Light Transport.
Jaakko Lehtinen, Matthias Zwicker, Emmanuel Turquin, Janne Kontkanen, François Sillion, Frédo Durand, and Timo Aila.
ACM Transactions on Graphics 27(3) (SIGGRAPH 2008).
Abstract

We introduce a meshless hierarchical representation for solving light transport problems. Precomputed radiance transfer (PRT) and finite elements require a discrete representation of illumination over the scene. Non-hierarchical approaches such as per-vertex values are simple to implement, but lead to long precomputation. Hierarchical bases like wavelets lead to dramatic acceleration, but in their basic form they work well only on flat or smooth surfaces. We introduce a hierarchical function basis induced by scattered data approximation. It is decoupled from the geometric representation, allowing the hierarchical representation of illumination on complex objects. We present simple data structures and algorithms for constructing and evaluating the basis functions. Due to its hierarchical nature, our representation adapts to the complexity of the illumination, and can be queried at different scales. We demonstrate the power of the new basis in a novel precomputed direct-to-indirect light transport algorithm that greatly increases the complexity of scenes that can be handled by PRT approaches.

@article{Lehtinen2008Hierarchical,
  author =    {Jaakko Lehtinen and Matthias Zwicker and Emmanuel Turquin and Janne Kontkanen and Fr\'{e}do Durand and Fran\c{c}ois Sillion and Timo Aila},
  title =     {A Meshless Hierarchical Representation for Light Transport},
  journal =   {ACM Trans. Graph.},
  volume =    27,
  number =    3,
  pages =     {Article 37},
  year =      {2008},
}
Incremental Instant Radiosity for Real-Time Indirect Illumination.
Samuli Laine, Hannu Saransaari, Janne Kontkanen, Jaakko Lehtinen, and Timo Aila.
Eurographics Symposium on Rendering 2007. Also published in ShaderX6, Charles River Media.
Abstract

We present a method for rendering single-bounce indirect illumination in real time on currently available graphics hardware. The method is based on the instant radiosity algorithm, where virtual point lights (VPLs) are generated by casting rays from the primary light source. Hardware shadow maps are then employed for determining the indirect illumination from the VPLs. Our main contribution is an algorithm for reusing the VPLs and incrementally maintaining their good distribution. As a result, only a few shadow maps need to be rendered per frame as long as the motion of the primary light source is reasonably smooth. This yields real-time frame rates even when hundreds of VPLs are used.

@InProceedings{Laine2007egsr,
  author =    {Samuli Laine and Hannu Saransaari and Janne Kontkanen and Jaakko Lehtinen and Timo Aila},
  title =     {Incremental Instant Radiosity for Real-Time Indirect Illumination},
  booktitle = {Proc. Eurographics Symposium on Rendering 2007},
  pages =     {277--286},
  year =      {2007},
  publisher = {Eurographics Association},
}
A Hardware Architecture for Surface Splatting.
Tim Weyrich, Simon Heinzle, Timo Aila, Daniel Fasnacht, Stephan Oetiker, Mario Botsch, Cyril Flaig, Simon Mall, Kaspar Rohrer, Norbert Felber, Hubert Kaeslin, and Markus Gross.
ACM Transactions on Graphics 26(3) (SIGGRAPH 2007).
Abstract

We present a novel architecture for hardware-accelerated rendering of point primitives. Our pipeline implements a refined version of EWA splatting, a high quality method for antialiased rendering of point sampled representations. A central feature of our design is the seamless integration of the architecture into conventional, OpenGL-like graphics pipelines so as to complement triangle-based rendering. The specific properties of the EWA algorithm required a variety of novel design concepts including a ternary depth test and using an on-chip pipelined heap data structure for making the memory accesses of splat primitives more coherent. In addition, we developed a computationally stable evaluation scheme for perspectively corrected splats. We implemented our architecture both on reconfigurable FPGA boards and as an ASIC prototype, and we integrated it into an OpenGL-like software implementation. Our evaluation comprises a detailed performance analysis using scenes of varying complexity.

@article{Weyrich2007siggraph,
  author    = {Tim Weyrich and Simon Heinzle and Timo Aila and Daniel Fasnacht and Stephan Oetiker and Mario Botsch and Cyril Flaig and Simon Mall and Kaspar Rohrer and Norbert Felber and Hubert Kaeslin and Markus Gross},
  title     = {A Hardware Architecture for Surface Splatting},
  journal   = {ACM Trans. Graph.},
  volume    = {26},
  number    = {3},
  year      = {2007},
  pages     = {Article 90},
}
Ambient Occlusion for Animated Characters.
Janne Kontkanen and Timo Aila.
Eurographics Symposium on Rendering 2006.
Abstract

We present a novel technique for approximating ambient occlusion of animated objects. Our method automatically determines the correspondence between animation parameters and per-vertex ambient occlusion using a set of reference poses as its input. Then, at runtime, the ambient occlusion is approximated by taking a dot product between the current animation parameters and static per-vertex coefficients. According to our results, both the computational and storage requirements are low enough for the technique to be directly applicable to computer games running on current graphics hardware. The resulting images are also significantly more realistic than the commonly used static ambient occlusion solutions.
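The runtime evaluation described above reduces to a per-vertex dot product. The sketch below pairs it with a plain least-squares fit over reference poses. It assumes each pose is a vector of animation parameters with a constant 1 appended. The function names are hypothetical, and the paper's actual parameterization and fitting procedure may differ.

```python
from typing import List, Sequence


def ambient_occlusion(params: Sequence[float], coeffs: Sequence[float]) -> float:
    """Runtime step: one vertex's AO is a dot product of pose parameters and its coefficients."""
    return sum(p * c for p, c in zip(params, coeffs))


def _solve(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))) / a[i][i]
    return x


def fit_vertex_coefficients(poses: List[List[float]], ao: List[float]) -> List[float]:
    """Hypothetical offline step: least-squares fit of one vertex's coefficients
    from reference poses via the normal equations (P^T P) c = P^T y."""
    n = len(poses[0])
    m = [[sum(p[i] * p[j] for p in poses) for j in range(n)] for i in range(n)]
    rhs = [sum(p[i] * y for p, y in zip(poses, ao)) for i in range(n)]
    return _solve(m, rhs)
```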

@InProceedings{Kontkanen2006egsr,
  author =    {Janne Kontkanen and Timo Aila},
  title =     {Ambient Occlusion for Animated Characters},
  booktitle = {Proc. Eurographics Symposium on Rendering 2006},
  pages =     {343--348},
  year =      {2006},
  publisher = {Eurographics Association},
}
A Weighted Error Metric and Optimization Method for Antialiasing Patterns.
Samuli Laine and Timo Aila.
Computer Graphics Forum 25(1), 2006.
Abstract

Displaying a synthetic image on a computer display requires determining the colors of individual pixels. To avoid aliasing, multiple samples of the image can be taken per pixel, after which the color of a pixel may be computed as a weighted sum of the samples. The positions and weights of the samples play a major role in the resulting image quality, especially in real-time applications where usually only a handful of samples can be afforded per pixel. This paper presents a new error metric and an optimization method for antialiasing patterns used in image reconstruction. The metric is based on comparing the pattern against a given reference reconstruction filter in the spatial domain and it takes into account psychovisually measured angle-specific acuities for sharp features.

@article{Laine2006cgf,
  author =    {Samuli Laine and Timo Aila},
  title =     {A Weighted Error Metric and Optimization Method for Antialiasing Patterns},
  journal =   {Computer Graphics Forum},
  volume =    {25},
  number =    {1},
  year =      {2006},
  pages =     {83--94},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
An Improved Physically-Based Soft Shadow Volume Algorithm.
Jaakko Lehtinen, Samuli Laine, and Timo Aila.
Computer Graphics Forum 25(3) (Eurographics 2006).
Abstract

We identify and analyze several performance problems in a state-of-the-art physically-based soft shadow volume algorithm, and present an improved method that alleviates these problems by replacing an overly conservative spatial acceleration structure by a more efficient one. The new technique consistently outperforms both the previous method and a ray tracing-based reference solution in several realistic situations while retaining the correctness of the solution and other desirable characteristics of the previous method. These include the unintrusiveness of the original algorithm, meaning that our method can be used as a black-box shadow solver in any offline renderer without requiring multiple passes over the image or other special accommodation. We achieve speedup factors from 1.6 to 12.3 when compared to the previous method.

@article{Lehtinen2006eurographics,
  author =    {Jaakko Lehtinen and Samuli Laine and Timo Aila},
  title =     {An Improved Physically-Based Soft Shadow Volume Algorithm},
  journal =   {Computer Graphics Forum},
  volume =    {25},
  number =    {3},
  year =      {2006},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Soft Shadow Volumes for Ray Tracing.
Samuli Laine, Timo Aila, Ulf Assarsson, Jaakko Lehtinen, and Tomas Akenine-Möller.
ACM Transactions on Graphics 24(3) (SIGGRAPH 2005).
Hindsight: See our Eurographics 2006 paper for improvements.
Abstract

We present a new, fast algorithm for rendering physically-based soft shadows in ray tracing-based renderers. Our method replaces the hundreds of shadow rays commonly used in stochastic ray tracers with a single shadow ray and a local reconstruction of the visibility function. Compared to tracing the shadow rays, our algorithm produces exactly the same image while executing one to two orders of magnitude faster in the test scenes used. Our first contribution is a two-stage method for quickly determining the silhouette edges that overlap an area light source, as seen from the point to be shaded. Secondly, we show that these partial silhouettes of occluders, along with a single shadow ray, are sufficient for reconstructing the visibility function between the point and the light source.

@article{laine2005siggraph,
  author    = {Samuli Laine and Timo Aila and Ulf Assarsson and Jaakko Lehtinen and Tomas Akenine-M\"{o}ller},
  title     = {Soft Shadow Volumes for Ray Tracing},
  journal   = {ACM Trans. Graph.},
  volume    = {24},
  number    = {3},
  year      = {2005},
  pages     = {1156--1165},
  publisher = {ACM Press},
}
Hierarchical Penumbra Casting.
Samuli Laine and Timo Aila.
Computer Graphics Forum 24(3) (Eurographics 2005).
Abstract

We present a novel algorithm for rendering physically-based soft shadows in complex scenes. Instead of casting shadow rays, we place both the points to be shaded and the samples of an area light source into separate hierarchies, and compute hierarchically the shadows caused by each occluding triangle. This yields an efficient algorithm with memory requirements independent of the complexity of the scene.

@article{Laine2005eurographics,
  author =    {Samuli Laine and Timo Aila},
  title =     {Hierarchical Penumbra Casting},
  journal =   {Computer Graphics Forum},
  volume =    {24},
  number =    {3},
  year =      {2005},
  pages =     {313--322},
  publisher = {Eurographics Association and Blackwell Publishing Ltd},
}
Conservative and Tiled Rasterization Using a Modified Triangle Setup.
Tomas Akenine-Möller and Timo Aila.
Journal of Graphics Tools 10(3), 2005.
Abstract

Several algorithms that use graphics hardware to accelerate processing require conservative rasterization in order to function correctly. Conservative rasterization stands for either overestimating or underestimating the size of the triangles. Overestimation is carried out by including all pixels that are at least partially overlapped by the triangle, whereas underestimation includes only the pixels that are fully inside the triangle. Few, if any, algorithms for conservative rasterization have been described in the literature, and current hardware does not explicitly support it. Therefore, we present a simple algorithm, which requires only a small modification to the triangle setup when edge functions are used. Furthermore, the same algorithm can be used for tiled rasterization, where all pixels in a tile (e.g. 8x8 pixels) are visited before moving to the next tile.
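The edge-function idea can be sketched as follows. This is a minimal illustration that assumes counter-clockwise triangles and square cells whose lower-left corner sits at integer (x, y). It offsets each edge test at the cell center by half the cell's extent projected onto the edge normal. That is the general principle behind modifying the triangle setup, not a transcription of the paper's setup code. `edge_functions` and `classify_cell` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Edge = Tuple[float, float, float]  # (a, b, c) of e(x, y) = a*x + b*y + c


def edge_functions(v0: Point, v1: Point, v2: Point) -> List[Edge]:
    """Edge functions of a counter-clockwise triangle; e >= 0 on the inner side of each edge."""
    edges = []
    for (px, py), (qx, qy) in ((v0, v1), (v1, v2), (v2, v0)):
        a, b = py - qy, qx - px
        edges.append((a, b, -a * px - b * py))
    return edges


def classify_cell(edges: List[Edge], x: int, y: int, size: float = 1.0) -> str:
    """Classify the square [x, x+size] x [y, y+size] against the triangle.

    A linear function differs from its value at the square's center by at most
    (|a| + |b|) * size / 2, so offsetting each edge test by that amount gives
    conservative answers. size=1 gives pixels; e.g. size=8 gives 8x8 tiles.
    """
    cx, cy = x + 0.5 * size, y + 0.5 * size
    fully_inside = True
    for a, b, c in edges:
        center = a * cx + b * cy + c
        offset = 0.5 * size * (abs(a) + abs(b))
        if center + offset < 0:  # even the most favorable corner is outside this edge
            return "outside"
        if center - offset < 0:  # the least favorable corner is outside this edge
            fully_inside = False
    return "inside" if fully_inside else "boundary"
```

Keeping every cell that is not "outside" gives overestimation, and keeping only "inside" cells gives underestimation. With edge tests alone, overestimation can also keep a few cells near sharp vertices that the triangle does not reach. Intersecting with the triangle's bounding box is a common remedy.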

@article{AkenineMoller2005jgt,
  author =    {Tomas Akenine-M\"{o}ller and Timo Aila},
  title =     {Conservative and Tiled Rasterization Using a Modified Triangle Setup},
  journal =   {Journal of Graphics Tools},
  volume =    {10},
  number =    {3},
  year =      {2005},
  pages =     {1--8},
  publisher = {AK Peters Ltd},
}
A Hierarchical Shadow Volume Algorithm.
Timo Aila and Tomas Akenine-Möller.
Graphics Hardware 2004.
Abstract

The shadow volume algorithm is a popular technique for real-time shadow generation using graphics hardware. Its major disadvantage is that it is inherently fillrate-limited, as the performance is inversely proportional to the area of the projected shadow volumes. We present a new algorithm that reduces the shadow volume rasterization work significantly. With our algorithm, the amount of per-pixel processing becomes proportional to the screenspace length of the visible shadow boundary instead of the projected area. The first stage of the algorithm finds 8x8 pixel tiles whose 3D bounding boxes are either completely inside or outside the shadow volume.

After that, the second stage performs per-pixel computations only for the potential shadow boundary tiles. We outline a two-pass implementation, and also describe an efficient single-pass hardware architecture, in which the two stages are separated using a delay stream. The only modification required in applications is a new pair of calls for marking the beginning and end of a shadow volume. In our test scenes, the algorithm processes up to 11.5 times fewer pixels compared to current state-of-the-art methods, while reducing the external video memory bandwidth by a factor of up to 17.1.
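The first stage's tile test can be sketched as a box-versus-convex-volume classification. The sketch assumes the shadow volume is convex and given as a list of planes (for example, the volume cast by a single triangle). It also assumes each tile's 3D bounding box is already known. `classify_box` is a hypothetical name. "Boundary" results are conservative: a box that straddles several planes can be reported as boundary even when it lies outside the volume. That only costs extra per-pixel work in the second stage.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # (n, d): a point p is on the inner side when n . p + d >= 0


def classify_box(planes: List[Plane], lo: Vec3, hi: Vec3) -> str:
    """Classify an axis-aligned box against a convex volume bounded by planes."""
    fully_inside = True
    for n, d in planes:
        # Box corners that maximize and minimize n . p.
        p_max = [h if ni >= 0 else l for ni, l, h in zip(n, lo, hi)]
        p_min = [l if ni >= 0 else h for ni, l, h in zip(n, lo, hi)]
        if sum(ni * pi for ni, pi in zip(n, p_max)) + d < 0:
            return "outside"  # the whole box is behind this plane
        if sum(ni * pi for ni, pi in zip(n, p_min)) + d < 0:
            fully_inside = False  # part of the box may be outside
    return "inside" if fully_inside else "boundary"
```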

@InProceedings{Aila2004gh,
  author    = {Timo Aila and Tomas Akenine-M\"oller},
  title     = {A Hierarchical Shadow Volume Algorithm},
  booktitle = {Proc. Graphics Hardware 2004},
  pages     = {15--23},
  year      = {2004},
  publisher = {Eurographics Association}
}
Alias-Free Shadow Maps.
Timo Aila and Samuli Laine.
Eurographics Symposium on Rendering 2004.
Abstract

In this paper we abandon the regular structure of shadow maps. Instead, we transform the visible pixels P(x,y,z) from screen space to the image plane of a light source P'(x',y',z'). The (x',y') are then used as sampling points when the geometry is rasterized into the shadow map. This eliminates the resolution issues that have plagued shadow maps for decades, e.g., jagged shadow boundaries. Incorrect self-shadowing is also greatly reduced, and semi-transparent shadow casters and receivers can be supported. A hierarchical software implementation is outlined.
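The core idea is to use the transformed view samples themselves as the shadow map's sampling points. It can be sketched in brute-force form under these assumptions:

- samples have already been transformed to the light's image plane, with z' increasing away from the light;
- occluders are given as light-space triangles;
- `shadowed` and its `bias` parameter are hypothetical.

The paper outlines a hierarchical implementation, whereas this version simply tests every sample against every triangle.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _edge(ax: float, ay: float, bx: float, by: float, px: float, py: float) -> float:
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def shadowed(samples: List[Vec3], triangles: List[Tuple[Vec3, Vec3, Vec3]],
             bias: float = 1e-4) -> List[bool]:
    """For each light-space sample (x', y', z'), report whether some triangle covers
    (x', y') at a depth closer to the light than z'. Brute force over all pairs."""
    result = []
    for sx, sy, sz in samples:
        hit = False
        for (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) in triangles:
            area = _edge(x0, y0, x1, y1, x2, y2)
            if area == 0:
                continue  # degenerate in the light's image plane
            # Barycentric weights; dividing by the signed area handles either winding.
            w0 = _edge(x1, y1, x2, y2, sx, sy) / area
            w1 = _edge(x2, y2, x0, y0, sx, sy) / area
            w2 = _edge(x0, y0, x1, y1, sx, sy) / area
            if min(w0, w1, w2) < 0:
                continue  # this triangle does not cover the sample point
            depth = w0 * z0 + w1 * z1 + w2 * z2  # occluder depth at exactly (x', y')
            if depth < sz - bias:  # occluder lies between the light and the sample
                hit = True
                break
        result.append(hit)
    return result
```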

@InProceedings{Aila2004egsr,
  author =    {Timo Aila and Samuli Laine},
  title =     {Alias-Free Shadow Maps},
  booktitle = {Proc. Eurographics Symposium on Rendering 2004},
  pages =     {161--166},
  year =      {2004},
  publisher = {Eurographics Association},
}
Hemispherical Rasterization for Self-Shadowing of Dynamic Objects.
Jan Kautz, Jaakko Lehtinen, and Timo Aila.
Eurographics Symposium on Rendering 2004.
Abstract

We present a method for interactive rendering of dynamic models with self-shadows due to time-varying, low-frequency lighting environments. In contrast to previous techniques, the method is not limited to static or pre-animated models. Our main contribution is a hemispherical rasterizer, which rapidly computes visibility by rendering blocker geometry into a 2D occlusion mask with correct occluder fusion. The response of an object to the lighting is found by integrating the visibility function at each of the vertices against the spherical harmonic functions and the BRDF. This yields transfer coefficients that are then multiplied by the lighting coefficients to obtain the final, shadowed exitant radiance. No precomputation is necessary and memory requirements are modest. The method supports both diffuse and glossy BRDFs.

@InProceedings{Kautz2004egsr,
  author =    {Jan Kautz and Jaakko Lehtinen and Timo Aila},
  title =     {Hemispherical Rasterization for Self-Shadowing of Dynamic Objects},
  booktitle = {Proc. Eurographics Symposium on Rendering 2004},
  pages =     {179--184},
  year =      {2004},
  publisher = {Eurographics Association},
}
dPVS: An Occlusion Culling System for Massive Dynamic Environments.
Timo Aila and Ville Miettinen.
IEEE Computer Graphics and Applications 24(2), 2004.

@article{aila2004cga,
  author    = {Timo Aila and Ville Miettinen},
  title     = {dPVS: An Occlusion Culling System for Massive Dynamic Environments},
  journal   = {IEEE Computer Graphics and Applications},
  volume    = {24},
  number    = {2},
  year      = {2004},
  pages     = {86--97},
  publisher = {IEEE Computer Society Press},
}
Optimized Shadow Mapping Using the Stencil Buffer.
Jukka Arvo and Timo Aila.
Journal of Graphics Tools 8(3), 2003. Reprinted in The JGT Editors’ Choice.
Abstract

Shadow maps and shadow volumes are common techniques for computing real-time shadows. We optimize the performance of a hardware-accelerated shadow mapping algorithm by rasterizing the light frustum into the stencil buffer, in a manner similar to the shadow volume algorithm. The pixel shader code that performs shadow tests and illumination computations is applied only to the pixels that are inside the light frustum. We also use deferred shading to further limit the operations to visible pixels. Our technique can be easily plugged into existing applications, and is especially useful for dynamic scenes that contain several local light sources. In our test scenarios, the overall frame rate was up to 2.2 times higher than for our comparison methods.
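The effect of the stencil mask, running the expensive shadow-test-and-lighting shader only for visible pixels inside the light frustum, can be imitated on the CPU. This is an assumption-laden sketch. The per-pixel world positions stand in for a deferred-shading G-buffer. The stencil pass, which in the paper rasterizes the light frustum like a shadow volume, is replaced here by a direct clip-space containment test using the OpenGL convention. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]


def inside_light_frustum(light_view_proj: Mat4, p: Vec3) -> bool:
    """Clip-space containment test, OpenGL convention: -w <= x, y, z <= w."""
    x, y, z = p
    cx, cy, cz, cw = (
        light_view_proj[r][0] * x + light_view_proj[r][1] * y
        + light_view_proj[r][2] * z + light_view_proj[r][3]
        for r in range(4)
    )
    return cw > 0.0 and all(-cw <= c <= cw for c in (cx, cy, cz))


def shade_local_light(positions: List[Vec3], light_view_proj: Mat4,
                      shade: Callable[[Vec3], float]) -> List[float]:
    """Accumulate one light's contribution; pixels outside its frustum get 0."""
    # Pass 1: a per-pixel mask, the role the stencil buffer plays on the GPU.
    mask = [inside_light_frustum(light_view_proj, p) for p in positions]
    # Pass 2: the expensive shader runs only where the mask is set.
    return [shade(p) if m else 0.0 for p, m in zip(positions, mask)]
```

With several small local lights, each light's mask covers only a fraction of the screen. That is where the savings in shader invocations come from.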

@article{Arvo2003jgt,
  author  = {Jukka Arvo and Timo Aila},
  title   = {Optimized Shadow Mapping Using the Stencil Buffer},
  journal = {Journal of Graphics Tools},
  year    = {2003},
  volume  = {8},
  number  = {3},
  pages   = {23--32},
}
Delay Streams for Graphics Hardware.
Timo Aila, Ville Miettinen, and Petri Nordlund.
ACM Transactions on Graphics 22(3) (SIGGRAPH 2003).
Abstract

In causal processes, decisions do not depend on future data. Many well-known problems, such as occlusion culling, order-independent transparency and edge antialiasing, cannot be properly solved using the traditional causal rendering architectures, because future data may change the interpretation of current events.

We propose adding a delay stream between the vertex and pixel processing units. While a triangle resides in the delay stream, subsequent triangles generate occlusion information. As a result, the triangle may be culled by primitives that were submitted after it. We show two- to fourfold efficiency improvements in pixel processing and video memory bandwidth usage in common benchmark scenes. We also demonstrate how the memory requirements of order-independent transparency can be substantially reduced by using delay streams. Finally, we describe how discontinuity edges can be detected in hardware. Previously used heuristics for collapsing samples in adaptive supersampling are thus replaced by connectivity information.
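Culling by primitives submitted later can be illustrated with a toy one-dimensional "screen" in Python. The model is a deliberate simplification and is not taken from the paper. Each primitive is a span of pixels at a constant depth, with smaller values nearer the viewer. Occlusion information is written when a span enters the delay stream. The span is tested against that information only when it leaves. The names `render_with_delay` and `Span` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Span = Tuple[int, int, float]  # (first pixel, one-past-last pixel, constant depth)


def render_with_delay(spans: List[Span], width: int, delay: int) -> List[Span]:
    """Return the spans that reach pixel processing; smaller depth = nearer."""
    occlusion = [float("inf")] * width  # nearest depth seen so far per pixel
    stream: Deque[Span] = deque()
    shaded: List[Span] = []

    def retire(span: Span) -> None:
        x0, x1, depth = span
        # Cull only if every covered pixel already holds something strictly nearer,
        # which may include spans submitted after this one.
        if all(occlusion[x] < depth for x in range(x0, x1)):
            return
        shaded.append(span)

    for span in spans:
        x0, x1, depth = span
        # Occlusion information is generated as soon as a span enters the stream.
        for x in range(x0, x1):
            occlusion[x] = min(occlusion[x], depth)
        stream.append(span)
        if len(stream) > delay:
            retire(stream.popleft())
    while stream:  # flush the stream at the end of the frame
        retire(stream.popleft())
    return shaded
```

With `delay=0` the pipeline is causal, so a far span submitted before a near span covering the same pixels is still shaded. With any positive delay, the far span is culled because the near span's depth arrives while the far span is still waiting in the stream.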

@article{aila2003siggraph,
  author    = {Timo Aila and Ville Miettinen and Petri Nordlund},
  title     = {Delay Streams for Graphics Hardware},
  journal   = {ACM Trans. Graph.},
  volume    = {22},
  number    = {3},
  year      = {2003},
  pages     = {792--800},
  publisher = {ACM Press},
}

Technical Reports

Understanding the Efficiency of Ray Traversal on GPUs -- Kepler and Fermi Addendum.
Timo Aila, Samuli Laine, and Tero Karras.
NVIDIA Technical Report NVR-2012-02 (presented as a poster at HPG 2012).
Abstract

This technical report is an addendum to the HPG 2009 paper "Understanding the Efficiency of Ray Traversal on GPUs" and provides citable performance results for the Kepler and Fermi architectures. We explain how to optimize the traversal and intersection kernels for these newer platforms, and what the important architectural limiters are. We plot the relative ray tracing performance between architecture generations against the available memory bandwidth and peak FLOPS, and demonstrate that ray tracing is still, even with incoherent rays and more complex scenes, almost entirely limited by the available FLOPS. We also discuss two esoteric instructions, present in both Fermi and Kepler, and show that they can be safely used for faster acceleration structure traversal.

@techreport{Aila:Efficiency:NVIDIA:2012,
    author      = {Timo Aila and Samuli Laine and Tero Karras},
    title       = {Understanding the Efficiency of Ray Traversal on {GPU}s -- {K}epler and {F}ermi Addendum},
    month       = jun,
    year        = 2012,
    institution = {NVIDIA Corporation},
    type        = {NVIDIA Technical Report},
    number      = {NVR-2012-02},
}
Meshless Finite Elements for Hierarchical Global Illumination.
Jaakko Lehtinen, Matthias Zwicker, Janne Kontkanen, Emmanuel Turquin, François Sillion, and Timo Aila.
Technical Report TML-B7, Publications in Telecommunications Software and Multimedia, Helsinki University of Technology, 2007.
Abstract

We introduce a meshless finite element framework for solving light transport problems. Traditional finite element methods use basis functions parameterized directly on the mesh surface. The creation of suitable parameterizations or clusterings requires pre-processing that is difficult, error-prone, and sensitive to the quality of input geometry. The resulting light transport solutions still tend to exhibit discontinuities, necessitating heuristic post-processing before visualization. Due to these problems, finite element methods are rarely used in production.

The core idea of our approach is to use finite element basis functions induced by hierarchical scattered data approximation techniques. This leads to a mathematically rigorous recipe for meshless finite element illumination computations. As a main advantage, our approach decouples the function spaces used for solving the transport equations from the representation of the scene geometry. The resulting solutions are accurate, exhibit no spurious discontinuities, and can be visualized directly without post-processing, while parameterization, meshing and clustering problems are avoided. The resulting methods are furthermore easy to implement.

We demonstrate the power of our framework by describing implementations of hierarchical radiosity, glossy precomputed radiance transfer from distant illumination, and diffuse indirect precomputed transport from local light sources. Moreover, we describe how to directly visualize the solutions on graphics hardware.
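How basis functions can be defined directly from scattered 3D points, with no surface parameterization or mesh, can be shown with a much simpler stand-in than the paper's hierarchical construction. The sketch below uses a single-level Shepard partition-of-unity basis built from compactly supported Wendland C2 weights. This choice of weight function and normalization is an assumption made for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def wendland_c2(r: float) -> float:
    """Compactly supported Wendland C2 weight; zero for r >= 1."""
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def shepard_basis(x: Point, centers: Sequence[Point], radius: float) -> List[float]:
    """Evaluate every basis function at x; together they form a partition of unity."""
    # Only Euclidean distances between points are needed, so no mesh is involved.
    weights = [wendland_c2(math.dist(x, c) / radius) for c in centers]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x lies outside the support of every basis function")
    return [w / total for w in weights]


def evaluate(x: Point, centers: Sequence[Point], coeffs: Sequence[float],
             radius: float) -> float:
    """Reconstruct a function (e.g. irradiance) from per-center coefficients."""
    return sum(c * phi for c, phi in zip(coeffs, shepard_basis(x, centers, radius)))
```

Because the normalized weights sum to one wherever at least one center covers `x`, a constant set of coefficients reproduces that constant exactly, and the reconstruction has no seams tied to mesh edges. A transport solver would compute `coeffs`; they are simply given here.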

@techreport{Lehtinen07Meshless,
	author = {Jaakko Lehtinen and Matthias Zwicker and Janne Kontkanen and Emmanuel Turquin and Fran\c{c}ois X. Sillion and Timo Aila},
	title = {Meshless Finite Elements for Hierarchical Global Illumination},
	year = {2007},
	month = May,
	number = {TML-B7},
	institution = {Helsinki University of Technology},
	isbn = {978-951-22-8816-8},
	issn = {1455-9730},
	type = {Publications in Telecommunications Software and Multimedia},
}

Proceedings

Proc. Graphics Hardware 2007. Mark Segal and Timo Aila (editors).

Theses